Keywords: Class Imbalance, SMOTE, ADASYN, Random Forest.

Abstract

Class imbalance refers to a large difference in the amount of training data between two classes: one class holds most of the data (the majority class), while the other holds very little (the minority class). Oversampling balances the data by generating additional minority-class samples until their number matches that of the majority class, with the aim of improving classification results. Oversampling was chosen because it avoids losing important information from the imbalanced dataset. The Synthetic Minority Over-sampling Technique (SMOTE) and the Adaptive Synthetic Sampling Approach (ADASYN) are oversampling techniques that produce synthetic data and rely on the concept of adjacency (nearest neighbors) in their algorithms. Because they create new samples instead of duplicating existing ones, SMOTE and ADASYN can reduce the risk of overfitting, which is a disadvantage of ordinary oversampling. In this study, both methods are combined with the Random Forest (RF) classification algorithm to address class imbalance in a dataset on the Indonesian economic and financial crisis. The data comprise 9 independent variables based on macroeconomic aspects and 1 binary dependent variable with two categories: crisis and non-crisis. The results indicate that handling class imbalance improves the classification performance of RF. ADASYN-RF performs best, with accuracy, recall, precision, F1 score, and ROC AUC of 98.26%, 66.67%, 72.22%, 65.57%, and 82.93%, respectively.
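The adjacency idea behind SMOTE can be illustrated with a minimal sketch. It is not the authors' implementation: the function name smote_sketch, its parameters, and the toy data are assumptions made for illustration only, and only the Python standard library is used. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbors. ADASYN follows the same interpolation but generates more synthetic points around minority samples that have many majority-class neighbors; that weighting step is not shown here.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    minority: list of equal-length numeric feature lists (minority class only).
    Hypothetical helper for illustration, not a library API.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbors (Euclidean distance).
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbors)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data, made up for illustration (not the crisis dataset).
crisis = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
non_crisis_count = 10

# Generate enough synthetic crisis samples to match the majority class size.
new_points = smote_sketch(crisis, n_new=non_crisis_count - len(crisis), k=3)
balanced_minority = crisis + new_points
```

In practice, oversampling of this kind is usually applied to the training split only, so that the test data used to compute accuracy, recall, precision, F1 score, and ROC AUC contain no synthetic samples.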

How to Cite
Oktari, F., Prahutama, A., ., T., & ., S. (2025). Handling Class Data Imbalance in Random Forest using the SMOTE and ADASYN Methods in Identify Economic Status. International Journal Of Mathematics And Computer Research, 13(11), 5921-5927. https://doi.org/10.47191/ijmcr/v13i11.12