Evaluation metrics and dimensional reduction for binary classification algorithms: a case study on bankruptcy prediction

María E. Pérez-Pons; Javier Parra-Dominguez; Guillermo Hernández; Enrique Herrera-Viedma; Juan M. Corchado

doi:10.1017/S026988892100014X

2022 Volume 37

RESEARCH ARTICLE Open Access

¹BISITE Research Group, University of Salamanca. Edificio I+D+i, Calle Espejo 2, 37007, Salamanca, Spain
²Air Institute, IoT Digital Innovation Hub, Carbajosa de la Sagrada, 37188. Salamanca, Spain
³University of Granada, Colegio Máximo de Cartuja, Campus Universitario de Cartuja C.P. 18071 Granada, Spain
⁴Pusat Komputeran dan Informatik, Universiti Malaysia Kelantan, Karung Berkunci 36, Pengkaan Chepa, 16100 Kota Bharu, Kelantan, Malaysia
⁵Department of Electronics, Information and Communication, Faculty of Engineering, Osaka Institute of Technology, 535-8585 Osaka, Japan

Received: 07 June 2021
Revised: 30 November 2021
Accepted: 01 December 2021
Published online: 14 January 2022
The Knowledge Engineering Review 37, Article number: e1 (2022)

Abstract

This paper presents a methodology for automating binary classification using the minimum possible number of attributes. In this methodology, the success of a binary prediction is judged not by the accuracy of an algorithm but by evaluation metrics that describe its goodness of fit, which is an important consideration when the dataset is imbalanced. The proposed methodology assesses the biases that can arise when a single algorithm is identified as the best performer on the basis of such evaluation metrics. The dimensionality of the data has been reduced using the cumulative explained variance. The performance of six machine learning classification models has then been compared using the Matthews correlation coefficient (MCC), the area under the receiver operating characteristic curve (ROC-AUC), and the area under the precision-recall curve (AUC-PR). The results show, graphically and numerically, how the choice of evaluation metric affects which algorithm appears to be optimal. The algorithms with the best performance in terms of evaluation metrics have been random forest and gradient boosting. On the imbalanced datasets, MCC has provided better prediction results than ROC-AUC or AUC-PR. The proposed methodology is applied to the case of bankruptcy prediction.
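
The two ideas summarised in the abstract (accuracy can mislead on imbalanced data, and attributes can be reduced using cumulative explained variance) can be illustrated with a minimal sketch. This is not the authors' implementation: the confusion-matrix counts, the eigenvalues, the 0.95 variance threshold, and the function names are all hypothetical values chosen for illustration.

```python
import math
from itertools import accumulate


def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when a marginal sum is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose cumulative
    explained variance ratio reaches `threshold` (assumed value)."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical test set: 950 healthy firms, 50 bankrupt firms (positive class).
# Classifier A labels every firm as healthy.
always_healthy = dict(tp=0, fp=0, tn=950, fn=50)
# Classifier B finds 40 of the 50 bankruptcies at the cost of 60 false alarms.
detector = dict(tp=40, fp=60, tn=890, fn=10)

for name, cm in [("always healthy", always_healthy), ("detector", detector)]:
    print(f"{name}: accuracy={accuracy(**cm):.3f}  MCC={mcc(**cm):.3f}")

# Hypothetical eigenvalues of a covariance matrix with six attributes.
print(components_for_variance([4.0, 2.5, 1.5, 1.0, 0.6, 0.4]))
```

In this made-up example the classifier that never predicts bankruptcy has the higher accuracy (0.950 against 0.930) but an MCC of 0, whereas the classifier that actually detects bankruptcies reaches an MCC of about 0.535. Ranking the two by accuracy alone would pick the useless one.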

Rights and permissions

© The Author(s), 2022. Published by Cambridge University Press

About this article

Cite this article

María E. Pérez-Pons, Javier Parra-Dominguez, Guillermo Hernández, Enrique Herrera-Viedma, Juan M. Corchado. 2022. Evaluation metrics and dimensional reduction for binary classification algorithms: a case study on bankruptcy prediction. The Knowledge Engineering Review 37, e1. doi:10.1017/S026988892100014X
