BISITE Research Group, University of Salamanca, Edificio I+D+i, Calle Espejo 2, 37007 Salamanca, Spain
Air Institute, IoT Digital Innovation Hub, Carbajosa de la Sagrada, 37188 Salamanca, Spain
University of Granada, Colegio Máximo de Cartuja, Campus Universitario de Cartuja, C.P. 18071 Granada, Spain
Pusat Komputeran dan Informatik, Universiti Malaysia Kelantan, Karung Berkunci 36, Pengkaan Chepa, 16100 Kota Bharu, Kelantan, Malaysia
Department of Electronics, Information and Communication, Faculty of Engineering, Osaka Institute of Technology, 535-8585 Osaka, Japan
2022 Volume 37
RESEARCH ARTICLE   Open Access    

Evaluation metrics and dimensional reduction for binary classification algorithms: a case study on bankruptcy prediction

  • Abstract: This paper presents a methodology that makes it possible to automate binary classification using the minimum possible number of attributes. In this methodology, the success of the binary prediction lies not in the accuracy of an algorithm but in the evaluation metrics, which give information about the goodness of fit, an important factor when the dataset is imbalanced. The proposed methodology assesses the possible biases in identifying one algorithm as the best performer when the goodness of fit of an algorithm is judged through evaluation metrics. The dimensionality of the data is reduced using the cumulative explained variance. The performance of six machine learning classification models is then compared using the Matthews correlation coefficient (MCC), the area under the receiver operating characteristic curve (ROC-AUC), and the area under the precision-recall curve (AUC-PR). The results show graphically and numerically how the choice of evaluation metric affects which algorithm appears optimal. The algorithms with the best performance in terms of evaluation metrics were random forest and gradient boosting. On the imbalanced datasets, MCC provided better prediction results than ROC-AUC or AUC-PR. The proposed methodology is adapted to the case of bankruptcy prediction.
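
    As an illustration of two ideas named in the abstract, the following minimal Python sketch (not the authors' implementation) shows how a number of components can be chosen from the cumulative explained variance and why MCC can be more informative than accuracy on imbalanced data. The 0.90 threshold, the function names, and all variances and confusion-matrix counts are hypothetical values chosen for the example; none of them come from the paper.

```python
from math import sqrt


def components_for_variance(explained_variance, threshold=0.90):
    """Smallest number of leading components whose cumulative share of
    the total variance reaches `threshold` (an assumed cut-off)."""
    total = sum(explained_variance)
    cumulative = 0.0
    ordered = sorted(explained_variance, reverse=True)
    for k, variance in enumerate(ordered, start=1):
        cumulative += variance / total
        if cumulative >= threshold:
            return k
    return len(ordered)


def accuracy(tp, fp, tn, fn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical per-component variances (e.g. PCA eigenvalues).
variances = [4.0, 2.0, 1.0, 0.5, 0.5]
n_components = components_for_variance(variances, threshold=0.90)

# Hypothetical imbalanced sample: 1000 firms, 50 of them bankrupt.
# Classifier A labels every firm as healthy.
always_healthy = dict(tp=0, fp=0, tn=950, fn=50)
# Classifier B detects 30 of the 50 bankruptcies with 40 false alarms.
detector = dict(tp=30, fp=40, tn=910, fn=20)

if __name__ == "__main__":
    print("components kept:", n_components)
    for name, cm in (("A", always_healthy), ("B", detector)):
        print(name, "accuracy:", accuracy(**cm), "MCC:", round(mcc(**cm), 3))
```

    In this made-up case, classifier A has the higher accuracy although it never identifies a bankruptcy, while its MCC is zero; classifier B has slightly lower accuracy but a clearly positive MCC.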
  • Cite this article

    María E. Pérez-Pons, Javier Parra-Dominguez, Guillermo Hernández, Enrique Herrera-Viedma, Juan M. Corchado. 2022. Evaluation metrics and dimensional reduction for binary classification algorithms: a case study on bankruptcy prediction. The Knowledge Engineering Review 37(1), doi: 10.1017/S026988892100014X
    • This research has been supported by the project “INTELFIN: Artificial Intelligence for investment and value creation in SMEs through competitive analysis and business environment,” Reference: RTC-2017-6536-7, funded by the Ministry of Science, Innovation and Universities (Challenges-Collaboration 2017), the State Agency for Research (AEI) and the European Regional Development Fund (ERDF).

    • None.

    • María E. Pérez-Pons: Data curation, Formal analysis, Investigation, Writing - original draft, and Writing - review and editing. Javier Parra-Dominguez: Conceptualization, Investigation, Methodology, and Writing - review and editing. Guillermo Hernández: Data curation, Formal analysis, Methodology, Writing - review and editing, and Software Supervision. Enrique Herrera-Viedma: Writing - review and editing, and Supervision. Juan M. Corchado: Conceptualization, Writing - review and editing, and Supervision.

    • The Orbis database belongs to Bureau Van Dijk and contains real business information from many companies: orbis.bvdinfo.com

    • © The Author(s), 2022. Published by Cambridge University Press.