Development of a Binary Classification Model Based on Small Data Using Machine Learning Methods
(Pp. 129-140)

Mikhaylova Svetlana S., Dr. Sci. (Econ.), Associate Professor; Professor, Department of Data Analysis and Machine Learning, Faculty of Information Technology and Big Data Analysis; Financial University under the Government of the Russian Federation; Moscow, Russian Federation
Grineva Natalia V., Cand. Sci. (Econ.), Associate Professor; Associate Professor, Department of Data Analysis and Machine Learning; Financial University under the Government of the Russian Federation; Moscow, Russian Federation
Abstract:
Today, machine learning solutions to binary classification problems are applied in many areas, including medicine, energy, marketing, agriculture and financial analytics. They give companies an opportunity to find new sources of profit and to improve existing processes. As a result, new methods are being actively developed, existing ones are being improved, and the use of machine learning for classification is being studied in a range of fields. Because most current developments are oriented towards Big Data, it is highly relevant to study how effective different machine learning methods are for binary classification when the data are small and the problems specific to small data are taken into account. The study identifies the small-data problems that can reduce the effectiveness of a trained model and proposes ways of addressing them. To assess how these problems affect model quality, the quality metrics of models trained on differently preprocessed versions of the data were compared. The authors conclude that working correctly with small data requires timely elimination of data defects such as class imbalance and outliers. The quality metrics most important for a model that analyzes medical parameters were selected, and diabetes detection models built on the preprocessed small data were compared. For this task, the stacking model was chosen as the best option for medical use. The results show that machine learning can be highly effective in solving real binary classification problems.
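The abstract names class imbalance and outliers as the small-data defects to remove before training, and stresses choosing the right quality metrics for a medical model. The exact preprocessing is not described here, so the following is only a minimal Python sketch under stated assumptions: outliers are filtered with Tukey's IQR fences (k = 1.5), imbalance is handled by random oversampling of the minority class, and precision, recall and F1 are computed from confusion counts. The function names, the two features (glucose, BMI) and the toy values are hypothetical illustrations, not the authors' data or code, and the stacking model itself is not reproduced.

```python
import random
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return Tukey fences (low, high) for one feature column."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, labels, k=1.5):
    """Keep only rows whose every feature lies inside its IQR fences."""
    n_features = len(rows[0])
    bounds = [iqr_bounds([row[j] for row in rows], k) for j in range(n_features)]
    kept_rows, kept_labels = [], []
    for row, label in zip(rows, labels):
        if all(low <= x <= high for x, (low, high) in zip(row, bounds)):
            kept_rows.append(row)
            kept_labels.append(label)
    return kept_rows, kept_labels


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until both classes have equal counts.

    Assumption: applied to the training split only, so duplicated rows
    never leak into the data used for evaluation.
    """
    rng = random.Random(seed)
    pos = [row for row, y in zip(rows, labels) if y == 1]
    neg = [row for row, y in zip(rows, labels) if y == 0]
    if len(pos) < len(neg):
        minority, majority, minority_label = pos, neg, 1
    else:
        minority, majority, minority_label = neg, pos, 0
    if not minority:  # one class is absent: nothing to resample from
        return list(rows), list(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(rows) + extra, list(labels) + [minority_label] * len(extra)


def binary_metrics(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy training data: [glucose, BMI]; BMI 250.0 mimics a data-entry typo.
X = [[85, 22.0], [90, 24.5], [102, 27.1], [110, 25.0],
     [148, 33.6], [97, 250.0], [118, 28.4], [160, 35.2]]
y = [0, 0, 0, 0, 1, 0, 0, 1]

X_clean, y_clean = drop_outliers(X, y)              # removes the BMI = 250.0 row
X_bal, y_bal = random_oversample(X_clean, y_clean)  # 5 negatives, 5 positives
print(len(X_clean), len(X_bal), sum(y_bal))         # 7 10 5

print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 1, 0, 1]))
```

Random oversampling is used here only because it needs no extra libraries; synthetic sampling (e.g. SMOTE) or class weights are common alternatives, and the abstract does not say which option the authors used. On medical data, extreme values flagged by IQR fences may be genuine patients rather than errors, so filtered rows are worth inspecting before they are discarded.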
How to Cite:
Mikhaylova S. S., Grineva N. V. Development of a Binary Classification Model Based on Small Data Using Machine Learning Methods // ECONOMIC PROBLEMS AND LEGAL PRACTICE. 2024. Vol. 20. № 1. P. 129-140. (in Russ.) DOI: 10.33693/2541-8025-2024-20-1-129-140. EDN: WFJKOK
Keywords:
machine learning, small data, classification tasks, medical data, sampling, ensemble, stacking algorithm.

