Comparing Different Machine Learning Algorithms for Predicting Coronavirus (Covid-19) Disease

Authors

  • Zhyan M. Omer Department of Statistics and Informatics, University of Sulaimani, Sulaimaniya, Iraq
  • Nzar A. Ali Department of Statistics and Informatics, University of Sulaimani, Sulaimaniya, Iraq; Department of Computer Science, Cihan University Sulaymaniya, Sulaimaniya, Iraq
  • Rezhna S. Mohammed Permanent of Family Medicine at Azadi Teaching Hospital-Kirkuk, Kirkuk, Iraq

DOI:

https://doi.org/10.25098/6.1.28

Keywords:

COVID-19, Mortality, SMOTE, Machine Learning, Classification

Abstract

COVID-19 is a viral pandemic disease that confronted the whole world between the end of 2019 and the beginning of 2022. It first appeared in China, then spread all over the world and became a global threat to human health. Patients with COVID-19 developed serious symptoms, and several of them died from organ failure, especially of the liver. This paper uses machine learning algorithms to construct a COVID-19 severity prediction model. Four machine learning classification techniques were evaluated: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF) and Extreme Gradient Boosting (XGB). To address the class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was used. Two sets of experiments were built: one with the original dataset and one with the SMOTE sampling technique. Based on several evaluation metrics, the Random Forest and Support Vector Machine classifiers showed the highest performance on both datasets, without and with SMOTE, while Logistic Regression achieved the lowest results on both. Furthermore, the four machine learning models trained with SMOTE performed markedly better than the same classifiers trained without SMOTE. Finally, the top 25, 20 and 15 features by importance were selected using the ExtraTrees classifier; the variation in accuracy between the different feature selections was very small.
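
The abstract does not give implementation details for the oversampling step, so the following is only a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative sketch only; not the implementation used in the paper.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (base itself excluded)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical, not from the paper): 8 majority vs 3 minority rows.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

# Generate just enough synthetic rows to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In a setup like the one the abstract describes, oversampling of this kind would normally be applied to the training split only, so that the evaluation data contains no synthetic patients.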

Published

2022-05-30

How to Cite

Omer, Z. M., Ali, N. A., & Mohammed, R. S. (2022). Comparing Different Machine Learning Algorithms for Predicting Coronavirus (Covid-19) Disease. The Scientific Journal of Cihan University–Sulaimaniya, 6(1), 52-67. https://doi.org/10.25098/6.1.28

Issue

Section

Articles Vol6 Issue1