Kurdish Sorani Natural Language Processing Text Classification Using Different Machine Learning Algorithms
DOI: https://doi.org/10.25098/9.1.32
Keywords: News Classification, Natural Language Processing, Machine Learning
Abstract
The researchers evaluated machine learning models for Kurdish Sorani text classification using a dataset collected from several Kurdish news websites. The dataset contains five categories: business, sport, health, social, and technology. The study tested four algorithms: Extreme Gradient Boosting (XGBoost), Random Forest (RF), Support Vector Machine (SVM), and Light Gradient Boosting Machine (LGBM). The results show that SVM performed best, achieving the highest accuracy and F1-scores across all classification categories. XGBoost performed well on technology and culture news, whereas Random Forest and LGBM struggled to classify the social category accurately. The study highlights the potential of modern machine learning models for text classification tasks.
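The abstract compares the four classifiers by overall accuracy and by per-category F1-score. As a hedged illustration of how such per-category figures can be derived from a model's predictions, the sketch below builds a confusion matrix, computes one-vs-rest F1 for each of the five categories, and computes overall accuracy in plain Python. The helper names (`confusion_matrix`, `per_class_f1`, `accuracy`) and the label lists are hypothetical toy values, not code, data, or predictions from the study; libraries such as scikit-learn offer equivalent metric functions.

```python
# Minimal sketch: per-category evaluation for a multiclass news classifier.
# Helper names and the toy labels below are hypothetical illustrations,
# not code or data from the study.

# The five news categories named in the abstract.
CATEGORIES = ["business", "sport", "health", "social", "technology"]


def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) label pairs: matrix[true][pred] -> count."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matrix = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_f1(y_true, y_pred, labels):
    """One-vs-rest F1 for each label, derived from the confusion matrix."""
    cm = confusion_matrix(y_true, y_pred, labels)
    scores = {}
    for label in labels:
        tp = cm[label][label]
        # False positives: other classes predicted as this label (column sum minus tp).
        fp = sum(cm[other][label] for other in labels) - tp
        # False negatives: this label predicted as another class (row sum minus tp).
        fn = sum(cm[label][other] for other in labels) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy labels: one "social" article is misread as "business".
    y_true = ["business", "sport", "health", "social", "technology", "social"]
    y_pred = ["business", "sport", "health", "business", "technology", "social"]
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
    for label, f1 in per_class_f1(y_true, y_pred, CATEGORIES).items():
        print(f"{label:<10} F1 = {f1:.3f}")
```

On the toy labels this prints an accuracy of 0.833, an F1 of 0.667 for both business and social, and 1.000 for the other three categories. A single misclassification lowers the F1 of two categories at once, which is why a per-category breakdown can expose a weak class (such as the social category for Random Forest and LGBM) that a single accuracy figure would hide.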
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
SJCUS's open access articles are published under a Creative Commons Attribution CC-BY-NC-ND 4.0 license.
