Diagnosing the Origin of Machine-Generated Arabic Texts through Self-Stylistic Analysis: Testing the Capability of Large Language Models

Alya Al-Rubai'i

doi:10.61856/j0d5tj70

Authors

Alya Al-Rubai'i, Department of Translation, Faculty of Languages, University of Duhok, Kurdistan Region of Iraq


Keywords:

stylistic features of AI writing, large language models, automated writing in Arabic

Abstract

With the growing use of large language models (LLMs) in text generation, there is a need to explore their inherent capacity to diagnose the origin of texts generated by other models. This study examines the capacity of four LLMs (ChatGPT-4.5, Claude, Copilot, and Gemini) to determine whether Arabic texts are of human or machine origin. It relies on a new methodology based on intrinsic stylistic analysis of Arabic literary texts generated by these models. Through this analysis, the study aims to uncover the stylistic features that the models themselves associate with automated writing. The research addresses two main questions: (1) To what extent are large language models capable of diagnosing automated writing in Arabic? (2) What stylistic features do these models consider indicative of automated writing in Arabic? The importance of the study lies in its methodology, which offers new insight into the capacity of AI to diagnose machine-generated Arabic texts through stylistic analysis of language written by other models. The study identifies the Arabic stylistic features that AI treats as indicators of AI-generated text, without comparison with human texts. It also establishes a preliminary reference baseline for researchers investigating the stylistic features of machine-generated Arabic writing, which can later be used to formulate criteria for classifying the style of such texts. The study found discrepancies between the models: ChatGPT-4.5 and Gemini were the most accurate in diagnosing machine-generated texts, Claude came second, and Copilot came last. This indicates that the models' diagnosis of machine-generated Arabic writing relies on stylistic criteria that are not agreed upon across models. The study also categorizes the features that the models use to diagnose AI-generated Arabic texts into formal, syntactic and organizational structure, lexical, rhetorical, discursive, and cognitive levels.

