Comparative Analysis of Morphological Approaches for Low-Resource Indo-Aryan Languages: Case Studies in Angika, Maithili, and Hindi
DOI:
https://doi.org/10.70454/IJMRE.2025.05037
Keywords:
Morphological Analysis, Low-Resource Indo-Aryan Languages, Finite-State Transducer, Rule-Based NLP, Hybrid Morphological Models
Abstract
Morphological analysis is an important and challenging sub-task in Natural Language Processing (NLP), particularly for morphologically rich Indo-Aryan languages. Yet some regional languages, such as Angika and Maithili, remain under-resourced because annotated corpora and computational tools are lacking. This work is a comparative study of morphological analysis for Angika, Maithili, and Hindi, covering low- and little-resource scenarios. It compares rule-based, finite-state transducer, and hybrid methods, concentrating on their treatment of inflectional and derivational morphology. Linguistically motivated rules and small lexical resources are used for the low-resource languages, while Hindi serves as the reference language. Rule-based and hybrid models prove more robust and yield more interpretable results in low-resource settings than purely data-driven models. The study demonstrates the need for LingA (linguistic analyzers) that combine linguistic knowledge with computational techniques in order to develop efficient morphological analysis tools for Indo-Aryan languages that are still typologically under-represented.
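To make the idea of "linguistically motivated rules and small lexical resources" concrete, the following is a minimal sketch of a lexicon-backed suffix analyzer. It is not the implementation evaluated in the paper. The transliterated Hindi nouns, the tiny lexicon, the tag names, and the `analyze` function are all assumptions chosen for illustration, and the unlexicalized fallback is only a crude stand-in for the guessing component a hybrid system might add.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon (not from the paper): stem -> (lemma, POS, gender).
# Transliterated Hindi masculine nouns whose direct singular ends in -a.
LEXICON = {
    "ladk": ("ladka", "noun", "masc"),  # 'boy'
    "kamr": ("kamra", "noun", "masc"),  # 'room'
}

# Suffix rules for this noun class: suffix -> possible (number, case) readings.
SUFFIX_RULES = {
    "on": [("pl", "oblique")],
    "a": [("sg", "direct")],
    "e": [("sg", "oblique"), ("pl", "direct")],
}


@dataclass(frozen=True)
class Analysis:
    lemma: str
    pos: str
    gender: str
    number: str
    case: str
    guessed: bool  # True when the stem was not found in the lexicon


def analyze(word):
    """Return all readings of `word`; prefer lexicon-backed ones over guesses."""
    lexical, guessed = [], []
    for suffix, readings in SUFFIX_RULES.items():
        # Skip rules that do not match or that would leave an empty stem.
        if not word.endswith(suffix) or len(word) <= len(suffix):
            continue
        stem = word[: -len(suffix)]
        entry = LEXICON.get(stem)
        for number, case in readings:
            if entry is not None:
                lemma, pos, gender = entry
                lexical.append(Analysis(lemma, pos, gender, number, case, False))
            else:
                # Assumption: unknown stems are treated as nouns of the same class.
                guessed.append(Analysis(stem + "a", "noun", "masc", number, case, True))
    return lexical or guessed


if __name__ == "__main__":
    for form in ["ladka", "ladke", "ladkon", "ghode"]:
        for reading in analyze(form):
            print(form, reading)
```

The ambiguous form `ladke` yields two readings (singular oblique and plural direct), which illustrates why such analyzers return sets of analyses rather than a single tag. Because every reading traces back to an explicit rule and lexicon entry, the output stays inspectable even when only a few dozen stems are available.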
License
Copyright (c) 2025 Alok Kumar (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits all use, distribution, and reproduction in any medium, provided the work is properly cited.
