making Assamese writing easier

To date, there has been very limited research on the computational aspects of the Assamese language compared with most other Indian languages. This lack of computational linguistic research has hindered the development of quality software for Assamese. We have produced our research independently and with our own funding. Our mission is to continue producing similar work in this field and to inspire others to do the same. We believe that if we do not do it for ourselves, no one else will. If you would like to contribute research, collaborate with us, or help bring such research to the international community of computational linguists, please contact us. Some knowledge of computer science or of the linguistic aspects of the Assamese language is desirable.

An Improved Stemming Approach Using HMM for a Highly Inflectional Language

Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science, Volume 7816, 2013, pp. 164-173

Stemming is a common method for morphological normalization of natural language texts. Modern information retrieval systems rely on such normalization techniques for automatic document processing tasks. High quality stemming is difficult in highly inflectional Indic languages, and little research has been performed on designing stemming algorithms for texts in these languages. In this study, we focus on the problem of stemming texts in Assamese, a low resource Indic language spoken in the North-Eastern part of India by approximately 30 million people. Stemming is hard in Assamese because single letter suffixes commonly appear as morphological inflections: more than 50% of the inflections in Assamese appear as single letter suffixes. Such single letter inflections cause ambiguity when predicting the underlying root word. Therefore, we propose a new method that combines a rule based algorithm for predicting multiple letter suffixes and an HMM based algorithm for predicting the single letter suffixes. The combined approach can predict morphologically inflected words with 92% accuracy.
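The combination described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the suffix list, the two HMM states (ROOT, INFL), all probabilities, and the romanized tokens are invented placeholders chosen only to show how a rule based pass for multiple letter suffixes could hand the remaining words to an HMM decoded with the Viterbi algorithm.

```python
import math

# All values below are invented placeholders, not Assamese linguistic data.
MULTI_SUFFIXES = ("zan", "tok")          # hypothetical multi-letter suffixes
STATES = ("ROOT", "INFL")                # INFL = final letter is a suffix
START = {"ROOT": 0.6, "INFL": 0.4}
TRANS = {"ROOT": {"ROOT": 0.5, "INFL": 0.5},
         "INFL": {"ROOT": 0.7, "INFL": 0.3}}
EMIT = {"ROOT": {"e": 0.1, "k": 0.1, "other": 0.8},
        "INFL": {"e": 0.6, "k": 0.39, "other": 0.01}}


def strip_multi(word):
    """Rule based step: strip the longest matching multi-letter suffix."""
    for suffix in sorted(MULTI_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[:-len(suffix)], True
    return word, False


def emit(state, word):
    """Emission probability of the word's final letter in a given state."""
    symbol = word[-1] if word[-1] in ("e", "k") else "other"
    return EMIT[state][symbol]


def viterbi(words):
    """Return the most likely ROOT/INFL tag sequence for a non-empty list."""
    scores = [{s: math.log(START[s]) + math.log(emit(s, words[0]))
               for s in STATES}]
    back = []
    for word in words[1:]:
        row, pointers = {}, {}
        for s in STATES:
            prev = max(STATES,
                       key=lambda p: scores[-1][p] + math.log(TRANS[p][s]))
            row[s] = (scores[-1][prev] + math.log(TRANS[prev][s])
                      + math.log(emit(s, word)))
            pointers[s] = prev
        scores.append(row)
        back.append(pointers)
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def stem_sentence(words):
    """Apply suffix rules first, then let the HMM decide single-letter cases."""
    stems = list(words)
    pending = []
    for i, word in enumerate(words):
        stem, fired = strip_multi(word)
        if fired:
            stems[i] = stem
        else:
            pending.append(i)
    if pending:
        tags = viterbi([words[i] for i in pending])
        for i, tag in zip(pending, tags):
            if tag == "INFL" and len(words[i]) > 2:
                stems[i] = words[i][:-1]
    return stems


print(stem_sentence(["lumatok", "sarine", "dop"]))
```

In a real system the probabilities would be estimated from annotated text rather than written by hand, and the observation model would likely use more context than the final letter alone.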

LuitPad: A fully Unicode compatible Assamese writing software

24th International Conference on Computational Linguistics, Proceedings of the Second Workshop on Advances in Text Input Methods (WTIM 2)

LuitPad is a stand-alone, fully Unicode compliant software designed for rapid typing of Assamese words and characters. There are two main typing options, one based on the approximate sound of words and the other based on the sound of characters, both of which are efficient and user-friendly, even for a first-time user. In addition, LuitPad comes with an online spell-checker: a right-click on a misspelt word presents the user with a list of appropriate replacements. Assamese is an Indic language, spoken throughout the North-Eastern parts of India by approximately 30 million people. There is a severe lack of user-friendly software available for typing Assamese text. This is perhaps the underlying reason for the minuscule presence of Assamese based information storage and retrieval systems, both off-line and on-line. With LuitPad, the user can retrieve Assamese characters and words using an English alphabet based keyboard in an effective and intuitive way. LuitPad has a GUI and is compatible with Windows, Mac and Linux. The software saves its contents in the LuitPad file format (“.pad” extension), which can hold both images and text. In addition, .pad files can be easily exported to pdf and html files.
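The abstract does not say how the spell-checker ranks its replacements. As a rough illustration only, one common approach is to rank dictionary words by Levenshtein edit distance from the misspelt word. The function names, the distance threshold, and the toy romanized lexicon below are assumptions made for this sketch and are not taken from LuitPad.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (works on any Unicode text)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def suggest(word, lexicon, max_distance=2, limit=5):
    """Return up to `limit` lexicon entries within `max_distance` edits."""
    scored = sorted((edit_distance(word, entry), entry) for entry in lexicon)
    return [entry for distance, entry in scored if distance <= max_distance][:limit]


# Hypothetical toy lexicon; a real checker would use an Assamese word list.
print(suggest("luti", ["luit", "lipi", "pad"]))
```

Because the comparison is done character by character, the same sketch applies unchanged to words written in the Assamese script.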

Reg No. – M.G.P – 59/2011-12/4024