IR @ Goa University

Modi document transcription to Devanagari

dc.contributor.author Kundaikar, T.
dc.contributor.author Fadte, S.S.
dc.contributor.author Karmali, R.
dc.contributor.author Wagh, R.S.
dc.contributor.author Pawar, J.D.
dc.date.accessioned 2024-06-21T11:06:36Z
dc.date.available 2024-06-21T11:06:36Z
dc.date.issued 2024
dc.identifier.citation Ingénierie des Systèmes d’Information. 29(3); 2024; 909-915. en_US
dc.identifier.uri https://doi.org/10.18280/isi.290311
dc.identifier.uri http://irgu.unigoa.ac.in/drs/handle/unigoa/7317
dc.description.abstract Modi [moːɖiː] is an ancient script that is not on the list of recognized official scripts for Indian languages, and relatively little research has been done on identifying handwritten Modi characters compared with other Indian scripts. Character recognition in Modi script is difficult because the writing is cursive, continuous and unconstrained, and many characters have strikingly similar shapes. Other difficulties in Modi character identification include segmentation, noise and degradation, skew, variations in illumination, uneven alignment, slanting lines, overlapping lines, and touching lines. Word-level segmentation or recognition is ineffective for Modi documents because, unlike other scripts, Modi has no symbols marking the end of a word or sentence. Another problem is the lack of a dataset covering most of the syllables required to automate the transcription of Modi documents. Previous work on automatic Modi character recognition used datasets of handwritten basic characters, i.e. vowels, consonants and numerals, and did not include consonants with vowel diacritics or conjunct consonants. In 2020, word transcription of Modi script to Devanagari was reported, but it considered only 57 Modi character classes, which are too few to capture the script's characters. We require a dataset that includes vowels, consonants, each consonant with the vowel diacritics, and conjunct consonants in order to cover a wide variety of syllables in Modi. This calls for examining different approaches to Modi document recognition and making the documents' content available in a widely known script such as Devanagari. This paper presents a model that recognizes Modi text in an input image and produces its transcription in the Devanagari script.
In this work, we have also created a dataset that includes Modi vowels, consonants, numerals, consonants with vowel diacritics, and conjunct consonants. The dataset consists of text in Modi and its transcription in Devanagari. Our proposed model (ModiDev_LSTM_Model), which transcribes Modi documents to Devanagari using LSTM neural networks, achieved an encouraging character accuracy of 94.67 percent. A detailed analysis of the substitution errors made by ModiDev_LSTM_Model showed seven types of error, namely 'Anusuvar' (Bindu), 'Eekar', 'Ookar', 'Ardhacandra', 'Matra', 'Aa' and 'other'. Among these, 'Anusuvar' accounted for the highest percentage of substitution errors and 'Aa' for the lowest. en_US
dc.publisher International Information and Engineering Technology Association en_US
dc.subject Computer Science and Technology en_US
dc.title Modi document transcription to Devanagari en_US
dc.type Journal article en_US
dc.identifier.impf cs

