Modi document transcription to Devanagari

Kundaikar, T.; Fadte, S.S.; Karmali, R.; Wagh, R.S.; Pawar, J.D.

dc.contributor.author	Kundaikar, T.
dc.contributor.author	Fadte, S.S.
dc.contributor.author	Karmali, R.
dc.contributor.author	Wagh, R.S.
dc.contributor.author	Pawar, J.D.
dc.date.accessioned	2024-06-21T11:06:36Z
dc.date.available	2024-06-21T11:06:36Z
dc.date.issued	2024
dc.identifier.citation	Ingénierie des Systèmes d’Information. 29(3); 2024; 909-915.	en_US
dc.identifier.uri	https://doi.org/10.18280/isi.290311
dc.identifier.uri	http://irgu.unigoa.ac.in/drs/handle/unigoa/7317
dc.description.abstract	Modi [moːɖiː] is an ancient script that is not on the list of recognized official scripts for Indian languages, so relatively little research has been done on identifying handwritten Modi characters compared with other Indian scripts. Character recognition in Modi script is difficult because the writing is cursive, continuous, and unconstrained, and many characters have strikingly similar shapes. Other difficulties in the Modi character identification process are segmentation, noise and degradation, the presence of various skews, variations in illumination, uneven alignment, slanting lines, overlapping lines, and touching lines. Word-level segmentation or recognition is ineffective for Modi documents because, unlike other scripts, they have no symbols marking the end of a word or sentence. Another problem is the unavailability of a dataset covering most of the syllables required to automate transcription of Modi documents. Previously reported work on automatic Modi character recognition used datasets of handwritten Modi characters, i.e. vowels, consonants and numerals, and did not include consonants with vowel diacritics or conjunct consonants. In 2020, word transcription of Modi script to Devanagari was reported, but it considered only 57 Modi character classes, which are too few to capture the script's characters. We require a dataset that includes vowels, consonants, each consonant with the vowel diacritics, and conjunct consonants to cover a wide variety of syllables in Modi. This calls for examining different approaches to Modi document recognition and making the documents' content available in a widely known script such as Devanagari. This paper presents a model that recognizes Modi text in an input image and outputs its transcription in Devanagari script. 
In this work, we have also created a dataset that includes Modi vowels, consonants, numerals, consonants with vowel diacritics, and conjunct consonants. The dataset consists of text in Modi and its transcription in Devanagari. Our proposed model (ModiDev_LSTM_Model), which transcribes Modi documents to Devanagari using LSTM neural networks, achieved an encouraging character accuracy of 94.67 percent. Detailed analysis of the substitution errors made by ModiDev_LSTM_Model showed seven types of error, namely 'Anusuvar' (Bindu), 'Eekar', 'Ookar', 'Ardhacandra', 'Matra', 'Aa' and 'other'. Among these, 'Anusuvar' accounted for the highest percentage of substitution errors and 'Aa' for the lowest.	en_US
dc.publisher	International Information and Engineering Technology Association	en_US
dc.subject	Computer Science and Technology	en_US
dc.title	Modi document transcription to Devanagari	en_US
dc.type	Journal article	en_US
dc.identifier.impf	cs
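
The abstract reports a character accuracy of 94.67 percent but does not say how the figure was computed. A common convention for transcription output, assumed here and not confirmed by the record, is one minus the Levenshtein edit distance between the reference and the predicted Devanagari strings, divided by the reference length in Unicode code points. The function names below are hypothetical and illustrate that convention only; they are not the authors' evaluation code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def character_accuracy(ref: str, hyp: str) -> float:
    """Assumed metric: 1 - edit_distance / reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = "\u092e\u094b\u0921\u0940"   # Devanagari "Modi" with the long-i vowel sign
    predicted = "\u092e\u094b\u0921\u093f"   # same word with the short-i vowel sign
    print(character_accuracy(reference, predicted))
```

In this illustration a single wrong vowel sign out of four code points is one substitution, which is the kind of diacritic-level error the abstract groups under types such as 'Eekar'.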