dc.contributor.author |
Kundaikar, T. |
|
dc.contributor.author |
Fadte, S.S. |
|
dc.contributor.author |
Karmali, R. |
|
dc.contributor.author |
Pawar, J.D. |
|
dc.date.accessioned |
2024-04-29T07:31:48Z |
|
dc.date.available |
2024-04-29T07:31:48Z |
|
dc.date.issued |
2024 |
|
dc.identifier.citation |
Ingénierie des Systèmes d’Information. 29(2); 2024; 619-626. |
en_US |
dc.identifier.uri |
https://doi.org/10.18280/isi.290223 |
|
dc.identifier.uri |
http://irgu.unigoa.ac.in/drs/handle/unigoa/7297 |
|
dc.description.abstract |
Optical Character Recognition (OCR) systems struggle to generate accurate text for highly inflectional Indic languages such as Hindi. Inflectional languages possess an extensive vocabulary, because words can assume different forms based on factors like gender, meaning, or other contextual cues. To enhance OCR accuracy and correct the errors that result from the inflectional nature of the language, it is crucial to post-process the OCR output. This work focuses on correcting errors in OCR output specifically for the Hindi language. To overcome existing challenges, it proposes an error correction model that uses Masked Language Modeling with BERT (MLM-BERT). The model uses the surrounding context to provide accurate word suggestions for an incorrect or masked word. The proposed model was tested on the Hindi OCR test dataset from IIITH. It achieved an improvement of 3.58 percent in word accuracy over the baseline OCR word accuracy, which demonstrates its effectiveness in enhancing the accuracy of the OCR output text. |
en_US |
dc.publisher |
International Information and Engineering Technology Association (IIETA) |
en_US |
dc.subject |
Computer Science and Technology |
en_US |
dc.title |
Automatic Hindi OCR error correction using MLM-BERT |
en_US |
dc.type |
Journal article |
en_US |
dc.identifier.impf |
cs |
|