IR @ Goa University

Resource creation for training and testing of normalisation systems for Konkani-English code-mixed social media text

dc.contributor.author Phadte, A.
dc.date.accessioned 2018-05-31T10:37:53Z
dc.date.available 2018-05-31T10:37:53Z
dc.date.issued 2018
dc.identifier.citation Natural Language Processing and Information Systems, Ed. by: Silberztein M., Atigui F., Kornyshova E., Métais E., Meziane F. NLDB 2018. Lecture Notes in Computer Science. Springer, Cham. 10859; 2018; 264-271. en_US
dc.identifier.uri https://doi.org/10.1007/978-3-319-91947-8_26
dc.identifier.uri http://irgu.unigoa.ac.in/drs/handle/unigoa/5232
dc.description.abstract Code-Mixing is the mixing of two or more languages or language varieties in speech. Apart from the inherent linguistic complexity, the analysis of code-mixed content poses complex challenges owing to the presence of spelling variations and non-adherence to a formal grammar. However, for any downstream Natural Language Processing task, tools that are able to process and analyze code-mixed social media data are required. Currently there is a lack of publicly available resources for code-mixed Konkani-English social media data, while the amount of such text is increasing every day. The lack of a standard dataset to evaluate these systems makes it difficult to make any meaningful comparisons of their relative accuracies. In this paper, we describe the methodology for the creation of a normalisation dataset for Konkani-English Code-Mixed Social Media Text (CMST). We believe that this dataset will prove useful not only for the evaluation and training of normalisation systems but also for the linguistic analysis of the process of normalisation of Indian languages from native scripts to Roman. Normalisation refers to the process of writing the text of one language using the script of another language whereby the sound of the text is preserved as far as possible. en_US
dc.publisher Springer en_US
dc.subject Computer Science and Technology en_US
dc.title Resource creation for training and testing of normalisation systems for Konkani-English code-mixed social media text en_US
dc.type Conference article en_US

