dc.contributor.author |
Phadte, A. |
|
dc.date.accessioned |
2018-05-31T10:37:53Z |
|
dc.date.available |
2018-05-31T10:37:53Z |
|
dc.date.issued |
2018 |
|
dc.identifier.citation |
Natural Language Processing and Information Systems, Ed. by: Silberztein M., Atigui F., Kornyshova E., Métais E., Meziane F. NLDB 2018. Lecture Notes in Computer Science. Springer, Cham. 10859; 2018; 264-271. |
en_US |
dc.identifier.uri |
https://doi.org/10.1007/978-3-319-91947-8_26 |
|
dc.identifier.uri |
http://irgu.unigoa.ac.in/drs/handle/unigoa/5232 |
|
dc.description.abstract |
Code-Mixing is the mixing of two or more languages or language varieties in speech. Apart from the inherent linguistic complexity, the analysis of code-mixed content poses complex challenges owing to the presence of spelling variations and non-adherence to a formal grammar. However, for any downstream Natural Language Processing task, tools that are able to process and analyze code-mixed social media data are required. Currently, there is a lack of publicly available resources for code-mixed Konkani-English social media data, while the amount of such text is increasing every day. The lack of a standard dataset to evaluate these systems makes it difficult to make any meaningful comparisons of their relative accuracies. In this paper, we describe the methodology for the creation of a normalisation dataset for Konkani-English Code-Mixed Social Media Text (CMST). We believe that this dataset will prove useful not only for the evaluation and training of normalisation systems but also for the linguistic analysis of the process of normalisation of Indian languages from native scripts to Roman. Normalisation refers to the process of writing the text of one language using the script of another language whereby the sound of the text is preserved as far as possible. |
en_US |
dc.publisher |
Springer |
en_US |
dc.subject |
Computer Science and Technology |
en_US |
dc.title |
Resource creation for training and testing of normalisation systems for Konkani-English code-mixed social media text |
en_US |
dc.type |
Conference article |
en_US |