Resource creation for training and testing of normalisation systems for Konkani-English code-mixed social media text

Phadte, A.

Business, Commerce, Economics & Computer Sciences → Goa Business School

dc.contributor.author	Phadte, A.
dc.date.accessioned	2018-05-31T10:37:53Z
dc.date.available	2018-05-31T10:37:53Z
dc.date.issued	2018
dc.identifier.citation	Natural Language Processing and Information Systems, Ed. by: Silberztein M., Atigui F., Kornyshova E., Métais E., Meziane F. NLDB 2018. Lecture Notes in Computer Science. Springer, Cham. 10859; 2018; 264-271.	en_US
dc.identifier.uri	https://doi.org/10.1007/978-3-319-91947-8_26
dc.identifier.uri	http://irgu.unigoa.ac.in/drs/handle/unigoa/5232
dc.description.abstract	Code-mixing is the mixing of two or more languages or language varieties in speech. Apart from its inherent linguistic complexity, code-mixed content is hard to analyse because of spelling variations and non-adherence to a formal grammar. Any downstream Natural Language Processing task, however, requires tools that can process and analyse code-mixed social media data. Currently, there is a lack of publicly available resources for code-mixed Konkani-English social media data, even though the amount of such text is increasing every day. The lack of a standard dataset for evaluating such tools makes it difficult to draw meaningful comparisons of their relative accuracies. In this paper, we describe the methodology for creating a normalisation dataset for Konkani-English Code-Mixed Social Media Text (CMST). We believe that this dataset will prove useful not only for the evaluation and training of normalisation systems but also for the linguistic analysis of the process of normalising Indian languages from native scripts to Roman script. Normalisation refers to the process of writing the text of one language using the script of another language such that the sound of the text is preserved as far as possible.	en_US
dc.publisher	Springer	en_US
dc.subject	Computer Science and Technology	en_US
dc.title	Resource creation for training and testing of normalisation systems for Konkani-English code-mixed social media text	en_US
dc.type	Conference article	en_US