IR @ Goa University

Word Level Language Identification system for Konkani-English Code-Mixed Social Media Text (CMST)


dc.contributor.author Phadte, A.
dc.contributor.author Wagh, R.S.
dc.date.accessioned 2018-10-09T09:29:20Z
dc.date.available 2018-10-09T09:29:20Z
dc.date.issued 2017
dc.identifier.citation Proc. 10. Annual ACM India Compute Conf., 16-18 Nov 2017. 2017; 103-107. en_US
dc.identifier.uri https://doi.org/10.1145/3140107.3140132
dc.identifier.uri http://irgu.unigoa.ac.in/drs/handle/unigoa/5446
dc.description.abstract In this paper, we present a study of the problem of word-level language identification for Konkani-English Code-Mixed Social Media Text (CMST). We describe a new dataset of more than a thousand Facebook posts that exhibit code-mixing between Konkani and English. To the best of our knowledge, this work is the first attempt to create a linguistic resource for this language pair, which will be made public, and to develop a language identification system for it. Using this tagged Konkani-English dataset, we carried out experiments on word-level language detection. We used several approaches to the language detection task: an unsupervised dictionary-based detection technique, and supervised word-level language identification treated as sequence labelling, using Conditional Random Fields based models, SVM, and Random Forest. en_US
dc.publisher ACM en_US
dc.subject Computer Science and Technology en_US
dc.title Word Level Language Identification system for Konkani-English Code-Mixed Social Media Text (CMST) en_US
dc.type Conference article en_US
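
The unsupervised dictionary-based technique named in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the two word lists, the tag names (`kok`, `en`, `ambiguous`, `other`), and the function `tag_words` are all hypothetical, chosen only to show what word-level lookup looks like. A real system would load large lexicons for each language and would need a strategy for words that appear in both.

```python
import re

# Toy lexicons invented for illustration only (assumption, not from the paper).
KONKANI_WORDS = {"hanv", "tum", "kitem", "borem", "asa"}
ENGLISH_WORDS = {"i", "you", "what", "good", "is", "the", "movie", "was"}


def tag_words(text):
    """Label each token 'kok', 'en', 'ambiguous', or 'other' by dictionary lookup."""
    tags = []
    # Lowercase the text and split it into word tokens, dropping punctuation.
    for token in re.findall(r"\w+", text.lower()):
        in_kok = token in KONKANI_WORDS
        in_en = token in ENGLISH_WORDS
        if in_kok and in_en:
            label = "ambiguous"  # found in both lexicons
        elif in_kok:
            label = "kok"
        elif in_en:
            label = "en"
        else:
            label = "other"      # out of vocabulary for both lexicons
        tags.append((token, label))
    return tags


if __name__ == "__main__":
    for token, label in tag_words("The movie was borem"):
        print(token, label)
```

A lookup like this treats each word in isolation, which is the main contrast with the sequence-labelling models mentioned in the abstract, where the tag of a word can depend on its neighbours.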

