Abstract:
Code-mixing has become prevalent on social networks as people communicate in multiple languages, a trend that is especially popular in multilingual countries. This has produced large volumes of code-mixed text in which useful topical information is dispersed. Extracting that information is challenging, however, because code-mixed social media text carries its own linguistic complexities. The main focus of this work is the discovery of latent topics that indicate useful information in code-mixed social media text, overcoming the barrier of random language switching. We evaluate the resulting topic aspect clusters on standard lexical semantic evaluation tasks and show that our method produces substantially better semantic representations than its code-mixed counterparts.