Multilingual speakers often mix languages when communicating with other multilingual speakers, a phenomenon usually known as code-switching (CSW). CSW typically occurs at the intersentential, intrasentential, and even morphological levels. It presents serious challenges for language technologies such as machine translation (MT), automatic speech recognition (ASR), language generation (LG), information retrieval (IR), information extraction (IE), and semantic processing. Traditional techniques trained on one language quickly break down when input from another language is mixed in. Recent work has shown that even powerful multilingual models, such as multilingual BERT, yield subpar performance on CSW data (cf. Aguilar and Solorio, 2020).
Considering the ubiquitous nature of CSW in informal text communication such as newsgroups, tweets, blogs, and other social media, and the number of multilingual speakers worldwide that use these platforms, addressing the challenge of processing CSW data continues to be of great practical value. This workshop aims to bring together researchers interested in technology for mixed language data, in either spoken or written form, and increase community awareness of the different efforts developed to date in this space.
Topics of Interest
The workshop will invite contributions from researchers working in NLP and speech approaches for the analysis and processing of mixed-language data. Topics of relevance to the workshop will include the following:
- Development of linguistic resources to support research on code-switched data;
- NLP approaches for language identification, named entity recognition, sentiment analysis, machine translation, or language generation in code-switched data;
- NLP techniques for the syntactic analysis of code-switched data;
- Domain/dialect/genre adaptation techniques applied to code-switched data processing;
- Language modeling approaches to code-switched data processing;
- Crowdsourcing approaches for the annotation of code-switched data;
- Position papers discussing the challenges that code-switched data poses for NLP techniques;
- Methods for improving ASR on code-switched data;
- Survey papers on NLP research for code-switched data;
- Sociolinguistic and/or sociopragmatic aspects of code-switching.
Important Dates
- Workshop submission deadline (long, short and special track): March 15th, 2021
- Notification of acceptance: April 15th, 2021
- Workshop date: June 11th, 2021
Additional invited speakers will be announced soon.