First Call for Papers
This will be the sixth edition of the workshop, co-located with EMNLP 2023 and marking its 10-year anniversary.
Bilingual and multilingual speakers often mix languages when communicating with other multilingual speakers, a phenomenon usually known as code-switching (CS). CS can occur at various linguistic levels, including inter-sentential, intra-sentential, and even morphological. In practice, it poses long-standing challenges for language technologies such as machine translation, automatic speech recognition (ASR), language generation, information retrieval and extraction, and semantic processing. Models trained for one language can quickly break down when their input mixes in another. Despite recent breakthroughs, multilingual pre-trained language models (LMs) can still yield subpar performance on CS data.
Considering the ubiquity of CS in informal text communication such as newsgroups, tweets, blogs, and other social media, and the number of multilingual speakers worldwide who use these platforms, processing CS data remains a challenge of great practical value. This workshop aims to bring together researchers interested in technology for mixed-language data, in either spoken or written form, and to raise community awareness of the efforts made to date in this space.
Topics of Interest
The workshop will invite contributions from researchers working in NLP and speech approaches for the analysis and processing of mixed-language data. Topics of relevance to the workshop will include the following:
- Development of resources (datasets, test beds, pre-trained language models) to support research on code-switched data;
- New data augmentation techniques for improving robustness on code-switched data;
- New approaches for downstream NLP tasks on code-switched data, such as language identification, named entity recognition, sentiment analysis, machine translation, language generation, and ASR;
- NLP techniques for the syntactic analysis of code-switched data;
- Domain/dialect/genre adaptation techniques applied to code-switched data processing;
- Language modeling approaches to code-switched data processing;
- Survey and position papers discussing the challenges that code-switched data poses for NLP techniques;
- Sociolinguistic and/or sociopragmatic aspects of code-switching;
- Ethical issues and considerations in code-switching applications.
Submissions
The workshop accepts three categories of papers: regular workshop papers, non-archival papers, and cross-submissions. Only regular workshop papers will be included in the workshop proceedings as archival publications, and regular workshop papers are eligible for the best paper award. Papers in all three categories may be long (maximum 8 pages) or short (maximum 4 pages), with unlimited additional pages for references, and must follow the EMNLP 2023 formatting requirements. Non-archival submissions of up to 2 pages are also welcome; if you are making a non-archival submission, please notify us by email. The reported research should be substantially original. Reviewing will be double-blind, so papers must not include author information, and self-references that identify the authors should be avoided or anonymized. The Limitations section is optional and does not count toward the page limit. Accepted papers will be presented as posters or oral presentations.
Accepted Papers
- TongueSwitcher: Fine-Grained Identification of German-English Code-Switching
- Towards Real-World Streaming Speech Translation for Code-Switched Speech
- Language Preference for Expression of Sentiment for Nepali-English Bilingual Speakers on Social Media
- Text-Derived Language Identity Incorporation for End-to-End Code-Switching Speech Recognition
- Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages
- CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling
- Unified Model for Code-Switching Speech Recognition and Language Identification Based on Concatenated Tokenizer
- Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching
Program
CALCS @ EMNLP 2023, Thursday, December 7, 2023 (Singapore Time, UTC+8)

Time | Session | Speakers / Authors |
---|---|---|
09:00 - 09:10 | Welcome Remarks | Thamar Solorio, Sudipta Kar, Marina Zhukova |
09:10 - 10:25 | Morning Session I | |
09:10 - 09:25 | Towards Real-World Streaming Speech Translation for Code-Switched Speech | Belen Alastruey, Matthias Sperber, Christian Gollan, Dominic C Telaar, Tim Ng, Aashish Agarwal |
09:25 - 09:40 | Text-Derived Language Identity Incorporation for End-to-End Code-Switching Speech Recognition | Qinyi Wang, Haizhou Li |
09:40 - 09:55 | TongueSwitcher: Fine-Grained Identification of German-English Code-Switching | Igor Sterner, Simone Teufel |
09:55 - 10:10 | CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling | Mohsin Ali Mohammed, Sai Teja Kandukuri, Neeharika Gupta, Parth Patwa, Anubhab Chatterjee, Vinija Jain, Aman Chadha, Amitava Das |
10:10 - 10:25 | Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages | Zheng Xin Yong, Ruochen Zhang, Jessica Forde, Skyler Wang, Arjun Subramonian, Samuel Cahyawijaya, Holy Lovenia, Genta Winata, Lintang Sutawika, Jan Christian Blaise Cruz, Yin Lin Tan, Long Phan, Rowena Garcia, Thamar Solorio, Alham Aji |
10:25 - 11:00 | Coffee Break | |
11:00 - 12:30 | Morning Session II | |
11:00 - 11:45 | Invited Talk: Resource-efficient Computational Models for Code-switched Speech and Text | Preethi Jyothi |
11:45 - 12:00 | Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching | Tolulope Ogunremi, Christopher Manning, Dan Jurafsky |
12:00 - 12:15 | Language Preference for Expression of Sentiment for Nepali-English Bilingual Speakers on Social Media | Niraj Pahari, Kazutaka Shimada |
12:15 - 12:30 | Code-Switching with Word Senses for Pretraining in Neural Machine Translation | Vivek Iyer, Edoardo Barba, Alexandra Birch, Jeff Z. Pan, Roberto Navigli |
12:30 - 14:00 | Lunch Break | |
14:00 - 15:30 | Afternoon Session I | |
14:00 - 14:15 | Unified Model for Code-Switching Speech Recognition and Language Identification Based on Concatenated Tokenizer | Kunal Dhawan, Dima Rekesh, Boris Ginsburg |
14:15 - 15:30 | Panel Discussion: Code-Switching and LLMs | Monojit Choudhury, Sudipta Kar, Genta Winata, Sunayana Sitaram, Marina Zhukova |
15:30 - 16:00 | Coffee Break | |
16:00 - 16:45 | Afternoon Session II | |
16:00 - 16:45 | Invited Talk: Modeling Code-Switch Languages Using Bilingual Parallel Corpus | Haizhou Li |
16:45 - 16:50 | Best Paper Awards | Genta Indra Winata |
16:50 - 16:55 | Closing Remarks | Sudipta Kar |
Important Dates
The submission portal is open on OpenReview.
- Workshop submission deadline (regular and non-archival submissions): 11 September 2023 23:59 (UTC-0)
- Notification of acceptance: 10 October 2023 (AoE)
- Camera ready papers due: 18 October 2023 (AoE)
- Workshop date: 7 December 2023 (AoE)