CS Workshop

First Call for Papers

This will be the sixth edition of the workshop, co-located with EMNLP 2023, marking its 10-year anniversary.

Bilingual and multilingual speakers often mix languages when they communicate with other multilingual speakers, a phenomenon usually known as code-switching (CS). CS can occur at various linguistic levels, including inter-sentential, intra-sentential, and even morphological. In practice, it presents long-standing challenges for language technologies such as machine translation, ASR, language generation, information retrieval and extraction, and semantic processing. Models trained for one language can quickly break down when input from another language is mixed in. Even recent breakthroughs in multilingual pre-trained language models (LMs) have been shown to yield subpar performance on CS data.

Considering the ubiquitous nature of CS in informal text communication such as newsgroups, tweets, blogs, and other social media, and the number of multilingual speakers worldwide who use these platforms, addressing the challenge of processing CS data continues to be of great practical value. This workshop aims to bring together researchers interested in technology for mixed-language data, in either spoken or written form, and to increase community awareness of the different efforts developed to date in this space.

Topics of Interest

The workshop invites contributions from researchers working on NLP and speech approaches to the analysis and processing of mixed-language data. Topics of relevance to the workshop include the following:

Development of resources (dataset/test bed/pre-trained language models) to support research on code-switched data;
New data augmentation techniques for improving robustness on code-switched data;
New approaches for NLP downstream tasks on code-switched data: language identification/named entity recognition/sentiment analysis/machine translation/language generation/ASR;
NLP techniques for the syntactic analysis of code-switched data;
Domain/dialect/genre adaptation techniques applied to code-switched data processing;
Language modeling approaches to code-switched data processing;
Survey and position papers discussing the challenges of code-switched data to NLP techniques;
Sociolinguistic and/or sociopragmatic aspects of code-switching;
Ethical issues and considerations in code-switching applications.

Submissions

The workshop accepts three categories of papers: regular workshop papers, non-archival submissions, and cross-submissions. Only regular workshop papers will be included in the workshop proceedings as archival publications. Regular workshop papers are eligible for the best paper award. All three categories of papers may be long (maximum 8 pages) or short (maximum 4 pages), with unlimited additional pages for references, following the EMNLP 2023 formatting requirements. The reported research should be substantially original. Accepted papers will be presented as posters and oral presentations. Reviewing will be double-blind, so no author information should be included in the papers; self-references that identify the authors should be avoided or anonymized. We also welcome papers of at most 2 pages for non-archival submission. Please send us an email if you are making a non-archival submission. The limitations section is optional and does not count toward the page limit.

Accepted Papers

TongueSwitcher: Fine-Grained Identification of German-English Code-Switching
Towards Real-World Streaming Speech Translation for Code-Switched Speech
Language Preference for Expression of Sentiment for Nepali-English Bilingual Speakers on Social Media
Text-Derived Language Identity Incorporation for End-to-End Code-Switching Speech Recognition
Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages
CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling
Unified Model for Code-Switching Speech Recognition and Language Identification Based on Concatenated Tokenizer
Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching

Program

	CALCS @ EMNLP 2023, Thursday, December 7, 2023 (Singapore Time, UTC+8)
09:00 - 09:10	Welcome Remarks	Thamar Solorio, Sudipta Kar, Marina Zhukova
09:10 - 10:25	*Morning Session I*
09:10 - 09:25	Towards Real-World Streaming Speech Translation for Code-Switched Speech	Belen Alastruey, Matthias Sperber, Christian Gollan, Dominic C Telaar, Tim Ng, Aashish Agarwal
09:25 - 09:40	Text-Derived Language Identity Incorporation for End-to-End Code-Switching Speech Recognition	Qinyi Wang, Haizhou Li
09:40 - 09:55	TongueSwitcher: Fine-Grained Identification of German-English Code-Switching	Igor Sterner, Simone Teufel
09:55 - 10:10	CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling	Mohsin Ali Mohammed, Sai Teja Kandukuri, Neeharika Gupta, Parth Patwa, Anubhab Chatterjee, Vinija Jain, Aman Chadha, Amitava Das
10:10 - 10:25	Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages	Zheng Xin Yong, Ruochen Zhang, Jessica Forde, Skyler Wang, Arjun Subramonian, Samuel Cahyawijaya, Holy Lovenia, Genta Winata, Lintang Sutawika, Jan Christian Blaise Cruz, Yin Lin Tan, Long Phan, Rowena Garcia, Thamar Solorio, Alham Aji
10:25 - 11:00	*Coffee Break*
11:00 - 12:30	*Morning Session II*
11:00 - 11:45	Invited Talk: Resource-efficient Computational Models for Code-switched Speech and Text	Preethi Jyothi
11:45 - 12:00	Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching	Tolulope Ogunremi, Christopher Manning, Dan Jurafsky
12:00 - 12:15	Language Preference for Expression of Sentiment for Nepali-English Bilingual Speakers on Social Media	Niraj Pahari, Kazutaka Shimada
12:15 - 12:30	Code-Switching with Word Senses for Pretraining in Neural Machine Translation	Vivek Iyer, Edoardo Barba, Alexandra Birch, Jeff Z. Pan, Roberto Navigli
12:30 - 14:00	*Lunch Break*
14:00 - 15:30	*Afternoon Session I*
14:00 - 14:15	Unified Model for Code-Switching Speech Recognition and Language Identification Based on Concatenated Tokenizer	Kunal Dhawan, Dima Rekesh, Boris Ginsburg
14:15 - 15:30	Panel Discussion: Code-Switching and LLMs	Monojit Choudhury, Sudipta Kar, Genta Winata, Sunayana Sitaram, Marina Zhukova
15:30 - 16:00	*Coffee Break*
16:00 - 16:45	*Afternoon Session II*
16:00 - 16:45	Invited Talk: Modeling Code-Switch Languages Using Bilingual Parallel Corpus	Haizhou Li
16:45 - 16:50	*Best Paper Awards*	Genta Indra Winata
16:50 - 16:55	*Closing Remarks*	Sudipta Kar

Important Dates

The submission portal is open on OpenReview.

Workshop submission deadline (regular and non-archival submissions): 11 September 2023 23:59 (UTC-0)
Notification of acceptance: 10 October 2023 (AoE)
Camera ready papers due: 18 October 2023 (AoE)
Workshop date: 7 December 2023 (AoE)