Singapore
Computational Approaches to Linguistic Code-Switching, CALCS 2023

First Call for Papers


This will be the sixth edition of the workshop, co-located with EMNLP 2023 and marking its 10-year anniversary.

Bilingual and multilingual speakers often mix languages when they communicate with other multilingual speakers, a phenomenon usually known as code-switching (CS). CS can occur at various linguistic levels, including inter-sentential, intra-sentential, and even morphological. In practice, it presents long-standing challenges for language technologies such as machine translation, ASR, language generation, information retrieval and extraction, and semantic processing. Models trained for one language can quickly break down when input from another language is mixed in. Recent breakthroughs in multilingual pre-trained language models (LMs) have shown that such models can still yield subpar performance on CS data.

Considering the ubiquitous nature of CS in informal text communication such as newsgroups, tweets, blogs, and other social media, and the number of multilingual speakers worldwide who use these platforms, addressing the challenge of processing CS data continues to be of great practical value. This workshop aims to bring together researchers interested in technology for mixed-language data, in either spoken or written form, and to increase community awareness of the different efforts developed to date in this space.

Topics of Interest


The workshop will invite contributions from researchers working in NLP and speech approaches for the analysis and processing of mixed-language data. Topics of relevance to the workshop will include the following:

  1. Development of resources (dataset/test bed/pre-trained language models) to support research on code-switched data;
  2. New data augmentation techniques for improving robustness on code-switched data;
  3. New approaches for NLP downstream tasks: language identification/named entity recognition/sentiment analysis/machine translation/language generation/ASR in code-switched data;
  4. NLP techniques for the syntactic analysis of code-switched data;
  5. Domain/dialect/genre adaptation techniques applied to code-switched data processing;
  6. Language modeling approaches to code-switched data processing;
  7. Survey and position papers discussing the challenges of code-switched data to NLP techniques;
  8. Sociolinguistic and/or sociopragmatic aspects of code-switching;
  9. Ethical issues and considerations in code-switching applications.

Submissions


The workshop accepts three categories of papers: regular workshop papers, non-archival submissions, and cross-submissions. Only regular workshop papers will be included in the proceedings as archival publications. Regular workshop papers are eligible for the best paper award. Papers in all three categories may be long (maximum 8 pages) or short (maximum 4 pages), with unlimited additional pages for references, and must follow the EMNLP 2023 formatting requirements. The limitations section is optional and does not count toward the page limit. We also welcome non-archival submissions of at most 2 pages; please send us an email if you are submitting one. The reported research should be substantially original.

Reviewing will be double-blind, so papers must not include author information, and self-references that identify the authors should be avoided or anonymized. Accepted papers will be presented as posters or oral presentations.

Accepted Papers


  • TongueSwitcher: Fine-Grained Identification of German-English Code-Switching
  • Towards Real-World Streaming Speech Translation for Code-Switched Speech
  • Language Preference for Expression of Sentiment for Nepali-English Bilingual Speakers on Social Media
  • Text-Derived Language Identity Incorporation for End-to-End Code-Switching Speech Recognition
  • Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages
  • CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling
  • Unified Model for Code-Switching Speech Recognition and Language Identification Based on Concatenated Tokenizer
  • Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching

Program


CALCS @ EMNLP 2023
Thursday, December 7, 2023 (Singapore Time, UTC+8)
09:00 - 09:10 Welcome Remarks
Thamar Solorio, Sudipta Kar, Marina Zhukova
09:10 - 10:25 Morning Session I
09:10 - 09:25 Towards Real-World Streaming Speech Translation for Code-Switched Speech
Belen Alastruey, Matthias Sperber, Christian Gollan, Dominic C Telaar, Tim Ng, Aashish Agarwal
09:25 - 09:40 Text-Derived Language Identity Incorporation for End-to-End Code-Switching Speech Recognition
Qinyi Wang, Haizhou Li
09:40 - 09:55 TongueSwitcher: Fine-Grained Identification of German-English Code-Switching
Igor Sterner, Simone Teufel
09:55 - 10:10 CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling
Mohsin Ali Mohammed, Sai Teja Kandukuri, Neeharika Gupta, Parth Patwa, Anubhab Chatterjee, Vinija Jain, Aman Chadha, Amitava Das
10:10 - 10:25 Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages
Zheng Xin Yong, Ruochen Zhang, Jessica Forde, Skyler Wang, Arjun Subramonian, Samuel Cahyawijaya, Holy Lovenia, Genta Winata, Lintang Sutawika, Jan Christian Blaise Cruz, Yin Lin Tan, Long Phan, Rowena Garcia, Thamar Solorio, Alham Aji
10:25 - 11:00 Coffee Break
11:00 - 12:30 Morning Session II
11:00 - 11:45 Invited Talk: Resource-efficient Computational Models for Code-switched Speech and Text
Preethi Jyothi
11:45 - 12:00 Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching
Tolulope Ogunremi, Christopher Manning, Dan Jurafsky
12:00 - 12:15 Language Preference for Expression of Sentiment for Nepali-English Bilingual Speakers on Social Media
Niraj Pahari, Kazutaka Shimada
12:15 - 12:30 Code-Switching with Word Senses for Pretraining in Neural Machine Translation
Vivek Iyer, Edoardo Barba, Alexandra Birch, Jeff Z. Pan, Roberto Navigli
12:30 - 14:00 Lunch Break
14:00 - 15:30 Afternoon Session I
14:00 - 14:15 Unified Model for Code-Switching Speech Recognition and Language Identification Based on Concatenated Tokenizer
Kunal Dhawan, Dima Rekesh, Boris Ginsburg
14:15 - 15:30 Panel Discussion: Code-Switching and LLMs
Monojit Choudhury, Sudipta Kar, Genta Winata, Sunayana Sitaram, Marina Zhukova
15:30 - 16:00 Coffee Break
16:00 - 16:45 Afternoon Session II
16:00 - 16:45 Invited Talk: Modeling Code-Switch Languages Using Bilingual Parallel Corpus
Haizhou Li
16:45 - 16:50 Best Paper Awards
Genta Indra Winata
16:50 - 16:55 Closing Remarks
Sudipta Kar

Important Dates


The submission portal is open on OpenReview.

  • Workshop submission deadline (regular and non-archival submissions): 11 September 2023 23:59 (UTC-0)
  • Notification of acceptance: 10 October 2023 (AoE)
  • Camera ready papers due: 18 October 2023 (AoE)
  • Workshop date: 7 December 2023 (AoE)

Program Committee


A. Seza Doğruöz   
Abhinav Arora   
Dama Sravani   
David Vilares   
Elena Álvarez-Mellado   
Els Lefever   
Holy Lovenia   
François Yvon   
Ganesh Jawahar   
Gustavo Aguilar   
Kellen Gillespie   
Manuel Mager   
Parth Patwa   
Salim Sazzed   
Segun Aroyehun   
Shuguang Chen   
Suman Dowlagar   
Suraj Maharjan   
Tanya Roosta   
Vivek Srivastava   
Xingzhi Guo   
Yihong Theis   

Organizers


  • Senior Research Scientist, Bloomberg LP, USA
  • Applied Scientist, Amazon Alexa AI, USA
  • Ph.D. Student, Department of Linguistics, University of California, Santa Barbara, USA
  • Professor, Department of Computer Science, University of Houston, USA
  • Professor and Director, Language Technologies Institute, Carnegie Mellon University, USA
  • Senior Researcher, Microsoft Research, India
  • Principal Data and Applied Scientist, Microsoft Turing, India
  • Principal Researcher, Microsoft Research, India

Sponsors