Computational Approaches to Linguistic Code-Switching, CALCS 2025
Albuquerque, New Mexico, USA

First Call for Papers


This will be the seventh edition of the workshop, collocated with NAACL 2025.

Bilingual and multilingual speakers often engage in code-switching (CS), mixing languages within a single conversation, often shaped by cultural context. CS can occur at the inter-sentential, intra-sentential, and morphological levels (e.g., the intra-sentential Spanish-English utterance "I bought the regalos yesterday"), posing challenges for both language understanding and generation. Models trained on a single language often struggle with mixed-language input, and despite advances in multilingual pre-trained language models (LMs), these models may still perform poorly on CS data. Research on LMs' ability to process CS data, considering cultural nuances, reasoning, coverage, and performance biases, remains underexplored.

As CS becomes more common in informal communication such as newsgroups, tweets, and other social media, research on how LMs process mixed-language data is urgently needed. This workshop aims to bring together researchers working on spoken and written CS technologies, promoting collaboration to improve AI systems' handling of CS across diverse linguistic contexts.

Topics of Interest


The workshop invites contributions from researchers working on NLP and speech approaches for the analysis and processing of mixed-language data. Topics of relevance to the workshop include the following:

  1. Development of data and model resources to support research on CS data
  2. New data augmentation techniques for improving robustness on CS data
  3. New approaches for NLP downstream tasks: question answering, conversational agents, named entity recognition, sentiment analysis, machine translation, language generation, and ASR in CS data
  4. NLP techniques for the syntactic analysis of CS data
  5. Domain, dialect, and genre adaptation techniques applied to CS data processing
  6. Language modeling approaches to CS data processing
  7. Sociolinguistic and/or sociopragmatic aspects of CS
  8. Techniques and metrics for automatically evaluating synthetically generated CS text
  9. Utilization of LLMs and assessment of their performance on NLP tasks for CS data
  10. Survey and position papers discussing the challenges CS data poses to NLP techniques
  11. Ethical issues and considerations in CS applications

Submissions


The workshop accepts three categories of papers: regular workshop papers, non-archival papers, and cross-submissions. Only regular workshop papers will be included in the proceedings as archival publications, and only they are eligible for the best paper award. Papers in all three categories may be long (up to 8 pages of content) or short (up to 4 pages of content), with unlimited additional pages for references, following the NAACL 2025 formatting requirements. An optional limitations section does not count toward the page limit, and the reported research should be substantially original.

Reviewing will be double-blind: no author information should be included in the papers, and self-references that identify the authors should be avoided or anonymized. Accepted papers will be presented as posters or oral presentations, and accepted regular workshop papers will appear in the workshop proceedings. We also welcome non-archival submissions of up to 2 pages; please email us if you are making a non-archival submission.

Shared Task on Automatic Evaluation for CS Text Generation


To enrich the CS community and enhance language inclusivity, we are organizing a shared-task competition on automatically evaluating synthetically generated CS text. Automatic CS text generation is valuable for many tasks, especially given the scarcity of naturally occurring CS data, and data augmentation has proven effective for improving model performance across tasks and languages. The demand for CS text in dialogue systems, for example enabling chatbots to produce CS sentences, further motivates this work. As the generation of CS text becomes more common, robust methods for evaluating its accuracy and fluency are essential, yet both the data and the methodologies for such evaluation remain scarce. Our shared task aims to enable further progress in this area.
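
As a concrete illustration of the evaluation problem, the sketch below computes the Code-Mixing Index (CMI; Gambäck and Das, 2014), a simple statistic that quantifies how mixed a single utterance is. This is an illustrative example only, not the shared task's official metric, and it assumes word-level language tags (e.g., from a token-level language identifier) are already available.

    # Minimal sketch of the Code-Mixing Index (CMI; Gambäck & Das, 2014).
    # Illustration only -- not the shared task's official metric.
    # Assumes each token carries a language tag ("en", "es", ...) or
    # "other" for language-independent tokens such as punctuation.
    from collections import Counter

    def cmi(tags):
        """Return the CMI of one utterance, a value in [0, 100]."""
        n = len(tags)
        u = sum(1 for t in tags if t == "other")
        if n == u:  # no language-tagged tokens: treat as monolingual
            return 0.0
        dominant = max(Counter(t for t in tags if t != "other").values())
        return 100.0 * (1.0 - dominant / (n - u))

    # "I bought the regalos yesterday ." -> en en en es en other
    print(cmi(["en", "en", "en", "es", "en", "other"]))  # 20.0

A monolingual utterance scores 0, and higher values indicate more mixing; fluency and semantic adequacy, by contrast, require separate evaluation.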

Important Dates


The submission portal is open on OpenReview.

  • Workshop submission deadline (regular and non-archival submissions): 7 February 2025
  • Notification of acceptance: 8 March 2025
  • Camera ready papers due: 17 March 2025
  • Workshop date: 3/4 May 2025

All deadlines are 11:59 pm UTC-12 (“anywhere on Earth”).

Program Committee


Ana Valeria González    Novo Nordisk
Astik Biswas    Oracle
Alexander Gelbukh    CIC-IPN
Barbara Bullock    UT Austin
Constantine Lignos    Brandeis University
Costin-Gabriel Chiru    University Politehnica of Bucharest
Els Lefever    Ghent University
Emre Yılmaz    University of Houston-Downtown
Emily Ahn    University of Washington
Grandee Lee    SUSS
Helena Gómez Adorno    UNAM
Jacqueline Toribio    UT Austin
Julia Hirschberg    Columbia University
Khyathi Chandu    Allen AI
Parth Patwa    AWS
Pastor López Monroy    Centro de Investigación en Matemáticas (CIMAT)
Manuel Mager    Amazon
Samson Tan    Amazon AWS
Segun Taofeek Aroyehun    University of Konstanz
Steven Abney    University of Michigan
Yang Liu    Amazon
Yerbolat Khassanov    ByteDance

Organizers


Genta Indra Winata,
Senior Applied Scientist, Capital One AI Foundations, USA
Sudipta Kar,
Senior Applied Scientist, Amazon Alexa AI, USA
Marina Zhukova,
Ph.D. Candidate, Department of Linguistics, University of California, Santa Barbara, USA
Injy Hamed,
Postdoctoral Associate, MBZUAI, UAE
Garry Kuwanto,
Ph.D. Student, Boston University, USA
Mahardika Krisna Ihsani,
M.Sc. Candidate, MBZUAI, UAE
Barid Xi Ai,
Postdoctoral Researcher, NUS, Singapore
Derry Tanti Wijaya,
Associate Professor, Monash University Indonesia, Indonesia
Adjunct Faculty, Boston University, USA
Thamar Solorio,
Professor, MBZUAI, UAE
Professor, University of Houston, USA