First Call for Papers
This will be the seventh edition of the workshop, collocated with NAACL 2025.
Bilingual and multilingual speakers often engage in code-switching (CS), mixing languages within a conversation, influenced by cultural nuances. CS can occur at inter-sentential, intra-sentential, and morphological levels, posing challenges for language understanding and generation. Models trained for a single language often struggle with mixed-language input. Despite advances in multilingual pre-trained language models (LMs), they may still perform poorly on CS data. Research on LMs' ability to process CS data, considering cultural nuances, reasoning, coverage, and performance biases, remains underexplored.
As CS becomes more common in informal communication like newsgroups, tweets, and social media, research on LMs processing mixed-language data is urgently needed. This workshop aims to unite researchers working on spoken and written CS technologies, promoting collaboration to improve AI's handling of CS across diverse linguistic contexts.
Topics of Interest
The workshop invites contributions from researchers working on NLP and speech approaches for the analysis and processing of mixed-language data. Topics of relevance to the workshop include the following:
- Development of data and model resources to support research on CS data
- New data augmentation techniques for improving robustness on CS data
- New approaches for NLP downstream tasks: question answering, conversational agents, named entity recognition, sentiment analysis, machine translation, language generation, and ASR in CS data
- NLP techniques for the syntactic analysis of CS data
- Domain, dialect, and genre adaptation techniques applied to CS data processing
- Language modeling approaches to CS data processing
- Sociolinguistic and/or sociopragmatic aspects of CS
- Techniques and metrics for automatically evaluating synthetically generated CS text
- Utilization of LLMs and assessment of their performance on NLP tasks for CS data
- Survey and position papers discussing the challenges of CS data to NLP techniques
- Ethical issues and considerations in CS applications
Submissions
The workshop accepts three categories of papers: regular workshop papers, non-archival submissions, and cross-submissions. Only regular workshop papers will be included in the proceedings as archival publications, and regular workshop papers are eligible for the best paper award. All three categories may be long (maximum 8 pages) or short (maximum 4 pages), with unlimited additional pages for references, following the EMNLP 2023 formatting requirements. The reported research should be substantially original. Accepted papers will be presented as posters or oral presentations.
Reviewing will be double-blind; papers must not include author information, and self-references that identify the authors should be avoided or anonymized. An optional limitations section does not count toward the page limit. For non-archival submissions, we also welcome papers of up to 2 pages; please notify us by email if you plan to make a non-archival submission.
Shared Task on Automatic Evaluation for CS Text Generation
To enrich the CS community and enhance language inclusivity, we plan to organize a shared-task competition focused on automatically evaluating synthetically generated CS text. Automatic CS text generation is valuable for various tasks, especially given the scarcity of such data. Data augmentation has been effective in improving model performance across tasks and languages. The need for CS text in dialogue systems highlights the benefits of enabling chatbots to produce CS sentences. As the demand for generating CS text increases, robust evaluation methods are essential to assess accuracy and fluency. This area still lacks sufficient research in data and methodologies. Our shared task aims to enable further progress in this field.
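As an illustration of the kind of automatic measure often reported for CS text, the sketch below computes the utterance-level Code-Mixing Index (CMI), a widely used indicator of how evenly tokens are distributed across languages. It is only a minimal example: it assumes per-token language labels are available, and the function name, the `lang_tags` input, and the "other" tag for language-independent tokens are illustrative conventions, not the shared task's official evaluation protocol.

```python
from collections import Counter

def code_mixing_index(lang_tags):
    """Utterance-level Code-Mixing Index (CMI) -- illustrative sketch.

    `lang_tags` holds one language label per token, with "other"
    marking language-independent tokens (punctuation, URLs, etc.).
    Returns a score in [0, 100]: 0 for monolingual utterances,
    higher values for more evenly mixed ones.
    """
    n = len(lang_tags)
    u = sum(1 for tag in lang_tags if tag == "other")
    if n == 0 or n == u:
        return 0.0  # no language-tagged tokens, so no mixing
    counts = Counter(tag for tag in lang_tags if tag != "other")
    return 100.0 * (1.0 - max(counts.values()) / (n - u))

# Example: an English-Spanish utterance with one punctuation token.
print(code_mixing_index(["en", "en", "en", "es", "es", "other"]))  # 40.0
```

Corpus-level variants typically average this score over utterances; evaluating the fluency and semantic adequacy of generated CS text would of course require going beyond such surface statistics.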
Important Dates
The submission portal is open on OpenReview.
- Workshop submission deadline (regular and non-archival submissions): 7 February 2025
- Notification of acceptance: 8 March 2025
- Camera-ready papers due: 17 March 2025
- Workshop date: 3/4 May 2025

All deadlines are 11:59 pm UTC-12 ("anywhere on Earth").