First Call for Papers
This will be the sixth edition of the workshop, co-located with EMNLP 2023 and marking the workshop's 10-year anniversary.
Bilingual and multilingual speakers often mix languages when they communicate with other multilingual speakers, a phenomenon usually known as code-switching (CS). CS can occur at various linguistic levels, including the inter-sentential, intra-sentential, and even morphological levels. In practice, it presents long-standing challenges for language technologies such as machine translation, ASR, language generation, information retrieval and extraction, and semantic processing. Models trained for one language can quickly break down when input from another language is mixed in. Even recent breakthroughs in multilingual pre-trained language models (LMs) have been shown to yield subpar performance on CS data.
Considering the ubiquitous nature of CS in informal text communication such as newsgroups, tweets, blogs, and other social media, and the number of multilingual speakers worldwide who use these platforms, addressing the challenge of processing CS data continues to be of great practical value. This workshop aims to bring together researchers interested in technology for mixed-language data, in either spoken or written form, and to increase community awareness of the different efforts developed to date in this space.
Topics of Interest
The workshop will invite contributions from researchers working in NLP and speech approaches for the analysis and processing of mixed-language data. Topics of relevance to the workshop will include the following:
- Development of resources (dataset/test bed/pre-trained language models) to support research on code-switched data;
- New data augmentation techniques for improving robustness on code-switched data;
- New approaches for downstream NLP tasks on code-switched data: language identification/named entity recognition/sentiment analysis/machine translation/language generation/ASR;
- NLP techniques for the syntactic analysis of code-switched data;
- Domain/dialect/genre adaptation techniques applied to code-switched data processing;
- Language modeling approaches to code-switched data processing;
- Survey and position papers discussing the challenges of code-switched data to NLP techniques;
- Sociolinguistic and/or sociopragmatic aspects of code-switching;
- Ethical issues and considerations in code-switching applications.
The workshop accepts three categories of papers: regular workshop papers, non-archival submissions, and cross-submissions. Only accepted regular workshop papers will be included in the workshop proceedings as archival publications. Regular workshop papers are eligible for the best paper award. Papers in all three categories may be long (maximum 8 pages) or short (maximum 4 pages), with unlimited additional pages for references, and must follow the EMNLP 2023 formatting requirements. The reported research should be substantially original. Accepted papers will be presented as posters or oral presentations. Reviewing will be double-blind, so papers must not include author information; self-references that identify the authors should be avoided or anonymized.
- Workshop submission deadline (regular and non-archival submissions): 7 September 2023 (Tentative)
- Notification of acceptance: 9 October 2023 (Tentative)
- Camera-ready papers due: 16 October 2023 (Tentative)
- Workshop date: November / December 2023 (Tentative)
All deadlines are 11:59 pm UTC-12 (“anywhere on Earth”).