Venue to be defined
Computational Approaches to Linguistic Code-Switching, CALCS 2021

Registration to the Workshop


Participants interested in joining the workshop should register at the NAACL main registration site: Registration to NAACL-HLT 2021

First Call for Papers


Multilingual speakers often mix languages when they communicate with other multilingual speakers, a phenomenon usually known as code-switching (CSW). CSW typically occurs at the intersentential, intrasentential, and even morphological levels. CSW poses serious challenges for language technologies such as Machine Translation (MT), Automatic Speech Recognition (ASR), language generation (LG), information retrieval (IR) and extraction (IE), and semantic processing. Traditional techniques trained on one language quickly break down when the input mixes in another. Recent work has shown that even powerful multilingual models, such as multilingual BERT, yield subpar performance on CSW data (cf. Aguilar and Solorio, 2020).

Considering the ubiquitous nature of CSW in informal text communication such as newsgroups, tweets, blogs, and other social media, and the number of multilingual speakers worldwide that use these platforms, addressing the challenge of processing CSW data continues to be of great practical value. This workshop aims to bring together researchers interested in technology for mixed language data, in either spoken or written form, and increase community awareness of the different efforts developed to date in this space.

Topics of Interest


The workshop will invite contributions from researchers working in NLP and speech approaches for the analysis and processing of mixed-language data. Topics of relevance to the workshop will include the following:

  1. Development of linguistic resources to support research on code-switched data;
  2. NLP approaches for language identification, named entity recognition, sentiment analysis, machine translation, or language generation in code-switched data;
  3. NLP techniques for the syntactic analysis of code-switched data;
  4. Domain/dialect/genre adaptation techniques applied to code-switched data processing;
  5. Language modeling approaches to code-switched data processing;
  6. Crowdsourcing approaches for the annotation of code-switched data;
  7. Position papers discussing the challenges of code-switched data to NLP techniques;
  8. Methods for improving ASR on code-switched data;
  9. Survey papers of NLP research for code-switched data;
  10. Sociolinguistic and/or sociopragmatic aspects of code-switching.

Important Dates


  • Workshop submission deadline (long, short and special track): March 29th
  • Notification of acceptance: April 19th (originally April 15th)
  • Camera ready papers due: April 26th
  • Workshop date: June 11th

All deadlines are 11:59 pm UTC-12 (“anywhere on Earth”).
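As a worked example of the “anywhere on Earth” (AoE) convention, the following minimal sketch converts an 11:59 pm deadline at the fixed offset UTC-12 into UTC using Python's standard `datetime` module. The function name `aoe_deadline_in_utc` is hypothetical, and the year 2021 is assumed from the workshop edition.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" is the fixed offset UTC-12 (the last time zone to reach a date).
AOE = timezone(timedelta(hours=-12))


def aoe_deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return the 11:59 pm AoE deadline on the given date, expressed in UTC.

    Hypothetical helper for illustration only.
    """
    deadline = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return deadline.astimezone(timezone.utc)


# Workshop submission deadline: March 29th (year 2021 assumed).
print(aoe_deadline_in_utc(2021, 3, 29))
```

Because UTC-12 is twelve hours behind UTC, a March 29th AoE deadline corresponds to 11:59 UTC on March 30th.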

Shared Tasks on Machine Translation in Code-Switching Settings


In past years, we have organized a series of shared tasks focusing primarily on enabling technologies for code-switching, including language identification, part-of-speech tagging, and named entity recognition. This year, we are organizing a series of shared tasks on machine translation in code-switching settings, covering multiple language combinations and directions.

Task 1. Supervised Setting: MT for English → Hinglish

In this task, we provide gold-standard data to train and evaluate MT models that take English as input and generate Hinglish (Hindi-English code-switched text).

Task 2. Unsupervised Setting: MT for multiple language combinations

We provide raw data with no gold-standard translations. Participants are challenged to build systems that generate high-quality translations for the language pairs shown below. More language directions may be added soon:

  • Spanish-English → English
  • Spanish-English → Spanish
  • English → Spanish-English
  • English → Spanish
  • Modern Standard Arabic-Egyptian Arabic → English
  • Modern Standard Arabic-Egyptian Arabic → Spanish
  • English → Modern Standard Arabic-Egyptian Arabic

For example:

  • [Spanish-English → English]: I’m expecting dos camionetas llenas de rosas this weekend. → I’m expecting two trucks full of roses this weekend.
  • [Spanish-English → Spanish]: Es viernes y el outfit lo sabe → Es viernes y el atuendo lo sabe
  • [English → Spanish]: My goal is to move to my own apartment next year → Mi objetivo es mudarme a mi propio apartamento el próximo año
  • [Spanish → English]: A mi manera o pa la calle!! → My way or the highway!!

Evaluation

We will use the Linguistic Code-Switching Evaluation Benchmark (LinCE). The leaderboard will rank systems by BLEU score. We also plan to run a smaller human evaluation, which will be presented at the workshop.
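To illustrate what a BLEU score measures, here is a minimal sketch of corpus-level BLEU: clipped n-gram precision for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization, one reference per sentence, and no smoothing. It is not the LinCE leaderboard scorer; standard implementations such as sacreBLEU apply their own tokenization and options, so their scores will differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale.

    Simplified sketch: whitespace tokenization, one reference per hypothesis,
    uniform n-gram weights, no smoothing (returns 0.0 if any precision is zero).
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clipped counts: a hypothesis n-gram is credited at most as often
            # as it appears in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


ref = ["I'm expecting two trucks full of roses this weekend."]
hyp = ["I'm expecting two trucks full of roses this weekend."]
print(corpus_bleu(hyp, ref))  # identical sentence -> 100.0
```

Aggregating n-gram counts over the whole test set before taking precisions (rather than averaging sentence scores) is what makes this a corpus-level metric, which is the usual convention for MT leaderboards.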

Datasets

To access the datasets, visit: Linguistic Code-Switching Evaluation Benchmark

[Update (03/29/2021)]: Usernames have been removed from the datasets. Please download the latest version of the datasets from LinCE.
[Update (04/19/2021)]: The test sets for ENG-HINGLISH, ENG-SPA, and SPANGLISH-ENG have been released! You can download them from LinCE now!
[Update (04/20/2021)]: The test sets for ENG-SPANGLISH, MSAEA-ENG, and ENG-MSAEA have been released!
[Update (04/24/2021)]: We will NOT evaluate ENG-SPA and MSAEA-SPA. Please submit results only for the five remaining tasks. Also, please NOTE that you need to make your results public before the deadline!
[Update (04/26/2021)]: The submission deadline is now postponed to Apr 26th, 2021!
[Update (04/29/2021)]: All the datasets for the CALCS 2021 shared tasks have been released. Please check them here.

Timeline
  • Shared Task training data release: Feb 26th
  • Shared Task test phase: April 19th - 26th (originally April 1st - 7th)
  • Shared Task System description papers due: April 30th (originally April 15th)
  • Shared Task reviews back to authors: May 8th (originally April 22nd)
  • Shared Task Camera ready papers due: May 15th


Questions about the shared task can be sent to: calcsworkshops@gmail.com

Submission


  • Authors are invited to submit papers describing original, unpublished work in the topic areas listed above. Long papers may contain up to eight pages of content and short papers up to four pages of content; both may include an unlimited number of pages for references.
  • All submissions must be in PDF format and must comply with the official NAACL 2021 style guidelines: https://2021.naacl.org/calls/papers/#submission-types–requirements

The Review Process

The review process will not be blind, so papers may include the authors’ names and affiliations. Each submission will be reviewed by at least three members of the program committee. Accepted papers will be published in the workshop proceedings.

Multiple Submission Policy

Papers that have been or will be submitted to other meetings or publications are acceptable, but authors must indicate this information at submission time. If accepted, authors must notify the organizers before the camera-ready deadline as to whether the paper will be presented at the workshop or elsewhere.

Electronic Submission

Papers for both the workshop and the shared tasks should be submitted electronically via Softconf.

*NEW* Rising Stars Track *NEW*

We also invite non-archival one-page abstracts of recently published CSW research by young researchers and early-career investigators. The goal is to increase the visibility of PhD students, postdocs, and early-career investigators (loosely defined) working on language technology for CSW. Please use the anonymized template for submission; references may take an unlimited number of pages.

Program


CALCS @ NAACL 2021
Friday, June 11, 2021 (Mexico City Time, CDT, GMT-5)
09:00 - 09:05 Welcome Remarks
Victor Soto
09:05 - 09:30 Lightning Talks
09:30 - 09:40 Political Discourse Analysis: A Case Study of Code Mixing and Code Switching in Political Speeches
Dama Sravani, Lalitha Kameswari and Radhika Mamidi
09:40 - 09:50 Challenges and Limitations with the Metrics Measuring the Complexity of Code-Mixed Text
Vivek Srivastava and Mayank Singh
09:50 - 10:00 Translate and Classify: Improving Sequence Level Classification for English-Hindi Code-Mixed Data
Devansh Gautam, Kshitij Gupta and Manish Shrivastava
10:00 - 10:30 Break I
10:30 - 13:00 Morning Session II
10:30 - 10:40 Shared Task Overview
Mona Diab
10:40 - 10:50 Gated Convolutional Sequence to Sequence Based Learning for English-Hinglish Code-Switched Machine Translation
Suman Dowlagar and Radhika Mamidi
10:50 - 11:00 IITP-MT at CALCS2021: English to Hinglish Neural Machine Translation using Unsupervised Synthetic Code-Mixed Parallel Corpus
Ramakrishna Appicharla, Kamal Kumar Gupta, Asif Ekbal and Pushpak Bhattacharyya
11:00 - 11:10 Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing
Ganesh Jawahar, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed and Laks Lakshmanan, V.S.
11:10 - 11:20 CoMeT: Towards Code-Mixed Translation Using Parallel Monolingual Sentences
Devansh Gautam, Prashant Kodali, Kshitij Gupta, Anmol Goel, Manish Shrivastava and Ponnurangam Kumaraguru
11:20 - 11:30 Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation
El Moatez Billah Nagoudi, AbdelRahim Elmadany and Muhammad Abdul-Mageed
11:30 - 12:15 Invited Talk: Computational Structural Analysis of German-Turkish Code-Switching: Experiences and Insights
Özlem Çetinoğlu
12:15 - 13:00 Lunch Break
13:00 - 15:30 Afternoon Session I
13:00 - 13:10 Much Gracias: Semi-supervised Code-switch Detection for Spanish-English: How far can we get?
Dana-Maria Iliescu, Rasmus Grand, Sara Qirko and Rob van der Goot
13:10 - 13:20 A Language-aware Approach to Code-switched Morphological Tagging
Şaziye Betül Özateş and Özlem Çetinoğlu
13:20 - 13:30 Can You Traducir This? Machine Translation for Code-Switched Input
Jitao Xu and François Yvon
13:30 - 13:40 On the logistical difficulties and findings of Jopara Sentiment Analysis
Marvin Agüero-Torales, David Vilares and Antonio López-Herrera
13:40 - 15:00 Panel Discussion Moderated by Mona Diab
Panelists: Kalika Bali, Pushpak Bhattacharyya, Marina Fomicheva, Philipp Koehn, Holger Schwenk
15:00 - 15:30 Midday Short Break
15:30 - 16:45 Afternoon Session II
15:30 - 16:15 Invited Talk: Automatic Speech Recognition for Code-Switching Speech
Ngoc Thang Vu
16:15 - 16:45 Evening Break
16:45 - 18:15 Evening Session
16:45 - 16:55 Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data
Akshat Gupta, Sargam Menghani, Sai Krishna Rallabandi and Alan W Black
16:55 - 17:05 CodemixedNLP: An Extensible and Open NLP Toolkit for Code-Mixing
Sai Muralidhar Jayanthi, Kavya Nerella, Khyathi Raghavi Chandu and Alan W Black
17:05 - 17:15 Normalization and Back-Transliteration for Code-Switched Data
Dwija Parikh and Thamar Solorio
17:15 - 17:25 Abusive content detection in transliterated Bengali-English social media corpus
Salim Sazzed
17:25 - 17:35 Developing ASR for Indonesian-English Bilingual Language Teaching
Zara Maxwell-Smith and Ben Foley
17:35 - 17:45 Transliteration for Low-Resource Code-Switching Texts: Building an Automatic Cyrillic-to-Latin Converter for Tatar
Chihiro Taguchi, Yusuke Sakai and Taro Watanabe
17:45 - 17:55 Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots
Samson Tan and Shafiq Joty
17:55 - 18:05 Are Multilingual Models Effective in Code-Switching?
Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto and Pascale Fung
18:05 - 18:15 Closing Remarks
Mona Diab

Invited Speakers


Özlem Çetinoğlu    University of Stuttgart
Ngoc Thang Vu    University of Stuttgart
Manish Shrivastava    International Institute of Information Technology Hyderabad

Program Committee


Gustavo Aguilar    University of Houston
Elena Álvarez Mellado    University of Southern California
Segun Aroyehun    Instituto Politécnico Nacional
Kalika Bali    Microsoft Research India
Astik Biswas    Oracle
Monojit Choudhury    Microsoft Research India
Amitava Das    Wipro AI Lab
Indranil Dutta    Jadavpur University
Alexander Gelbukh    Instituto Politécnico Nacional
Genta Indra Winata    Hong Kong University of Science and Technology
Sudipta Kar    Amazon
Grandee Lee    National University of Singapore
Els Lefever    Ghent University
Constantine Lignos    University of Pennsylvania
Yang Liu    Amazon
Manuel Mager    Universität Stuttgart
Parth Patwa    Indian Institute of Information Technology Sri City
Sai Krishna Rallabandi    Carnegie Mellon University
Yihong Theis    Kansas State University
Van Tung Pham    Nanyang Technological University
Khyathi Raghavi Chandu    Carnegie Mellon University
Seza Doğruöz    Ghent University

Organizers


Professor
Department of Computer Science
University of Houston
Ph.D. Student
Department of Computer Science
University of Houston
Professor
Department of Computer Science
Carnegie Mellon University
Research Scientist, Facebook AI
Professor, Department of Computer Science
George Washington University
Senior Researcher
MSR India
Applied Scientist
Amazon Alexa AI
Advanced Computer Scientist
SRI International
Research Fellow
Microsoft Research India