NER Baseline
The baseline system for the CALCS NER shared task uses a bidirectional recurrent neural network, configured as follows:
Preprocessing
- Replace all URLs with the token <URL>
- Replace all usernames with the token <USR>
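The two replacement steps above can be sketched as a token-level pass; the exact URL and username patterns are assumptions, since the section does not specify them:

```python
import re

# Assumed patterns: the shared-task data (tweets) typically contains
# http(s)/www URLs and @-prefixed usernames.
URL_RE = re.compile(r"^(https?://\S+|www\.\S+)$")

def preprocess(tokens):
    """Replace URL and username tokens with the placeholder tokens
    <URL> and <USR>, leaving all other tokens unchanged."""
    out = []
    for tok in tokens:
        if URL_RE.match(tok):
            out.append("<URL>")
        elif tok.startswith("@") and len(tok) > 1:
            out.append("<USR>")
        else:
            out.append(tok)
    return out
```

Applying the pass to a tokenized tweet like `["see", "http://t.co/x", "@user"]` yields `["see", "<URL>", "<USR>"]`.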
Model architecture
- Randomly initialized embedding vectors of 200 dimensions.
- Forward LSTM with dropout (200 hidden units).
- Backward LSTM with dropout (200 hidden units).
- Concatenation of both LSTM directions.
- Softmax layer to output the label probabilities.
- Adam optimizer.
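A minimal PyTorch sketch of the architecture above, with the dimensions from this section; the vocabulary and label-set sizes are placeholders, and using `nn.LSTM(bidirectional=True)` to realize the forward/backward pair plus concatenation is an implementation assumption:

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_labels,
                 embed_dim=200, hidden_dim=200, dropout=0.5):
        super().__init__()
        # Randomly initialized 200-dimensional embeddings
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # bidirectional=True runs a forward and a backward LSTM and
        # concatenates their outputs (2 * hidden_dim per token)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        # Linear layer producing per-label logits; the softmax is applied
        # by the loss (e.g. nn.CrossEntropyLoss) during training
        self.out = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):
        x = self.embed(token_ids)          # (batch, seq, embed_dim)
        h, _ = self.lstm(x)                # (batch, seq, 2 * hidden_dim)
        return self.out(self.dropout(h))   # (batch, seq, num_labels)
```

Training would pair this with `torch.optim.Adam`, matching the optimizer listed above.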
Settings and Experiments
- Learning rate: 0.01
- Learning rate decay: 2^0.5 (applied as lr /= 1 + epoch * lrdecay)
- Epochs: 5
- Batch size: 64
- Dropout probability: 0.5
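The decay rule listed above can be sketched as a per-epoch schedule. The base rate (0.01), decay value (2^0.5), and in-place update come from this section; whether the epoch index starts at 0 is an assumption:

```python
def lr_schedule(base_lr=0.01, lr_decay=2 ** 0.5, epochs=5):
    """Return the learning rate used at each epoch under the update
    lr /= 1 + epoch * lr_decay, applied after every epoch."""
    lr, history = base_lr, []
    for epoch in range(epochs):
        history.append(lr)
        lr /= 1 + epoch * lr_decay  # divisor is 1 at epoch 0, so no change
    return history
```

With these defaults the rate stays at 0.01 for the first two epochs (the epoch-0 divisor is 1) and then shrinks rapidly over the remaining three.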