NER Baseline
The baseline system for the CALCS NER shared task uses a bidirectional recurrent neural network, configured as follows:
Preprocessing
- Replace all URLs with the token <URL>
- Replace all usernames with the token <USR>
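The two replacement steps above can be sketched as a token-level pass; the exact URL and username patterns are assumptions, since the section does not specify them:

```python
import re

# Assumed patterns: the shared-task data (tweets) typically contains
# http(s)/www URLs and @-prefixed usernames.
URL_RE = re.compile(r"^(https?://\S+|www\.\S+)$")

def preprocess(tokens):
    """Replace URL and username tokens with the placeholder tokens
    <URL> and <USR>, leaving all other tokens unchanged."""
    out = []
    for tok in tokens:
        if URL_RE.match(tok):
            out.append("<URL>")
        elif tok.startswith("@") and len(tok) > 1:
            out.append("<USR>")
        else:
            out.append(tok)
    return out
```

Applying the pass to a tokenized tweet like `["see", "http://t.co/x", "@user"]` yields `["see", "<URL>", "<USR>"]`.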
Model architecture
- Randomly initialized embedding vectors of 200 dimensions.
- Forward LSTM with dropout (200 hidden units).
- Backward LSTM with dropout (200 hidden units).
- Concatenation of both LSTM directions.
- Softmax layer to output the label probabilities.
- Adam optimizer.
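A minimal PyTorch sketch of the architecture above, with the dimensions from this section; the vocabulary and label-set sizes are placeholders, and using `nn.LSTM(bidirectional=True)` to realize the forward/backward pair plus concatenation is an implementation assumption:

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_labels,
                 embed_dim=200, hidden_dim=200, dropout=0.5):
        super().__init__()
        # Randomly initialized 200-dimensional embeddings
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # bidirectional=True runs a forward and a backward LSTM and
        # concatenates their outputs (2 * hidden_dim per token)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        # Linear layer producing per-label logits; the softmax is applied
        # by the loss (e.g. nn.CrossEntropyLoss) during training
        self.out = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):
        x = self.embed(token_ids)          # (batch, seq, embed_dim)
        h, _ = self.lstm(x)                # (batch, seq, 2 * hidden_dim)
        return self.out(self.dropout(h))   # (batch, seq, num_labels)
```

Training would pair this with `torch.optim.Adam`, matching the optimizer listed above.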
Settings and Experiments
- Learning rate: 0.01
- Learning rate decay: 2^0.5 (applied as lr /= 1 + epoch * lrdecay)
- Epochs: 5
- Batch size: 64
- Dropout probability: 0.5
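The decay rule listed above can be sketched as a per-epoch schedule. The base rate (0.01), decay value (2^0.5), and in-place update come from this section; whether the epoch index starts at 0 is an assumption:

```python
def lr_schedule(base_lr=0.01, lr_decay=2 ** 0.5, epochs=5):
    """Return the learning rate used at each epoch under the update
    lr /= 1 + epoch * lr_decay, applied after every epoch."""
    lr, history = base_lr, []
    for epoch in range(epochs):
        history.append(lr)
        lr /= 1 + epoch * lr_decay  # divisor is 1 at epoch 0, so no change
    return history
```

With these defaults the rate stays at 0.01 for the first two epochs (the epoch-0 divisor is 1) and then shrinks rapidly over the remaining three.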