# Transformer Assignment - Instructions

## Setup

1. **Create a virtual environment** (recommended):

   ```bash
   python -m venv transformer_env
   source transformer_env/bin/activate  # On Windows: transformer_env\Scripts\activate
   ```

2. **Install dependencies**:

   ```bash
   pip install torch numpy matplotlib tqdm sacrebleu nltk bert_score
   ```

3. **Work with files in the `starter_code/` directory**:
   - Complete the TODO sections in each file, following the comments.
   - To run the code for different problems, change `from transformer_model_p3 import Transformer` to import from `transformer_model_p3` or `transformer_model_p4`, and change `from baseline_models_p1 import create_rnn_model, create_lstm_model` to import from `baseline_models_p1` or `baseline_models_p2`. For p4, also change `greedy_decode_transformer` instances in `evaluate.py` to `greedy_decode_transformer_manual`.
   - The LSTM problem is optional: if you skip it, it suffices to fill in only the RNN parts of the p2 file and train just the RNN, ignoring the LSTM.

4. **Optional: load pretrained GloVe embedding vectors.**

## Running the Code

In `starter_code/`, run:

### Train Individual Models

```bash
# Train Transformer (~10 minutes with the default configs)
python train.py transformer

# Train RNN (~35 minutes with the default configs)
python train.py rnn

# Train LSTM (~35 minutes with the default configs)
python train.py lstm
```

## Expected Output

After training, you should see:

- Training loss
- Validation loss
- Example outputs
- BLEU score (n-gram matching)
- BERT F1 score (more closely aligned with human evaluations)

Typical results (may vary):

- RNN: outputs are nonsensical
- LSTM: outputs are nonsensical
- Transformer: you can tell it is learning

## Submission

Submit:

1. Your completed code files
2. Training curves (PNG files)
3. A brief report with:
   - Model comparison analysis
   - BLEU scores
   - Any challenges faced

## Tips

1. **Start Small**: Use small hidden dimensions (128-256) and few attention heads (1-4) initially.
2. **Start Early**: It might take some time to wrap your head around these concepts.
3. **Monitor Training**: If the loss is not improving within 20 iterations, there is probably a bug.
4. **Debug Shapes**: Print tensor shapes frequently while debugging.
5. **Read Online**: There are many resources on YouTube, Medium, GeeksforGeeks, etc. that explain these concepts.

## Resources

1. https://medium.com/@yashwanths_29644/deep-learning-series-19-multi-head-attention-vs-self-attention-10c3e0da9925
2. https://youtu.be/eMlx5fFNoYc?si=awvkKXmyWuRuvRAB

Good luck!
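
## Appendix: BLEU Intuition

The BLEU score reported after training is based on n-gram matching between model outputs and references. As a rough intuition (this is a toy illustration, not the full `sacrebleu` implementation, which also combines multiple n-gram orders and applies a brevity penalty), modified n-gram precision can be sketched as:

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also appear in the reference,
    with counts clipped to the reference counts (BLEU's modified precision)."""
    cand_ngrams = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref_ngrams = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    # Clip each candidate n-gram count by its count in the reference.
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return overlap / total if total else 0.0

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(ngram_precision(cand, ref, 1))  # 5 of 6 unigrams match -> 0.8333...
print(ngram_precision(cand, ref, 2))  # 3 of 5 bigrams match -> 0.6
```

In practice you should report the score computed by `sacrebleu`, which the evaluation code already uses; this sketch is only to help interpret what the number means.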
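
## Appendix: Greedy Decoding Intuition

The `greedy_decode_transformer` functions in `evaluate.py` all follow the same loop: start from a BOS token, repeatedly pick the arg-max next token, and stop at EOS or a length cap. A minimal model-agnostic sketch (the `step_fn`, `toy_step`, and the token IDs here are hypothetical stand-ins, not part of the starter code; the real version would call your Transformer):

```python
def greedy_decode(step_fn, bos_id, eos_id, max_len):
    """Generic greedy decoding: repeatedly append the highest-scoring next token.
    `step_fn(tokens)` returns a list of scores over the vocabulary for the
    position after `tokens`."""
    tokens = [bos_id]
    for _ in range(max_len):
        scores = step_fn(tokens)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:  # stop once the model emits end-of-sequence
            break
    return tokens

# Toy step function over a 4-token vocab {0: BOS, 1: EOS, 2: "a", 3: "b"}
# that deterministically emits "a", then "b", then EOS.
plan = [2, 3, 1]

def toy_step(tokens):
    target = plan[min(len(tokens) - 1, len(plan) - 1)]
    return [1.0 if i == target else 0.0 for i in range(4)]

print(greedy_decode(toy_step, bos_id=0, eos_id=1, max_len=10))  # [0, 2, 3, 1]
```

If your decoded outputs never terminate or repeat one token forever, check that you stop on EOS and that the scores you arg-max over come from the last position only.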