# Transformer Assignment - Instructions

## Setup

1. **Create a virtual environment** (recommended):

   ```bash
   python -m venv transformer_env
   source transformer_env/bin/activate  # On Windows: transformer_env\Scripts\activate
   ```

2. **Install dependencies**:

   ```bash
   pip install torch numpy matplotlib tqdm sacrebleu nltk bert_score
   ```

3. **Work with files in the `starter_code/` directory**:
   - Complete the TODO sections in each file, following the comments.
   - To run the code for different problems, change `from transformer_model_p3 import Transformer` to import from `transformer_model_p3` or `transformer_model_p4`, and change `from baseline_models_p1 import create_rnn_model, create_lstm_model` to import from `baseline_models_p1` or `baseline_models_p2`. For p4, also change `greedy_decode_transformer` instances in `evaluate.py` to `greedy_decode_transformer_manual`.
   - The LSTM problem is optional: if you skip it, it suffices to fill in only the RNN parts of the p2 file and train just the RNN, ignoring the LSTM.

4. **Optional: load pretrained GloVe embedding vectors.**

## Running the Code

In `starter_code/`, run:

### Train Individual Models

```bash
# Train Transformer (~10 minutes with the default configs)
python train.py transformer

# Train RNN (~35 minutes with the default configs)
python train.py rnn

# Train LSTM (~35 minutes with the default configs)
python train.py lstm
```

## Expected Output

After training, you should see:

- Training loss
- Validation loss
- Example outputs
- BLEU score (n-gram matching)
- BERT F1 score (more closely aligned with human evaluations)

Typical results (may vary):

- RNN: outputs are nonsensical
- LSTM: outputs are nonsensical
- Transformer: you can tell it is learning

## Submission

Submit:

1. Your completed code files
2. Training curves (PNG files)
3. A brief report with:
   - Model comparison analysis
   - BLEU scores
   - Any challenges faced

## Tips

1. **Start Small**: Use small hidden dimensions (128-256) and few attention heads (1-4) initially.
2. **Start Early**: It might take some time to wrap your head around these concepts.
3. **Monitor Training**: If the loss is not improving within 20 iterations, there is probably a bug.
4. **Debug Shapes**: Print tensor shapes frequently while debugging.
5. **Read Online**: There are many resources on YouTube, Medium, GeeksforGeeks, etc. that explain these concepts.

## Resources

1. https://medium.com/@yashwanths_29644/deep-learning-series-19-multi-head-attention-vs-self-attention-10c3e0da9925
2. https://youtu.be/eMlx5fFNoYc?si=awvkKXmyWuRuvRAB

Good luck!
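
## Appendix: BLEU Intuition

The BLEU score reported after training is based on n-gram matching between model outputs and references. As a rough intuition (this is a toy illustration, not the full `sacrebleu` implementation, which also combines multiple n-gram orders and applies a brevity penalty), modified n-gram precision can be sketched as:

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also appear in the reference,
    with counts clipped to the reference counts (BLEU's modified precision)."""
    cand_ngrams = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref_ngrams = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    # Clip each candidate n-gram count by its count in the reference.
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return overlap / total if total else 0.0

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(ngram_precision(cand, ref, 1))  # 5 of 6 unigrams match -> 0.8333...
print(ngram_precision(cand, ref, 2))  # 3 of 5 bigrams match -> 0.6
```

In practice you should report the score computed by `sacrebleu`, which the evaluation code already uses; this sketch is only to help interpret what the number means.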
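
## Appendix: Greedy Decoding Intuition

The `greedy_decode_transformer` functions in `evaluate.py` all follow the same loop: start from a BOS token, repeatedly pick the arg-max next token, and stop at EOS or a length cap. A minimal model-agnostic sketch (the `step_fn`, `toy_step`, and the token IDs here are hypothetical stand-ins, not part of the starter code; the real version would call your Transformer):

```python
def greedy_decode(step_fn, bos_id, eos_id, max_len):
    """Generic greedy decoding: repeatedly append the highest-scoring next token.
    `step_fn(tokens)` returns a list of scores over the vocabulary for the
    position after `tokens`."""
    tokens = [bos_id]
    for _ in range(max_len):
        scores = step_fn(tokens)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:  # stop once the model emits end-of-sequence
            break
    return tokens

# Toy step function over a 4-token vocab {0: BOS, 1: EOS, 2: "a", 3: "b"}
# that deterministically emits "a", then "b", then EOS.
plan = [2, 3, 1]

def toy_step(tokens):
    target = plan[min(len(tokens) - 1, len(plan) - 1)]
    return [1.0 if i == target else 0.0 for i in range(4)]

print(greedy_decode(toy_step, bos_id=0, eos_id=1, max_len=10))  # [0, 2, 3, 1]
```

If your decoded outputs never terminate or repeat one token forever, check that you stop on EOS and that the scores you arg-max over come from the last position only.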