Contents
1 Introduction 3
2 Pretraining 5
2.1 Pretraining Data ............................................. 5
2.2 Training Details ............................................. 5
2.3 Llama 2 Pretrained Model Evaluation ................................ 7
3 Fine-tuning 8
3.1 Supervised Fine-Tuning (SFT) ..................................... 9
3.2 Reinforcement Learning with Human Feedback (RLHF) ..................... 9
3.3 System Message for Multi-Turn Consistency ............................. 16
3.4 RLHF Results .............................................. 17
4 Safety 20
4.1 Safety in Pretraining .......................................... 20
4.2 Safety Fine-Tuning ........................................... 23
4.3 Red Teaming ............................................... 28
4.4 Safety Evaluation of Llama 2-Chat .................................. 29
5 Discussion 32
5.1 Learnings and Observations ...................................... 32
5.2 Limitations and Ethical Considerations ............................... 34
5.3 Responsible Release Strategy ..................................... 35
6 Related Work 35
7 Conclusion 36
A Appendix 46
A.1 Contributions .............................................. 46
A.2 Additional Details for Pretraining ................................... 47
A.3 Additional Details for Fine-tuning .................................. 51
A.4 Additional Details for Safety ...................................... 58
A.5 Data Annotation ............................................. 72
A.6 Dataset Contamination ......................................... 75
A.7 Model Card ............................................... 77