Introduction
A curated list of landmark papers in the field of large language models (LLMs).
Foundational
- Efficient Estimation of Word Representations in Vector Space (Word2Vec) (2013)
- GloVe: Global Vectors for Word Representation (2014)
- Neural Machine Translation by Jointly Learning to Align and Translate (2014)
  - Introduced the attention mechanism for neural machine translation
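The two embedding papers above are easiest to appreciate hands-on. A minimal sketch using the `gensim` library and its bundled `glove-wiki-gigaword-100` download (an assumption of convenience; any pretrained `KeyedVectors` model works the same way) of the nearest-neighbour and word-analogy behaviour these papers established:

```python
# Querying pretrained GloVe vectors via gensim (requires gensim and
# a one-time ~130 MB download of the model).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

# Nearest neighbours in embedding space capture semantic similarity.
print(vectors.most_similar("king", topn=3))

# The classic word-analogy arithmetic from the Word2Vec paper:
# vec(king) - vec(man) + vec(woman) ~ vec(queen)
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```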
Transformer
- Attention Is All You Need (2017)
  - Introduced the Transformer architecture (see the attention sketch after this list)
- Self-Attention with Relative Position Representations (2018)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation (2021)
- RoFormer: Enhanced Transformer with Rotary Position Embedding (2021)
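The core computation these papers revolve around is scaled dot-product attention; ALiBi and RoPE change only how position enters it (a linear bias added to the scores, or a rotation of the queries and keys, respectively). A minimal NumPy sketch of the base operation from "Attention Is All You Need":

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```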
Large Language Models
- Universal Language Model Fine-tuning for Text Classification (ULMFiT) (2018)
- Improving Language Understanding by Generative Pre-Training (GPT-1) (2018)
- Language Models are Unsupervised Multitask Learners (GPT-2) (2019)
- Language Models are Few-Shot Learners (GPT-3) (2020)
- What Can Transformers Learn In-Context? A Case Study of Simple Function Classes (2022)
- GPT-4 Technical Report (2023)
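GPT-3's headline result is that "training" examples can live entirely in the prompt, with no gradient update. A toy sketch of the few-shot format, using the English-to-French demonstrations from the GPT-3 paper itself:

```python
# Few-shot in-context learning: the demonstrations alone condition
# the model, which is expected to continue with "fromage".
few_shot_prompt = """Translate English to French.

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""

print(few_shot_prompt)
```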
Alignment
- Deep reinforcement learning from human preferences (2017)
- Training language models to follow instructions with human feedback (2022)
- Constitutional AI: Harmlessness from AI Feedback (2022)
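The first two papers train a reward model on pairwise human comparisons (Constitutional AI swaps the human labels for AI feedback but keeps the same objective): maximize the log-sigmoid of the reward margin between the preferred and rejected response. A minimal sketch of that loss:

```python
# Pairwise preference loss for a reward model: -log sigmoid(r_chosen - r_rejected).
import numpy as np

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    margin = reward_chosen - reward_rejected
    return float(np.log1p(np.exp(-margin)))  # log(1 + e^(-margin))

print(preference_loss(1.5, 0.2))  # small loss: ranking already correct
print(preference_loss(0.2, 1.5))  # large loss: ranking inverted
```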
Scaling Laws, Emergence
- Scaling Laws for Neural Language Models (2020)
- Training Compute-Optimal Large Language Models (2022)
- Emergent Abilities of Large Language Models (2022)
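The Chinchilla paper's practical takeaway compresses into one rule of thumb: for a training compute budget of roughly 6·N·D FLOPs, train on about 20 tokens per parameter. A back-of-the-envelope sketch (the 20x ratio is an approximate empirical fit, not an exact law, and the function name below is illustrative):

```python
# Chinchilla rule of thumb: optimal tokens D ~ 20 * parameters N,
# with training compute approximated as C ~ 6 * N * D FLOPs.
def chinchilla_optimal_tokens(n_params: float) -> float:
    return 20.0 * n_params

n = 70e9  # Chinchilla itself: 70B parameters, trained on ~1.4T tokens
d = chinchilla_optimal_tokens(n)
print(f"params={n:.0e}, optimal tokens~{d:.1e}, compute~{6 * n * d:.2e} FLOPs")
```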
Prompt / Context Engineering
Efficient Transformers
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (2019)
- Generating Long Sequences with Sparse Transformers (2019)
- Reformer: The Efficient Transformer (2020)
- Longformer: The Long-Document Transformer (2020)
- Big Bird: Transformers for Longer Sequences (2020)
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022)
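Most of the pre-FlashAttention papers here sparsify the attention pattern; the simplest instance is the sliding-window (local) mask used by Longformer. A minimal sketch of that mask (the window size is illustrative):

```python
# Sliding-window attention mask: token i may attend to token j only
# when |i - j| <= window, so cost grows linearly in sequence length.
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """True where attention is allowed."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

print(sliding_window_mask(6, 1).astype(int))
# Each row has at most 2*window + 1 ones instead of seq_len.
```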
Survey Papers
- Efficient Transformers: A Survey (2020)
- On the Opportunities and Risks of Foundation Models (2021)
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing (2021)
- A Survey of Transformers (2022)
- A Survey of Large Language Models (2023)
- Speed Always Wins: A Survey on Efficient Architectures for Large Language Models (2025)