Landmark LLM Papers

Introduction

A curated list of landmark papers in the field of LLMs.

Foundational

- Efficient Estimation of Word Representations in Vector Space (Word2Vec) (2013)
- GloVe: Global Vectors for Word Representation (2014)
- Neural Machine Translation by Jointly Learning to Align and Translate (2014) - introduced the concept of attention

Transformer

- Attention Is All You Need (2017) - introduced the Transformer architecture (a minimal NumPy sketch of the attention computation appears after these lists)
- Self-Attention with Relative Position Representations (2018)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation (2021)
- RoFormer: Enhanced Transformer with Rotary Position Embedding (2021)

Large Language Models

- Universal Language Model Fine-tuning for Text Classification (ULMFiT) (2018)
- Improving Language Understanding by Generative Pre-Training (GPT-1) (2018)
- Language Models are Unsupervised Multitask Learners (GPT-2) (2019)
- Language Models are Few-Shot Learners (GPT-3) (2020)
- What Can Transformers Learn In-Context? A Case Study of Simple Function Classes (2022)
- GPT-4 Technical Report (2023)

Alignment

- Deep reinforcement learning from human preferences (2017)
- Training language models to follow instructions with human feedback (2022)
- Constitutional AI: Harmlessness from AI Feedback (2022)

Scaling Laws, Emergence

- Scaling Laws for Neural Language Models (2020)
- Training Compute-Optimal Large Language Models (2022) (a back-of-envelope sizing example appears after these lists)
- Emergent Abilities of Large Language Models (2022)

Prompt / Context Engineering

- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022) (a schematic prompt appears after these lists)

Efficient Transformers

- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (2019)
- Reformer: The Efficient Transformer (2020)
- Longformer: The Long-Document Transformer (2020)
- Generating Long Sequences with Sparse Transformers (2019)
- Big Bird: Transformers for Longer Sequences (2020)
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022)

Survey Papers

- A Survey of Transformers (2022)
- Efficient Transformers: A Survey (2020)
- A Survey of Large Language Models (2023)
- On the Opportunities and Risks of Foundation Models (2021)
- Pre-train, Prompt, and Predict: A Survey of Prompting Methods in NLP (2021)
- Speed Always Wins: A Survey on Efficient Architectures for Large Language Models (2025)
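For a concrete anchor to the Transformer entries above, here is a minimal NumPy sketch of single-head scaled dot-product attention as described in Attention Is All You Need. The function name, shapes, and the single-head simplification are my own illustrative choices, not anything prescribed by the papers.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Single-head scaled dot-product attention.

    q, k, v: arrays of shape (seq_len, d_k).
    """
    d_k = q.shape[-1]
    # Score every query against every key; scale by sqrt(d_k) so the
    # softmax inputs do not grow with the head dimension.
    scores = q @ k.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ v

# Toy usage: 4 tokens, head dimension 8.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(q, k, v).shape)  # (4, 8)
```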
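The Chinchilla paper in the scaling-laws section is often reduced to two rules of thumb: training compute C ≈ 6·N·D for N parameters and D tokens, and roughly 20 training tokens per parameter at the compute-optimal point. A back-of-envelope sketch under those assumptions; the constants are approximations commonly quoted from the paper, not exact fitted values.

```python
def chinchilla_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    """Roughly compute-optimal training tokens for a model of `params` parameters."""
    return params * tokens_per_param

def training_flops(params: float, tokens: float) -> float:
    """Standard C ~= 6 * N * D estimate of training compute."""
    return 6 * params * tokens

n = 70e9                  # 70B parameters, the scale of the Chinchilla model
d = chinchilla_tokens(n)  # ~1.4 trillion tokens
print(f"tokens: {d:.2e}, FLOPs: {training_flops(n, d):.2e}")
```

Plugging in 70B parameters recovers the paper's headline recipe of about 1.4T training tokens, at roughly 5.9e23 FLOPs.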
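For the chain-of-thought entry, a schematic few-shot prompt: the model sees a worked exemplar whose answer spells out intermediate reasoning before the final answer, then a new question. The exemplar below paraphrases the paper's well-known arithmetic examples.

```python
# Schematic few-shot chain-of-thought prompt in the style of Wei et al. (2022):
# the exemplar answer includes its intermediate reasoning steps.
prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?\n"
    "A:"
)
print(prompt)
```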

December 20, 2025 · 301 words · Anand Saha