Glossary

A seed glossary of terms used in the book. Definitions are being filled in as the chapters mature; for now, this page is closer to an index of the vocabulary you should be comfortable using by the end of the book than to a finished reference.

If a term you saw in the text is missing here, that’s a bug; please file an issue.

Architectures and building blocks

MLP · CNN · RNN · LSTM · GRU · transformer · attention · self-attention · cross-attention · multi-head attention · residual stream · positional encoding · RoPE · KV cache · mixture of experts (MoE) · state space model (SSM) · S4 · Mamba · vision transformer (ViT) · normalization (BatchNorm, LayerNorm) · residual / skip connections.

Training

SGD · momentum · Adam · AdamW · weight decay · learning rate · batch size · dropout · regularization · cross-entropy · next-token prediction · perplexity · self-supervised learning · pretraining · fine-tuning · SFT · LoRA · distillation · context distillation · behavioral cloning · RLHF · RLAIF · DPO · PPO · GRPO · reward model · reward hacking.

Phenomena and dynamics

scaling laws · Chinchilla scaling · emergence · in-context learning (ICL) · grokking · double descent · lottery ticket · sharpness / flatness · catastrophic forgetting · model collapse · phase transitions in training · representation re-organization · hidden capabilities · swing-by dynamics.

Generative models

variational autoencoder (VAE) · generative adversarial network (GAN) · normalizing flow · energy-based model · diffusion model · DDPM · score matching · flow matching · score-based SDE · NeRF.

Reinforcement learning

Markov decision process (MDP) · policy · value function · Q-function · advantage · policy gradient · REINFORCE · actor-critic · on-policy · off-policy · offline RL · model-based RL · Dreamer · MuZero · credit assignment · sparse reward · exploration vs. exploitation · long horizon · intrinsic motivation · self-play · Nash equilibrium · multi-agent RL · evolutionary methods · quality-diversity (QD) · MAP-Elites · open-endedness.

Concepts, representations, interpretability

concept · concept space · compositional generalization · continual learning · world model · world representations · theory of mind · probes · steering vectors · dictionary learning · sparse autoencoder · CKA · t-SNE · UMAP · linear probing · activation patching.

Other

LLM · VLM · omni-model · agent · RAG · evals · tokenization · BPE · SentencePiece · vector quantizer · foundation model · alignment · calibration · inference-time compute · chain of thought (CoT) · System 1 / System 2 · Knightian uncertainty · aleatory / epistemic uncertainty · active learning · EM algorithm · principal component analysis (PCA) · Mahalanobis distance · bitter lesson.