12 References
Ho, Jonathan, Ajay Jain, and Pieter Abbeel. 2020. “Denoising
Diffusion Probabilistic Models.” Advances in Neural
Information Processing Systems 33.
Hoffmann, Jordan, Sebastian Borgeaud, Arthur
Mensch, et al. 2022. “Training Compute-Optimal Large
Language Models.” arXiv Preprint arXiv:2203.15556.
Hu, Edward J, Yelong Shen, Phillip Wallis, et al. 2021. “LoRA:
Low-Rank Adaptation of Large Language Models.” arXiv Preprint
arXiv:2106.09685.
Kaplan, Jared, Sam McCandlish, Tom Henighan, et al. 2020. “Scaling
Laws for Neural Language Models.” arXiv Preprint
arXiv:2001.08361.
Ouyang, Long, Jeff Wu, Xu Jiang, et al.
2022. “Training Language Models to Follow Instructions with Human
Feedback.” Advances in Neural Information Processing
Systems 35.
Rafailov, Rafael, Archit Sharma, Eric Mitchell, Stefano Ermon,
Christopher D Manning, and Chelsea Finn. 2023. “Direct Preference
Optimization: Your Language Model Is Secretly a Reward Model.”
NeurIPS.
Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg
Klimov. 2017. “Proximal Policy Optimization Algorithms.”
arXiv Preprint arXiv:1707.06347.
Song, Yang, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar,
Stefano Ermon, and Ben Poole. 2021. “Score-Based Generative
Modeling Through Stochastic Differential Equations.”
ICLR.
Sutton, Richard. 2019. “The Bitter Lesson.” Incomplete
Ideas (Blog).
Vaswani, Ashish, Noam Shazeer, Niki Parmar, et al. 2017.
“Attention Is All You Need.” Advances in Neural
Information Processing Systems 30.