Feb 24, 2023 - 1 ' read

Let's build GPT: from scratch, in code, spelled out.

Let’s build GPT: from scratch, in code, spelled out. #

视频地址：https://www.youtube.com/watch?v=kCc8FmEb1nY

内容简介：这期主要关注Andrej实现「Self-attention」「Multihead self-attention」的过程，他以莎士比亚的剧本为训练数据写出了更多的仿莎士比亚剧本，比如：

The top in a world by susphoring grace. LUCIO: We muse hath resistes him so sovere: son’t his other wrough stands of coverent sh’d: he has here, and stand it and poor exceeder or a Henry’s last, stay not in faith, forewell’s base of graves, thanks, happy comparel, warmentfully: may as face by the courst, that strangth errise hath breathed. Hastings come to Valenting. HERMIONE: Well have been bolly poor late Is the lords.

视频的最后比对了他自己在两小时实现的tiny GPT与ChatGPT的区别（然而似乎他并不打算继续介绍fine-tune和PPO的部分，也许和他入职OpenAI有关）。视频里面的一些资料（原Youtube视频介绍中有提到）：

Google colab for the video: https://colab.research.google.com/dri…
GitHub repo for the video: https://github.com/karpathy/ng-video-…
Playlist of the whole Zero to Hero series so far: https://www.youtube.com/watch?v=VMj-3…
nanoGPT repo: https://github.com/karpathy/nanoGPT
my website: https://karpathy.ai
my twitter: https://twitter.com/karpathy
our Discord channel: https://discord.gg/3zy8kqD9Cp

Supplementary links:

Attention is All You Need paper: https://arxiv.org/abs/1706.03762
OpenAI GPT-3 paper: https://arxiv.org/abs/2005.14165
OpenAI ChatGPT blog post: https://openai.com/blog/chatgpt/
The GPU I’m training the model on is from Lambda GPU Cloud, I think the best and easiest way to spin up an on-demand GPU instance in the cloud that you can ssh to: https://lambdalabs.com . If you prefer to work in notebooks, I think the easiest path today is Google Colab.