Hey everyone. Um, I'm Jared Kaplan. I'm 00:00
going to talk briefly about scaling and 00:03
the road to human level AI, but my guess 00:06
is for this audience, a lot of these 00:08
ideas are pretty familiar, so I'll keep 00:10
it short and then we're going to do a 00:12
sort of fireside chat Q&A with uh with 00:14
Diana. I actually have only been working 00:16
on AI for about six years. I uh before 00:20
that had a long career, the vast 00:24
majority of my career as a theoretical 00:26
physicist. um working in academia. And 00:28
so uh how did I get to AI? Well, I I I 00:31
want to be brief. Why did I start in 00:34
physics? It was basically because my mom 00:36
was a science fiction writer and I 00:38
wanted to figure out if we could build a 00:41
faster than light drive and physics was 00:44
the way to do that. Um I also was very 00:46
excited about just understanding the 00:49
universe. How do things work? How do the 00:52
biggest trends that underlie sort of 00:54
everything that we see around us, where 00:58
does that all come from? For example, is 01:00
the universe deterministic? Do we have 01:02
free will? I was very, very interested 01:04
in all of those questions. But 01:05
fortunately, along the way, uh during my 01:07
career as a physicist, I met a lot of 01:10
very, very interesting, very deep 01:13
people, including many of the uh 01:15
founders of Anthropic that I now work 01:17
with all of the time. And uh I was 01:20
really interested in what they were 01:23
doing and I kept track of it. And as I 01:24
moved among different 01:27
subject areas in physics from Large 01:30
Hadron Collider physics, particle 01:33
physics, cosmology, string theory, um 01:34
and on I got a little bit frustrated, a 01:38
little bit bored. I didn't feel like we 01:41
were making progress quickly enough. And 01:42
a lot of my friends were telling me that 01:44
AI was becoming a really big deal. Um 01:46
and I didn't believe them. I was really 01:49
skeptical. I thought, well, AI, people 01:51
have been working on it for 50 years. 01:53
SVMs aren't that exciting. Um, that was 01:55
all we knew about back in 2005, 2009 01:58
when I was in school. But I got 02:01
convinced that maybe AI would be an 02:03
exciting field to work on. Um, and I I 02:05
got very lucky to know the right people 02:08
and the rest is history. So uh I'm going 02:10
to talk a little bit about how our 02:13
contemporary AI models work and how 02:16
scaling is leading them to get better 02:18
and better. So there are really two 02:21
fundamental phases to the training of 02:24
contemporary AI models like Claude, 02:27
ChatGPT, 02:30
etc. The first phase is pre-training and 02:32
that's where we train AI models to 02:37
imitate human written data, human 02:40
written text and understand the 02:42
correlations underlying that data. And 02:45
these these figures are very very retro. 02:47
This is actually from the playground of 02:50
the original GPT-3 model. And you can see 02:52
that in "As a speaker at a journal club, 02:55
you're probably elephant me to say 02:58
certain things," the word elephant in 03:00
that sentence is really really unlikely. 03:01
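
The "elephant" illustration can be sketched with a toy next-word model. This is a minimal sketch, assuming a tiny invented corpus and plain bigram counts; real pre-training uses neural networks over far larger corpora, and the function name `next_word_prob` is hypothetical.

```python
from collections import Counter, defaultdict

# Toy corpus, invented purely for illustration (not real training data).
corpus = (
    "you are probably expecting me to say certain things . "
    "you are probably expecting a short talk . "
    "you are probably wondering about scaling ."
).split()

# Count how often each word follows each other word.
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1


def next_word_prob(prev: str, nxt: str) -> float:
    """Estimate P(nxt | prev) from raw bigram counts (no smoothing)."""
    total = sum(follow_counts[prev].values())
    return follow_counts[prev][nxt] / total if total else 0.0


# "expecting" follows "probably" in two of three cases; "elephant" never does.
print(next_word_prob("probably", "expecting"))
print(next_word_prob("probably", "elephant"))
```

In this toy setting the word "elephant" gets probability zero after "probably", which is the same kind of signal the playground figure is described as showing.
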
What pre-training does is teach models 03:05
what words are likely to follow other 03:08
words in large corpora of text and now 03:10
with contemporary models multimodal 03:14
data. The second phase of training for 03:16
contemporary AI models is reinforcement 03:19
learning. This is another very retro 03:22
slide. Um it shows the original 03:24
interface we used for sort of Claude 03:27
zero or Claude negative one uh back in 03:29
the ancient days of 2022 03:32
when we were collecting feedback data. 03:35
And what you see here is basically the 03:39
interface for having a conversation with 03:41
very very early versions of Claude and 03:44
picking which response from Claude was 03:47
better according to you, according to 03:51
crowdworkers, etc. And using that 03:54
signal, we optimize, we reinforce the 03:57
behaviors that are chosen to be good, 04:00
that are chosen to be helpful, honest, 04:04
and harmless. And we discourage the 04:05
behaviors that are bad. So really all 04:07
there is to training these models is 04:10
learning to predict the next word and 04:12
then doing reinforcement learning to 04:15
learn to do useful tasks. And it turns 04:17
out that there are scaling laws for both 04:19
of these phases of training. So this is 04:22
a figure that we made five or six 04:26
years ago now and it shows how as you 04:29
scale up the pre-training phase of AI, 04:32
you predictably get better and better 04:35
performance for our models. And this was 04:38
something that came about because I was 04:41
just sort of asking the dumbest possible 04:43
question. As a physicist, that's what 04:45
you're trained to do. You sort of look 04:47
at the big picture and you ask really 04:48
dumb things. I'd heard it was very 04:50
popular in the 2010s to say that big 04:53
data was important and so I just wanted 04:55
to know how big should the data be? How 04:58
important is it? How much does it help? 05:02
Similarly, a lot of people were noticing 05:04
that larger AI models performed better. 05:06
And so we just asked the question, how 05:09
much better do these models perform? And 05:11
we got really lucky. We found that 05:14
there's actually something very very 05:16
very precise and surprising underlying 05:18
AI training. This really blew us away 05:21
that there are these nice trends that 05:23
are as precise as anything that you see 05:25
in physics or or astronomy. And these 05:27
gave us a lot of conviction to believe 05:30
that AI was just going to keep getting 05:34
smarter and smarter in a very 05:36
predictable way. Because as you can see 05:38
in these figures already back in 2019, 05:40
we were looking across many many many 05:44
orders of magnitude in compute, in data 05:47
set size, in neural network size. And so 05:50
we expected once you see something is 05:54
true over many many many orders of 05:56
magnitude you expect it's probably going 05:58
to continue to be true for a long time 06:00
further. So this has sort of been one of 06:01
the fundamental things that I think 06:04
underlies uh uh improvements in in AI. 06:05
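
The "straight line over many orders of magnitude" idea can be sketched as a power-law fit in log-log space. This is a minimal sketch under stated assumptions: the data points are synthetic, the constants `A_TRUE` and `B_TRUE` are made up, and `fit_power_law` is a hypothetical helper, not the method used for the figures in the talk.

```python
import math

# Synthetic (compute, loss) points generated from an assumed power law
# loss = a * compute**(-b); the constants are invented for illustration.
A_TRUE, B_TRUE = 10.0, 0.05
compute = [10.0 ** k for k in range(1, 8)]  # seven orders of magnitude
loss = [A_TRUE * c ** (-B_TRUE) for c in compute]


def fit_power_law(xs, ys):
    """Least-squares line through (log x, log y); returns (a, b) for y = a * x**(-b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum(
        (u - mx) ** 2 for u in lx
    )
    intercept = my - slope * mx
    return math.exp(intercept), -slope


a_fit, b_fit = fit_power_law(compute, loss)

# Extrapolate the fitted trend two orders of magnitude past the data.
predicted = a_fit * (1e9) ** (-b_fit)
print(a_fit, b_fit, predicted)
```

A power law is a straight line on log-log axes, so the fitted slope is the exponent, and extrapolating simply extends that line.
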
The other is actually also something 06:09
that started to appear quite a long time 06:11
ago although it's become really really 06:13
impactful uh in the last couple of years 06:15
is that you can see scaling laws in the 06:18
reinforcement learning phase of AI 06:20
training. So uh a researcher about four 06:23
years ago decided to study scaling laws 06:27
for AlphaGo. Basically putting together 06:31
two very very high-profile AI successes, 06:33
GPT-3 and scaling for pre-training and 06:36
AlphaGo. This was just a researcher uh 06:39
Andy Jones working on his own uh with 06:42
like his own I think maybe single GPU 06:45
back in these sort of ancient days. And 06:48
so he couldn't study AlphaGo, that was 06:50
expensive, but he could study a simpler 06:52
game called Hex. So he made this plot 06:53
that you see here. Now, Elo scores, I 06:56
think, weren't as well known um 07:00
back then, but all Elo scores are, 07:02
of course, is chess ratings. They 07:05
basically describe how likely it is for 07:07
one player to beat another in a game of 07:10
chess. They're used now to benchmark AI 07:13
models to see sort of how often does a 07:16
human prefer one AI model to another. 07:18
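
The rating idea described here is usually written as the Elo expected-score formula. The sketch below assumes the standard chess convention (base 10 and a 400-point scale), which is not stated in the talk, and the function name is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Equal ratings give an even match; a 400-point lead gives about 10-to-1 odds.
print(elo_expected_score(1500, 1500))
print(elo_expected_score(1900, 1500))
```

The same formula applies whether the "players" are chess engines, Hex agents, or two AI models compared by human preference.
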
But but back then this is just sort of 07:20
the classic application of Elo scores 07:22
as chess ratings. And he looked at as 07:24
you train different models to play this 07:27
game of Hex, which is a very simple 07:30
board game, a bit simpler than than Go, 07:33
how do they do? And he saw these 07:36
remarkable straight lines. So it's sort 07:37
of a skill in science to notice very 07:40
very simple trends and and this was one 07:43
I think it went unnoticed. I think 07:45
people didn't focus on this this sort of 07:48
kind of scaling behavior in RL soon 07:50
enough but but eventually it came to 07:52
pass. So we see that basically you can 07:53
scale up the compute in both 07:56
pre-training and RL and get better and 07:57
better performance. And I think that's 08:00
sort of the fundamental thing that is 08:01
driving AI progress. It's not that AI 08:03
researchers are really smart or they 08:07
suddenly got smart. It's that we found a 08:08
very very simple way of making AI better 08:12
systematically and and we're we're 08:16
turning that crank. So what kinds of 08:18
capabilities is this unlocking? I tend 08:20
to think of AI capabilities on two axes. 08:22
I think the less interesting axis, but 08:25
it's still very important is basically 08:27
the the flexibility of AI, the ability 08:30
of AI to meet us where we are. So if you 08:34
put say AlphaGo on this figure, it would 08:38
be very very far below the x-axis 08:42
because although AlphaGo was 08:44
superintelligent, it was better than any Go 08:46
player at playing Go, it was uh only 08:49
able to operate in the universe of a Go 08:53
board. But we've made steady progress 08:55
since the advent of large language 08:58
models making uh AI that can deal with 09:00
many, if not all, of the modalities 09:05
that people can deal with. We don't 09:08
have AI models I think that uh that have 09:09
a sense of smell. Um but that's that's 09:11
probably coming. And so as you go up the 09:14
y-axis here you get to AI systems that 09:16
can do more and more relevant things in 09:19
in the world. I think the more 09:22
interesting axis though is sort of the 09:23
the x-axis here which is how long it 09:25
would take a person to do to do the 09:28
kinds of tasks that AI models can do and 09:30
that's something that has been 09:33
increasing steadily as we increase the 09:34
capability of AI. This is sort of the 09:37
time horizon for tasks, and an 09:38
organization, METR, studied this very 09:42
systematically and found yet another 09:44
scaling trend. They found that if you 09:46
look at uh the length of tasks that AI 09:49
models can do, it's doubling roughly 09:52
every 7 months. And so what this means 09:55
is that the increasing intelligence that 09:58
is being baked into AI by scaling 10:02
compute for pre-training and RL is 10:04
leading to predictable useful 10:07
tasks that the AI models uh can can do, 10:10
including longer and longer horizon 10:14
tasks. And so you can sort of speculate 10:15
about where this is heading. And the AI 10:17
2027 folks did. And this kind of picture 10:20
suggests that over the next few years we 10:24
may reach a point where AI models um can 10:27
do tasks that don't just take us minutes 10:30
or hours but days, weeks, months, years 10:32
etc. Eventually, we imagine AI models or 10:36
or millions of AI models perhaps working 10:39
together will be able to do the work 10:42
that whole human organizations can do. 10:44
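
The doubling trend quoted above can be turned into rough arithmetic. This is a back-of-the-envelope sketch: the 7-month doubling time is the figure from the talk, while the one-hour starting horizon and the helper names are assumptions chosen only to make the numbers concrete.

```python
import math

DOUBLING_MONTHS = 7.0  # doubling time quoted in the talk


def horizon_after(months: float, start_hours: float) -> float:
    """Task length (in hours) after `months` of steady doubling."""
    return start_hours * 2.0 ** (months / DOUBLING_MONTHS)


def months_until(target_hours: float, start_hours: float) -> float:
    """Months of steady doubling needed to grow from start to target."""
    return DOUBLING_MONTHS * math.log2(target_hours / start_hours)


# Assumed starting point: tasks that take a person about one hour.
print(horizon_after(28, start_hours=1.0))    # four doublings
print(months_until(40.0, start_hours=1.0))   # a 40-hour work week
```

Under these assumptions, four doublings (28 months) take a one-hour horizon to sixteen hours. The extrapolation only holds if the trend itself continues.
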
They'll be able to do the kind of work 10:46
that the entire scientific community 10:48
currently does. Um, one of the nice 10:50
things about math or theoretical physics 10:52
is that you can make progress just by by 10:54
thinking. Um and so you can imagine AI 10:57
systems working together to make the 11:00
kind of progress that the theoretical 11:02
physics community makes in in say 50 11:04
years in a matter of days, weeks etc. So 11:06
what is left if if this sort of picture 11:11
of scaling can take us very far? What is 11:13
left? I think that what may be left in 11:15
order to unlock um kind of human level 11:18
AI broadly construed is relatively 11:21
simple. One of the most important 11:24
ingredients I think is relevant 11:25
organizational knowledge. So we need to 11:28
train AI models that don't just greet 11:30
you with a blank slate but can learn to 11:33
work within companies, organizations, 11:37
governments as though they have the kind 11:39
of context that someone who's been 11:42
working there for years has. So I think 11:44
AI models need to be able to work with 11:47
knowledge. They also need memory. What 11:48
is memory if not knowledge? I 11:51
distinguish it in the sense that as you 11:53
do a task that takes you a very very 11:56
long time, you need to keep track of 11:59
your progress on that specific task, you 12:01
need to build relevant memories and you 12:03
you need to be able to use them. And 12:05
that's something that we've uh we've 12:06
begun to build into Claude 4 and I 12:08
think will become increasingly 12:11
important. A third ingredient that I 12:12
think that we need to get better at and 12:14
and we're making progress on is 12:16
oversight: the ability of AI models to 12:19
understand sort of fine-grained nuances 12:24
to solve hard fuzzy tasks. So it's easy 12:26
right now and you see an explosion of 12:30
progress for us to train AI models that 12:32
can say write code that passes tests or 12:34
that answer math questions correctly 12:37
because it's very crisp what's correct 12:39
and what's incorrect. So it's very easy 12:42
to apply reinforcement learning to make 12:44
AI models uh do better and better at 12:47
those kinds of tasks. But what we need 12:49
and are developing are AI models that 12:52
help us to generate much more nuanced 12:54
reward signals so that we can leverage 12:57
reinforcement learning to do to do 13:01
things like tell good jokes, write good 13:04
poems, um and have good taste in in 13:06
research. The other ingredients that we 13:11
need, I think, are are are simpler. We 13:13
obviously need to be able to train AI 13:15
models to do more and more complex 13:16
tasks. We need to work our way up the 13:18
y-axis from text models to multimodal 13:22
models to robotics. Um, and I expect 13:24
that over the next few years, we'll see 13:27
continued gains from scale 13:29
when applied to these 13:33
different domains. 13:36
And so how should we sort of prepare for 13:38
this this future these possibilities? I 13:41
think there are a few a few things that 13:44
I always recommend. One is I think it's 13:46
really a good idea to build things that 13:49
don't quite work yet. This is probably 13:53
always a good idea. We always want to 13:55
have ambition, but I think specifically 13:56
AI models right now are getting better 13:59
very very quickly. And I think that's 14:01
going to continue. That means that if 14:03
you build uh a product that doesn't 14:04
quite work because Claude 4 is still a 14:07
little bit too dumb, um you could expect 14:09
that there'll be a Claude 5 coming that 14:11
will make that make that product work 14:14
and deliver a lot of value. So I think 14:16
that's that's something that I always 14:18
recommend is sort of experiment on the 14:19
boundaries of what AI can do because 14:21
those boundaries are moving rapidly. The 14:23
next point I think is that AI is going 14:25
to be helpful for integrating AI. I 14:28
think that one of the main bottlenecks 14:31
for AI is really just that it's 14:33
developing so quickly that we haven't 14:36
had time to integrate it into 14:38
products, companies, 14:41
everything else that we do, 14:44
into science. Um, and so I think that in 14:46
order to sort of speed that process up, 14:49
I think leveraging AI for AI integration 14:51
is going to be is going to be very 14:53
valuable. And then finally, I mean, I 14:54
think this is sort of obvious for for 14:56
this crowd, but I think figuring out 14:58
where adoption of AI could happen very 14:59
very quickly is is key. Um, we're seeing 15:02
uh an explosion of AI integration for 15:07
coding. And there are a lot of reasons 15:10
why software engineering is a great 15:12
place for AI, but I think the big 15:14
question is sort of what's next? Um, 15:16
what beyond software engineering can 15:19
grow that that quickly? I don't know the 15:21
answer, of course. Um, but hopefully you 15:23
guys will figure it out. So that's it 15:26
for for for the talk. Um, I want to 15:28
invite Diana on stage for uh for a chat. 15:30
That was an awesome talk 15:46
about all the scaling laws and recently 15:50
Anthropic just launched Claude 4 which is 15:53
just available. Curious uh how does it 15:57
change what is possible as all these 16:01
model releases keep compounding for the 16:04
next 12 months? 16:07
I think that uh we'll be in trouble if 16:09
it's 12 months before before an even 16:11
better model comes out. But uh I guess 16:14
uh a few things with Claude 4. I 16:17
think that with Claude 3.7 Sonnet 16:19
uh it was already really exciting to use 16:22
3.7 for coding. But I think something 16:25
that everyone noticed was that 3.7 was a 16:28
little bit too eager. Um sometimes it 16:32
just really wanted to make your tests 16:35
pass. Um and it would do things that 16:37
that you you don't really want. Uh there 16:39
are a lot of, like, try/excepts, things 16:41
like that. Um, so with Claude 4, I think 16:43
that we've been able to improve the 16:46
model's ability to act as an agent 16:49
specifically for coding, but but in a 16:52
lot of other ways for search, for all 16:53
kinds of other applications. Um, but 16:55
also improve its supervision, the sort 16:57
of oversight that I I I mentioned in my 17:01
talk, so that it uh it follows your 17:03
directions and hopefully improves in in 17:06
code quality. I think the other thing 17:09
that we've worked on is improving its 17:10
ability to uh save and store memories 17:12
and we hope to see people leveraging 17:15
that because Claude 4 can blow through 17:17
its context window with a very complex 17:19
task but can also uh store memories as 17:21
files or records, retrieve them in order 17:24
to sort of keep doing work across many 17:27
many many context windows. But I guess 17:29
finally I think the picture that scaling 17:31
laws paint is one of incremental 17:33
progress. And so I think that what 17:35
you'll see with Claude is that steadily 17:37
it gets better in lots of different ways 17:40
with each release. Um but I think that 17:43
scaling really suggests a kind of smooth 17:45
curve towards what I expect is kind of 17:49
human level AI or AGI. 17:52
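
The idea of storing memories as files and retrieving them across context windows can be sketched as follows. This is a hypothetical illustration only: it is not Claude's memory mechanism or any Anthropic API, and the file layout and the names `load_memory`, `save_memory`, and `work_session` are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def load_memory(path: Path) -> dict:
    """Return saved notes, or an empty record if nothing was stored yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"done": [], "next": None}


def save_memory(path: Path, memory: dict) -> None:
    """Persist notes so a later session can pick up where this one stopped."""
    path.write_text(json.dumps(memory, indent=2))


def work_session(path: Path, steps: list) -> dict:
    """Simulate one fresh context window: reload notes, do one step, save."""
    memory = load_memory(path)
    remaining = [s for s in steps if s not in memory["done"]]
    if remaining:
        memory["done"].append(remaining[0])
        memory["next"] = remaining[1] if len(remaining) > 1 else None
    save_memory(path, memory)
    return memory


steps = ["read spec", "write code", "run tests"]
memory_file = Path(tempfile.mkdtemp()) / "task_memory.json"
for _ in range(3):  # three separate "context windows"
    state = work_session(memory_file, steps)
print(state["done"])
```

Each call to `work_session` starts with no in-process state, so progress survives only through the file, which is the point of the illustration.
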
Is there some special feature that a lot 17:54
of the audience here are going to get 17:57
excited about? Some beta, 17:59
some alpha leak you can give everyone on 18:02
what you think people are going to fall 18:05
in love with in the new APIs? 18:07
I think the thing that I I'm most 18:09
excited about is sort of uh memory 18:11
unlocking longer and longer horizon 18:14
tasks. I think that like as as time goes 18:16
on we're going to see Claude as a 18:19
collaborator that can sort of take on 18:22
larger and larger chunks of work. This 18:23
is to your point of all these future 18:25
models being able to take bigger and 18:27
bigger tasks. Right now, at this point, 18:29
they're able to do tasks in the hours? 18:31
Yeah, I think so. I think it's a very 18:35
imprecise measure, but I think that 18:38
right now if you look at sort of 18:40
software engineering tasks, I think 18:42
METR literally benchmarked how long it 18:43
would take people to do various tasks 18:45
and uh and yeah, I think it's a time 18:47
scale of hours. I think just, like, 18:50
broadly as people work with AI, 18:52
I think that the people who are skeptics 18:55
of AI will say correctly that AI makes 18:57
lots of stupid mistakes. Um, it can do 19:00
things that are absolutely brilliant and 19:03
and surprise you, but it can also make 19:05
uh make basic errors. I think one of the 19:07
sort of basic features of of AI that's 19:09
different about the shape of AI 19:12
intelligence compared to human 19:14
intelligence is that there are a lot of 19:15
things that I can't do but I can at 19:17
least judge whether they were done 19:19
correctly. I think for AI the judgment 19:21
versus the generative capability is much 19:24
closer which means that I think that uh 19:27
a major role people can play in 19:29
interacting with AI is kind of as 19:31
managers to sort of sanity check 19:33
the work. 19:36
Which is fascinating because one of the 19:37
things we observe through the batches in 19:39
YC last year a lot of companies when 19:41
they were out and selling products they 19:45
were selling it more still as a co-pilot 19:47
where you would have a co-pilot let's 19:50
say for customer support where you still 19:52
need the last human approval before they 19:54
would send the reply for a customer but 19:56
one thing that has changed just in the 19:59
spring batch: I think a lot of the AI 20:01
models are very capable of doing tasks end 20:04
to end, to your point, which is 20:07
remarkable. Founders are now selling 20:09
direct replacements of full workflows. 20:12
How have you seen this translate to what 20:17
you hope the audience will build. 20:19
I think there are a lot of 20:22
possibilities. Basically, it's a 20:23
question of 20:25
what level of success or performance is 20:27
is acceptable. There are some tasks 20:31
where getting it sort of 70% right is 20:33
good enough and others where you need 20:36
99.9% to deploy. I think that 20:37
honestly I think it's probably a lot 20:41
more fun to build for use cases where uh 20:43
70 to 80% is good enough because then you 20:46
can really get to the frontier of what 20:50
AI is capable of. But I think that we're 20:51
sort of pushing up the the reliability 20:54
as well. So I think that uh we will see 20:59
more and more of these tasks. I think 21:01
that uh right now human AI collaboration 21:03
is is going to be the sort of most 21:06
interesting place because I think that 21:09
for the most advanced tasks you're 21:11
really going to need humans in the loop. 21:13
But I do think in the longer term there 21:14
will be more and more tasks that can be 21:16
fully automated. 21:17
Can you say more about what you think 21:18
the world is going to look like with 21:21
this human to AI loop collaboration? 21:22
Because there's the essay from Dario, 21:25
Machines of Loving Grace, where he 21:28
paints this picture that's very 21:31
optimistic, and what are the details of 21:33
how we get there with this book? 21:35
I think that we already see some 21:38
of that happening. So at least when I 21:41
talk to folks who work in say biomedical 21:43
research um with the right sort of 21:46
orchestration I think it's possible to 21:49
take frontier AI models now and produce 21:51
interesting valuable insights for say 21:57
drug discovery. Um so I think that's 22:00
already starting to happen. I guess an 22:02
aspect of it that that I think about is 22:05
that like there there's sort of 22:07
intelligence that requires a lot of 22:10
depth um and and intelligence that 22:11
requires a lot of breadth. So for 22:15
example in math you can sort of work on 22:16
trying to prove one theorem for a decade 22:19
like the Riemann hypothesis or Fermat's 22:22
last theorem. Um I think that's that's 22:24
sort of solving one very specific very 22:26
hard problem. I think there's a lot of 22:28
areas of science, probably more so in 22:31
biology, maybe interestingly in 22:33
psychology or or history, where putting 22:35
together a very very large number of 22:39
pieces of information um across many 22:43
many different areas is kind of where 22:46
it's at. And I think that AI models 22:48
during the pre-training phase kind of 22:51
imbibe all of human civilization's 22:53
knowledge. And so I suspect that there's 22:56
a lot of uh fruit to be picked in using 22:58
that sort of feature of AI that it knows 23:03
much much more than any one human expert 23:05
and therefore you can kind of elicit um 23:08
insights putting together many different 23:12
uh many different areas of expertise say 23:14
across biology for for for research. So 23:17
I think that um we're making a lot of 23:19
progress on making AI better at deeper 23:22
tasks like hard coding problems, hard 23:25
math problems, but I suspect that 23:27
there's a particular overhang in areas 23:28
where putting together knowledge that 23:31
maybe no one human expert would have 23:34
where that kind of intelligence is is is 23:36
very useful. So I think that's something 23:39
that I'd expect to see more of, um, 23:41
sort of leveraging AI's sort of breadth 23:44
of knowledge. In terms of how exactly it 23:46
will roll out, I really don't know. It's 23:49
really really hard to predict the 23:51
future. Scaling laws give you one way of 23:52
predicting the future which says this 23:56
trend is going to continue. I think a 23:58
lot of trends that we see 24:00
over the long haul I expect will 24:03
continue. I mean the economy, the GDP, 24:05
uh the these kinds of trends are really 24:09
reliable indicators of the future. But I 24:11
think in terms of in detail how will 24:13
things be implemented, I think it's 24:15
really really hard to say. 24:16
Are there specific areas that you think 24:17
a lot more builders could go into and 24:20
build with these new models? I mean 24:23
there's a lot that has been done let's 24:25
say for coding tasks but what are some 24:27
tasks that have a lot more green field 24:30
that are just getting unlocked right now 24:32
with the current models 24:34
I come from a research background rather 24:36
than uh rather than business so I don't 24:38
I don't know that I have anything very 24:41
uh very deep to say but I think that 24:43
like in general any place where um it 24:44
requires a lot of skill um and it's a 24:49
task that mostly involves sort of 24:52
sitting in front of a computer 24:55
interacting with data. I think finance 24:56
uh people who use Excel spreadsheets a 24:59
lot. Um I think I expect law, although 25:01
maybe law uh is more 25:06
regulated, requires more 25:09
expertise um as a stamp of approval. But 25:12
I think all of these areas are probably 25:14
green field. I think another that that I 25:16
sort of mentioned is how do we integrate 25:19
AI into existing businesses? I think 25:23
that like when electricity came along, 25:26
there was some long adoption cycle and 25:28
the very first simplest ways of say 25:31
using electricity weren't necessarily uh 25:33
the best. You wanted to not just replace 25:36
a steam engine with an electric motor. 25:38
You wanted to sort of remake the way 25:41
that factories work. And I think that 25:43
probably leveraging AI to integrate AI 25:44
into parts of the economy um as quickly 25:48
as possible. I expect there's just a lot 25:51
of a lot of leverage there. 25:53
Now another question is you have 25:54
extensive training as a physicist and 25:56
you were one of the first to really 25:59
observe this trend with scaling laws and 26:01
it probably comes from being a physicist 26:04
and seeing all these exponentials that 26:06
happen naturally in nature. How has that 26:09
training come about with uh being able 26:14
to perform like the best research in the 26:17
world with AI? 26:20
I think the thing that was useful from a 26:22
physics point of view is looking for the 26:24
biggest picture, most macro trends and 26:28
then trying to make them as precise as 26:31
possible. So I remember meeting like 26:33
kind of brilliant AI researchers who 26:36
would say things like learning is 26:38
converging exponentially 26:41
and I would just ask really dumb 26:43
questions like are you sure it's an 26:45
exponential? Could it just be a power 26:47
law? Is it quadratic? Like like exactly 26:49
how is this thing converging? And it's a 26:52
really dumb kind of simple question to 26:55
ask, but basically I think there was a 26:57
lot of fruit to be picked and and 26:59
probably still is in trying to make the 27:01
big trends that you see as precise as 27:04
possible because that I don't know it 27:06
gives you a lot of tools. It allows you 27:08
to ask like what does it really mean to 27:09
move the needle? I think with scaling 27:11
laws, the the holy grail is finding a 27:13
better slope to the scaling law because 27:17
that means that as you put in more 27:19
compute, you're going to get a bigger 27:21
and bigger advantage over other AI 27:24
developers. Um, but until you've sort of 27:27
made precise what the trend is that you 27:30
see, you sort of don't know exactly what 27:32
it means to beat it and and how much you 27:35
can beat it by and how to know 27:37
systematically whether you're you're 27:39
you're achieving that end. So, I think 27:41
those were kind of the tools that that I 27:43
think I used. It wasn't necessarily like 27:45
literally applying say quantum field 27:47
theory to AI. I think that's uh that's a 27:50
little bit too specific. Well, are there 27:52
specific uh physics heuristics like 27:54
renormalization, symmetry that came in 27:57
very handy to really keep observing this 27:59
trend or or measuring it? 28:03
Something that you'll observe if you 28:05
look at AI models is that they're big. 28:06
Neural networks are big. They have 28:09
billions now trillions of parameters. 28:10
That means that they're made out of big 28:12
matrices. and basically studying uh 28:15
approximations 28:19
where you 28:21
take the limit that neural networks are 28:23
very big and specifically that the uh 28:25
matrices that compose neural networks 28:28
are big. That's actually been kind of 28:29
useful and that's something that 28:31
actually was a well-known approximation 28:32
in in physics um and and in math. Um 28:34
that's something that's been applied. 28:37
But I think generally it's really asking 28:39
very naive dumb questions that gets you 28:41
very far. I think AI is really in a 28:43
certain sense only like maybe 10 to 15 28:45
years old in terms of the current 28:48
incarnation of how we're training AI 28:50
models. That means that it's an 28:52
incredibly new field. A lot of the most 28:53
basic questions haven't been answered 28:56
like questions of interpretability, how 28:58
AI models really work. And so I think 29:01
there's there's really a lot to uh to 29:03
learn at that level rather than applying 29:06
very very fancy techniques. Are there 29:09
specific tools in physics that you apply 29:11
for interpretability? 29:14
I would say that interpretability is a 29:15
lot more like biology. It's a lot more 29:17
like neuroscience. So I think those are 29:19
kind of the tools. Um there there is 29:21
there is some more more more mathematics 29:23
there. But I I think it's more like 29:26
trying to understand the features of the 29:28
brain. Um the benefit that you get with 29:30
AI over neuroscience is that um you can 29:33
really measure everything in AI. You 29:36
can't measure the the activity of every 29:38
neuron, every synapse in a brain, but 29:41
you can do that in AI. So there's much 29:43
much much more data for reverse 29:45
engineering how AI models work. 29:48
Now, one aspect about scaling laws: 29:50
they've held for over five orders of 29:52
magnitude, which is wild. This is a bit 29:56
of a contrarian question, but what 29:58
empirical sign would convince you that 30:01
the curve is changing, that maybe we're 30:05
getting off the curve? 30:07
I think it's a really I think it's a 30:09
really hard question, right? Because I 30:10
mostly use scaling laws to diagnose 30:12
whether AI training is broken or not. 30:14
Mh. 30:16
So I think that uh once you see 30:16
something and you find it's a 30:20
very compelling trend, it becomes very 30:21
very interesting to examine 30:24
where it's failing. But I think that my 30:27
first inclination is to think if scaling 30:29
laws are failing, it's because we've 30:32
screwed up AI training in some way. 30:34
Maybe we got uh we got the architecture 30:36
of the neural network wrong or there's 30:39
some bottleneck in training that we 30:42
don't see or there's some problem with 30:43
precision in the algorithms that we're 30:45
using. So I think it would take a lot to 30:47
convince me at least that scaling was 30:51
really no longer working at the level of 30:54
the sort of these empirical laws because 30:55
so many times in my experience over the 30:57
last 5 years when it seemed like scaling 31:00
was broken it was because we were doing 31:01
it wrong. 31:03
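The diagnostic use of scaling laws described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's actual tooling: the function names `fit_power_law` and `flag_off_trend`, the 5% tolerance, and the synthetic numbers are all made up for the example. The idea is to fit a power law to small runs, extrapolate it, and treat a larger run that lands well above the line as a hint that something in training may be broken.

```python
import math

def fit_power_law(compute, loss):
    """Least-squares fit of loss ~ a * compute**(-b) in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

def flag_off_trend(compute, loss, a, b, tolerance=0.05):
    """Indices of runs whose loss sits more than `tolerance` (relative) above the fit."""
    flagged = []
    for i, (c, l) in enumerate(zip(compute, loss)):
        predicted = a * c ** (-b)
        if (l - predicted) / predicted > tolerance:
            flagged.append(i)
    return flagged

# Synthetic small runs (hypothetical): they follow loss = 10 * C**-0.1 exactly.
small_compute = [1e3, 1e4, 1e5, 1e6]
small_loss = [10.0 * c ** -0.1 for c in small_compute]
a, b = fit_power_law(small_compute, small_loss)

# Two larger runs: the first is on trend, the second is 20% worse than trend.
big_compute = [1e7, 1e8]
big_loss = [10.0 * 1e7 ** -0.1, 1.2 * 10.0 * 1e8 ** -0.1]
suspect = flag_off_trend(big_compute, big_loss, a, b)
print(suspect)  # index of the run worth debugging first
```

In this toy setup only the second large run is flagged. In practice the tolerance would need to reflect the run-to-run noise actually observed, which this sketch does not model.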
Interesting. So I guess going into 31:04
something very specific that goes hand 31:06
in hand is a lot of the compute power 31:08
required to keep going on this curve. 31:10
What happens uh as compute becomes more 31:14
and more scarce? How far down do you go into 31:17
the precision ladder? Like, do you explore 31:21
things like FP4? Do you explore things 31:23
like ternary representations? What 31:26
are your thoughts around that? Yeah, I 31:28
mean I think that um right now AI is 31:30
really inefficient because there's a lot 31:34
of value in AI. So um there's a lot of 31:37
value in unlocking the most capable 31:39
frontier model. Um and so companies like 31:44
Anthropic and others are moving as 31:47
quickly as we can to both make AI 31:49
training more efficient and AI inference 31:52
more efficient as well as unlocking 31:54
frontier capabilities. But a lot of the 31:56
focus really is on uh unlocking the 31:58
frontier. I think that over time as AI 32:00
becomes more and more widespread, I 32:05
think that we're going to really drive 32:08
down the cost of inference and training 32:10
dramatically from where we are right 32:13
now. I mean right now we're seeing sort 32:15
of 3x to 10x gains algorithmically and 32:17
in sort of scaling up compute um and in 32:22
uh inference efficiency per year. I 32:25
guess like the joke is that we're going 32:29
to get computers back into binary. So I 32:31
think that we will see much much lower 32:33
precision as one of the many avenues to 32:36
make inference more efficient over time. 32:38
But sort of we're very, very, very 32:41
out of equilibrium with AI development 32:43
right now. AI is improving very rapidly. 32:45
Things are changing very rapidly. We 32:47
haven't fully realized the potential of 32:49
current models, but we're unlocking more 32:52
and more capabilities. So I think that 32:54
what the equilibrium situation looks 32:56
like where AI isn't changing that 32:58
quickly, I think is one where AI is 33:01
extremely inexpensive, but it's sort of 33:03
hard to know if we're even going to get 33:05
there. Like, AI may just keep getting 33:07
better so quickly that uh sort of 33:09
improvements in intelligence unlock 33:11
so much more and so we may continue to 33:13
focus on that rather than say getting 33:15
precision down to FP2 33:18
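To make the "precision ladder" concrete, here is a small Python sketch of what the bottom rungs look like. It is an illustration only: the scheme below (one shared scale equal to the mean absolute weight, then rounding to -1, 0, or +1) is one simple ternary scheme assumed for the example, not something the speakers describe, and the names `quantize_ternary` and `dequantize` are hypothetical. A ternary code carries about log2(3) ≈ 1.58 bits per weight, compared with 4 bits for FP4.

```python
def quantize_ternary(weights):
    """Map each weight to a code in {-1, 0, +1} plus one shared scale."""
    scale = sum(abs(w) for w in weights) / len(weights)  # mean absolute value
    if scale == 0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clip into the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Reconstruct approximate weights from ternary codes."""
    return [c * scale for c in codes]

weights = [0.9, -0.05, 0.4, -1.2, 0.02, 0.6]  # made-up example values
codes, scale = quantize_ternary(weights)
approx = dequantize(codes, scale)

print(codes)   # every entry is -1, 0, or +1
print(approx)  # coarse reconstruction of the original weights
```

The reconstruction keeps the sign and rough magnitude of the large weights and zeroes out the small ones, which is the trade being discussed: far fewer bits per weight in exchange for a coarser model.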
which is very much uh the Jevons paradox 33:21
as intelligence becomes better and 33:24
better people are going to want it more 33:26
not that is driving the cost down which 33:29
is this irony right 33:31
yeah absolutely I mean I think that uh 33:33
yeah that's that's certainly certainly 33:35
something that we've seen that there are 33:36
certain uh certain points where AI 33:38
becomes accessible enough. That said, um 33:41
I think as AI systems become more and 33:45
more capable um and can do more and more 33:48
of the work that that we do, it's going 33:51
to be worth it to pay for uh frontier 33:53
capabilities. I think it's a question 33:55
that I've always had and can have is 33:57
kind of like is all of the value at the 34:00
frontier or is there a lot of value with 34:02
kind of cheaper systems that aren't 34:05
quite as capable? And I think the sort 34:07
of time horizon picture is maybe one way 34:09
of thinking about this. I think that you 34:11
can do a lot of very simple bite-sized 34:14
tasks, but I think it's just much more 34:15
convenient to be able to use an AI model 34:18
that can do a very complex task end to 34:21
end rather than requiring us as humans 34:23
to sort of orchestrate a much dumber 34:26
model to break the task down into very 34:28
very small slices and put them together. 34:30
So, I do kind of expect that a lot of 34:32
the value is going to come from the most 34:33
capable models, but I might be wrong. It 34:35
it might depend and it might really 34:38
depend on the capabilities of AI 34:40
integrators to sort of leverage AI 34:43
really efficiently. 34:44
What advice would you give this audience 34:45
where everyone is early in their 34:48
career, with lots of potential, in terms 34:50
of how do you stay relevant in the 34:52
future where all these models are going 34:55
to become so awesome. What should 34:57
everyone be really good at and study and 34:59
to still do really good work? I think as 35:03
I mentioned there's a lot of value in 35:06
understanding how these models work and 35:09
being able to really efficiently 35:12
leverage them and and integrate them and 35:13
I think there's a lot of value in kind 35:15
of like building building at the 35:17
frontier. Um I don't know we could turn 35:19
it over to the audience for for 35:21
questions. 35:23
Let's turn it over to the audience for 35:23
some questions. 35:24
I had a quick question on the scaling 35:26
laws. You show that a lot of the scaling 35:27
laws are like linear that like the more 35:30
we have exponential compute going up but 35:32
then like we have linear progress in uh 35:34
in the scaling loss but then on your 35:36
last slide you show that you expect then 35:38
suddenly like an exponential growth in 35:40
like how much time we save. I want to 35:42
ask you like why do you think that 35:45
suddenly on this chart we're exponential 35:46
and not linear anymore? 35:48
Thank you. 35:50
Yeah, this is a really good question and 35:52
I don't know. Um I mean the METR 35:53
finding was kind of an empirical 35:56
finding. Um the way that I tend to think 35:58
about this is that um in order to do 36:01
more and more complex longer horizon 36:04
tasks um what you really need is some 36:06
ability to self-correct. You need to be 36:09
able to sort of identify that you've 36:12
you've you make a plan and then you 36:13
start executing on the plan. But 36:15
everyone knows that our plans are kind 36:16
of worthless and uh and we encounter 36:18
reality. we get things wrong. And so I 36:21
think that a lot of what determines the 36:24
horizon length of what models can 36:26
accomplish is their ability to notice 36:28
that they're doing something wrong and 36:30
and correct it. Um, and I think that's 36:32
not sort of like a lot of bits of 36:34
information. It doesn't necessarily 36:36
require a huge change in intelligence to 36:37
sort of notice one or two more times 36:40
that you've made a mistake and how to 36:42
correct that mistake. But if you sort of 36:44
fix your mistake, maybe you sort of on 36:46
the order sort of double the horizon 36:48
length of the task because like instead 36:50
of getting stuck here, you get stuck 36:52
twice as far twice as far out. So I 36:54
think that's sort of the picture that I 36:56
have that like you can kind of unlock 36:58
longer and longer horizons with 36:59
relatively modest improvements in your 37:01
kind of ability to understand the task 37:04
and self-correct. But that just kind of 37:06
like those are just words. I think the 37:09
empirical trend is maybe the most 37:11
interesting thing. And uh maybe we can 37:13
build more detailed models for why that 37:15
trend is true, but it's sort of your 37:18
guess is as good as mine. 37:20
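The self-correction picture above can be turned into a toy calculation. This is a sketch under strong assumptions that are not in the talk: every step of a task fails independently with probability `p_error`, the model notices and repairs a failure with probability `p_catch`, and an unrepaired failure ends the task. The function name `expected_horizon` and the numbers are hypothetical.

```python
def expected_horizon(p_error, p_catch):
    """Expected number of steps until the first unrepaired mistake.

    Each step gets stuck with probability p_error * (1 - p_catch), so the
    step at which the task gets stuck is geometrically distributed and its
    mean is the reciprocal of that probability.
    """
    p_stuck = p_error * (1.0 - p_catch)
    return 1.0 / p_stuck

# Same raw error rate, modestly better at noticing mistakes.
base = expected_horizon(p_error=0.05, p_catch=0.80)
better = expected_horizon(p_error=0.05, p_catch=0.90)

print(base, better, better / base)
```

With these made-up numbers the horizon goes from roughly 100 steps to roughly 200: halving the fraction of missed mistakes doubles the horizon, even though the catch rate only moved from 80% to 90%. That matches the intuition that modest gains in self-correction can unlock much longer tasks, though it says nothing about why the empirical trend has the doubling time it does.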
Yeah. So I also have a question over 37:22
here. Um so it's an honor. Um so 37:24
basically um in terms of um increasing 37:26
the time horizon, I feel like so my 37:29
mental model of neural networks is 37:31
very simple. If you want them to do 37:32
something, you train on such data. Um so 37:34
if you want them to um if you want to 37:37
increase the um time horizon you have to 37:39
slowly get for example verification 37:41
signals. Now um I think one way to do 37:42
this is via product. So like for example 37:45
um Claude agent and then you use the 37:47
verification signal to incrementally 37:48
improve the model. Now my question is 37:50
basically this works really nicely for 37:52
for example coding where you have a 37:54
product that is sufficiently good such 37:56
that you can deploy it and then get the 37:57
verification signal but what about other 37:59
domains like in other domains are we 38:01
just um scaling data labelers to AGI or 38:03
is there a better approach? Yeah, it's a 38:06
good question. I mean, um, so when when 38:09
sort of skeptics ask me sort of why do I 38:13
think we will be able to sort of scale 38:17
and get something like broadly human 38:20
level AI, it's basically because of of 38:21
what you said. there is some sort of 38:24
very kind of operationally intensive 38:26
path where you just sort of build more 38:29
and more different tasks for AI models 38:31
to do that are more and more complex, 38:34
more and more long horizon and you just 38:35
sort of turn the crank and train with RL 38:38
on those those more more complicated 38:40
tasks. So I sort of feel like that's the 38:43
worst case for AI progress. And I mean 38:44
given the level of investment in AI and 38:48
I think the the sort of level of value 38:50
that I think is being created with AI, I 38:52
think people will do that if necessary. 38:54
That said, I think there are a lot of 38:57
ways of sort of making it simpler. The 38:59
best is to have an AI model that is 39:01
trained to oversee and supervise what uh 39:05
Claude, like you have Claude, say, which 39:09
you're training to be Claude, when you 39:11
have another AI model that's sort of 39:13
providing supervision and is not just 39:14
saying did you do this incredibly 39:17
complicated task correctly like did you 39:19
become a faculty member and get tenure 39:23
will that take six or seven years is 39:25
that like an end-to-end task where at the 39:27
end you sort of either get tenure or not 39:28
over seven that's that's ridiculous. 39:30
That's very inefficient. But instead can 39:32
provide more detailed supervision that 39:34
says you're doing this well, you're 39:36
doing this poorly. Um I think that sort 39:38
of as we're able to use AI more and more 39:40
in that kind of way, we'll probably be 39:43
able to make training for very long 39:45
horizon tasks more efficient and I think 39:47
we're already doing this to some extent. 39:49
We'll do one last question. 39:51
Yeah, I wanted to build on top of that. 39:53
when you're basically developing like 39:55
these tasks and then training them with 39:57
RL, would are you like like would you 39:59
like try creating these tasks like using 40:02
large language models like the tasks you 40:04
use for RL or are you still using 40:07
humans? 40:09
Great question. So I would say a mix. Um 40:10
I mean obviously we're building the 40:13
tasks as much as possible using AI to 40:14
sort of like say generate tasks with 40:17
code. we do like also uh ask humans to 40:20
create tasks. So it's it's basically 40:25
some mixture of those things. Um I think 40:27
that as AI gets better and better, 40:29
hopefully we're able to leverage AI more 40:31
and more, but of course the frontier of 40:33
the difficulty of these tasks also 40:35
increases. So I think humans are are are 40:37
still going to be involved. 40:39
Okay. Thank you. 40:40
All right. Let's give it a round of 40:41
applause to Jared. 40:43
Thank you so much. Thanks. 40:45

Hey everyone. Um, I'm Jared Kaplan. I'm
going to talk briefly about scaling and
the road to human level AI, but my guess
is for this audience, a lot of these
ideas are pretty familiar, so I'll keep
it short and then we're going to do a
sort of fireside chat Q&A with uh with
Diana. I actually have only been working
on AI for about six years. I uh before
that had a long career, the vast
majority of my career as a theoretical
physicist. um working in academia. And
so uh how did I get to AI? Well, I I I
want to be brief. Why did I start in
physics? It was basically because my mom
was a science fiction writer and I
wanted to figure out if we could build a
faster than light drive and physics was
the way to do that. Um I also was very
excited about just understanding the
universe. How do things work? How do the
biggest trends that underly sort of
everything that we see around us, where
does that all come from? For example, is
the universe deterministic? Do we have
free will? I was very, very interested
in all of those questions. But
fortunately, along the way, uh during my
career as a physicist, I met a lot of
very, very interesting, very deep
people, including many of the uh
founders of Anthropic that I now work
with all of the time. And uh I was
really interested in what they were
doing and I kept track of it. And as I
moved from different uh among different
subject areas in physics from large
hadron collider physics, particle
physics, cosmology, string theory, um
and on I got a little bit frustrated, a
little bit bored. I didn't feel like we
were making progress quickly enough. And
a lot of my friends were telling me that
AI was becoming a really big deal. Um
and I didn't believe them. I was really
skeptical. I thought, well, AI, people
have been working on it for 50 years.
SVMs aren't that exciting. Um, that was
all we knew about back in 2005, 2009
when I was in school. But I got
convinced that that maybe AI would be an
exciting field to work on. Um, and I I
got very lucky to know the right people
and the rest is history. So uh I'm going
to talk a little bit about how our
contemporary AI models work and how
scaling is leading them to get better
and better. So there are really two
fundamental phases to the training of
contemporary AI models like claude
chatgpt
etc. The first phase is pre-training and
that's where we train AI models to
imitate human written data, human
written text and understand the
correlations underlying that data. And
these these figures are very very retro.
This is actually from the playground of
the original GPD3 model. And you can see
that as a speaker at a journal club,
you're probably elephant me to say
certain things. is the word elephant in
that sentence is really really unlikely.
What pre-training does is teach models
what words are likely to follow other
words in large corporate of text and now
with contemporary models multimodal
data. The second phase of training for
contemporary AI models is reinforcement
learning. This is another very retro
slide. Um it shows the original
interface we used for sort of claude
zero or claude negative one uh back in
the ancient days of 2022
when we were collecting feedback data.
And what you see here is basically the
interface for having a conversation with
very very early versions of Claude and
picking which response from Claude was
better according to you, according to
crowdworkers, etc. And using that
signal, we optimize, we reinforce the
behaviors that are chosen to be good,
that are chosen to be helpful, honest,
and harmless. And we discourage the
behaviors that are bad. So really all
there is to training these models is
learning to predict the next word and
then doing reinforcement learning to
learn to do useful tasks. And it turns
out that there are scaling laws for both
of these phases of training. So this is
a a figure that that we made five or six
years ago now and it shows how as you
scale up the pre-training phase of AI,
you predictably get better and better
performance for our models. And this was
something that came about because I was
just sort of asking the dumbest possible
question. As a physicist, that's what
you're trained to do. You sort of look
at the big picture and you ask really
dumb things. I'd heard it was very
popular in the 2010s to say that big
data was important and so I just wanted
to know how big should the data be? How
important is it? How much does it help?
Similarly, a lot of people were noticing
that larger AI models performed better.
And so we just asked the question, how
much better do these models perform? And
we got really lucky. We found that
there's actually something very very
very precise and surprising underlying
AI training. This really blew us away
that there are these nice trends that
are as precise as anything that you see
in physics or or astronomy. And these
gave us a lot of conviction to believe
that AI was just going to keep getting
smarter and smarter in a very
predictable way. Because as you can see
in these figures already back in 2019,
we were looking across many many many
orders of magnitude in compute, in data
set size, in neural network size. And so
we expected once you see something is
true over many many many orders of
magnitude you expect it's probably going
to continue to be true for a long time
further. So this has sort of been one of
the fundamental things that I think
underlies uh uh improvements in in AI.
The other is actually also something
that started to appear quite a long time
ago although it's become really really
impactful uh in the last couple of years
is that you can see scaling laws in the
reinforcement learning phase of AI
training. So uh a researcher about four
years ago decided to study scaling laws
for Alph Go. Basically putting together
two very very high-profile AI successes,
GPD3 and scaling for pre-training and
AlphaGo. This was just a researcher uh
Andy Jones working on his own uh with
like his own I think maybe single GPU
back in these sort of ancient days. And
so he couldn't study AlphaGo, that was
expensive, but he could study a simpler
game called Hex. So he made this plot
that you see here. Now, ELO scores, I
think, weren't as as as well known um
back then, but all EOS ELO scores are,
of course, is chess ratings. They
basically describe how likely it is for
one player to beat another in a game of
chess. They're used now to benchmark AI
models to see sort of how often does a
human prefer one AI model to another.
But but back then this is just sort of
the classic application of ELO scores as
as chess ratings. And he looked at as
you train different models to play this
game of hex, which is a very simple
board game, a bit simpler than than Go,
how do they do? And he saw these
remarkable straight lines. So it's sort
of a skill in science to notice very
very simple trends and and this was one
I think it went unnoticed. I think
people didn't focus on this this sort of
kind of scaling behavior in RL soon
enough but but eventually it came to
pass. So we see that basically you can
scale up the compute in both
pre-training and RL and get better and
better performance. And I think that's
sort of the fundamental thing that is
driving AI progress. It's not that AI
researchers are really smart or they
suddenly got smart. It's that we found a
very very simple way of making AI better
systematically and and we're we're
turning that crank. So what kinds of
capabilities is this unlocking? I tend
to think of AI capabilities on two axes.
I think the less interesting axis, but
it's still very important is basically
the the flexibility of AI, the ability
of AI to meet us where we are. So if you
put say Alph Go on this figure, it would
be very very far below the X-axis
because although Alph Go was super
intelligent, it was better than any Go
player at playing Go, it was uh only
able to operate in the universe of a Go
board. But we've made steady progress
since the advent of large language
models making uh AI that can deal with
many many many all of the modalities
that that people can deal with. We don't
have AI models I think that uh that have
a sense of smell. Um but that's that's
probably coming. And so as you go up the
y- axis here you get to AI systems that
can do more and more relevant things in
in the world. I think the more
interesting axis though is sort of the
the x-axis here which is how long it
would take a person to do to do the
kinds of tasks that AI models can do and
that's something that has been
increasing steadily as we increase the
capability of AI. This is sort of the
time horizon for for tasks and um an
organization meter studied this very
systematically and found yet another
scaling trend. They found that if you
look at uh the length of tasks that AI
models can do, it's doubling roughly
every 7 months. And so what this means
is that the increasing intelligence that
is being baked into AI by scaling
compute for pre-training and RL is
leading to predictable useful
tasks that the AI models uh can can do,
including longer and longer horizon
tasks. And so you can sort of speculate
about where this is heading. And in AI
2027 folks did. And this kind of picture
suggests that over the next few years we
may reach a point where AI models um can
do tasks that don't just take us minutes
or hours but days, weeks, months, years
etc. Eventually, we imagine AI models or
or millions of AI models perhaps working
together will be able to do the work
that whole human organizations can do.
They'll be able to do the kind of work
that the entire scientific community
currently does. Um, one of the nice
things about math or theoretical physics
is that you can make progress just by by
thinking. Um and so you can imagine AI
systems working together to make the
kind of progress that the theoretical
physics community makes in in say 50
years in a matter of days, weeks etc. So
what is left if if this sort of picture
of scaling can take us very far? What is
left? I think that what may be left in
order to unlock um kind of human level
AI broadly construed is relatively
simple. One of the most important
ingredients I think is relevant
organizational knowledge. So we need to
train AI models that don't just greet
you with a blank slate but can learn to
work within companies, organizations,
governments as though they have the kind
of context that someone who's been
working there for years has. So I think
AI models need to be able to work with
knowledge. They also need memory. What
is memory if not knowledge? I
distinguish it in the sense that as you
do a task that takes you a very very
long time, you need to keep track of
your progress on that specific task, you
need to build relevant memories and you
you need to be able to use them. And
that's something that we've uh we've
begun to build into into Claude 4 and I
think will become increasingly
important. A third ingredient that I
think that we need to get better at and
and we're making progress on is
oversight. the ability of AI models to
understand sort of fine grained nuances
to solve hard fuzzy tasks. So it's easy
right now and you see an explosion of
progress for us to train AI models that
can say write code that passes tests or
that answer math questions correctly
because it's very crisp what's correct
and what's incorrect. So it's very easy
to apply reinforcement learning to make
AI models uh do better and better at
those kinds of tasks. But what we need
and are developing are AI models that
help us to generate much more nuanced
reward signals so that we can leverage
reinforcement learning to do to do
things like tell good jokes, write good
poems, um and have good taste in in
research. The other ingredients that we
need, I think, are are are simpler. We
obviously need to be able to train AI
models to do more and more complex
tasks. We need to work our way up the
y-axis from text models to multimodal
models to robotics. Um, and I expect
that over the next few years, we'll see
increasing uh continued gains from scale
when applied applied to these these
different domains.
And so how should we sort of prepare for
this this future these possibilities? I
think there are a few a few things that
I always recommend. One is I think it's
really a good idea to build things that
don't quite work yet. This is probably
always a good idea. We always want to
have ambition, but I think specifically
AI models right now are getting better
very very quickly. And I think that's
going to continue. That means that if
you build uh a product that doesn't
quite work because Claude 4 is still a
little bit too dumb, um you could expect
that there'll be a Claude 5 coming that
will make that make that product work
and deliver a lot of value. So I think
that's that's something that I always
recommend is sort of experiment on the
boundaries of what AI can do because
those boundaries are moving rapidly. The
next point I think is that AI is going
to be helpful for integrating AI. I
think that one of the main bottlenecks
for AI is really just that it's
developing so quickly that we haven't
had time to integrate it into
products, companies, other thing
everything else that we we we do into
into science. Um, and so I think that in
order to sort of speed that process up,
I think leveraging AI for AI integration
is going to be is going to be very
valuable. And then finally, I mean, I
think this is sort of obvious for for
this crowd, but I think figuring out
where adoption of AI could happen very
very quickly is is key. Um, we're seeing
uh an explosion of AI integration for
coding. And there are a lot of reasons
why software engineering is a great
place for AI, but I think the big
question is sort of what's next? Um,
what beyond software engineering can
grow that that quickly? I don't know the
answer, of course. Um, but hopefully you
guys will figure it out. So that's it
for for for the talk. Um, I want to
invite Diana on stage for uh for a chat.
YC's next batch is now taking
applications. Got a startup in you?
Apply at y combinator.com/apply.
It's never too early and filling out the
app will level up your idea. Okay, back
to the video. That was a awesome talk
about all the scaling laws and recently
Anthropic just launched clot 4 which is
just available. Curious uh how does it
change what is possible as all these
model releases keep compounding for the
next 12 months?
I think that uh we'll be in trouble if
it's 12 months before before an even
better model comes out. But uh I guess
uh a few things with with Cloud 4. I
think that with Cloud 3.7 Sonnet
uh it was already really exciting to use
3.7 for coding. But I think something
that everyone noticed was that 3.7 was a
little bit too eager. Um sometimes it
just really wanted to make your tests
pass. Um and it would do things that
that you you don't really want. Uh there
are a lot of like try excepts things
like that. Um, so with Cloud 4, I think
that we've been able to improve the
model's ability to act as an agent
specifically for coding, but but in a
lot of other ways for search, for all
kinds of other applications. Um, but
also improve its supervision, the sort
of oversight that I I I mentioned in my
talk, so that it uh it follows your
directions and hopefully improves in in
code quality. I think the other thing
that we've worked on is improving its
ability to uh save and store memories
and we hope to see people leveraging
that because Claude 4 can blow through
its context window with a very complex
task but can also uh store memories as
files or records, retrieve them in order
to sort of keep doing work across many
many many context windows. But I guess
finally I think the picture that scaling
laws paint is one of incremental
progress. And so I think that what
you'll see with Claude is that steadily
it gets better in lots of different ways
with each release. Um but I think that
scaling really suggests a kind of smooth
curve towards what I expect is kind of
human level AI or AGI.
Is there some special feature that a lot
of the audience here are going to get
excited? some some beta that you can
some alpha leak you can give everyone on
what you think people are going to fall
in love with the new APIs.
I think the thing that I I'm most
excited about is sort of uh memory
unlocking longer and longer horizon
tasks. I think that like as as time goes
on we're going to see Claude as a
collaborator that can sort of take on
larger and larger chunks of work. This
is to your point of all these future
models being able to take bigger and
bigger tasks right now. At this point,
they're able to do tasks in the hours.
Yeah, I think so. I think it's a very
imprecise measure, but I think that
right now if you look at sort of
software engineering tasks, I think
meter literally benchmarked how long it
would take people to do various tasks
and uh and yeah, I think it's a time
scale of of hours. I think just gen like
broadly as people work with AI,
I think that the people who are skeptics
of AI will say correctly that AI makes
lots of stupid mistakes. Um, it can do
things that are absolutely brilliant and
and surprise you, but it can also make
uh make basic errors. I think one of the
sort of basic features of of AI that's
different about the shape of AI
intelligence compared to human
intelligence is that there are a lot of
things that I can't do but I can at
least judge whether they were done
correctly. I think for AI the judgment
versus the generative capability is much
closer which means that I think that uh
a major role people can play in
interacting with AI is kind of as
managers to sort of sanity check uh
sanity check the the work
which is fascinating because one of the
things we observe through the batches in
YC last year a lot of companies when
they were out and selling products they
were selling it more still as a co-pilot
where you would have a co-pilot let's
say for customer support where you still
need the last human approval before they
would send the reply for a customer but
one thing that has changed just in the
spring batch I think a lot of the AI
models are very capable to do task end
to end to your point that which is uh
remarkable founders are selling now
directly replacements of full workflows
how have you seen this translate to what
you hope the audience will build.
I think there are a lot of
possibilities. Basically, it's a
question of
what level of success or performance is
is acceptable. There are some tasks
where getting it sort of 70% right is is
good enough and others where you need
99.9% to to deploy. I think that
honestly I think it's probably a lot
more fun to build for use cases where uh
70 80% is good enough because then you
can really get to the frontier of what
AI is capable of. But I think that we're
sort of pushing up the the reliability
as well. So I think that uh we will see
more and more of these tasks. I think
that uh right now human AI collaboration
is is going to be the sort of most
interesting place because I think that
for the most advanced tasks you're
really going to need humans in the loop.
But I do think in the longer term there
will be more and more tasks that can be
fully automated.
Can you say more about what you think
the world is going to look like with
this human to AI loop collaboration?
because there's the essay from Dario
with machines of love and grace that he
paints this picture that's very
optimistic and what are the details of
how we get there with with this book?
I think that we already see some of
that happening. At least when I
talk to folks who work in, say, biomedical
research, with the right sort of
orchestration I think it's possible to
take frontier AI models now and produce
interesting, valuable insights for, say,
drug discovery. So I think that's
already starting to happen. I guess an
aspect of it that I think about is
that there's sort of
intelligence that requires a lot of
depth and intelligence that
requires a lot of breadth. So for
example, in math you can work on
trying to prove one theorem for a decade,
like the Riemann hypothesis or Fermat's
Last Theorem. I think that's
solving one very specific, very
hard problem. I think there are a lot of
areas of science, probably more so in
biology, maybe interestingly in
psychology or history, where putting
together a very large number of
pieces of information across many
different areas is kind of where
it's at. And I think that AI models
during the pre-training phase kind of
imbibe all of human civilization's
knowledge. And so I suspect that there's
a lot of fruit to be picked in using
that feature of AI, that it knows
much more than any one human expert,
and therefore you can elicit
insights by putting together many
different areas of expertise, say
across biology, for research. So
I think that we're making a lot of
progress on making AI better at deeper
tasks like hard coding problems and hard
math problems, but I suspect that
there's a particular overhang in areas
where putting together knowledge that
maybe no one human expert would have
is
very useful. So I think that's something
that I'd expect to see more of:
leveraging AI's breadth
of knowledge. In terms of how exactly it
will roll out, I really don't know. It's
really hard to predict the
future. Scaling laws give you one way of
predicting the future, which says this
trend is going to continue. I think a
lot of trends that we see
over the long haul will
continue. I mean, the economy, the GDP,
these kinds of trends are really
reliable indicators of the future. But
in terms of how, in detail,
things will be implemented, I think it's
really hard to say.
Are there specific areas that you think
a lot more builders could go into and
build with these new models? I mean,
there's a lot that has been done, let's
say, for coding tasks, but what are some
tasks that have a lot more greenfield,
that are just getting unlocked right now
with the current models?
I come from a research background rather
than business, so I don't
know that I have anything very
deep to say, but I think that
in general, any place where
it requires a lot of skill and it's a
task that mostly involves sort of
sitting in front of a computer
interacting with data. I think finance,
people who use Excel spreadsheets a
lot. I expect law, although
maybe law is more
regulated and requires more
expertise as a stamp of approval. But
I think all of these areas are probably
greenfield. I think another that I
sort of mentioned is how do we integrate
AI into existing businesses? I think
that when electricity came along,
there was some long adoption cycle, and
the very first, simplest ways of, say,
using electricity weren't necessarily
the best. You wanted to not just replace
a steam engine with an electric motor.
You wanted to remake the way
that factories work. And I think that
leveraging AI to integrate AI
into parts of the economy as quickly
as possible, I expect there's just a lot
of leverage there.
Another question: you have
extensive training as a physicist, and
you were one of the first to really
observe this trend with scaling laws, and
it probably comes from being a physicist
and seeing all these exponentials that
happen naturally in nature. How has that
training helped you
do some of the best research in the
world in AI?
I think the thing that was useful from a
physics point of view is looking for the
biggest-picture, most macro trends and
then trying to make them as precise as
possible. So I remember meeting
brilliant AI researchers who
would say things like "learning is
converging exponentially,"
and I would just ask really dumb
questions like, are you sure it's an
exponential? Could it just be a power
law? Is it quadratic? Exactly
how is this thing converging? And it's a
really dumb, simple question to
ask, but basically I think there was a
lot of fruit to be picked, and
probably still is, in trying to make the
big trends that you see as precise as
possible, because that
gives you a lot of tools. It allows you
to ask, what does it really mean to
move the needle? I think with scaling
laws, the holy grail is finding a
better slope to the scaling law, because
that means that as you put in more
compute, you're going to get a bigger
and bigger advantage over other AI
developers. But until you've
made precise what the trend is that you
see, you don't know exactly what
it means to beat it, how much you
can beat it by, and how to know
systematically whether you're
achieving that end. So I think
those were kind of the tools that
I used. It wasn't necessarily
literally applying, say, quantum field
theory to AI. I think that's a
little bit too specific.
Well, are there
specific physics heuristics, like
renormalization or symmetry, that came in
very handy to keep observing this
trend or measuring it?
Something that you'll observe if you
look at AI models is that they're big.
Neural networks are big. They have
billions, now trillions, of parameters.
That means that they're made out of big
matrices. And basically studying
approximations
where you
take the limit that neural networks are
very big, and specifically that the
matrices that compose neural networks
are big, that's actually been kind of
useful, and that's something that
actually was a well-known approximation
in physics and in math.
That's something that's been applied.
But I think generally it's really asking
very naive, dumb questions that gets you
very far. I think AI is really, in a
certain sense, only maybe 10 to 15
years old in terms of the current
incarnation of how we're training AI
models. That means that it's an
incredibly new field. A lot of the most
basic questions haven't been answered,
like questions of interpretability, how
AI models really work. And so I think
there's really a lot to
learn at that level rather than applying
very fancy techniques.
Are there
specific tools in physics that you apply
for interpretability?
I would say that interpretability is a
lot more like biology. It's a lot more
like neuroscience. So I think those are
kind of the tools. There
is some more mathematics
there, but I think it's more like
trying to understand the features of the
brain. The benefit that you get with
AI over neuroscience is that you can
really measure everything in AI. You
can't measure the activity of every
neuron, every synapse in a brain, but
you can do that in AI. So there's much
more data for reverse
engineering how AI models work.
Now, one aspect of scaling laws:
they've held for over five orders of
magnitude, which is wild. This is a bit
of a contrarian question, but what
empirical sign would convince you that
the curve is changing, that maybe we're
getting off the curve?
I think it's a
really hard question, right? Because I
mostly use scaling laws to diagnose
whether AI training is broken or not.
Mm-hmm.
So I think that once you see
something and you find it's a
very compelling trend, it becomes
very interesting to examine
where it's failing. But I think that my
first inclination is to think, if scaling
laws are failing, it's because we've
screwed up AI training in some way.
Maybe we got the architecture
of the neural network wrong, or there's
some bottleneck in training that we
don't see, or there's some problem with
precision in the algorithms that we're
using. So I think it would take a lot to
convince me, at least, that scaling was
really no longer working at the level of
these empirical laws, because
so many times in my experience over the
last five years, when it seemed like scaling
was broken, it was because we were doing
it wrong.
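The two checks described in this part of the conversation (is the curve really a power law rather than an exponential, and is a new run on or off the fitted trend) can be sketched in a few lines. This is a minimal illustration, not the method used in any actual scaling-law study: the data are synthetic, and the names `fit_line`, `predicted_loss`, `off_trend`, and the 5% tolerance are assumptions made for the example.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x.
    Returns (a, b, sum of squared residuals)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, ssr

# Synthetic, made-up data: loss = 5 * compute^(-0.1).
compute = [10.0 ** k for k in range(1, 7)]
loss = [5.0 * c ** -0.1 for c in compute]

log_c = [math.log(c) for c in compute]
log_l = [math.log(l) for l in loss]

# Power law: log(loss) is a straight line in log(compute).
a_pow, b_pow, ssr_pow = fit_line(log_c, log_l)
# Exponential: log(loss) is a straight line in compute itself.
a_exp, b_exp, ssr_exp = fit_line(compute, log_l)

alpha = -b_pow  # the "slope" of the scaling law

def predicted_loss(c):
    """Loss the fitted power law predicts at compute c."""
    return math.exp(a_pow) * c ** (-alpha)

def off_trend(c, observed, tolerance=0.05):
    """True if the observed loss sits more than `tolerance`
    (relative) above the fitted trend, hinting that something
    in training may be broken. The threshold is arbitrary."""
    return observed > predicted_loss(c) * (1.0 + tolerance)

print("power-law residual:", ssr_pow)
print("exponential residual:", ssr_exp)
print("fitted slope alpha:", alpha)
```

Both residuals are measured in log-loss, so they can be compared directly; on this synthetic data the power-law fit is essentially exact and the exponential fit is not. With real measurements the comparison would be noisier and would need more care than a single residual.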
Interesting. So I guess going into
something very specific that goes hand
in hand with this: a lot of compute power
is required to keep going on this curve.
What happens as compute becomes
more scarce? How far down do you go on
the precision ladder? Do you explore
things like FP4? Do you explore things
like ternary representations? What
are your thoughts around that?
Yeah, I
mean, I think that right now AI is
really inefficient because there's a lot
of value in AI. There's a lot of
value in unlocking the most capable
frontier model. And so companies like
Anthropic and others are moving as
quickly as we can to both make AI
training more efficient and AI inference
more efficient, as well as unlocking
frontier capabilities. But a lot of the
focus really is on unlocking the
frontier. I think that over time, as AI
becomes more and more widespread,
we're going to really drive
down the cost of inference and training
dramatically from where we are right
now. I mean, right now we're seeing sort
of 3x to 10x gains algorithmically, and
in scaling up compute, and in
inference efficiency per year. I
guess the joke is that we're going
to get computers back into binary. So I
think that we will see much lower
precision as one of the many avenues to
make inference more efficient over time.
But we're very much
out of equilibrium with AI development
right now. AI is improving very rapidly.
Things are changing very rapidly. We
haven't fully realized the potential of
current models, but we're unlocking more
and more capabilities. So I think that
the equilibrium situation,
where AI isn't changing that
quickly, is one where AI is
extremely inexpensive, but it's
hard to know if we're even going to get
there. AI may just keep getting
better so quickly that
improvements in intelligence unlock
so much more, and so we may continue to
focus on that rather than, say, getting
precision down to FP2.
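To make the "ternary representations" mentioned in the question concrete, here is a minimal sketch of one simple way to squeeze weights into three values. It assumes a single shared scale equal to the mean absolute weight; that choice, the function names, and the sample weights are illustrative assumptions and do not describe how any particular lab quantizes its models.

```python
def quantize_ternary(weights):
    """Map each float weight to a code in {-1, 0, +1} plus one
    shared scale (here: the mean absolute weight, an assumption)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clamp into [-1, 1].
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate weights from ternary codes."""
    return [c * scale for c in codes]

# Made-up example weights.
weights = [0.9, -0.05, 0.4, -1.3, 0.02, -0.6]
codes, scale = quantize_ternary(weights)
approx = dequantize(codes, scale)

print("codes:", codes)
print("scale:", scale)
print("reconstruction:", approx)
```

Each code carries at most log2(3), about 1.58, bits of information, compared with 4 bits for FP4 or 16 for a half-precision float. The price is visible in the reconstruction: every nonzero weight collapses to the same magnitude.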
Which is very much the Jevons paradox:
as intelligence becomes better and
better, people are going to want it more,
not less, even as the cost comes down,
which is the irony, right?
Yeah, absolutely. I mean, I think
that's certainly
something that we've seen, that there are
certain points where AI
becomes accessible enough. That said,
I think as AI systems become more and
more capable and can do more and more
of the work that we do, it's going
to be worth it to pay for frontier
capabilities. A question
that I've always had, and continue to
have, is: is all of the value at the
frontier, or is there a lot of value with
cheaper systems that aren't
quite as capable? And I think the
time horizon picture is maybe one way
of thinking about this. I think that you
can do a lot of very simple, bite-sized
tasks, but I think it's just much more
convenient to be able to use an AI model
that can do a very complex task end to
end, rather than requiring us as humans
to orchestrate a much dumber
model, to break the task down into very
small slices and put them together.
So I do kind of expect that a lot of
the value is going to come from the most
capable models, but I might be wrong.
It might really
depend on the capabilities of AI
integrators to leverage AI
really efficiently.
What advice would you give this audience,
where everyone is early in their
career with lots of potential, in terms
of how you stay relevant in a
future where all these models are going
to become so awesome? What should
everyone be really good at and study
to still do really good work?
I think, as
I mentioned, there's a lot of value in
understanding how these models work and
being able to really efficiently
leverage them and integrate them, and
I think there's a lot of value in
building at the
frontier. I don't know, we could turn
it over to the audience for
questions.
Let's turn it over to the audience for
some questions.
I had a quick question on the scaling
laws. You showed that a lot of the scaling
laws are linear, that
we have exponential compute going up but
then we have linear progress in
the scaling laws. But then on your
last slide you showed that you expect
suddenly an exponential growth in
how much time we save. I want to
ask you, why do you think that
suddenly on this chart we're exponential
and not linear anymore?
Thank you.
Yeah, this is a really good question, and
I don't know. I mean, the METR
finding was kind of an empirical
finding. The way that I tend to think
about this is that in order to do
more and more complex, longer-horizon
tasks, what you really need is some
ability to self-correct. You need to be
able to identify that, you know,
you make a plan and then you
start executing the plan. But
everyone knows that our plans are kind
of worthless, and we encounter
reality and get things wrong. And so I
think that a lot of what determines the
horizon length of what models can
accomplish is their ability to notice
that they're doing something wrong and
correct it. And I think that's
not a lot of bits of
information. It doesn't necessarily
require a huge change in intelligence to
notice one or two more times
that you've made a mistake and how to
correct that mistake. But if you
fix your mistake, maybe you
roughly double the horizon
length of the task, because instead
of getting stuck here, you get stuck
twice as far out. So I
think that's the picture that I
have, that you can unlock
longer and longer horizons with
relatively modest improvements in your
ability to understand the task
and self-correct. But
those are just words. I think the
empirical trend is maybe the most
interesting thing. And maybe we can
build more detailed models for why that
trend is true, but your
guess is as good as mine.
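The intuition that a modest gain in self-correction can roughly double the task horizon can be written down as a toy model. This sketch is an illustration under assumptions that are not in the talk: every step fails independently with a fixed error rate, a fixed fraction of errors is noticed and repaired, any unrepaired error ends the task, and the horizon is the number of steps at which the success probability drops to 50%.

```python
import math

def horizon_steps(error_rate, correction_rate, success_target=0.5):
    """Number of steps at which the chance of finishing without an
    unrepaired mistake falls to `success_target`.

    Toy assumption: each step has probability
    error_rate * (1 - correction_rate) of an unrecoverable failure,
    independently of every other step."""
    fatal = error_rate * (1.0 - correction_rate)
    return math.log(success_target) / math.log(1.0 - fatal)

# Hypothetical per-step error rate of 1%.
error_rate = 0.01

# Each setting halves the fraction of mistakes that go uncorrected.
for correction_rate in (0.0, 0.5, 0.75, 0.875):
    steps = horizon_steps(error_rate, correction_rate)
    print(f"correction rate {correction_rate:.3f} -> "
          f"horizon of about {steps:.0f} steps")
```

In this model, each halving of the uncorrected-mistake fraction roughly doubles the horizon, so a sequence of similar-sized improvements in self-correction produces exponential growth in task length. Whether real models behave this way is exactly the open question raised in the answer above.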
Yeah. So I also have a question over
here. It's an honor. So
basically, in terms of increasing
the time horizon, my
mental model of neural networks is
very simple. If you want them to do
something, you train on such data. So
if you want to
increase the time horizon, you have to
slowly get, for example, verification
signals. Now, I think one way to do
this is via product. So, for example,
a Claude agent, and then you use the
verification signal to incrementally
improve the model. Now my question is
basically, this works really nicely for,
for example, coding, where you have a
product that is sufficiently good
that you can deploy it and then get the
verification signal. But what about other
domains? In other domains, are we
just scaling data labelers to AGI, or
is there a better approach?
Yeah, it's a
good question. I mean, when
skeptics ask me why I
think we will be able to scale
and get something like broadly human-
level AI, it's basically because of
what you said. There is some
very operationally intensive
path where you just build more
and more different tasks for AI models
to do that are more and more complex,
more and more long-horizon, and you just
turn the crank and train with RL
on those more complicated
tasks. So I sort of feel like that's the
worst case for AI progress. And I mean,
given the level of investment in AI and
the level of value
that I think is being created with AI, I
think people will do that if necessary.
That said, I think there are a lot of
ways of making it simpler. The
best is to have an AI model that is
trained to oversee and supervise what
Claude does. Like, you have Claude, say, which
you're training to be Claude, and
you have another AI model that's
providing supervision and is not just
saying, did you do this incredibly
complicated task correctly? Like, did you
become a faculty member and get tenure?
That will take six or seven years. Is
that an end-to-end task where at the
end you either get tenure or not,
over seven years? That's ridiculous.
That's very inefficient. But instead it can
provide more detailed supervision that
says you're doing this well, you're
doing this poorly. I think that
as we're able to use AI more and more
in that kind of way, we'll probably be
able to make training for very long-
horizon tasks more efficient, and I think
we're already doing this to some extent.
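The contrast drawn here, between a single pass/fail verdict at the end of a long task and more detailed feedback along the way, can be shown with a toy example. This is only a sketch of the idea: the per-step scores are made up, the `threshold` is arbitrary, and the supervising model is represented by nothing more than a list of numbers it is assumed to have produced.

```python
def outcome_reward(step_scores, threshold=0.5):
    """One signal for the whole episode: 1.0 only if every step
    was acceptable, otherwise 0.0."""
    return 1.0 if all(s >= threshold for s in step_scores) else 0.0

def stepwise_feedback(step_scores, threshold=0.5):
    """One signal per step, as a hypothetical supervising model
    might provide: +1.0 for 'doing this well', -1.0 for 'poorly'."""
    return [1.0 if s >= threshold else -1.0 for s in step_scores]

# Made-up quality scores for a six-step task.
episode = [0.9, 0.8, 0.2, 0.7, 0.9, 0.6]

print("outcome only:", outcome_reward(episode))
print("per step:    ", stepwise_feedback(episode))
```

The outcome-only signal says the episode failed but not where; the per-step signal points at the third step. For a task that takes years rather than six steps, that difference in feedback density is the inefficiency being described.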
We'll do one last question.
Yeah, I wanted to build on top of that.
When you're developing
these tasks and then training on them with
RL, would you
try creating these tasks using
large language models, the tasks you
use for RL, or are you still using
humans?
Great question. So I would say a mix.
I mean, obviously we're building the
tasks as much as possible using AI, to,
say, generate tasks with
code. We also ask humans to
create tasks. So it's basically
some mixture of those things. I think
that as AI gets better and better,
hopefully we're able to leverage AI more
and more, but of course the frontier of
the difficulty of these tasks also
increases. So I think humans are
still going to be involved.
Okay. Thank you.
All right. Let's give a round of
applause to Jared.
Thank you so much. Thanks.

Key Vocabulary

Vocabulary Meanings

scaling

/ˈskeɪlɪŋ/

B2
  • noun
  • - the process of increasing or decreasing in size or extent

AI

/ˌeɪˈaɪ/

B1
  • noun
  • - Artificial Intelligence, the simulation of human intelligence in machines

models

/ˈmɒdəlz/

A2
  • noun
  • - a simplified representation of a system or process

training

/ˈtreɪnɪŋ/

A2
  • noun
  • - the process of teaching or learning a skill

reinforcement

/ˌriːɪnˈfɔːrsmənt/

C1
  • noun
  • - the process of encouraging or strengthening a behavior

learning

/ˈlɜːnɪŋ/

A1
  • noun
  • - the process of acquiring knowledge or skill

compute

/kəmˈpjuːt/

B1
  • verb
  • - to calculate or determine using a computer

data

/ˈdeɪtə/

A2
  • noun
  • - facts and statistics collected together for reference or analysis

intelligence

/ɪnˈtelɪdʒəns/

B1
  • noun
  • - the ability to learn, understand, and think in a logical way

capabilities

/kəˈpeɪbɪlɪtiz/

B2
  • noun
  • - the ability to do something

tasks

/tɑːsks/

A1
  • noun
  • - a piece of work to be done or undertaken

horizon

/həˈraɪzən/

B1
  • noun
  • - the limit of a person's mental perception, experience, or interest

memory

/ˈmeməri/

A2
  • noun
  • - the faculty by which the mind stores and remembers information

oversight

/ˈoʊvərsaɪt/

C1
  • noun
  • - the action of overseeing or the state of being overseen

integration

/ˌɪntɪˈgreɪʃən/

B2
  • noun
  • - the process of combining or coordinating different elements

progress

/ˈprəʊɡres/

A2
  • noun
  • - forward or onward movement toward a destination
