Hey everyone. Um, I'm Jared Kaplan. I'm 00:00

going to talk briefly about scaling and 00:03

the road to human level AI, but my guess 00:06

is for this audience, a lot of these 00:08

ideas are pretty familiar, so I'll keep 00:10

it short and then we're going to do a 00:12

sort of fireside chat Q&A with uh with 00:14

Diana. I actually have only been working 00:16

on AI for about six years. I uh before 00:20

that had a long career, the vast 00:24

majority of my career as a theoretical 00:26

physicist. um working in academia. And 00:28

so uh how did I get to AI? Well, I I I 00:31

want to be brief. Why did I start in 00:34

physics? It was basically because my mom 00:36

was a science fiction writer and I 00:38

wanted to figure out if we could build a 00:41

faster than light drive and physics was 00:44

the way to do that. Um I also was very 00:46

excited about just understanding the 00:49

universe. How do things work? How do the 00:52

biggest trends that underlie sort of 00:54

everything that we see around us, where 00:58

does that all come from? For example, is 01:00

the universe deterministic? Do we have 01:02

free will? I was very, very interested 01:04

in all of those questions. But 01:05

fortunately, along the way, uh during my 01:07

career as a physicist, I met a lot of 01:10

very, very interesting, very deep 01:13

people, including many of the uh 01:15

founders of Anthropic that I now work 01:17

with all of the time. And uh I was 01:20

really interested in what they were 01:23

doing and I kept track of it. And as I 01:24

moved from different uh among different 01:27

subject areas in physics from large 01:30

hadron collider physics, particle 01:33

physics, cosmology, string theory, um 01:34

and on I got a little bit frustrated, a 01:38

little bit bored. I didn't feel like we 01:41

were making progress quickly enough. And 01:42

a lot of my friends were telling me that 01:44

AI was becoming a really big deal. Um 01:46

and I didn't believe them. I was really 01:49

skeptical. I thought, well, AI, people 01:51

have been working on it for 50 years. 01:53

SVMs aren't that exciting. Um, that was 01:55

all we knew about back in 2005, 2009 01:58

when I was in school. But I got 02:01

convinced that that maybe AI would be an 02:03

exciting field to work on. Um, and I I 02:05

got very lucky to know the right people 02:08

and the rest is history. So uh I'm going 02:10

to talk a little bit about how our 02:13

contemporary AI models work and how 02:16

scaling is leading them to get better 02:18

and better. So there are really two 02:21

fundamental phases to the training of 02:24

contemporary AI models like Claude, 02:27

ChatGPT, 02:30

etc. The first phase is pre-training and 02:32

that's where we train AI models to 02:37

imitate human written data, human 02:40

written text and understand the 02:42

correlations underlying that data. And 02:45

these these figures are very very retro. 02:47

This is actually from the playground of 02:50

the original GPT-3 model. And you can see 02:52

that as a speaker at a journal club, 02:55

you're probably elephant me to say 02:58

certain things. The word elephant in 03:00

that sentence is really really unlikely. 03:01
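
The idea that some next words are far less likely than others can be illustrated with a toy example. The sketch below is not how GPT-3 works (that is a large neural network); it only counts which word follows which in a tiny made-up corpus. The corpus text and the helper name `next_word_prob` are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (an assumption for illustration); real pre-training
# uses vastly more text and a neural network rather than raw counts.
corpus = (
    "you are probably expecting me to say certain things . "
    "you are probably expecting a short talk . "
    "the elephant walked to the river ."
).split()

# Count how often each word follows each previous word.
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1


def next_word_prob(prev: str, nxt: str) -> float:
    """Estimate P(nxt | prev) from raw bigram counts, with no smoothing."""
    total = sum(follow_counts[prev].values())
    return follow_counts[prev][nxt] / total if total else 0.0


p_expecting = next_word_prob("probably", "expecting")
p_elephant = next_word_prob("probably", "elephant")
print(p_expecting, p_elephant)
```

In this toy corpus "elephant" never follows "probably", so its estimated probability is zero, while "expecting" is the only continuation ever seen.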

What pre-training does is teach models 03:05

what words are likely to follow other 03:08

words in large corpora of text and now 03:10

with contemporary models multimodal 03:14

data. The second phase of training for 03:16

contemporary AI models is reinforcement 03:19

learning. This is another very retro 03:22

slide. Um it shows the original 03:24

interface we used for sort of Claude 03:27

zero or Claude negative one uh back in 03:29

the ancient days of 2022 03:32

when we were collecting feedback data. 03:35

And what you see here is basically the 03:39

interface for having a conversation with 03:41

very very early versions of Claude and 03:44

picking which response from Claude was 03:47

better according to you, according to 03:51

crowdworkers, etc. And using that 03:54

signal, we optimize, we reinforce the 03:57

behaviors that are chosen to be good, 04:00

that are chosen to be helpful, honest, 04:04

and harmless. And we discourage the 04:05

behaviors that are bad. So really all 04:07

there is to training these models is 04:10

learning to predict the next word and 04:12

then doing reinforcement learning to 04:15

learn to do useful tasks. And it turns 04:17

out that there are scaling laws for both 04:19

of these phases of training. So this is 04:22

a a figure that that we made five or six 04:26

years ago now and it shows how as you 04:29

scale up the pre-training phase of AI, 04:32

you predictably get better and better 04:35

performance for our models. And this was 04:38

something that came about because I was 04:41

just sort of asking the dumbest possible 04:43

question. As a physicist, that's what 04:45

you're trained to do. You sort of look 04:47

at the big picture and you ask really 04:48

dumb things. I'd heard it was very 04:50

popular in the 2010s to say that big 04:53

data was important and so I just wanted 04:55

to know how big should the data be? How 04:58

important is it? How much does it help? 05:02

Similarly, a lot of people were noticing 05:04

that larger AI models performed better. 05:06

And so we just asked the question, how 05:09

much better do these models perform? And 05:11

we got really lucky. We found that 05:14

there's actually something very very 05:16

very precise and surprising underlying 05:18

AI training. This really blew us away 05:21

that there are these nice trends that 05:23

are as precise as anything that you see 05:25

in physics or or astronomy. And these 05:27

gave us a lot of conviction to believe 05:30

that AI was just going to keep getting 05:34

smarter and smarter in a very 05:36

predictable way. Because as you can see 05:38

in these figures already back in 2019, 05:40

we were looking across many many many 05:44

orders of magnitude in compute, in data 05:47

set size, in neural network size. And so 05:50

we expected once you see something is 05:54

true over many many many orders of 05:56

magnitude you expect it's probably going 05:58

to continue to be true for a long time 06:00

further. So this has sort of been one of 06:01

the fundamental things that I think 06:04

underlies uh uh improvements in in AI. 06:05

The other is actually also something 06:09

that started to appear quite a long time 06:11

ago although it's become really really 06:13

impactful uh in the last couple of years 06:15

is that you can see scaling laws in the 06:18

reinforcement learning phase of AI 06:20

training. So uh a researcher about four 06:23

years ago decided to study scaling laws 06:27

for AlphaGo. Basically putting together 06:31

two very very high-profile AI successes, 06:33

GPT-3 and scaling for pre-training and 06:36

AlphaGo. This was just a researcher uh 06:39

Andy Jones working on his own uh with 06:42

like his own I think maybe single GPU 06:45

back in these sort of ancient days. And 06:48

so he couldn't study AlphaGo, that was 06:50

expensive, but he could study a simpler 06:52

game called Hex. So he made this plot 06:53

that you see here. Now, Elo scores, I 06:56

think, weren't as as as well known um 07:00

back then, but all Elo scores are, 07:02

of course, is chess ratings. They 07:05

basically describe how likely it is for 07:07

one player to beat another in a game of 07:10

chess. They're used now to benchmark AI 07:13

models to see sort of how often does a 07:16

human prefer one AI model to another. 07:18
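
As a worked illustration of what an Elo rating encodes, the sketch below uses the standard Elo expected-score formula with the usual 400-point scale. The function name and the example ratings are assumptions for illustration, not figures from the talk or from the Hex study.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo formula.

    A value of 0.5 means an even match; values near 1.0 mean A is a heavy favorite.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Example ratings are made up for illustration.
even_match = elo_expected_score(1500, 1500)
favored = elo_expected_score(1900, 1500)
print(even_match, round(favored, 3))
```

A 400-point gap corresponds to an expected score of 10/11 for the stronger player, which is why a straight line in Elo versus log-compute is a meaningful trend.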

But but back then this is just sort of 07:20

the classic application of Elo scores as 07:22

as chess ratings. And he looked at as 07:24

you train different models to play this 07:27

game of Hex, which is a very simple 07:30

board game, a bit simpler than than Go, 07:33

how do they do? And he saw these 07:36

remarkable straight lines. So it's sort 07:37

of a skill in science to notice very 07:40

very simple trends and and this was one 07:43

I think it went unnoticed. I think 07:45

people didn't focus on this this sort of 07:48

kind of scaling behavior in RL soon 07:50

enough but but eventually it came to 07:52

pass. So we see that basically you can 07:53

scale up the compute in both 07:56

pre-training and RL and get better and 07:57

better performance. And I think that's 08:00

sort of the fundamental thing that is 08:01

driving AI progress. It's not that AI 08:03

researchers are really smart or they 08:07

suddenly got smart. It's that we found a 08:08

very very simple way of making AI better 08:12

systematically and and we're we're 08:16

turning that crank. So what kinds of 08:18

capabilities is this unlocking? I tend 08:20

to think of AI capabilities on two axes. 08:22

I think the less interesting axis, but 08:25

it's still very important is basically 08:27

the the flexibility of AI, the ability 08:30

of AI to meet us where we are. So if you 08:34

put say AlphaGo on this figure, it would 08:38

be very very far below the X-axis 08:42

because although AlphaGo was super 08:44

intelligent, it was better than any Go 08:46

player at playing Go, it was uh only 08:49

able to operate in the universe of a Go 08:53

board. But we've made steady progress 08:55

since the advent of large language 08:58

models making uh AI that can deal with 09:00

many many many all of the modalities 09:05

that that people can deal with. We don't 09:08

have AI models I think that uh that have 09:09

a sense of smell. Um but that's that's 09:11

probably coming. And so as you go up the 09:14

y-axis here you get to AI systems that 09:16

can do more and more relevant things in 09:19

in the world. I think the more 09:22

interesting axis though is sort of the 09:23

the x-axis here which is how long it 09:25

would take a person to do to do the 09:28

kinds of tasks that AI models can do and 09:30

that's something that has been 09:33

increasing steadily as we increase the 09:34

capability of AI. This is sort of the 09:37

time horizon for for tasks and um an 09:38

organization, METR, studied this very 09:42

systematically and found yet another 09:44

scaling trend. They found that if you 09:46

look at uh the length of tasks that AI 09:49

models can do, it's doubling roughly 09:52

every 7 months. And so what this means 09:55

is that the increasing intelligence that 09:58

is being baked into AI by scaling 10:02

compute for pre-training and RL is 10:04

leading to predictable useful 10:07

tasks that the AI models uh can can do, 10:10

including longer and longer horizon 10:14

tasks. And so you can sort of speculate 10:15

about where this is heading. And the AI 10:17

2027 folks did. And this kind of picture 10:20

suggests that over the next few years we 10:24

may reach a point where AI models um can 10:27

do tasks that don't just take us minutes 10:30

or hours but days, weeks, months, years 10:32

etc. Eventually, we imagine AI models or 10:36

or millions of AI models perhaps working 10:39

together will be able to do the work 10:42

that whole human organizations can do. 10:44
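
To make the "doubling roughly every 7 months" trend concrete, here is a minimal extrapolation sketch. It assumes the doubling continues at a constant rate, which is a strong assumption; the starting horizon of one hour and the function names are hypothetical, chosen only for illustration.

```python
import math


def projected_horizon_hours(current_hours: float, months_ahead: float,
                            doubling_months: float = 7.0) -> float:
    """Task horizon after `months_ahead` months, assuming a constant doubling time."""
    return current_hours * 2 ** (months_ahead / doubling_months)


def months_until(current_hours: float, target_hours: float,
                 doubling_months: float = 7.0) -> float:
    """Months needed to grow from `current_hours` to `target_hours` on the same trend."""
    return doubling_months * math.log2(target_hours / current_hours)


# Hypothetical starting point: tasks that take a person about one hour.
after_two_doublings = projected_horizon_hours(1.0, 14.0)
months_to_work_week = months_until(1.0, 40.0)
print(after_two_doublings, round(months_to_work_week, 1))
```

Under these assumptions, going from hour-long tasks to roughly a 40-hour work week takes a little over five doublings.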

They'll be able to do the kind of work 10:46

that the entire scientific community 10:48

currently does. Um, one of the nice 10:50

things about math or theoretical physics 10:52

is that you can make progress just by by 10:54

thinking. Um and so you can imagine AI 10:57

systems working together to make the 11:00

kind of progress that the theoretical 11:02

physics community makes in in say 50 11:04

years in a matter of days, weeks etc. So 11:06

what is left if if this sort of picture 11:11

of scaling can take us very far? What is 11:13

left? I think that what may be left in 11:15

order to unlock um kind of human level 11:18

AI broadly construed is relatively 11:21

simple. One of the most important 11:24

ingredients I think is relevant 11:25

organizational knowledge. So we need to 11:28

train AI models that don't just greet 11:30

you with a blank slate but can learn to 11:33

work within companies, organizations, 11:37

governments as though they have the kind 11:39

of context that someone who's been 11:42

working there for years has. So I think 11:44

AI models need to be able to work with 11:47

knowledge. They also need memory. What 11:48

is memory if not knowledge? I 11:51

distinguish it in the sense that as you 11:53

do a task that takes you a very very 11:56

long time, you need to keep track of 11:59

your progress on that specific task, you 12:01

need to build relevant memories and you 12:03

you need to be able to use them. And 12:05

that's something that we've uh we've 12:06

begun to build into into Claude 4 and I 12:08

think will become increasingly 12:11

important. A third ingredient that I 12:12

think that we need to get better at and 12:14

and we're making progress on is 12:16

oversight: the ability of AI models to 12:19

understand sort of fine grained nuances 12:24

to solve hard fuzzy tasks. So it's easy 12:26

right now and you see an explosion of 12:30

progress for us to train AI models that 12:32

can say write code that passes tests or 12:34

that answer math questions correctly 12:37

because it's very crisp what's correct 12:39

and what's incorrect. So it's very easy 12:42

to apply reinforcement learning to make 12:44

AI models uh do better and better at 12:47

those kinds of tasks. But what we need 12:49

and are developing are AI models that 12:52

help us to generate much more nuanced 12:54

reward signals so that we can leverage 12:57

reinforcement learning to do to do 13:01

things like tell good jokes, write good 13:04

poems, um and have good taste in in 13:06

research. The other ingredients that we 13:11

need, I think, are are are simpler. We 13:13

obviously need to be able to train AI 13:15

models to do more and more complex 13:16

tasks. We need to work our way up the 13:18

y-axis from text models to multimodal 13:22

models to robotics. Um, and I expect 13:24

that over the next few years, we'll see 13:27

increasing uh continued gains from scale 13:29

when applied applied to these these 13:33

different domains. 13:36

And so how should we sort of prepare for 13:38

this this future these possibilities? I 13:41

think there are a few a few things that 13:44

I always recommend. One is I think it's 13:46

really a good idea to build things that 13:49

don't quite work yet. This is probably 13:53

always a good idea. We always want to 13:55

have ambition, but I think specifically 13:56

AI models right now are getting better 13:59

very very quickly. And I think that's 14:01

going to continue. That means that if 14:03

you build uh a product that doesn't 14:04

quite work because Claude 4 is still a 14:07

little bit too dumb, um you could expect 14:09

that there'll be a Claude 5 coming that 14:11

will make that make that product work 14:14

and deliver a lot of value. So I think 14:16

that's that's something that I always 14:18

recommend is sort of experiment on the 14:19

boundaries of what AI can do because 14:21

those boundaries are moving rapidly. The 14:23

next point I think is that AI is going 14:25

to be helpful for integrating AI. I 14:28

think that one of the main bottlenecks 14:31

for AI is really just that it's 14:33

developing so quickly that we haven't 14:36

had time to integrate it into 14:38

products, companies, other thing 14:41

everything else that we we we do into 14:44

into science. Um, and so I think that in 14:46

order to sort of speed that process up, 14:49

I think leveraging AI for AI integration 14:51

is going to be is going to be very 14:53

valuable. And then finally, I mean, I 14:54

think this is sort of obvious for for 14:56

this crowd, but I think figuring out 14:58

where adoption of AI could happen very 14:59

very quickly is is key. Um, we're seeing 15:02

uh an explosion of AI integration for 15:07

coding. And there are a lot of reasons 15:10

why software engineering is a great 15:12

place for AI, but I think the big 15:14

question is sort of what's next? Um, 15:16

what beyond software engineering can 15:19

grow that that quickly? I don't know the 15:21

answer, of course. Um, but hopefully you 15:23

guys will figure it out. So that's it 15:26

for for for the talk. Um, I want to 15:28

invite Diana on stage for uh for a chat. 15:30

That was an awesome talk 15:46

about all the scaling laws and recently 15:50

Anthropic just launched Claude 4 which is 15:53

just available. Curious uh how does it 15:57

change what is possible as all these 16:01

model releases keep compounding for the 16:04

next 12 months? 16:07

I think that uh we'll be in trouble if 16:09

it's 12 months before before an even 16:11

better model comes out. But uh I guess 16:14

uh a few things with with Claude 4. I 16:17

think that with Claude 3.7 Sonnet 16:19

uh it was already really exciting to use 16:22

3.7 for coding. But I think something 16:25

that everyone noticed was that 3.7 was a 16:28

little bit too eager. Um sometimes it 16:32

just really wanted to make your tests 16:35

pass. Um and it would do things that 16:37

that you you don't really want. Uh there 16:39

are a lot of like try/excepts, things 16:41

like that. Um, so with Claude 4, I think 16:43

that we've been able to improve the 16:46

model's ability to act as an agent 16:49

specifically for coding, but but in a 16:52

lot of other ways for search, for all 16:53

kinds of other applications. Um, but 16:55

also improve its supervision, the sort 16:57

of oversight that I I I mentioned in my 17:01

talk, so that it uh it follows your 17:03

directions and hopefully improves in in 17:06

code quality. I think the other thing 17:09

that we've worked on is improving its 17:10

ability to uh save and store memories 17:12

and we hope to see people leveraging 17:15

that because Claude 4 can blow through 17:17

its context window with a very complex 17:19

task but can also uh store memories as 17:21

files or records, retrieve them in order 17:24

to sort of keep doing work across many 17:27

many many context windows. But I guess 17:29

finally I think the picture that scaling 17:31

laws paint is one of incremental 17:33

progress. And so I think that what 17:35

you'll see with Claude is that steadily 17:37

it gets better in lots of different ways 17:40

with each release. Um but I think that 17:43

scaling really suggests a kind of smooth 17:45

curve towards what I expect is kind of 17:49

human level AI or AGI. 17:52

Is there some special feature that a lot 17:54

of the audience here are going to get 17:57

excited? some some beta that you can 17:59

some alpha leak you can give everyone on 18:02

what you think people are going to fall 18:05

in love with the new APIs. 18:07

I think the thing that I I'm most 18:09

excited about is sort of uh memory 18:11

unlocking longer and longer horizon 18:14

tasks. I think that like as as time goes 18:16

on we're going to see Claude as a 18:19

collaborator that can sort of take on 18:22

larger and larger chunks of work. This 18:23

is to your point of all these future 18:25

models being able to take bigger and 18:27

bigger tasks right now. At this point, 18:29

they're able to do tasks in the hours. 18:31

Yeah, I think so. I think it's a very 18:35

imprecise measure, but I think that 18:38

right now if you look at sort of 18:40

software engineering tasks, I think 18:42

METR literally benchmarked how long it 18:43

would take people to do various tasks 18:45

and uh and yeah, I think it's a time 18:47

scale of of hours. I think just gen like 18:50

broadly as people work with AI, 18:52

I think that the people who are skeptics 18:55

of AI will say correctly that AI makes 18:57

lots of stupid mistakes. Um, it can do 19:00

things that are absolutely brilliant and 19:03

and surprise you, but it can also make 19:05

uh make basic errors. I think one of the 19:07

sort of basic features of of AI that's 19:09

different about the shape of AI 19:12

intelligence compared to human 19:14

intelligence is that there are a lot of 19:15

things that I can't do but I can at 19:17

least judge whether they were done 19:19

correctly. I think for AI the judgment 19:21

versus the generative capability is much 19:24

closer which means that I think that uh 19:27

a major role people can play in 19:29

interacting with AI is kind of as 19:31

managers to sort of sanity check uh 19:33

sanity check the the work 19:36

which is fascinating because one of the 19:37

things we observe through the batches in 19:39

YC last year a lot of companies when 19:41

they were out and selling products they 19:45

were selling it more still as a co-pilot 19:47

where you would have a co-pilot let's 19:50

say for customer support where you still 19:52

need the last human approval before they 19:54

would send the reply for a customer but 19:56

one thing that has changed just in the 19:59

spring batch I think a lot of the AI 20:01

models are very capable to do task end 20:04

to end to your point that which is uh 20:07

remarkable founders are selling now 20:09

directly replacements of full workflows 20:12

how have you seen this translate to what 20:17

you hope the audience will build. 20:19

I think there are a lot of 20:22

possibilities. Basically, it's a 20:23

question of 20:25

what level of success or performance is 20:27

is acceptable. There are some tasks 20:31

where getting it sort of 70% right is is 20:33

good enough and others where you need 20:36

99.9% to to deploy. I think that 20:37

honestly I think it's probably a lot 20:41

more fun to build for use cases where uh 20:43

70 80% is good enough because then you 20:46

can really get to the frontier of what 20:50

AI is capable of. But I think that we're 20:51

sort of pushing up the the reliability 20:54

as well. So I think that uh we will see 20:59

more and more of these tasks. I think 21:01

that uh right now human AI collaboration 21:03

is is going to be the sort of most 21:06

interesting place because I think that 21:09

for the most advanced tasks you're 21:11

really going to need humans in the loop. 21:13

But I do think in the longer term there 21:14

will be more and more tasks that can be 21:16

fully automated. 21:17

Can you say more about what you think 21:18

the world is going to look like with 21:21

this human to AI loop collaboration? 21:22

because there's the essay from Dario 21:25

with Machines of Loving Grace that he 21:28

paints this picture that's very 21:31

optimistic and what are the details of 21:33

how we get there with with this book? 21:35

I think that we already see some of some 21:38

of that happening. So at least when I 21:41

talk to folks who work in say biomedical 21:43

research um with the right sort of 21:46

orchestration I think it's possible to 21:49

take frontier AI models now and produce 21:51

interesting valuable insights for say 21:57

drug discovery. Um so I think that's 22:00

already starting to happen. I guess an 22:02

aspect of it that that I think about is 22:05

that like there there's sort of 22:07

intelligence that requires a lot of 22:10

depth um and and intelligence that 22:11

requires a lot of breadth. So for 22:15

example in math you can sort of work on 22:16

trying to prove one theorem for a decade 22:19

like the Riemann hypothesis or Fermat's 22:22

last theorem. Um I think that's that's 22:24

sort of solving one very specific very 22:26

hard problem. I think there's a lot of 22:28

areas of science, probably more so in 22:31

biology, maybe interestingly in 22:33

psychology or or history, where putting 22:35

together a very very large number of 22:39

pieces of information um across many 22:43

many different areas is kind of where 22:46

it's at. And I think that AI models 22:48

during the pre-training phase kind of 22:51

imbibe all of human civilization's 22:53

knowledge. And so I suspect that there's 22:56

a lot of uh fruit to be picked in using 22:58

that sort of feature of AI that it knows 23:03

much much more than any one human expert 23:05

and therefore you can kind of elicit um 23:08

insights putting together many different 23:12

uh many different areas of expertise say 23:14

across biology for for for research. So 23:17

I think that um we're making a lot of 23:19

progress on making AI better at deeper 23:22

tasks like hard coding problems, hard 23:25

math problems, but I suspect that 23:27

there's a particular overhang in areas 23:28

where putting together knowledge that 23:31

maybe no one human expert would have 23:34

where that kind of intelligence is is is 23:36

very useful. So I think that's something 23:39

that I'd expect to see more of. Um is 23:41

sort of leveraging AI's sort of breadth 23:44

of knowledge. In terms of how exactly it 23:46

will roll out, I really don't know. It's 23:49

really really hard to predict the 23:51

future. Scaling laws give you one way of 23:52

predicting the future which says this 23:56

trend is going to continue. I think a 23:58

lot of trends that we see 24:00

over the long haul I expect will 24:03

continue. I mean the economy, the GDP, 24:05

uh the these kinds of trends are really 24:09

reliable indicators of the future. But I 24:11

think in terms of in detail how will 24:13

things be implemented, I think it's 24:15

really really hard to say. 24:16

Are there specific areas that you think 24:17

a lot more builders could go into and 24:20

build with these new models? I mean 24:23

there's a lot that has been done let's 24:25

say for coding tasks but what are some 24:27

tasks that have a lot more green field 24:30

that are just getting unlocked right now 24:32

with the current models 24:34

I come from a research background rather 24:36

than uh rather than business so I don't 24:38

I don't know that I have anything very 24:41

uh very deep to say but I think that 24:43

like in general any place where um it 24:44

requires a lot of skill um and it's a 24:49

task that mostly involves sort of 24:52

sitting in front of a computer 24:55

interacting with data. I think finance 24:56

uh people who use Excel spreadsheets a 24:59

lot. Um I think I I expect law although 25:01

maybe maybe maybe law uh is is is more 25:06

regulated requires more uh more more 25:09

expertise um as a stamp of approval. But 25:12

I think all of these areas are probably 25:14

green field. I think another that that I 25:16

sort of mentioned is how do we integrate 25:19

AI into existing businesses? I think 25:23

that like when electricity came along, 25:26

there was some long adoption cycle and 25:28

the very first simplest ways of say 25:31

using electricity weren't necessarily uh 25:33

the best. You wanted to not just replace 25:36

a steam engine with an electric motor. 25:38

You wanted to sort of remake the way 25:41

that factories work. And I think that 25:43

probably leveraging AI to integrate AI 25:44

into parts of the economy um as quickly 25:48

as possible. I expect there's just a lot 25:51

of a lot of leverage there. 25:53

Now, another question is you have an 25:54

extensive training as a physicist and 25:56

you were one of the first to really 25:59

observe this trend with scaling laws and 26:01

it probably comes from being a physicist 26:04

and seeing all these exponentials that 26:06

happen naturally in nature. How has that 26:09

training come about with uh being able 26:14

to perform like the best research in the 26:17

world with with with with AI? 26:20

I think the thing that was useful from a 26:22

physics point of view is looking for the 26:24

biggest picture, most macro trends and 26:28

then trying to make them as precise as 26:31

possible. So I remember meeting like 26:33

kind of brilliant AI researchers who 26:36

would say things like learning is 26:38

converging exponentially 26:41

and I would just ask really dumb 26:43

questions like are you sure it's an 26:45

exponential? Could it just be a power 26:47

law? Is it quadratic? Like like exactly 26:49

how is this thing converging? And it's a 26:52

really dumb kind of simple question to 26:55

ask, but basically I think there was a 26:57

lot of fruit to be picked and and 26:59

probably still is in trying to make the 27:01

big trends that you see as precise as 27:04

possible because that I don't know it 27:06

gives you a lot of tools. It allows you 27:08

to ask like what does it really mean to 27:09

move the needle? I think with scaling 27:11

laws, the the holy grail is finding a 27:13

better slope to the scaling law because 27:17

that means that as you put in more 27:19

compute, you're going to get a bigger 27:21

and bigger advantage over other AI 27:24

developers. Um, but until you've sort of 27:27

made precise what the trend is that you 27:30

see, you sort of don't know exactly what 27:32

it means to beat it and and how much you 27:35

can beat it by and how to know 27:37

systematically whether you're you're 27:39

you're achieving that end. So, I think 27:41

those were kind of the tools that that I 27:43

think I used. It wasn't necessarily like 27:45

literally applying say quantum field 27:47

theory to AI. I think that's uh that's a 27:50

little bit too specific. Well, are there 27:52

specific uh physics heuristics like 27:54

renormalization, symmetry that came in 27:57

very handy to really keep observing this 27:59

trend or or measuring it? 28:03

Something that you'll observe if you 28:05

look at AI models is that they're big. 28:06

Neural networks are big. They have 28:09

billions, now trillions, of parameters. 28:10

That means that they're made out of big 28:12

matrices. and basically studying uh 28:15

approximations 28:19

where you 28:21

take the limit that neural networks are 28:23

very big and specifically that the uh 28:25

matrices that compose neural networks 28:28

are big. That's actually been kind of 28:29

useful and that's something that 28:31

actually was a well-known approximation 28:32

in in physics um and and in math. Um 28:34

that's something that's been applied. 28:37

But I think generally it's really asking 28:39

very naive dumb questions that gets you 28:41

very far. I think AI is really in a 28:43

certain sense only like maybe 10, 15 28:45

years old in terms of the current 28:48

incarnation of how we're training AI 28:50

models. That means that it's an 28:52

incredibly new field. A lot of the most 28:53

basic questions haven't been answered 28:56

like questions of interpretability, how 28:58

AI models really work. And so I think 29:01

there's there's really a lot to uh to 29:03

learn at that level rather than applying 29:06

very very fancy techniques. Are there 29:09

specific tools in physics that you apply 29:11

for interpretability? 29:14

I would say that interpretability is a 29:15

lot more like biology. It's a lot more 29:17

like neuroscience. So I think those are 29:19

kind of the tools. Um there there is 29:21

there is some more more more mathematics 29:23

there. But I I think it's more like 29:26

trying to understand the features of the 29:28

brain. Um the benefit that you get with 29:30

AI over neuroscience is that um you can 29:33

really measure everything in AI. You 29:36

can't measure the the activity of every 29:38

neuron, every synapse in a brain, but 29:41

you can do that in AI. So there's much 29:43

much much more data for reverse 29:45

engineering how AI models work. 29:48

Now, one aspect about scaling laws: 29:50

they've held for over five orders of 29:52

magnitude, which is wild. This is a bit 29:56

of a contrarian question, but what 29:58

empirical sign would convince you that 30:01

the curve is changing, that maybe we're 30:05

getting off the curve? 30:07

I think it's a really I think it's a 30:09

really hard question, right? Because I 30:10

mostly use scaling laws to diagnose 30:12

whether AI training is broken or not. 30:14

Mh. 30:16

So I think that uh once you see 30:16

something and you find it very it's a 30:20

very compelling trend, it becomes very 30:21

very interesting to examine 30:24

where it's failing. But I think that my 30:27

first inclination is to think if scaling 30:29

laws are failing, it's because we've 30:32

screwed up AI training in some way. 30:34

Maybe we got uh we got the architecture 30:36

of the neural network wrong or there's 30:39

some bottleneck in training that we 30:42

don't see or there's some problem with 30:43

precision in the algorithms that we're 30:45

using. So I think it would take a lot to 30:47

convince me at least that scaling was 30:51

really no longer working at the level of 30:54

the sort of these empirical laws because 30:55

so many times in my experience over the 30:57

last 5 years when it seemed like scaling 31:00

was broken it was because we were doing 31:01

it wrong. 31:03
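The diagnostic use of scaling laws described here can be sketched in a few lines. This is only an illustration under stated assumptions: the data points, the exponent of -0.05, the 5% tolerance, and the helper names `fit_power_law`, `predicted_loss`, and `looks_broken` are all made up for the example and are not taken from the talk. The idea is to fit a straight line in log-log space to small runs, extrapolate to a larger run, and treat a large gap from the trend as a hint that training may be broken.

```python
import math

def fit_power_law(compute, loss):
    """Least-squares fit of log(loss) = intercept + slope * log(compute)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predicted_loss(compute, slope, intercept):
    """Extrapolate the fitted trend to a new compute budget."""
    return math.exp(intercept + slope * math.log(compute))

def looks_broken(compute, observed_loss, slope, intercept, tolerance=0.05):
    """Flag a run whose loss sits more than `tolerance` (relative) above the trend."""
    expected = predicted_loss(compute, slope, intercept)
    return (observed_loss - expected) / expected > tolerance

# Hypothetical small runs that follow loss = 10 * compute**-0.05 exactly.
small_compute = [1e15, 1e16, 1e17, 1e18]
small_loss = [10.0 * c ** -0.05 for c in small_compute]
slope, intercept = fit_power_law(small_compute, small_loss)

on_trend = 10.0 * (1e20) ** -0.05   # a large run that stays on the trend
off_trend = on_trend * 1.2          # a large run that is 20% worse than the trend

print(looks_broken(1e20, on_trend, slope, intercept))
print(looks_broken(1e20, off_trend, slope, intercept))
```

A flagged run does not say what went wrong. In the spirit of the answer above, it is a prompt to check the architecture, training bottlenecks, or numerical precision before concluding that scaling has stopped working.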

Interesting. So I guess going into 31:04

something very specific that goes hand 31:06

in hand is a lot of the compute power 31:08

required to go keep going on this curve. 31:10

What happens uh as compute becomes more 31:14

more scarce how far down do you go into 31:17

the precision ladder like do you explore 31:21

things like FP4 do you explore things 31:23

like ternary representations? What 31:26

are your thoughts around that? Yeah, I 31:28

mean I think that um right now AI is 31:30

really inefficient because there's a lot 31:34

of value in AI. So um there's a lot of 31:37

value in unlocking the most capable 31:39

frontier model. Um and so companies like 31:44

Anthropic and others are moving as 31:47

quickly as we can to both make AI 31:49

training more efficient and AI inference 31:52

more efficient as well as unlocking 31:54

frontier capabilities. But a lot of the 31:56

focus really is on uh unlocking the 31:58

frontier. I think that over time as AI 32:00

becomes more and more widespread, I 32:05

think that we're going to really drive 32:08

down the cost of inference and training 32:10

dramatically from where we are right 32:13

now. I mean right now we're seeing sort 32:15

of 3x to 10x gains algorithmically and 32:17

in sort of scaling up compute um and in 32:22

uh inference efficiency per year. I 32:25

guess like the joke is that we're going 32:29

to get computers back into binary. So I 32:31

think that we will see much much lower 32:33

precision as one of the many avenues to 32:36

make inference more efficient over time. 32:38

But we're very, very, very 32:41

out of equilibrium with AI development 32:43

right now. AI is improving very rapidly. 32:45

Things are changing very rapidly. We 32:47

haven't fully realized the potential of 32:49

current models, but we're unlocking more 32:52

and more capabilities. So I think that 32:54

what the equilibrium situation looks 32:56

like where AI isn't changing that 32:58

quickly, I think is one where AI is 33:01

extremely inexpensive, but it's sort of 33:03

hard to know if we're even going to get 33:05

there. like AI may just keep getting 33:07

better so quickly that uh sort of 33:09

improvements in intelligence unlock 33:11

so much more and so we may continue to 33:13

focus on that rather than say getting 33:15

precision down to FP2 33:18
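To make the low-precision end of this discussion concrete, here is a minimal sketch of ternary quantization, where every weight is stored as -1, 0, or +1 plus one shared scale. The scheme shown (scale by the mean absolute weight, then round and clip) is an assumption chosen for illustration; the talk does not describe any particular quantization method, and the function names and sample weights are hypothetical.

```python
def ternary_quantize(weights):
    """Map each weight to -1, 0, or +1, with one shared floating-point scale."""
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer multiple of the scale, then clip to [-1, 1].
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate weights from ternary codes and the shared scale."""
    return [c * scale for c in codes]

# Hypothetical weights, for illustration only.
weights = [0.9, -0.05, 0.4, -1.2, 0.0, 0.6]
codes, scale = ternary_quantize(weights)
approx = dequantize(codes, scale)

print(codes)
print(scale)
print(approx)
```

Each code needs under two bits of storage, at the cost of losing most of the magnitude information in the original weights. That is the kind of efficiency-versus-capability tradeoff the answer above is weighing.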

which is very much uh the Jevons paradox 33:21

as intelligence becomes better and 33:24

better people are going to want it more 33:26

not that is driving the cost down which 33:29

is this irony right 33:31

yeah absolutely I mean I think that uh 33:33

yeah that's that's certainly certainly 33:35

something that we've seen that there are 33:36

certain uh certain points where AI 33:38

becomes accessible enough. That said, um 33:41

I think as AI systems become more and 33:45

more capable um and can do more and more 33:48

of the work that that we do, it's going 33:51

to be worth it to pay for uh frontier 33:53

capabilities. I think it's a question 33:55

that I've always had and can have is 33:57

kind of like is all of the value at the 34:00

frontier or is there a lot of value with 34:02

kind of cheaper systems that aren't 34:05

quite as capable? And I think the sort 34:07

of time horizon picture is maybe one way 34:09

of thinking about this. I think that you 34:11

can do a lot of very simple bite-sized 34:14

tasks, but I think it's just much more 34:15

convenient to be able to use an AI model 34:18

that can do a very complex task end to 34:21

end rather than requiring us as humans 34:23

to sort of orchestrate a much dumber 34:26

model to break the task down into very 34:28

very small slices and put them together. 34:30

So, I do kind of expect that a lot of 34:32

the value is going to come from the most 34:33

capable models, but I might be wrong. It 34:35

it might depend and it might really 34:38

depend on the capabilities of AI 34:40

integrators to sort of leverage AI 34:43

really efficiently. 34:44

What advice would you give this audience 34:45

where everyone is early in their 34:48

career with lots of potential in terms 34:50

of how do you stay relevant in the 34:52

future where all these models are going 34:55

to become so awesome. What should 34:57

everyone be really good at and study and 34:59

to still do really good work? I think as 35:03

I mentioned there's a lot of value in 35:06

understanding how these models work and 35:09

being able to really efficiently 35:12

leverage them and and integrate them and 35:13

I think there's a lot of value in kind 35:15

of like building building at the 35:17

frontier. Um I don't know we could turn 35:19

it over to the audience for for 35:21

questions. 35:23

Let's turn it out to the audience for 35:23

some questions. 35:24

I had a quick question on the scaling 35:26

laws. You show that a lot of the scaling 35:27

laws are like linear that like the more 35:30

we have exponential compute going up but 35:32

then like we have linear progress in uh 35:34

in the scaling loss but then on your 35:36

last slide you show that you expect then 35:38

suddenly like an exponential growth in 35:40

like how much time we save. I want to 35:42

ask you like why do you think that 35:45

suddenly on this chart we're exponential 35:46

and not linear anymore? 35:48

Thank you. 35:50

Yeah, this is a really good question and 35:52

I don't know. Um I mean the METR 35:53

finding was kind of an empirical 35:56

finding. Um the way that I tend to think 35:58

about this is that um in order to do 36:01

more and more complex longer horizon 36:04

tasks um what you really need is some 36:06

ability to self-correct. You need to be 36:09

able to sort of identify that you've 36:12

you've you make a plan and then you 36:13

start executing in the plan. But 36:15

everyone knows that our plans are kind 36:16

of worthless and uh and we encounter 36:18

reality. we get things wrong. And so I 36:21

think that a lot of what determines the 36:24

horizon length of what models can 36:26

accomplish is their ability to notice 36:28

that they're doing something wrong and 36:30

and correct it. Um, and I think that's 36:32

not sort of like a lot of bits of 36:34

information. It doesn't necessarily 36:36

require a huge change in intelligence to 36:37

sort of notice one or two more times 36:40

that you've made a mistake and how to 36:42

correct that mistake. But if you sort of 36:44

fix your mistake, maybe you sort of on 36:46

the order sort of double the horizon 36:48

length of the task because like instead 36:50

of getting stuck here, you get stuck 36:52

twice as far twice as far out. So I 36:54

think that's sort of the picture that I 36:56

have that like you can kind of unlock 36:58

longer and longer horizons with 36:59

relatively modest improvements in your 37:01

kind of ability to understand the task 37:04

and self-correct. But that just kind of 37:06

like those are just words. I think the 37:09

empirical trend is maybe the most 37:11

interesting thing. And uh maybe we can 37:13

build more detailed models for why that 37:15

trend is true, but it's sort of your 37:18

guess is as good as mine. 37:20
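The self-correction picture in this answer can be turned into a toy calculation. This is a sketch of the intuition only, not METR's methodology and not a model presented in the talk. It assumes that each step of a task independently goes wrong with some probability, that the model catches a fixed fraction of its own mistakes, and that the first uncaught mistake is where it gets stuck. The function name and the numbers are hypothetical.

```python
def expected_horizon(error_rate, catch_rate):
    """Expected number of steps up to and including the first uncaught mistake.

    With an independent per-step chance of an uncaught mistake, the step count
    follows a geometric distribution whose mean is 1 / (uncaught probability).
    """
    uncaught = error_rate * (1.0 - catch_rate)
    return 1.0 / uncaught

# Hypothetical numbers: a 10% chance of a mistake per step in both cases.
base = expected_horizon(error_rate=0.1, catch_rate=0.80)
better = expected_horizon(error_rate=0.1, catch_rate=0.90)

print(base)
print(better)
print(better / base)
```

Under these assumptions, going from catching 80% of mistakes to catching 90% halves the uncaught rate and so doubles the expected horizon. That is one way to read the claim that modest improvements in self-correction can unlock much longer tasks.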

Yeah. So I also have a question over 37:22

here. Um so it's an honor. Um so 37:24

basically um in terms of um increasing 37:26

the time horizon, I feel like so my 37:29

mental model of neural networks is 37:31

very simple. If you want them to do 37:32

something, you train on such data. Um so 37:34

if you want them to um if you want to 37:37

increase the um time horizon you have to 37:39

slowly get for example verification 37:41

signals. Now um I think one way to do 37:42

this is via product. So like for example 37:45

um Claude agent and then you use the 37:47

verification signal to incrementally 37:48

improve the model. Now my question is 37:50

basically this works really nicely for 37:52

for example coding where you have a 37:54

product that is sufficiently good such 37:56

that you can deploy it and then get the 37:57

verification signal but what about other 37:59

domains like in other domains are we 38:01

just um scaling data labelers to AGI or 38:03

is there a better approach? Yeah, it's a 38:06

good question. I mean, um, so when when 38:09

sort of skeptics ask me sort of why do I 38:13

think we will be able to sort of scale 38:17

and get something like broadly human 38:20

level AI, it's basically because of of 38:21

what you said. there is some sort of 38:24

very kind of operationally intensive 38:26

path where you just sort of build more 38:29

and more different tasks for AI models 38:31

to do that are more and more complex, 38:34

more and more long horizon and you just 38:35

sort of turn the crank and train with RL 38:38

on those those more more complicated 38:40

tasks. So I sort of feel like that's the 38:43

worst case for AI progress. And I mean 38:44

given the level of investment in AI and 38:48

I think the the sort of level of value 38:50

that I think is being created with AI, I 38:52

think people will do that if necessary. 38:54

That said, I think there are a lot of 38:57

ways of sort of making it simpler. The 38:59

best is to have an AI model that is 39:01

trained to oversee and supervise what uh 39:05

Claude, like you have Claude, say, which 39:09

you're training to be Claude, when you 39:11

have another AI model that's sort of 39:13

providing supervision and is not just 39:14

saying did you do this incredibly 39:17

complicated task correctly like did you 39:19

become a faculty member and get tenure 39:23

will that take six or seven years is 39:25

that like an end-to-end task where at the 39:27

end you sort of either get tenure or not 39:28

over seven that's that's ridiculous. 39:30

That's very inefficient. But instead can 39:32

provide more detailed supervision that 39:34

says you're doing this well, you're 39:36

doing this poorly. Um I think that sort 39:38

of as we're able to use AI more and more 39:40

in that kind of way, we'll probably be 39:43

able to make training for very long 39:45

horizon tasks more efficient and I think 39:47

we're already doing this to some extent. 39:49

We'll do one last question. 39:51

Yeah, I wanted to build on top of that. 39:53

when you're basically developing like 39:55

these tasks and then training them with 39:57

RL, would are you like like would you 39:59

like try creating these tasks like using 40:02

large language models like the tasks you 40:04

use for RL or are you still using 40:07

humans? 40:09

Great question. So I would say a mix. Um 40:10

I mean obviously we're building the 40:13

tasks as much as possible using AI to 40:14

sort of like say generate tasks with 40:17

code. we do like also uh ask humans to 40:20

create tasks. So it's it's basically 40:25

some mixture of those things. Um I think 40:27

that as AI gets better and better, 40:29

hopefully we're able to leverage AI more 40:31

and more, but of course the frontier of 40:33

the difficulty of these tasks also 40:35

increases. So I think humans are are are 40:37

still going to be involved. 40:39

Okay. Thank you. 40:40

All right. Let's give it a round of 40:41

applause to Jared. 40:43

Thank you so much. Thanks. 40:45


Lyrics & Translation

[English]

Hey everyone. Um, I'm Jared Kaplan. I'm

going to talk briefly about scaling and

the road to human level AI, but my guess

is for this audience, a lot of these

ideas are pretty familiar, so I'll keep

it short and then we're going to do a

sort of fireside chat Q&A with uh with

Diana. I actually have only been working

on AI for about six years. I uh before

that had a long career, the vast

majority of my career as a theoretical

physicist. um working in academia. And

so uh how did I get to AI? Well, I I I

want to be brief. Why did I start in

physics? It was basically because my mom

was a science fiction writer and I

wanted to figure out if we could build a

faster than light drive and physics was

the way to do that. Um I also was very

excited about just understanding the

universe. How do things work? How do the

biggest trends that underlie sort of

everything that we see around us, where

does that all come from? For example, is

the universe deterministic? Do we have

free will? I was very, very interested

in all of those questions. But

fortunately, along the way, uh during my

career as a physicist, I met a lot of

very, very interesting, very deep

people, including many of the uh

founders of Anthropic that I now work

with all of the time. And uh I was

really interested in what they were

doing and I kept track of it. And as I

moved from different uh among different

subject areas in physics from large

hadron collider physics, particle

physics, cosmology, string theory, um

and on I got a little bit frustrated, a

little bit bored. I didn't feel like we

were making progress quickly enough. And

a lot of my friends were telling me that

AI was becoming a really big deal. Um

and I didn't believe them. I was really

skeptical. I thought, well, AI, people

have been working on it for 50 years.

SVMs aren't that exciting. Um, that was

all we knew about back in 2005, 2009

when I was in school. But I got

convinced that that maybe AI would be an

exciting field to work on. Um, and I I

got very lucky to know the right people

and the rest is history. So uh I'm going

to talk a little bit about how our

contemporary AI models work and how

scaling is leading them to get better

and better. So there are really two

fundamental phases to the training of

contemporary AI models like Claude,

ChatGPT,

etc. The first phase is pre-training and

that's where we train AI models to

imitate human written data, human

written text and understand the

correlations underlying that data. And

these these figures are very very retro.

This is actually from the playground of

the original GPT-3 model. And you can see

that as a speaker at a journal club,

you're probably elephant me to say

certain things. The word elephant in

that sentence is really really unlikely.

What pre-training does is teach models

what words are likely to follow other

words in large corpora of text and now

with contemporary models multimodal

data. The second phase of training for

contemporary AI models is reinforcement

learning. This is another very retro

slide. Um it shows the original

interface we used for sort of Claude

zero or Claude negative one uh back in

the ancient days of 2022

when we were collecting feedback data.

And what you see here is basically the

interface for having a conversation with

very very early versions of Claude and

picking which response from Claude was

better according to you, according to

crowdworkers, etc. And using that

signal, we optimize, we reinforce the

behaviors that are chosen to be good,

that are chosen to be helpful, honest,

and harmless. And we discourage the

behaviors that are bad. So really all

there is to training these models is

learning to predict the next word and

then doing reinforcement learning to

learn to do useful tasks. And it turns

out that there are scaling laws for both

of these phases of training. So this is

a a figure that that we made five or six

years ago now and it shows how as you

scale up the pre-training phase of AI,

you predictably get better and better

performance for our models. And this was

something that came about because I was

just sort of asking the dumbest possible

question. As a physicist, that's what

you're trained to do. You sort of look

at the big picture and you ask really

dumb things. I'd heard it was very

popular in the 2010s to say that big

data was important and so I just wanted

to know how big should the data be? How

important is it? How much does it help?

Similarly, a lot of people were noticing

that larger AI models performed better.

And so we just asked the question, how

much better do these models perform? And

we got really lucky. We found that

there's actually something very very

very precise and surprising underlying

AI training. This really blew us away

that there are these nice trends that

are as precise as anything that you see

in physics or or astronomy. And these

gave us a lot of conviction to believe

that AI was just going to keep getting

smarter and smarter in a very

predictable way. Because as you can see

in these figures already back in 2019,

we were looking across many many many

orders of magnitude in compute, in data

set size, in neural network size. And so

we expected once you see something is

true over many many many orders of

magnitude you expect it's probably going

to continue to be true for a long time

further. So this has sort of been one of

the fundamental things that I think

underlies uh uh improvements in in AI.

The other is actually also something

that started to appear quite a long time

ago although it's become really really

impactful uh in the last couple of years

is that you can see scaling laws in the

reinforcement learning phase of AI

training. So uh a researcher about four

years ago decided to study scaling laws

for AlphaGo, basically putting together

two very very high-profile AI successes,

GPT-3 and scaling for pre-training, and

AlphaGo. This was just a researcher uh

Andy Jones working on his own uh with

like his own I think maybe single GPU

back in these sort of ancient days. And

so he couldn't study AlphaGo, that was

expensive, but he could study a simpler

game called Hex. So he made this plot

that you see here. Now, Elo scores, I

think, weren't as well known

back then, but all Elo scores are,

of course, are chess ratings. They

basically describe how likely it is for

one player to beat another in a game of

chess. They're used now to benchmark AI

models to see sort of how often does a

human prefer one AI model to another.
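For readers who have not met Elo ratings, the standard expected-score formula can be written out directly. This is the textbook chess formula with its conventional 400-point scale; it is included as background and is not code from the talk or from the Hex study, and the example ratings are made up.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the standard Elo model.

    The expected score is the win probability plus half the draw probability.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Hypothetical ratings, for illustration only.
even_match = elo_expected_score(1500, 1500)
stronger_a = elo_expected_score(1900, 1500)
weaker_a = elo_expected_score(1500, 1900)

print(even_match)
print(stronger_a)
print(weaker_a)
```

Because the formula depends only on the rating difference, each fixed gain in Elo corresponds to a fixed improvement in the odds of winning. This is why a straight line on an Elo plot is a meaningful trend.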

But but back then this is just sort of

the classic application of Elo scores as

chess ratings. And he looked at, as

you train different models to play this

game of Hex, which is a very simple

board game, a bit simpler than than Go,

how do they do? And he saw these

remarkable straight lines. So it's sort

of a skill in science to notice very

very simple trends and and this was one

I think it went unnoticed. I think

people didn't focus on this this sort of

kind of scaling behavior in RL soon

enough but but eventually it came to

pass. So we see that basically you can

scale up the compute in both

pre-training and RL and get better and

better performance. And I think that's

sort of the fundamental thing that is

driving AI progress. It's not that AI

researchers are really smart or they

suddenly got smart. It's that we found a

very very simple way of making AI better

systematically and and we're we're

turning that crank. So what kinds of

capabilities is this unlocking? I tend

to think of AI capabilities on two axes.

I think the less interesting axis, but

it's still very important is basically

the the flexibility of AI, the ability

of AI to meet us where we are. So if you

put say AlphaGo on this figure, it would

be very very far below the X-axis

because although AlphaGo was super

intelligent, it was better than any Go

player at playing Go, it was uh only

able to operate in the universe of a Go

board. But we've made steady progress

since the advent of large language

models making uh AI that can deal with

many many many all of the modalities

that that people can deal with. We don't

have AI models I think that uh that have

a sense of smell. Um but that's that's

probably coming. And so as you go up the

y-axis here you get to AI systems that

can do more and more relevant things in

in the world. I think the more

interesting axis though is sort of the

the x-axis here which is how long it

would take a person to do to do the

kinds of tasks that AI models can do and

that's something that has been

increasing steadily as we increase the

capability of AI. This is sort of the

time horizon for for tasks and um an

organization, METR, studied this very

systematically and found yet another

scaling trend. They found that if you

look at uh the length of tasks that AI

models can do, it's doubling roughly

every 7 months. And so what this means

is that the increasing intelligence that

is being baked into AI by scaling

compute for pre-training and RL is

leading to predictable useful

tasks that the AI models uh can can do,

including longer and longer horizon

tasks. And so you can sort of speculate

about where this is heading. And the AI

2027 folks did. And this kind of picture

suggests that over the next few years we

may reach a point where AI models um can

do tasks that don't just take us minutes

or hours but days, weeks, months, years

etc. Eventually, we imagine AI models or

or millions of AI models perhaps working

together will be able to do the work

that whole human organizations can do.

They'll be able to do the kind of work

that the entire scientific community

currently does. Um, one of the nice

things about math or theoretical physics

is that you can make progress just by by

thinking. Um and so you can imagine AI

systems working together to make the

kind of progress that the theoretical

physics community makes in in say 50

years in a matter of days, weeks etc. So

what is left if if this sort of picture

of scaling can take us very far? What is

left? I think that what may be left in

order to unlock um kind of human level

AI broadly construed is relatively

simple. One of the most important

ingredients I think is relevant

organizational knowledge. So we need to

train AI models that don't just greet

you with a blank slate but can learn to

work within companies, organizations,

governments as though they have the kind

of context that someone who's been

working there for years has. So I think

AI models need to be able to work with

knowledge. They also need memory. What

is memory if not knowledge? I

distinguish it in the sense that as you

do a task that takes you a very very

long time, you need to keep track of

your progress on that specific task, you

need to build relevant memories and you

you need to be able to use them. And

that's something that we've uh we've

begun to build into into Claude 4 and I

think will become increasingly

important. A third ingredient that I

think that we need to get better at and

and we're making progress on is

oversight. the ability of AI models to

understand sort of fine grained nuances

to solve hard fuzzy tasks. So it's easy

right now and you see an explosion of

progress for us to train AI models that

can say write code that passes tests or

that answer math questions correctly

because it's very crisp what's correct

and what's incorrect. So it's very easy

to apply reinforcement learning to make

AI models uh do better and better at

those kinds of tasks. But what we need

and are developing are AI models that

help us to generate much more nuanced

reward signals so that we can leverage

reinforcement learning to do to do

things like tell good jokes, write good

poems, um and have good taste in in

research. The other ingredients that we

need, I think, are are are simpler. We

obviously need to be able to train AI

models to do more and more complex

tasks. We need to work our way up the

y-axis from text models to multimodal

models to robotics. Um, and I expect

that over the next few years, we'll see

increasing uh continued gains from scale

when applied applied to these these

different domains.

And so how should we sort of prepare for

this this future these possibilities? I

think there are a few a few things that

I always recommend. One is I think it's

really a good idea to build things that

don't quite work yet. This is probably

always a good idea. We always want to

have ambition, but I think specifically

AI models right now are getting better

very very quickly. And I think that's

going to continue. That means that if

you build uh a product that doesn't

quite work because Claude 4 is still a

little bit too dumb, um you could expect

that there'll be a Claude 5 coming that

will make that make that product work

and deliver a lot of value. So I think

that's that's something that I always

recommend is sort of experiment on the

boundaries of what AI can do because

those boundaries are moving rapidly. The

next point I think is that AI is going

to be helpful for integrating AI. I

think that one of the main bottlenecks

for AI is really just that it's

developing so quickly that we haven't

had time to integrate it into

products, companies,

everything else that we do,

into science. Um, and so I think that in

order to sort of speed that process up,

I think leveraging AI for AI integration

is going to be is going to be very

valuable. And then finally, I mean, I

think this is sort of obvious for for

this crowd, but I think figuring out

where adoption of AI could happen very

very quickly is is key. Um, we're seeing

uh an explosion of AI integration for

coding. And there are a lot of reasons

why software engineering is a great

place for AI, but I think the big

question is sort of what's next? Um,

what beyond software engineering can

grow that that quickly? I don't know the

answer, of course. Um, but hopefully you

guys will figure it out. So that's it

for for for the talk. Um, I want to

invite Diana on stage for uh for a chat.

YC's next batch is now taking

applications. Got a startup in you?

Apply at ycombinator.com/apply.

It's never too early and filling out the

app will level up your idea. Okay, back

to the video. That was an awesome talk

about all the scaling laws and recently

Anthropic just launched Claude 4, which is

just available. Curious uh how does it

change what is possible as all these

model releases keep compounding for the

next 12 months?

I think that uh we'll be in trouble if

it's 12 months before before an even

better model comes out. But uh I guess

uh a few things with Claude 4. I

think that with Claude 3.7 Sonnet

uh it was already really exciting to use

3.7 for coding. But I think something

that everyone noticed was that 3.7 was a

little bit too eager. Um sometimes it

just really wanted to make your tests

pass. Um and it would do things that

that you you don't really want. Uh there

are a lot of, like, try/excepts, things

like that. Um, so with Claude 4, I think

that we've been able to improve the

model's ability to act as an agent

specifically for coding, but but in a

lot of other ways for search, for all

kinds of other applications. Um, but

also improve its supervision, the sort

of oversight that I I I mentioned in my

talk, so that it uh it follows your

directions and hopefully improves in in

code quality. I think the other thing

that we've worked on is improving its

ability to uh save and store memories

and we hope to see people leveraging

that because Claude 4 can blow through

its context window with a very complex

task but can also uh store memories as

files or records, retrieve them in order

to sort of keep doing work across many

many many context windows. But I guess

finally I think the picture that scaling

laws paint is one of incremental

progress. And so I think that what

you'll see with Claude is that steadily

it gets better in lots of different ways

with each release. Um but I think that

scaling really suggests a kind of smooth

curve towards what I expect is kind of

human level AI or AGI.

Is there some special feature that a lot

of the audience here are going to get

excited? some some beta that you can

some alpha leak you can give everyone on

what you think people are going to fall

in love with the new APIs.

I think the thing that I I'm most

excited about is sort of uh memory

unlocking longer and longer horizon

tasks. I think that like as as time goes

on we're going to see Claude as a

collaborator that can sort of take on

larger and larger chunks of work. This

is to your point of all these future

models being able to take bigger and

bigger tasks right now. At this point,

they're able to do tasks in the hours.

Yeah, I think so. I think it's a very

imprecise measure, but I think that

right now if you look at sort of

software engineering tasks, I think

METR literally benchmarked how long it

would take people to do various tasks

and uh and yeah, I think it's a time

scale of of hours. I think just gen like

broadly as people work with AI,

I think that the people who are skeptics

of AI will say correctly that AI makes

lots of stupid mistakes. Um, it can do

things that are absolutely brilliant and

and surprise you, but it can also make

uh make basic errors. I think one of the

sort of basic features of of AI that's

different about the shape of AI

intelligence compared to human

intelligence is that there are a lot of

things that I can't do but I can at

least judge whether they were done

correctly. I think for AI the judgment

versus the generative capability is much

closer which means that I think that uh

a major role people can play in

interacting with AI is kind of as

managers to sort of sanity check uh

sanity check the the work

which is fascinating because one of the

things we observe through the batches in

YC last year a lot of companies when

they were out and selling products they

were selling it more still as a co-pilot

where you would have a co-pilot let's

say for customer support where you still

need the last human approval before they

would send the reply for a customer but

one thing that has changed just in the

spring batch I think a lot of the AI

models are very capable to do task end

to end to your point that which is uh

remarkable founders are selling now

directly replacements of full workflows

how have you seen this translate to what

you hope the audience will build.

I think there are a lot of

possibilities. Basically, it's a

question of

what level of success or performance is

acceptable. There are some tasks

where getting it sort of 70% right is

good enough and others where you need

99.9% to deploy. I think that

honestly I think it's probably a lot

more fun to build for use cases where uh

70 to 80% is good enough because then you

can really get to the frontier of what

AI is capable of. But I think that we're

sort of pushing up the reliability

as well. So I think that uh we will see

more and more of these tasks. I think

that uh right now human AI collaboration

is going to be the sort of most

interesting place because I think that

for the most advanced tasks you're

really going to need humans in the loop.

But I do think in the longer term there

will be more and more tasks that can be

fully automated.

Can you say more about what you think

the world is going to look like with

this human-AI collaboration loop?

Because there's the essay from Dario,

"Machines of Loving Grace," where he

paints this picture that's very

optimistic. What are the details of

how we get there with this book?

I think that we already see some of

that happening. So at least when I

talk to folks who work in say biomedical

research um with the right sort of

orchestration I think it's possible to

take frontier AI models now and produce

interesting valuable insights for say

drug discovery. Um so I think that's

already starting to happen. I guess an

aspect of it that that I think about is

that like there there's sort of

intelligence that requires a lot of

depth um and and intelligence that

requires a lot of breadth. So for

example in math you can sort of work on

trying to prove one theorem for a decade

like the Riemann hypothesis or Fermat's

last theorem. I think that's

sort of solving one very specific very

hard problem. I think there's a lot of

areas of science, probably more so in

biology, maybe interestingly in

psychology or or history, where putting

together a very very large number of

pieces of information um across many

many different areas is kind of where

it's at. And I think that AI models

during the pre-training phase kind of

imbibe all of human civilization's

knowledge. And so I suspect that there's

a lot of uh fruit to be picked in using

that sort of feature of AI that it knows

much much more than any one human expert

and therefore you can kind of elicit um

insights putting together many different

uh many different areas of expertise say

across biology for research. So

I think that um we're making a lot of

progress on making AI better at deeper

tasks like hard coding problems, hard

math problems, but I suspect that

there's a particular overhang in areas

where putting together knowledge that

maybe no one human expert would have

where that kind of intelligence is

very useful. So I think that's something

that I'd expect to see more of:

sort of leveraging AI's breadth

of knowledge. In terms of how exactly it

will roll out, I really don't know. It's

really really hard to predict the

future. Scaling laws give you one way of

predicting the future which says this

trend is going to continue. I think a

lot of trends that we see

over the long haul I expect will

continue. I mean the economy, the GDP,

these kinds of trends are really

reliable indicators of the future. But I

think in terms of in detail how will

things be implemented, I think it's

really really hard to say.

Are there specific areas that you think

a lot more builders could go into and

build with these new models? I mean

there's a lot that has been done let's

say for coding tasks but what are some

tasks that have a lot more green field

that are just getting unlocked right now

with the current models

I come from a research background rather

than uh rather than business so I don't

I don't know that I have anything very

uh very deep to say but I think that

like in general any place where um it

requires a lot of skill um and it's a

task that mostly involves sort of

sitting in front of a computer

interacting with data. I think finance

uh people who use Excel spreadsheets a

lot. I expect law, although

maybe law is more

regulated and requires more

expertise as a stamp of approval. But

I think all of these areas are probably

green field. I think another that that I

sort of mentioned is how do we integrate

AI into existing businesses? I think

that like when electricity came along,

there was some long adoption cycle and

the very first simplest ways of say

using electricity weren't necessarily uh

the best. You wanted to not just replace

a steam engine with an electric motor.

You wanted to sort of remake the way

that factories work. And I think that

probably leveraging AI to integrate AI

into parts of the economy um as quickly

as possible. I expect there's just a lot

of a lot of leverage there.

Now, another question: you have

extensive training as a physicist and

you were one of the first to really

observe this trend with scaling laws and

it probably comes from being a physicist

and seeing all these exponentials that

happen naturally in nature. How has that

training carried over to being able

to perform some of the best research in the

world with AI?

I think the thing that was useful from a

physics point of view is looking for the

biggest picture, most macro trends and

then trying to make them as precise as

possible. So I remember meeting like

kind of brilliant AI researchers who

would say things like learning is

converging exponentially

and I would just ask really dumb

questions like are you sure it's an

exponential? Could it just be a power

law? Is it quadratic? Like like exactly

how is this thing converging? And it's a

really dumb kind of simple question to

ask, but basically I think there was a

lot of fruit to be picked and and

probably still is in trying to make the

big trends that you see as precise as

possible because that I don't know it

gives you a lot of tools. It allows you

to ask like what does it really mean to

move the needle? I think with scaling

laws, the the holy grail is finding a

better slope to the scaling law because

that means that as you put in more

compute, you're going to get a bigger

and bigger advantage over other AI

developers. Um, but until you've sort of

made precise what the trend is that you

see, you sort of don't know exactly what

it means to beat it and how much you

can beat it by and how to know

systematically whether you're

achieving that end. So, I think

those were kind of the tools that I

think I used. It wasn't necessarily like

literally applying say quantum field

theory to AI. I think that's uh that's a

little bit too specific.

Well, are there

specific physics heuristics like

renormalization or symmetry that came in

very handy to really keep observing this

trend or measuring it?
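
A minimal sketch, in Python, of the "is it an exponential or a power law?" question from the previous answer. This is not code from the talk. The data is synthetic, and the names linear_fit and compare_trends are hypothetical, chosen only for illustration. The idea is that a power law is a straight line in log-log coordinates, while an exponential is a straight line in log-linear coordinates, so comparing the two straight-line fits hints at which form describes a curve better. It also assumes the loss has no irreducible floor, which real training curves often have.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x. Returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot

def compare_trends(steps, losses):
    """Fit both candidate forms to the same curve and report fit quality."""
    log_loss = [math.log(v) for v in losses]
    # Power law: log(loss) is linear in log(step).
    _, power_slope, power_r2 = linear_fit([math.log(s) for s in steps], log_loss)
    # Exponential: log(loss) is linear in step itself.
    _, exp_rate, exp_r2 = linear_fit([float(s) for s in steps], log_loss)
    return {
        "power_slope": power_slope,
        "power_r2": power_r2,
        "exp_rate": exp_rate,
        "exp_r2": exp_r2,
    }

# Synthetic curve (an assumption for the demo): loss = 5 * step^(-0.3).
steps = [10, 30, 100, 300, 1000, 3000, 10000]
losses = [5.0 * s ** -0.3 for s in steps]
result = compare_trends(steps, losses)
print(result)
```

On this synthetic curve the power-law fit recovers the exponent and fits much better than the exponential fit. On real data the comparison is noisier, and a fit with an added constant term may be needed.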

Something that you'll observe if you

look at AI models is that they're big.

Neural networks are big. They have

billions, now trillions, of parameters.

That means that they're made out of big

matrices. And basically studying

approximations where you

take the limit that neural networks are

very big and specifically that the

matrices that compose neural networks

are big. That's actually been kind of

useful and that's something that

actually was a well-known approximation

in physics and in math.

That's something that's been applied.

But I think generally it's really asking

very naive dumb questions that gets you

very far. I think AI is really in a

certain sense only like maybe 10 to 15

years old in terms of the current

incarnation of how we're training AI

models. That means that it's an

incredibly new field. A lot of the most

basic questions haven't been answered

like questions of interpretability, how

AI models really work. And so I think

there's really a lot to

learn at that level rather than applying

very very fancy techniques.

Are there

specific tools in physics that you apply

for interpretability?

I would say that interpretability is a

lot more like biology. It's a lot more

like neuroscience. So I think those are

kind of the tools. There

is some more mathematics

there. But I think it's more like

trying to understand the features of the

brain. Um the benefit that you get with

AI over neuroscience is that um you can

really measure everything in AI. You

can't measure the the activity of every

neuron, every synapse in a brain, but

you can do that in AI. So there's much

much much more data for reverse

engineering how AI models work.

Now, one aspect of scaling laws:

they've held for over five orders of

magnitude, which is wild. This is a bit

of a contrarian question, but what

empirical sign would convince you that

the curves are changing, that maybe we're

getting off the curve?

I think it's a

really hard question, right? Because I

mostly use scaling laws to diagnose

whether AI training is broken or not.

Mhm.

So I think that uh once you see

something and you find it's a

very compelling trend, it becomes very

very interesting to examine

where it's failing. But I think that my

first inclination is to think if scaling

laws are failing, it's because we've

screwed up AI training in some way.

Maybe we got uh we got the architecture

of the neural network wrong or there's

some bottleneck in training that we

don't see or there's some problem with

precision in the algorithms that we're

using. So I think it would take a lot to

convince me at least that scaling was

really no longer working at the level of

these empirical laws because

so many times in my experience over the

last 5 years when it seemed like scaling

was broken it was because we were doing

it wrong.
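
One way to picture using a scaling law as a diagnostic, sketched in Python under stated assumptions. This is not Anthropic's tooling or anything shown in the talk. The function names fit_power_law and is_off_trend, the 5% tolerance, and all numbers are hypothetical. The sketch fits a power law to a few small runs, extrapolates to a larger run, and flags the larger run if its loss sits well above the extrapolated trend, which in the spirit of the answer above would be a prompt to look for a training bug first.

```python
import math

def fit_power_law(compute, loss):
    """Fit loss = a * compute**(-alpha) by least squares in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, alpha)

def is_off_trend(compute, observed_loss, a, alpha, tolerance=0.05):
    """True if the observed loss is more than `tolerance` above the trend."""
    predicted = a * compute ** (-alpha)
    return observed_loss > predicted * (1.0 + tolerance)

# Synthetic small runs (assumed values): loss falls as compute**(-0.05).
small_compute = [1e15, 1e16, 1e17, 1e18]
small_loss = [4.0 * (c / 1e15) ** -0.05 for c in small_compute]
a, alpha = fit_power_law(small_compute, small_loss)

# Two hypothetical outcomes for a larger run at 1e19 units of compute.
healthy_run_flagged = is_off_trend(1e19, 2.53, a, alpha)
suspect_run_flagged = is_off_trend(1e19, 2.90, a, alpha)
print(alpha, healthy_run_flagged, suspect_run_flagged)
```

A flag here says only that the run is off the fitted trend. It does not say whether the cause is the architecture, a training bottleneck, numerical precision, or a real change in the trend.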

Interesting. So I guess going into

something very specific that goes hand

in hand is a lot of the compute power

required to keep going on this curve.

What happens as compute becomes more

scarce? How far down do you go on

the precision ladder? Do you explore

things like FP4? Do you explore things

like ternary representations? What

are your thoughts around that?

Yeah, I

mean I think that um right now AI is

really inefficient because there's a lot

of value in AI. So um there's a lot of

value in unlocking the most capable

frontier model. Um and so companies like

Anthropic and others are moving as

quickly as we can to both make AI

training more efficient and AI inference

more efficient as well as unlocking

frontier capabilities. But a lot of the

focus really is on uh unlocking the

frontier. I think that over time as AI

becomes more and more widespread, I

think that we're going to really drive

down the cost of inference and training

dramatically from where we are right

now. I mean right now we're seeing sort

of 3x to 10x gains algorithmically and

in sort of scaling up compute um and in

uh inference efficiency per year. I

guess like the joke is that we're going

to get computers back into binary. So I

think that we will see much much lower

precision as one of the many avenues to

make inference more efficient over time.

But we're very, very

out of equilibrium with AI development

right now. AI is improving very rapidly.

Things are changing very rapidly. We

haven't fully realized the potential of

current models, but we're unlocking more

and more capabilities. So I think that

what the equilibrium situation looks

like where AI isn't changing that

quickly, I think is one where AI is

extremely inexpensive, but it's sort of

hard to know if we're even going to get

there. like AI may just keep getting

better so quickly that uh sort of

improvements in intelligence unlock

so much more and so we may continue to

focus on that rather than say getting

precision down to FP2
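
For readers unfamiliar with the low end of the precision ladder mentioned in the question: a ternary representation stores each weight as -1, 0, or +1, plus one shared scale factor. The Python sketch below is illustrative only. The absolute-mean scaling rule and the names ternary_quantize and dequantize are assumptions chosen for the example, not something described in the talk.

```python
def ternary_quantize(weights):
    """Map each weight to -1, 0, or +1 with one shared scale.

    Assumed scheme: scale by the mean absolute value, round to the
    nearest integer, then clip to the range [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0 for _ in weights], 0.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate weights from ternary codes and the scale."""
    return [c * scale for c in codes]

weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
codes, scale = ternary_quantize(weights)
approx = dequantize(codes, scale)
print(codes, scale)
```

Each code needs under two bits of storage, compared with 16 or 32 bits for the original float, at the cost of a coarse approximation of every weight.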

Which is very much the Jevons paradox:

as intelligence becomes better and

better, people are going to want it more,

not that it is driving the cost down, which

is this irony, right?

Yeah, absolutely. I mean, I think that,

yeah, that's certainly

something that we've seen, that there are

certain points where AI

becomes accessible enough. That said, um

I think as AI systems become more and

more capable um and can do more and more

of the work that that we do, it's going

to be worth it to pay for uh frontier

capabilities. I think a question

that I've always had is,

kind of, is all of the value at the

frontier or is there a lot of value with

kind of cheaper systems that aren't

quite as capable? And I think the sort

of time horizon picture is maybe one way

of thinking about this. I think that you

can do a lot of very simple bite-sized

tasks, but I think it's just much more

convenient to be able to use an AI model

that can do a very complex task end to

end rather than requiring us as humans

to sort of orchestrate a much dumber

model to break the task down into very

very small slices and put them together.

So, I do kind of expect that a lot of

the value is going to come from the most

capable models, but I might be wrong. It

it might depend and it might really

depend on the capabilities of AI

integrators to sort of leverage AI

really efficiently.

What advice would you give this audience,

where everyone is early in their

career with lots of potential, in terms

of how do you stay relevant in the

future where all these models are going

to become so awesome? What should

everyone be really good at and study

to still do really good work?

I think as

I mentioned there's a lot of value in

understanding how these models work and

being able to really efficiently

leverage them and and integrate them and

I think there's a lot of value in kind

of like building building at the

frontier. Um I don't know we could turn

it over to the audience for

questions.

Let's turn it over to the audience for

some questions.

I had a quick question on the scaling

laws. You show that a lot of the scaling

laws are like linear that like the more

we have exponential compute going up but

then like we have linear progress in uh

in the scaling loss but then on your

last slide you show that you expect then

suddenly like an exponential growth in

like how much time we save. I want to

ask you like why do you think that

suddenly on this chart we're exponential

and not linear anymore?

Thank you.

Yeah, this is a really good question and

I don't know. Um I mean the METR

finding was kind of an empirical

finding. Um the way that I tend to think

about this is that um in order to do

more and more complex, longer-horizon

tasks um what you really need is some

ability to self-correct. You need to be

able to sort of identify that, well,

you make a plan and then you

start executing the plan. But

everyone knows that our plans are kind

of worthless and uh and we encounter

reality. we get things wrong. And so I

think that a lot of what determines the

horizon length of what models can

accomplish is their ability to notice

that they're doing something wrong and

and correct it. Um, and I think that's

not sort of like a lot of bits of

information. It doesn't necessarily

require a huge change in intelligence to

sort of notice one or two more times

that you've made a mistake and how to

correct that mistake. But if you sort of

fix your mistake, maybe you sort of on

the order sort of double the horizon

length of the task because like instead

of getting stuck here, you get stuck

twice as far out. So I

think that's sort of the picture that I

have that like you can kind of unlock

longer and longer horizons with

relatively modest improvements in your

kind of ability to understand the task

and self-correct. But that just kind of

like those are just words. I think the

empirical trend is maybe the most

interesting thing. And uh maybe we can

build more detailed models for why that

trend is true, but it's sort of your

guess is as good as mine.
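
The intuition in this answer can be put into a toy model, sketched here in Python. It is an illustration only, not METR's methodology and not a model given in the talk. It assumes each step of a task independently goes wrong with some fixed probability, that the model catches and fixes a fraction of those errors, and that the first uncaught error ends the task. The name half_success_horizon and all numbers are hypothetical.

```python
import math

def half_success_horizon(step_error_rate, catch_rate):
    """Number of steps at which the chance of finishing drops to 50%.

    Toy assumptions: independent steps, and a task fails at the first
    error that is not caught and corrected.
    """
    uncaught = step_error_rate * (1.0 - catch_rate)
    # Success after n steps is (1 - uncaught)**n; solve for 0.5.
    return math.log(0.5) / math.log(1.0 - uncaught)

# Same raw error rate (10% per step), slightly better self-correction.
horizons = {c: half_success_horizon(0.10, c) for c in (0.80, 0.90, 0.95)}
for catch_rate, steps in horizons.items():
    print(f"catch rate {catch_rate:.2f}: about {steps:.0f} steps")
```

In this toy model, each time the share of uncaught errors is halved (catching 80%, then 90%, then 95% of mistakes), the horizon roughly doubles. Steady, modest gains in self-correction would then show up as exponential growth in task length, which is consistent with the picture described above but does not establish it.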

Yeah. So I also have a question over

here. Um so it's an honor. Um so

basically um in terms of um increasing

the time horizon, I feel like so my

mental model of neural networks is

very simple. If you want them to do

something, you train on such data. Um so

if you want them to um if you want to

increase the um time horizon you have to

slowly get for example verification

signals. Now um I think one way to do

this is via product. So like for example

um Claude agent and then you use the

verification signal to incrementally

improve the model. Now my question is

basically this works really nicely for

for example coding where you have a

product that is sufficiently good such

that you can deploy it and then get the

verification signal but what about other

domains like in other domains are we

just um scaling data labelers to AGI or

is there a better approach?

Yeah, it's a

good question. I mean, um, so when when

sort of skeptics ask me sort of why do I

think we will be able to sort of scale

and get something like broadly human

level AI, it's basically because of of

what you said. there is some sort of

very kind of operationally intensive

path where you just sort of build more

and more different tasks for AI models

to do that are more and more complex,

more and more long horizon and you just

sort of turn the crank and train with RL

on those more complicated

tasks. So I sort of feel like that's the

worst case for AI progress. And I mean

given the level of investment in AI and

I think the the sort of level of value

that I think is being created with AI, I

think people will do that if necessary.

That said, I think there are a lot of

ways of sort of making it simpler. The

best is to have an AI model that is

trained to oversee and supervise what

Claude, like you have Claude, say, which

you're training to be Claude, and you

have another AI model that's sort of

providing supervision and is not just

saying, did you do this incredibly

complicated task correctly? Like, did you

become a faculty member and get tenure?

That will take six or seven years. Is

that like an end-to-end task where at the

end you sort of either get tenure or not

over seven years? That's ridiculous.

That's very inefficient. But instead can

provide more detailed supervision that

says you're doing this well, you're

doing this poorly. Um I think that sort

of as we're able to use AI more and more

in that kind of way, we'll probably be

able to make training for very long

horizon tasks more efficient and I think

we're already doing this to some extent.

We'll do one last question.

Yeah, I wanted to build on top of that.

when you're basically developing like

these tasks and then training them with

RL, would you

try creating these tasks using

large language models like the tasks you

use for RL or are you still using

humans?

Great question. So I would say a mix. Um

I mean obviously we're building the

tasks as much as possible using AI to

sort of like say generate tasks with

code. we do like also uh ask humans to

create tasks. So it's basically

some mixture of those things. Um I think

that as AI gets better and better,

hopefully we're able to leverage AI more

and more, but of course the frontier of

the difficulty of these tasks also

increases. So I think humans are

still going to be involved.

Okay. Thank you.

All right. Let's give a round of

applause to Jared.

Thank you so much. Thanks.

