Rich Sutton on Intelligence, LLMs, Scaling Laws
Rich gave a talk about a perspective on intelligence.
The path to intelligent agents runs through reinforcement learning, not through, for example, large language models.
In reinforcement learning, each intelligent agent has its own goal.
Different agents have different goals.
Flourishing comes from decentralized cooperation.
Watch the talk for more.
A perspective on intelligence.
Rich joined a panel discussion, talking about DeepSeek, scaling laws, and AGI.
Progress will come from new ideas, at least as much as, and probably more than, from access to more energy and more computation.
We've been fooling ourselves with the whole idea of a scaling law.
But they are not doing delayed rewards and long-term outcomes and all those potentially sophisticated things that you would need to do real AGI, anything close to AGI.
5:37-6:55
I think the main thing is that DeepSeek is sort of putting the lie to the idea that all we need is computation, that algorithms are not super important. I've always felt that algorithms are super important, and the reinforcement learning part is part of that. Yeah, they have some new ways of using reinforcement learning; they are just doing some standard innovation. I will say it again: they are putting the lie to the idea that all we need is computation, that every company should just be working hard to obtain massive amounts of energy and computation, and that that is going to decide the winner. Progress will come from new ideas, at least as much as, and probably more than, from access to more energy and more computation. I mean, we've been fooling ourselves with the whole idea of a scaling law. The scaling law says that if you get more computation, your performance will improve. I don't know how people get away with just calling something a law when it is hardly even an empirical generalization of any validity. Maybe it lasted for two months or something, and it becomes a law for the field we are in.
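For reference, the "scaling law" being criticized here is usually stated as an empirical power law relating test loss to training compute. A common form, following Kaplan et al. (2020), is sketched below; the constants are fitted to observed training runs, not derived from anything.

```latex
% Empirical compute scaling law in the commonly cited form:
% test loss L falls as a power law in training compute C.
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
% C_c and \alpha_C are fitted constants. The "law" is a curve fit
% over the compute range actually observed, not a guarantee beyond it.
```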
30:30
We expect sort of a fast trajectory, a trajectory where we have small breakthroughs like this one along the way. The big question is whether you can push to AGI with generative AI as your primary paradigm, with generative AI and large language models as your primary paradigm. I've always been very skeptical of this, and I don't think this really changes that. I also want to say that you are right on to point out the degrading of the benchmarks and their increasing divergence from actual utility. And this could lead to disenchantment or disappointment at some point.
31:23
And I also want to take a moment, while I have the floor a little bit, to get back to the reinforcement learning part. What struck me about the reinforcement learning part, reading through the paper, is that it is very diverse. They use it in many different places, in many different ways. It is like a general tool for training and shaping the model, any time you have a feedback mechanism which is a little bit imprecise, imprecise as opposed to supervised fine-tuning, where someone goes out and says "this is the right action." For lots of feedback mechanisms you can't say that. But you can say, well, this one is better than that one; you can rank them, or rate them. And that is when you want to use reinforcement learning.
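To make the ranking-based feedback concrete: a standard way to turn "this one is better than that one" into a training signal is a Bradley-Terry-style pairwise preference loss over a learned scoring model, as in RLHF-style reward modeling. A minimal sketch, purely illustrative and not DeepSeek's actual code:

```python
import math

def pairwise_preference_loss(score_preferred, score_other):
    """Bradley-Terry style loss: push the preferred response's score
    above the other's. Only a ranking is needed, not a 'right answer'
    as in supervised fine-tuning."""
    # P(preferred beats other) = sigmoid(score_preferred - score_other)
    p = 1.0 / (1.0 + math.exp(-(score_preferred - score_other)))
    return -math.log(p)  # small when the preferred score is clearly higher

# Hypothetical scores a reward model assigns two candidate responses:
print(pairwise_preference_loss(2.0, 0.5))  # small loss: ranking respected
print(pairwise_preference_loss(0.5, 2.0))  # large loss: ranking violated
```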
32:15
As someone who has been doing reinforcement learning for a long time, I am a little bit disappointed that they are not doing the full generality of delayed reward. All the reinforcement learning … I have tried different things, and each one worked out a certain amount, and I can learn from this. I can rank them and rate them. And if I am using reinforcement learning methods, I can learn from that, whereas you couldn't learn with normal supervised learning methods. But they are not doing delayed rewards and long-term outcomes and all those potentially sophisticated things that you would need to do real AGI, anything close to AGI.
The above is from: DeepSeek (The Derby Mill Series ep 02), Jan 30
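The "full generality of delayed reward" in that last passage refers to credit assignment over long horizons: an action is evaluated by rewards that may only arrive many steps later. A minimal sketch of the standard discounted-return computation, purely illustrative:

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for each step of an episode.
    With delayed reward, early actions get credit for outcomes that
    only show up at the end, unlike per-step preference feedback."""
    returns = [0.0] * len(rewards)
    future = 0.0
    for t in reversed(range(len(rewards))):
        future = rewards[t] + gamma * future
        returns[t] = future
    return returns

# All reward arrives at the final step, yet every earlier step
# receives a discounted share of the credit:
print(discounted_returns([0.0, 0.0, 0.0, 1.0]))
# [0.970299, 0.9801, 0.99, 1.0]
```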