Reflection 2024, Guesstimation 2025
1.
RL returns.
Test time compute.
Following OpenAI o1 and o3.
How should we reflect on the following?
Achieving AGI by scaling up GPT with next token prediction.
Achieving AGI by prompt engineering, e.g., CoT and ReAct, relying on LLMs’ (emergent) capabilities.
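A minimal sketch of what "prompt engineering" means here, using zero-shot chain-of-thought (CoT): the same question, with and without the reasoning cue appended. The question is illustrative, not from any specific benchmark.

```python
# Zero-shot CoT prompting, sketched as plain string construction.
# The "Let's think step by step." cue is the zero-shot CoT trigger phrase;
# everything else (question text, Q:/A: framing) is illustrative.

def plain_prompt(question: str) -> str:
    return f"Q: {question}\nA:"

def cot_prompt(question: str) -> str:
    # Appending the cue nudges the model to emit intermediate reasoning
    # before the final answer, relying on its emergent capabilities.
    return f"Q: {question}\nA: Let's think step by step."

q = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
     "than the ball. How much does the ball cost?")
print(plain_prompt(q))
print(cot_prompt(q))
```

The point of the technique is that nothing about the model changes; only the input string does.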
Why was there so little talk of RL in 2023?
(RLHF is inverse RL, or imitation learning, not “true” RL.)
2.
LLM progress has hit a wall.
Pre-training as we know it will end. Ilya Sutskever @ NeurIPS 2024
3.
2025 may be the year for naked swimmers.
Only when the tide goes out do you discover who's been swimming naked. Warren Buffett
AGI
Autonomous agents
Autonomous software engineering
… …
In the 3rd year of the LLM era, revenues may weigh more than visions.
Of course, it is great if you can find applications for LLMs that
do not care about mistakes, or
keep a human in the loop.
Test time compute may not help in general.
It may need a precise objective / reward function and reliable signals.
Maths and coding may be the easiest domains in the LLM era.
Yet even they are very hard.
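Why test time compute needs reliable signals can be seen in a toy best-of-n simulation (all numbers invented): sample n candidate answers and keep the one a verifier scores highest. With a reliable verifier, extra samples lift accuracy well above the base rate; with a coin-flip verifier, the extra compute buys nothing.

```python
import random

def best_of_n(n: int, p_correct: float, verifier_accuracy: float,
              rng: random.Random) -> bool:
    """Sample n candidates, each correct with prob p_correct; the verifier
    labels each candidate correctly with prob verifier_accuracy. Return
    whether the candidate with the highest verifier score is correct."""
    candidates = [rng.random() < p_correct for _ in range(n)]
    # Verifier score: the true label with prob verifier_accuracy, else flipped.
    scores = [c if rng.random() < verifier_accuracy else not c
              for c in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best]

def success_rate(n, p_correct, verifier_accuracy, trials=20000, seed=0):
    rng = random.Random(seed)
    return sum(best_of_n(n, p_correct, verifier_accuracy, rng)
               for _ in range(trials)) / trials

# Reliable verifier: 16 samples push accuracy far above the 0.3 base rate.
print(success_rate(16, 0.3, 0.95))
# Unreliable verifier: accuracy stays near the base rate despite 16x compute.
print(success_rate(16, 0.3, 0.5))
```

This is why maths and coding, where answers can be checked, are the natural first targets.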
4.
In 2025, test time compute may hit a wall.
People are building an information perpetual motion machine, by self-generation of data, self-evaluation of results, and self-improvement of models, all based on imperfect LLMs.
This is how many people are working with LLMs.
The same applies to test time compute, absent reliable external signals.
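The perpetual-motion point can be made concrete with a toy model (all numbers invented) of the self-generation / self-evaluation / self-improvement loop: the model keeps a generated sample when its own evaluator accepts it, and the next model's accuracy is taken to be the accuracy of the kept data. The loop gains information only if the evaluator can separate right from wrong independently of the generator's blind spots.

```python
def next_accuracy(acc: float, p_keep_correct: float, p_keep_wrong: float) -> float:
    """Accuracy of the kept (self-filtered) data, computed exactly."""
    kept_correct = acc * p_keep_correct          # correct samples that pass the filter
    kept_wrong = (1.0 - acc) * p_keep_wrong      # wrong samples that pass the filter
    return kept_correct / (kept_correct + kept_wrong)

def run(rounds: int, p_keep_correct: float, p_keep_wrong: float,
        acc: float = 0.7) -> list[float]:
    """Iterate the self-improvement loop, returning the accuracy trajectory."""
    history = [acc]
    for _ in range(rounds):
        acc = next_accuracy(acc, p_keep_correct, p_keep_wrong)
        history.append(acc)
    return history

# Reliable external signal: accepts 95% of correct and 5% of wrong samples.
# Accuracy climbs toward 1.0 round after round.
print(run(3, p_keep_correct=0.95, p_keep_wrong=0.05))
# Self-evaluator sharing the generator's blind spots: it accepts its own
# mistakes as readily as its successes, and the loop goes nowhere.
print(run(3, p_keep_correct=0.90, p_keep_wrong=0.90))
```

No motion without an external source of information: self-evaluation that merely mirrors the generator adds nothing.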
5.
Pre-training is only at its very beginning.
GPT with next token prediction may have hit a wall.
Lots of alternatives to explore though:
Alternatives to Transformers
Alternatives to GPT
Alternatives to next token prediction
Alternatives to self-supervised learning
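To make concrete what these alternatives would replace, here is a minimal sketch of the next token prediction objective that GPT-style pre-training optimizes: average cross-entropy of the model's predicted distribution over the next token. The toy "model" is a hand-specified bigram table (values invented), standing in for a Transformer.

```python
import math

def nll_next_token(tokens: list[str], probs: dict) -> float:
    """Average negative log-likelihood of each token given its predecessor.
    probs[prev][tok] is the model's probability of tok following prev."""
    total = 0.0
    for prev, tok in zip(tokens, tokens[1:]):
        total += -math.log(probs[prev][tok])
    return total / (len(tokens) - 1)

# Toy corpus and a hand-specified bigram "model" (values invented).
tokens = ["a", "b", "a", "b"]
probs = {"a": {"a": 0.1, "b": 0.9},
         "b": {"a": 0.8, "b": 0.2}}
print(nll_next_token(tokens, probs))
```

Every alternative listed above changes some part of this recipe: the architecture computing `probs`, the prediction target, or the loss itself.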
6.
RL may return fully. Or not.
People have finally discovered RL’s prowess for LLMs.
With test time compute.
Why not one step further?
Pre-training with RL?
Esp. for agents!
This requires resources.
This may require a paradigm shift: from generalist to specialist.
This may require another paradigm shift: from large to small.