Seeking reliable signal
Seeking signal, especially reliable signal, and avoiding noise is universal.
Here are some examples of reliable signals.
In AlphaGo, game scores are a perfect signal: the outcome of a game of Go is unambiguous.
First principles and axioms.
There are tools like proof assistants, for example Lean, and solvers based on satisfiability modulo theories (SMT).
Physical laws, for example, Einstein's Theory of Special Relativity.
Reliable signals are ground truth.
Easier said than done.
People may mistake noise for signal.
Where does the confusion arise, then?
First example: computer code.
Code is everywhere.
For code evaluation, people usually use a code interpreter or test cases.
This is correct, to some extent, but not enough.
Running successfully in a code interpreter means only that this particular run completed.
Passing certain test cases means only that the code passes those particular cases.
That is it.
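As a minimal sketch (the function and its tests are hypothetical, written only for this illustration), a buggy function can pass every test case we happened to write:

```python
# Hypothetical example: a buggy absolute-value function that passes its tests.
def absolute(x):
    return x  # bug: the negative branch is missing

# All of these test cases pass, yet the function is wrong for every negative input.
assert absolute(0) == 0
assert absolute(3) == 3
assert absolute(10) == 10

print(absolute(-2))  # prints -2, not 2: passing the tests proved very little
```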
To guarantee that a piece of code satisfies the specification, formal verification is required.
Program verification is a general source of reliable signals.
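As a minimal sketch of what such a signal looks like, here is a toy Lean 4 proof (assuming a recent toolchain where the `omega` tactic is available; the definition and theorem are made up for this illustration). If the kernel accepts the proof, the property holds for every natural number, not just for the inputs we happened to test:

```lean
-- Toy example: a definition together with a machine-checked specification.
def double (n : Nat) : Nat := n + n

-- Acceptance by the kernel is a reliable signal:
-- `double n = 2 * n` holds for all n, not merely for sampled test inputs.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```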
Second example: large language models (LLMs).
No current LLM is perfect.
Everyone agrees on that.
Yet many people use LLMs for self-generation of data, self-evaluation of results, self-improvement of models, and so on, for almost everything.
Imperfect LLMs will not generate reliable signals.
Why not?
Neural networks are function approximators, and in LLMs the approximation is likely far from perfect.
LLMs are therefore samplers from imperfect distributions.
Reliable verifiers are required to guarantee the correctness of samples from LLMs.
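As a minimal sketch of this sample-then-verify pattern (all names here are hypothetical, and the "model" is just a random stand-in), an imperfect generator proposes candidates and only those accepted by a reliable verifier are kept:

```python
import random

def sample_candidates(n):
    """Stand-in for an imperfect model: proposes square roots of 1764, often wrongly."""
    return [random.choice([40, 41, 42, 43, 44]) for _ in range(n)]

def verify(c):
    """Reliable verifier: accept a candidate only if it satisfies the specification."""
    return c * c == 1764

# Only candidates that pass the verifier become signal; the rest are discarded as noise.
verified = [c for c in sample_candidates(20) if verify(c)]
print("verified samples:", verified)  # only 42 can survive the check
```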
FunSearch, AlphaGeometry and AlphaProof are examples with reliable verifiers.
In AlphaProof, an LLM (a fine-tuned Gemini) serves as the sampler, a verifier (Lean) checks the candidates, and an RL algorithm (in the style of AlphaZero) optimizes decision making. Together with AlphaGeometry 2, AlphaProof solved 4 out of 6 IMO problems, equivalent to a silver medal.
LLMs may generate noise; a verifier is required to guarantee that LLMs provide reliable signals.
Professor Terence Tao agrees with this point.
How about simulation?
If the simulator is perfect, it provides reliable feedback. Otherwise, we need to bridge the simulation-to-reality gap.
How about world models, as discussed by Rich Sutton and Yann LeCun?
The argument is similar to simulation.
An imperfect world model makes mistakes.
Third example: human involvement.
Assessments made by humans are subjective, not objective.
For example: enjoyment from playing a game, or happiness from watching a comedy.
Actually, most NLP problems do not have objective objectives.
For example: translation, summarization, and sentiment analysis.
There are performance metrics, like the BLEU score.
But these are heuristics, or best-effort proxies.
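As a toy illustration of why BLEU is only a heuristic (the sentences are made up for this sketch, and it assumes the nltk package is installed), a candidate that flips the meaning but copies many words tends to outscore a faithful paraphrase:

```python
from nltk.translate.bleu_score import sentence_bleu

reference = "the cat is not on the mat".split()
faithful_paraphrase = "the feline is not upon the mat".split()   # meaning preserved
meaning_flipped = "the cat is on the mat".split()                # meaning reversed

weights = (0.5, 0.5)  # bigram BLEU against a single reference
print("faithful paraphrase:", sentence_bleu([reference], faithful_paraphrase, weights=weights))
print("meaning flipped:    ", sentence_bleu([reference], meaning_flipped, weights=weights))
# The meaning-flipped candidate tends to score higher, despite saying the opposite.
```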
In such cases, feedback from humans, ideally experts, provides reliable signals, or ground truth.
AI, including LLMs, cannot replace humans yet, since AI is still a rough approximation of humans.
To recap, reliable signals include game scores, physical laws, first principles and axioms, program verification, reliable verifiers, and, when humans are involved, human feedback.
Noise comes from imperfect evaluation methods, e.g., checking code correctness only with an interpreter or test cases, and from imperfect models, e.g., current LLMs.
Let’s seek reliable signal and avoid noise, for the prosperity of AI.