Ground-truth-in-the-loop
Ground-truth-in-the-loop is a basic scientific principle in the design, development and evaluation of systems.
For an AI/machine learning system, this means trustworthy training data and evaluation feedback. When planning is involved, a reliable world model is necessary. For a system with human users, human data and feedback are paramount, and human-in-the-loop is relevant and may even be a must.
Prominent AI systems like search engines and large language models are built on valuable data, from the Internet and from user feedback, which serves as a proxy for ground truth. The AlphaGo series and game-playing AI have made remarkable achievements, and a perfect game rule, i.e., a perfect model, is a core factor: it can generate high-quality or even perfect data, including game scores.
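As a rough illustration of rules acting as a perfect model, consider the toy sketch below: a tic-tac-toe environment whose rules score every game exactly, so the generated (state, outcome) data are noise-free. The environment and names here are illustrative assumptions, not taken from AlphaGo or any published system.

```python
# A minimal sketch: perfect game rules generate noise-free training data.
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if someone has won, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_game(rng):
    """Play one random game; the rules give an exact, ground-truth outcome."""
    board, players, states = [' '] * 9, 'XO', []
    for turn in range(9):
        player = players[turn % 2]
        move = rng.choice([i for i, v in enumerate(board) if v == ' '])
        board[move] = player
        states.append(''.join(board))
        if winner(board):
            return states, winner(board)
    return states, 'draw'        # board full, no winner

rng = random.Random(0)
dataset = [random_game(rng) for _ in range(1000)]
# Every (state sequence, outcome) pair is exact: the rules, not an
# approximate model, scored the game.
print(dataset[0][1])
```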
We may start building a system with a learned, approximate model or simulator. However, we should not deploy an AI system trained purely in a simulator, especially for high-stakes domains like healthcare, robotics and autonomous vehicles. When optimizing a system toward a heuristic goal, i.e., a non-ground-truth goal, even an optimal solution may go astray. We evaluate a system against ground truth for dependable performance results. A system should not evaluate itself, e.g., a student should not grade their own assignment.
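As a minimal sketch of a heuristic goal leading an optimizer astray, the toy example below fits a crude proxy reward to samples from a narrow region and then maximizes it; the optimum under the proxy looks excellent by the proxy's own account but is poor under the ground-truth reward. The reward function, sample range and all names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_reward(x):
    # Ground-truth objective, unknown to the optimizer; best value is 0 at x = 2.
    return -(x - 2.0) ** 2

# A crude proxy: a linear model fit only to noisy samples from a narrow region.
xs = rng.uniform(-1.0, 0.0, size=20)
ys = true_reward(xs) + 0.05 * rng.standard_normal(20)
proxy = np.poly1d(np.polyfit(xs, ys, deg=1))

# Optimize the proxy over a wider range than it was fit on.
grid = np.linspace(-1.0, 6.0, 701)
x_star = grid[np.argmax(proxy(grid))]   # "optimal" under the heuristic goal

print(f"proxy-optimal x:              {x_star:.2f}")
print(f"reward the proxy promises:    {proxy(x_star):.2f}")
print(f"ground-truth reward there:    {true_reward(x_star):.2f}")
print(f"best achievable ground truth: {true_reward(2.0):.2f}")
```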
When building a system without ground truth, researchers, scientists and engineers are responsible for informing non-experts of such limitations and potential issues.
For language problems, we may ask:
What is ground-truth?
How to have ground-truth?
How to design datasets/benchmarks/evaluation methods to respect ground-truth? (A sketch follows below.)
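As a minimal sketch of the third question, the snippet below scores predictions against human-verified reference answers (the ground truth) with a simple exact-match check, rather than letting the model grade its own outputs. The tiny dataset, normalization and names are hypothetical, not a real benchmark.

```python
def normalize(text: str) -> str:
    """Lower-case and strip punctuation so surface form does not matter."""
    return "".join(ch for ch in text.lower().strip() if ch.isalnum() or ch.isspace())

def exact_match(prediction: str, references: list[str]) -> bool:
    """Ground-truth check: the prediction must match a verified reference."""
    return normalize(prediction) in {normalize(r) for r in references}

# Each item carries references written or verified by humans (the ground truth).
benchmark = [
    {"question": "Capital of France?", "references": ["Paris"]},
    {"question": "2 + 2 = ?",          "references": ["4", "four"]},
]

predictions = ["paris", "5"]          # hypothetical model outputs
score = sum(
    exact_match(pred, item["references"])
    for pred, item in zip(predictions, benchmark)
) / len(benchmark)
print(f"accuracy against ground-truth references: {score:.2f}")   # 0.50
```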
Large language models are approximations, and thus are not ground truth. To build successful systems, let's have ground-truth-in-the-loop.