LLMs: Agent? AI? Computer? Business?
Since the launch of ChatGPT in November 2022, LLMs have been a hot topic among people in both industry and academia, and indeed among people from virtually all fields and backgrounds.
21 months have passed. How is it going?
How about AutoGPT? How about AGI?
Has “the amount of intelligence in the universe” doubled?
What are LLMs’ achievements?
Most, if not all, people agree that LLMs are a breakthrough for NLP, e.g., translation, summarization, and writing. LLMs have revolutionized chatbots. LLMs are great for user interfaces.
However, there is much debate about how much LLMs can help with AI, in particular with planning and agency.
LLMs for planning and agency
If I could recommend only one source of information, I would vote for Prof. Subbarao Kambhampati: his tweets, his talks, and his papers.
Prof. Kambhampati is on the critics’ side, with a constructive position: planning in LLM-modulo frameworks. A verifier is required to guarantee the correctness of an LLM’s outputs; the LLM serves as a good sampler. FunSearch, AlphaGeometry, and AlphaProof are good examples. People should probably follow this approach, rather than treating an LLM as an oracle.
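As a rough illustration of the generate-and-verify pattern behind LLM-modulo, here is a minimal sketch; llm_sample and verify are hypothetical placeholders, not any particular framework’s API.

```python
# Minimal generate-and-verify loop: the LLM proposes candidates, a sound external
# verifier (symbolic solver, unit tests, proof checker, ...) decides correctness.

def llm_sample(problem: str) -> str:
    """Hypothetical call to an LLM returning one candidate solution."""
    raise NotImplementedError  # wrap a real LLM API here

def verify(problem: str, candidate: str) -> bool:
    """Hypothetical sound verifier for the domain at hand."""
    raise NotImplementedError

def llm_modulo_solve(problem: str, budget: int = 20) -> str | None:
    """Sample until the verifier accepts a candidate or the budget is exhausted."""
    for _ in range(budget):
        candidate = llm_sample(problem)
        if verify(problem, candidate):
            return candidate  # correctness is guaranteed by the verifier, not the LLM
    return None  # no verified solution within budget
```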
Richard Socher discusses the difficulty of getting from 95% correctness on each individual step to 99% correctness over a multi-step task, since per-step errors compound across steps. He also gives a constructive recipe, which is basically a heuristic, manual approach to “autonomous AI agents”.
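The back-of-the-envelope arithmetic (a sketch, assuming steps are independent): if each step is correct with probability p, a chain of n steps succeeds with probability about p^n.

```python
# Per-step accuracy p compounded over n independent steps.
for p in (0.95, 0.99):
    for n in (5, 10, 20):
        print(f"p={p}, n={n}: chain success ~= {p**n:.2f}")
# p=0.95: ~0.77 (n=5), ~0.60 (n=10), ~0.36 (n=20)
# p=0.99: ~0.95 (n=5), ~0.90 (n=10), ~0.82 (n=20)
```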
I intentionally do not use the term “reasoning”; one factor is that it is not clear what “reasoning” means in the context of LLMs.
LLMs vs AI
AI is still in an early stage, as discussed in papers like Levels of AGI for Operationalizing Progress on the Path to AGI and Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems.
Many people like to talk about “scaling laws” and refer to Rich Sutton’s “The Bitter Lesson” as support. People may want to be aware of Rich Sutton’s opinion on LLMs: he has called current LLMs “disappointing”, “superficial”, an “enormous distraction”, “sucking the oxygen out of the room”.
By the way, many people treat “scaling laws” as a first principle. Then how about “correlation does not imply causation”? Should that be a first principle too?
At ACL 2024, Emily M. Bender gave a Presidential Address titled ACL Is Not an AI Conference, which certainly stirred up the community; see, e.g., the discussions by Prof. Subbarao Kambhampati and Prof. Yoav Goldberg.
There are some fundamental issues with LLMs / NLP. Most NLP problems do not have objective objectives: performance metrics like BLEU and ROUGE are heuristic, best-effort measures. A machine learning formulation requires an objective.
Moreover, from very basic optimization knowledge: 1) multi-objective optimization usually cannot optimize all objectives at once; 2) the more numerous and the tighter the constraints, the less chance there is of finding a feasible solution. AGI is about solving infinitely many problems, and thus comes with infinitely many objectives and infinitely many constraints. It should be clear that AGI with a single model is likely the wrong goal.
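A toy sketch of point 1): with two objectives f1(x) = x and f2(x) = 1 - x on [0, 1], no single x minimizes both; every choice trades one objective off against the other.

```python
# Toy two-objective example on [0, 1]: f1(x) = x, f2(x) = 1 - x.
xs = [i / 10 for i in range(11)]
f1 = lambda x: x        # minimized at x = 0
f2 = lambda x: 1 - x    # minimized at x = 1
print(min(xs, key=f1), min(xs, key=f2))  # 0.0 1.0: no single x minimizes both
```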
Let’s talk about some of the most basic issues.
Many people have heard of the notorious example: which is larger, 9.9 or 9.11?
OpenAI may have “fixed” it with more training data. Is it really fixed though?
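The comparison itself is one line of arithmetic; the small sketch below only contrasts the numeric reading with a version-number-style reading of the same tokens (an illustration of the ambiguity, not a claim about how any particular model fails).

```python
# Read as numbers, 9.9 is larger.
print(9.9 > 9.11)        # True

# Read as version or section numbers, 9.11 comes after 9.9.
print((9, 11) > (9, 9))  # True: "version 9.11" is later than "version 9.9"
```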
The abstract of the Tree of Thoughts paper states: “in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%”. It is a NeurIPS 2023 paper with 1200+ citations.
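For context, Game of 24 asks for an arithmetic expression over four given numbers that evaluates to 24, and it is exactly solvable by brute-force search in a few lines; a minimal sketch (the solve24 helper below is illustrative, not the paper’s setup).

```python
from fractions import Fraction
from itertools import permutations

def solve24(numbers, target=24):
    """Exhaustive search over number pairings and operators; returns an expression or None."""
    def search(items):  # items: list of (value, expression) pairs
        if len(items) == 1:
            value, expr = items[0]
            return expr if value == target else None
        for (a, ea), (b, eb) in permutations(items, 2):
            rest = list(items)
            rest.remove((a, ea))
            rest.remove((b, eb))
            candidates = [(a + b, f"({ea}+{eb})"),
                          (a - b, f"({ea}-{eb})"),
                          (a * b, f"({ea}*{eb})")]
            if b != 0:
                candidates.append((a / b, f"({ea}/{eb})"))
            for value, expr in candidates:
                result = search(rest + [(value, expr)])
                if result is not None:
                    return result
        return None
    return search([(Fraction(n), str(n)) for n in numbers])

print(solve24([4, 9, 10, 13]))  # prints one valid expression if a solution exists, else None
```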
It is definitely interesting to keep exploring whether neural networks can solve basic arithmetic and algorithmic problems like “9.9 vs 9.11” and Game of 24. However, before these are thoroughly and perfectly solved, should we call the method AI?
LLMs as Computer
Prof. Dale Schuurmans proved that Memory Augmented Large Language Models are Computationally Universal, that is, Turing-complete.
Is this AGI?
Most programming languages, like C++, Java, and Python, are Turing-complete. People still need to invent algorithms, like AlphaGo and ChatGPT, for particular tasks.
Even if Transformers + memory are Turing-complete, people still need first a programming language and then algorithms to operate such a “computer”. People are exploring techniques like prompt engineering, retrieval-augmented generation (RAG), and long context windows. It is not clear whether these would be the right “programming language” for the Transformers + memory “computer”.
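For concreteness, these techniques mostly amount to assembling the prompt that the model conditions on. Below is a minimal retrieval-augmented sketch; llm is a hypothetical placeholder, and the keyword-overlap retriever stands in for a real embedding-based index.

```python
# A minimal RAG-style "program" for an LLM: retrieve relevant text, paste it into the
# prompt, then ask the model. Nothing here is specific to any real library.

def llm(prompt: str) -> str:
    """Hypothetical call to any chat/completion API."""
    raise NotImplementedError

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval; real systems use embeddings and a vector index."""
    def score(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(documents, key=score, reverse=True)[:k]

def rag_answer(question: str, documents: list[str]) -> str:
    context = "\n".join(retrieve(question, documents))
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    return llm(prompt)
```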
Prof. Dale Schuurmans, a co-author of the chain-of-thought (CoT) paper, acknowledged in a talk (quoting his slides): adding more demonstrations is likely helpful, but there are diminishing returns, no out-of-distribution generalization, and it does not accurately capture the implicit algorithm.
Can such a “computer” thoroughly and perfectly solve basic arithmetic problems? There is no affirmative answer yet. However, there are already many papers discussing LLM reasoning, planning, and agents, many of them with many citations.
Business opportunities
There are lots of discussions about business opportunities with LLMs and generative AI.
Here are some recent examples with a non-optimistic, or realistic, tone:
GEN AI: TOO MUCH SPEND, TOO LITTLE BENEFIT?
AI companies are pivoting from creating gods to building products. Good.
Top researchers like Terence Tao and Nicholas Carlini have talked about the usefulness of LLMs. They act as “human verifiers” for the LLMs, since LLMs may make mistakes, or may only present samples as partial solutions or ideas. Good researchers and engineers can handle LLMs’ mistakes. But what about the large number of average users? Until LLMs can guarantee correctness or provide full solutions, should LLM applications require all users to be “human verifiers”, or to be able to handle LLMs’ mistakes and incompetence?
Let’s see which games this game-changing technique, LLMs or generative AI, will actually change, and by how much, in 6, 12, or 18 months.
Epilogue
LLMs are a breakthrough for NLP.
LLMs are good samplers. LLMs require verifiers, since LLMs may generate noise.
Let’s seek reliable signals, rather than noise.