Can AI Think? Can AI Reason?
Do Large Language Models have Cognitive Abilities?
Another interesting debate is going on right now about whether LLMs can "think" or not.
It ties into the larger debate about whether AI is, or can become, intelligent. And therefore, inevitably, about what intelligence is. And whether it is innate only to us humans, or not even exclusive to mammals.
Because dolphins are intelligent, and so are crows and octopuses, leaving us with the realization that what makes us human is not intelligence, just as we humans are not the only social animals.
So we have to look into the details and maybe even overcome some preconceptions. Yes, once again the discussion about AI and AGI moves us beyond the status quo in our models, and once we do that, everything seems to make a little more sense… unless it maybe is not about making sense but about "everything everywhere all at once".
The debate is interesting insofar as the question of whether LLMs can "think" or not is framed in terms of System 1 and System 2 thinking, which we all know from Kahneman's seminal work "Thinking, Fast and Slow."
And of course, I picked it up on Twitter. Since I am no AI researcher, I invited my GPT-4 assistant to explain to me what it's all about and help me better understand.
Do LLMs have Cognitive Abilities?
The debate between Geoffrey Hinton and Yann LeCun on whether Large Language Models (LLMs) like ChatGPT can "think" or understand what they are saying highlights a fundamental divide in perceptions of AI capabilities.
Geoffrey Hinton suggests that LLMs might indeed have a form of understanding, indicating a more optimistic view of their cognitive abilities. On the other hand, Yann LeCun argues that while LLMs possess some level of understanding, it is quite limited and superficial. He points out that LLMs often make mistakes and confabulate, demonstrating a lack of common sense understanding.
LeCun emphasizes the need for AI systems to have internal models of the world, allowing them to predict consequences and plan actions. He notes that current Auto-Regressive LLMs, like ChatGPT, lack this ability. He rates their understanding and planning abilities as lower than those of a cat, highlighting their limitations in understanding the physical world and in planning.
LeCun acknowledges that LLMs can accumulate and retrieve large amounts of textual knowledge, but this is still only an approximation of understanding. He contrasts this with a human child's ability to learn and perform physical tasks, like clearing a dinner table, to illustrate the significant gap between current AI and human-level intelligence.
This discussion underscores the ongoing challenges and complexities in the field of AI, particularly in achieving a level of understanding and intelligence comparable to humans.
Vinod Khosla then adds his perspective to the debate. Not a researcher himself, he further enriches the discussion, as his position aligns most closely with what I encounter when debating AGI, intelligence, and the capabilities of LLMs in terms of reasoning and understanding.
Khosla posits that reasoning might emerge as an inherent behavior in LLMs as they evolve. He suggests that with each new iteration of these models, we might see a deeper level of reasoning, despite the knowledge being distributed across billions of parameters.
Yann LeCun, however, challenges this idea. He defines reasoning as a process beyond the capacity of a system that generates a finite number of tokens with a neural network with a fixed number of layers. According to LeCun, the structural limitations of current LLMs inhibit the kind of reasoning he describes.
Geoffrey Hinton responds to LeCun by questioning the basis of his claim. Hinton suggests that LeCun's view might be rooted in the belief that the errors of an autoregressive model would increase with longer outputs. Hinton refutes this by pointing out that the capacity for models to correct their own outputs could mitigate such an increase in errors.
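To make the disagreement concrete, here is a back-of-the-envelope toy (my own illustration, not a model either researcher proposed): if every generated token were independently wrong with some small probability, error-free long outputs would become exponentially unlikely, which is the worry Hinton attributes to LeCun. A self-correction mechanism that catches most errors changes the picture dramatically, which is Hinton's counterpoint.

```python
# Toy arithmetic sketch, assuming independent per-token errors (a strong
# simplification): how likely is an n-token output to contain no error?
def p_error_free(n: int, eps: float) -> float:
    """Probability that none of n tokens is wrong, if each errs with prob eps."""
    return (1 - eps) ** n

def p_error_free_with_correction(n: int, eps: float, p_fix: float) -> float:
    """Same, but each error is caught and corrected with probability p_fix."""
    residual = eps * (1 - p_fix)  # errors surviving self-correction
    return (1 - residual) ** n

print(p_error_free(500, 0.01))                       # ~0.0066: collapse
print(p_error_free_with_correction(500, 0.01, 0.9))  # ~0.61: mostly recovered
```

Even a 90%-effective correction step turns an almost-certainly-flawed 500-token output into a coin flip in this simplistic model, which is the intuition behind Hinton's rebuttal.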
Human-like Reasoning and Understanding
The debate continues with Vinod Khosla and Yann LeCun delving deeper into the concept of reasoning in the context of Large Language Models.
Khosla challenges the definition of reasoning by proposing a more practical benchmark: whether an LLM can surpass the median TED audience member in a reasoning-based Turing test. This perspective shifts the focus from theoretical capabilities to practical, demonstrable intelligence in a real-world context.
In response, LeCun elaborates on the limitations of LLMs in terms of reasoning. He once more points out that LLMs operate with a fixed amount of computation per token, limiting their ability to devote extensive time and effort to solving complex problems. He compares this to the fast, subconscious decision-making process in humans, known as “System 1 Thinking”.
LeCun argues that true reasoning and planning, akin to the human conscious and deliberate "System 2" process, would allow a system to engage in iterative inference, dedicating potentially unlimited time to finding solutions. He acknowledges that some AI systems, like those used in game playing or robotics, have planning abilities, but regards these as still limited.
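The contrast LeCun draws can be sketched in a few lines of Python (a toy of my own construction, not his proposal): a "System 1"-style answer spends the same fixed amount of computation on every instance, while a "System 2"-style solver iterates for as long as the problem demands, until its answer actually verifies.

```python
# Toy contrast: fixed-compute answering vs. iterate-until-verified answering,
# using square roots as a stand-in problem.
def sqrt_fixed_compute(x: float) -> float:
    """One cheap step, identical cost for every input -- accuracy is whatever it is."""
    return x / 2 + 0.5  # a crude fixed guess

def sqrt_iterative(x: float, tol: float = 1e-10) -> float:
    """Newton's method: refines until the result checks out, however long that takes."""
    guess = x / 2 + 0.5
    while abs(guess * guess - x) > tol:
        guess = (guess + x / guess) / 2
    return guess

print(round(sqrt_fixed_compute(100), 3))  # 50.5 -- far off, but instant
print(round(sqrt_iterative(100), 3))      # 10.0 -- correct, cost scales with difficulty
```

The point of the analogy is that an LLM generating one token is structurally on the left-hand side: it cannot decide to loop longer on a hard instance.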
A key component, according to LeCun, is the development of a "world model" – a subsystem capable of predicting the consequences of actions over time. Such a model would enable an AI system to plan a sequence of actions to achieve a goal. However, he notes that building and training such world models is a largely unsolved problem in AI.
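As a toy illustration of what "planning with a world model" means (my own sketch on a hand-coded grid world; nothing like the learned world models LeCun has in mind), the planner below rolls candidate action sequences through a predictive model and keeps the first one whose predicted end state matches the goal:

```python
from itertools import product

def world_model(state, action):
    """Predict the next (x, y) grid position resulting from a movement action."""
    moves = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
    dx, dy = moves[action]
    return (state[0] + dx, state[1] + dy)

def plan(start, goal, horizon=4):
    """Brute-force planning: simulate every action sequence up to `horizon` steps
    inside the world model and return the first one predicted to reach the goal."""
    for n in range(1, horizon + 1):
        for seq in product(["up", "down", "left", "right"], repeat=n):
            state = start
            for action in seq:
                state = world_model(state, action)  # imagined, not executed
            if state == goal:
                return list(seq)  # shortest plan found first
    return None

print(plan((0, 0), (2, 1)))  # a 3-step plan, e.g. ['up', 'right', 'right']
```

Here the hard part is trivially given: `world_model` is exact and hand-written. LeCun's point is that learning such a predictive model for the messy physical world, and searching it efficiently, is the largely unsolved part.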
LeCun also highlights the challenge of hierarchical planning – decomposing complex objectives into a sequence of sub-objectives, something humans and animals do effortlessly. He states that this capability is still beyond the reach of current AI systems, underscoring the gap between human-like reasoning and the capabilities of present-day AI.
This discussion illustrates the complexities involved in advancing AI towards a level of reasoning and understanding comparable to humans, and the significant challenges that still lie ahead in this endeavor.
Computational Capacity and Limited Systems 1 Thinking
Subbarao Kambhampati contributes to the discussion by focusing on the inherent limitations of LLMs in terms of computational complexity and reasoning.
Kambhampati emphasizes that the fixed computational capacity per token in standard LLMs means that these models do not take into account the inherent complexity of a problem when generating solutions. This leads to a limitation in the accuracy and relevance of the solutions they propose.
He then connects this limitation to the concept of System 1 and System 2 thinking, as described in Daniel Kahneman's "Thinking, Fast and Slow". Kambhampati asserts that you cannot transform the fast, intuitive System 1 type of thinking into the slow, deliberate System 2 type simply by slowing down the process. This highlights a fundamental difference in the nature of these two systems of thought.
Kambhampati humorously illustrates this point with the old joke about the improbability of two bombs being on the same plane, a classic example of confused probabilistic reasoning. He argues against the notion that simply slowing down token generation in LLMs will increase their expressiveness or reasoning capabilities.
He mentions the Chain of Thought (CoT) approach, where a human prompter carefully constructs a sequence to simulate reasoning. However, he also points out that this is often more about converting reasoning into a form of retrieval, with no guarantees of correctness.
He suggests two methods to guide LLMs towards better solutions:
(a) pre-training with complex derivational data from humans, or
(b) increasing task-relevant diversity of guesses through iterative prompting strategies, akin to what is done in Chain of Thought/Tree of Thought/Graph of Thought approaches.
However, Subbarao Kambhampati concludes that the final guarantee of correctness must come from an external verifier, as LLMs inherently lack the capability to guarantee the correctness of their outputs.
So, what have we learned so far?
The discussion kicked off between Hinton and LeCun underscores the ongoing challenges and complexities in achieving a level of understanding and intelligence comparable to humans. It highlights the varying opinions among leading AI experts on the potential for LLMs to actually achieve higher levels of reasoning and understanding.
Khosla adds interesting thoughts on practical applications and on whether being able to convince what is considered "a smart audience" is enough for a system to be recognized as intelligent.
While some see significant limitations due to the structural and functional constraints of these models, others anticipate an evolution towards more sophisticated cognitive abilities.
LeCun, and especially Kambhampati, then add welcome depth to the ongoing debate. Beyond highlighting the fundamental limitations of current LLMs in replicating human-like reasoning and understanding, Kambhampati insists on the need for external validation of their outputs.
"The fact that standard LLMs produce their answers with a fixed amount of computation per token is another way of understanding why the inherent computational complexity of a problem has no bearing whatsoever on how LLMs guess solutions for them and with what accuracy… (as I…" https://t.co/zS4ZuOwDJG
— Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) (@rao2z), November 27, 2023