Large Language Models (LLMs) like ChatGPT have become increasingly sophisticated in recent years, raising concerns about how far they can be trusted to make decisions. A study by researchers at UCL found that these models often gave different answers when presented with the same reasoning test more than once, and that they did not improve when given additional context. The results highlight the importance of understanding how these AIs "think" before entrusting them with tasks that involve decision-making.
The study put seven LLMs through a battery of cognitive psychology tests to evaluate their capacity for rational reasoning. The models' answers were frequently irrational: responses to the same task were inconsistent from one attempt to the next, and the models made basic mistakes such as addition errors and mistaking consonants for vowels. Humans also perform poorly on these tests, with only a small percentage of participants giving the correct answers.
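The consistency finding is straightforward to probe informally. Below is a minimal sketch, assuming the official OpenAI Python SDK: it poses a Wason-style selection task (of the vowel/even-number kind the models reportedly stumbled on) to the same model several times and tallies the distinct answers. The model name, prompt wording, and sample size are illustrative choices, not details taken from the study's protocol.

```python
from collections import Counter

from openai import OpenAI  # assumes the official OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A Wason-style selection task: both humans and LLMs often pick the
# wrong cards, and LLMs sometimes misclassify letters as vowels.
PROMPT = (
    "Four cards show E, K, 4, and 7. Each card has a letter on one side "
    "and a number on the other. Which cards must you turn over to test "
    "the rule: if a card has a vowel on one side, it has an even number "
    "on the other? Answer with the card faces only."
)

def sample_answers(model: str, prompt: str, n: int = 10) -> Counter:
    """Ask the same question n times and tally the distinct answers."""
    answers: Counter = Counter()
    for _ in range(n):
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        answers[response.choices[0].message.content.strip()] += 1
    return answers

if __name__ == "__main__":
    tally = sample_answers("gpt-4", PROMPT)
    # A perfectly consistent model would yield a single entry; the study
    # found models frequently spread across several incompatible answers.
    for answer, count in tally.most_common():
        print(f"{count:2d}x  {answer!r}")
```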
Although some models showed slight improvements, the researchers concluded that LLMs do not yet think like humans. Models trained on larger datasets performed better, but the study noted that it is hard to establish how a particular model reasons, since most are closed systems. There is also concern that fixing these issues through further training might introduce human biases into the models' decision-making.
The study also found that some models declined to answer tasks on ethical grounds, even though the questions were innocuous; this behavior was attributed to safeguarding parameters that did not operate as intended. Providing additional context for the tasks, an approach known to improve human performance, did not lead to consistent improvements in the LLMs' responses.
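To see what "additional context" might mean in practice, the earlier sketch can be extended: reusing sample_answers and PROMPT from above, one can compare the bare task against a version prefixed with a clarifying hint. The hint text here is an invented stand-in for the kind of framing that helps human solvers, not the wording the researchers used.

```python
# Invented example of "additional context"; the study's actual framing
# may have differed.
CONTEXT = (
    "Hint: the rule makes a claim only about vowels, so a card can be "
    "relevant only if turning it over could falsify that claim.\n\n"
)

bare = sample_answers("gpt-4", PROMPT)
framed = sample_answers("gpt-4", CONTEXT + PROMPT)

# For humans, framing like this tends to raise accuracy; the study found
# no such consistent gain for the models.
print("bare prompt:  ", bare.most_common(3))
print("with context: ", framed.most_common(3))
```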
The researchers highlighted the surprising capabilities of these models and emphasized the need to understand their emergent behavior. An open question is whether we want fully rational machines or ones that make mistakes the way humans do. The study raises important considerations about the biases that can be introduced into AI systems and the implications of trying to fix their flaws.
Overall, the study sheds light on the challenges of developing AI systems with rational reasoning abilities. While LLMs like ChatGPT have made significant advances, their decision-making still has limitations that need to be addressed before they can be fully trusted with tasks that require rational thinking.