Researchers at Apple recently published a study showing that large language models (LLMs) fail to solve middle school-level word problems when the problems contain extraneous information designed to misdirect them. The models tested, including OpenAI o1-mini and Llama3-8B, showed no evidence of genuine logical reasoning: they tend to convert statements directly into arithmetic operations without understanding what the problem is actually asking. When irrelevant details were added to a word problem, the models became confused and were unable to produce accurate answers.

LLMs struggle with middle school word problems, according to Apple researchers
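
To make the failure mode concrete, here is a minimal sketch of the kind of perturbation the study describes: an irrelevant clause is inserted into an otherwise solvable word problem. The kiwi problem below is of the sort reported in coverage of the study; the `with_distractor` helper is a hypothetical illustration, not code from the researchers.

```python
# Sketch: injecting an irrelevant clause into a word problem, in the
# style of the perturbations described above. Illustrative only.

BASE_PROBLEM = (
    "Oliver picks 44 kiwis on Friday and 58 kiwis on Saturday. "
    "On Sunday he picks double the number he picked on Friday. "
    "How many kiwis does Oliver have?"
)

# An extraneous detail that changes no quantity in the problem.
DISTRACTOR = "Five of the kiwis picked on Sunday were a bit smaller than average. "

def with_distractor(problem: str, distractor: str) -> str:
    """Insert an irrelevant clause just before the final question."""
    statements, question = problem.rsplit(". ", 1)
    return f"{statements}. {distractor}{question}"

if __name__ == "__main__":
    print(with_distractor(BASE_PROBLEM, DISTRACTOR))
    # The correct answer is unchanged (44 + 58 + 88 = 190), but a model
    # that pattern-matches "five ... smaller" into a subtraction
    # will answer 185 instead.
```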

The study also found that a question's verbosity, measured in tokens, was inversely correlated with the models' mathematical performance: accuracy dropped as questions grew longer, suggesting the models struggle whenever actual reasoning is required rather than mere replication of patterns observed in their training data. While the AI research community continues to debate the validity and implications of these findings, the study raises concerns about the robustness and limitations of current AI models on complex mathematical problems that demand reasoning and contextual understanding.
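
The length-versus-accuracy relationship can be checked with a simple correlation over per-problem results. The sketch below uses placeholder data (the `runs` pairs of token count and correctness are invented for illustration) and Python's standard library; it is not the study's actual analysis.

```python
# Sketch: does accuracy fall as question length (in tokens) grows?
# `runs` is placeholder data: (token_count, answered_correctly) per problem.
from statistics import correlation, mean

runs = [
    (85, 1), (92, 1), (110, 1), (135, 0), (140, 1),
    (160, 0), (175, 0), (190, 0), (210, 0), (230, 0),
]

tokens = [t for t, _ in runs]
correct = [float(c) for _, c in runs]

# A negative correlation indicates accuracy drops as questions get longer.
print(f"overall accuracy: {mean(correct):.2f}")
print(f"token/accuracy correlation: {correlation(tokens, correct):+.2f}")
```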
https://www.bankinfosecurity.com/llms-fail-middle-school-word-problems-say-apple-researchers-a-26521