For AI, the Glass is Always Half-Empty
Why Do Even the Smartest AI Models Struggle with Common Sense?
AI has been on an incredible journey in recent years, continuously breaking boundaries and reshaping industries: revolutionizing software engineering, improving medical diagnostics, and transforming content creation through powerful generative models. As someone deeply involved in the AI world, I'm often amazed by the astonishing rate of progress. Yet I've found myself equally puzzled by these models' occasional inability to handle everyday, common-sense tasks.
The latest generation of large language models consists of "reasoning models". These models "think" before responding, employing techniques like chain-of-thought processing that allow them to work through problems step by step rather than generating immediate answers. They've shattered benchmarks in advanced mathematics, physics, and competitive coding, leading many to believe we are on the brink of artificial general intelligence (AGI). Yet, as impressive as these accomplishments are, AI's reasoning often falters when confronted with simple, everyday challenges.
To investigate, I posed a straightforward question, built around an urgent scenario, to several best-in-class AI models (OpenAI o1 Pro, Claude 3.7 Sonnet with extended thinking, Gemini 2.0 Flash Thinking Experimental, and DeepSeek-R1). I asked each model the same question five times to check for consistency. Here's the question:
Imagine two people are severely dehydrated. You have:
One half-filled carafe of clean water (one full carafe can fill two glasses).
A water purifier that takes exactly 2 minutes to fill a carafe completely (but can be stopped midway).
Two empty glasses.
What's the fastest way to provide water to both individuals?
Surprisingly, all models consistently suggested this solution across the repeated tests:
The AI Solution:
Immediately pour the half-filled carafe into one glass, providing instant hydration to one individual.
Start the purifier and run it for exactly 1 minute to fill half the carafe.
Pour this into the second empty glass, serving the second individual.
This solution is problematic because it prioritizes fully hydrating one person while making the second, equally dehydrated person wait a full minute for any water at all. In an emergency dehydration scenario, both individuals need immediate relief, even if partial.
The Human Solution:
Immediately divide the half-filled carafe equally between both glasses, providing instant partial hydration to both individuals simultaneously.
Start the purifier to refill the carafe, topping up both glasses as purified water becomes available.
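To make the trade-off concrete, here's a minimal Python sketch of the two timelines under the puzzle's stated rates (one full carafe holds two glasses; the purifier fills a carafe in 2 minutes, so it produces one glass-equivalent per minute). The function names and bookkeeping here are my own illustration, not output from any of the models:

```python
# A minimal timeline sketch of both strategies, assuming the puzzle's rates:
# one full carafe = 2 glasses; the purifier fills a carafe in 2 minutes,
# i.e. it produces 1 glass-equivalent of water per minute.

def ai_plan():
    """Pour the half carafe into glass 1; purify for 1 minute to fill glass 2."""
    return {
        "person 1 first drinks at (min)": 0.0,  # gets a full glass immediately
        "person 2 first drinks at (min)": 1.0,  # must wait for the purifier
        "both fully served at (min)": 1.0,
    }

def human_plan():
    """Split the half carafe between both glasses; purify to top them up."""
    return {
        "person 1 first drinks at (min)": 0.0,  # half glass immediately
        "person 2 first drinks at (min)": 0.0,  # half glass immediately
        "both fully served at (min)": 1.0,      # 1 min of purifying = 1 glass,
    }                                           # enough to top up both glasses

for name, plan in (("AI", ai_plan()), ("Human", human_plan())):
    print(f"{name} plan:")
    for event, t in plan.items():
        print(f"  {event}: {t}")
```

Under both plans everyone is fully served by roughly the one-minute mark; the human plan's advantage is that it eliminates the minute during which one severely dehydrated person has no water at all.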
Notably, when explicitly prompted afterward, all models agreed that splitting the initial half-filled carafe equally between both glasses was the faster, more intuitive solution, instantly hydrating both individuals.
Why does AI initially overlook this intuitive solution? One likely explanation is that AI reasoning models excel at logical tasks with clear, explicit reasoning paths but struggle with implicit, practical considerations such as resource allocation efficiency or immediacy in real-world scenarios.
Humans instinctively gravitate toward the sharing solution because of our evolved social intuitions. Throughout our evolutionary history, fair resource distribution during crises has been fundamental to group survival. We also have embodied knowledge of physical situations that AI lacks: we've held glasses, poured water, and helped others in need, giving us an intuitive understanding that doesn't need to be explicitly reasoned through.
However, it's important to acknowledge that AI excels in many areas that challenge human intuition: complex mathematical proofs, intricate programming tasks, and systematic medical diagnostics often showcase AI's superiority. But the gap in intuitive reasoning persists, particularly when multiple practical factors must be balanced quickly and efficiently.
This common-sense gap has significant implications, particularly in critical fields like healthcare, emergency response, or resource allocation during crises. In such settings, AI's inability to quickly grasp intuitive, practical solutions could have serious consequences, delaying crucial decisions or actions. As AI increasingly integrates into daily life and critical sectors, addressing its limitations in common-sense reasoning becomes vital. Bridging this gap isn't merely a technical challenge; it's essential for building trust and reliability in AI-driven solutions.
Looking ahead, addressing this limitation will likely require more than just bigger models or more data. It may necessitate fundamentally new approaches to AI training that incorporate physical world understanding, social intuition, and practical efficiency into reasoning processes. Until then, human oversight remains essential, ensuring that when faced with the glass half full, both people receive the water they urgently need.