Greetings, traveler!
For the past two years the AI industry has been driven by a simple belief: if models keep growing, today’s limitations will eventually disappear. Better reasoning will emerge. Autonomous agents will become reliable. General intelligence will follow.
A recent paper by Vishal Sikka and his son Varin Sikka quietly challenges that assumption — using mathematics rather than benchmarks or demos.
Their work shows that large language models have a fixed computational ceiling. Some classes of tasks sit permanently above it, no matter how much data or training you throw at the model.
The paper attracted wider attention only after Wired highlighted its conclusions. Once people started reading the research itself, the implications became harder to ignore.
Impossible computation
The Sikkas treat LLMs as what they really are: bounded computational systems.
Each generated token passes through a fixed number of operations — attention layers, matrix multiplications, nonlinear transformations. However advanced the architecture becomes, the amount of computation per prompt remains limited.
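That fixed budget can be made concrete with a back-of-the-envelope sketch. All the architecture sizes below are illustrative assumptions, not figures from the paper; the point is only that the per-token cost depends on the architecture and context length, never on how hard the prompt is.

```python
# Rough per-token FLOP estimate for a decoder-only transformer.
# All sizes are illustrative assumptions, not figures from the paper.

def flops_per_token(n_layers: int, d_model: int, d_ff: int, seq_len: int) -> int:
    """Approximate floating-point operations to produce one token."""
    attention = 4 * d_model * d_model + 2 * seq_len * d_model  # projections + attention scores
    ffn = 2 * (2 * d_model * d_ff)                             # two feed-forward linear layers
    return n_layers * (attention + ffn)

# The estimate depends only on architecture and context length --
# not on how difficult the question inside the prompt actually is.
easy_prompt = flops_per_token(32, 4096, 16384, 2048)
hard_prompt = flops_per_token(32, 4096, 16384, 2048)
assert easy_prompt == hard_prompt
```

The same budget is spent whether the prompt asks for a summary or for an exhaustive search, which is exactly why the authors can treat the model as a bounded computational system.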
From there, the authors apply classical complexity theory.
Some problems grow slowly as input increases. Others explode combinatorially. Many planning, verification, and multi-step agent tasks fall into this second category.
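A tiny comparison makes the two growth regimes tangible, using sorting as the slow-growing case and brute-force subset enumeration as the combinatorial one. This is my own illustrative example, not one taken from the paper.

```python
import math

# Step counts for a polynomial task (sorting) versus a combinatorial one
# (exhaustively examining every subset of n items, e.g. brute-force subset-sum).
for n in (10, 20, 40, 80):
    polynomial = int(n * math.log2(n))   # ~ comparisons to sort n items
    combinatorial = 2 ** n               # subsets to examine exhaustively
    print(f"n={n:>3}: sort ~{polynomial:,} steps, subsets = {combinatorial:,}")
```

By n = 80 the combinatorial column exceeds 10²⁴ while the sorting column is still in the hundreds. No fixed per-token budget keeps up with the second column for long.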
The paper proves that when a prompt implicitly requires more computation than the model can perform during inference:
• the model cannot correctly solve the task
• the model cannot reliably verify a solution either
At that point incorrect output becomes unavoidable. What we usually call hallucination is not a training flaw. It follows directly from computational limits.
Why agentic AI runs into structural barriers
Much of today’s hype revolves around agent systems: LLMs that plan long sequences of actions, coordinate tools, reason about outcomes, and operate with minimal human supervision.
The Sikkas focus explicitly on these “computational and agentic tasks” and show that many of them cross the model’s feasible complexity threshold surprisingly quickly.
As tasks accumulate steps, branches, and global constraints, the underlying computation grows beyond what a transformer can represent during inference.
The model may still generate fluent responses. It may sound confident. Yet mathematically it is already operating in a regime where correctness cannot be guaranteed.
This explains a pattern many developers have seen firsthand: agents perform well on small workflows, then gradually drift into inconsistent or wrong behavior as complexity increases.
According to the paper, this drift is structural, not temporary.
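One way to build intuition for that drift is a simple independence model — my own back-of-the-envelope sketch, not the paper's formal argument. If each step of a workflow succeeds with probability p, a chain of n steps succeeds with probability p**n, and even excellent per-step reliability decays fast.

```python
# If each agent step succeeds independently with probability p,
# the whole n-step workflow is correct with probability p ** n.
# Illustrative numbers, not measurements from the paper.

def workflow_success(p: float, n_steps: int) -> float:
    """Probability that every one of n_steps independent steps succeeds."""
    return p ** n_steps

for steps in (5, 20, 100):
    print(f"{steps:>3} steps at 99% per step -> {workflow_success(0.99, steps):.1%}")
```

At 99% per step, a 100-step workflow succeeds only about a third of the time. The paper's structural argument is stronger than this toy model, but the shape of the decline matches what developers observe.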
How this connects to earlier skepticism
The Sikkas are not the first to question the reasoning abilities of language models.
Last year, Apple researchers showed that LLM-based reasoning systems can collapse as problem complexity rises, even on tasks designed to test logical planning rather than language fluency.
Other researchers have argued that prediction systems, no matter how large, do not turn into true reasoning machines. Studies of creative output often find heavy reliance on pattern remixing rather than original construction.
What makes this new work stand out is its formal grounding. Instead of observing failures, it proves that certain failures must happen.
Scaling improves performance within the same computational class. It does not allow a system to jump into a higher one.
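A quick calculation shows why constant-factor scaling cannot cross a class boundary (illustrative arithmetic, not taken from the paper): against a task whose cost grows like 2**n, multiplying the compute budget by k buys only about log2(k) additional input items.

```python
import math

# Even a huge constant multiplier on per-token compute barely moves the
# feasible input size of an exponential-cost task: multiplying the budget
# by k only adds ~log2(k) items before 2**n exhausts it again.
for k in (10, 1_000, 1_000_000):
    extra_items = math.log2(k)
    print(f"{k:>9,}x more compute -> only ~{extra_items:.1f} extra input items")
```

A million-fold increase in compute extends the reachable input size of an exponential problem by roughly twenty items. That is the gap between "better within the same class" and "jumping to a higher one."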
Why more data will not remove the ceiling
One of the paper’s most uncomfortable conclusions is that these limits are not about training quality.
You can add parameters.
You can add data.
You can fine-tune endlessly.
The computational structure during inference stays the same.
Some problems simply require more computation than a transformer can express in a single forward pass, regardless of how well trained it is.
This directly challenges the idea that intelligence will inevitably emerge from scale alone.
What this means in practice
From an engineering perspective, the takeaway is practical rather than pessimistic.
LLMs excel at:
• pattern extraction
• summarization
• code generation
• structured transformation
• language-level reasoning
They struggle when asked to replace full algorithmic systems that hide exponential complexity behind natural language.
Hybrid approaches make sense. Classical algorithms for heavy computation. LLMs for interfaces, guidance, and orchestration.
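As a sketch of that division of labor — the solver and the distance matrix below are hypothetical toy examples, not anything from the paper — the exact combinatorial search runs in ordinary code, and an LLM would only translate the user's request into this structured call, never searching the route space itself.

```python
from itertools import permutations

def shortest_tour(dist: list[list[float]]) -> tuple[float, tuple[int, ...]]:
    """Exact brute-force TSP: try every ordering of cities 1..n-1, starting and ending at 0."""
    n = len(dist)
    best = (float("inf"), ())
    for order in permutations(range(1, n)):
        route = (0, *order, 0)
        cost = sum(dist[a][b] for a, b in zip(route, route[1:]))
        best = min(best, (cost, route))
    return best

# Toy 4-city distance matrix; an LLM's only job in a hybrid system would be
# to produce structured input like this from a natural-language request.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 8],
    [10, 4, 8, 0],
]
cost, route = shortest_tour(dist)
print(cost, route)
```

The heavy, exponential part stays in code whose correctness is guaranteed by construction; the model handles the language-level interface, which is exactly the regime where it excels.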
Expecting a single model to autonomously reason through arbitrarily complex domains runs straight into mathematical limits.
If anything, the paper suggests that, at least in the near future, building a high-quality product with the help of AI will demand even more attention to fundamentals and responsibility, along with the ability to coordinate work between humans, models, and classical systems.
A colder but healthier view of AI progress
Public narratives still talk about near-term superintelligence and fully autonomous agents. Research like this paints a far more constrained picture. Language models can become extraordinarily capable within their computational domain. They already reshape software development and knowledge work.
They also live inside hard theoretical boundaries.
No amount of prompt engineering crosses them.
No dataset bypasses them.
No scale magically removes them.
Understanding those limits early may be one of the healthiest things for the AI industry.
Progress will continue — just without the illusion that bigger models automatically solve everything.
