AI Agents: what works, what doesn't, and what nobody tells you

If 2024 was the year of language models, 2025 is the year of agents. Every week there’s a new framework, a new startup, a new demo of an agent that “does everything on its own.” And every week I get asked the same thing: “Does this actually work?”

The honest answer: it depends on what you ask it to do.

What is an AI agent (without the marketing)

Stripped of the hype layer, an AI agent is a system that receives an objective, breaks down the tasks needed to achieve it, executes those tasks using available tools, and adjusts its plan based on the results it gets.

That’s it. It’s not magic. It’s not general intelligence. It’s a planning-execution-evaluation loop powered by a language model that’s good at understanding context and generating text.

The difference from a normal prompt is that the agent can act. It can call APIs, read files, execute code, search for information. It doesn’t just respond. It does.
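That loop is easier to see in code than in prose. Here is a minimal sketch of the plan-execute-evaluate cycle, with a stubbed model call and two toy tools standing in for a real API (`call_model`, `TOOLS`, and `run_agent` are illustrative names, not any real framework):

```python
def call_model(prompt: str) -> str:
    # Placeholder for a real language-model call. A real model would
    # choose an action based on the objective and history.
    return "search: quarterly revenue"

TOOLS = {
    "search": lambda query: f"results for '{query}'",
    "read_file": lambda path: f"contents of {path}",
}

def run_agent(objective: str, max_steps: int = 5) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        # Plan: ask the model for the next action given everything so far.
        action = call_model(f"Objective: {objective}\nHistory: {history}")
        name, _, argument = action.partition(": ")
        if name not in TOOLS:
            break  # model signalled it is done (or went off-script)
        # Execute: run the chosen tool.
        result = TOOLS[name](argument)
        # Evaluate: feed the result back so the next plan can adjust.
        history.append(f"{action} -> {result}")
    return history
```

The `max_steps` cap matters: without it, a confused model can loop forever.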

Where they work well

After months of working with agents in different contexts, the patterns where they genuinely deliver value are quite specific:

Tasks with clear, verifiable steps. If you can describe the task as a sequence of steps where each result can be verified, an agent works well. Example: extract data from a document, cross-reference with a database, generate a report. Each step has a concrete output that can be validated.

Repetitive tasks with variations. Not exactly the same every time (that’s what scripts are for), but similar with nuances. Example: processing 200 supplier emails that have different formats but ask for the same thing. A script can’t handle the variation. A human wastes hours. An agent navigates the differences.

Research and synthesis. Searching information across multiple sources, extracting what’s relevant, and synthesizing it. Agents are surprisingly good at this because it’s essentially what language models do best: process large amounts of text and extract signal from noise.
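The first pattern, verifiable steps, can be sketched directly. This is a hypothetical version of the extract / cross-reference / report example above, with an assertion after each step so a bad output is caught immediately (all function and field names here are made up for illustration):

```python
def extract(document: str) -> dict:
    # Stand-in for real document parsing.
    return {"invoice_id": "INV-001", "amount": 1200.0}

def cross_reference(record: dict, database: dict) -> dict:
    # Flag whether the invoice maps to a known supplier.
    record["known_supplier"] = record["invoice_id"] in database
    return record

def report(record: dict) -> str:
    return (f"{record['invoice_id']}: ${record['amount']:.2f} "
            f"(known={record['known_supplier']})")

db = {"INV-001": "Acme Corp"}
record = extract("invoice.pdf")
assert "invoice_id" in record      # checkpoint 1: extraction produced an ID
record = cross_reference(record, db)
assert "known_supplier" in record  # checkpoint 2: the lookup actually ran
summary = report(record)
```

Each checkpoint is cheap, and each one turns a silent downstream failure into a loud immediate one.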

Where they fail (still)

Tasks requiring subjective judgment. “Is this design good?” “Does this email sound professional?” “Is this client worth pursuing?” Agents can attempt to answer, but their judgment is inconsistent and lacks the cultural, emotional, or strategic context a human brings.

Long chains without checkpoints. If an agent needs to execute 15 sequential steps and the error from step 3 propagates invisibly to step 12, you have a problem. Errors accumulate. Without intermediate verification points, the final result can be completely off track without anyone noticing.

In control engineering they call it “error propagation in cascade systems.” Each stage amplifies the deviation from the previous one. The solution is the same in both fields: intermediate verification and early correction.
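One way to build that intermediate verification in, sketched under the assumption that each step is a function and each checkpoint is a predicate on its output:

```python
def run_with_checkpoints(steps, checks, data):
    # Run each step, then validate its output before the next step sees it.
    # A bad step 3 fails loudly here instead of silently poisoning step 12.
    for i, (step, check) in enumerate(zip(steps, checks)):
        data = step(data)
        if not check(data):
            raise ValueError(f"checkpoint failed after step {i + 1}")
    return data

# Toy example: two steps, two checks.
steps = [lambda x: x + 1, lambda x: x * 2]
checks = [lambda x: x > 0, lambda x: x % 2 == 0]
result = run_with_checkpoints(steps, checks, 1)  # -> 4
```

The pattern generalizes: the checks can be schema validations, range checks, or even a second model call that audits the first.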

Decisions with irreversible consequences. Sending an email to a client, deploying to production, executing a financial transaction. Any action you can’t undo needs human oversight. Agents are tools, not decision-makers.

Complex organizational context. “Ask Juan from the sales team why they rejected that prospect” isn’t something an agent can resolve. Human dynamics, internal politics, tacit organizational knowledge aren’t in any system an agent can query.

What nobody tells you

Agents need infrastructure. It’s not enough to connect a model to some tools. You need error handling, retries, logging, cost limits, timeouts, and supervision. An agent without guardrails is a generator of API invoices and unpredictable results.
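A minimal sketch of what those guardrails look like in practice, wrapping every model call in a retry loop with a hard spend ceiling (the cost figures and the `ConnectionError` failure mode are illustrative assumptions, not real pricing or a real SDK):

```python
import time

class BudgetExceeded(Exception):
    pass

def guarded_call(fn, *, cost_tracker, retries=3,
                 cost_per_call=0.01, budget=1.00):
    # Enforce the spend ceiling before every attempt, and retry
    # transient failures with exponential backoff.
    for attempt in range(retries):
        if cost_tracker["spent"] + cost_per_call > budget:
            raise BudgetExceeded(f"would exceed ${budget:.2f} budget")
        cost_tracker["spent"] += cost_per_call
        try:
            return fn()
        except ConnectionError:
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s...
    raise RuntimeError("all retries exhausted")

tracker = {"spent": 0.0}
answer = guarded_call(lambda: "ok", cost_tracker=tracker)
```

Logging and timeouts would wrap the same choke point; the important design choice is that every call goes through one function you control.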

Cost can scale fast. Each agent step is a model call. A complex agent can make 20-30 calls to complete a task. Multiply that by volume and cost becomes significant. Cost optimization for agents is a skill few people have.
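The arithmetic is worth doing before you deploy, not after the invoice arrives. A back-of-envelope estimate, using hypothetical numbers (25 calls per task, $0.02 per call, 5,000 tasks a month):

```python
def monthly_agent_cost(calls_per_task: int,
                       price_per_call: float,
                       tasks_per_month: int) -> float:
    # Calls per task x price per call x monthly volume.
    return calls_per_task * price_per_call * tasks_per_month

estimate = monthly_agent_cost(25, 0.02, 5000)  # -> 2500.0 dollars/month
```

Real pricing is per token, not per call, so plug in your own numbers; the point is that volume multiplies everything.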

The demo isn’t the product. A demo where an agent books a flight in 30 seconds is impressive. But that demo doesn’t show what happens when the airline’s site returns errors, when the user changes their mind mid-process, when the connection fails, or when the model misinterprets an ambiguous instruction. The distance between demo and production is enormous.

Our perspective

At Redstone Labs we use agents internally and build them for clients. But with a clear rule: the agent is a tool, not a replacement. The human defines the objective, supervises critical checkpoints, and makes irreversible decisions.

AI agents today are like the first cars: revolutionary in concept, limited in practice, and requiring an operator who knows what they’re doing. That will change. But today, the difference between an agent that delivers value and one that creates problems lies in who designs it, how they supervise it, and how honest they are about its limitations.

If someone tells you an agent can “do everything on its own,” ask them how many times it failed along the way. If they don’t have an answer, they haven’t tested it in production.