Saket's Blog

AI Agents, Tool Use, and MCP: What Actually Matters

2026-03-03
12 min read
AI · Agents · Tool Use · MCP · LLM

If you spend even a week around AI Twitter, conference talks, or startup demos, you will hear the word "agent" so often that it starts to lose meaning. Everything is an agent. A prompt is an agent. A loop is an agent. A chatbot with a button is an agent. A workflow with three API calls is apparently an agent too.

That is a problem, because once a word becomes vague enough, it stops helping you build.

The practical question is not "Are agents the future?" The practical question is much more boring and much more useful:

What kind of system am I actually building, when does tool use help, and when should I avoid turning a clean workflow into fake autonomy?

This post is about answering that question clearly.

Start With the Baseline: A Chatbot Is Not Automatically an Agent

A normal chatbot takes input and returns output. It may have conversation history, a system prompt, and some retrieved context. But if it responds turn by turn without ever taking an action, it is still fundamentally a chat interface.

That is not a criticism. A lot of useful products are exactly that.

Examples:

  • a writing assistant
  • a coding explainer
  • a support bot over documentation
  • a summarization app

These systems can be great without being agents.

The distinction matters because people often reach for "agent" language before they have identified the task clearly.

What Makes a System Agent-Like?

A system starts to look more like an agent when it can:

  • decide on intermediate steps
  • call tools
  • inspect tool results
  • continue based on those results
  • maintain task state
  • stop only when the goal is complete or a boundary is hit

A simple example:

  1. User asks for a meeting summary and action items
  2. System fetches the transcript
  3. System extracts open tasks
  4. System checks the project tracker for ownership
  5. System returns the summary plus assigned next steps

That is more agent-like than a plain chatbot because the model is participating in a multi-step action flow rather than only replying with text.

Still, even here the important part is not the label. The important part is that the system is orchestrating work.

Tool Use Is the Real Turning Point

The biggest practical difference between a basic LLM app and a more capable system is tool use.

Without tools, the model can only generate text based on the context it has.

With tools, it can do things like:

  • search internal documentation
  • query a database
  • call an API
  • read a file
  • write a draft
  • create a ticket
  • check inventory
  • trigger a workflow

This matters because most real work is not purely linguistic. Real tasks involve external systems.

If a user asks, "What is the current status of order 81724?" a pure LLM cannot know that unless the status is in the prompt. A tool-enabled system can query the order system and answer based on live data.

That is the jump from fluent text generation to useful action.

Tool Calling in Plain Language

When people say a model can "call tools," they usually mean this:

The model is given a set of allowed functions or capabilities, along with descriptions of what each tool does and what inputs it expects. Based on the user request, the model can decide that it needs one of those tools, return a structured tool call, wait for the system to execute it, then continue with the result.
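Those tool descriptions are usually passed to the model as structured schemas. A minimal sketch in a generic JSON-schema style (the exact shape varies by provider; these definitions are illustrative, not any specific API):

```python
# Hypothetical tool definitions in a generic JSON-schema style.
# Real providers use similar, but not identical, formats.
TOOLS = [
    {
        "name": "getOrderStatus",
        "description": "Look up the current status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"orderId": {"type": "string"}},
            "required": ["orderId"],
        },
    },
    {
        "name": "createTask",
        "description": "Create a follow-up task with a title and an owner.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "owner": {"type": "string"},
            },
            "required": ["title", "owner"],
        },
    },
]

# The model only ever sees these schemas, never the implementations.
names = [t["name"] for t in TOOLS]
print(names)
```

The key point is the separation: the model sees descriptions and argument shapes; the surrounding application owns the actual code that runs.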

In plain English:

The model can ask the surrounding software to do something.

For example, if the available tools are:

  • searchDocs(query)
  • getOrderStatus(orderId)
  • createTask(title, owner)

and the user says:

Check order 81724 and create a follow-up task if it is delayed.

the flow may look like:

  1. Model requests getOrderStatus(orderId="81724")
  2. App executes the tool
  3. Tool returns "Delayed by 3 days"
  4. Model requests createTask(...)
  5. App executes the tool
  6. Model responds to the user with the result
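The six steps above can be sketched as a loop. A minimal sketch in which a scripted stub stands in for the model (a real system would call an LLM API at that point, and the tool implementations would hit real systems):

```python
def fake_model(messages):
    """Scripted stand-in for an LLM: picks the next step from history."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool": "getOrderStatus", "args": {"orderId": "81724"}}
    if "Delayed" in tool_results[-1]["content"] and len(tool_results) == 1:
        return {"tool": "createTask",
                "args": {"title": "Follow up on order 81724", "owner": "ops"}}
    return {"final": "Order 81724 is delayed by 3 days; a follow-up task was created."}

# Stub tool implementations; real ones would query live systems.
TOOLS = {
    "getOrderStatus": lambda orderId: "Delayed by 3 days",
    "createTask": lambda title, owner: f"Created '{title}' for {owner}",
}

def run(user_message, max_steps=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):  # step limit guards against runaway loops
        decision = fake_model(messages)
        if "final" in decision:
            return decision["final"]
        # The app, not the model, executes the requested tool.
        result = TOOLS[decision["tool"]](**decision["args"])
        messages.append({"role": "tool", "content": result})
    return "Stopped: step limit reached."

print(run("Check order 81724 and create a follow-up task if it is delayed."))
# → Order 81724 is delayed by 3 days; a follow-up task was created.
```

Even in this toy version, the important design choice is visible: the model only proposes structured calls, and the application decides whether and how to execute them.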

That is the essence of tool-augmented orchestration.

Workflow vs Agent: A Difference Worth Respecting

One of the most useful distinctions in AI engineering is this:

Workflow

A workflow is a system where the steps are mostly predefined.

Example:

  1. Retrieve docs
  2. Summarize
  3. Extract action items
  4. Store result

This is predictable, auditable, and often much easier to trust.
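That fixed pipeline can be sketched in a few lines, with stub functions standing in for real retrieval, model calls, and storage (all names here are illustrative):

```python
# Fixed workflow: every step and its order is decided in code, not by the model.

def retrieve_docs(query):
    # Stand-in for a real retrieval step.
    return ["Meeting notes: ship v2 by Friday. Alice owns the rollout."]

def summarize(docs):
    # Stand-in for a single model call.
    return "Team agreed to ship v2 by Friday."

def extract_action_items(docs):
    # Stand-in for a second model call with a structured output.
    return [{"task": "Roll out v2", "owner": "Alice"}]

def store(result):
    # Stand-in for writing to a database.
    db = {"latest": result}
    return db

def run_workflow(query):
    docs = retrieve_docs(query)
    result = {"summary": summarize(docs),
              "actions": extract_action_items(docs)}
    return store(result)

print(run_workflow("v2 launch meeting"))
```

Because the control flow lives in ordinary code, every run takes the same path, which is exactly what makes it easy to test, log, and trust.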

Agent

An agent has more freedom to choose which steps to take and in what order, within some boundary.

Example:

  • decide whether to search docs first
  • decide whether to query a database
  • decide whether the result is sufficient
  • decide whether another tool call is needed

This flexibility can help with open-ended tasks, but it also introduces more variance and more failure modes.

Many teams would be better served by a clean workflow with selective model use than by a fully open-ended agent loop.

That is not less advanced. It is often better engineering judgment.

Memory and State: Another Source of Confusion

People often say they want an agent with memory. Usually they mean one of three different things.

Conversation History

The system remembers recent turns by including them in context.

Persistent User Memory

The system stores facts or preferences across sessions, such as:

  • preferred writing style
  • team role
  • project context
  • previous decisions

Task State

The system keeps track of what it has already done in a multi-step process.

These are not the same thing, and conflating them leads to messy design. A useful agent usually needs deliberate state handling, not a vague promise that "the model remembers stuff."
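One way to keep the three kinds of state from blurring together is to model them as separate structures with separate lifetimes. A minimal sketch (the class names are illustrative, not a standard API):

```python
from dataclasses import dataclass, field

@dataclass
class ConversationHistory:
    """Recent turns, re-sent in the prompt each time; lives for one session."""
    turns: list = field(default_factory=list)

@dataclass
class UserMemory:
    """Facts persisted across sessions: style, role, project context."""
    facts: dict = field(default_factory=dict)

@dataclass
class TaskState:
    """What a multi-step run has already done and what remains."""
    completed_steps: list = field(default_factory=list)
    pending_steps: list = field(default_factory=list)

state = TaskState(pending_steps=["fetch transcript", "extract tasks"])
state.completed_steps.append(state.pending_steps.pop(0))
print(state.completed_steps)  # → ['fetch transcript']
```

Once the three are distinct types, decisions like "what gets persisted, where, and for how long" become ordinary engineering questions instead of vague promises about memory.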

Where Agents Are Actually Useful

Agents are most useful when:

  • the task is multi-step
  • the next step depends on intermediate results
  • tool use is necessary
  • the environment is dynamic
  • a rigid fixed pipeline would be too brittle

Good examples:

  • research assistants that search, filter, compare, and draft
  • support systems that inspect multiple internal systems before responding
  • coding assistants that read files, run checks, and propose edits
  • operations assistants that gather status from several tools and compose a response

In these cases, some bounded autonomy can be valuable.

Where Agents Are Usually Overkill

Agents are often the wrong choice when:

  • the task is simple and repetitive
  • correctness matters more than flexibility
  • the steps are already known
  • you need strict auditability
  • every extra loop adds cost and latency

Examples:

  • extracting fields from invoices
  • summarizing a single document
  • classifying a ticket
  • formatting meeting notes
  • routing a request to one of five known destinations

For tasks like these, a deterministic workflow with one or two model calls is often the cleanest option.

What MCP Means in Plain Language

MCP, or Model Context Protocol, is easier to understand than the name suggests.

The practical idea is to standardize how models or AI assistants connect to tools, resources, and external systems.

Instead of every tool integration being a one-off custom glue layer, MCP aims to provide a common interface for exposing capabilities to an AI system.

In plain language:

MCP is a way to make tools and context easier for AI systems to discover and use consistently.

That can include things like:

  • files
  • documentation
  • databases
  • internal APIs
  • app-specific actions

Why does this matter? Because once AI systems start using tools seriously, integration sprawl becomes a real engineering problem. A standard interface can reduce that friction.

What Actually Matters More Than the Buzzword

Whether you use MCP specifically or another tool integration pattern, the real engineering questions stay the same:

  • What tools does the system have access to?
  • What arguments can it pass?
  • What outputs come back?
  • What permissions should exist?
  • What should be logged?
  • What should require confirmation?
  • How does the system recover from bad tool results?

This is where practical teams win. Not by saying "agent" more often, but by designing clear contracts between the model and the rest of the software.

The Failure Modes to Watch

Agent-like systems can fail in ways that plain chatbots do not.

Wrong Tool Choice

The model may choose the wrong action path.

Bad Arguments

The tool is correct, but the inputs are incomplete or malformed.

Looping

The system keeps trying steps that do not meaningfully improve the result.

Overreach

The system takes action where a human confirmation should have been required.

Hidden Fragility

The demo works beautifully, but only on happy-path inputs.

This is why guardrails, step limits, approvals, observability, and evaluation matter so much in agentic systems.
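The looping failure mode in particular is cheap to guard against: cap the number of steps and refuse to repeat an identical tool call. A minimal sketch (the request format is illustrative):

```python
def guarded_calls(requests, max_steps=5):
    """Filter a stream of (tool, args) requests: stop on repeats or step limit."""
    seen = set()
    executed = []
    for tool, args in requests:
        key = (tool, tuple(sorted(args.items())))
        if key in seen:
            break  # identical call requested again: likely a loop, stop here
        if len(executed) >= max_steps:
            break  # hard step limit caps cost and latency
        seen.add(key)
        executed.append((tool, args))
    return executed

reqs = [("search", {"q": "status"}),
        ("search", {"q": "status"}),   # exact repeat
        ("fetch", {"id": "1"})]
print(guarded_calls(reqs))  # stops at the repeated search
```

Real systems layer more on top (timeouts, budgets, semantic similarity checks), but even this crude guard turns an unbounded loop into a bounded one.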

A Better Way to Think About Agents

Instead of asking, "Should I build an agent?" ask questions like:

  • What is the task?
  • Which parts require retrieval?
  • Which parts require tool access?
  • Which decisions can be fixed in code?
  • Where is model judgment genuinely useful?
  • Where do I need human approval?

Those questions lead to better architectures than chasing a category label.

Often the best design is a hybrid:

  • deterministic flow for the stable parts
  • model judgment for the fuzzy parts
  • tool access for live data or actions
  • explicit boundaries for safety and cost control

That is how many real production systems end up being built.

The Real Skill Here

The real skill is not agent hype literacy. It is orchestration judgment.

Can you tell the difference between:

  • a chat interface
  • a retrieval system
  • a tool-enabled workflow
  • a bounded agent
  • an overcomplicated demo pretending to be autonomy

If you can, you are already thinking more clearly than a huge portion of the market.

Closing Thoughts

Agents matter, but not in the cartoon version. What matters is building systems that can retrieve, reason within limits, use tools safely, and complete useful tasks in the real world. Sometimes that system should be agent-like. Sometimes it absolutely should not.

In the final post of this series, we will bring everything together into a practical learning roadmap: what to study first, what tools to use, what projects to build, and how to go from curiosity to real AI engineering ability without getting lost in noise.

Next in the series: Learning AI the Smart Way: Projects, Stack, and Study Plan.