Agents Are Eating the AI Stack
Most AI commentary still talks like we are watching a model race.
I think that is already the wrong frame.
The important thing happening now is not that models are getting a little better at answering questions. The important thing is that software is being reorganized around work you can hand off. Once you see that, the whole AI stack starts to look different. Models matter. Hardware matters. Cloud matters. Security matters. Auth matters. Power grids matter.
They are no longer separate stories.
An agent that can write code, open pull requests, call APIs, deploy a Worker, read private data, manage credentials, and retry when something fails is not a chatbot. It is a junior operator with shell access and bad judgment until proven otherwise.
That is exciting.
It is also how you get a costly mess.
The model race is becoming a work race
For the last few years, the public scoreboard was simple: which model is smarter? GPT versus Claude versus Gemini versus Llama versus DeepSeek versus whatever surprise model launched while you were making coffee.
That scoreboard still matters, but it is getting less useful.
OpenAI’s recent Codex research is a good marker. By May 2026, 80.6% of sampled individual Codex users had made at least one request estimated to exceed 30 minutes of experienced-human work. 70.2% exceeded one hour. 25.6% exceeded eight hours.[1]
That is not autocomplete.
That is delegation.
Once people start throwing multi-hour tasks at agents, the product problem changes. The user is no longer asking, “Can the model answer this?” The user is asking, “Can I trust this thing to work while I do something else?”
That sounds like a small distinction. It is not.
Autocomplete needs a good next token. Delegated work needs memory, permissions, tools, retries, logs, policy, cost controls, rollback, and a way to prove what happened. The app stops being a nice text box around a model. It becomes an operating environment.
This is why I do not buy the claim that “AI apps are dead because models get better.” Some apps will die, sure. A lot of thin wrappers deserve it.
But the real app layer is moving down into infrastructure. The product is not the prompt box. The product is the system that makes delegation safe enough, cheap enough, and useful enough that a normal person will trust it.
The most boring layer is suddenly the prize
The best comment from the last30days run came from r/AI_Agents. Someone asked about interesting agent projects. u/Ecstatic-Use-1353 answered:
“For me, the most interesting direction lately is agent infrastructure rather than the agent itself. Planning and memory are cool, but the real bottleneck seems to be the connector layer: OAuth, tokens.”[2]
That is exactly right.
It is also the part most demo videos skip.
In a demo, an agent clicks around a browser and completes a task. It looks like magic. In production, the same agent needs to know which user it represents, what it is allowed to access, how long that permission lasts, where tokens are stored, how scopes are revoked, how audit logs work, what happens if a tool call fails halfway through, and who pays for the retries.
The model is not the hard part anymore. Or at least, it is not the only hard part.
TGVP’s AI agent infrastructure report says 79% of companies are actively adopting AI agents, but only 2% have deployed agents at scale.[3] You can quibble with survey methodology, but the shape feels right. Everyone wants the outcome. Almost nobody wants to own the plumbing.
That gap is where a lot of new software companies will be built.
Not “agent builder” companies in the vague sense. We have enough magical workflow canvases. I mean the dirty pieces:
- Agent auth
- Tool permissions
- Context stores
- Execution sandboxes
- Audit trails
- Human approval gates
- Secret handling
- Cost routing
- Evals tied to real tasks
- Rollback for agent-created changes
- Logs a tired engineer can read at 11:40 PM
The money will not all go to the company with the cutest agent avatar. It will go to whoever makes agents boring enough to use in real operations.
I mean boring as praise.
AI coding is becoming less about writing code and more about supervising work
The software engineering story is getting misread too.
The lazy take is: “AI writes code now, so developers are doomed.” The equally lazy counter-take is: “AI writes bad code, so nothing changes.”
Both miss what is actually happening.
Developers are becoming reviewers, task designers, environment managers, and evidence inspectors. The core skill is moving from “can I type the function?” to “can I define the work, constrain the agent, verify the output, and merge safely?”
The raw research was full of this. Hacker News surfaced projects like Ponytrail, a local audit trail for AI coding-agent edits.[4] GitHub results were loaded with AI review bots, skipped reviews, migration PRs, and agent workflow tools. There were also threads asking whether anyone is building real software with agents, which is the right skeptical question.[5]
I have used coding agents enough to feel both sides of this.
They can save you from boring work. They can also create new work that looks suspiciously like babysitting a fast intern who has read the docs, missed the point, and confidently touched seven files.
The winners here will not be the teams that blindly accept every agent diff. They will be the teams that build a review culture around agents:
- Smaller tasks
- Clear acceptance criteria
- Tests before trust
- Sandboxed execution
- Human approval for risky actions
- Persistent memory only when it earns its keep
- A written trail of what changed and why
That sounds like normal engineering discipline.
Exactly.
Agents do not remove the need for engineering discipline. They punish teams that never had it.
The security model is backwards
The most important security story in the research was not a theoretical alignment paper. It was a clean GitHub repo tricking coding agents into running malware.
BleepingComputer covered a Mozilla 0DIN demonstration where an agentic coding tool cloned and set up a repository that looked harmless. The malicious behavior was not sitting there as obvious exploit code. The agent hit a setup error, tried to be helpful, ran follow-up commands, and eventually executed a payload hidden behind DNS TXT indirection.[6]
That is the new shape of risk.
The agent is not attacked because it is dumb. It is attacked because it is useful.
We spent years teaching developer tools to be convenient. Install dependencies. Run setup scripts. Retry failed commands. Read the README. Fix the error. Keep going.
Now we are giving that behavior to agents with more autonomy and, too often, the same privileges as the developer.
That should make every software team uncomfortable.
Not panicked. Uncomfortable in the productive way. The way you feel when you realize your CI job has too many secrets. The way you feel when a staging service has production credentials because “it was easier at the time.”
Agents need a new runtime security model:
- Show the full execution chain before running setup commands
- Treat unknown repos as hostile until proven otherwise
- Block outbound network calls by default in risky contexts
- Keep secrets out of agent shells
- Use short-lived credentials
- Make agent actions visible in logs
- Require approval for file system, network, package install, deploy, and credential actions
This is not optional polish. It is the difference between “AI helped me build a feature” and “AI helped someone else get my laptop.”
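A minimal sketch of what that approval rule could look like in code, in Python. Everything here is an assumption for illustration: the category names, the `Action` and `Decision` types, and the choice to deny risky actions outright in unvetted repos rather than ask. It is not any vendor's actual policy engine.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK_HUMAN = "ask_human"
    DENY = "deny"


# Hypothetical category names for the risky action types listed above.
RISKY = frozenset({"filesystem", "network", "package_install", "deploy", "credential"})


@dataclass(frozen=True)
class Action:
    category: str       # e.g. "network" or "read_docs"
    repo_vetted: bool   # has a human reviewed this repository?


def decide(action: Action) -> Decision:
    """Gate one agent action: risky work is denied in unvetted repos and needs approval elsewhere."""
    if action.category not in RISKY:
        return Decision.ALLOW
    if not action.repo_vetted:
        return Decision.DENY  # unknown repos are treated as hostile
    return Decision.ASK_HUMAN
```

With this rule, `decide(Action("network", repo_vetted=False))` comes back as `Decision.DENY`, while the same action in a vetted repo pauses for a human.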
Cloud platforms are starting to admit agents are users
Cloudflare’s temporary accounts for AI agents are one of those small product launches that says a lot about where the internet is going.
Agents can now run wrangler deploy --temporary and get a live Worker without first forcing a human through signup, dashboard navigation, OAuth, and API token creation. The deployment lives for 60 minutes. A human can claim it, or it expires.[7]
That is clever product design.
It is also a philosophical admission: agents are becoming actors in infrastructure.
Most of the web was designed around humans clicking buttons and developers creating API keys. Agents do not fit cleanly into either bucket. They need to do real work, but they should not get permanent god-mode credentials just because a human got bored during setup.
So the platform has to change.
Expect more of this. Temporary accounts. Scoped agent identities. Deploy previews made for non-human operators. Tool calls that carry provenance. Agent-specific rate limits. Logs that distinguish “William clicked deploy” from “William’s agent deployed after editing three files and running tests.”
That last distinction matters.
In a world of agents, “who did this?” becomes a more complicated question. The answer might be: a human authorized an agent, the agent called a tool, the tool used a delegated credential, the deploy happened under a temporary identity, and a human claimed it later.
That is not science fiction. That is Tuesday’s infrastructure backlog.
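One way to make that chain answerable is to record it as a single receipt per action. The sketch below is Python and purely illustrative: `ActionReceipt`, its field names, and the `who_did_this` helper are my assumptions, not a format any platform uses today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActionReceipt:
    """One record per agent action; all field names are hypothetical."""
    authorized_by: str                # the human who delegated the work
    agent: str                        # the agent that acted
    tool: str                         # the tool the agent called
    credential: str                   # the delegated credential the tool used
    identity: str                     # the identity the action ran under
    claimed_by: Optional[str] = None  # human who later claimed the result, if any


def who_did_this(r: ActionReceipt) -> str:
    """Render the chain of responsibility as one readable log line."""
    line = (f"{r.authorized_by} authorized {r.agent}; {r.agent} called {r.tool} "
            f"with {r.credential} under {r.identity}")
    if r.claimed_by:
        line += f"; claimed later by {r.claimed_by}"
    return line
```

The point of keeping every hop in one record is that the log line can separate the human who authorized the work from the agent that did it.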
Hardware is downstream of agent behavior
The hardware story only makes sense if you start from the software.
Agents run longer. They use more context. They call tools. They retry. They generate and inspect code. They may run in parallel. They need low latency for interactive work and cheap throughput for background work.
That changes the shape of compute demand.
NVIDIA’s Rubin platform is not just “next GPU faster.” NVIDIA says Rubin combines hardware and software codesign, targets up to a 10x reduction in inference token cost compared with Blackwell, and includes context-memory infrastructure for agentic AI reasoning.[8]
Google’s Ironwood TPU is explicitly framed as a chip for the age of inference.[9] AWS claims its Trainium3 UltraServers deliver 4.4x higher performance, 3.9x higher memory bandwidth, and 4x better performance per watt compared with Trn2 UltraServers.[10]
Notice the pattern.
Inference. Memory bandwidth. Performance per watt. Context. Token cost.
That is not an accident. Training a frontier model is still a gigantic deal, but the daily economics of AI move through inference. Every agent action burns tokens. Every retry burns tokens. Every long context window burns memory. Every “let it work for 20 minutes” task turns model capability into infrastructure load.
This is why software people need to care about hardware again.
Not in the “read every chip roadmap” sense. Most builders do not need that. But if your product depends on agents doing real work, then your roadmap is tied to inference cost, memory, latency, and power whether you like it or not.
If inference gets 10x cheaper, products appear that were previously stupid. If power and memory stay constrained, agents will feel magical in demos and financially ridiculous in production.
The power grid is part of your product now
AI infrastructure used to feel abstract. Cloud bill goes up, credit card cries, founder posts a screenshot.
That was the old problem.
The new problem is that datacenters need land, water, electricity, transformers, transmission lines, and local political permission. The last30days run picked up HN discussion around federal regulators pushing grid operators to speed power for energy-hungry AI datacenters, plus broader data-center backlash.[11]
This will get louder.
People like AI features. People like jobs. People do not like surprise power bills, delayed grid upgrades, water fights, or warehouse-scale facilities appearing near them with vague promises and a lot of diesel backup.
For builders, the implication is uncomfortable: AI product strategy is now coupled to industrial policy.
You can ship a beautiful app on a laptop. You cannot serve millions of agentic workflows without someone somewhere building the physical plant. That means the future of AI will be shaped by local permitting boards, energy markets, chip supply chains, export controls, and cooling systems. Not just product managers.
If that sounds too earthy for a software trend, good. Software has been pretending to float above the ground for too long.
Open source is not just ideology anymore
Open source AI used to be framed mostly as a values debate. Should powerful models be open? Who gets access? What about safety?
Those questions still matter. But the practical issue is now sharper: access to models depends on access to compute.
If only a few companies can afford the hardware, then the model layer consolidates. If smaller open models keep improving, local inference gets cheaper, and hardware alternatives become usable, then more builders can operate outside the giant API toll booths.
That is why open-source AI, China, export controls, local models, and hardware all keep showing up in the same conversations. The raw research included threads about Chinese models, Anthropic export restrictions, open-source AI, and why labs want their own chips. It also included a LocalLLaMA thread where people joked that the best model is less useful if nobody can store it.
This is the part I think a lot of U.S.-centric AI analysis gets wrong.
The world does not want one API to rule everything. Companies want control. Governments want sovereignty. Developers want models they can inspect, run, fine-tune, and price without begging a vendor.
That does not mean open models automatically win. They still need good tooling, good inference, good hardware support, and enough quality to matter.
But the demand is not going away.
My ugly map of the stack
If I had to draw the current AI stack, I would not draw it as neat boxes.
I would draw it as pressure moving through layers:
User wants work done
-> agent plans and acts
-> tools need auth, memory, logs, permissions
-> software teams need review, tests, rollback, security
-> cloud platforms need agent-native deploy and identity
-> model providers need cheaper inference
-> hardware needs memory bandwidth and low token cost
-> datacenters need power, cooling, land, and politics
That is the loop.
Better hardware makes agents cheaper. Cheaper agents increase usage. More usage stresses auth, security, observability, and cloud infrastructure. Better infrastructure makes agents more useful. More useful agents create more inference demand. Around and around.
This is why I think the next few years will be less about a single “AI killer app” and more about the stack being rebuilt around delegation.
The killer app might be boring.
It might be a contact form that an AI site builder can wire correctly without inventing a fake backend. It might be an agent-safe API key model. It might be a deployment flow that gives an agent 60 minutes of temporary permission and then disappears. It might be an audit trail for every edit an agent made.
Small pieces. Real consequences.
Trends I expect to increase
Here is my bet list.
Agent auth becomes a real category
OAuth for agents is not “add a login button.” Scalekit has a good line on this: OAuth for agents is long-lived delegated authority, not a login flow.[12]
That means token storage, refresh, revocation, tenant boundaries, and scope checks become first-order product work. Any startup building agents that touch customer data will need a serious answer here.
API keys taped into environment variables will age badly.
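To make the scope-check part concrete, here is a small Python sketch of a delegated grant that can expire and be revoked. The `Grant` type, the scope strings, and the `is_allowed` helper are hypothetical. A real system would sit on top of an OAuth server and persistent token storage, which this deliberately leaves out.

```python
import time
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Grant:
    """A delegated permission an agent holds on behalf of one tenant (illustrative only)."""
    tenant: str
    scopes: FrozenSet[str]   # e.g. {"calendar.read"}
    expires_at: float        # Unix timestamp
    revoked: bool = False


def is_allowed(grant: Grant, tenant: str, scope: str, now: Optional[float] = None) -> bool:
    """Check revocation, expiry, tenant boundary, and scope before every tool call."""
    now = time.time() if now is None else now
    return (
        not grant.revoked
        and now < grant.expires_at
        and grant.tenant == tenant
        and scope in grant.scopes
    )
```

The design choice worth noticing is that the check runs on every tool call, so flipping `revoked` or letting `expires_at` pass takes effect immediately instead of at the next login.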
Human approval becomes product design
The default agent UX today is either too locked down or too reckless.
The middle ground will be where the best products live: agents that can act freely on low-risk tasks, pause on risky ones, explain what changed, and ask for approval at the right moment.
This is not just security. It is trust.
Evals move from benchmarks to workflows
The question “is the model smart?” matters less than “does this agent complete this workflow without creating hidden damage?”
Expect evals tied to actual business processes: deploy this service, reconcile this invoice, triage this lead, update this docs page, migrate this dependency, handle this support case.
The best evals will look less like exams and more like operations checklists.
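A checklist-style eval can be very small. This Python sketch assumes the agent run has already been summarized into a plain dict. The keys (`tests_passed`, `files_changed`, `secrets_in_diff`) and the three checks are made up for illustration.

```python
from typing import Callable, Dict

Check = Callable[[dict], bool]

# Hypothetical checks for a "migrate this dependency" workflow.
CHECKS: Dict[str, Check] = {
    "tests pass": lambda run: run["tests_passed"],
    "stayed in scope": lambda run: all(f.startswith("src/") for f in run["files_changed"]),
    "no secrets in diff": lambda run: not run["secrets_in_diff"],
}


def evaluate(run: dict, checks: Dict[str, Check] = CHECKS) -> Dict[str, bool]:
    """Return a pass/fail result per checklist item instead of one score."""
    return {name: bool(check(run)) for name, check in checks.items()}


def passed(results: Dict[str, bool]) -> bool:
    """The workflow counts as done only if every item passed."""
    return all(results.values())
```

A run that passes its tests but edits a file outside `src/` fails this checklist, which is the kind of hidden damage a benchmark score would not show.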
Agent memory gets more local and more specific
Generic memory sounds attractive until it hallucinates your preferences back at you.
The useful version will be narrower: project memory, codebase memory, customer memory, session memory, tool memory. Local-first memory projects showing up on HN are early signals here.[13]
Memory will need expiration, provenance, editing, and deletion. Otherwise it becomes technical debt with a personality.
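Here is a rough Python sketch of memory with those four properties. It is an in-process toy, not a real memory product: the `ScopedMemory` class, the `(scope, key)` layout, and the time-to-live approach are all assumptions I picked to keep the example short.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class MemoryItem:
    value: str
    source: str        # provenance: where this memory came from
    expires_at: float  # Unix timestamp after which it is dropped


class ScopedMemory:
    """Tiny in-process store keyed by (scope, key); illustrative only."""

    def __init__(self) -> None:
        self._items: Dict[Tuple[str, str], MemoryItem] = {}

    def remember(self, scope: str, key: str, value: str, source: str,
                 ttl_seconds: float, now: Optional[float] = None) -> None:
        """Create or overwrite (edit) a memory with a source and a lifetime."""
        now = time.time() if now is None else now
        self._items[(scope, key)] = MemoryItem(value, source, now + ttl_seconds)

    def recall(self, scope: str, key: str, now: Optional[float] = None) -> Optional[MemoryItem]:
        """Return the memory if it exists and has not expired; expired items are deleted."""
        now = time.time() if now is None else now
        item = self._items.get((scope, key))
        if item is not None and now >= item.expires_at:
            del self._items[(scope, key)]
            return None
        return item

    def forget(self, scope: str, key: str) -> None:
        """Delete a memory on request; missing keys are ignored."""
        self._items.pop((scope, key), None)
```

Scoping the key (for example `"project:site"`) is what keeps project memory from leaking into customer memory, and storing `source` next to the value lets a human see why the agent believes something before editing or deleting it.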
AI security shifts from prompts to permissions
Prompt injection will still matter, but the scarier failures will involve tools.
Who can the agent email? What repo can it clone? What command can it run? What secrets can it read? Can it fetch a shell script? Can it deploy? Can it spend money?
Security teams will stop asking only what the model said and start asking what the agent was allowed to do.
Inference cost becomes a normal product metric
Founders already track MRR, activation, churn, CAC, and gross margin.
AI founders will track cost per completed task, tokens per workflow, retry cost, average context size, tool-call failure rate, and margin per agent run.
If you do not know those numbers, you are not running an AI product. You are running a slot machine in the cloud.
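The arithmetic behind a couple of those numbers is simple enough to sketch in Python. The `AgentRun` fields and the per-million-token prices passed in are assumptions for illustration, not any provider's real pricing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentRun:
    input_tokens: int
    output_tokens: int
    completed: bool   # did the run finish the task?
    revenue: float    # what the customer paid for this run, in dollars


def run_cost(run: AgentRun, usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Token cost of one run; failed attempts and retries are just more runs."""
    return (run.input_tokens / 1_000_000 * usd_per_m_input
            + run.output_tokens / 1_000_000 * usd_per_m_output)


def cost_per_completed_task(runs: List[AgentRun], usd_in: float, usd_out: float) -> float:
    """Total spend, including failed runs, divided by tasks that actually finished."""
    total = sum(run_cost(r, usd_in, usd_out) for r in runs)
    done = sum(1 for r in runs if r.completed)
    if done == 0:
        raise ValueError("no completed tasks to divide by")
    return total / done


def margin(runs: List[AgentRun], usd_in: float, usd_out: float) -> float:
    """Revenue minus token cost across all runs."""
    return sum(r.revenue for r in runs) - sum(run_cost(r, usd_in, usd_out) for r in runs)
```

Assuming made-up prices of $2 per million input tokens and $10 per million output tokens, one failed run of 500,000 input and 50,000 output tokens plus one successful run of 1,000,000 input and 100,000 output tokens costs $4.50 in total. The cost per completed task is therefore $4.50, not the $3.00 the successful run cost on its own.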
Datacenter politics becomes AI politics
This one is easy to underestimate until the local news gets involved.
AI companies need power. Communities will ask who benefits, who pays, and what breaks. Expect more fights over grid upgrades, water, land use, and whether AI infrastructure gets special treatment.
The cleanest model demo in the world does not matter if the next datacenter cannot get power.
What this means down the road
In the next 12 months, I expect agents to become normal in software teams. Not universally trusted. Not autonomous in the fantasy sense. Normal enough that a team without them starts to feel slow in certain workflows.
The strongest use cases will be narrow: repo maintenance, test generation, migration chores, docs updates, data cleanup, customer ops, internal tools, deployment scaffolds, research briefs, and boring glue work.
That boring glue work matters.
In the next two to three years, I think the web starts growing agent-native affordances. Not just APIs. Actual product surfaces designed for non-human operators. Temporary credentials. Delegated scopes. Agent accounts. Action receipts. Signed tool calls. Sandboxes. Preview deployments. Checkout flows that distinguish helpful automation from fraud.
This will be messy because the web was not designed for this many semi-trusted actors.
Longer term, I think software products change shape. Less “click through these 14 screens” and more “declare the job, supervise the work, inspect the result.” The UI does not disappear. People still need control, taste, and judgment. But the UI becomes more like a control room and less like a maze.
That is good news for builders who understand workflows.
It is bad news for products whose only moat is making users click a lot.
Where I might be wrong
I am open to the possibility that agents hit a reliability wall harder than expected. Maybe the cost of verification eats the productivity gains in many domains. Maybe regulation slows the most useful workflows. Maybe users tolerate less autonomy than AI companies hope. Maybe the hardware buildout runs into power constraints that make the whole thing more expensive for longer.
All plausible.
But even if agents disappoint in the grandiose sense, the infrastructure shift still happens. Developers are already using coding agents. Companies are already building auth and runtime layers. Cloud providers are already adjusting deploy flows. Hardware vendors are already optimizing for inference and context-heavy workloads.
That is enough.
The future does not require perfect agents. It only requires agents useful enough to create demand for the layers beneath them.
And they already are.
The builder takeaway
If you are building AI software, stop thinking only about the model.
Think about the work.
What does the user want delegated? What tools does the agent need? What can it break? What should it never see? How do you prove what happened? What does one successful task cost? What happens when the agent is wrong but persuasive? What happens when it is right but expensive?
Those questions are not side quests. They are the product.
Starterbuild exists for this kind of moment: when the boring implementation details become the actual business. The contact form that actually sends email. The endpoint that does not fake it. The agent workflow that does not leak keys. The deploy that can be trusted. The software that survives contact with users.
AI is making the stack stranger.
Good.
Strange stacks create openings.
Just do not build a magic demo and call it a company.
Sources
[1] OpenAI, “How agents are transforming work”.
[2] r/AI_Agents, “What’s the most interesting AI agent project you’ve discovered recently?”.
[4] Ponytrail, “A local audit trail for AI coding-agent edits”.
[5] Hacker News, “Ask HN: Is anyone building real software with AI agents?”.
[6] BleepingComputer, “Clean GitHub repo tricks AI coding agents into running malware”.
[7] Cloudflare, “Temporary Cloudflare Accounts for AI agents”.
[8] NVIDIA News, “NVIDIA Kicks Off the Next Generation of AI With Rubin”.
[9] Google, “Ironwood: The first Google TPU for the age of inference”.
[11] AP News, “Federal regulators order grid operators speed power to energy-hungry AI data centers”.
[12] Scalekit, “OAuth for AI Agents: Production Architecture and Practical Implementation Guide”.