Agent-Based AI: Why Companies Are Bringing Their AI Back In-House from the Cloud





A single code agent in the cloud can cost $13,000 a month. HPE reports cutting its AI costs by a factor of more than 30 by bringing its infrastructure in-house. Here’s why agent-based AI is changing the economics of AI, and what Nordic companies need to consider.

TL;DR: Summary in three points

  • Agent-based AI generates continuous operational workloads where every decision, validation, and tool call consumes tokens, and costs scale rapidly.
  • HPE’s example shows that having your own GPU infrastructure can reduce costs by more than 30× as AI moves from experimentation to production.
  • For Nordic companies, the choice comes down to data sovereignty, predictable economics, and control over a new type of business-critical operating platform.


It often starts out quite simply.

A chatbot for customer support. An internal tool that helps developers write code. An AI assistant that summarizes documents, searches for information, and suggests next steps.

Then something happens.

The assistant is no longer just an assistant. It becomes an agent.

It does not wait for a single question and provide a single answer. It plans, searches, compares, validates, calls other systems, and continues working until the task is completed. Sometimes it makes several attempts. Sometimes it needs to verify its own results. Sometimes it needs to communicate with other agents.

And every step comes at a cost.

This is where many companies are currently realizing that the economics of AI are changing. It’s no longer just about paying for individual questions and answers. Agent-based AI creates a new type of continuous operational load: a digital workforce that operates in the background, consumes computing power, and drives up token consumption around the clock.

When AI becomes agentic, inference is no longer a project. It becomes production.

From Questions and Answers to Independent Work

Traditional generative AI often functions as a very powerful question-and-answer system. You enter a prompt, the model generates a response, and the process is complete.

Agent-based AI works differently.

An AI agent is given a goal rather than a single question. The goal might be to analyze a support ticket, identify the root cause, review previous incidents, suggest a course of action, and update a ticket management system. It could involve assisting a developer by reading code, writing test cases, debugging, and creating a change proposal. It could also involve monitoring logs, identifying anomalies, and suggesting changes to the infrastructure.

What matters is not just that the agent responds. What matters is that it operates in several stages.

Each step involves a new context. New data input. New tokens. New model runs. New tool calls. Sometimes it takes several rounds of tackling the same problem before the agent dares to move on.

That’s why agent-based AI can quickly become significantly more expensive than many people anticipate. It’s not the first prompt that blows the budget. It’s the loop.
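To make the loop effect concrete, here is a minimal sketch of how token volume can grow over a multi-step agent run. It assumes that every step re-sends the full accumulated context to the model, which is a common pattern but not a description of any particular product. The function name and all numbers are hypothetical and chosen only for illustration.

```python
def agent_run_tokens(steps, base_context, tokens_added_per_step, output_per_step):
    """Return (input_tokens, output_tokens) for one agent run.

    Assumption: each step re-reads the whole accumulated context,
    and the context grows by the new data plus the model's own output.
    """
    input_total = 0
    output_total = 0
    context = base_context
    for _ in range(steps):
        input_total += context              # full context is sent again
        output_total += output_per_step     # the model writes a new result
        context += tokens_added_per_step + output_per_step  # context grows
    return input_total, output_total


# Hypothetical figures: a 2,000-token starting prompt, 1,500 tokens of new
# tool data and 500 tokens of model output per step.
single = agent_run_tokens(1, 2_000, 1_500, 500)
looped = agent_run_tokens(20, 2_000, 1_500, 500)
print(single)  # (2000, 500)
print(looped)  # (420000, 10000)
```

Under these assumptions, twenty steps consume 210 times the input tokens of a single step, not 20 times, because the context is read again on every round. Prompt caching or context trimming can reduce what is actually billed, so treat the sketch as an upper-bound intuition rather than a price estimate.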

The HPE Example: When the Cost of AI Became an Infrastructure Issue

According to The Next Platform, HPE has described how its support environments began consuming tokens on a scale that made cloud-based inference increasingly difficult to justify financially. HPE therefore built an AI-first support platform on its own infrastructure, based on HPE GreenLake Intelligence and Private Cloud AI with NVIDIA.

Fidelma Russo, CTO and EVP at HPE, said on stage during HPE Discover 2026 in Las Vegas:

“We stopped being consumers of AI and became producers of intelligence.”

The results were remarkable. HPE is said to have reduced costs by a factor of more than 30 and to be saving nearly $100,000 per month.
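The two figures above (a reduction of more than 30× and savings of nearly $100,000 per month) imply rough before-and-after costs if they are read together. The sketch below assumes that both figures describe the same workload and rounds them to exactly 30 and exactly $100,000. The source does not confirm either assumption, so the output is an order-of-magnitude illustration only.

```python
def implied_monthly_costs(reduction_factor, monthly_saving):
    """Solve cloud - onprem = saving and cloud / onprem = factor."""
    onprem = monthly_saving / (reduction_factor - 1)
    cloud = onprem * reduction_factor
    return cloud, onprem


# Assumed round numbers, not figures published by HPE.
cloud, onprem = implied_monthly_costs(30, 100_000)
print(f"cloud: ${cloud:,.0f}  own infrastructure: ${onprem:,.0f}")
# cloud: $103,448  own infrastructure: $3,448
```

Read this way, the numbers would correspond to a cloud bill of roughly $103,000 per month falling to about $3,400 per month for the same work.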

The most interesting thing isn’t just the savings themselves. What’s interesting is what this example says about where the market is headed.

When AI is used sporadically, the cloud is often an excellent way to get started. It’s fast, flexible, and requires little initial investment. But when AI moves from testing to production, and especially when agent-based workflows begin running continuously, the equation changes.

Then the question is no longer: Where is the easiest place to start?

The question is: Where is it most sustainable to do this every day, every hour, and every minute?

For many businesses, the answer is: closer to the data, closer to the infrastructure, and closer to their own control.

How much does an AI agent actually cost?

One figure that stands out in HPE’s analysis is the example of continuously running code agents. According to data reported by The Next Platform, a single continuously running AI agent in a public cloud service can cost approximately $13,000 per month.

That doesn’t mean that all AI agents cost exactly the same. The cost depends on the model, token volume, context length, tool calls, cache strategy, utilization rate, and how efficiently the entire system is built.

But the figure highlights something important: when there are many agents, and when they work continuously, the cost can escalate very quickly.

One agent is manageable.

Ten agents are a budget line item.

A hundred agents are an infrastructure strategy.
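The step from one agent to a hundred can be put into numbers with a small sketch. It takes the approximately $13,000 per month quoted above as the per-agent cost and assumes that cost scales linearly with the number of agents, with no volume discounts and no shared caching. That linearity is a simplification made here, not a claim from the source.

```python
COST_PER_AGENT_USD = 13_000  # approximate monthly figure quoted in the article


def fleet_cost(agents, per_agent=COST_PER_AGENT_USD):
    """Monthly cost of a fleet, assuming purely linear scaling."""
    return agents * per_agent


for n in (1, 10, 100):
    monthly = fleet_cost(n)
    print(f"{n:>3} agents: ${monthly:,} per month, ${monthly * 12:,} per year")
```

Under that assumption, one hundred continuously running agents would cost $1,300,000 per month, or $15,600,000 per year.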

This is where companies need to stop viewing AI as a trial account in the cloud and start viewing it as production capacity. Just like databases, virtualization, storage, and networking, AI inference needs to be planned, sized, monitored, and optimized.

Some workloads will still be best suited for the cloud. Others will be best suited for on-premises infrastructure. Many will end up in a hybrid model.

But agent-based AI means that more of the computationally intensive inference work falls into the category of things we need to own, optimize, and control ourselves.

Three Scenarios Where Having Your Own GPU Infrastructure Can Pay for Itself Quickly

1. High and predictable inference volume

If you have AI services used daily by many users, or agents running continuously in the background, the utilization rate becomes high enough that your own infrastructure can be significantly more cost-effective. The cloud is often a strong choice for quick deployment and temporary capacity. On-premises infrastructure is often a strong choice when the load is stable, heavy, and business-critical.

2. Sensitive data

Many Nordic companies handle information that should not, without further consideration, be allowed to leave their control. This may include customer data, production data, drawings, medical information, research data, financial information, classified information, or intellectual property.

It’s not enough for an AI solution to be smart. It must also be controllable, traceable, and built for compliance from the start.

3. Requirements for low latency and tight integration with internal systems

Agent-based AI is most valuable when it operates closely with the organization’s actual processes. It often needs to access case management systems, document platforms, databases, logs, code repositories, production systems, and internal APIs. The closer the agent runs to the infrastructure, the easier it is to control performance, security, and access.

In short: when AI moves from the demo phase to production, operational reality takes precedence over the slide deck.
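The first scenario, high and predictable inference volume, comes down to a break-even calculation. The sketch below compares a pay-per-token cloud bill with an amortized on-premises cost. Every number in it is hypothetical (hardware price, amortization period, operating cost, token price) and should be replaced with your own quotes. It also ignores the capacity ceiling of the hardware, which a real sizing exercise must include.

```python
def cloud_monthly_cost(tokens_millions, usd_per_million_tokens):
    """Pay-per-use: cost grows with every token processed."""
    return tokens_millions * usd_per_million_tokens


def onprem_monthly_cost(capex_usd, amortization_months, opex_usd_per_month):
    """Fixed cost: amortized hardware plus power, cooling, and operations."""
    return capex_usd / amortization_months + opex_usd_per_month


def break_even_tokens_millions(capex_usd, amortization_months,
                               opex_usd_per_month, usd_per_million_tokens):
    """Monthly token volume at which both options cost the same."""
    fixed = onprem_monthly_cost(capex_usd, amortization_months, opex_usd_per_month)
    return fixed / usd_per_million_tokens


# Hypothetical inputs: $360,000 of hardware over 36 months, $5,000 per month
# to operate, and a blended cloud price of $10 per million tokens.
fixed = onprem_monthly_cost(360_000, 36, 5_000)
threshold = break_even_tokens_millions(360_000, 36, 5_000, 10)
print(fixed)      # 15000.0
print(threshold)  # 1500.0
```

With these made-up inputs, own infrastructure costs $15,000 per month regardless of volume, and the cloud becomes the more expensive option above 1,500 million tokens per month. Below that volume, the cloud stays cheaper, which is why utilization is the deciding variable.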

Data sovereignty isn’t a feeling. It’s an architectural issue.

For Nordic companies, data sovereignty is no longer just a buzzword in a strategy document. It is a practical design principle.

The GDPR sets clear rules for how personal data may be processed and transferred outside the EU/EEA. NIS2 also raises the requirements for cybersecurity, risk management, and accountability for many operators of critical infrastructure and digital services.

This does not mean that U.S. cloud services can never be used. However, it does mean that every organization must understand exactly what data is being processed, where it is being processed, who has access to it, which subcontractors are involved, how logging is handled, and what legal mechanisms are used in the event of a transfer to a third country.
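The questions listed above can be kept as a simple record per AI workload, so that gaps become visible before an agent goes into production. The following is a minimal sketch. The class name, the field names, and the example values are illustrative assumptions and do not come from any regulation or standard.

```python
from dataclasses import dataclass, fields


@dataclass
class ProcessingRecord:
    """One record per AI workload; an empty string means 'not yet answered'."""
    data_category: str = ""        # what data is being processed
    processing_location: str = ""  # where it is being processed
    access_roles: str = ""         # who has access to it
    subprocessors: str = ""        # which subcontractors are involved
    logging_policy: str = ""       # how logging is handled
    transfer_mechanism: str = ""   # legal basis for any third-country transfer


def open_questions(record):
    """Return the names of all fields that are still unanswered."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Hypothetical example: a support agent with two unanswered questions.
support_agent = ProcessingRecord(
    data_category="customer support tickets",
    processing_location="own data center, Sweden",
    access_roles="support team, platform operations",
    logging_policy="prompts and tool calls stored for 90 days",
)
print(open_questions(support_agent))
# ['subprocessors', 'transfer_mechanism']
```

A workload with an empty list is not automatically compliant, but a workload with open questions is a clear signal that the architecture has not been thought through yet.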

For some workloads, it is entirely manageable. For others, it becomes unnecessarily complex.

Agent-based AI further reinforces this point, since agents do not simply read a file and respond. They can retrieve context, call system functions, generate intermediate data, write back information, and build up working memory over time.

In that case, the control plane becomes at least as important as the model itself.

The question, therefore, is not just: Which model should we use?

The question is: Where will that intelligence be put to work?

How to Build a Cost-Effective AI Cluster for Inference

A good GPU cluster for agent-based AI isn’t just about buying powerful GPUs. It’s about balance.

You need the right accelerators for the right type of inference. NVIDIA H100, H200, and Blackwell-based systems such as the B200 are examples of platforms built for very high AI performance. For certain types of inference, other GPU options may also be relevant, depending on the model, memory requirements, and software stack.

You’ll need enough memory for large models and long context windows.

You need fast storage, since agent-based AI often works with large amounts of context, documents, metadata, and vector search.

You need networks that won’t become a bottleneck when many GPUs are working together.

You need orchestration, queue management, model management, monitoring, secure access, logging, and clear guidelines on which agents are allowed to do what.
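The last point, clear guidelines on which agents are allowed to do what, can be expressed as an allowlist that is checked before every tool call and logged either way. The sketch below is one possible shape for such a check. The agent name, the tool names, and the `call_tool` helper are hypothetical and not part of any specific orchestration product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentPolicy:
    """Which tools a named agent may call (deny by default)."""
    name: str
    allowed_tools: frozenset = field(default_factory=frozenset)


class ToolNotAllowed(Exception):
    """Raised when an agent requests a tool outside its allowlist."""


def call_tool(policy, tool, audit_log):
    """Record the attempt, then allow or refuse it based on the policy."""
    allowed = tool in policy.allowed_tools
    audit_log.append((policy.name, tool, "allowed" if allowed else "denied"))
    if not allowed:
        raise ToolNotAllowed(f"{policy.name} may not call {tool}")
    return f"{tool} executed"  # placeholder for the real tool invocation


support_agent = AgentPolicy("support-agent",
                            frozenset({"read_ticket", "search_incidents"}))
log = []
print(call_tool(support_agent, "read_ticket", log))
try:
    call_tool(support_agent, "update_production_config", log)
except ToolNotAllowed as exc:
    print(exc)
print(log)
```

The design choice here is deny by default: an agent with no explicit allowlist can do nothing, and every attempt ends up in the audit log whether it succeeded or not.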

And perhaps most importantly: you need someone who sees the big picture.

AI infrastructure is no longer just servers with GPUs. It encompasses data center design, power, cooling, networking, storage, security, model optimization, lifecycle management, and operations, all of which have to work together. It’s like a small AI power plant, but without a smokestack.

Aixia’s Perspective

Aixia helps Nordic companies build AI infrastructure that is ready for real-world production, not just lab environments.

As Scandinavia’s only certified NVIDIA DGX SuperPOD partner, we have extensive experience designing, installing, and operating advanced GPU clusters for AI, HPC, and generative AI. We build solutions where performance, cost control, and data sovereignty are integrated from the start.

For companies that want to deploy agent-based AI at scale, this will be crucial. It’s not about completely ruling out the cloud. It’s about placing the right workloads in the right places.

  • The cloud is often the right choice for experimentation, rapid development, and flexible capacity.
  • Building your own infrastructure is often the right choice when inference is continuous, the data is sensitive, and costs must be predictable.
  • Hybrid is often the reality.

But regardless of the model, AI infrastructure must be built with the same care as other business-critical IT. Because once AI agents start making decisions, interacting with systems, and consuming tokens around the clock, they are no longer a side project.

They are a new operating platform.

And that platform needs control, security, and cost-effectiveness from day one.


Frequently Asked Questions

What distinguishes agent-based AI from ordinary AI?

Traditional AI usually answers a question and stops there. Agent-based AI works more independently. It plans, reasons, uses tools, checks results, and moves forward in multiple steps. Each step consumes tokens and computational power.

Why is agent-based AI so expensive in the cloud?

Because agents often work continuously. They run many model iterations, retrieve new context, call system functions, and validate results. The cost isn’t just in the response itself, but across the entire workflow. According to HPE figures reported by The Next Platform, a continuously running code agent in the cloud can cost about $13,000 per month.

Is on-premises AI infrastructure always cheaper than the cloud?

No. For low or occasional usage, the cloud may be the most cost-effective option. However, for high, stable, and business-critical inference volumes, an on-premises GPU cluster can provide better cost control, higher utilization, and a lower cost per workflow.

What does data sovereignty mean in practice?

It means that the organization controls where data is stored, where it is processed, who can access it, and which laws apply. For Nordic companies, this often means that sensitive data must be kept within the EU/EEA or in a controlled environment.

What kind of infrastructure is required for agent-based AI?

This requires inference-optimized GPU clusters with high memory capacity, fast storage, low latency, robust networking, secure model management, orchestration, monitoring, and clear guidelines for agent access and behavior.

When should you start looking into your own GPU infrastructure?

When AI moves from the pilot phase to production, when token costs become difficult to predict, when many agents are running simultaneously, or when sensitive data should not leave the organization’s control.




This article was originally published on aixia.se. Would you like to discuss agent-based AI infrastructure for your organization? Contact us.
