NVIDIA Dynamo – new framework for AI inference

NVIDIA Dynamo: Next-generation AI infrastructure solution for more efficient and scalable inference

NVIDIA recently launched Dynamo, an open-source AI inference framework designed to serve and optimize large language models (LLMs) in distributed environments. For organizations looking to maximize the performance and cost efficiency of GPU-based AI infrastructure, it represents a significant step forward.

What is NVIDIA Dynamo?

Dynamo is a modular and low-latency inference platform that enables efficient management of generative AI models across large GPU clusters. It is designed to scale seamlessly from single GPUs to thousands, making it ideal for companies running large-scale AI applications.

Technical benefits for IT and AI specialists

  • Disaggregated serving: Separates the prefill (prompt processing) and decode (token generation) phases of LLM inference onto different GPUs, so each phase can be scaled and optimized independently, increasing overall throughput.

  • Smart Router: Intelligent, cache-aware request routing that minimizes redundant recomputation and balances load efficiently across GPU fleets.

  • Dynamic GPU scheduling: Automatically allocates GPU resources based on real-time demand, eliminating bottlenecks and improving utilization.

  • Support for multiple inference engines: Compatible with TensorRT-LLM, vLLM, SGLang, PyTorch, and others, providing flexibility in backend selection.
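To make the routing idea above concrete, here is a minimal conceptual sketch in Python. It is not the Dynamo API; the `Worker` and `SmartRouter` names and the prefix-based cache key are illustrative assumptions showing how cache-aware routing can send a repeated prompt back to the GPU that already holds its computed state, while cache misses fall back to least-loaded balancing.

```python
# Conceptual sketch only -- NOT the NVIDIA Dynamo API.
# Illustrates cache-aware routing with a least-loaded fallback.
from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    active_requests: int = 0
    cached_prefixes: set = field(default_factory=set)

class SmartRouter:
    """Route a request to a worker that already cached the prompt's
    prefix when possible; otherwise pick the least-loaded worker."""
    def __init__(self, workers):
        self.workers = workers

    def route(self, prompt: str) -> Worker:
        prefix = prompt[:32]  # crude prefix key standing in for a KV-cache lookup
        for w in self.workers:
            if prefix in w.cached_prefixes:
                return w  # cache hit: avoid redundant prefill work
        # cache miss: balance load across the fleet
        w = min(self.workers, key=lambda w: w.active_requests)
        w.cached_prefixes.add(prefix)
        return w

workers = [Worker("gpu-0"), Worker("gpu-1")]
router = SmartRouter(workers)

first = router.route("Summarize this quarterly report ...")
first.active_requests += 1
second = router.route("Summarize this quarterly report ...")
print(first.name == second.name)  # True: the repeat request reuses the same worker
```

In a real system the cache key would be derived from the tokenized prompt and the routing decision would also weigh KV-cache memory pressure, but the principle is the same: send work where the state already lives.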

Business benefits for decision-makers

  • Cost efficiency: By increasing the number of inference requests per GPU, Dynamo reduces the overall operational costs of AI applications.

  • Scalability: Ability to quickly adapt to changing business needs through dynamic scaling of GPU resources.

  • Future-proof investment: Dynamo is an open and modular platform that easily integrates with existing AI stacks, protecting past investments and simplifying future upgrades.

Performance in practice

When tested with the open DeepSeek-R1 671B model on the NVIDIA GB200 NVL72, Dynamo increased throughput per GPU by up to 30x. When serving the Llama 70B model on the NVIDIA Hopper platform, throughput doubled. These improvements mean businesses can deliver AI services faster and at lower cost.
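The cost argument follows directly from the throughput numbers: if a GPU's hourly price is fixed, doubling tokens per second halves the cost per token. The sketch below uses hypothetical prices and rates (not benchmark figures) purely to show the arithmetic.

```python
# Illustrative cost arithmetic with hypothetical inputs
# ($4/GPU-hour, 500 vs. 1000 tokens/s) -- not benchmark data.
def cost_per_million_tokens(gpu_hourly_cost: float, tokens_per_second: float) -> float:
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost * 1_000_000 / tokens_per_hour

baseline = cost_per_million_tokens(gpu_hourly_cost=4.0, tokens_per_second=500)
doubled = cost_per_million_tokens(gpu_hourly_cost=4.0, tokens_per_second=1000)
print(round(baseline, 2), round(doubled, 2))  # 2.22 1.11
```

A 2x throughput gain halves the cost per million tokens at constant hardware spend; a 30x gain would reduce it by the same factor.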

How Aixia can support your transition to Dynamo

At Aixia, we offer expertise in implementing and optimizing AI infrastructure. We can help your company to:

  • Evaluate compatibility: Analyze your current GPU infrastructure to ensure it is ready for Dynamo.

  • Implement Dynamo: Support the installation and configuration of Dynamo to maximize performance and efficiency.

  • Train staff: Provide training for your team in the use and maintenance of the new platform.

Contact us to discuss how we can help your business benefit from NVIDIA Dynamo and take your AI infrastructure to the next level.

For more information about NVIDIA Dynamo, visit the official NVIDIA website.
