{"id":120462,"date":"2025-04-29T15:27:01","date_gmt":"2025-04-29T13:27:01","guid":{"rendered":"https:\/\/aixia.se\/nvidia-dynamo\/"},"modified":"2026-03-09T13:47:10","modified_gmt":"2026-03-09T12:47:10","slug":"nvidia-dynamo","status":"publish","type":"post","link":"https:\/\/aixia.se\/en\/nvidia-dynamo\/","title":{"rendered":"NVIDIA Dynamo &#8211; new framework for AI inference"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"120462\" class=\"elementor elementor-120462 elementor-118483\" data-elementor-post-type=\"post\">\n\t\t\t\t<div class=\"elementor-element elementor-element-4450781 e-flex e-con-boxed e-con e-parent\" data-id=\"4450781\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t<div class=\"elementor-element elementor-element-fe0c848 e-con-full e-flex e-con e-child\" data-id=\"fe0c848\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-4c959ef elementor-widget elementor-widget-text-editor\" data-id=\"4c959ef\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<h2 data-start=\"40\" data-end=\"185\"><strong data-start=\"40\" data-end=\"141\">NVIDIA Dynamo: Next-generation AI infrastructure solution for more efficient and scalable inference<\/strong><\/h2><h5 data-start=\"187\" data-end=\"572\">NVIDIA has recently launched <em data-start=\"215\" data-end=\"223\">Dynamo<\/em>, an open source AI inference solution designed to manage and optimize large language models (LLMs) in distributed environments. This software represents a significant step forward for organizations looking to maximize the performance and cost efficiency of their GPU-based AI infrastructures. 
<\/h5>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-44f3f5f e-con-full e-flex e-con e-child\" data-id=\"44f3f5f\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-0b2bd73 elementor-widget elementor-widget-image\" data-id=\"0b2bd73\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"800\" height=\"800\" src=\"https:\/\/aixia.se\/wp-content\/uploads\/2025\/04\/ChatGPT-Image-29-apr.-2025-15_23_12.png\" class=\"attachment-large size-large wp-image-118488\" alt=\"\" srcset=\"https:\/\/aixia.se\/wp-content\/uploads\/2025\/04\/ChatGPT-Image-29-apr.-2025-15_23_12.png 1024w, https:\/\/aixia.se\/wp-content\/uploads\/2025\/04\/ChatGPT-Image-29-apr.-2025-15_23_12-300x300.png 300w, https:\/\/aixia.se\/wp-content\/uploads\/2025\/04\/ChatGPT-Image-29-apr.-2025-15_23_12-150x150.png 150w, https:\/\/aixia.se\/wp-content\/uploads\/2025\/04\/ChatGPT-Image-29-apr.-2025-15_23_12-768x768.png 768w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-cf98926 e-flex e-con-boxed e-con e-parent\" data-id=\"cf98926\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t<div class=\"elementor-element elementor-element-5464de2 elementor-widget elementor-widget-text-editor\" data-id=\"5464de2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<h3 class=\"\" data-start=\"579\" data-end=\"607\">What is NVIDIA Dynamo?<\/h3><p 
class=\"\" data-start=\"609\" data-end=\"728\"><span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">Dynamo is a modular and low-latency inference platform that enables efficient management of generative AI models across large GPU clusters.<\/span> <span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">It is designed to scale seamlessly from a single GPU to thousands, making it ideal for companies running large-scale AI applications.<\/span><\/p><h3 class=\"\" data-start=\"735\" data-end=\"787\">Technical benefits for IT and AI specialists<\/h3><ul data-start=\"789\" data-end=\"1244\"><li class=\"\" data-start=\"789\" data-end=\"899\"><p class=\"\" data-start=\"791\" data-end=\"899\"><strong data-start=\"791\" data-end=\"816\">Disaggregated Serving<\/strong>: <span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">Separates the prefill (prompt processing) and decode (token generation) phases of LLM inference across different GPUs to optimize resource usage and increase throughput.<\/span><\/p><\/li><li class=\"\" data-start=\"901\" data-end=\"1002\"><p class=\"\" data-start=\"903\" data-end=\"1002\"><strong data-start=\"903\" data-end=\"919\">Smart Router<\/strong>: <span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">Intelligent, KV-cache-aware routing that minimizes redundant recomputation and balances load efficiently across GPU fleets.<\/span><\/p><\/li><li class=\"\" data-start=\"1004\" data-end=\"1119\"><p class=\"\" data-start=\"1006\" data-end=\"1119\"><strong data-start=\"1006\" data-end=\"1032\">Dynamic GPU scheduling<\/strong>: <span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">Automatically allocates GPU resources based on real-time demand, eliminating bottlenecks and improving performance.<\/span><\/p><\/li><li 
class=\"\" data-start=\"1121\" data-end=\"1244\"><p class=\"\" data-start=\"1123\" data-end=\"1244\"><strong data-start=\"1123\" data-end=\"1157\">Support for multiple inference engines<\/strong>: <span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">Compatible with TensorRT-LLM, vLLM, SGLang, PyTorch and others, providing flexibility in backend selection.<\/span><\/p><\/li><\/ul><h3 class=\"\" data-start=\"1251\" data-end=\"1291\">Business benefits for decision-makers<\/h3><ul data-start=\"1293\" data-end=\"1631\"><li class=\"\" data-start=\"1293\" data-end=\"1406\"><p class=\"\" data-start=\"1295\" data-end=\"1406\"><strong data-start=\"1295\" data-end=\"1319\">Cost efficiency<\/strong>: <span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">By increasing the number of inference requests per GPU, Dynamo reduces the overall operational costs of AI applications.<\/span><\/p><\/li><li class=\"\" data-start=\"1408\" data-end=\"1511\"><p class=\"\" data-start=\"1410\" data-end=\"1511\"><strong data-start=\"1410\" data-end=\"1424\">Scalability<\/strong>: <span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">Ability to quickly adapt to changing business needs through dynamic scaling of GPU resources.<\/span><\/p><\/li><li class=\"\" data-start=\"1513\" data-end=\"1631\"><p class=\"\" data-start=\"1515\" data-end=\"1631\"><strong data-start=\"1515\" data-end=\"1544\">Future-proof investment<\/strong>: <span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">Dynamo is an open and modular platform that easily integrates with existing AI stacks, protecting past investments and simplifying future upgrades.<\/span><\/p><\/li><\/ul><h3 class=\"\" data-start=\"1638\" data-end=\"1666\">Performance in practice<\/h3><p class=\"\" data-start=\"1668\" 
data-end=\"1833\"><span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">When tested with the DeepSeek-R1 671B open model on the NVIDIA GB200 NVL72, Dynamo increased throughput by up to 30x per GPU.<\/span> <span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">When the Llama 70B model was run on the NVIDIA Hopper platform, throughput doubled.<\/span> <span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">These improvements mean businesses can deliver AI services faster and at lower cost.<\/span><\/p><h3 class=\"\" data-start=\"1840\" data-end=\"1892\">How Aixia can support your transition to Dynamo<\/h3><p class=\"\" data-start=\"1894\" data-end=\"2019\"><span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">At Aixia, we offer expertise in implementing and optimizing AI infrastructures.<\/span> <span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">We can help your company to:<\/span><\/p><ul data-start=\"2021\" data-end=\"2363\"><li class=\"\" data-start=\"2021\" data-end=\"2138\"><p class=\"\" data-start=\"2023\" data-end=\"2138\"><strong data-start=\"2023\" data-end=\"2051\">Evaluate compatibility<\/strong>: <span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">Analyze your current GPU infrastructure to ensure it is ready for Dynamo.<\/span><\/p><\/li><li class=\"\" data-start=\"2140\" data-end=\"2252\"><p class=\"\" data-start=\"2142\" data-end=\"2252\"><strong data-start=\"2142\" data-end=\"2165\">Implement Dynamo<\/strong>: <span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">Support the installation and configuration of Dynamo to maximize 
performance and efficiency.<\/span><\/p><\/li><li class=\"\" data-start=\"2254\" data-end=\"2363\"><p class=\"\" data-start=\"2256\" data-end=\"2363\"><strong data-start=\"2256\" data-end=\"2276\">Train staff<\/strong>: <span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">Provide training for your team in the use and maintenance of the new platform.<\/span><\/p><\/li><\/ul><p class=\"\" data-start=\"2365\" data-end=\"2450\"><span class=\"relative -mx-px my-[-0.2rem] rounded px-px py-[0.2rem] transition-colors duration-100 ease-in-out\">Contact us to discuss how we can help your business benefit from NVIDIA Dynamo and take your AI infrastructure to the next level.<\/span><\/p><p class=\"\" data-start=\"2457\" data-end=\"2571\"><em data-start=\"2457\" data-end=\"2571\">For more information about NVIDIA Dynamo, visit <a class=\"\" href=\"https:\/\/www.nvidia.com\/en-us\/ai\/dynamo\/\" target=\"_new\" rel=\"noopener noreferrer nofollow\" data-start=\"2502\" data-end=\"2569\" target=\"_blank\">the official NVIDIA website<\/a>.<\/em><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Read about NVIDIA Dynamo. 13\u219244 characters. 
<\/p>\n","protected":false},"author":4,"featured_media":118490,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"categories":[46,77],"tags":[],"class_list":["post-120462","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news","category-techblog"],"acf":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/aixia.se\/en\/wp-json\/wp\/v2\/posts\/120462","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aixia.se\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aixia.se\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aixia.se\/en\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/aixia.se\/en\/wp-json\/wp\/v2\/comments?post=120462"}],"version-history":[{"count":2,"href":"https:\/\/aixia.se\/en\/wp-json\/wp\/v2\/posts\/120462\/revisions"}],"predecessor-version":[{"id":121240,"href":"https:\/\/aixia.se\/en\/wp-json\/wp\/v2\/posts\/120462\/revisions\/121240"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/aixia.se\/en\/wp-json\/wp\/v2\/media\/118490"}],"wp:attachment":[{"href":"https:\/\/aixia.se\/en\/wp-json\/wp\/v2\/media?parent=120462"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aixia.se\/en\/wp-json\/wp\/v2\/categories?post=120462"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aixia.se\/en\/wp-json\/wp\/v2\/tags?post=120462"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}