F5 and Nvidia Offer a Solution to Make AI Infrastructure More Efficient and Cost-Effective
F5, a US-based technology company focused on application infrastructure, networking, and cybersecurity, has announced an expansion of its collaboration with Nvidia. The expanded partnership aims to help companies run artificial intelligence (AI) systems more efficiently and cost-effectively. Through this latest integration, the two companies seek to make AI infrastructure produce more output without continuously adding new, expensive GPUs.

The solution combines F5’s BIG-IP Next for Kubernetes platform with Nvidia’s BlueField-3 DPU. Both are designed to optimise AI inference, the stage at which a trained AI model generates responses, summaries, images, or other outputs for users.

Previously, companies raced to buy as many GPUs as possible; attention is now shifting to how efficiently those GPUs are used. Simply put, companies want to ensure that the GPUs they already own are working optimally rather than sitting idle for long periods.

In modern AI systems, output is measured in units called tokens. A token can be a word, a symbol, or a piece of data the AI processes when generating a response. The faster tokens are produced, the more responsive the AI service feels to customers.

For this reason, the term “tokenomics” has emerged in the AI industry, referring to how the efficiency and economic value of AI token production are measured. Metrics include the number of tokens generated, the cost of producing them, how quickly the AI starts responding to users, and the revenue each GPU can generate.

F5 and Nvidia say their combined solution is designed to improve this efficiency. “AI infrastructure is not just about access to GPUs or scaling up their implementation. It has evolved into an effort to maximise economic output per accelerator,” said F5’s Chief Product Officer, Kunal Anand, quoted from an official statement received by KompasTekno.
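The tokenomics metrics mentioned above can be made concrete with a simple back-of-envelope calculation. The sketch below is purely illustrative: the function names, GPU hourly cost, throughput, and token price are all hypothetical assumptions, not figures from F5 or Nvidia.

```python
# Hypothetical "tokenomics" calculation: cost per token and revenue per GPU.
# All numbers are illustrative assumptions, not vendor figures.

def cost_per_token(gpu_hourly_cost: float, tokens_per_second: float) -> float:
    """Cost (in USD) to produce one token on a GPU at a given throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost / tokens_per_hour

def revenue_per_gpu_hour(tokens_per_second: float, price_per_million_tokens: float) -> float:
    """Revenue (in USD) a single GPU can generate per hour at full utilisation."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / 1_000_000 * price_per_million_tokens

# Example: a GPU rented at $4/hour, serving 2,000 tokens per second,
# with output priced at $2 per million tokens (hypothetical numbers).
print(round(cost_per_token(4.0, 2000) * 1_000_000, 2))  # cost per million tokens, USD
print(round(revenue_per_gpu_hour(2000, 2.0), 2))        # revenue per GPU-hour, USD
```

Under these assumed numbers, raising tokens per second (the kind of inference efficiency gain F5 and Nvidia are targeting) directly lowers the cost per token and raises revenue per GPU, without buying additional hardware.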
According to Anand, BIG-IP Next for Kubernetes enables AI factories to treat token production as a measurable business metric. The system is claimed to enhance GPU performance while reducing the cost per token.