
Nvidia’s challengers are seizing a new opportunity to crack its dominance of artificial intelligence chips after Chinese start-up DeepSeek accelerated a shift in AI’s computing requirements.

DeepSeek’s R1 and other so-called “reasoning” models, such as OpenAI’s o3 and Anthropic’s Claude 3.7, consume more computing resources than previous AI systems at the point when a user makes their request, a process called “inference”. That has flipped the focus of demand for AI computing, which until recently was centred on training, or creating, a model.

Inference is expected to become a greater portion of the technology’s needs as demand grows among individuals and businesses for applications that go beyond today’s popular chatbots, such as ChatGPT or xAI’s Grok. It is here that Nvidia’s competitors — which range from AI chipmaker start-ups such as Cerebras and Groq to custom accelerator processors from Big Tech companies including Google, Amazon, Microsoft and Meta — are focusing their efforts to disrupt the world’s most valuable semiconductor company.

“Training makes AI and inference uses AI,” said Andrew Feldman, chief executive of Cerebras. “And the usage of AI has gone through the roof . . . The opportunity right now to make a chip that is vastly better for inference than for training is larger than it has been previously.”

Nvidia dominates the market for huge computing clusters such as Elon Musk’s xAI facility in Memphis or OpenAI’s Stargate project with SoftBank. But its investors are looking for reassurance that it can continue to outsell its rivals in the far smaller data centres under construction that will focus on inference.

Vipul Ved Prakash, chief executive and co-founder of Together AI, a cloud provider focused on AI that was valued at $3.3bn last month in a round led by General Catalyst, said inference was a “big focus” for his business. “I believe running inference at scale will be the biggest workload on the internet at some point,” he said.

Analysts at Morgan Stanley have estimated that more than 75 per cent of power and computational demand for data centres in the US will be for inference in the coming years, though they warned of “significant uncertainty” over exactly how the transition will play out. Still, that means hundreds of billions of dollars’ worth of investment could flow towards inference facilities in the next few years, if usage of AI continues to grow at its current pace.

Analysts at Barclays estimate that capital expenditure on inference in “frontier AI” — referring to the largest and most advanced systems — will exceed that on training over the next two years, jumping from $122.6bn in 2025 to $208.2bn in 2026. While Barclays predicts Nvidia will have “essentially 100 per cent market share” in frontier AI training, it will serve only 50 per cent of inference computing “over the long term”. That leaves the company’s rivals with almost $200bn in chip spending to play for by 2028.

“There is a huge pull towards better, faster, more efficient [chips],” said Walter Goodwin, founder of UK-based chip start-up Fractile. Cloud computing providers are eager for “something that cuts out over-dependence” on Nvidia, he added.

Nvidia chief executive Jensen Huang insisted his company’s chips are just as powerful for inference as they are for training, as he eyes a giant new market opportunity.

The US company’s latest Blackwell chips were designed to handle inference better, and many of those products’ earliest customers are using them to serve up, rather than train, AI systems. The popularity of its software, based on its proprietary Cuda architecture, among AI developers also presents a formidable barrier to competitors.

“The amount of inference compute needed is already 100x more” than it was when large language models started out, Huang said on last month’s earnings call. “And that’s just the beginning.”

The cost of serving up responses from LLMs has fallen rapidly over the past two years, driven by a combination of more powerful chips, more efficient AI systems and intense competition between AI developers such as Google, OpenAI and Anthropic. “The cost to use a given level of AI falls about 10x every 12 months, and lower prices lead to much more use,” Sam Altman, OpenAI’s chief executive, said in a blog post last month.

DeepSeek’s v3 and R1 models, which triggered a stock market panic in January largely because of what was perceived as lower training costs, have helped bring down inference costs further, thanks to the Chinese start-up’s architectural innovations and coding efficiencies.

At the same time, the kind of processing required by inference tasks — which can include far greater memory requirements to answer longer and more complex queries — has opened the door to alternatives to Nvidia’s graphics processing units, whose strengths lie in handling very large volumes of similar calculations.

“The performance of inference on your hardware is a function of how fast you can [move data] to and from memory,” said Cerebras’s Feldman, whose chips have been used by French AI start-up Mistral to accelerate the performance of its chatbot, Le Chat.

Speed is vital to engaging users, Feldman said. “One of the things that Google [search] showed 25 years ago is that even microseconds [of delay] reduce the attention of the viewer,” he said. “We are producing answers for Le Chat in sometimes a second while [OpenAI’s] o1 would have taken 40.”
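Feldman’s point about memory can be made concrete with a rough, back-of-the-envelope model: when a chatbot generates text one token at a time, the hardware typically has to stream the model’s weights from memory for every token, so memory bandwidth, rather than raw arithmetic throughput, often sets the ceiling on response speed. The short Python sketch below illustrates that ceiling; the 70bn-parameter model, 16-bit weights and 3,000GB/s bandwidth figure are illustrative assumptions made for this example, not numbers taken from the article or from any particular chip.

```python
# Rough sketch of a memory-bandwidth ceiling on single-stream LLM decoding.
# Assumption: every generated token requires reading all model weights once,
# so tokens/second cannot exceed (memory bandwidth) / (bytes of weights).
# All figures below are illustrative, not vendor specifications.

def decode_speed_ceiling(params_billions: float,
                         bytes_per_param: float,
                         bandwidth_gb_per_s: float) -> float:
    """Upper bound on tokens generated per second when weight reads dominate."""
    weight_gigabytes = params_billions * bytes_per_param
    return bandwidth_gb_per_s / weight_gigabytes

# Hypothetical 70bn-parameter model stored in 16-bit weights (~140GB)
# on an accelerator with ~3,000GB/s of memory bandwidth.
print(f"{decode_speed_ceiling(70, 2, 3000):.1f} tokens/s ceiling")  # ~21.4
```

Under that simplified model, doubling memory bandwidth or halving the number of bytes per weight roughly doubles the attainable tokens per second, which is the kind of trade-off the inference-focused chipmakers quoted here emphasise.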
Nvidia maintains its chips are just as powerful for inference as for training, pointing to a 200-fold improvement in its inference performance over the past two years. It says hundreds of millions of users access AI products through millions of its GPUs today.

“Our architecture is fungible and easy to use in all of those different ways,” Huang said last month, referring to both building large models and serving up AI applications in new ways.

Prakash, whose company counts Nvidia as an investor, said Together uses the same Nvidia chips for inference and training today, which is “pretty useful”.

Unlike Nvidia’s “general purpose” GPUs, inference accelerators work best when they are tuned to a particular type of AI model. In a fast-moving industry, that could prove a problem for chip start-ups which bet on the wrong AI architecture.

“I think the one advantage of general purpose computing is that as the model architectures are changing, you just have more flexibility,” Prakash said, adding: “My sense is there will be a complex mix of silicon over the coming years.”

Additional reporting by Michael Acton in San Francisco
