For anyone wanting to train an LLM on analyst responses to DeepSeek, the Temu of ChatGPTs, this post is a one-stop shop. We’ve grabbed all relevant sellside emails in our inbox and copy-pasted them with minimal intervention.

Backed by the High-Flyer VC fund, DeepSeek is a two-year-old, Hangzhou-based spinout of a Zhejiang University startup for trading equities by machine learning. Its stated goal is to make an artificial general intelligence for the fun of it, not for the money. There’s a good interview on ChinaTalk with founder Liang Wenfeng, and mainFT has this excellent overview from our colleagues Eleanor Olcott and Zijing Wu.

Mizuho’s Jordan Rochester takes up the story . . .

[O]n Jan 20, [DeepSeek] released an open source model (DeepSeek-R1) that beats the industry’s leading models on some math and reasoning benchmarks including capability, cost, openness etc. The DeepSeek app has topped the free app download rankings in Apple’s app stores in China and the United States, surpassing ChatGPT in the U.S. download list. What really stood out? DeepSeek said it took 2 months and less than $6m to develop the model – building on already existing technology and leveraging existing models. In comparison, OpenAI is spending more than $5 billion a year. Apparently DeepSeek bought 10,000 NVIDIA chips whereas hyperscalers have bought many multiples of this figure. It fundamentally breaks the AI capex narrative if true.

Sounds bad, but why? Here’s Jefferies’ Graham Hunt et al:

With DeepSeek delivering performance comparable to GPT-4o for a fraction of the computing power, there are potential negative implications for the builders, as pressure on AI players to justify ever increasing capex plans could ultimately lead to a lower trajectory for data center revenue and profit growth.

The DeepSeek R1 model is free to play with here, and does all the usual stuff like summarising research papers in iambic pentameter and getting logic problems wrong. The R1-Zero model, DeepSeek says, was trained entirely without supervised fine tuning.

Here’s Damindu Jayaweera and team at Peel Hunt with more detail:

Firstly, it was trained in under 3 million GPU hours, which equates to just over $5m training cost. For context, analysts estimate Meta’s last major AI model cost $60-70m to train. Secondly, we have seen people running the full DeepSeek model on commodity Mac hardware in a usable manner, confirming its inferencing efficiency (using as opposed to training). We believe it will not be long before we see Raspberry Pi units running cut-down versions of DeepSeek. This efficiency translates into hosted versions of this model costing just 5% of the equivalent OpenAI price. Lastly, it is being released under the MIT License, a permissive software license that allows near-unlimited freedoms, including modifying it for proprietary commercial use.

DeepSeek’s not an unanticipated threat to the OpenAI Industrial Complex. Even The Economist had spotted it months ago, and industry mags like SemiAnalysis have been talking for ages about the likelihood of China commoditising AI. That might be what’s happening here, or might not.
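The training-cost figures quoted here and below are all the same back-of-the-envelope arithmetic: GPU-hours multiplied by an assumed rental rate. A minimal sketch, assuming the roughly $2 per H800 GPU-hour rate the sellside notes use (both inputs are analyst estimates, not DeepSeek disclosures):

```python
# Back-of-the-envelope reproduction of Peel Hunt's "just over $5m" figure.
# Both inputs are analyst assumptions rather than DeepSeek disclosures.
gpu_hours = 2.8e6        # "under 3 million GPU hours" per Peel Hunt
usd_per_gpu_hour = 2.0   # assumed H800 rental rate used across the sellside notes

print(f"Implied training cost: ${gpu_hours * usd_per_gpu_hour / 1e6:.1f}m")  # ~$5.6m

# Note: the quoted "5% of the equivalent OpenAI price" for hosted versions is a
# separate claim about inference/serving cost, not derived from this number.
```

Bernstein’s pushback further down is precisely that this multiplication excludes prior research and experiments (and R1’s additional training), not that the arithmetic itself is wrong.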
Here’s Joshua Meyers, a specialist salesperson at JPMorgan:

It’s unclear to what extent DeepSeek is leveraging High-Flyer’s ~50k Hopper GPUs (similar in size to the cluster on which OpenAI is believed to be training GPT-5), but what seems likely is that they’re dramatically reducing costs (inference costs for their V2 model, for example, are claimed to be 1/7 that of GPT-4 Turbo). Their subversive (though not new) claim – that started to hit the US AI names this week – is that “more investments do not equal more innovation.” Liang: “Right now I don’t see any new approaches, but big firms do not have a clear upper hand. Big firms have existing customers, but their cash-flow businesses are also their burden, and this makes them vulnerable to disruption at any time.” And when asked about the fact that GPT-5 has still not been released: “OpenAI is not a god, they won’t necessarily always be at the forefront.”

Best for now that no-one tells Altman. Back to Mizuho:

Why this comes at a painful moment? This is happening after we just saw a Texas Hold’em ‘All In’ push of the chips with respect to the Stargate announcement (~$500B by 2028E) and Meta taking up capex officially to the range of $60-$65B to scale up Llama and of course MSFT’s $80B announcement . . . The markets were literally trying to model just Stargate’s stated demand for ~2mn units from NVDA when their total production is only 6mn . . . (Nvidia’s European trading is down 9% this morning, SoftBank was down 7%). Markets are now wondering if this is an AI bubble popping moment for markets or not (i.e. a dot-com bubble for Cisco). Nvidia is the largest individual company weight of the S&P 500 at 7%.

And Jefferies again:

1) We see at least two potential industry strategies. The emergence of more efficient training models out of China, which have been driven to innovate due to chip supply constraints, is likely to further intensify the race for AI dominance between the US and China. The key question for the data center builders is whether it continues to be a “build at all costs” strategy with accelerated model improvements, or whether focus now shifts towards higher capital efficiency, putting pressure on power demand and capex budgets from the major AI players. Near term the market will assume the latter.

2) Derating risk near term, earnings less impacted. Although data center exposed names are vulnerable to derating on sentiment, there is no immediate impact on earnings for our coverage. Any changes to capex plans apply with a lag effect given duration (>12M) and exposure in orderbooks (~10% for HOT). We see limited risk of alterations or cancellations to existing orders and expect at this stage a shift in expectations to higher ROI on existing investments driven by more efficient models. Overall, we remain bullish on the sector where scale leaders benefit from a widening moat and higher pricing power.

Though it’s the Chinese, so people are suspicious. Here’s Citi’s Atif Malik:

While DeepSeek’s achievement could be groundbreaking, we question the notion that its feats were done without the use of advanced GPUs to fine tune it and/or build the underlying LLMs the final model is based on through the distillation technique. While the dominance of the US companies on the most advanced AI models could be potentially challenged, that said, we estimate that in an inevitably more restrictive environment, US access to more advanced chips is an advantage.
Thus, we don’t expect leading AI companies would move away from more advanced GPUs which provide more attractive $/TFLOPs at scale. We see the recent AI capex announcements like Stargate as a nod to the need for advanced chips.

People, such as Bernstein’s Stacy A Rasgon and team, also question the estimates for cost and efficiency. The Bernstein team says today’s panic is about a “fundamental misunderstanding over the $5mn number” and the way in which DeepSeek has deployed smaller models distilled from the full-fat one, R1. “It seems categorically false that ‘China duplicated OpenAI for $5M’ and we don’t think it really bears further discussion,” Bernstein says:

Did DeepSeek really “build OpenAI for $5M?” Of course not . . . There are actually two model families in discussion. The first family is DeepSeek-V3, a Mixture-of-Experts (MoE) large language model which, through a number of optimizations and clever techniques, can provide similar or better performance vs other large foundational models but requires a small fraction of the compute resources to train. DeepSeek actually used a cluster of 2,048 NVIDIA H800 GPUs training for ~2 months (a total of ~2.7M GPU hours for pre-training and ~2.8M GPU hours including post-training). The oft-quoted “$5M” number is calculated by assuming a $2/GPU hour rental price for this infrastructure, which is fine, but not really what they did, and does not include all the other costs associated with prior research and experiments on architectures, algorithms, or data.

The second family is DeepSeek R1, which uses Reinforcement Learning (RL) and other innovations applied to the V3 base model to greatly improve performance in reasoning, competing favorably with OpenAI’s o1 reasoning model and others (it is this model that seems to be causing most of the angst). DeepSeek’s R1 paper did not quantify the additional resources that were required to develop the R1 model (presumably they were substantial as well).

[ . . . ]

[S]hould the relative efficiency of V3 be surprising? As an MoE model we don’t really think so . . . The point of the mixture-of-experts (MoE) architecture is to significantly reduce cost to train and run, given that only a portion of the parameter set is active at any one time (for example, when training V3 only 37B out of 671B parameters get updated for any one token, vs dense models where all parameters get updated). A survey of other MoE comparisons suggests typical efficiencies on the order of 3-7x vs similarly-sized dense models of similar performance; V3 looks even better than this (>10x), likely given some of the other innovations in the model the company has brought to bear, but the idea that this is something completely revolutionary seems a bit overblown, and not really worthy of the hysteria that has taken over the Twitterverse over the last several days.

Nevertheless, talk of a price war is enough to knock a hole in the Mag7’s already sketchy ROI. “It is absolutely true that DeepSeek’s pricing blows away anything from the competition, with the company pricing their models anywhere from 20-40x cheaper than equivalent models from OpenAI,” Bernstein says:

Of course, we do not know DeepSeek’s economics around these (and the models themselves are open and available to anyone that wants to work with them, for free) but the whole thing brings up some very interesting questions about the role and viability of proprietary vs open-source efforts that are probably worth doing more work on . . .
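Bernstein’s two observations – where the “$5M” comes from and why an MoE model trains cheaply relative to its headline parameter count – both reduce to one-line calculations. A rough sketch using the figures quoted above (the ~2-month run is approximated as 57 days here; the $2/GPU-hour rate is the same analyst assumption as before):

```python
# 1) The "$5M" number: a rental-rate multiplication over the V3 cluster.
gpus, days, usd_per_gpu_hour = 2048, 57, 2.0   # ~2 months on 2,048 H800s (assumed 57 days)
gpu_hours = gpus * days * 24                   # ~2.8M GPU-hours
print(f"Implied training cost: ${gpu_hours * usd_per_gpu_hour / 1e6:.1f}m")  # ~$5.6m

# 2) Why an MoE model trains cheaply: compute scales with the parameters
#    that are active per token, not with the total parameter count.
total_params, active_params = 671e9, 37e9
print(f"Active fraction per token: {active_params / total_params:.1%}")              # ~5.5%
print(f"Naive saving vs an equally sized dense model: ~{total_params / active_params:.0f}x")

# Bernstein's 3-7x yardstick is measured against dense models of *similar
# performance* (which need fewer total parameters), which is why V3's >10x
# is notable but, in their view, not revolutionary.
```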
Is any of this a good reason for a wider market selloff? On sentiment, maybe.

Per SocGen, Nvidia plus Microsoft, Alphabet, Amazon and Meta, its top-four customers, “have contributed approximately 700 points to the S&P 500 over the last 2 years. In other words, the S&P 500 excluding the Mag-5s would be 12 per cent lower today. Nvidia alone has contributed 4 per cent to the performance of the S&P 500. This is what we find to be the ‘American exceptionalism’ premium on the S&P 500.”

Deutsche Bank’s Jim Reid narrows it down to Nvidia alone, and its stunningly quick transformation from a maker of video games graphics cards to the turboprop of economic prosperity:

It’s gone from LTM earnings of around $4bn two years ago to around $63bn in the last quarterly release. For context, this is around half the total earnings made by listed stocks in each of the UK, Germany and France over the last 12 months. The forecasts are for Nvidia to continue to see significant earnings growth.

So this is a company that has gone from relative earnings obscurity to one of the most profitable in the world inside two years, and the largest company in the world as of Friday night. The problem is that the AI industry is embryonic. And it’s almost impossible to know how it will develop or what competition current winners might face, even if you fully believe in its potential to drive future productivity. The stratospheric rise of DeepSeek reminds us of this.

Hang on though. Cheap Chinese AI means more productivity benefits, lower build costs and an acceleration towards the Andreessen Theory of Cornucopia, so maybe . . . good news in the long run? JPMorgan’s Meyers again:

This strikes me not about the end of scaling or about there not being a need for more compute, or that the one who puts in the most capital won’t still win (remember, the other big thing that happened yesterday was that Mark Zuckerberg boosted AI capex materially). Rather, it seems to be about export bans forcing competitors across the Pacific to drive efficiency: “DeepSeek V2 was able to achieve incredible training efficiency with better model performance than other open models at 1/5th the compute of Meta’s Llama 3 70B. For those keeping track, DeepSeek V2 training required 1/20th the flops of GPT-4 while not being so far off in performance.” If DeepSeek can reduce the cost of inference, then others will have to as well, and demand will hopefully more than make up for that over time.

That’s also the view of Morgan Stanley, the most AI-enthusiastic of the big banks:

Frontier AI capabilities might be achievable without the massive computational resources previously thought necessary. Efficient resource use – with clever engineering and efficient training methods – could matter more than sheer computing power. This may inspire a wave of innovation in exploring cost-effective methods of AI development and deployment. This means that the ROI of LLMs that is of today’s concern could improve meaningfully without giving away the quality or the timeline for the deployment of AI applications. The achievement also suggests the democratization of AI by making sophisticated models more accessible to eventually drive greater adoption and proliferation of AI.

Bottom line: the restrictions on chips may end up acting as a meaningful tax on Chinese AI development but not a hard limit. China has demonstrated that cutting-edge AI capabilities can be achieved with significantly less hardware, defying conventional expectations of computing power requirements.
A model that achieves frontier-grade results despite limited hardware access could mean a shift in the global AI landscape, redefining the competitive landscape of global AI enterprises and fostering a new era of efficiency-driven progress.

And Peel Hunt again:

We believe the impact of those advantages will be twofold. In the medium to longer term, we expect LLM infrastructure to go the way of the telco infrastructure and become a ‘commodity technology’. The financial impact on those deploying AI capex today depends on regulatory interference – which had a major impact on telcos. If we think of AI as another ‘tech infrastructure layer’, like the internet, the mobile, and the cloud, in theory the beneficiaries should be companies that leverage that infrastructure. While we think of Amazon, Google, and Microsoft as cloud infrastructure, this emerged out of the need to support their existing business models: e-commerce, advertising and information-worker software. The LLM infrastructure is different in that, like the railroads and telco infrastructure, these are being built ahead of true product/market fit.

And Bernstein:

If we acknowledge that DeepSeek may have reduced costs of achieving equivalent model performance by, say, 10x, we also note that current model cost trajectories are increasing by about that much every year anyway (the infamous “scaling laws . . . ”) which can’t continue forever. In that context, we NEED innovations like this (MoE, distillation, mixed precision etc) if AI is to continue progressing. And for those looking for AI adoption, as semi analysts we are firm believers in the Jevons paradox (i.e. that efficiency gains generate a net increase in demand), and believe any new compute capacity unlocked is far more likely to get absorbed due to usage and demand increase vs impacting the long term spending outlook at this point, as we do not believe compute needs are anywhere close to reaching their limit in AI.

It also seems like a stretch to think the innovations being deployed by DeepSeek are completely unknown by the vast number of top tier AI researchers at the world’s other numerous AI labs (frankly we don’t know what the large closed labs have been using to develop and deploy their own models, but we just can’t believe that they have not considered or even perhaps used similar strategies themselves). To that end investments are still accelerating. Right on top of all the DeepSeek newsflow last week we got META substantially increasing their capex for the year. We got the Stargate announcement. And China announced a trillion yuan (~$140B) AI spending plan. We are still going to need, and get, a lot of chips . . .

The most pressing question, at least for Morgan Stanley’s Adam Jonas, is what it all means for Tesla. Never one for understatement, his note is called “Precursor to a Humanoid ‘Sputnik’ Moment?”:

The DeepSeek ‘moment’ ultimately brings forward the market’s appreciation of what Tesla brings to the table in the emerging AI/Robotics ‘Embodied AI’ arena.

[ . . . ]

Advancement in genAI/LLMs directly impacts the advancement of foundation model training for robotics (AVs, eVTOL, AMRs, humanoids, etc). More than any other factor, the growing investor interest in embodied AI has been driven by recent advancements in genAI/supercomputing.

[ . . . ]

EVs are the ‘sockets’ for the forthcoming physical (‘embodied’) AI. In our opinion, if the US wants to be a leader in autonomy, it must ultimately embrace electric mobility.
In a future where geopolitical rivals demonstrate increasing competency/rate-of-change we would expect to see an acceleration of government/lawmaker and investor attention on the embodied theme. In our view, this could enhance the market’s appreciation for what Tesla brings to the table beyond the EV market.

[ . . . ]

While we continue to view Tesla as sort of an embodied AI ‘portfolio’ and well positioned due to the underpinnings of our DREAMS framework, we note that we do not currently ascribe any value to Tesla for embodied AI in our $430 price target or $800 bull case.

🙃

As for broad markets strategy, there’s a lot, but here’s a taster. US stocks have not begun trading at the time of writing, but futures on the big indices and ETFs are indicating a grisly opening, as Bespoke Investment Group notes:

If the reports of DeepSeek’s success at such low costs are true, and this is a big if as there is still a lot we don’t know in terms of how it was developed, it would pose problems for some of the biggest AI winners over the last two years. As we type this, the S&P 500 (proxied by SPY) is trading down about 2.25%, which would be the largest downside gap since early August and the 60th largest downside gap in the ETF’s history dating back to 1993.

For the Nasdaq 100 (QQQ), the declines are even steeper. With the ETF poised to gap down 3.8% at the open, it would be QQQ’s largest downside gap since early August and the 20th largest downside gap since its inception in 1999. As shown in the chart below, before last August’s downside gap, the last time QQQ gapped down as much as it is on pace to today was back in September 2020.

Nomura’s Charlie McElligott is also worried that this could escalate into a “monster de-risking” today. We’ve kept his italics and bolding below to preserve his unique voice:

I’m not gonna try to play Semi- / AI-expert of the long-term viability and potential AI paradigm shift here…but there are heavy “modern market structure” and mechanical flow implications here for the Stock Market…and the “US Exceptionalism” trade positioning, as “innovation” is a core component to that view…

But the larger issue is that Megacap Tech IS the US Equities Market, and anybody with a mandate to own Equities is by default stuffed on these names in order to survive recent years, where Mag8 are 35% of SPX and 49% of NDX index weights, respectively.

Additionally, we’ve seen substantial “Spot Up, Vol Up” upside chasing into Calls (e.g.
95%ile + Call Skews across Semi names) and general demand for Calls in MegaCap Tech / AI names again in recent weeks…which can then now “collapse under the weight of their own Delta” on the Spot pullbacks.

And when you add in the massive allocation that these “Tech Animal Spirits” names and “concentric themes” hold within the Leveraged ETF product universe at record AUM, there is a potential monster “de-risking” flow today as 1) Options see Calls go out of the money and Dealers adjust hedges / Puts are bought with chunky NEGATIVE $Delta flow…and as 2) Leveraged ETFs will sell huge $notional to rebalance the products vs these single-name moves (we estimate using pre-mkt prices at -$22B)…which will inherently then “feedback” with Discretionary risk-management and potential front-running of those flows.
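McElligott’s second flow – leveraged ETF rebalancing – is mechanical and easy to sketch: a fund with leverage factor L must trade roughly (L² − L) × AUM × day-return of notional at the close to restore its target exposure, so both long-leveraged and inverse single-name products end up selling into a down tape. A toy illustration with made-up products and pre-market moves (the -$22B is Nomura’s own estimate over the real product universe, not something these hypothetical numbers reproduce):

```python
# Daily rebalance notional for leveraged/inverse ETFs:
#   trade = (L**2 - L) * AUM * r
# After an underlying return r, the fund trades this much in the same direction
# as r to restore target leverage, so a down move forces selling from both
# long-leveraged and inverse products.
def rebalance_notional(leverage: float, aum: float, underlying_return: float) -> float:
    return (leverage**2 - leverage) * aum * underlying_return

# Illustrative only: hypothetical products and pre-market moves.
products = [
    ("2x long NVDA",  2.0, 5.0e9, -0.09),
    ("-1x NVDA",     -1.0, 1.0e9, -0.09),
    ("2x long TSLA",  2.0, 3.0e9, -0.04),
]
total = 0.0
for name, lev, aum, ret in products:
    flow = rebalance_notional(lev, aum, ret)
    total += flow
    print(f"{name:>13}: {flow / 1e9:+.2f}bn to trade at the close")
print(f"{'Total':>13}: {total / 1e9:+.2f}bn")
```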
The corollary to the budding equity market puke is that Treasuries are rallying hard, with the 10-year US government bond yield now down to 4.53 per cent, the lowest in over a month.

Ian Lyngen, a rates analyst at BMO, points out that this matters for Treasury market-specific reasons as well:

The Treasury market rally itself is notable for several reasons. First, the outright magnitude of the drop in 10-year yields at >12bp indicates that investors are anxious about the potential ramifications from a wholesale reevaluation of the tech sector – and potentially the equity market more broadly. Moreover, a challenge of 4.50% in 10s has ramifications from a technical perspective, and we’ll note that 4.488% represents two standard deviations below the 20-day moving average. Said differently, entering a sub-4.50% trading range for 10s will represent a material challenge to the prior bond bearish narrative that has been in place since the November election. The overnight move has created an opening gap in 10s that comes in at 4.599% to 4.621% – the existence of which could persist for longer than might typically be the case in the event that the current stock market selloff has further to run. Monday’s departure point will make this week particularly relevant from a technical perspective insofar as the weekly close could readily recast near-term expectations. Today’s Treasury supply (2s and 5s) will undoubtedly benefit from the reversal in risk assets, and the absence of meaningful economic data ahead of the Fed suggests that any move will have plenty of room to run. We’re also anticipating a dovish pause, a dynamic that should reinforce any further tone-shift favoring lower yields from here.

For the rates/FX angle, back to Mizuho:

We long expected Trump’s inauguration to be the peak of tariff fears and a “buy the rumour, sell the fact” moment for trades, but this DeepSeek story provides a more fundamental equity flow story that in the short term will dominate any macro argument.

I wouldn’t pay US 2s until we’re below 4.1% and closer to 4% (currently 4.17%); markets are likely to continue to buy duration as they rotate out of stocks with 4.4% for US 10s in mind. If we get down to 4.2% it’s due to a) DeepSeek’s revelations being proven completely true and b) US data surprises continuing to turn lower.

An equity market selloff like this increases the probability of 1) a more dovish FOMC meeting this week – March is pricing at 10bps for a cut; we expect 25bps – and 2) Trump perhaps holding off from more aggressive tariff actions at the end of this week on Canada/Mexico and waiting for the more likely timing of April’s deadlines.

When equity markets sell off like this, the first thing to happen is a position squeeze, with margin calls forcing profit-taking on winning trades elsewhere – making consensus calls unravel very quickly. Long USD positioning has climbed to $33.8bn in the futures market and, whilst it has been higher in previous USD cycles (up to $48bn in 2015/16), long USD is the obvious position-unwind candidate, with short EUR, GBP, CHF and JPY the beneficiaries.

This could open us up to 1.07 in EURUSD. From this level of EUR short positioning it would take another 2% rally before the market’s net short position becomes flat.

We had expected the end of US exceptionalism to be a Q2 story. But if we combine a) US equity market outflows with b) seasonal quirks in European PMIs likely to see better growth data in Q1 and c) not enough priced for a Fed cut in March – perhaps that time is now, and we all need to revise up Q2 EUR/USD forecasts.

We’ll keep adding to this post as the emails keep landing, so if you’re reading this via a cache site you’re likely to be missing a lot. Sign up for a free Alphaville login here.

Further reading:
— Chinese start-ups such as DeepSeek are challenging global AI giants (FT)
— How small Chinese AI start-up DeepSeek shocked Silicon Valley (FT)
