
Debunking 10 Popular Myths About DeepSeek

I’ve heard too many wrong takes that need correction

Alberto Romero
5 min read · Feb 4, 2025
  1. DeepSeek-R1 is more lobotomized than US models. DeepSeek’s models have indeed been trained to adhere to the narratives of the CCP (unsurprisingly). However, you can always download the open weights and fine-tune them with any data you want, or wait for someone else to build an app on top of DeepSeek that removes the censorship (a minimal sketch of what that looks like follows this list). So even if they’re by default unable to answer some questions (or even mention Xi Jinping’s name), they’re more malleable than ChatGPT, Gemini, or Claude.
  2. With just $5 million, DeepSeek can achieve what OpenAI needs billions to do. The tweet I linked is an intentionally exaggerated meme, but I’ve seen people believe subtler forms of the same idea. No: DeepSeek can’t replicate OpenAI’s business (or Google’s or Anthropic’s) with that amount of money. The figure — $5.576 million to be precise — is what DeepSeek spent on pre- and post-training DeepSeek-V3 (the arithmetic behind it is sketched after this list). Deployment and inference costs aren’t included (inference scales with the number of users, which has surely grown by orders of magnitude in the last few days). Staff salaries, failed runs, architecture experiments, data preprocessing, data gathering, GPUs, infrastructure costs — none of that is included. Sorry, but no one — neither DeepSeek nor OpenAI — can compete in this field at the highest level with $5 million. $500 billion is a crazy amount nonetheless, but DeepSeek’s $5 million miracle has been taken out of context.
  3. DeepSeek’s cheap achievements mean the scaling laws don’t work. Wrong. First, DeepSeek used plenty of compute to train V3 and R1, even if it’s little compared to what OpenAI spends; you can’t do what they did at home with your gaming GPU, and the base model still has to be quite big (V3 has 671 billion parameters). Second, inference is the most expensive factor when demand is high (I expect DeepSeek won’t be able to keep up), which forces providers to scale up datacenters. Third, the fact that you can optimize technology to “get more bang for your buck” says nothing about whether larger models give better results. They do (a quick illustration follows this list). Fourth, better base models (V3 or GPT-4o) are integral to creating better reasoning models (R1 or o1/o3). Fifth, there’s only so much you can squeeze out of the existing components in the stack; once you exhaust scrappy optimization, you have to grow. Sixth, although all of this is true, companies shouldn’t be wasteful.
  4. Nvidia’s $600 billion crash happened because the stock was overpriced. I don’t think so. Two explanations seem better to me. A) DeepSeek used this premise strategically, with R1’s release serving two purposes: to create panic among US investors (did they really short Nvidia?) and to advance the pursuit of AGI to eventually “predict the entire financial market.” Like a fork in chess — brilliant. However, the idea makes no rational sense. DeepSeek proved it’s possible to make good AI models with “modest” compute resources — 2.788 million H800 GPU hours is still unaffordable for 99.99% of people — which would make it easier for small companies to train and deploy their own, which further spreads AI usage, which in turn increases GPU demand, which benefits Nvidia. In this sense, Nvidia is, if anything, an underpriced asset. Perhaps DeepSeek agrees. Perhaps they realized the market reaction would be not rational but inevitable. High-Flyer — the hedge fund backing DeepSeek (also founded by CEO Liang Wenfeng) — specializes, after all, in exactly this. Hats off. B) Alternatively, the NASDAQ drop may have had little to do with DeepSeek and everything to do with China’s geopolitical moves. That explanation isn’t brilliant; it’s scary.
  5. Compared to US AI labs, DeepSeek’s team is tiny. Not really. OpenAI built ChatGPT with fewer than 770 people (that’s the figure we were given during the boardroom coup in November 2023, a year after ChatGPT’s release, so at launch there were probably around 500 employees). Anthropic has ~630 employees today. DeepSeek employs ~200 people: fewer, but not by an order of magnitude. Google Brain + DeepMind is a bit larger, though (~2,600), as are Meta’s AI divisions (FAIR plus the generative AI team, totaling 1,000–3,000 people). So if you want to draw a line somewhere, it shouldn’t be China vs. the US but startups vs. incumbents.
  6. The app takes all of your private data. Yes, if you use the app, your IP, keystroke patterns, device info, etc. are stored on DeepSeek’s servers. But that’s no different from what Western AI/tech companies do. Again, you can run a distilled version of DeepSeek-R1 or Llama 3 locally (no data leaves your machine, so there’s no risk from either China or the US; see the sketch after this list) or use a fine-tuned version from a provider you trust.
  7. DeepSeek models are open source. Yes and no. V3 and R1 are open-weights models, which means you can download the model and fine-tune on top of it, but that’s it. Neither the training data nor the training code has been published. There’s a hierarchy within the open-source category that’s usually overlooked because it can get a bit technical (the curse of choosing simplicity over precision), and neither V3 nor R1 sits at the top of it. Scientists can’t fully reverse engineer DeepSeek’s models (HuggingFace is trying) or retrain them from scratch because information about data provenance, manipulation, preprocessing, synthetic sets, etc. is missing (what you do get is shown after this list). Still, it’s better than OpenAI and Anthropic, whose models sit at the bottom of the open-source hierarchy.
  8. DeepSeek is Liang Wenfeng’s side project. There’s no debunking here. Yes, they used “leftover GPUs” (reportedly a cluster of 2,048 H800s out of the roughly 50,000 Hopper GPUs they have in total) to create V3 and R1. I wanted to add this here because it’s a rather humiliating fact for US AI companies. I find it funny.
  9. The hardware export controls don’t work. Wrong. I dedicated an entire section to this in my latest post, but it bears repeating: export controls aren’t a black-or-white matter. As Miles Brundage says, they can be imperfect and still not be counterproductive. DeepSeek’s CEO has admitted that limited access to Nvidia chips was the main bottleneck to training a better model; they had to scramble for adequate architectural and algorithmic workarounds. And they still need additional chips to develop more advanced base models (V4, V5) and to provide adequate inference capacity for the DeepSeek app.
  10. DeepSeek is China’s “ChatGPT moment.” I wrote this myself, but I have to add a caveat: DeepSeek is a ChatGPT moment in intensity but not in sentiment. Whereas ChatGPT was “Wow, this is amazing,” DeepSeek is more like “China is winning! The AI bubble has popped! The US market is crashing! Tech companies are dumb!” DeepSeek has become a catalyst for existing antipathies. That’s the very reason I’m writing a post like this: people love to buy into simplistic narratives as long as they make them feel good and reinforce what they already believe. (Some are rightly praising the company, the models, and the staff’s technical prowess, but not many people know enough to make such distinctions.)
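
To make point 1 concrete, here’s a minimal sketch of what downloading and fine-tuning the open weights looks like in practice, assuming the Hugging Face transformers and peft libraries. The model ID, the LoRA settings, and the choice of a distilled checkpoint (the full 671B model won’t fit on consumer hardware) are illustrative assumptions, not DeepSeek’s recipe.

```python
# Minimal sketch (point 1): pull an open DeepSeek checkpoint and attach LoRA adapters
# so it can be fine-tuned on whatever data you choose, default refusals included or removed.
# Model ID and hyperparameters are illustrative assumptions, not an official recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # a distilled checkpoint small enough for one GPU

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Lightweight adapters: only a small fraction of weights gets trained,
# which is what makes "re-aligning" an open model affordable.
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```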
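
On point 2, the $5.576 million figure is straightforward arithmetic from DeepSeek’s own V3 technical report: total H800 GPU-hours multiplied by an assumed rental rate of about $2 per GPU-hour. The calculation below simply restates that; salaries, failed runs, data work, and the cluster itself all sit outside it.

```python
# Point 2: the headline figure is training compute only. Per the DeepSeek-V3 report,
# the run took ~2.788M H800 GPU-hours, priced at an assumed $2 per GPU-hour.
gpu_hours = 2_788_000
rental_rate_usd = 2.0
print(f"${gpu_hours * rental_rate_usd / 1e6:.3f}M")  # -> $5.576M
```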
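
On point 3, a quick way to see why efficiency gains don’t overturn the scaling laws is to plug two model sizes into a Chinchilla-style loss curve. The coefficients below are roughly the published Chinchilla fit and are purely illustrative; they are not DeepSeek’s numbers.

```python
# Point 3: a Chinchilla-style scaling curve, L(N, D) = E + A/N^a + B/D^b.
# Constant-factor optimizations lower cost, but predicted loss still falls
# as parameters N and training tokens D grow. Coefficients are roughly the
# published Chinchilla fit; treat them as illustrative.
def predicted_loss(n_params, n_tokens, E=1.69, A=406.4, B=410.7, a=0.34, b=0.28):
    return E + A / n_params**a + B / n_tokens**b

print(predicted_loss(7e9, 2e12))   # ~7B parameters on ~2T tokens
print(predicted_loss(70e9, 2e12))  # 10x the parameters, same data: lower predicted loss
```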
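
On point 6, the privacy-preserving alternative to the app is to run a distilled checkpoint entirely on your own machine, so prompts, keystrokes, and device data never leave it. The model ID and prompt below are illustrative assumptions.

```python
# Point 6: local inference with a distilled R1 checkpoint. Nothing is sent to
# DeepSeek's (or anyone else's) servers once the weights are downloaded.
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",  # assumed distilled checkpoint
    device_map="auto",
)
print(generate("Summarize the difference between open weights and open source.",
               max_new_tokens=200)[0]["generated_text"])
```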
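
And on point 7, “open weights” means exactly this: you can fetch the checkpoint files, and nothing more. The snippet below assumes the huggingface_hub client; what it downloads contains weights and configs, not training data or training code, which is why a from-scratch reproduction isn’t possible.

```python
# Point 7: what "open weights" actually hands you. The repo contains checkpoint
# shards and config files; the training data, preprocessing pipeline, and training
# code are absent, so the model can be used and fine-tuned but not reproduced.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("deepseek-ai/DeepSeek-V3")  # hundreds of GB of weights, no data or training code
print(local_dir)
```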

Join The Algorithmic Bridge, a blog about AI that’s actually about people.

