The Media is Trying to Scare You About DeepSeek—Don't let them
DeepSeek’s disruption of the LLM-as-a-Service market isn’t just another headline in the tech press—it’s a wake-up call.
For companies that use AI to solve real problems rather than ship endless chatbot wrappers, this is a game-changer. And yes, it’s making a few big tech executives very uncomfortable.
But why?
Ever since OpenAI unleashed ChatGPT on the world, the AI industry has followed a simple formula: bigger is better. More parameters, more data, more compute. DeepSeek flips that model on its head. By leaning into techniques like distillation and quantization, they’re proving that smaller, cheaper models can be just as effective—if not more so—for targeted, high-value applications. This isn’t just a cost reduction; it’s an invitation for businesses to rethink their AI strategies.
This leaves businesses asking an important question, the one that is making those tech executives (and their investors) very uncomfortable: Why rent a cloud-sized LLM when a tailored, efficient one will do?
Commoditization Sparks Innovation
When a resource gets cheaper, we can use more of it. As smaller, more efficient models become available, and compute costs continue to drop, we’ll see more experimentation and risk-taking with LLM-based features. Today, building a business (or a core new feature) on an LLM is expensive. The most powerful models are closed-source and only available as a service, and hosting your own open-source LLM requires massive compute resources.
This is changing—fast. Let’s explore some of the technical developments that DeepSeek has helped bring attention to.
Distillation: Learning from the Best, Faster
Distillation isn’t new, but DeepSeek’s execution is. The idea is straightforward: train a smaller model to mimic a larger one by "interviewing" it on thousands of examples. What’s new is the effectiveness. So effective, in fact, that researchers at Berkeley replicated the technique almost immediately.
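To make the "interviewing" step concrete, here is a minimal sketch of how one might collect distillation data: prompt a strong teacher model many times and save each answer as a supervised fine-tuning example for a smaller student. This is the general recipe, not DeepSeek's exact pipeline; the endpoint, model name, and file paths below are assumptions (DeepSeek exposes an OpenAI-compatible API, but check their current docs).

```python
import json
from openai import OpenAI  # pip install openai

# Assumed endpoint and model name for the hosted R1 "teacher"; verify against
# DeepSeek's current documentation before running.
client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_API_KEY")

prompts = [line.strip() for line in open("prompts.txt")]  # hypothetical prompt list

with open("distill_data.jsonl", "w") as out:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="deepseek-reasoner",
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content
        # Each (prompt, answer) pair becomes one fine-tuning example for the
        # smaller student model.
        out.write(json.dumps({"prompt": prompt, "completion": answer}) + "\n")
```

DeepSeek's published distillations follow roughly this pattern: smaller Qwen- and Llama-family models fine-tuned on reasoning traces generated by R1.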
This isn’t great news for companies like OpenAI, which rely on sheer scale to justify their costs. If a distilled model can approximate a multi-billion-dollar system, the whole “bigger is better” argument starts to fall apart. The moat around AI is eroding, and fast.
Quantization: Efficiency Without Excess
Distillation trims the fat; quantization makes what remains even leaner. By reducing the memory footprint of models—storing parameters in lower precision at a slight accuracy cost—quantization enables these smaller models to run on everyday hardware.
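As a toy illustration of the idea, here is per-tensor int8 quantization in a few lines: each float32 weight matrix collapses to int8 values plus a single scale factor, roughly a 4x memory reduction in exchange for a small rounding error. Production runtimes use more sophisticated schemes (group-wise scales, 4-bit formats), but the trade-off is the same.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: int8 values plus one float scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

weights = np.random.randn(4096, 4096).astype(np.float32)  # one toy weight matrix
q, scale = quantize_int8(weights)

print(f"float32: {weights.nbytes / 1e6:.0f} MB, int8: {q.nbytes / 1e6:.0f} MB")
print(f"mean rounding error: {np.abs(weights - dequantize(q, scale)).mean():.5f}")
```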
Take, for example, a quantized 32B-parameter distillation of DeepSeek R1 built on the Qwen2 architecture. It delivers performance comparable to OpenAI’s o1-mini and isn’t far behind o1-1217. It runs on an M1 MacBook Pro with 32GB of memory, generating 6–7 tokens per second. Not bad for a personal-use model, and a preview of what’s coming next.
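The arithmetic behind that is simple. Back-of-envelope, a roughly 32-billion-parameter model needs about 4 bytes per parameter at full float32 precision; a 4-bit quantization (a common choice for running such models locally, and an assumption here since the exact format isn't specified above) cuts that by a factor of eight. These numbers cover the weights alone; the KV cache and runtime overhead add more.

```python
# Weights-only memory footprint for a ~32B-parameter model at different precisions.
params = 32e9
for name, bytes_per_param in [("float32", 4), ("float16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{name:>7}: {params * bytes_per_param / 1e9:.0f} GB")
# float32: 128 GB, float16: 64 GB, int8: 32 GB, 4-bit: 16 GB.
# Only the 4-bit version leaves comfortable headroom on a 32GB laptop.
```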
What’s Next: Rethinking Compute Requirements
The old assumption: more compute = better results. DeepSeek is proving that thoughtful, well-scoped models can punch above their weight.
Focused Use Cases: A finely tuned model, deployed for a specific task, beats a general-purpose giant.
The Rise of Reasoning Models: Smaller models incorporating reasoning capabilities dramatically improve output quality. More tokens might be needed, but continued efficiency gains make that tradeoff increasingly viable.
Think back to the early 2000s: desktop computing power grew so fast that software rarely had to get more efficient to keep up. LLMs are on a similar trajectory: cheaper compute and more efficient models will absorb the extra tokens that reasoning requires.
A Bright Future for Businesses Building on LLMs
I’m not advocating that any business use DeepSeek, but the focus on distillation and quantization is a huge win for those using LLMs for real business problems, not just for chasing GenAI hype. These innovations accelerate the availability of open-source reasoning models and improve the performance of mixture-of-experts architectures.
For businesses—especially those optimizing operations, refining product development, or improving pricing strategies—this shift is both a cost-saving opportunity and a leap forward in practical AI application.
How do you see this playing out? Will smaller, cheaper models change the way your industry uses AI? Drop a comment or reach out—I’d love to hear your take.
— Keith