Why I Removed My Local LLM and Went All-In on Cloud APIs

TL;DR: I ran Ollama with Gemma models on my i5-8350U laptop for months. The dream of "free, private, offline AI" crashed into reality: 2 tokens/second inference, mediocre output quality, and a security audit that flagged the whole setup. I switched to cloud APIs (Z.AI, Gemini, Claude) and never looked back.
The Dream of Local LLMs
Let me be honest — the idea of running an LLM on my own hardware felt right. No API keys to manage. No usage limits. No sending my code to someone else's servers. Total privacy. Offline access during my commute. What's not to love?
I installed Ollama, pulled Gemma 2 9B, and felt like a hacker from a sci-fi movie:
# The install that started it all
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gemma2:9b
ollama run gemma2:9b
And it worked. The model loaded. It responded. I had AI on my laptop.
For about five minutes, I was thrilled.
Reality Check: My Hardware Wasn't Built for This
Here's my setup: an Intel i5-8350U with 24GB RAM and no dedicated GPU. That's a solid dev machine for running Docker, Laravel, and a React dev server simultaneously. But for LLM inference? It's like entering a Formula 1 race with a Honda Civic.
The numbers were brutal:
# Gemma 2 9B — barely usable
ollama run gemma2:9b
# >>> "Write a Laravel migration for a blog post table"
# ...waits 8 seconds...
# ...types at ~2 tokens/second...
# ...laptop fan sounds like a jet engine...
# Gemma 2 2B — faster but useless
ollama run gemma2:2b
# Faster inference (~8 tokens/sec) but the output quality?
# Let's just say it suggested I use `mysql_query()` in Laravel.
A 9B parameter model gave okay output at glacial speed. A 2B model was fast but produced code that looked like it was trained on PHP 4 tutorials. There was no winning here.
Meanwhile, my laptop was melting. The CPU sat at 95°C during inference. Battery? Gone in 40 minutes. I was trading my machine's lifespan for mediocre AI responses.
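If you want to reproduce the tok/s figures yourself, Ollama's `/api/generate` endpoint reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds) in its JSON response. A minimal helper to turn those counters into a speed figure — the sample numbers below mirror what my i5-8350U produced, not live output:

```python
# Compute generation speed from the raw counters Ollama returns.
# Field names (eval_count, eval_duration) are from the Ollama REST API;
# eval_duration is in nanoseconds.

def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed in tokens/second from Ollama's raw counters."""
    return eval_count / (eval_duration_ns / 1e9)

# Roughly what a 9B model on my laptop looked like: 240 tokens in 120 s.
speed = tokens_per_second(eval_count=240, eval_duration_ns=120_000_000_000)
print(f"{speed:.1f} tok/s")  # → 2.0 tok/s
```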
Cloud APIs: Fast, Cheap, and Actually Good
Then I started comparing. Same prompts, different providers:
| Provider | Model | Speed | Quality | Cost |
| --- | --- | --- | --- | --- |
| Local (Ollama) | Gemma 2 9B | ~2 tok/s | Mediocre | Free |
| Z.AI | GLM-5.1 | ~40 tok/s | Excellent | Free tier |
| Google | Gemini 2.5 Flash | ~35 tok/s | Great | Generous free tier |
| Anthropic | Claude Sonnet | ~30 tok/s | Exceptional | Pay-per-use |
Cloud APIs weren't just faster — they were orders of magnitude better. Code actually worked. Explanations made sense. Complex refactoring tasks that Gemma 2 9B couldn't handle were trivial for cloud models.
And the cost? Many providers offer free tiers that cover daily personal use. Even paid APIs cost pennies per day for typical developer workloads. I spend more on coffee than on API calls.
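The "pennies per day" claim is easy to sanity-check with back-of-the-envelope math. The $3-per-million-token rate below is a placeholder, not any provider's real price:

```python
# Back-of-the-envelope API cost, assuming pay-per-token pricing.
# The rate is a hypothetical placeholder; check your provider's pricing page.

def daily_cost_usd(tokens_per_day: int, usd_per_million_tokens: float) -> float:
    """Estimated daily spend at a flat per-million-token rate."""
    return tokens_per_day / 1_000_000 * usd_per_million_tokens

# A heavy day of coding assistance: ~200k tokens, input and output combined.
cost = daily_cost_usd(200_000, 3.00)
print(f"${cost:.2f}/day")  # → $0.60/day
```

Even doubling both numbers lands well under the price of a coffee.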
The Security Wake-Up Call
Here's the thing that really tipped the scales. I ran a security audit on my setup (shoutout to OpenClaw's healthcheck), and it flagged something I hadn't considered:
Small local models lack alignment safeguards. Without sandboxing (which adds yet another performance cost), running local models that can execute code or access files is a genuine risk.
A 2B parameter model doesn't have the training to consistently refuse harmful instructions. A 9B model is better, but still nowhere near what Claude or Gemini offers in terms of safety guardrails. And I was piping its output directly into my terminal.
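To make the risk concrete: here's an illustrative guard that flags model-suggested commands before they run. This is a crude denylist sketch, not a real security control — a proper setup sandboxes execution entirely — but it shows why piping raw model output into a shell is a bad default:

```python
import re

# Illustrative only: a crude denylist check on model-suggested commands.
# A real setup should sandbox execution; this just demonstrates the hazard.

DANGEROUS = [
    r"\brm\s+-rf\b",       # recursive deletes
    r"\bcurl\b.*\|\s*sh",  # pipe-to-shell installs
    r"\bsudo\b",           # privilege escalation
]

def needs_review(command: str) -> bool:
    """Return True if a model-suggested command should not run unattended."""
    return any(re.search(pat, command) for pat in DANGEROUS)

print(needs_review("ls -la"))            # → False
print(needs_review("sudo rm -rf /tmp"))  # → True
```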
# My OpenClaw setup — the security audit flagged this
models:
  local:
    provider: ollama
    model: gemma2:9b
    # ⚠️ No sandboxing configured
    # ⚠️ Model too small for reliable instruction refusal
The irony? I chose local for privacy and security, but small models without proper sandboxing are arguably less secure than trusting established API providers with robust safety training.
The Hybrid Compromise (That I Also Abandoned)
I tried the "best of both worlds" approach: cloud APIs as primary, local LLM as an emergency fallback for when the internet goes down.
# The hybrid config that gathered dust
providers:
  primary: zai
  fallback: google-gemini
  emergency: ollama  # Used exactly once in 3 months
Turns out, my internet is more reliable than Ollama. In three months, the "emergency fallback" triggered once — and the response was so bad I waited for connectivity to return and asked Claude instead.
The hybrid setup also had a hidden cost: maintaining Ollama, updating models, and keeping ~12GB of disk space reserved for model weights I barely used. It was like paying rent for an apartment I never visited.
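The fallback chain itself is trivial to express, which made the maintenance overhead feel even sillier. A generic sketch of the pattern — the provider functions here are hypothetical stand-ins, not OpenClaw's actual API:

```python
from typing import Callable

# Generic provider-fallback pattern, roughly what the hybrid config did:
# try each provider in priority order, return the first success.

def ask_with_fallback(prompt: str, providers: list[Callable[[str], str]]) -> str:
    last_error: Exception | None = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as err:  # network down, rate-limited, etc.
            last_error = err
    raise RuntimeError("all providers failed") from last_error

# Hypothetical usage: cloud API first, local model dead last.
def cloud(prompt: str) -> str:
    raise ConnectionError("offline")

def local(prompt: str) -> str:
    return f"local answer to: {prompt}"

print(ask_with_fallback("hi", [cloud, local]))  # → local answer to: hi
```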
The Final Decision
One Saturday morning, I opened my terminal and ran:
# The uninstall
ollama rm gemma2:9b
ollama rm gemma2:2b
sudo systemctl stop ollama
sudo systemctl disable ollama
# Freed 14GB of disk space. Felt like deleting an ex's number.
And honestly? I haven't missed it once.
What I Actually Use Now
My current setup is dead simple:
- Z.AI (GLM-5.1): My daily driver for coding, writing, and general tasks. Fast, capable, and the free tier covers most of my usage.
- Google Gemini: For research and multi-modal tasks (analyzing screenshots, reading PDFs).
- Claude: For complex architecture decisions and code reviews where reasoning depth matters.
All routed through OpenClaw, which handles failover, rate limits, and context management:
# What my config looks like now — clean and reliable
defaults:
  imageGenerationModel:
    primary: openai/gpt-image-1
  imageModel: zai/glm-5.1
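Of the jobs OpenClaw handles for me, rate-limit recovery is the one worth sketching. This is a generic retry-with-exponential-backoff pattern, illustrative only — not OpenClaw's actual implementation:

```python
import random
import time

# Minimal exponential backoff with jitter for rate-limited API calls.
# RuntimeError stands in for an HTTP 429 from a provider SDK.

def with_backoff(call, max_attempts: int = 5, base_delay: float = 0.5):
    """Retry `call` on failure, doubling the wait each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * 2**attempt + random.random() * 0.1)
```

The jitter term keeps retries from multiple clients landing in lockstep, which matters once you have more than one tool hitting the same quota.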
No fans screaming. No battery anxiety. No model updates. Just fast, high-quality AI that works.
The Takeaway
If you're a solo developer, indie hacker, or part of a small team — cloud APIs are the rational choice. The "run it locally" dream makes sense for enterprises with GPU clusters and data compliance requirements. For the rest of us?
- Free cloud tiers outperform local models on consumer hardware
- No maintenance — no model pulls, no updates, no config tweaking
- Better safety — large providers invest heavily in alignment
- Your laptop stays cool — worth more than any API savings
Local LLMs will keep getting better, and maybe in a few years an M4-class MacBook will run a model that rivals Claude. But today? The math doesn't lie. Cloud wins.
I'm not anti-local-AI. I'm pro-getting-things-done. And right now, cloud APIs let me get things done faster, cheaper, and better.
What's your setup? Still running local models, or have you made the switch? Drop a comment — I'm genuinely curious where people land on this.
Masud Rana is a Senior Software Engineer specializing in Laravel, Shopify, and React. He writes about developer tools, AI workflows, and practical engineering at notes.masud.pro.



