
Developer's Guide to Running LLMs Locally with Ollama


Introduction: Why Run Your Own LLMs Locally?

Ever wondered what actually happens when you type a prompt into ChatGPT?
Or better — what if you could run the same experience locally, without APIs, rate limits, or data leaving your machine? 🤯

That curiosity is exactly what led me to Ollama.

Running LLMs locally is no longer a “research lab only” thing. With modern tools and optimized models, developers can now spin up a personal AI assistant that feels shockingly close to ChatGPT — all on a laptop.

In this article, we’ll build our own ChatGPT-like setup locally using Ollama, understand what’s happening under the hood, and explore how far you can push it.


What Problem Are We Solving?

Cloud-based LLMs are amazing, but they come with trade-offs:

❌ Data privacy concerns
❌ API costs & usage limits
❌ Internet dependency
❌ Limited control over models

Local LLMs solve this by giving you:

✅ Full control
✅ Offline usage
✅ Zero per-request cost
✅ Customization freedom

That’s where Ollama shines.


What Is Ollama?

Ollama is a developer-friendly tool that lets you download, manage, and run large language models both locally and via Ollama Cloud using a simple CLI and API.

Think of it as:

Docker for LLMs 🐳🤖

Instead of building inference pipelines from scratch, Ollama abstracts away:

  • Model downloads

  • Quantization

  • GPU/CPU optimizations

  • Runtime management

You just run a command — and boom, your local ChatGPT is alive.


Step 1: Install Ollama

macOS / Linux

curl -fsSL https://ollama.com/install.sh | sh

Windows

Download directly from:
👉 https://ollama.com

Once installed, verify:

ollama --version

If that works, you’re good to go ✅


Step 2: Log In to Ollama (Local vs Cloud Models)

Before running models, it’s important to understand that Ollama supports both local and cloud-based models.

To access cloud features, you’ll need to log in:

ollama login

This connects your CLI to your Ollama account.

Why this matters

  • Local models

    • Run entirely on your machine

    • Fully offline

    • No usage limits

    • Maximum privacy

  • Cloud models

    • Run on Ollama’s infrastructure

    • Faster startup for large models

    • Subject to hourly and weekly usage limits

    • Useful when your hardware is limited

The key thing I love:
you choose per model whether it runs locally or in the cloud.


Step 3: Run Your First Local LLM

Let’s start with a popular ChatGPT-like local model:

ollama run llama3

That’s it. On first run this downloads the model weights (a few GB), then drops you into an interactive chat with llama3 on your local machine.

No API keys.
No environment variables.
No cloud setup.

You’re now chatting with a local LLM running on your own machine.


Step 4: Understanding What’s Happening Under the Hood

Behind the scenes, Ollama:

  • Downloads the model weights

  • Optimizes them for your hardware

  • Starts a local inference server

  • Streams responses token-by-token

If you’ve used ChatGPT before, the experience feels… familiar 👀
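That “streams responses token-by-token” step is visible in the raw API: the server emits one small JSON object per line, each carrying a `response` fragment and a `done` flag. A minimal sketch of reassembling such a stream — using a simulated stream here, not a live server:

```python
import json

def accumulate_stream(lines):
    """Join the `response` fragments from an Ollama-style NDJSON stream."""
    text = []
    for line in lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):  # final object signals the stream is complete
            break
    return "".join(text)

# Simulated lines, shaped like /api/generate streaming output:
stream = [
    '{"response": "Hello", "done": false}',
    '{"response": ", world!", "done": true}',
]
print(accumulate_stream(stream))  # Hello, world!
```

This is why chat UIs built on Ollama can render text as it arrives instead of waiting for the full answer.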


Step 5: Running Ollama as a Local ChatGPT API

Ollama also exposes a local HTTP API, which means you can connect it to:

  • Web apps

  • Desktop apps

  • VS Code extensions

  • Custom UIs

Start the Ollama server:

ollama serve

Example API request:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Explain transformers like I am a developer"
}'

Congrats — you now have a self-hosted ChatGPT backend 🔥
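The same endpoint is just as easy to call from code. Here’s a minimal Python sketch using only the standard library — it assumes `ollama serve` is running on the default port, and `build_payload` is a helper name of my own, not part of Ollama:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="llama3", stream=False):
    """Request body for /api/generate; stream=False returns one JSON object."""
    return {"model": model, "prompt": prompt, "stream": stream}

def generate(prompt, model="llama3"):
    """Send a prompt to the local Ollama server and return the response text."""
    data = json.dumps(build_payload(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:  # requires `ollama serve` running
        return json.loads(resp.read())["response"]
```

With `stream` left at its default of `true`, you’d instead read the response line by line, the way the curl example above does.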


Step 6: Building a Simple Chat UI (Concept)

At this point, the architecture looks like this:

Frontend (React / Next.js / CLI)
        ↓
Local Ollama API
        ↓
LLM Model (LLaMA, Mistral, etc.)

You can:

  • Build a React chat UI

  • Store conversation history

  • Add system prompts

  • Create AI agents

  • Run everything offline

This is where things get really fun.
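For conversation history, Ollama’s `/api/chat` endpoint accepts a list of role-tagged messages in the familiar OpenAI-style chat format. A small sketch of the history structure — the `Conversation` class is illustrative, not part of Ollama:

```python
import json

class Conversation:
    """Keep chat history in the messages format Ollama's /api/chat expects."""

    def __init__(self, system=None):
        self.messages = []
        if system:  # an optional system prompt steers the model's behavior
            self.messages.append({"role": "system", "content": system})

    def add_user(self, content):
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content):
        self.messages.append({"role": "assistant", "content": content})

    def payload(self, model="llama3"):
        """Request body to POST to http://localhost:11434/api/chat"""
        return {"model": model, "messages": self.messages, "stream": False}

chat = Conversation(system="You are a concise coding assistant.")
chat.add_user("What is a transformer?")
print(json.dumps(chat.payload(), indent=2))
```

Append each model reply with `add_assistant` before the next user turn, and the model sees the whole conversation — that’s all “memory” is at this layer.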


Useful Models to Try

List installed models:

ollama list

Popular options:

  • llama3 – Best ChatGPT-like experience

  • mistral – Fast & lightweight

  • codellama – Code-focused

  • phi – Small & efficient

Switch instantly:

ollama run mistral

Common Pitfalls (Learned the Hard Way)

🧠 RAM matters — 8GB works, 16GB is smoother
🔥 CPU-only works, GPU is a big win
📦 Larger models ≠ always better
📝 Prompt quality matters more than model size

Local LLMs reward clear intent, not brute force.
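Putting those pitfalls together, here’s a rough heuristic for picking a first model by available RAM. The tags are the models listed below; the RAM thresholds are my own approximations, not official requirements:

```python
def suggest_model(ram_gb):
    """Rough, assumed mapping from available RAM to a sensible first model."""
    if ram_gb < 8:
        return "phi"      # small & efficient
    if ram_gb < 16:
        return "mistral"  # fast, lightweight 7B-class model
    return "llama3"       # best ChatGPT-like experience

print(suggest_model(8))  # mistral
```

Start small, check that responses feel responsive on your hardware, and only then reach for a bigger model.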


When Should You Use Local LLMs?

Perfect for:

  • Learning how LLMs work

  • Prototyping AI products

  • Internal tools

  • Privacy-sensitive data

  • AI experimentation & agents

Not ideal for:

  • Massive production scale

  • Multi-user public apps (yet)


What I Personally Love About This Setup

Running ChatGPT locally feels empowering.

You stop being a consumer of AI
and start becoming a builder with AI 🛠️🤖

It completely changes how you think about:

  • App architecture

  • Privacy

  • AI UX

  • Tooling

And honestly?
Once you try this, cloud-only AI feels limiting.


Conclusion: Your AI, Your Rules

With Ollama, building your own ChatGPT is no longer magic — it’s just good tooling.

If you’re a developer exploring AI, this is one of those setups that:

  • Teaches you fast

  • Unlocks experimentation

  • Makes AI feel tangible

Next steps

  • Build a chat UI

  • Add memory

  • Connect tools

  • Create AI agents

Local AI is here — and it’s 🔥


Key Takeaways

  • Ollama supports both local and cloud models

  • Cloud usage comes with hourly & weekly limits

  • Local models give full privacy and control

  • You can build a ChatGPT-like experience in minutes

  • Perfect playground for hands-on AI experimentation

