How to Run Local LLMs with Ollama: Quick Setup Guide
If you've already checked out my guide on running local LLMs with LM Studio, you might be interested in another excellent option: Ollama. This tool offers a streamlined, command-line focused approach to running LLMs locally.
You can also use Ollama from your own code, or pair it with Open WebUI or the Continue VS Code extension for a Cursor-like IDE experience. I will publish guides on these in the future.
What is Ollama?
Ollama is a lightweight, command-line tool that makes running open-source LLMs locally incredibly simple. It handles model downloads, optimisation, and inference in a single package with minimal configuration needed.
Quick Installation Guide
Windows
- Visit Ollama's official website
- Download and run the Windows installer
- Follow the installation prompts
- Once installed, Ollama runs as a background service
macOS
- Download the macOS app from Ollama's website
- Open the .dmg file
- Drag Ollama to your Applications folder
- Launch the app (it will run in the background)
Linux
Run this command in your terminal:
curl -fsSL https://ollama.com/install.sh | sh
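Whichever platform you installed on, you can confirm everything is working from a terminal:
ollama --version
If the command prints a version number, the install worked.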
Using Ollama (Basic Commands)
Once installed, using Ollama is as simple as opening a terminal or command prompt and running:
ollama run deepseek-r1:8b
This command downloads the deepseek-r1 8B model (if not already downloaded) and starts an interactive chat session.
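If you would rather download a model without jumping straight into a chat, pull it first and run it later:
ollama pull deepseek-r1:8b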

If AI thinks this is funny, then there is a very slim chance of it taking over the world!
You may notice I added a --verbose flag to the run command; this prints stats at the end of each response, such as tokens/s.
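To see those stats yourself, add the flag when starting the model:
ollama run deepseek-r1:8b --verbose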
Popular Models to Try
Ollama makes it easy to experiment with different models:
ollama run llama3 # Meta's latest Llama 3 8B model
ollama run gemma:2b # Google's lightweight Gemma model
ollama run codellama # Specialised for coding tasks
ollama run neural-chat # Optimised for conversation
ollama run deepseek-r1:8b # My go-to model
To browse all the models available for download, visit the Ollama library at https://ollama.com/library.
To see models you've already downloaded:
ollama list
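And to remove a model you no longer need, freeing its disk space:
ollama rm gemma:2b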
Hardware Considerations
As explained in my LM Studio article, your hardware still matters. The same considerations apply:
- Smaller models (2B-7B) work on modest hardware
- Larger models (13B+) benefit greatly from a GPU
- At least 16GB RAM recommended for a smooth experience
Ollama will automatically optimise for your available hardware.
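While a model is loaded, you can check how it is being served:
ollama ps
This lists the running models and, in my experience, shows how much of each one sits on the GPU versus the CPU.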
Integrating with Other Tools
Ollama works well with several companion tools:
- Open WebUI: A user-friendly chat interface for Ollama
- Continue VS Code Extension: For coding assistance
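As a rough sketch of the Open WebUI route: it is typically run as a Docker container that talks to your local Ollama server. The command below is adapted from the Open WebUI docs at the time of writing, so check their README for the current version:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
Once the container is up, the chat UI should be available at http://localhost:3000 and will pick up your local Ollama models.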
Integrating with Code
One of Ollama's strongest features is its API, making it easy to integrate into your applications:
Python Integration
import requests

# Simple completion request (stream disabled so the whole reply arrives as one JSON object)
response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'deepseek-r1:8b',
        'prompt': 'Explain quantum computing in simple terms',
        'stream': False,
    },
)
print(response.json()['response'])
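If you prefer not to hand-roll HTTP calls, there is also an official ollama Python package (pip install ollama) that wraps the same API. A minimal sketch, assuming the package's generate helper:
import ollama  # pip install ollama

# Same request as above, made through the client library
result = ollama.generate(
    model='deepseek-r1:8b',
    prompt='Explain quantum computing in simple terms',
)
print(result['response'])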
JavaScript/Node.js Integration
const response = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'deepseek-r1:8b',
    prompt: 'Write a function to calculate fibonacci numbers',
    stream: false, // return a single JSON object rather than a stream of chunks
  }),
});
const data = await response.json();
console.log(data.response);
You can build complete chat applications, coding assistants, AI agents, or content generation tools on top of this simple API. I will also cover these in future posts.
For More Details
For a deeper dive into running local LLMs, model selection, and optimisation techniques, check out my full guide on running LLMs with LM Studio. The same principles apply, though the interface is different.
Final Thoughts
Ollama offers a more developer-oriented approach to local LLMs compared to LM Studio's GUI-focused experience. It's perfect for those comfortable with the command line or looking to integrate local AI into their development workflows.