Local LLMs
Prerequisite: n8n
Running local language models is one of the most hardware-sensitive things in this guide.
If you have a capable GPU, this section can be genuinely useful.
If you do not, it can still be educational, but keep your expectations realistic:
- CPU-only inference works; it is just much slower
The stack here is:
- Ollama for local model serving
- Open WebUI for a browser-based chat interface
Decide Whether This Is Worth It on Your Hardware
Before deploying anything, be honest about your system:
- No GPU: useful mostly for experimentation and small models
- Midrange NVIDIA GPU: practical for a lot of local inference use cases
- High-VRAM GPU: much better for larger models and multiple users
This is one of the few workloads in the guide where “buy more hardware” is often the real answer.
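To make "be honest about your system" concrete, a common back-of-the-envelope sizing rule (an approximation, not from any vendor documentation) is: parameter count times bytes per weight for the quantization you plan to run, plus roughly 20% overhead for the KV cache and runtime. The numbers below are illustrative:

```shell
# Rough VRAM estimate for an 8B-parameter model at 4-bit (Q4) quantization.
# Q4 stores roughly 0.5 bytes per weight; add ~20% for KV cache and runtime.
params_billions=8
# Work in tenths of a GB to stay in integer arithmetic:
# tenths-of-GB per billion params = 5 (0.5 bytes/weight) * 1.2 overhead = 6
vram_tenths=$(( params_billions * 5 * 12 / 10 ))
echo "~$(( vram_tenths / 10 )).$(( vram_tenths % 10 )) GB of VRAM"  # → ~4.8 GB of VRAM
```

By that rule, an 8B model at Q4 wants roughly 5 GB of VRAM, which is why a midrange NVIDIA card handles it comfortably while larger models quickly demand high-VRAM hardware.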
Create the Directories
mkdir -p ~/docker/appdata/{ollama,openwebui}
mkdir -p ~/docker/compose/llm
Create the Baseline Compose File
Create ~/docker/compose/llm/ollama.compose.yml:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - /home/<your-user>/docker/appdata/ollama:/root/.ollama
    restart: unless-stopped
    networks: [proxy]

  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    ports:
      - "3001:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - /home/<your-user>/docker/appdata/openwebui:/app/backend/data
    depends_on:
      - ollama
    restart: unless-stopped
    networks: [proxy]

networks:
  proxy:
    external: true
Start it:
docker compose -f ~/docker/compose/llm/ollama.compose.yml up -d
Pull a model:
docker exec -it ollama ollama pull llama3.2
Then open:
http://<nixos-ip>:3001
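Before blaming the UI for any problems, it is worth confirming the Ollama API itself answers. The endpoint below is from Ollama's REST API, and the port comes from the compose file above; the fallback message is just so the command is safe to paste before the stack is up:

```shell
# List pulled models as JSON; prints a hint instead of failing
# if the stack is not reachable yet.
curl -s http://localhost:11434/api/tags || echo "Ollama not reachable on 11434"
```

If this returns a JSON document (an empty `models` list is fine before the first pull), the backend is healthy and any remaining issue is between Open WebUI and your browser.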
GPU Acceleration on NixOS
If the NixOS VM has an NVIDIA GPU passed through to it, you will want GPU-aware container support on the host.
On NixOS, that generally means configuring NVIDIA support in the VM and enabling the NVIDIA container toolkit so Docker containers can see the device.
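A sketch of what that tends to look like in the VM's NixOS configuration. The option names below exist in current NixOS modules, but check them against your channel's option search before relying on this, and adjust the driver choice to your GPU:

```nix
# configuration.nix (sketch; assumes the proprietary driver fits your card)
{
  # NVIDIA kernel driver inside the VM
  services.xserver.videoDrivers = [ "nvidia" ];
  hardware.graphics.enable = true;

  # Lets Docker containers request the GPU via the NVIDIA container toolkit
  hardware.nvidia-container-toolkit.enable = true;
}
```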
The exact configuration depends on:
- your GPU generation
- whether you are using proprietary vs open NVIDIA drivers
- how the PCI passthrough is presented to the VM
Because of that variability, I recommend treating GPU enablement as a separate sub-task after the baseline CPU-only stack works.
That may sound cautious, but it is the pragmatic path.
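When you do get to GPU enablement, the ollama service in the compose file also has to request the device. Using the device-reservation syntax from the Compose specification, the addition looks roughly like this (only the new keys are shown; keep the rest of the service definition as above):

```yaml
  ollama:
    image: ollama/ollama:latest
    # ...existing settings from the baseline compose file...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

After restarting the stack, `nvidia-smi` inside the container is a quick way to confirm the device is actually visible.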
Storage Planning
Models can consume a lot of disk space.
Keep that in mind before you start pulling everything that looks interesting.
If local SSD space is limited, be selective. Unlike photos or documents, LLM model blobs are replaceable. You do not need to hoard them.
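Two commands cover most of the bookkeeping. The path assumes the bind mount from the compose file above, and both are guarded so they are safe to paste even before anything is pulled or running:

```shell
# What models does Ollama currently have?
docker exec ollama ollama list 2>/dev/null || echo "ollama container not running"

# How much space does the Ollama data dir take on the host?
du -sh ~/docker/appdata/ollama 2>/dev/null || echo "no Ollama data dir yet"

# Reclaim space from a model you no longer use (it can be pulled again later):
# docker exec ollama ollama rm llama3.2
```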
Keep It Private
Open WebUI is very tempting to expose publicly because it feels like “just a chat app”.
Do not treat it that casually.
It is still an admin-adjacent service tied to your local model host and data.
My recommendation:
- LAN only
- or Tailscale only
unless you have a clear public-access use case and understand the implications.
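One low-effort way to enforce that with the compose file above is to publish the Open WebUI port on a specific interface instead of all of them. Docker's `ip:host:container` port syntax supports this; the address below is a placeholder, so substitute your machine's actual Tailscale or LAN IP:

```yaml
  openwebui:
    # ...existing settings from the baseline compose file...
    ports:
      # Only reachable via this interface; 100.64.0.10 is a placeholder
      # standing in for the host's Tailscale address.
      - "100.64.0.10:3001:8080"
```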
Next Steps
Next, we will move to document management with Paperless-ngx.
Proceed to Paperless-ngx.
Last updated: March 2026