Bare-Metal Kubernetes Storage: Longhorn Setup (Part 2)
In Part 1, I walked through setting up NFS storage with Unraid for my bare-metal Kubernetes cluster. That gave me two NFS storage classes (nfs-unraid and nfs-unraid-retain) with ReadWriteMany support. That solved the immediate problem of shared storage, but it introduced a single point of failure: my Unraid NAS. If that server goes down, everything depending on those NFS volumes stops working.
That's when I started looking at distributed storage solutions.
The Problem: I Still Had a Single Point of Failure
Don't get me wrong, NFS with Unraid is fantastic for certain workloads. I use it for config files, shared media libraries, and anything that needs ReadWriteMany access. But here's what kept me up at night:
- Database workloads: My PostgreSQL instances were on local-path storage (node-local, no redundancy)
- Application state: If a worker node died, pods would restart but lose their data
- No automatic failover: I couldn't just drain a node and expect volumes to follow pods elsewhere
I needed storage that could:
- Replicate data across multiple nodes automatically
- Survive node failures without manual intervention
- Move with pods when they reschedule
- Not depend on external infrastructure (no NAS required)
That's distributed block storage, and for Kubernetes, that means Longhorn.
What is Longhorn (and Why Should You Care)?
Longhorn is a distributed block storage system built specifically for Kubernetes. Think of it as turning your cluster's worker node disks into a resilient storage pool that works like this:
┌─────────────────────────────────────────┐
│  Pod requests 10GB volume (3 replicas)  │
└────────────────────┬────────────────────┘
                     │
          ┌──────────▼───────────┐
          │   Longhorn Manager   │  (Decides where replicas go)
          └──────────┬───────────┘
                     │
      ┌──────────────┼──────────────┐
      │              │              │
  ┌───▼────┐     ┌───▼────┐     ┌───▼────┐
  │Worker-1│     │Worker-2│     │Worker-3│
  │  10GB  │     │  10GB  │     │  10GB  │
  │Replica │◄───►│Replica │◄───►│Replica │  (Data synchronised)
  └────────┘     └────────┘     └────────┘
If Worker-2 dies, Longhorn automatically:
- Detects the failure
- Promotes one of the remaining replicas to primary
- Creates a new replica on another healthy node
- Your application keeps running (brief I/O pause during failover)
I learned this the hard way when I accidentally shut down a worker during testing. The PostgreSQL pod paused for about 10 seconds, reconnected, and kept running like nothing happened. That's when I knew this was the solution I needed.
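If you want to watch this happen yourself, the Longhorn CRDs make it easy to follow along. A quick sketch (exact column names can vary a bit between Longhorn versions): watch the volume drop from healthy to degraded and recover as the replica rebuilds.
# Watch volume state and robustness (healthy/degraded) during a failover
kubectl get volumes.longhorn.io -n longhorn-system -w
# Watch where each replica is running
kubectl get replicas.longhorn.io -n longhorn-system -o wide -w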
Distributed Storage vs. NFS: Which Do You Actually Need?
Here's how I think about my storage now:
| Storage Type | StorageClasses | Best For | Avoid For | Why I Use It |
|---|---|---|---|---|
| Local-path | local-path | Caches, temp data, build artifacts | Anything stateful | Fastest, but data dies with the node |
| NFS (Unraid) | nfs-unraid, nfs-unraid-retain | Config files, media, ReadWriteMany | Databases, high IOPS | Shared access, massive capacity, but single point of failure |
| Longhorn | longhorn, longhorn-gp, longhorn-ha, longhorn-io | Databases, stateful apps, HA workloads | Massive files (TB+) | Survives node failures, follows pods |
The sweet spot: Use all three. Longhorn for resilience, NFS for sharing, local-path for speed.
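In practice, "use all three" just means picking a storageClassName per PVC. A minimal example (the PVC name is a placeholder):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-app-data        # placeholder name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn    # swap for nfs-unraid or local-path depending on the workload
  resources:
    requests:
      storage: 5Gi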
Prerequisites: What You Need Before Installing
Longhorn uses iSCSI under the hood, so each worker node needs a few packages. I'm running Ubuntu 22.04 on my worker nodes; adjust the package names if you're on a different distro.
Quick Check: Do You Have What You Need?
SSH into one of your worker nodes and run:
# Check for required packages
dpkg -l | grep open-iscsi
dpkg -l | grep nfs-common
# Check if iscsid service exists
systemctl status iscsid
If you see "unit not found," you'll need to install the packages.
Installing Prerequisites on All Workers
I have 6 worker nodes, so I automated this with a quick loop. Update the IP range to match your cluster:
# From your local machine (not inside the cluster)
# Replace with your actual worker node IPs
for ip in 192.168.1.10 192.168.1.11 192.168.1.12 192.168.1.13 192.168.1.14 192.168.1.15; do
echo "=== Configuring $ip ==="
ssh ubuntu@$ip "sudo apt update && sudo apt install -y open-iscsi nfs-common"
ssh ubuntu@$ip "sudo systemctl enable iscsid && sudo systemctl start iscsid"
ssh ubuntu@$ip "sudo systemctl is-active iscsid" # Should return "active"
done
Why nfs-common? Longhorn can export volumes as NFS for ReadWriteMany scenarios. You don't have to use this feature, but it's nice to have the option.
Installing Longhorn via Helm
I prefer Helm for Longhorn because it makes upgrades and configuration changes much cleaner than raw YAML manifests.
Step 1: Add the Longhorn Helm Repository
helm repo add longhorn https://charts.longhorn.io
helm repo update
Step 2: Install Longhorn
This command sets a few important defaults:
helm install longhorn longhorn/longhorn \
--namespace longhorn-system \
--create-namespace \
--version 1.10.1 \
--set defaultSettings.defaultReplicaCount=2 \
--set defaultSettings.storageMinimalAvailablePercentage=15 \
--set defaultSettings.storageOverProvisioningPercentage=100
What those settings mean:
- defaultReplicaCount=2: Each volume gets 2 copies by default (survives 1 node failure)
- storageMinimalAvailablePercentage=15: Stop scheduling replicas if a node has less than 15% free space
- storageOverProvisioningPercentage=100: Allow volumes totaling 200% of actual space (thin provisioning)
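If you want to double-check that the chart actually applied these, the values map to Longhorn Setting objects (kebab-case names), so something like this should confirm them:
kubectl get settings.longhorn.io default-replica-count -n longhorn-system
kubectl get settings.longhorn.io storage-minimal-available-percentage -n longhorn-system
kubectl get settings.longhorn.io storage-over-provisioning-percentage -n longhorn-system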
Step 3: Wait for Pods to Start
Longhorn deploys a bunch of components. Watch them come up:
kubectl get pods -n longhorn-system -w
You'll see:
- longhorn-manager: One pod per worker node (DaemonSet)
- longhorn-ui: The web dashboard (2 replicas)
- longhorn-driver-deployer: Sets up the CSI driver
- longhorn-csi-plugin: One per node for volume attachment
Wait until everything shows Running. On my 6-node cluster, this took about 3 minutes.
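If you'd rather not babysit the watch, a single kubectl wait does the same job (adjust the timeout for slower clusters):
kubectl wait pod --all -n longhorn-system \
  --for=condition=Ready \
  --timeout=10m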
The Hotfix You Need to Apply (Seriously, Don't Skip This)
Longhorn v1.10.1 shipped with some critical bugs (nil-pointer crashes, volume migration issues, and replica balancing stalls). The Longhorn team released hotfix-2 to address these, but the upgrade process is a little tricky because Longhorn blocks "downgrades" by default.
Here's the one-liner that works:
helm upgrade longhorn longhorn/longhorn \
--namespace longhorn-system \
--reuse-values \
--set preUpgradeChecker.upgradeVersionCheck=false \
--set image.longhorn.manager.tag=v1.10.1-hotfix-2
Why this works:
- --reuse-values: Keeps your existing config (replica count, storage settings, etc.)
- upgradeVersionCheck=false: Disables the version validation that would block the "downgrade"
- image.longhorn.manager.tag: Switches to the hotfix image
Wait for the rollout:
kubectl rollout status daemonset/longhorn-manager -n longhorn-system
Verify the hotfix is running:
kubectl get daemonset longhorn-manager -n longhorn-system \
-o jsonpath='{.spec.template.spec.containers[0].image}'
Should output: longhornio/longhorn-manager:v1.10.1-hotfix-2
Understanding Storage Classes: Why Longhorn Creates Two Automatically
When you install Longhorn via Helm, it automatically creates two StorageClasses:
kubectl get storageclass | grep longhorn
Output:
longhorn (default) driver.longhorn.io Delete Immediate true 5m
longhorn-static driver.longhorn.io Delete Immediate true 5m
Here's what they're for:
1. longhorn (The Default)
This is your go-to storage class. When you create a PVC without specifying a storageClassName, it uses this.
Default settings (from my installation):
- 3 replicas (survives 2 node failures)
- Delete reclaim policy (volume deleted when PVC is deleted)
- ext4 filesystem
Check the details:
kubectl get storageclass longhorn -o yaml
You'll see numberOfReplicas: "3" in the parameters section.
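Or skip scrolling through the full YAML and pull just the parameters map:
kubectl get storageclass longhorn -o jsonpath='{.parameters}{"\n"}'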
2. longhorn-static
This is for edge cases where you're manually creating Longhorn volumes and want to bind them to specific PVCs. I haven't needed this yet, but it's there if you're migrating from another storage system.
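For completeness, the rough shape of static provisioning looks like this. It's a hedged sketch rather than something from my cluster: the PV points at an existing Longhorn volume by name (here migrated-vol, a placeholder) and uses the longhorn-static class so a matching PVC can bind to it.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: migrated-vol-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: longhorn-static
  csi:
    driver: driver.longhorn.io
    volumeHandle: migrated-vol   # must match the existing Longhorn volume name
    fsType: ext4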
Creating Custom Storage Classes (When and Why)
The default longhorn class works great for most things, but I created a few custom classes for specific use cases:
| Class | Replicas | Reclaim Policy | When I Use It |
|---|---|---|---|
| longhorn (default) | 3 | Delete | General workloads (databases, app state) |
| longhorn-gp | 2 | Delete | Less critical data where I want to save space |
| longhorn-ha | 3 | Retain | Critical data that must survive PVC deletion |
| longhorn-io | 1 | Delete | Apps with built-in replication (Redis Cluster, Cassandra) |
Here's how I created them:
Creating a 2-Replica Class (Space-Optimized)
Save this as longhorn-gp.yaml:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-gp
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "2880"
  fsType: "ext4"
  dataLocality: "best-effort" # Try to keep one replica on the pod's node
Apply it:
kubectl apply -f longhorn-gp.yaml
Why 2 replicas? My worker nodes have limited disk space (workers 1-3 have ~20GB free each). Using 2 replicas instead of 3 saves 33% of storage while still surviving a single node failure.
Creating a Retain-Policy Class (Critical Data)
For databases I can't afford to lose accidentally, I created this:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-ha
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain # Volume survives even if PVC is deleted
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fsType: "ext4"
  dataLocality: "best-effort"
The Retain policy saved me once: I accidentally deleted a PVC for a test database. With Delete policy, the data would've been gone instantly. With Retain, the Longhorn volume still existed. I just recreated the PVC and bound it to the orphaned volume. Crisis averted.
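For the record, the rebinding went roughly like this (a sketch; <pv-name> and the PVC name are placeholders). With Retain, the PV is left in Released state and still references the deleted PVC, so you clear that reference and create a new PVC pinned to the PV by name:
# Find the Released PV left behind by the deleted PVC
kubectl get pv | grep Released
# Clear the stale claimRef so the PV becomes Available again
kubectl patch pv <pv-name> -p '{"spec":{"claimRef":null}}'
Then a new PVC that names the volume explicitly:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-db-data   # placeholder
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-ha
  volumeName: <pv-name>    # the Released PV from above
  resources:
    requests:
      storage: 10Gi        # no larger than the PV's capacity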
Creating a Single-Replica Class (Maximum Performance)
For applications that handle their own replication (like CockroachDB, Cassandra, or Elasticsearch), you don't need Longhorn to replicate. It's just overhead:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-io
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "1"
  staleReplicaTimeout: "2880"
  fsType: "ext4"
  dataLocality: "disabled"
Use case: I run a 3-node Redis Cluster. Each Redis instance uses longhorn-io because Redis itself handles replication. No point in Longhorn duplicating that work.
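For what it's worth, here's roughly how that looks in a StatefulSet. This is a trimmed-down sketch, not my actual Redis manifest: each pod gets its own single-replica Longhorn volume via volumeClaimTemplates.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster   # placeholder
spec:
  serviceName: redis-cluster
  replicas: 3
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn-io   # Redis handles replication itself
        resources:
          requests:
            storage: 5Gi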
Exposing the Longhorn UI
Longhorn includes a web UI for monitoring volumes, replicas, and node storage. By default, it's only accessible inside the cluster. I exposed it using MetalLB (my LoadBalancer solution from the NFS guide).
Create longhorn-ui-lb.yaml:
apiVersion: v1
kind: Service
metadata:
  name: longhorn-frontend-lb
  namespace: longhorn-system
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8000
      protocol: TCP
      name: http
  selector:
    app: longhorn-ui
Apply and get the IP:
kubectl apply -f longhorn-ui-lb.yaml
kubectl get svc longhorn-frontend-lb -n longhorn-system
In my case, MetalLB assigned an IP from my configured pool (the EXTERNAL-IP column in the output above), and the Longhorn dashboard is now reachable at that address.
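No MetalLB? A port-forward against the bundled longhorn-frontend service gets you the same dashboard without exposing anything on your network:
kubectl port-forward -n longhorn-system svc/longhorn-frontend 8080:80
# Then browse to http://localhost:8080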
What you'll see in the UI:
- Dashboard: Storage usage across nodes
- Volume: List of all Longhorn volumes and their replica distribution
- Node: Health status and available space per worker
- Setting: Global Longhorn configuration
Testing Longhorn: Does It Actually Work?
Theory is great, but I wanted to see failover in action.
Test 1: Create a Volume and Write Data
Save as test-longhorn.yaml:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-longhorn-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn # Use the default class
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: test-longhorn-pod
  namespace: default
spec:
  containers:
    - name: test
      image: nginx:alpine
      volumeMounts:
        - name: data
          mountPath: /data
      command: ["/bin/sh"]
      args: ["-c", "echo 'Testing Longhorn replication' > /data/test.txt && tail -f /dev/null"]
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-longhorn-pvc
Apply and verify:
kubectl apply -f test-longhorn.yaml
# Wait for pod to be running
kubectl get pod test-longhorn-pod -w
# Check that data was written
kubectl exec test-longhorn-pod -- cat /data/test.txt
# Output: Testing Longhorn replication
Test 2: Check Replica Distribution
# See which node the pod is running on
kubectl get pod test-longhorn-pod -o wide
# Check replica distribution in Longhorn
kubectl get volumes -n longhorn-system
kubectl get replicas -n longhorn-system | grep test-longhorn
You should see 3 replicas spread across different worker nodes. That's Longhorn doing its job.
Test 3: Simulate Node Failure (The Real Test)
This is where it gets fun. I wanted to see if Longhorn actually handled failover.
What I did:
- Noted which node the pod was running on (k8s-worker-6)
- SSH'd into the worker and ran sudo shutdown now
- Watched what happened
What actually happened:
The process wasn't as instant as I expected, but the failover did work:
- Node goes down: Worker shut down immediately
- Detection delay (~2-3 minutes): Kubernetes marked the node as NotReady
- Pod eviction (5+ minutes): Kubernetes waited for the default pod eviction timeout
- Pod rescheduled: Kubernetes placed it on a healthy node
- Volume reattached: Longhorn reattached the volume with data intact
- Pod started: Application came back online
Manual intervention needed: Because the node went down hard, the volume was stuck "attached" to the dead node. I had to delete the stale VolumeAttachment to allow it to reattach elsewhere:
# Find the attachment
kubectl get volumeattachments | grep <volume-id>
# Delete it to allow reattachment
kubectl delete volumeattachment <attachment-name>
Total failover time: several minutes end to end, and most of that was the Kubernetes pod eviction timeout, not Longhorn.
I checked the data:
kubectl exec test-longhorn-pod -- sh -c "cat /data/test.txt"
# Output: Testing Longhorn replication
Still there. No data loss. The replicas on the remaining nodes (worker-4 and worker-5) kept the data safe.
Important notes about this test:
- I was testing with a standalone Pod. In production, you'd use a Deployment or StatefulSet, which automatically recreates pods when nodes fail.
- The manual VolumeAttachment deletion was needed because the node went down abruptly. In a graceful shutdown scenario, this cleanup happens automatically.
- Longhorn did its job perfectly: the data was intact and accessible once the volume reattached. The delays were all Kubernetes-side, which has conservative timeouts to avoid prematurely killing pods on nodes that might come back.
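Those Kubernetes-side timeouts are tunable per workload. If a pod should fail over faster than the default five-minute eviction, you can shorten it with tolerations; here's a sketch of what that looks like (the 60-second value is arbitrary, not what I actually run):
apiVersion: v1
kind: Pod
metadata:
  name: fast-failover-demo   # placeholder
spec:
  tolerations:
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 60   # evict after 60s instead of the default 300s
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 60
  containers:
    - name: app
      image: nginx:alpine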
Check replica status:
kubectl get replicas -n longhorn-system | grep <volume-name>
You'll see replicas marked as "stopped" on the failed node, and "running" on the healthy nodes. When the failed node comes back online, Longhorn automatically rebuilds the replica.
Cleanup
kubectl delete -f test-longhorn.yaml
Longhorn automatically deletes the volume (because we used the Delete reclaim policy).
Storage Comparison: What I Use Where
After running Longhorn for a couple of weeks alongside NFS and local-path, here's how my storage setup looks:
| Application | StorageClass | Why |
|---|---|---|
| PostgreSQL | longhorn | HA, survives failures |
| Redis Cluster | longhorn-io | Fast, app-level HA |
| Odoo App Data | nfs-unraid | Shared across pods |
| Odoo PostgreSQL | nfs-unraid-retain | DB data, must retain |
| Config Maps | nfs-unraid | Easy to edit from NAS |
| Build Caches | local-path | Speed, don't care |
| Prometheus Data | longhorn-gp | HA but not critical |
Current capacity:
- Longhorn: ~309GB distributed across 6 workers
- NFS-Unraid: 10TB+ (with parity protection)
- Local-path: ~40GB per node (fast NVMe, no redundancy)
Monitoring Disk Usage: The One Thing That'll Bite You
Longhorn uses /var/lib/longhorn on each worker node by default. That's your root disk. I learned this the hard way when worker-2 started throwing disk pressure warnings.
My worker node disk situation:
- Workers 1-3: 40GB total (~20GB free after OS)
- Workers 4-6: 100GB total (~80GB free after OS)
Longhorn won't schedule replicas on nodes below 15% free space (our setting from installation). Keep an eye on this, especially if you're running on smaller VMs like I am.
Check disk usage from the Longhorn UI or:
kubectl get nodes.longhorn.io -n longhorn-system -o wide
Pro tip: If you have dedicated disks on your worker nodes, you can configure Longhorn to use those instead of the root disk. Check the Longhorn docs for "multiple disks per node."
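The Helm chart also exposes a default data path, so if each worker has a dedicated disk mounted somewhere like /mnt/longhorn (an assumption, adjust for your layout), pointing Longhorn at it can be as simple as:
helm upgrade longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --reuse-values \
  --set defaultSettings.defaultDataPath=/mnt/longhorn
Note this only applies to nodes and disks Longhorn registers afterwards; disks already configured on existing nodes stay where they are.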
Troubleshooting: Issues I Ran Into
Volume Stuck in "Attaching"
Symptom: Pod stays in ContainerCreating, volume never attaches.
What I did:
kubectl describe pvc <pvc-name>
kubectl describe volume <volume-name> -n longhorn-system
kubectl logs -n longhorn-system -l app=longhorn-manager --tail=100
Root cause (in my case): Worker node had run out of disk space. Longhorn couldn't create the replica.
Fix: Freed up space by cleaning old container images (docker system prune -a).
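My nodes still had Docker installed from an earlier life. If yours are containerd-only (typical for a recent kubeadm cluster), the equivalent cleanup would be something along these lines:
# Remove container images not referenced by any running container
sudo crictl rmi --prune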
Replicas Not Scheduling
Symptom: Volume created, but only 1 replica instead of 3.
What I did:
kubectl get nodes.longhorn.io -n longhorn-system
Root cause: Two of my worker nodes had less than 15% free space (below the threshold).
Fix: Adjusted the threshold temporarily:
kubectl edit settings.longhorn.io storage-minimal-available-percentage -n longhorn-system
# Changed from 15 to 10
Not ideal long-term, but it got me through until I expanded disk space on those VMs.
What's Next: Stateful Workloads and Failure Scenarios
At this point, you've got:
- NFS storage for shared files (Part 1)
- Longhorn for distributed, fault-tolerant block storage (this post)
- Multiple storage classes for different use cases
Now comes the interesting part: running actual stateful applications on Longhorn and seeing how it handles real-world failure scenarios.
Things I'm planning to cover in future posts:
- Stateful applications: Running PostgreSQL, Redis, and other databases on Longhorn
- Failure testing: What happens when a node crashes during a database write operation?
- Volume snapshots: Creating point-in-time backups before risky operations (like schema migrations or major upgrades)
- Backup strategies: Exploring Longhorn's backup feature with S3-compatible storage (Garage on my Unraid server rather than MinIO, because MinIO went full enterprise and abandoned their community, but hey, they'll still accept your bug reports on Slack!)
- Storage performance: How does Longhorn compare to local-path for database workloads?
I'm still learning how these pieces work together in production. Some of this might be Part 3, some might be separate deep dives. We'll see what breaks first and what's worth writing about.
Questions or issues? Drop a comment below. I'm happy to help troubleshoot. My cluster is still evolving, and I learn something new every week.
Cluster specs (for reference):
- 7 nodes: 1 control plane + 6 workers
- Kubernetes v1.34.2
- Proxmox VMs across 2 physical servers
- Calico CNI, MetalLB, NGINX Ingress
- Storage Classes:
- Longhorn: longhorn (default), longhorn-gp, longhorn-ha, longhorn-io, longhorn-static
- NFS-Unraid: nfs-unraid, nfs-unraid-retain
- Local: local-path