Runbook: vLLM Runtime
Status: living
Use this when Dubnium’s NixOS configuration manages vllm.service, but the
vLLM Python/CUDA runtime is installed outside the Nix store.
NixOS owns:
- vllm.service
- /var/lib/vllm
- /var/lib/dubnium/models
- CUDA_VISIBLE_DEVICES
- ai.dubnium
- Tailscale-only firewall exposure
The external runtime owns:
- /var/lib/vllm/venv
- Python, PyTorch, vLLM, and CUDA wheel packages inside that venv
This keeps rebuilds fast and avoids compiling PyTorch, CUDA, CuPy, MAGMA,
OpenCV CUDA, or vLLM during nixos-rebuild.
Scope
This runbook covers the current hybrid-Nix phase. NixOS is authoritative for
the service contract, host alias, firewall exposure, users, directories,
environment, and health checks. The Python/CUDA package runtime is mutable
operator-managed state under /var/lib/vllm/venv.
A pure-Nix vLLM runtime is a separate later phase. That phase should be treated as build-infrastructure work: it likely needs a dedicated CUDA builder, an Attic/Cachix/nix-serve cache, or an upstream Nixpkgs packaging path that avoids rebuilding the full CUDA/PyTorch/vLLM stack on every workstation.
Preconditions
- the host has been switched to a Dubnium generation with
  dubnium.vllm.runtime = "external"
- uv is available in the operator shell
- NVIDIA GPU access works on the host
- model weights are already seeded under /var/lib/dubnium/models
Check GPU visibility first:
nvidia-smi
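The tool-related preconditions above can be checked mechanically. The following is a minimal sketch, not part of Dubnium: the function name require_cmd is hypothetical, and it only confirms that commands are on PATH, not that the GPU itself is healthy.

```shell
# Hypothetical helper: succeed only if every named command is on PATH.
require_cmd() {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      return 1
    fi
  done
}

# Example (not run here): require_cmd uv nvidia-smi && nvidia-smi
```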
1. Create The Runtime Directory
sudo install -d -m 0755 -o root -g root /var/lib/vllm
sudo install -d -m 0755 -o root -g root /var/lib/dubnium/models
The NixOS module also declares these directories. These commands are safe to
run before or after nixos-rebuild switch.
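To confirm the result of the commands above, a small check such as the following may help. It is a sketch under assumptions: dir_has_mode is a hypothetical name, and it relies on GNU stat, which is what a NixOS host normally provides.

```shell
# Hypothetical helper: succeed if DIR exists and has the expected octal mode.
# Uses GNU stat (-c); BSD stat takes different flags.
dir_has_mode() {
  [ -d "$1" ] && [ "$(stat -c '%a' "$1")" = "$2" ]
}

# Example (not run here):
#   dir_has_mode /var/lib/vllm 755 && dir_has_mode /var/lib/dubnium/models 755
```

Ownership can be inspected the same way with stat -c '%U:%G'.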
2. Install vLLM Into The Managed venv
Create a fresh venv:
sudo uv venv --python /run/current-system/sw/bin/python3.12 --python-preference only-system /var/lib/vllm/venv
Install vLLM with CUDA/PyTorch wheels selected by uv:
sudo env UV_TORCH_BACKEND=auto uv pip install --python /var/lib/vllm/venv/bin/python vllm
This is intentionally the only default install command. Do not install audio,
JAX, TPU, or broad framework extras during workstation bring-up. In particular,
avoid commands that reinstall torchvision, torchaudio, or jax unless a
specific workload requires them and the host has enough memory to resolve,
download, install, and import that dependency set. The default Dubnium vLLM
path is text inference against a local model bundle.
The upstream vLLM GPU install docs recommend uv pip install vllm --torch-backend=auto so uv can select the PyTorch backend from the installed
CUDA driver. If that flag is not supported by the installed uv, use the
environment variable form above or update uv.
If the installed uv supports newer PyTorch backends, use a specific CUDA
backend that matches the host driver. For CUDA 13.0:
sudo uv pip install --python /var/lib/vllm/venv/bin/python --torch-backend=cu130 vllm
Some packaged uv versions may not list cu130 yet. On those versions, keep
the default install command above, or upgrade uv to a version that supports
the host CUDA backend. Do not use a broad PyTorch-family reinstall as a
workstation bring-up workaround; it can pull optional packages such as
torchaudio and exceed available memory.
If PyTorch CUDA selection is wrong after the default install, recreate the venv
and rerun the vLLM install with a supported UV_TORCH_BACKEND or
--torch-backend value rather than layering more framework packages into the
same environment.
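When picking an explicit backend value, the CUDA 13.0 to cu130 naming used above follows a simple pattern: drop the dot and prefix with cu. The sketch below illustrates that mapping only. The function name cuda_to_backend is hypothetical, and whether the installed uv accepts the resulting value still depends on the uv version.

```shell
# Hypothetical helper: turn a CUDA version such as "13.0" into the
# PyTorch-style backend name "cu130" by dropping the dot.
cuda_to_backend() {
  case "$1" in
    [0-9]*.[0-9]*) printf 'cu%s\n' "$(printf '%s' "$1" | tr -d '.')" ;;
    *) echo "unexpected CUDA version: $1" >&2; return 1 ;;
  esac
}

# Example (not run here), reading the version from the nvidia-smi header:
#   ver=$(nvidia-smi | sed -n 's/.*CUDA Version: \([0-9.]*\).*/\1/p')
#   cuda_to_backend "$ver"
```

The version reported by nvidia-smi is the highest CUDA version the driver supports, so treat the output as a starting point rather than a guaranteed match for an available wheel.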
Host config adds the venv’s PyTorch and NVIDIA wheel library directories to
LD_LIBRARY_PATH. That is required because the external venv is outside the Nix
store and vLLM’s CUDA extension must be able to find libtorch, libcudart,
and the CUDA wheel libraries at runtime.
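To see which directories are involved, the sketch below lists candidate library directories inside a venv. It is not the Dubnium module's implementation: venv_lib_path is a hypothetical name, and the torch/lib and nvidia/<component>/lib layout is an assumption about how the pip wheels unpack.

```shell
# Hypothetical helper: print a colon-separated list of library directories
# found under a venv. Assumes the common pip wheel layout:
#   <venv>/lib/pythonX.Y/site-packages/torch/lib
#   <venv>/lib/pythonX.Y/site-packages/nvidia/<component>/lib
venv_lib_path() {
  find "$1/lib" -type d \( -path '*/site-packages/torch/lib' \
      -o -path '*/site-packages/nvidia/*/lib' \) 2>/dev/null \
    | sort | paste -sd: -
}

# Example (not run here): venv_lib_path /var/lib/vllm/venv | tr ':' '\n'
```

Comparing this output with the LD_LIBRARY_PATH reported by systemd can help spot a directory that the host config does not cover.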
The service also sets CC to Nix’s C compiler wrapper, because Triton may
compile a small runtime helper during vLLM startup even when vLLM itself is
installed in the external venv.
Keep dubnium.vllm.runtime = "package" available for the future pure-Nix
phase, but do not use it for this external-runtime path.
3. Verify The Runtime
Check the executable:
/var/lib/vllm/venv/bin/vllm --version
Check CUDA through PyTorch:
/var/lib/vllm/venv/bin/python -c "import torch; print(torch.cuda.is_available())"
Expected:
True
If this prints False, fix the venv/PyTorch/CUDA wheel selection before
debugging Dubnium’s systemd service.
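The same check can be wrapped so that scripts stop early when CUDA is unavailable. This is a minimal sketch; cuda_ok is a hypothetical name, and it simply compares the output of the command above against True.

```shell
# Hypothetical helper: succeed only if the given Python reports CUDA as
# available through PyTorch. Any import error also counts as failure.
cuda_ok() {
  [ "$("$1" -c 'import torch; print(torch.cuda.is_available())' 2>/dev/null)" = "True" ]
}

# Example (not run here):
#   cuda_ok /var/lib/vllm/venv/bin/python || echo "fix the venv first" >&2
```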
4. Verify The Local Model Bundle
Dubnium keeps model weights out of Git and out of the Nix store. The vLLM service should point at a local model bundle.
MODEL_DIR=/var/lib/dubnium/models/qwen2.5-coder-14b-instruct
If the model bundle was seeded from removable media, verify that the local bundle exists:
test -f "$MODEL_DIR/config.json"
test -f "$MODEL_DIR/model.safetensors.index.json" || test -f "$MODEL_DIR/model.safetensors"
If SHA256SUMS exists, verify it:
cd "$MODEL_DIR"
sudo sha256sum -c SHA256SUMS
If vLLM tries to download model files on first start, the configured model path or local bundle is wrong.
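The bundle checks in this step can be combined into one function. The sketch below is illustrative only: check_model_bundle is a hypothetical name, and it runs sha256sum without sudo, so it assumes the operator can read the bundle files.

```shell
# Hypothetical helper: succeed if DIR looks like a complete local model bundle.
check_model_bundle() {
  dir=$1
  [ -f "$dir/config.json" ] || { echo "missing $dir/config.json" >&2; return 1; }
  [ -f "$dir/model.safetensors.index.json" ] || [ -f "$dir/model.safetensors" ] \
    || { echo "no safetensors weights or index in $dir" >&2; return 1; }
  # Verify checksums only when the bundle ships a SHA256SUMS file.
  if [ -f "$dir/SHA256SUMS" ]; then
    (cd "$dir" && sha256sum -c --quiet SHA256SUMS)
  fi
}

# Example (not run here): check_model_bundle "$MODEL_DIR"
```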
5. Start The Service
Start compute mode or restart the service directly:
sudo systemctl start compute.target
sudo systemctl restart vllm.service
Inspect service state:
systemctl status vllm --no-pager
journalctl -u vllm -n 100 --no-pager
systemctl show vllm.service -p ExecStart --value
systemctl show vllm.service -p Environment --value
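The Environment output is a single space-separated line of KEY=VALUE pairs, which is hard to read. A small filter such as the following can pull out one value. It is a sketch: env_value is a hypothetical name, and it assumes none of the values contain spaces.

```shell
# Hypothetical helper: read a space-separated KEY=VALUE list on stdin and
# print the value for one key. Assumes values contain no spaces.
env_value() {
  tr ' ' '\n' | sed -n "s/^$1=//p"
}

# Example (not run here):
#   systemctl show vllm.service -p Environment --value \
#     | env_value LD_LIBRARY_PATH | tr ':' '\n'
```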
If /var/lib/vllm/venv/bin/vllm does not exist or is not executable,
vllm.service should fail before startup with an executable check error. That
means the NixOS service contract is present but the external runtime has not
been installed yet.
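An operator can run an equivalent check by hand before starting the service. The sketch below only mirrors the idea of the executable check described above; runtime_installed is a hypothetical name, and the check actually used by the NixOS module is not shown in this runbook.

```shell
# Hypothetical helper: succeed if the venv at the given path (default
# /var/lib/vllm/venv) contains an executable bin/vllm.
runtime_installed() {
  [ -x "${1:-/var/lib/vllm/venv}/bin/vllm" ]
}

# Example (not run here):
#   runtime_installed || echo "external runtime not installed yet" >&2
```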
6. Verify The API
From the Dubnium host:
getent hosts ai.dubnium
curl http://ai.dubnium:8000/v1/models
From another tailnet machine:
curl http://<dubnium-tailnet-name>:8000/v1/models
ai.dubnium is host-local unless the tailnet DNS or client hosts file also
maps that name to the Dubnium node’s Tailscale IP.
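Because vLLM needs time to load the model before the API answers, a single curl right after starting the service may fail. The sketch below polls the endpoint instead. The function name wait_for_api and the default retry count and delay are hypothetical choices, not values taken from Dubnium.

```shell
# Hypothetical helper: poll URL until it answers with an HTTP success status,
# giving up after TRIES attempts spaced DELAY seconds apart.
wait_for_api() {
  url=$1
  tries=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no response from $url after $tries attempts" >&2
  return 1
}

# Example (not run here): wait_for_api http://ai.dubnium:8000/v1/models
```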
References
- vLLM GPU installation docs: https://docs.vllm.ai/en/latest/getting_started/installation/gpu/
- Model seeding policy: ADR-0008
- Tailscale exposure: Tailscale