Dubnium Documentation

Use this site as the operator entrypoint for installing, bringing up, and understanding the Dubnium workstation.

Primary target: the NixOS host named workstation. WSL is a headless validation target for shared modules and docs, not the deployed workstation.

Choose Your Path

Current Defaults

  • dubnium.vllm.enable = false: vLLM is opt-in for explicit compute testing.
  • dubnium.plano.enable = false: Plano routing is opt-in until its runtime is installed and validated.
  • Runtime and user secrets stay outside Nix source. See Runtime Secrets.
  • User-level Home Manager configuration comes from external/dotfiles.
  • Generated documentation is committed under web/docs/.
  • Flake input operations use dubctl, exposed as nix run .#dubctl and installed by default on the workstation.

Install Source Contract

  • Installer media labels are DUB-ISO and DUB-SEED.
  • Install bootstrap uses local source from media or checkout state, not an install-time GitHub token.
  • Post-install source reconciliation is explicit. See Post-Install Source Reconciliation.

Ownership Boundaries

| Owner | Responsibility |
| --- | --- |
| Dubnium | NixOS system config, workstation services, install media, runtime units |
| Dotfiles | Home Manager user config, user shell, user-level tool configuration |
| Runtime secret provider | Host and user secrets outside the Git-tracked Nix source |
| Model/router repos | Client policy, routing schemas, and model-router behavior |

Sanity Checks

nix flake check
sudo nixos-rebuild build --flake .#workstation
mdbook build

When building docs from Windows, run mdbook build inside the NixOS WSL distro with mdbook and mdbook-mermaid in the shell.

Known Warnings

  • mdbook-mermaid may warn about a minor mdBook version mismatch; that warning is non-fatal when the HTML backend finishes successfully.
  • vLLM runtime setup should avoid broad PyTorch, audio, JAX, or TPU extras unless they are explicitly required.

Start Here

Local Inference

Memory System

Architecture

External Sources

Local Docs Viewer

This repository includes mdBook config for local browsing only. mdBook is not a Dubnium OS dependency and does not need to be installed in the target system configuration.

nix shell nixpkgs#mdbook nixpkgs#mdbook-mermaid
mdbook serve --open

Generated output goes to web/docs/.

Decisions

Runbooks

WSL

Runbook: Fresh Install

Status: living

Use this when installing Dubnium from a NixOS live USB onto a fresh machine.

Primary checklist:

Key Rules

  • Decide disk layout before writing partitions.
  • After booting from the USB installer, verify the tools needed to inspect or extract the prepared repo source are available.
  • Because dubnium is private, use the custom installer USB as the preferred source path instead of assuming live GitHub access will work.
  • The custom installer USB bakes a source export into the live image; use unpack-dubnium to extract it to ~/local/src/dubnium.
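  • Carry the materialized model bundle on DUB-SEED media so first boot does not depend on a model-provider download. In the current raw-image installer flow this is separate writable media, not a partition on the installer USB; see Runbook: Custom Installer USB.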
  • Generate hosts/workstation/hardware-configuration.nix from the real target mount layout.
  • After first boot, reconcile install-time source changes into a normal Git checkout before treating them as repo history.
  • Review host options before install.
  • Boot into a desktop-default system first.
  • Validate mode status before testing transitions.

First Boot Expectations

  • current mode should classify as desktop
  • the selected user’s Home Manager configuration should be present from the Dubnium dotfiles profile
  • vLLM should not be active
  • studio-local overlay services should not be active unless requested
  • /run/mode-controller should exist

Do not start compute testing until the desktop baseline is observable and repeatable.

Custom Installer Quick Path

If booted from the Dubnium custom installer USB:

install-dubnium-from-usb

The one-shot command partitions and formats the selected disk, unpacks the baked source snapshot, generates the workstation hardware config, and runs nixos-install. By default it then sets the normal user’s password inside the installed system with passwd; use --password-mode hash for the older host-local hash flow or --password-mode skip when another login path already exists. With no arguments it prints lsblk, prompts for the target whole disk, defaults to btrfs, copies the install snapshot into the installed system, and requires final y/N confirmation unless --yes is passed.

Manual path:

unpack-dubnium
cd ~/local/src/dubnium

Then follow the fresh-install checklist from the partitioning step onward and install with:

sudo nixos-install --flake .#workstation

After first boot, restore the selected model seed from the USB model bundle as described in Model Seeding.

If the install used the custom source snapshot or another export without .git history, follow Post-Install Source Reconciliation before committing or pushing install-time changes.

Runbook: First Bring-Up

Status: living

Use this when the target machine already runs NixOS or can build/switch from the repo.

Primary checklist:

Success Criteria

  • nixos-rebuild build --flake .#workstation succeeds.
  • nixos-rebuild switch --flake .#workstation succeeds.
  • configctl doctor succeeds.
  • mode status, mode current, and mode desired work.
  • /run/mode-controller exists and contains live state files.
  • desktop.target and compute.target exist.
  • vLLM is inactive in desktop.
  • studio-local can be requested and removed as a desktop overlay.
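
The /run/mode-controller criterion above can be checked with a small helper. This is a minimal sketch, not part of the Dubnium tooling: the function name is hypothetical, and treating "live state file" as "any regular file in the directory" is an assumption.

```shell
# Hypothetical helper: succeed only when the state directory exists and
# holds at least one regular file. The default path comes from the runbook;
# equating "live state file" with "any regular file" is an assumption.
state_dir_is_live() {
  local dir="${1:-/run/mode-controller}"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir" >&2
    return 1
  fi
  # find prints the first regular file it sees (if any) and stops.
  if [ -z "$(find "$dir" -maxdepth 1 -type f -print -quit)" ]; then
    echo "no state files in: $dir" >&2
    return 1
  fi
}
```

Run `state_dir_is_live` with no arguments on the host; pass a different directory to rehearse the check elsewhere.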

Immediate Failure Buckets

  • generated hardware configuration does not match the host
  • NVIDIA/CUDA evaluation or runtime issue
  • graphical target/session mismatch
  • mode controller tools not installed
  • observer reports false success or conflicting state

If mode state looks wrong, prefer fixing observation before adding transition logic.

Runbook: Custom Installer USB

Status: living

Use this when installing Dubnium from private installer media without relying on GitHub credentials during the live install.

The current Dubnium installer flow writes the custom ISO to one physical USB stick as a raw disk image, matching Rufus “DD image mode” behavior:

dubnium-installer.iso -> whole USB disk

The installer image bakes an exported source snapshot of this repo and the external/dotfiles submodule into the live system. The snapshot excludes .git directories, so it is source content rather than a Git working copy with history. Treat the USB as private media because it contains the private Dubnium source.

Model seed bundles are separate from raw USB writing. Put the materialized bundle on separate media, or build it into a future image format explicitly.

What This Provides

  • no GitHub token during install
  • no install-time private GitHub clone
  • git, jq, rsync, vim, and install helpers in the live environment
  • unpack-dubnium, which unpacks the baked source snapshot to ~/local/src/dubnium
  • raw whole-disk USB writing for the custom installer ISO

Build The Installer ISO

Before baking, make sure the repo and submodule state are intentionally clean or intentionally staged. The flake source snapshot only sees tracked files.

git status --short
git -C external/dotfiles status --short

scripts/build-installer-iso.sh \
  --iso ./dubnium-installer.iso

By default the script also ensures that the current Dubnium default seed bundle is present, for use on separate seed media; this step is idempotent, so re-running the script is safe. The seed contract is model-agnostic: the seed must be a materialized model directory with config.json and SHA256SUMS.

Detection first checks DUBNIUM_SEED_MODEL, then common paths beside the repo for the current default bundle.

Use --seed-model to override detection, --no-seed-download to require a pre-existing bundle, or --no-seed-model to build installer-only media.
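
The seed contract and detection order described above can be sketched as follows. This is an illustration under stated assumptions, not the code in scripts/build-installer-iso.sh: the function names are hypothetical, the candidate paths are supplied by the caller because the script's real path list is not shown here, and failing (rather than falling back) when DUBNIUM_SEED_MODEL points at an invalid directory is an assumed behavior.

```shell
# Contract from the runbook: a seed is a directory that holds
# config.json and SHA256SUMS.
is_seed_bundle() {
  [ -d "$1" ] && [ -f "$1/config.json" ] && [ -f "$1/SHA256SUMS" ]
}

# Hypothetical detection helper: DUBNIUM_SEED_MODEL wins when set,
# otherwise the first valid candidate path passed as an argument is used.
detect_seed_model() {
  local candidate
  if [ -n "${DUBNIUM_SEED_MODEL:-}" ]; then
    if is_seed_bundle "$DUBNIUM_SEED_MODEL"; then
      printf '%s\n' "$DUBNIUM_SEED_MODEL"
      return 0
    fi
    # Assumption: an explicit but invalid override is an error.
    echo "DUBNIUM_SEED_MODEL is set but is not a valid seed bundle" >&2
    return 1
  fi
  for candidate in "$@"; do
    if is_seed_bundle "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

For example, `detect_seed_model ../models/selected-model-bundle` prints the path when that directory satisfies the contract and returns non-zero otherwise.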

The script is a wrapper around this build:

nix --extra-experimental-features 'nix-command flakes' \
  build .#nixosConfigurations.installer.config.system.build.isoImage

The ISO appears under:

result/iso/

The ISO build uses Nix’s flake source snapshot and bakes that source into the installer image. This is an export-style payload: no .git directories and no Git history.

Create A Standalone Git Export Payload

If you want a source artifact separate from the ISO, use the git-export helper:

scripts/export-installer-source.sh dubnium-installer-source.tar.gz

The helper requires the main repo and external/dotfiles submodule to be clean. It uses git archive for both sources and writes a payload shaped like:

dubnium/
└── external/
    └── dotfiles/

This payload is useful for inspection, offline transfer, or alternate installer media. The custom ISO still bakes its own payload from the same flake source that Nix evaluates.

Verify The Baked Payload

The built payload should contain the workstation host, dotfiles submodule source, and USB helpers:

payload="$(find /nix/store -maxdepth 1 -name '*-dubnium-installer-source.tar.gz' | head -n 1)"

tar -tzf "$payload" | grep -E \
  '^dubnium/(flake.nix|hosts/workstation/default.nix|external/dotfiles/flake.nix|scripts/build-installer-iso.sh|scripts/export-installer-source.sh|scripts/write-installer-usb.ps1|scripts/write-installer-usb.sh)$'

if tar -tzf "$payload" | grep -q '/\.git/'; then
  echo "unexpected .git directory in payload"
  exit 1
fi

Prepare The USB From Windows PowerShell

After building dubnium-installer.iso, use this helper only when preparing the USB from Windows PowerShell. It writes the ISO bytes directly to the whole USB disk, like Rufus DD image mode:

.\scripts\write-installer-usb.ps1 `
  -IsoPath .\dubnium-installer.iso `
  -DiskNumber 7 `
  -ExpectedFriendlyName "USB SanDisk 3.2Gen1"

The script refuses to continue unless the selected disk is the expected USB device. It overwrites the whole disk with the ISO image. -SeedModelPath is intentionally rejected in raw mode because there is no separate writable seed partition to copy into.

After writing, eject and reinsert the USB if Windows does not refresh the new ISO layout immediately. Verify the installer media from whichever drive letter Windows assigns:

Test-Path I:\EFI\BOOT\BOOTX64.EFI
Test-Path I:\nix-store.squashfs
Get-Volume -DriveLetter I

Prepare The USB From macOS Or Linux

The Bash helper performs the same raw whole-disk image write on macOS or Linux. Pass the whole USB disk, not a partition.

Linux example:

lsblk -o NAME,SIZE,MODEL,TRAN,TYPE,MOUNTPOINTS

scripts/write-installer-usb.sh \
  --iso ./dubnium-installer.iso \
  --disk /dev/sdX \
  --expected SanDisk

macOS example:

diskutil list
diskutil info /dev/diskN

scripts/write-installer-usb.sh \
  --iso ./dubnium-installer.iso \
  --disk /dev/diskN \
  --expected SanDisk

The script refuses to write non-removable media, requires the selected device identity to contain --expected when provided, and asks for y at the Proceed? [y/N]: prompt before erasing the disk unless --yes is passed.
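
The two guards described above (removable media only, identity must contain the expected string) can be approximated on Linux as shown below. This is a sketch under assumptions: the function names are hypothetical, and the real script may use a different signal than the sysfs removable flag, for example the transport type. Some USB-attached SSDs report 0 in that flag.

```shell
# Linux-only sketch: sysfs exposes 1 in /sys/block/<name>/removable for
# removable media. sysfs_root is overridable so the logic can be exercised
# without real hardware.
is_removable_disk() {
  local disk="$1" sysfs_root="${2:-/sys}"
  local flag
  flag="$sysfs_root/block/$(basename "$disk")/removable"
  [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}

# Case-sensitive substring match of the expected label against a device
# identity string (for example the MODEL column printed by lsblk).
# An empty expectation always matches, mirroring "when provided".
identity_matches() {
  local identity="$1" expected="$2"
  [ -n "$expected" ] || return 0
  case "$identity" in
    *"$expected"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller would refuse to write unless both `is_removable_disk /dev/sdX` and `identity_matches "$model" SanDisk` succeed.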

Optional One-Shot Wrappers

The older wrappers still exist for convenience, but they are not the preferred boundary:

.\scripts\build-installer-usb.ps1 `
  -DiskNumber 7 `
  -ExpectedFriendlyName "USB SanDisk 3.2Gen1"

bash scripts/build-installer-usb.sh \
  --disk /dev/sdX \
  --expected SanDisk

Use the Bash one-shot path only when the whole USB disk is visible inside the Linux environment.

Seamless USB Acceptance Check

Before leaving the build machine, verify the USB contains everything needed for a token-free install:

EFI/BOOT/BOOTX64.EFI
nix-store.squashfs

The install path should not require:

  • a GitHub token
  • a private SSH key
  • a Hugging Face download during install when separate model seed media is used
  • copying model weights into the Dubnium Git tree

Keep the USB physically private. It contains private source code in the installer payload.

Add The Model Seed Bundle

Do not copy the raw Hugging Face cache directory as the seed. The cache uses refs, blobs, snapshots, and symlinks. Seed media should contain a normal local model bundle.

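Use separate writable media for the model seed bundle. Mount that media and copy a materialized model directory. The example below mounts Windows drive E: inside WSL using drvfs; adjust the drive letter and mount point, or use a normal Linux mount when working outside WSL: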

sudo mkdir -p /mnt/e
sudo mount -t drvfs E: /mnt/e

sudo mkdir -p /mnt/e/models
sudo rsync -a --info=progress2 \
  /path/to/selected-model-bundle/ \
  /mnt/e/models/selected-model-bundle/

Expected seed path:

models/selected-model-bundle/

Lightweight bundle check:

test -f /mnt/e/models/selected-model-bundle/config.json
test -f /mnt/e/models/selected-model-bundle/SHA256SUMS

See Model Seeding for creating the bundle and checksum manifest.

Install From The USB

Boot the target machine from the USB. Prefer the UEFI entry for the Dubnium installer USB.

For the guarded one-shot path, run the helper with no arguments:

install-dubnium-from-usb

The helper prints lsblk, prompts for the target whole disk, and then prompts for install options. Defaults are btrfs for the root filesystem, dubnium for the Home Manager machine profile, passwd for password setup, and copying the install snapshot to /root/dubnium-install-snapshot in the installed system.

This command erases the selected whole disk, unpacks the baked source snapshot, generates hosts/workstation/hardware-configuration.nix, and runs:

sudo nixos-install --flake .#workstation

Use --dry-run to print the plan without touching disks. Use --user USER to write hosts/workstation/user.nix before install. Use --home-profile dubnium|technetium to select the Home Manager machine profile that installs the matching ~/.config/hypr/adopted.d/machine.conf. Use --password-mode hash to write a host-local initial password hash before install, or --password-mode skip when another login path already exists; the default passwd mode sets the password inside the installed system after nixos-install. Use --no-copy-source if you do not want the install snapshot preserved for post-install reconciliation.

The one-shot command still prints the plan and requires final confirmation:

Proceed? [y/N]:

Use --yes only for rehearsed installs where the disk identity was already verified.

Manual path:

In the live installer terminal:

unpack-dubnium
cd ~/local/src/dubnium

Confirm the baked source exists:

test -f flake.nix
test -f hosts/workstation/default.nix
test -f external/dotfiles/flake.nix

Then continue the fresh-install flow from the local checkout:

sudo nixos-install --flake .#workstation

The workstation target imports the Dubnium Home Manager module from external/dotfiles, so the Dubnium dotfiles profile is applied to the selected normal user as part of the system install.

Install For Another User

To choose the installed normal user, create hosts/workstation/user.nix in the unpacked source before running nixos-install:

{
  dubnium.user.name = "alice";
  dubnium.user.description = "Example User";
}

Then install normally:

sudo nixos-install --flake .#workstation

The same Dubnium Home Manager profile from the dotfiles submodule is applied to the selected user. The profile source lives in the submodule, but the username and home directory are supplied by dubnium.user.name.

unpack-dubnium --user USER only changes where the source is unpacked in the live installer session. It does not change the installed NixOS user; use dubnium.user.name for that.
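
Writing hosts/workstation/user.nix by hand can also be scripted. The helper below is a hypothetical sketch that produces the same two-option file shown above; it is not the installer's --user implementation, and it assumes the values contain no double quotes or `${` sequences, since it performs no Nix string escaping.

```shell
# Hypothetical helper: write hosts/workstation/user.nix under an unpacked
# source tree. Arguments: source root, user name, user description.
# No Nix string escaping is done, so keep the values plain.
write_user_nix() {
  local src="$1" name="$2" description="$3"
  mkdir -p "$src/hosts/workstation"
  cat > "$src/hosts/workstation/user.nix" <<EOF
{
  dubnium.user.name = "$name";
  dubnium.user.description = "$description";
}
EOF
}
```

For example, `write_user_nix ~/local/src/dubnium alice "Example User"` before running nixos-install.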

After First Boot

After the installed system boots, seed the vLLM model store from the bundle on separate seed media and verify the checksum manifest before starting compute mode. See Model Seeding for the exact restore commands.

What Not To Put On The USB

Avoid storing:

  • long-lived private SSH keys
  • reusable GitHub credentials
  • generated age identity files
  • decrypted SOPS files
  • model weights inside the Git repo or ISO payload
  • raw Hugging Face cache directories as the seed shape

The source snapshot and a separate materialized model bundle are enough for this installer flow.

Model Seeding

Dubnium keeps model weights out of Git and out of the Nix store. Nix owns the runtime policy and vLLM service definition; model bytes are runtime data under /var/lib/dubnium/models.

The workstation configuration selects a vLLM model, but the USB seed format does not depend on one specific model. Use the configured model’s local bundle name where the examples say selected-model-bundle.

The installed workstation serves a local model bundle from:

/var/lib/dubnium/models/selected-model-bundle

This avoids depending on the Hugging Face hub cache layout at runtime. The USB seed carries a normal directory of model files plus a checksum manifest.

Runtime Model Store

Dubnium creates:

/var/lib/dubnium/models

The workstation vLLM service passes this local path to vllm serve:

/var/lib/dubnium/models/selected-model-bundle

Do not commit model weights to the Dubnium repo. Do not put model weights inside the Nix store or custom ISO payload.

USB Seed Layout

Use a stable USB layout so the same seed can be used during fresh install, recovery, or rebuild:

DUB-SEED/
└── models/
    └── selected-model-bundle/
        ├── config.json
        ├── generation_config.json
        ├── model-00001-of-000NN.safetensors
        ├── model-00002-of-000NN.safetensors
        ├── model.safetensors.index.json
        ├── tokenizer.json
        ├── tokenizer_config.json
        ├── vocab.json
        ├── merges.txt
        ├── LICENSE
        ├── README.md
        └── SHA256SUMS

The exact file set may vary by model revision, but the directory must be a materialized model snapshot, not a Hugging Face refs / blobs / snapshots cache tree.
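
The difference between a cache tree and a materialized snapshot can be checked mechanically. The sketch below is a heuristic with hypothetical function names: it treats a directory with refs, blobs, and snapshots subdirectories as a Hugging Face cache entry, and treats any symlink inside a bundle as a sign that it was not materialized.

```shell
# Heuristic: a Hugging Face hub cache entry has refs/, blobs/, and
# snapshots/ subdirectories.
looks_like_hf_cache() {
  [ -d "$1/refs" ] && [ -d "$1/blobs" ] && [ -d "$1/snapshots" ]
}

# A materialized bundle is not a cache tree, has config.json at the top
# level, and contains no symlinks anywhere below it.
is_materialized_bundle() {
  local dir="$1"
  if looks_like_hf_cache "$dir"; then
    return 1
  fi
  [ -f "$dir/config.json" ] || return 1
  [ -z "$(find "$dir" -type l -print -quit)" ]
}
```

Running `is_materialized_bundle` against the seed directory before writing SHA256SUMS catches an accidentally copied cache early.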

Create A Local Bundle

If the source already exists as a normal model directory, copy it directly to the seed partition:

mkdir -p /run/media/$USER/DUB-SEED/models
rsync -a --info=progress2 \
  /path/to/selected-model-bundle/ \
  /run/media/$USER/DUB-SEED/models/selected-model-bundle/

Preferred source: a materialized model directory from a trusted local store or previously prepared artifact. Do not make the fresh install depend on Hugging Face availability.

Legacy fallback: if the only available source is an existing Hugging Face cache on the build machine, materialize the current snapshot once by following symlinks. This is a build-machine preparation step, not an install-time dependency:

MODEL_CACHE=/var/lib/vllm/.cache/huggingface/hub/models--OWNER--MODEL
REVISION="$(cat "$MODEL_CACHE/refs/main")"

mkdir -p /run/media/$USER/DUB-SEED/models/selected-model-bundle
rsync -aL --info=progress2 \
  "$MODEL_CACHE/snapshots/$REVISION/" \
  /run/media/$USER/DUB-SEED/models/selected-model-bundle/

Then create the checksum manifest:

cd /run/media/$USER/DUB-SEED/models/selected-model-bundle
find . -type f ! -name SHA256SUMS -print0 \
  | sort -z \
  | xargs -0 sha256sum \
  > SHA256SUMS

Seed From USB

After the workstation has booted into NixOS and the USB is mounted, copy the bundle into the Dubnium model store:

sudo mkdir -p /var/lib/dubnium/models
sudo rsync -a --info=progress2 \
  /run/media/$USER/DUB-SEED/models/selected-model-bundle/ \
  /var/lib/dubnium/models/selected-model-bundle/
sudo chown -R root:root /var/lib/dubnium/models/selected-model-bundle

Adjust the mount path if the USB is mounted somewhere else.

Verify the checksum manifest:

cd /var/lib/dubnium/models/selected-model-bundle
sudo sha256sum -c SHA256SUMS
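
The manifest creation and verification steps can be rehearsed on a scratch directory before touching real media. The wrappers below are a sketch with hypothetical names; they reuse the same find, sort, and sha256sum pipeline from this page and assume GNU coreutils and findutils.

```shell
# Write SHA256SUMS for every regular file under the bundle directory,
# excluding the manifest itself. Runs in a subshell so the caller's
# working directory is unchanged.
write_manifest() {
  (
    cd "$1" || exit 1
    find . -type f ! -name SHA256SUMS -print0 \
      | sort -z \
      | xargs -0 sha256sum \
      > SHA256SUMS
  )
}

# Verify the bundle against its manifest; --quiet prints only failures.
verify_manifest() {
  (
    cd "$1" || exit 1
    sha256sum --quiet -c SHA256SUMS
  )
}
```

After `write_manifest "$dir"`, changing any file in the directory should make `verify_manifest "$dir"` return non-zero.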

Then verify the local model path exists:

test -f /var/lib/dubnium/models/selected-model-bundle/config.json
test -f /var/lib/dubnium/models/selected-model-bundle/model.safetensors.index.json

Acceptance Check

After seeding, switch to compute only when normal bring-up preconditions are satisfied:

sudo mode request compute
systemctl status vllm.service
journalctl -u vllm.service -b

The first start should load the local model path. If vLLM tries to fetch model files from the network, the model argument or bundle location is wrong.

Runbook: vLLM Runtime

Status: living

Use this when Dubnium’s NixOS configuration manages vllm.service, but the vLLM Python/CUDA runtime is installed outside the Nix store.

NixOS owns:

  • vllm.service
  • /var/lib/vllm
  • /var/lib/dubnium/models
  • CUDA_VISIBLE_DEVICES
  • ai.dubnium
  • Tailscale-only firewall exposure

The external runtime owns:

  • /var/lib/vllm/venv
  • Python, PyTorch, vLLM, and CUDA wheel packages inside that venv

This keeps rebuilds fast and avoids compiling PyTorch, CUDA, CuPy, MAGMA, OpenCV CUDA, or vLLM during nixos-rebuild.

Scope

This runbook covers the current hybrid-Nix phase. NixOS is authoritative for the service contract, host alias, firewall exposure, users, directories, environment, and health checks. The Python/CUDA package runtime is mutable operator-managed state under /var/lib/vllm/venv.

A pure-Nix vLLM runtime is a separate later phase. That phase should be treated as build-infrastructure work: it likely needs a dedicated CUDA builder, an Attic/Cachix/nix-serve cache, or an upstream Nixpkgs packaging path that avoids rebuilding the full CUDA/PyTorch/vLLM stack on every workstation.

Preconditions

  • the host has been switched to a Dubnium generation with dubnium.vllm.runtime = "external"
  • uv is available in the operator shell
  • NVIDIA GPU access works on the host
  • model weights are already seeded under /var/lib/dubnium/models

Check GPU visibility first:

nvidia-smi

1. Create The Runtime Directory

sudo install -d -m 0755 -o root -g root /var/lib/vllm
sudo install -d -m 0755 -o root -g root /var/lib/dubnium/models

The NixOS module also declares these directories. These commands are safe to run before or after nixos-rebuild switch.

2. Install vLLM Into The Managed venv

Create a fresh venv:

sudo uv venv --python /run/current-system/sw/bin/python3.12 --python-preference only-system /var/lib/vllm/venv

Install vLLM with CUDA/PyTorch wheels selected by uv:

sudo env UV_TORCH_BACKEND=auto uv pip install --python /var/lib/vllm/venv/bin/python vllm

This is intentionally the only default install command. Do not install audio, JAX, TPU, or broad framework extras during workstation bring-up. In particular, avoid commands that reinstall torchvision, torchaudio, or jax unless a specific workload requires them and the host has enough memory to resolve, download, install, and import that dependency set. The default Dubnium vLLM path is text inference against a local model bundle.

The upstream vLLM GPU install docs recommend uv pip install vllm --torch-backend=auto so uv can select the PyTorch backend from the installed CUDA driver. If that flag is not supported by the installed uv, use the environment variable form above or update uv.

If the installed uv supports newer PyTorch backends, use a specific CUDA backend that matches the host driver. For CUDA 13.0:

sudo uv pip install --python /var/lib/vllm/venv/bin/python --torch-backend=cu130 vllm

Some packaged uv versions may not list cu130 yet. On those versions, keep the default install command above, or upgrade uv to a version that supports the host CUDA backend. Do not use a broad PyTorch-family reinstall as a workstation bring-up workaround; it can pull optional packages such as torchaudio and exceed available memory.

If PyTorch CUDA selection is wrong after the default install, recreate the venv and rerun the vLLM install with a supported UV_TORCH_BACKEND or --torch-backend value rather than layering more framework packages into the same environment.

Host config adds the venv’s PyTorch and NVIDIA wheel library directories to LD_LIBRARY_PATH. That is required because the external venv is outside the Nix store and vLLM’s CUDA extension must be able to find libtorch, libcudart, and the CUDA wheel libraries at runtime.
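
To see which directories such an LD_LIBRARY_PATH would need on a given host, the venv can be inspected directly. This is a sketch, not the host module's logic: the function name is hypothetical, and it assumes the usual wheel layout in which PyTorch ships shared libraries under site-packages/torch/lib and the NVIDIA wheels under site-packages/nvidia/<package>/lib. The directories the Dubnium module actually lists may differ.

```shell
# Hypothetical helper: print a colon-separated list of library directories
# found in PyTorch and NVIDIA wheels inside a venv.
venv_cuda_lib_path() {
  local venv="$1" dir path=""
  for dir in "$venv"/lib/python*/site-packages/torch/lib \
             "$venv"/lib/python*/site-packages/nvidia/*/lib; do
    # An unmatched glob stays literal and is not a directory; skip it.
    [ -d "$dir" ] || continue
    path="${path:+$path:}$dir"
  done
  printf '%s\n' "$path"
}
```

Comparing `venv_cuda_lib_path /var/lib/vllm/venv` with the LD_LIBRARY_PATH entry in the service environment shows whether a wheel directory is missing after a reinstall.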

The service also sets CC to Nix’s C compiler wrapper. Triton may compile a small runtime helper during vLLM startup even when vLLM itself is installed in the external venv.

Keep dubnium.vllm.runtime = "package" available for the future pure-Nix phase, but do not use it for this external-runtime path.

3. Verify The Runtime

Check the executable:

/var/lib/vllm/venv/bin/vllm --version

Check CUDA through PyTorch:

/var/lib/vllm/venv/bin/python -c "import torch; print(torch.cuda.is_available())"

Expected:

True

If this prints False, fix the venv/PyTorch/CUDA wheel selection before debugging Dubnium’s systemd service.

4. Verify The Local Model Bundle

Dubnium keeps model weights out of Git and out of the Nix store. The vLLM service should point at a local model bundle.

MODEL_DIR=/var/lib/dubnium/models/qwen2.5-coder-14b-instruct

If the model bundle was seeded from removable media, verify that the local bundle exists:

test -f "$MODEL_DIR/config.json"
test -f "$MODEL_DIR/model.safetensors.index.json" || test -f "$MODEL_DIR/model.safetensors"

If SHA256SUMS exists, verify it:

cd "$MODEL_DIR"
sudo sha256sum -c SHA256SUMS

If vLLM tries to download model files on first start, the configured model path or local bundle is wrong.

5. Start The Service

Start compute mode, or restart the service directly:

# Option 1: start the compute target
sudo systemctl start compute.target
# Option 2: restart only the vLLM unit
sudo systemctl restart vllm.service

Inspect service state:

systemctl status vllm --no-pager
journalctl -u vllm -n 100 --no-pager
systemctl show vllm.service -p ExecStart --value
systemctl show vllm.service -p Environment --value

If /var/lib/vllm/venv/bin/vllm does not exist or is not executable, vllm.service should fail before startup with an executable check error. That means the NixOS service contract is present but the external runtime has not been installed yet.
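
The ExecStart value printed above can also be checked for a local model path, which is a quick way to catch a configuration that would fall back to a network download. The helper is a hypothetical sketch that does a plain substring match; it does not parse the systemd output format.

```shell
# Hypothetical helper: succeed when the ExecStart string references a path
# under the local model root. Pure string check, no systemd access.
execstart_uses_local_model() {
  local execstart="$1" model_root="${2:-/var/lib/dubnium/models}"
  case "$execstart" in
    *"$model_root/"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on a live host:
#   execstart_uses_local_model \
#     "$(systemctl show vllm.service -p ExecStart --value)"
```

A non-zero result suggests the model argument is a hub identifier or points outside /var/lib/dubnium/models.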

6. Verify The API

From the Dubnium host:

getent hosts ai.dubnium
curl http://ai.dubnium:8000/v1/models

From another tailnet machine:

curl http://<dubnium-tailnet-name>:8000/v1/models

ai.dubnium is host-local unless the tailnet DNS or client hosts file also maps that name to the Dubnium node’s Tailscale IP.

References

Plano routing gateway

Dubnium owns the system/runtime side of Plano. User-level client configuration lives in ryjen/dotfiles through Home Manager modules.

Boundary

Dubnium
  systemd service lifecycle
  compute target integration
  vLLM/Ollama local model endpoint
  ai.slice placement
  runtime state under /var/lib and /var/cache

ryjen/dotfiles
  Home Manager user config
  ~/.config/planoai/dubnium.yaml
  ~/.config/model-router/profiles/local-first-dev.yaml
  shell environment and helper scripts

ryjen/model-router
  source policy schemas
  route-decision record semantics
  governance-oriented model-router design

Service model

The Plano workload module is defined at:

modules/workloads/plano.nix

It creates:

plano.service

When enabled, the service is attached to:

compute.target
ai.slice

It is intentionally disabled by default in hosts/workstation/default.nix.

Defaults

dubnium.plano = {
  enable = false;
  runtime = "external";
  externalExecutable = "/var/lib/plano/venv/bin/planoai";
  host = "127.0.0.1";
  port = 12000;
  localBaseUrl = "http://127.0.0.1:8000/v1";
  exposeOnTailscale = false;
};

The default local model endpoint assumes vllm.service is serving an OpenAI-compatible API on port 8000.

Enablement

Enable once the Plano executable exists:

dubnium.plano.enable = true;

For the current external runtime default, verify:

test -x /var/lib/plano/venv/bin/planoai

If Plano becomes available as a Nix package or overlay, switch to:

dubnium.plano = {
  enable = true;
  runtime = "package";
  package = pkgs.<plano-package>;
};

Validation

Build the workstation target without activating it:

sudo nixos-rebuild build --flake .#workstation

Then inspect the generated unit:

systemctl cat plano.service

When enabled and in compute mode:

sudo mode request compute
systemctl status vllm.service
systemctl status plano.service

Check the gateway endpoint:

curl http://127.0.0.1:12000

The exact health endpoint may differ depending on Plano’s runtime API.

Security notes

  • Keep exposeOnTailscale = false until the gateway behavior is validated
  • Do not store cloud provider secrets in the generated config
  • Prefer environment files managed by sops-nix or another host secret provider
  • Treat Plano as routing infrastructure, not an authorization layer
  • Privacy and route policy belong above the gateway in model-router/Anthesis semantics

Failure behavior

The service fails closed if the configured Plano executable is missing because ExecStartPre checks that the executable exists.
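
The fail-closed check amounts to an executable test that runs before the main process. The module's actual ExecStartPre line is not shown here, so the function below is only an equivalent sketch with a hypothetical name.

```shell
# Sketch of a fail-closed guard: return non-zero, with a message on stderr,
# unless the path is an executable regular file.
require_executable() {
  if [ ! -f "$1" ] || [ ! -x "$1" ]; then
    echo "executable check failed: $1" >&2
    return 1
  fi
}
```

For the default external runtime this would be `require_executable /var/lib/plano/venv/bin/planoai`.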

Fallback between models must not bypass privacy, budget, safety, or approval failures. Those are policy failures, not operational retry events.

Persistent Context Memory Architecture

Status: planning

This document describes the long-term persistent context memory architecture for Dubnium’s local vLLM runtime.

Goals

The architecture should:

  • support long-lived conversational and agentic workflows
  • preserve low-latency vLLM inference characteristics
  • separate inference runtime concerns from memory persistence
  • expose enough structure for replay, audit, and policy enforcement
  • operate efficiently on constrained local GPU hardware
  • leave room for Anthesis-style governed agent systems

Future Governance Boundary

A future governance layer remains external to this memory/runtime architecture.

The memory/runtime layer stores, retrieves, summarizes, compacts, and serves context. It records structured metadata and lifecycle events so another layer can inspect, constrain, attest, or replay behavior later.

The future governance layer evaluates policy, provenance, trust, retention, audit, and replay concerns. This document does not define that governance authority.

Dubnium memory/runtime layer
    = stores, retrieves, summarizes, compacts, and serves context

Future governance layer
    = evaluates policy, provenance, trust, retention, audit, and replay concerns

Design implication: memory records, artifacts, retrieval events, and runtime transitions must be structured and externally observable, but vLLM, vector stores, artifact stores, and MemGPT-style runtimes must not depend directly on a future governance substrate.

Core Principle

vLLM is the inference runtime.

Persistent memory is a separate subsystem.

Do not persist transformer KV state as durable memory. KV state can remain an inference optimization inside vLLM. Durable memory must be reconstructable from stored events, summaries, artifacts, metadata, and retrieval records.

flowchart TD
    U[User or Agent] --> O[Orchestrator]
    O --> W[Working Context Buffer]
    O --> R[Retriever]
    O --> T[Task State Store]
    R --> V[(Vector Store)]
    R --> M[(Structured Memory Store)]
    O --> L[vLLM]
    L --> S[Summarizer]
    S --> E[Embedding Pipeline]
    E --> V
    S --> M

Layers

Inference

Responsibilities:

  • token generation
  • batching
  • prefix caching
  • streaming
  • model lifecycle management

Recommended components:

| Component | Recommendation |
| --- | --- |
| Inference runtime | vLLM |
| Primary models | Qwen, DeepSeek, Llama-family |
| Embeddings | bge-small or nomic-embed |
| Quantization | AWQ or GPTQ initially |

Inference nodes should remain stateless where possible. Durable memory logic does not belong inside inference workers.

Working Context

Working context maintains immediate conversational and task continuity.

It contains recent messages, tool outputs, current objectives, active plans, and unresolved references.

Storage options:

| Option | Use |
| --- | --- |
| Redis | fast transient sessions |
| SQLite | single-user local setups |
| Postgres | unified durable stack |

Recommended strategy:

  • keep the last N conversational turns verbatim
  • keep a rolling summary for older turns
  • keep external references outside the prompt
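
The split between verbatim recent turns and older turns that feed the rolling summary can be illustrated with a plain log file. This is a toy sketch under assumptions: one turn per line is a hypothetical storage format, the function names are invented, and the summarization step itself is out of scope.

```shell
# Assumed format: one conversational turn per line in a log file.

# Last N turns, kept verbatim in the prompt.
recent_turns() {
  tail -n "$2" "$1"
}

# Everything before the last N turns: the input to the rolling summary.
# Prints nothing when the log has N turns or fewer.
older_turns() {
  local total
  total="$(wc -l < "$1")"
  [ "$total" -gt "$2" ] || return 0
  head -n "$((total - $2))" "$1"
}
```

With a five-turn log and N=2, `recent_turns` yields turns four and five while `older_turns` yields the first three.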

Episodic Memory

Episodic memory stores meaningful historical interactions, such as debugging sessions, deployment history, design discussions, incidents, and user preferences.

Example shape:

{
  "id": "uuid",
  "timestamp": "ISO8601",
  "session_id": "uuid",
  "memory_type": "episodic",
  "summary": "Condensed interaction summary",
  "importance": 0.82,
  "ttl": null,
  "source": "conversation",
  "provenance": {
    "model": "qwen",
    "extractor_version": "1"
  }
}

Semantic Memory

Semantic memory stores normalized stable facts and reusable knowledge: infrastructure topology, user preferences, architecture decisions, project conventions, and coding standards.

Semantic memory is not raw transcript storage.

Instead of storing “user mentioned NixOS several times”, store:

{
  "fact": "Primary workstation uses NixOS",
  "confidence": 0.94,
  "scope": "personal-preference"
}

Task State

Task state is active execution state, not conversational memory.

Examples:

  • queued work
  • workflow checkpoints
  • active RFC generation
  • agent plans
  • unresolved actions
  • execution graphs

Task state should be strongly structured. Do not embed executable workflow state inside vector stores.

| Component | Recommendation |
| --- | --- |
| Structured store | Postgres |
| Queueing | RabbitMQ or Redis Streams |
| Workflow engine | Temporal later |

Retrieval

Retrieval responsibilities:

  • semantic search
  • scoped retrieval
  • ranking
  • filtering
  • relevance compression

flowchart LR
    Q[Query] --> E[Embed Query]
    E --> S[Vector Search]
    S --> R[Re-ranker]
    R --> C[Context Builder]

Retrieval constraints:

| Constraint | Example |
| --- | --- |
| Session scope | only current project |
| TTL | exclude expired memories |
| Agent boundary | isolate agents |
| Recency weighting | prioritize recent events |

The orchestrator constrains retrieval scope and memory assembly. Future governance can inspect the retrieval event stream and stored metadata, but the retriever must remain useful without embedding a governance engine.

Minimal Stack

| Concern | Technology |
| --- | --- |
| Inference | vLLM |
| Structured data | Postgres |
| Vector search | pgvector |
| Session cache | Redis |
| Object storage | local filesystem first, MinIO later |
| Queueing | Redis Streams first, RabbitMQ later |

Artifact And Binary Memory

Artifacts and memory are distinct concepts.

| Concept | Meaning |
| --- | --- |
| Memory | semantic or cognitive abstraction |
| Artifact | raw external object |
| Evidence | immutable referenced source |
| Context | transient prompt state |
| Knowledge | validated normalized facts |

Raw binaries should not be first-class prompt memory. Instead, binaries remain externalized, semantic extraction feeds the retrieval systems, agents retrieve references and derived context, and multimodal inference runs only on demand.

Initial artifact types:

| Type | Examples |
| --- | --- |
| Images | screenshots, whiteboards, diagrams |
| Documents | PDFs, Office docs |
| Audio | recordings, meetings |
| Video | demos, walkthroughs |
| Source bundles | archives, repos |
| Logs | runtime and system logs |
| Structured data | CSV, JSON, YAML |

flowchart TD
    A[Artifact Upload] --> B[Object Storage]
    A --> C[Extraction Pipeline]
    C --> D[OCR]
    C --> E[Captioning]
    C --> F[Metadata Extraction]
    C --> G[Embedding Generation]
    D --> H[Semantic Records]
    E --> H
    F --> H
    G --> H
    H --> I[(Vector Store)]
    H --> J[(Structured Metadata Store)]

Artifact metadata should include content hashes, storage URIs, MIME type, derived captions or OCR, embedding references, provenance, trust hints, and sensitivity hints.
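
As a rough sketch of that metadata, the following Python builds a record for a local file using only the standard library. The function name `artifact_record`, the dictionary keys, and the default hint values are assumptions for illustration; the text above lists the kinds of metadata but not a concrete schema.

```python
import hashlib
import mimetypes
from pathlib import Path


def artifact_record(path: Path, scope: str) -> dict:
    """Describe a file by hash and location without copying its bytes into memory rows."""
    # Reads the whole file at once; stream in chunks for large artifacts.
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    mime, _ = mimetypes.guess_type(path.name)
    return {
        "sha256": digest,
        "storage_uri": path.resolve().as_uri(),
        "mime_type": mime or "application/octet-stream",
        "scope": scope,
        # Derived fields are filled in later by the extraction pipeline.
        "caption": None,
        "ocr_text": None,
        "embedding_ref": None,
        "provenance": {"origin": "upload"},
        "trust_hint": "unverified",       # hypothetical default
        "sensitivity_hint": "internal",   # hypothetical default
    }
```

The content hash gives an immutable handle for evidence references even if the storage location changes later.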

Binary artifacts create operational risk: screenshots can contain credentials, EXIF metadata can leak location, visual data can be sensitive, retrieved artifacts can amplify exposure, and malicious files can poison extraction pipelines. Controls for these risks belong in the external governance/security layer, but the memory layer must expose enough metadata and hooks for that layer to apply them.

Multimodal Retrieval

For normal text prompts, retrieve captions, OCR text, semantic embeddings, metadata, and artifact references rather than injecting raw binaries.

When multimodal reasoning is required:

  1. Semantic retrieval locates relevant artifacts.
  2. Artifact references are resolved.
  3. Binaries are attached to VLM requests.
  4. Multimodal inference runs on demand.

Candidate model classes:

| Model | Purpose |
| --- | --- |
| Qwen-VL | local multimodal reasoning |
| CLIP or SigLIP | image-text embeddings |
| Whisper | audio transcription |
| OCR pipelines | document extraction |

OCI-Compatible Future

Dubnium should stay compatible with OCI-style cognition and artifact distribution.

OCI registries are a strong long-term fit for content addressing, distribution, deduplication, signing, provenance layering, immutable references, artifact versioning, and registry federation.

Candidate future artifact classes:

| Artifact class | Example |
| --- | --- |
| Model artifacts | GGUF, safetensors |
| Embedding indexes | vector snapshots |
| Prompt bundles | governed prompts and system policies |
| Memory bundles | exported episodic memory sets |
| Workflow definitions | agent workflows |
| Execution traces | replayable sessions |
| Multimodal artifacts | image, document, and audio evidence |
| Tool contracts | MCP capability manifests |

Long-term direction:

OCI artifact
    = versioned governed cognition object

This allows Dubnium to evolve toward replayable cognition, portable agent state, attestable workflows, signed memory exports, reproducible multimodal sessions, and distributed cognition registries without coupling cognition storage to one database implementation.

MemGPT-Style Runtime Evolution

MemGPT-style runtimes remain an incremental upgrade path after the persistent memory substrate is stable. Current Letta documentation describes this lineage as agents with in-context core memory, recall memory, archival memory, and self-editing memory tools.

Do not couple Dubnium directly to Letta or MemGPT internals early. Define stable interfaces first:

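from typing import Any, Protocol

class MemoryRuntime(Protocol):
    """Stable runtime interface; argument shapes are intentionally left open."""
    def retrieve(self, *args: Any, **kwargs: Any) -> Any: ...
    def summarize(self, *args: Any, **kwargs: Any) -> Any: ...
    def compact(self, *args: Any, **kwargs: Any) -> Any: ...
    def promote(self, *args: Any, **kwargs: Any) -> Any: ...
    def classify(self, *args: Any, **kwargs: Any) -> Any: ...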

Evolution path:

| Phase | Capability |
| --- | --- |
| 1 | governed retrieval with explicit schemas |
| 2 | rolling summaries, compaction, and bounded working context |
| 3 | reflection, summarization loops, memory promotion, relevance scoring |
| 4 | adaptive retrieval, workflow-aware recall, retrieval planning |
| 5 | portable cognitive runtime artifacts and OCI-packaged memory overlays |

Preserve the distinction between runtime cognition and durable external state. MemGPT-style runtimes should remain replaceable, capability-scoped, inspectable, and externally configurable.

Phases

Phase 1: Minimal Viable Memory

Deliver durable conversation storage, semantic retrieval, basic summarization, Postgres plus pgvector, an embedding pipeline, a retrieval API, and rolling conversation summaries.

Phase 2: Structured Memory

Deliver episodic and semantic separation, retrieval filtering, scoped namespaces, metadata tagging, and confidence scoring.

Phase 3: Multi-Agent Coordination

Deliver isolated agent memory, shared collaborative memory, workflow continuity, capability-scoped retrieval, memory federation, execution checkpoints, and task orchestration.

Non-Goals

Avoid initially:

  • serialized GPU KV persistence
  • distributed GPU cache coherence
  • infinite-context simulation
  • recurrent-memory transformer experimentation
  • fully autonomous self-modifying memory

These add substantial complexity and operational instability.

First Milestone

Build a local prototype with:

  • vLLM
  • Qwen coder model
  • Postgres
  • pgvector
  • Redis
  • bge-small embeddings
  • retrieval middleware
  • rolling summaries

Then validate latency, retrieval quality, memory drift, and hallucinated recall before expanding into multi-agent memory systems.

Runbook: vLLM Persistent Memory Prototype

Status: planning

Use this when designing or validating a Dubnium memory subsystem around the local vLLM runtime.

vLLM owns inference. The memory subsystem owns persistence, retrieval, summarization, compaction, artifact references, and replay inputs. Do not make durable memory depend on serialized transformer KV state.

Scope

This runbook covers the first prototype milestone:

  • durable conversation and event storage
  • rolling summaries
  • embeddings for retrieval
  • scoped retrieval
  • externally observable metadata on every stored memory
  • bounded prompt assembly for vLLM

It does not cover multi-agent federation, distributed workflow engines, cryptographic memory attestation, or a pure-Nix packaging path for all services. It also does not adopt Letta or another MemGPT-style agent framework in the first milestone; those belong after the local storage, retrieval, and governance contracts are proven.

Future governance remains external to this runbook. The prototype records metadata and lifecycle events so a later governance substrate can inspect, constrain, attest, or replay behavior, but the prototype does not implement the governance authority itself.

Target Shape

flowchart TD
    U[User or Agent] --> O[Orchestrator]
    O --> W[Working Context]
    O --> R[Retriever]
    O --> T[Task State]
    R --> V[(pgvector)]
    R --> M[(Postgres Memory Tables)]
    O --> L[vLLM]
    L --> S[Summarizer]
    S --> E[Embedding Worker]
    E --> V
    S --> M

Prototype Components

Use conservative local services first:

| Concern | Prototype choice |
| --- | --- |
| Inference | existing vllm.service |
| Structured store | Postgres |
| Vector search | pgvector |
| Working context | Redis or Postgres |
| Queueing | Redis Streams initially |
| Object storage | local filesystem first, MinIO later |
| Embeddings | bge-small or nomic-embed |

Keep large artifacts outside prompt assembly. Store references to files, logs, and generated outputs, then retrieve and compress only the relevant excerpts.

Data Classes

Working context is transient session state: recent messages, current objective, active plan, unresolved references, and recent tool outputs.

Episodic memory records meaningful historical interactions, such as debugging sessions, deployment history, design discussions, and operational incidents.

Semantic memory records normalized facts, preferences, project conventions, infrastructure topology, and architecture decisions. Do not treat raw transcripts as semantic memory.

Task state records active workflow state: queued work, checkpoints, execution graphs, pending validations, and unresolved actions.

Metadata records where a memory came from, how trusted it appears, how sensitive it appears, how long it should live, and which scopes may retrieve it. A later governance layer can evaluate that metadata, but the Phase 1 memory service only records and exposes it.

Minimum Schema Direction

The first schema should keep memory objects and embeddings separate so memory metadata can evolve without rewriting vector payloads.

Suggested tables:

  • sessions
  • memories
  • memory_embeddings
  • tasks
  • artifacts
  • provenance

Each memory row should include:

{
  "id": "uuid",
  "session_id": "uuid",
  "memory_type": "episodic",
  "summary": "Condensed interaction summary",
  "scope": "project:dubnium",
  "importance": 0.82,
  "confidence": 0.76,
  "sensitivity": "internal",
  "validation_status": "unverified",
  "ttl": null,
  "source": "conversation",
  "created_at": "ISO8601",
  "provenance": {
    "origin": "agent",
    "model": "qwen",
    "extractor_version": "1"
  }
}

Retrieval Contract

The retriever should take a scoped request from the orchestrator and return scoped context candidates, not final prompts.

Required filters:

  • project or session scope
  • agent namespace
  • TTL expiration
  • recency

Recommended ranking inputs:

  • vector similarity
  • keyword match
  • recency
  • importance
  • source authority
  • validation status

The context builder should compress results before prompt assembly and preserve citations, artifact references, retrieval event ids, or memory ids so a response can be audited later.

Storage Path

  1. Capture a conversation, tool event, task event, or artifact reference.
  2. Classify the event and reject data that should not become durable memory.
  3. Redact secrets and sensitive payloads.
  4. Summarize the event into a typed memory candidate.
  5. Attach provenance, sensitivity, scope, confidence, and retention metadata.
  6. Embed the memory summary.
  7. Store structured memory and vector data.
  8. Schedule expiration or revalidation when retention metadata requires it.
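
Steps 2 through 5 can be sketched as a single function. The regular expressions, the `to_memory_candidate` name, the accepted event kinds, and the default confidence are hypothetical. In particular, pattern-based redaction like this is only a placeholder for a vetted secret detector.

```python
import re
from typing import Optional

# Hypothetical secret patterns; a real deployment needs a vetted detector.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[=:]\s*\S+"),
    re.compile(r"\bghp_[A-Za-z0-9]{20,}\b"),
]
DURABLE_KINDS = {"conversation", "tool", "task", "artifact"}


def redact(text: str) -> str:
    """Replace anything matching a secret pattern with a fixed marker."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def to_memory_candidate(event: dict) -> Optional[dict]:
    """Classify, redact, summarize, and attach metadata to one event."""
    if event.get("kind") not in DURABLE_KINDS:
        return None  # step 2: this event should not become durable memory
    clean = redact(event["text"])  # step 3
    summary = clean[:8000]  # step 4: truncation stands in for a summarizer
    return {  # step 5: provenance, sensitivity, scope, confidence, retention
        "memory_type": "episodic",
        "summary": summary,
        "scope": event["scope"],
        "sensitivity": "internal",
        "confidence": 0.5,
        "validation_status": "unverified",
        "ttl": None,
        "source": event["kind"],
        "provenance": {"origin": event.get("origin", "agent")},
    }
```

Redaction runs before summarization so that a secret never reaches the summarizer, the embedding step, or the store.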

Retrieval Path

  1. Receive a query and current task scope from the orchestrator.
  2. Embed the query.
  3. Search the vector index, applying any structured query filters.
  4. Apply scope, TTL, and sensitivity filters before re-ranking.
  5. Re-rank by relevance, recency, importance, and source hints.
  6. Compress selected context.
  7. Return context candidates with ids, scope, and provenance.
  8. Assemble the final vLLM prompt outside the retriever.
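
A minimal sketch of steps 5 through 7, assuming candidates have already passed the scope, TTL, and sensitivity filters. The weights, the 30-day half-life, and the candidate dictionary shape are illustrative assumptions; the steps above name the ranking inputs but do not prescribe a formula.

```python
import math
from datetime import datetime, timezone


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, candidates, now, top_k=8, half_life_days=30.0):
    """Score pre-filtered candidates and return the top_k with ids and provenance."""
    scored = []
    for c in candidates:
        age_days = (now - c["created_at"]).total_seconds() / 86400
        recency = 0.5 ** (age_days / half_life_days)  # exponential decay
        # Hypothetical weights: relevance dominates, recency and importance adjust.
        score = (0.6 * cosine(query_vec, c["vector"])
                 + 0.2 * recency
                 + 0.2 * c["importance"])
        scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"id": c["id"], "scope": c["scope"], "provenance": c["provenance"],
         "summary": c["summary"], "score": round(s, 4)}
        for s, c in scored[:top_k]
    ]
```

The function returns context candidates only. Prompt assembly stays with the caller, as step 8 requires.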

Validation Checks

Before treating the prototype as useful, test:

  • latency impact on vLLM request path
  • recall quality for prior sessions
  • false recall and hallucinated-memory rate
  • memory poisoning resistance
  • prompt-injection persistence resistance
  • cross-project and cross-agent isolation
  • secret redaction before storage
  • TTL expiration and revalidation behavior
  • replay from stored events and memory ids

Acceptance Criteria

The first milestone is complete when:

  • vLLM can answer with retrieved context without changing vllm.service
  • memory storage survives service restart
  • retrieval can be scoped to one project
  • expired or sensitive memories are excluded from prompt assembly
  • summaries can be traced back to source events or artifacts
  • a replay can reconstruct which memories were available to a response

Artifact Handling

Artifacts are not memory. Store raw binaries outside prompts and retrieve derived context by default:

  • captions
  • OCR text
  • extracted metadata
  • embeddings
  • content hashes
  • artifact references

Use on-demand multimodal inference only when a task needs the binary itself. The retrieval result should carry an artifact reference rather than copying the artifact into ordinary text prompt memory.

Incremental Upgrade: MemGPT / Letta

After the Phase 1 substrate is stable, evaluate MemGPT-style self-editing memory as an orchestration-layer upgrade. Use current Letta documentation when testing concrete framework integration; reserve “MemGPT” for the research pattern unless a legacy component explicitly uses that name.

The evaluation should answer:

  • whether Letta can use Dubnium’s Postgres/pgvector-backed memory stores without bypassing scope, sensitivity, TTL, validation, or provenance filters
  • whether agent-managed memory edits can be audited and replayed
  • whether archival and recall memory operations can preserve Dubnium memory ids and source lineage
  • whether the framework can call local vLLM without requiring model-hosted memory persistence
  • whether rejected, expired, or sensitive memories stay out of generated prompts

Do not adopt the framework if it requires storing ungoverned transcripts, credentials, or tool outputs in durable memory.

Runbook: Memory Service

Status: prototype

Use this after explicitly enabling dubnium.memory.enable = true for the workstation host. The memory service is intentionally opt-in during Phase 1 so first bring-up does not automatically start additional persistent services.

The memory service is the local persistent context substrate for Dubnium. It does not govern agent behavior by itself. Anthesis or another orchestrator should authorize retrieval, inspect provenance, and decide whether retrieved memory may be injected into an agent prompt.

Service Boundary

Anthesis / orchestrator
  -> Dubnium memory API
  -> Postgres + pgvector
  -> Redis working context / queue substrate
  -> vLLM prompt assembly outside the memory service

The API must remain bound to 127.0.0.1 for the Phase 1 prototype.

Service Impact

Enabling dubnium.memory starts additional local services:

  • postgresql.service
  • redis-dubnium-memory.service
  • dubnium-memory-api.service

It also runs packaged memory-service migrations before the API starts. Validate the package and module evaluation before enabling this on the bare-metal workstation target.

Enable Locally

The default workstation target keeps the memory service disabled. Enable it through a host-local override such as hosts/workstation/user.nix:

{
  dubnium.memory = {
    enable = true;
    api.host = "127.0.0.1";
    api.port = 8090;
    retention.defaultTtlDays = null;
  };
}

Then build before switching:

nix --extra-experimental-features "nix-command flakes" build .#memory-service
sudo nixos-rebuild build --flake .#workstation

Verify Disabled Default

Without a host-local override, the workstation target should keep the prototype disabled:

nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.dubnium.memory.enable

Expected:

false

Verify Enabled Configuration

After enabling through hosts/workstation/user.nix, verify:

nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.dubnium.memory.enable
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.dubnium.memory.api.host
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.services.postgresql.enable
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.services.redis.servers.dubnium-memory.enable

Expected:

true
"127.0.0.1"
true
true

Verify Services

After switching an enabled configuration:

systemctl status postgresql
systemctl status redis-dubnium-memory
systemctl status dubnium-memory-api
ai-memory health

Expected health response:

{
  "status": "ok"
}

Raw HTTP is also available for debugging:

curl http://127.0.0.1:8090/healthz

Scope Convention

Use explicit scope prefixes for new memory rows:

personal:
project:
session:
agent:
workflow:

Examples:

project:dubnium
session:11111111-1111-4111-8111-111111111111
agent:anthesis-reviewer
workflow:memory-phase-2

The current implementation provides advisory scope helpers. Full runtime enforcement is intentionally deferred until existing callers and examples are migrated.
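
One way such an advisory helper might look, as a sketch only: `parse_scope` and `ALLOWED_PREFIXES` are hypothetical names, and requiring a non-empty value after the prefix is an assumption rather than documented behavior. The 256-character bound matches the scope limit in the data model specification.

```python
ALLOWED_PREFIXES = ("personal", "project", "session", "agent", "workflow")


def parse_scope(scope: str) -> tuple[str, str]:
    """Split 'prefix:value' and check the prefix against the convention."""
    prefix, sep, value = scope.partition(":")
    if not sep or prefix not in ALLOWED_PREFIXES:
        raise ValueError(f"unknown scope prefix in {scope!r}")
    # Assumption: every prefix needs a value, and the whole scope fits 256 chars.
    if not value or len(scope) > 256:
        raise ValueError(f"scope value missing or too long in {scope!r}")
    return prefix, value
```

A caller could log the `ValueError` instead of failing the request while enforcement remains advisory.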

CLI Smoke Test

Store one memory:

ai-memory store --file docs/examples/memory-store-request.json

Retrieve scoped memory:

ai-memory retrieve \
  --query "What is Dubnium memory for?" \
  --scope project:dubnium \
  --require-verified \
  --purpose review \
  --actor-type agent \
  --actor-id anthesis-reviewer \
  --envelope-id env-manual-smoke-test

Inspect retrieval events:

ai-memory events

Expire old memories:

ai-memory expire --now 2026-05-28T00:00:00Z

Use a non-default API URL when needed:

ai-memory --url http://127.0.0.1:8090 health

API Smoke Test

The CLI is preferred for operator use. Raw HTTP examples are kept for debugging and automation parity.

Store one memory:

curl -sS http://127.0.0.1:8090/memory/store \
  -H 'Content-Type: application/json' \
  -d @docs/examples/memory-store-request.json

Retrieve scoped memory:

curl -sS http://127.0.0.1:8090/memory/retrieve \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "What is Dubnium memory for?",
    "scope": "project:dubnium",
    "allowed_sensitivity": ["internal"],
    "require_verified": true,
    "limit": 8
  }'
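
For automation written in Python, the same retrieve call could be built with the standard library. This sketch only constructs the request; `build_retrieve_request` is a hypothetical helper, and no response shape is assumed.

```python
import json
import urllib.request


def build_retrieve_request(base_url: str, query: str, scope: str) -> urllib.request.Request:
    """Build a POST request that mirrors the curl example above."""
    payload = {
        "query": query,
        "scope": scope,
        "allowed_sensitivity": ["internal"],
        "require_verified": True,
        "limit": 8,
    }
    return urllib.request.Request(
        url=f"{base_url}/memory/retrieve",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_retrieve_request(
    "http://127.0.0.1:8090", "What is Dubnium memory for?", "project:dubnium"
)
# To send it against a running service:
#   with urllib.request.urlopen(req, timeout=5) as resp:
#       body = json.load(resp)
```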

Inspect retrieval events:

curl -sS http://127.0.0.1:8090/memory/retrieval-events

Retrieval Behavior

Normal retrieval excludes memory when:

  • scope does not match the request
  • sensitivity is not explicitly allowed
  • require_verified is true and memory is not verified
  • memory is expired by TTL
  • memory has validation_status = rejected

Rejected memory is excluded even when require_verified = false. Audit retrieval of rejected memory is future work and should use a separate endpoint or explicit audit mode.
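
The exclusion rules above can be expressed as one predicate. This is a sketch under assumptions: memory rows and requests are plain dictionaries, scope matching is exact string equality, and `ttl` is a timezone-aware expiry timestamp. The actual service may match scopes differently.

```python
from datetime import datetime, timezone


def is_retrievable(memory: dict, request: dict, now: datetime) -> bool:
    """Return True only if none of the five exclusion rules applies."""
    if memory["scope"] != request["scope"]:
        return False
    if memory["sensitivity"] not in request["allowed_sensitivity"]:
        return False
    if memory["validation_status"] == "rejected":
        return False  # excluded regardless of require_verified
    if request["require_verified"] and memory["validation_status"] != "verified":
        return False
    ttl = memory.get("ttl")
    if ttl is not None and ttl <= now:
        return False  # expired
    return True
```

Checking for `rejected` before `require_verified` makes the rule explicit that rejected memory never returns from normal retrieval.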

Security Checks

  • API binds to 127.0.0.1
  • raw vLLM remains separate from durable memory
  • memory rows include scope, sensitivity, validation status, source, and provenance
  • expired memories are excluded from retrieval
  • rejected memories are excluded from normal retrieval
  • sensitive memories are excluded unless explicitly allowed
  • retrieval events record returned memory ids and artifact ids
  • logs must not contain raw token-like values
  • prompt assembly must happen outside the memory service

Anthesis Governance Hook

Phase 1 does not implement Anthesis directly. The intended integration contract is:

  1. Anthesis classifies the task and authorizes retrieval scope
  2. Anthesis calls /memory/retrieve with explicit scope, allowed_sensitivity, and require_verified
  3. The memory service returns memories plus a retrieval event id
  4. Anthesis records the retrieval event, memory ids, provider decision, and prompt assembly in an execution envelope
  5. Anthesis decides whether retrieved memory may enter the model context

Memory may inform an agent, but governance decides whether it is allowed to do so.

Troubleshooting

journalctl -u dubnium-memory-api -b
journalctl -u postgresql -b
journalctl -u redis-dubnium-memory -b

Common failure buckets:

  • database role or socket mismatch
  • pgvector extension unavailable for the selected Postgres package
  • migration failure
  • API accidentally bound to a non-local address
  • malformed JSON payload
  • scope mismatch during retrieval

Validation Before Merge

git diff --check
nix --extra-experimental-features "nix-command flakes" build .#memory-service
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.dubnium.memory.enable
pytest pkgs/memory-service/tests

Default workstation expectation before opt-in:

false

If the full workstation build fails on host-specific hardware configuration, report that separately from the memory service package/module validation.

Memory Data Model Specification

Status: draft

This document is the canonical data model and requirements specification for the Dubnium memory service prototype. It reconciles the architecture direction, API/domain models, and current Postgres migration.

Implementation references:

  • pkgs/memory-service/src/dubnium_memory/models.py
  • pkgs/memory-service/src/dubnium_memory/embeddings.py
  • pkgs/memory-service/src/dubnium_memory/migrations/001_initial.sql
  • pkgs/memory-service/src/dubnium_memory/migrations/002_pgvector_embeddings.sql

Goals

The data model must support:

  • durable episodic, semantic, and working memory records
  • scoped retrieval for projects, sessions, and agents
  • externalized artifacts and evidence references
  • retrieval event capture for audit and replay
  • metadata needed by a future external governance layer
  • local Postgres and pgvector evolution without coupling to vLLM internals

Non-Goals

The data model does not define:

  • transformer KV-cache persistence
  • prompt assembly format
  • future governance authority behavior
  • autonomous memory mutation rules
  • object storage implementation details
  • a Letta or MemGPT internal schema

Trust Boundary

All stored content is untrusted when it enters the system and when it is retrieved later. This includes user input, agent output, model-generated summaries, tool output, artifact-derived text, and database rows.

Boundary requirements:

  • validate API payloads before constructing domain objects
  • redact secret-like values before persistence
  • use parameterized SQL for all request-derived values
  • keep secrets out of logs and durable memory summaries
  • store enough provenance, validation, sensitivity, scope, and TTL metadata for external policy systems to inspect later
  • return retrieval candidates and identifiers, not assembled prompts
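
To illustrate the parameterized SQL requirement, here is a small sketch that uses Python's built-in `sqlite3` as a stand-in for Postgres so it runs without a database server. The table is a cut-down, hypothetical version of `memories`. Postgres drivers such as psycopg follow the same idea with `%s` placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id TEXT PRIMARY KEY, scope TEXT, summary TEXT)")

# Request-derived values travel as bound parameters, never via string formatting.
conn.execute(
    "INSERT INTO memories (id, scope, summary) VALUES (?, ?, ?)",
    ("m1", "project:dubnium", "Primary workstation uses NixOS"),
)

# A hostile scope string is treated as data, so it simply matches nothing.
hostile_scope = "project:dubnium'; DROP TABLE memories; --"
rows_ok = conn.execute(
    "SELECT id FROM memories WHERE scope = ?", ("project:dubnium",)
).fetchall()
rows_hostile = conn.execute(
    "SELECT id FROM memories WHERE scope = ?", (hostile_scope,)
).fetchall()
```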

Domain Objects

Memory

Memory is a normalized working, episodic, or semantic record. It is not raw transcript storage and should not contain binary artifact data.

Required fields:

| Field | Type | Requirement |
| --- | --- | --- |
| id | UUID | Stable identifier generated before persistence |
| memory_type | enum | One of working, episodic, semantic |
| summary | string | Non-empty, max 8000 chars, redacted before persistence |
| scope | string | Non-empty, max 256 chars |
| source | string | Non-empty source label, max 128 chars |
| provenance | object | JSON object, empty object allowed |

Optional or defaulted fields:

| Field | Type | Default | Requirement |
| --- | --- | --- | --- |
| session_id | UUID or null | null | References sessions.id when durable |
| importance | float | 0.0 | Range 0.0 to 1.0 |
| confidence | float | 0.0 | Range 0.0 to 1.0 |
| sensitivity | string | internal | Non-empty, max 64 chars |
| validation_status | enum | unverified | One of unverified, verified, rejected |
| ttl | timestamp or null | null | Expired records excluded and removable |
| artifact_refs | list | empty | Each artifact scope must match memory scope |

Durable table: memories.

Current gap: artifact refs are represented in domain/API objects but are not yet persisted as a relationship table.
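
A minimal sketch of how these field rules could be enforced in a domain object. It is not the contents of `models.py`: the `Memory` dataclass below is hypothetical, and it omits `ttl` and `artifact_refs` for brevity.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

MEMORY_TYPES = {"working", "episodic", "semantic"}
VALIDATION_STATES = {"unverified", "verified", "rejected"}
# Maximum lengths for the non-empty string fields.
STRING_LIMITS = (("summary", 8000), ("scope", 256), ("source", 128), ("sensitivity", 64))


@dataclass
class Memory:
    memory_type: str
    summary: str
    scope: str
    source: str
    provenance: dict = field(default_factory=dict)
    id: uuid.UUID = field(default_factory=uuid.uuid4)  # generated before persistence
    session_id: Optional[uuid.UUID] = None
    importance: float = 0.0
    confidence: float = 0.0
    sensitivity: str = "internal"
    validation_status: str = "unverified"

    def __post_init__(self) -> None:
        if self.memory_type not in MEMORY_TYPES:
            raise ValueError("memory_type must be working, episodic, or semantic")
        if self.validation_status not in VALIDATION_STATES:
            raise ValueError("validation_status must be unverified, verified, or rejected")
        for name, limit in STRING_LIMITS:
            value = getattr(self, name)
            if not value or len(value) > limit:
                raise ValueError(f"{name} must be non-empty and at most {limit} chars")
        for name in ("importance", "confidence"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be within 0.0 to 1.0")
```

Validating in `__post_init__` means an invalid record cannot exist as a domain object, which keeps the store layer free of repeated checks.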

Retrieved Memory

Retrieved memory is a context candidate returned by retrieval. It must contain only the fields needed by callers to decide whether and how to assemble context.

Fields:

  • id
  • summary
  • scope
  • sensitivity
  • validation_status
  • provenance
  • artifact_refs

Retrieval responses must not construct prompts. Prompt assembly remains outside the memory service.

Retrieve Request

Retrieve requests define caller intent and visibility constraints.

Fields:

| Field | Type | Default | Requirement |
| --- | --- | --- | --- |
| query | string | none | Non-empty, max 4000 chars |
| scope | string | none | Non-empty, max 256 chars |
| allowed_sensitivity | string list | ["internal"] | Must not be empty |
| require_verified | bool | false | Filters to verified memories when true |
| limit | int | 8 | Range 1 to 32 |
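
The defaults and bounds in the table can be checked with a small validator. `parse_retrieve_request` is a hypothetical name, and this sketch is not the service's actual validation code.

```python
def parse_retrieve_request(payload: dict) -> dict:
    """Validate a retrieve payload and fill in the documented defaults."""
    query = payload.get("query")
    scope = payload.get("scope")
    if not isinstance(query, str) or not 0 < len(query) <= 4000:
        raise ValueError("query must be a non-empty string of at most 4000 chars")
    if not isinstance(scope, str) or not 0 < len(scope) <= 256:
        raise ValueError("scope must be a non-empty string of at most 256 chars")
    allowed = payload.get("allowed_sensitivity", ["internal"])
    if not isinstance(allowed, list) or not allowed:
        raise ValueError("allowed_sensitivity must not be empty")
    limit = payload.get("limit", 8)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 32:
        raise ValueError("limit must be an integer from 1 to 32")
    return {
        "query": query,
        "scope": scope,
        "allowed_sensitivity": allowed,
        "require_verified": bool(payload.get("require_verified", False)),
        "limit": limit,
    }
```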

Retrieval Event

Retrieval events record what was available to a caller at retrieval time.

Fields:

| Field | Type | Requirement |
| --- | --- | --- |
| id | UUID | Generated for each retrieval |
| scope | string | Request scope |
| query | string | Request query |
| returned_memory_ids | UUID list | Ordered returned memory ids |
| returned_artifact_ids | UUID list | Artifact ids referenced by returned memories |
| created_at | timestamp | Durable database timestamp |

Durable table: retrieval_events.

Replay requirements:

  • preserve returned memory ids
  • preserve returned artifact ids
  • preserve query and scope
  • preserve timestamp
  • later replay surfaces should reconstruct candidate availability from these identifiers and persisted records
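
A replay surface might reconstruct candidate availability roughly as follows. This is a sketch: `replay_candidates` and the in-memory `memories_by_id` mapping are assumptions standing in for lookups against persisted records.

```python
def replay_candidates(event: dict, memories_by_id: dict) -> dict:
    """Reconstruct which memories a retrieval event made available, in order."""
    available, missing = [], []
    for memory_id in event["returned_memory_ids"]:
        row = memories_by_id.get(memory_id)
        if row is None:
            missing.append(memory_id)  # removed or expired since the event
        else:
            available.append(row)
    return {
        "event_id": event["id"],
        "query": event["query"],
        "scope": event["scope"],
        "created_at": event["created_at"],
        "available": available,
        "missing_memory_ids": missing,
    }
```

Reporting missing ids separately keeps the replay honest when a memory has been expired or removed after the original retrieval.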

Artifact Reference

Artifact refs are lightweight pointers from memory records to external evidence. They do not embed raw binary content.

Fields:

| Field | Type | Requirement |
| --- | --- | --- |
| id | UUID | Artifact identifier |
| scope | string | Must match containing memory scope |
| sha256 | string | Content hash |
| storage_uri | string | External storage pointer |
| artifact_type | string | Type such as image, document, log |

Durable table: artifacts.

Current gap: memory-to-artifact relationship persistence is not implemented.

Embedding

Embeddings are model-specific vector representations. They are separate from memory records so memory facts remain portable across embedding model changes.

Fields:

| Field | Type | Requirement |
| --- | --- | --- |
| model | string | Non-empty, max 128 chars |
| dimensions | int | Positive |
| vector | float list | Length must match dimensions |

Current durable table: memory_embeddings.

Current durable fields:

  • memory_id
  • embedding_model
  • embedding_ref
  • embedding
  • embedding_dimensions
  • created_at

Current implementation can persist embedding references and pgvector values for a memory. The application service can embed stored summaries when configured with an embedder and an embedding-capable store. The Postgres store can query vectors behind the storage boundary.

Session

Sessions group conversational or agentic work under a scope.

Durable table: sessions.

Fields:

  • id
  • scope
  • created_at

Current gap: session creation and lookup APIs are not implemented.

Task State

Task state is active execution state, not memory. It should remain structured and queryable instead of being embedded in vector stores.

Durable table: tasks.

Fields:

  • id
  • scope
  • status
  • state
  • created_at
  • updated_at

Current gap: task-state domain objects and APIs are not implemented.

Provenance

Provenance records attach lineage to one memory, artifact, or retrieval event.

Durable table: provenance.

Fields:

  • id
  • memory_id
  • artifact_id
  • retrieval_event_id
  • source_identity
  • source_event
  • created_at

Constraint: exactly one of memory_id, artifact_id, or retrieval_event_id must be set.
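
The exactly-one constraint can be checked in application code before a write. The helper below is a hypothetical sketch. On the database side, a Postgres `CHECK` constraint built on `num_nonnulls` is one way to enforce the same rule, although this specification does not say how the migration expresses it.

```python
from typing import Optional


def provenance_target(
    memory_id: Optional[str] = None,
    artifact_id: Optional[str] = None,
    retrieval_event_id: Optional[str] = None,
) -> tuple[str, str]:
    """Return (column, id) for the single target; raise if zero or several are set."""
    targets = {
        "memory_id": memory_id,
        "artifact_id": artifact_id,
        "retrieval_event_id": retrieval_event_id,
    }
    set_targets = [(name, value) for name, value in targets.items() if value is not None]
    if len(set_targets) != 1:
        raise ValueError(
            "exactly one of memory_id, artifact_id, retrieval_event_id must be set"
        )
    return set_targets[0]
```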

Current gap: provenance has initial schema support but no write path beyond memory-local JSON metadata.

Durable Tables

| Table | Purpose | Status |
| --- | --- | --- |
| sessions | Session metadata | Schema only |
| memories | Normalized memory records | Implemented for store/retrieve/expire |
| memory_embeddings | Embedding references and vectors | Implemented for persistence |
| tasks | Active workflow state | Schema only |
| artifacts | Externalized artifact metadata | Schema only |
| retrieval_events | Retrieval audit/replay records | Implemented for retrieval event persistence |
| provenance | Lineage records | Schema only |

API Requirements

The API boundary must:

  • reject non-JSON write requests
  • reject oversized payloads
  • validate UUIDs, timestamps, enum values, scores, and bounds
  • redact secret-like values before storing memory summaries
  • return JSON errors without stack traces
  • expose retrieval events for local replay/audit inspection
  • keep durable storage implementation behind the application service contract

Retrieval Requirements

Retrieval must filter by:

  • scope
  • allowed sensitivity
  • validation status when require_verified is true
  • TTL expiration

Retrieval should rank by:

  • lexical or vector relevance
  • importance
  • confidence
  • recency

Current implementation supports scope, sensitivity, verification, TTL, lexical matching, vector relevance in the Postgres store, importance, and confidence. Recency ranking is future work.

Evolution Requirements

Future changes should preserve:

  • vLLM runtime statelessness
  • memory/runtime separation from governance authority
  • external artifact references instead of binary prompt memory
  • replayable retrieval events
  • replaceable embedding providers
  • MemGPT/Letta integration above Dubnium memory APIs, not as source of truth

Autonomous memory writes must not be added until durable storage, redaction, retrieval filters, provenance, expiration, and replay evidence all pass local validation.

Memory Governance Contract

Status: draft

This contract defines how orchestrators such as Anthesis may request memory from the Dubnium memory service without delegating governance authority to the memory service itself.

Boundary

Anthesis / orchestrator
  - classifies task risk
  - authorizes memory scope
  - chooses sensitivity filters
  - decides whether retrieved memory enters prompt context
  - records execution envelope

Dubnium memory service
  - stores memories
  - filters by scope, sensitivity, verification, rejection, and TTL
  - returns retrieval candidates
  - records retrieval events and metadata

The memory service must not assemble final prompts or decide whether a memory is safe to inject into an agent context.

Scope Convention

Memory scopes should use one of these prefixes:

personal:
project:
session:
agent:
workflow:

Examples:

project:dubnium
session:11111111-1111-4111-8111-111111111111
agent:anthesis-reviewer
workflow:memory-phase-2

The current scope helper is advisory. Runtime enforcement may be added after existing callers and examples are fully migrated.

Retrieval Request

A retrieval request may include governance metadata in addition to the Phase 1 filters.

{
  "query": "What changed in memory phase 2?",
  "scope": "project:dubnium",
  "allowed_sensitivity": ["internal"],
  "require_verified": true,
  "limit": 8,
  "purpose": "review",
  "requester": {
    "actor_type": "agent",
    "actor_id": "anthesis-reviewer"
  },
  "envelope_id": "env-20260528-001"
}

Required Fields

| Field | Meaning |
| --- | --- |
| query | Retrieval query text |
| scope | Retrieval boundary, such as project:dubnium |

Optional Fields

| Field | Meaning | Default |
| --- | --- | --- |
| allowed_sensitivity | Sensitivity labels allowed in results | ["internal"] |
| require_verified | Whether only verified memory may return | false |
| limit | Maximum memory candidates | 8 |
| purpose | Orchestrator purpose: ask, plan, patch, review, test | omitted |
| requester | Actor requesting retrieval | omitted |
| envelope_id | Upstream Anthesis execution envelope id | omitted |

Retrieval Event

Every retrieval returns an event.

{
  "id": "uuid",
  "scope": "project:dubnium",
  "query": "What changed in memory phase 2?",
  "returned_memory_ids": ["uuid"],
  "returned_artifact_ids": [],
  "metadata": {
    "allowed_sensitivity": ["internal"],
    "require_verified": true,
    "limit": 8,
    "purpose": "review",
    "requester": {
      "actor_type": "agent",
      "actor_id": "anthesis-reviewer"
    },
    "envelope_id": "env-20260528-001"
  }
}

The event is an audit hook. It is not proof that the memory entered a prompt. Anthesis must separately record prompt assembly and provider execution in its own envelope.
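
As a rough sketch of the event shape above, the function below derives an event from a request and the ids that were returned. The name `build_retrieval_event` and the choice to copy only the governance fields the request actually carried are assumptions, not the service's confirmed behavior.

```python
import uuid

# Governance fields mirrored into event metadata, per the example above.
METADATA_FIELDS = (
    "allowed_sensitivity",
    "require_verified",
    "limit",
    "purpose",
    "requester",
    "envelope_id",
)


def build_retrieval_event(request: dict, memory_ids: list, artifact_ids: list) -> dict:
    """Hypothetical helper: build an audit event for one retrieval call."""
    return {
        "id": str(uuid.uuid4()),
        "scope": request["scope"],
        "query": request["query"],
        "returned_memory_ids": list(memory_ids),
        "returned_artifact_ids": list(artifact_ids),
        # Only fields present on the request are recorded.
        "metadata": {k: request[k] for k in METADATA_FIELDS if k in request},
    }
```

An event is produced even when `memory_ids` is empty, which keeps the rule that every retrieval returns an event.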

Normal Retrieval Rules

Normal retrieval excludes memory when:

  • scope does not match the request
  • sensitivity is not explicitly allowed
  • require_verified is true and memory is not verified
  • memory is expired by TTL
  • memory has validation_status = rejected

Rejected memory is excluded even when require_verified = false.

Audit retrieval of rejected memory is future work and should use a separate endpoint or explicit audit mode.
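
The exclusion rules above can be expressed as one predicate. This is a minimal sketch under assumptions: the function name and the reason labels are hypothetical, and candidates are plain dicts with the fields named in the rules.

```python
from datetime import datetime, timezone
from typing import Optional


def excluded_reason(
    memory: dict,
    *,
    scope: str,
    allowed_sensitivity: set,
    require_verified: bool,
    now: Optional[datetime] = None,
) -> Optional[str]:
    """Return why a memory is excluded from normal retrieval, or None if it may return."""
    now = now or datetime.now(timezone.utc)
    if memory["scope"] != scope:
        return "scope_mismatch"
    if memory["sensitivity"] not in allowed_sensitivity:
        return "sensitivity_not_allowed"
    # Rejected memory is excluded regardless of require_verified.
    if memory["validation_status"] == "rejected":
        return "rejected"
    if require_verified and memory["validation_status"] != "verified":
        return "not_verified"
    if memory["ttl"] is not None and memory["ttl"] <= now:
        return "expired"
    return None
```

Returning a reason instead of a bare boolean is a design choice for this sketch; it makes the exclusion easier to record in tests or logs without exposing the memory content.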

Memory Promotion

Memory should move through explicit states:

working -> episodic -> semantic -> repo doc / ADR / runbook

Promotion rules:

  • working memory may be generated inside a session
  • episodic memory must summarize a meaningful event or task
  • semantic memory must represent a stable fact, decision, convention, or invariant
  • repo docs, ADRs, and runbooks remain higher-authority than memory rows
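
One way to keep promotion explicit is to check each transition against the documented order. The sketch below assumes promotion happens one step at a time and never backwards; that restriction and the name `can_promote` are assumptions. The final hop into repo docs, ADRs, or runbooks is a human-authored change outside memory rows, so it is not modeled here.

```python
# Memory types in promotion order, as listed in the states above.
PROMOTION_ORDER = ("working", "episodic", "semantic")


def can_promote(current: str, target: str) -> bool:
    """Allow only a single forward step: working -> episodic -> semantic."""
    if current not in PROMOTION_ORDER or target not in PROMOTION_ORDER:
        return False
    return PROMOTION_ORDER.index(target) == PROMOTION_ORDER.index(current) + 1
```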

Rejection Reasons

Memory candidates should be rejected or marked rejected when they contain:

  • secret-like content that redaction could not confidently sanitize
  • cross-scope contamination
  • unsupported or hallucinated claims
  • stale facts
  • prompt-injection residue
  • weak or missing provenance

Rejected memory must not appear in normal retrieval paths.

Anthesis Envelope Handoff

Anthesis should record:

  • retrieval request
  • retrieval event id
  • returned memory ids
  • returned artifact ids
  • prompt assembly decision
  • provider decision
  • model/provider response
  • validation result

The memory service only supplies retrieval candidates and metadata. Governance remains external.

Anthesis Memory Envelope Examples

Status: draft

This document shows how Dubnium memory retrieval evidence should appear inside an Anthesis execution envelope. It is intentionally contract-only: Dubnium does not implement Anthesis runtime orchestration here.

Boundary

Dubnium memory service
  - stores memories
  - filters retrieval candidates
  - records retrieval events
  - returns memory ids, artifact ids, and retrieval metadata

Anthesis
  - authorizes retrieval
  - assembles prompts
  - decides whether retrieved memory may be used
  - records provider decisions
  - records validation results

The memory service retrieval event proves that memory was fetched. It does not prove that memory entered the model prompt. Anthesis must record the prompt assembly decision separately.

Envelope Fragment

A governed Anthesis execution envelope should include a memory section shaped like this:

{
  "memory": {
    "retrieval_request": {
      "query": "What is the current Dubnium memory boundary?",
      "scope": "project:dubnium",
      "allowed_sensitivity": ["internal"],
      "require_verified": true,
      "limit": 8,
      "purpose": "review",
      "requester": {
        "actor_type": "agent",
        "actor_id": "anthesis-reviewer"
      },
      "envelope_id": "env-20260528-001"
    },
    "retrieval_event": {
      "id": "22222222-2222-4222-8222-222222222222",
      "scope": "project:dubnium",
      "query": "What is the current Dubnium memory boundary?",
      "returned_memory_ids": [
        "11111111-1111-4111-8111-111111111111"
      ],
      "returned_artifact_ids": [],
      "metadata": {
        "allowed_sensitivity": ["internal"],
        "require_verified": true,
        "limit": 8,
        "purpose": "review",
        "requester": {
          "actor_type": "agent",
          "actor_id": "anthesis-reviewer"
        },
        "envelope_id": "env-20260528-001"
      }
    },
    "prompt_assembly_decision": {
      "used_memory_ids": [
        "11111111-1111-4111-8111-111111111111"
      ],
      "excluded_memory_ids": [],
      "decision": "used",
      "reason": "Verified internal project memory matched the authorized scope and review purpose."
    }
  }
}

Provider Decision Fragment

Memory evidence should sit beside, not inside, the provider decision.

{
  "provider_decision": {
    "selected_provider": "vllm.local",
    "selected_model": "qwen2.5-coder-14b-instruct",
    "provider_class": "local",
    "cloud_escalation_allowed": false,
    "reason": "Review task used verified internal project memory and did not require external context."
  }
}

Validation Fragment

Validation should explicitly tie output review to the memory/context decision.

{
  "validation": {
    "status": "passed",
    "checks": [
      {
        "name": "memory_scope",
        "status": "passed",
        "details": "All retrieved memory was scoped to project:dubnium."
      },
      {
        "name": "rejected_memory_exclusion",
        "status": "passed",
        "details": "No rejected memories were returned or used."
      },
      {
        "name": "prompt_assembly_recorded",
        "status": "passed",
        "details": "Used and excluded memory ids were recorded."
      }
    ]
  }
}

Non-Use Case

If memory is retrieved but not used, Anthesis should record that explicitly:

{
  "memory": {
    "retrieval_event_id": "22222222-2222-4222-8222-222222222222",
    "returned_memory_ids": [
      "11111111-1111-4111-8111-111111111111"
    ],
    "returned_artifact_ids": [],
    "prompt_assembly_decision": {
      "used_memory_ids": [],
      "excluded_memory_ids": [
        "11111111-1111-4111-8111-111111111111"
      ],
      "decision": "excluded",
      "reason": "Memory was relevant but unverified; task required verified memory."
    }
  }
}
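
A prompt assembly decision like the ones above can be derived from the returned ids and the ids actually used. This is a hedged sketch: `prompt_assembly_decision` is a hypothetical helper on the Anthesis side, and treating a partially used result set as `"used"` is an assumption, since the examples only show the all-used and none-used cases.

```python
def prompt_assembly_decision(returned_ids: list, used_ids: list, reason: str) -> dict:
    """Hypothetical helper: record which returned memories entered the prompt."""
    used_set = set(used_ids)
    unknown = used_set - set(returned_ids)
    if unknown:
        # A memory that was never returned cannot be claimed as used.
        raise ValueError(f"used ids were not returned by retrieval: {sorted(unknown)}")

    used = [memory_id for memory_id in returned_ids if memory_id in used_set]
    excluded = [memory_id for memory_id in returned_ids if memory_id not in used_set]
    return {
        "used_memory_ids": used,
        "excluded_memory_ids": excluded,
        "decision": "used" if used else "excluded",
        "reason": reason,
    }
```

Every returned id lands in exactly one of the two lists, so the envelope always accounts for the full retrieval result.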

Rejected Memory Case

Rejected memory should not appear in normal retrieval events. If a future audit mode retrieves rejected memory, the envelope must make the audit mode explicit:

{
  "memory_audit": {
    "mode": "audit_rejected_memory",
    "normal_prompt_use_allowed": false,
    "retrieved_rejected_memory_ids": [
      "33333333-3333-4333-8333-333333333333"
    ],
    "reason": "Operator audit of previously rejected cross-scope memory."
  }
}

Audit-mode retrieval is future work. Normal prompt assembly must not use rejected memory.

Minimum Envelope Requirements

For any Anthesis-governed run that uses Dubnium memory, record:

  • retrieval request
  • retrieval event id
  • returned memory ids
  • returned artifact ids
  • prompt assembly decision
  • used memory ids
  • excluded memory ids
  • provider decision
  • validation result

This creates a replayable boundary between retrieval, prompt assembly, provider execution, and validation.
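
The minimum requirements can be checked mechanically before an envelope is accepted. The sketch below assumes the fragments shown earlier are combined into one envelope with `memory`, `provider_decision`, and `validation` as top-level keys; that layout and the name `missing_envelope_paths` are assumptions.

```python
# Dotted paths that a governed envelope is expected to contain (assumed layout).
REQUIRED_PATHS = [
    ("memory", "retrieval_request"),
    ("memory", "retrieval_event", "id"),
    ("memory", "retrieval_event", "returned_memory_ids"),
    ("memory", "retrieval_event", "returned_artifact_ids"),
    ("memory", "prompt_assembly_decision", "used_memory_ids"),
    ("memory", "prompt_assembly_decision", "excluded_memory_ids"),
    ("provider_decision",),
    ("validation",),
]


def missing_envelope_paths(envelope: dict) -> list[str]:
    """Return the dotted paths that are absent from the envelope."""
    missing = []
    for path in REQUIRED_PATHS:
        node = envelope
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing
```

An empty list means the envelope carries every item in the list above; empty id arrays still count as present, which matches the non-use case.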

vLLM Memory Phase 1 Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Build a minimal local persistent memory prototype around Dubnium’s existing vLLM service without coupling durable memory to transformer KV state.

Architecture: vLLM remains the inference runtime. A separate memory workload provides Postgres/pgvector storage, optional Redis working context, summarization and embedding workers, and a scoped retrieval API that an orchestrator can use before calling vLLM. A future governance layer remains external; Phase 1 records metadata and lifecycle events but does not implement the governance authority.

Tech Stack: NixOS modules, Postgres, pgvector, Redis, Python service code, pytest, systemd services.


Scope

This plan implements the Phase 1 prototype described in ADR-0010 and vLLM Persistent Memory Prototype.

Do not implement multi-agent federation, Temporal, MinIO, cryptographic attestation, a production policy DSL, or durable KV-cache persistence in this phase.

Do not implement Letta or another MemGPT-style framework in Phase 1. Keep it as an incremental upgrade candidate after storage, retrieval filters, redaction, provenance, and replay checks are stable.

Do not implement MinIO, OCI artifact publishing, VLM artifact resolution, or binary artifact extraction in Phase 1. Store artifact references and metadata only where needed; binary artifact pipelines are a later architecture phase.

Trust Boundaries

Risk: medium.

Attacker-controlled inputs include user prompts, agent messages, model output, tool output, retrieved artifacts, imported documents, and model-generated summaries. Treat all of them as untrusted before storage and before prompt assembly.

The Phase 1 implementation must enforce:

  • validation at API boundaries
  • scoped retrieval before prompt assembly
  • redaction before durable storage
  • TTL filtering
  • sensitivity metadata and filters
  • provenance on every memory row and artifact reference
  • retrieval event logging for later replay
  • no secret values in logs or memory payloads

Planned Files

Create:

  • modules/workloads/memory.nix: NixOS workload module for Postgres, pgvector, Redis, memory API, and workers.
  • pkgs/memory-service/default.nix: package the local Python memory service.
  • pkgs/memory-service/pyproject.toml: Python package metadata.
  • pkgs/memory-service/src/dubnium_memory/__init__.py: package marker.
  • pkgs/memory-service/src/dubnium_memory/api.py: HTTP API boundary and input validation.
  • pkgs/memory-service/src/dubnium_memory/config.py: environment parsing.
  • pkgs/memory-service/src/dubnium_memory/db.py: database connection and migrations runner.
  • pkgs/memory-service/src/dubnium_memory/models.py: typed request and memory models.
  • pkgs/memory-service/src/dubnium_memory/filters.py: retrieval scope, TTL, and sensitivity filters.
  • pkgs/memory-service/src/dubnium_memory/redaction.py: secret and sensitive payload redaction.
  • pkgs/memory-service/src/dubnium_memory/retrieval.py: scoped query and ranking logic.
  • pkgs/memory-service/src/dubnium_memory/storage.py: memory persistence.
  • pkgs/memory-service/src/dubnium_memory/workers.py: summarization and embedding worker entrypoints.
  • pkgs/memory-service/migrations/001_initial.sql: schema for sessions, memories, embeddings, tasks, artifacts, retrieval events, and provenance.
  • pkgs/memory-service/tests/test_filters.py: retrieval filter tests.
  • pkgs/memory-service/tests/test_redaction.py: redaction tests.
  • pkgs/memory-service/tests/test_storage.py: storage contract tests.
  • pkgs/memory-service/tests/test_retrieval.py: retrieval query and ranking tests.
  • docs/runbooks/memory-service.md: operator runbook for the prototype.

Modify:

  • modules/dubnium/options.nix: add dubnium.memory options and assertions.
  • hosts/workstation/default.nix: import and enable the memory workload for the workstation only after the module evaluates.
  • flake.nix: expose the memory-service package.
  • docs/README.md: link the memory service runbook.
  • docs/SUMMARY.md: link the memory service runbook.

Implementation Tasks

Task 1: Add Memory Options

Files:

  • Modify: modules/dubnium/options.nix

  • Step 1: Add a disabled-by-default dubnium.memory option set

Add this next to the existing dubnium.vllm and dubnium.k3s options:

memory = {
  enable = mkEnableOption "persistent memory services for local vLLM orchestration";

  api = {
    host = mkOption {
      type = types.str;
      default = "127.0.0.1";
      description = "Host address bound by the Dubnium memory API.";
    };

    port = mkOption {
      type = types.port;
      default = 8090;
      description = "Port bound by the Dubnium memory API.";
    };
  };

  database = {
    name = mkOption {
      type = types.str;
      default = "dubnium_memory";
      description = "Postgres database used by the Dubnium memory subsystem.";
    };

    user = mkOption {
      type = types.str;
      default = "dubnium_memory";
      description = "Postgres role used by the Dubnium memory service.";
    };
  };

  redis = {
    enable = mkOption {
      type = types.bool;
      default = true;
      description = "Whether Redis is enabled for transient working context and worker queues.";
    };
  };

  retention = {
    defaultTtlDays = mkOption {
      type = types.nullOr types.int;
      default = null;
      description = "Default TTL in days for memory objects without an explicit TTL.";
    };
  };
};
  • Step 2: Add assertions for safe local defaults

Add these to the existing assertions list:

{
  assertion = (!config.dubnium.memory.enable) || (config.dubnium.memory.api.host == "127.0.0.1");
  message = "dubnium.memory.api.host must stay local-only for the Phase 1 prototype";
}
{
  assertion =
    (config.dubnium.memory.retention.defaultTtlDays == null)
    || (config.dubnium.memory.retention.defaultTtlDays > 0);
  message = "dubnium.memory.retention.defaultTtlDays must be positive when set";
}
  • Step 3: Verify option evaluation

Run:

nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.dubnium.memory.enable

Expected:

false

Task 2: Package The Memory Service Skeleton

Files:

  • Create: pkgs/memory-service/default.nix

  • Create: pkgs/memory-service/pyproject.toml

  • Create: pkgs/memory-service/src/dubnium_memory/__init__.py

  • Create: pkgs/memory-service/src/dubnium_memory/config.py

  • Create: pkgs/memory-service/src/dubnium_memory/api.py

  • Modify: flake.nix

  • Step 1: Create package metadata

Create pkgs/memory-service/pyproject.toml:

[project]
name = "dubnium-memory"
version = "0.1.0"
description = "Local persistent memory service for Dubnium vLLM orchestration"
requires-python = ">=3.12"
dependencies = [
  "fastapi",
  "pydantic",
  "psycopg[binary]",
  "uvicorn",
]

[project.scripts]
dubnium-memory-api = "dubnium_memory.api:main"
  • Step 2: Create the Nix package

Create pkgs/memory-service/default.nix:

{ python312Packages }:

python312Packages.buildPythonApplication {
  pname = "dubnium-memory";
  version = "0.1.0";
  pyproject = true;

  src = ./.;

  build-system = [
    python312Packages.setuptools
    python312Packages.wheel
  ];

  dependencies = [
    python312Packages.fastapi
    python312Packages.pydantic
    python312Packages.psycopg
    python312Packages.uvicorn
  ];
}
  • Step 3: Add minimal app entrypoint

Create pkgs/memory-service/src/dubnium_memory/__init__.py:

"""Dubnium persistent memory service."""

Create pkgs/memory-service/src/dubnium_memory/config.py:

from pydantic import BaseModel


class Settings(BaseModel):
    database_url: str
    host: str = "127.0.0.1"
    port: int = 8090

Create pkgs/memory-service/src/dubnium_memory/api.py:

import os

from fastapi import FastAPI
import uvicorn

from dubnium_memory.config import Settings


app = FastAPI(title="Dubnium Memory API")


@app.get("/healthz")
def healthz() -> dict[str, str]:
    return {"status": "ok"}


def settings_from_env() -> Settings:
    return Settings(
        database_url=os.environ["DATABASE_URL"],
        host=os.environ.get("DUBNIUM_MEMORY_HOST", "127.0.0.1"),
        port=int(os.environ.get("DUBNIUM_MEMORY_PORT", "8090")),
    )


def main() -> None:
    settings = settings_from_env()
    uvicorn.run(app, host=settings.host, port=settings.port)
  • Step 4: Expose the package from the flake

Modify flake.nix under packages.${system}:

memory-service = pkgs.callPackage ./pkgs/memory-service { };
  • Step 5: Verify package build

Run:

nix --extra-experimental-features "nix-command flakes" build .#memory-service

Expected:

result/bin/dubnium-memory-api exists

Task 3: Add Schema And Storage Contracts

Files:

  • Create: pkgs/memory-service/migrations/001_initial.sql

  • Create: pkgs/memory-service/src/dubnium_memory/models.py

  • Create: pkgs/memory-service/src/dubnium_memory/storage.py

  • Create: pkgs/memory-service/tests/test_storage.py

  • Step 1: Create the first migration

Create pkgs/memory-service/migrations/001_initial.sql:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS sessions (
  id uuid PRIMARY KEY,
  scope text NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS memories (
  id uuid PRIMARY KEY,
  session_id uuid REFERENCES sessions(id),
  memory_type text NOT NULL CHECK (memory_type IN ('working', 'episodic', 'semantic')),
  summary text NOT NULL,
  scope text NOT NULL,
  importance double precision NOT NULL DEFAULT 0.0,
  confidence double precision NOT NULL DEFAULT 0.0,
  sensitivity text NOT NULL DEFAULT 'internal',
  validation_status text NOT NULL DEFAULT 'unverified',
  ttl timestamptz,
  source text NOT NULL,
  provenance jsonb NOT NULL DEFAULT '{}'::jsonb,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS memory_embeddings (
  memory_id uuid PRIMARY KEY REFERENCES memories(id) ON DELETE CASCADE,
  embedding vector(384) NOT NULL,
  model text NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS tasks (
  id uuid PRIMARY KEY,
  scope text NOT NULL,
  status text NOT NULL,
  state jsonb NOT NULL DEFAULT '{}'::jsonb,
  created_at timestamptz NOT NULL DEFAULT now(),
  updated_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS artifacts (
  id uuid PRIMARY KEY,
  scope text NOT NULL,
  uri text NOT NULL,
  media_type text,
  sensitivity text NOT NULL DEFAULT 'internal',
  provenance jsonb NOT NULL DEFAULT '{}'::jsonb,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS provenance (
  id uuid PRIMARY KEY,
  memory_id uuid REFERENCES memories(id) ON DELETE CASCADE,
  source_identity text NOT NULL,
  source_event jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS retrieval_events (
  id uuid PRIMARY KEY,
  scope text NOT NULL,
  query text NOT NULL,
  returned_memory_ids uuid[] NOT NULL DEFAULT '{}',
  returned_artifact_ids uuid[] NOT NULL DEFAULT '{}',
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX IF NOT EXISTS memories_scope_created_at_idx
  ON memories (scope, created_at DESC);

CREATE INDEX IF NOT EXISTS memories_ttl_idx
  ON memories (ttl);
  • Step 2: Define typed storage input

Create pkgs/memory-service/src/dubnium_memory/models.py:

from datetime import datetime
from typing import Literal
from uuid import UUID

from pydantic import BaseModel, Field


MemoryType = Literal["working", "episodic", "semantic"]
ValidationStatus = Literal["unverified", "verified", "rejected"]


class MemoryIn(BaseModel):
    id: UUID
    session_id: UUID | None = None
    memory_type: MemoryType
    summary: str = Field(min_length=1, max_length=8000)
    scope: str = Field(min_length=1, max_length=256)
    importance: float = Field(default=0.0, ge=0.0, le=1.0)
    confidence: float = Field(default=0.0, ge=0.0, le=1.0)
    sensitivity: str = Field(default="internal", max_length=64)
    validation_status: ValidationStatus = "unverified"
    ttl: datetime | None = None
    source: str = Field(min_length=1, max_length=128)
    provenance: dict
  • Step 3: Implement storage with parameterized SQL

Create pkgs/memory-service/src/dubnium_memory/storage.py:

from psycopg import Connection
from psycopg.types.json import Jsonb

from dubnium_memory.models import MemoryIn


def store_memory(conn: Connection, memory: MemoryIn) -> None:
    params = memory.model_dump()
    # psycopg 3 does not adapt a plain dict; wrap it so it is sent as jsonb.
    params["provenance"] = Jsonb(params["provenance"])
    conn.execute(
        """
        INSERT INTO memories (
          id, session_id, memory_type, summary, scope, importance, confidence,
          sensitivity, validation_status, ttl, source, provenance
        )
        VALUES (
          %(id)s, %(session_id)s, %(memory_type)s, %(summary)s, %(scope)s,
          %(importance)s, %(confidence)s, %(sensitivity)s, %(validation_status)s,
          %(ttl)s, %(source)s, %(provenance)s
        )
        """,
        params,
    )
  • Step 4: Add a storage test

Create pkgs/memory-service/tests/test_storage.py:

from uuid import uuid4

import pytest
from pydantic import ValidationError

from dubnium_memory.models import MemoryIn


def test_memory_requires_summary() -> None:
    payload = {
        "id": uuid4(),
        "memory_type": "episodic",
        "summary": "",
        "scope": "project:dubnium",
        "source": "conversation",
        "provenance": {"origin": "test"},
    }

    # An empty summary violates min_length=1 and must fail validation.
    with pytest.raises(ValidationError, match="summary"):
        MemoryIn(**payload)
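
The `ttl` column in the migration is nullable, and Task 1 defines `retention.defaultTtlDays` for memory objects without an explicit TTL. The plan does not show where the two meet. One possibility, sketched here under assumptions, is to resolve the TTL just before storage; `resolve_ttl` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_ttl(
    explicit_ttl: Optional[datetime],
    default_ttl_days: Optional[int],
    *,
    now: Optional[datetime] = None,
) -> Optional[datetime]:
    """Pick the expiry timestamp for a memory row (hypothetical helper)."""
    if explicit_ttl is not None:
        # An explicit TTL always wins over the configured default.
        return explicit_ttl
    if default_ttl_days is None:
        # No default configured: the memory does not expire.
        return None
    if default_ttl_days <= 0:
        raise ValueError("default_ttl_days must be positive when set")
    now = now or datetime.now(timezone.utc)
    return now + timedelta(days=default_ttl_days)
```

The positive-value check mirrors the NixOS assertion from Task 1, so a misconfigured value fails in both places.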

Task 4: Add Redaction And Retrieval Filters

Files:

  • Create: pkgs/memory-service/src/dubnium_memory/redaction.py

  • Create: pkgs/memory-service/src/dubnium_memory/filters.py

  • Create: pkgs/memory-service/tests/test_redaction.py

  • Create: pkgs/memory-service/tests/test_filters.py

  • Step 1: Implement conservative redaction

Create pkgs/memory-service/src/dubnium_memory/redaction.py:

import re


SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|secret|password)\s*[:=]\s*([^\s]+)"),
]


def redact_text(value: str) -> str:
    redacted = value
    for pattern in SECRET_PATTERNS:
        redacted = pattern.sub(r"\1=[REDACTED]", redacted)
    return redacted
  • Step 2: Test redaction

Create pkgs/memory-service/tests/test_redaction.py:

from dubnium_memory.redaction import redact_text


def test_redacts_api_key_like_values() -> None:
    text = "OPENAI_API_KEY=sk-test-value"

    assert redact_text(text) == "OPENAI_API_KEY=[REDACTED]"
  • Step 3: Implement retrieval filtering

Create pkgs/memory-service/src/dubnium_memory/filters.py:

from datetime import datetime, timezone
from typing import TypedDict


class MemoryCandidate(TypedDict):
    id: str
    scope: str
    sensitivity: str
    validation_status: str
    ttl: datetime | None


def is_retrievable(
    memory: MemoryCandidate,
    *,
    scope: str,
    allowed_sensitivity: set[str],
    require_verified: bool,
) -> bool:
    if memory["scope"] != scope:
        return False
    if memory["sensitivity"] not in allowed_sensitivity:
        return False
    if require_verified and memory["validation_status"] != "verified":
        return False
    if memory["ttl"] is not None and memory["ttl"] <= datetime.now(timezone.utc):
        return False
    return True
  • Step 4: Test scope and TTL enforcement

Create pkgs/memory-service/tests/test_filters.py:

from datetime import datetime, timedelta, timezone

from dubnium_memory.filters import is_retrievable


def test_rejects_cross_scope_memory() -> None:
    memory = {
        "id": "m1",
        "scope": "project:other",
        "sensitivity": "internal",
        "validation_status": "verified",
        "ttl": None,
    }

    assert not is_retrievable(
        memory,
        scope="project:dubnium",
        allowed_sensitivity={"internal"},
        require_verified=True,
    )


def test_rejects_expired_memory() -> None:
    memory = {
        "id": "m1",
        "scope": "project:dubnium",
        "sensitivity": "internal",
        "validation_status": "verified",
        "ttl": datetime.now(timezone.utc) - timedelta(days=1),
    }

    assert not is_retrievable(
        memory,
        scope="project:dubnium",
        allowed_sensitivity={"internal"},
        require_verified=True,
    )
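
The redaction pattern from Step 1 is deliberately conservative, so it helps to see what it does and does not catch. The block below repeats the same pattern so it can run on its own; the sample strings are made up for illustration.

```python
import re

# Same pattern as redaction.py in Step 1.
SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|secret|password)\s*[:=]\s*([^\s]+)"),
]


def redact_text(value: str) -> str:
    redacted = value
    for pattern in SECRET_PATTERNS:
        redacted = pattern.sub(r"\1=[REDACTED]", redacted)
    return redacted


# Key/value pairs are redacted, and a ":" separator is normalized to "=".
print(redact_text("password: hunter2 and token=abc"))

# A bearer header has no matching keyword, so it passes through unchanged.
print(redact_text("Authorization: Bearer abc123"))
```

The second case shows why redaction alone is not sufficient: content that redaction could not confidently sanitize still needs to be caught by the rejection path rather than stored.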

Task 5: Add Retrieval API Boundary

Files:

  • Modify: pkgs/memory-service/src/dubnium_memory/api.py

  • Create: pkgs/memory-service/src/dubnium_memory/retrieval.py

  • Create: pkgs/memory-service/tests/test_retrieval.py

  • Step 1: Add request and response models

Add to models.py:

class RetrieveRequest(BaseModel):
    query: str = Field(min_length=1, max_length=4000)
    scope: str = Field(min_length=1, max_length=256)
    allowed_sensitivity: list[str] = Field(default_factory=lambda: ["internal"])
    require_verified: bool = False
    limit: int = Field(default=8, ge=1, le=32)


class RetrievedMemory(BaseModel):
    id: UUID
    summary: str
    scope: str
    sensitivity: str
    validation_status: ValidationStatus
    provenance: dict
  • Step 2: Implement retrieval query contract

Create pkgs/memory-service/src/dubnium_memory/retrieval.py:

from psycopg import Connection
from psycopg.rows import dict_row

from dubnium_memory.models import RetrieveRequest, RetrievedMemory


def retrieve_memories(conn: Connection, request: RetrieveRequest) -> list[RetrievedMemory]:
    # Rows are tuples by default in psycopg 3; dict_row returns column-keyed dicts.
    with conn.cursor(row_factory=dict_row) as cur:
        rows = cur.execute(
            """
            SELECT id, summary, scope, sensitivity, validation_status, provenance
            FROM memories
            WHERE scope = %(scope)s
              AND sensitivity = ANY(%(allowed_sensitivity)s)
              AND (%(require_verified)s = false OR validation_status = 'verified')
              AND (ttl IS NULL OR ttl > now())
            ORDER BY importance DESC, created_at DESC
            LIMIT %(limit)s
            """,
            request.model_dump(),
        ).fetchall()
    return [RetrievedMemory.model_validate(row) for row in rows]
  • Step 3: Add API endpoint

Add to api.py:

from dubnium_memory.models import RetrieveRequest, RetrievedMemory


@app.post("/memory/retrieve")
def retrieve(request: RetrieveRequest) -> list[RetrievedMemory]:
    raise NotImplementedError("database connection wiring is added in the service module task")

Keep this endpoint local-only until the database dependency is wired. Do not expose it on the network in Phase 1.
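
Until the database dependency is wired, the ordering used by the retrieval query (`importance DESC, created_at DESC`, then `LIMIT`) can be exercised in memory. This is only a sketch for tests without Postgres; `rank_candidates` is a hypothetical helper and is not part of the planned files.

```python
from datetime import datetime, timezone


def rank_candidates(candidates: list[dict], limit: int) -> list[dict]:
    """Order by importance, then recency, both descending, and cap at limit."""
    ordered = sorted(
        candidates,
        key=lambda memory: (memory["importance"], memory["created_at"]),
        reverse=True,
    )
    return ordered[:limit]


rows = [
    {"id": "a", "importance": 0.5, "created_at": datetime(2026, 5, 1, tzinfo=timezone.utc)},
    {"id": "b", "importance": 0.9, "created_at": datetime(2026, 4, 1, tzinfo=timezone.utc)},
    {"id": "c", "importance": 0.5, "created_at": datetime(2026, 5, 10, tzinfo=timezone.utc)},
]
top_two = [row["id"] for row in rank_candidates(rows, limit=2)]
```

Here `top_two` is `["b", "c"]`: the most important row comes first, and the tie between `a` and `c` is broken by the newer `created_at`.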

Task 6: Add NixOS Workload Module

Files:

  • Create: modules/workloads/memory.nix

  • Modify: hosts/workstation/default.nix

  • Step 1: Create the workload module

Create modules/workloads/memory.nix:

{ lib, config, pkgs, ... }:
let
  cfg = config.dubnium.memory;
  memoryPackage = pkgs.callPackage ../../pkgs/memory-service { };
in
{
  config = lib.mkIf cfg.enable {
    services.postgresql = {
      enable = true;
      extensions = ps: [ ps.pgvector ];
      ensureDatabases = [ cfg.database.name ];
      ensureUsers = [
        {
          name = cfg.database.user;
          ensureDBOwnership = true;
        }
      ];
    };

    services.redis.servers.dubnium-memory = lib.mkIf cfg.redis.enable {
      enable = true;
      bind = "127.0.0.1";
      port = 6379;
    };

    systemd.services.dubnium-memory-api = {
      description = "Dubnium persistent memory API";
      wantedBy = [ "multi-user.target" ];
      after = [ "postgresql.service" ];
      requires = [ "postgresql.service" ];
      environment = {
        DUBNIUM_MEMORY_HOST = cfg.api.host;
        DUBNIUM_MEMORY_PORT = toString cfg.api.port;
        DATABASE_URL = "postgresql:///${cfg.database.name}?host=/run/postgresql";
      };
      serviceConfig = {
        Type = "simple";
        ExecStart = "${memoryPackage}/bin/dubnium-memory-api";
        Restart = "always";
        RestartSec = "5s";
        NoNewPrivileges = true;
        PrivateTmp = true;
        ProtectHome = true;
        Slice = "platform.slice";
      };
    };
  };
}
  • Step 2: Import the module without enabling it

Modify hosts/workstation/default.nix imports:

../../modules/workloads/memory.nix

Do not set dubnium.memory.enable = true until package build and module eval pass.

  • Step 3: Verify disabled module eval

Run:

nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.systemd.services.dubnium-memory-api.enable

Expected: evaluation reports the attribute as missing, or otherwise shows that the service is not defined, while dubnium.memory.enable = false.

Task 7: Enable Prototype Locally

Files:

  • Modify: hosts/workstation/default.nix

  • Step 1: Enable the memory workload

Add under dubnium:

memory = {
  enable = true;
  api.host = "127.0.0.1";
  api.port = 8090;
};
  • Step 2: Verify generated service contracts

Run:

nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.services.postgresql.enable
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.services.redis.servers.dubnium-memory.enable
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.systemd.services.dubnium-memory-api.environment.DUBNIUM_MEMORY_HOST

Expected:

true
true
"127.0.0.1"

Task 8: Add Operator Runbook

Files:

  • Create: docs/runbooks/memory-service.md

  • Modify: docs/README.md

  • Modify: docs/SUMMARY.md

  • Step 1: Create the runbook

Create docs/runbooks/memory-service.md with:

# Runbook: Memory Service

Status: prototype

Use this after `dubnium.memory.enable = true`.

## Verify Services

```bash
systemctl status postgresql
systemctl status redis-dubnium-memory
systemctl status dubnium-memory-api
curl http://127.0.0.1:8090/healthz
```

Expected:

{"status":"ok"}

## Security Checks

- the API binds to 127.0.0.1
- memories include scope, sensitivity, validation status, and provenance
- expired memories are not returned
- sensitive memories are not returned unless explicitly allowed
- retrieval events are logged with memory ids and artifact references
- logs do not contain raw token-like values

  • Step 2: Link the runbook

Add Memory Service to the Runbooks lists in docs/README.md and docs/SUMMARY.md.

  • Step 3: Build docs

Run:

mdbook build

Expected: docs build succeeds. Generated web/docs changes may be reverted if the review scope is source docs only.

Final Verification

Before committing Phase 1 implementation:

git diff --check
nix --extra-experimental-features "nix-command flakes" build .#memory-service
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.dubnium.memory.enable
pytest pkgs/memory-service/tests
mdbook build

If a full workstation build still fails on the known placeholder hardware configuration, report that separately from targeted memory-module evaluation.

Follow-Up: MemGPT-Style Agent Upgrade

After Phase 1 is stable, create a separate ADR or spike plan for evaluating Letta as the maintained framework lineage from MemGPT. That spike should be read-only against existing memory rows at first, then test controlled agent-managed memory writes only after external governance hooks and replay evidence are in place.

Follow-Up: Artifact And OCI Architecture

After Phase 1 is stable, create a separate implementation plan for artifact handling. That work should start with filesystem content-addressed storage and metadata extraction, then evaluate MinIO and OCI-style exported cognition artifacts only after memory rows, retrieval events, and artifact references have stable ids.

Memory Phase 2: Governed Structured Memory

Status: planning

Phase 1 prepared the local memory substrate: package, API, Postgres/pgvector schema, Redis support, retrieval events, tests, and an opt-in workstation runbook.

Phase 2 makes memory useful for governed agent workflows without turning the memory service into the governance authority.

Goal

Build a structured, policy-aware memory layer that Anthesis or another orchestrator can govern explicitly.

The Phase 2 target is:

Anthesis decides what memory may be used.
Dubnium stores, retrieves, filters, and records memory events.
vLLM remains the inference runtime.

Non-Goals

Do not implement these in Phase 2:

  • autonomous self-editing memory
  • global always-on personal memory injection
  • durable transformer KV-cache persistence
  • multi-agent memory federation
  • Temporal or complex workflow orchestration
  • MinIO or OCI memory bundles
  • raw artifact extraction pipelines
  • public or Tailscale-exposed memory API
  • Anthesis itself inside the Dubnium memory service

Boundary

flowchart TD
    A[Anthesis / Orchestrator] --> B[Memory Policy Decision]
    B --> C[Dubnium Memory API]
    C --> D[(Postgres)]
    C --> E[(pgvector)]
    C --> F[(Redis)]
    C --> G[Retrieval Event]
    G --> A
    A --> H[Execution Envelope]
    A --> I[vLLM / Agent Prompt]

Dubnium must expose enough structure for Anthesis to audit and replay memory use, but Dubnium must not silently decide that retrieved memory belongs in a prompt.

Phase 2 Capabilities

1. Memory Namespaces

Add explicit namespace concepts on top of the existing scope field.

Suggested namespace shape:

personal:<name>
project:<repo-or-system>
session:<uuid>
agent:<agent-id>
workflow:<workflow-id>

The existing scope field can remain the primary filter, but Phase 2 should document and validate accepted scope patterns.
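A minimal sketch of scope-pattern validation, assuming the five prefixes above are the complete accepted set. The character rule for the part after the colon and the function names `parse_scope` and `is_valid_scope` are illustrative assumptions, not the contents of the planned `scopes.py`.

```python
import re

# Prefixes come from the suggested namespace shape. The allowed characters
# after the colon are an assumption, not a documented contract.
SCOPE_PREFIXES = ("personal", "project", "session", "agent", "workflow")
_SCOPE_RE = re.compile(
    r"(?P<prefix>" + "|".join(SCOPE_PREFIXES) + r"):(?P<name>[A-Za-z0-9._-]+)"
)


def parse_scope(scope: str) -> tuple[str, str]:
    """Split a scope such as 'project:dubnium' into (prefix, name).

    Raises ValueError for unknown prefixes or empty names.
    """
    match = _SCOPE_RE.fullmatch(scope)
    if match is None:
        raise ValueError(f"unrecognized scope: {scope!r}")
    return match.group("prefix"), match.group("name")


def is_valid_scope(scope: str) -> bool:
    """Non-raising variant, useful while validation is advisory only."""
    try:
        parse_scope(scope)
    except ValueError:
        return False
    return True
```

A non-raising helper keeps validation advisory, so existing callers can log violations before enforcement is turned on.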

2. Memory Classes

Keep the current memory types:

  • working
  • episodic
  • semantic

Add operational guidance:

| Type | Meaning | Default retention |
| --- | --- | --- |
| working | transient task/session context | short TTL |
| episodic | event/session summaries | medium or explicit TTL |
| semantic | normalized stable facts/decisions | long-lived but reviewable |

Semantic memory should require stronger provenance and confidence than working memory.

3. Governance Metadata

Each memory row already carries sensitivity, validation_status, source, provenance, and ttl. Phase 2 should standardize expected provenance fields.

Recommended provenance shape:

{
  "origin": "agent|operator|system|import",
  "source_uri": "optional source reference",
  "source_event_id": "optional event id",
  "extractor": "manual|summary-worker|agent",
  "extractor_version": "1",
  "governance": "manual|anthesis|none",
  "envelope_id": "optional Anthesis envelope id"
}
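The shape above can be checked mechanically. The sketch below assumes that `origin`, `extractor`, and `governance` are required enumerations and that every other listed key is optional; the document does not state which fields are mandatory, and `provenance_problems` is a hypothetical helper name.

```python
# Enumerations copied from the recommended provenance shape.
ENUM_FIELDS = {
    "origin": {"agent", "operator", "system", "import"},
    "extractor": {"manual", "summary-worker", "agent"},
    "governance": {"manual", "anthesis", "none"},
}
# Assumption: these keys are optional free-form strings.
OPTIONAL_FIELDS = {"source_uri", "source_event_id", "extractor_version", "envelope_id"}


def provenance_problems(provenance: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape is acceptable."""
    problems = []
    for key, allowed in ENUM_FIELDS.items():
        value = provenance.get(key)
        if not isinstance(value, str) or value not in allowed:
            problems.append(f"{key}: expected one of {sorted(allowed)}, got {value!r}")
    for key in sorted(set(provenance) - set(ENUM_FIELDS) - OPTIONAL_FIELDS):
        problems.append(f"{key}: unexpected field")
    return problems
```

Returning a list of problems rather than raising lets the caller decide whether a malformed provenance record is a rejection reason or only a warning.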

4. Retrieval Policy Contract

Add a policy-facing retrieval request contract:

{
  "query": "string",
  "scope": "project:dubnium",
  "allowed_sensitivity": ["internal"],
  "require_verified": false,
  "limit": 8,
  "purpose": "ask|plan|patch|review|test",
  "requester": {
    "actor_type": "human|agent|system",
    "actor_id": "string"
  },
  "envelope_id": "optional Anthesis envelope id"
}

The existing API can continue accepting the Phase 1 shape, but Phase 2 should add optional fields and preserve backward compatibility.
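One way to keep Phase 1 clients working is to parse the new fields as optional with `None` defaults. This is a sketch using standard-library dataclasses; the real `models.py` may use a different modelling library, and the defaults for `allowed_sensitivity`, `require_verified`, and `limit` are assumptions taken from the example values above.

```python
from dataclasses import dataclass
from typing import Optional

PURPOSES = {"ask", "plan", "patch", "review", "test"}
ACTOR_TYPES = {"human", "agent", "system"}


@dataclass(frozen=True)
class Requester:
    actor_type: str
    actor_id: str


@dataclass(frozen=True)
class RetrievalRequest:
    query: str
    scope: str
    allowed_sensitivity: tuple[str, ...] = ("internal",)  # assumed default
    require_verified: bool = False
    limit: int = 8
    # Phase 2 additions: all optional so older payloads still parse.
    purpose: Optional[str] = None
    requester: Optional[Requester] = None
    envelope_id: Optional[str] = None


def parse_retrieval_request(payload: dict) -> RetrievalRequest:
    """Build a request from a JSON-decoded payload, validating only new fields."""
    purpose = payload.get("purpose")
    if purpose is not None and purpose not in PURPOSES:
        raise ValueError(f"unknown purpose: {purpose!r}")

    requester = None
    raw = payload.get("requester")
    if raw is not None:
        if raw.get("actor_type") not in ACTOR_TYPES or not raw.get("actor_id"):
            raise ValueError(f"invalid requester: {raw!r}")
        requester = Requester(raw["actor_type"], str(raw["actor_id"]))

    return RetrievalRequest(
        query=payload["query"],
        scope=payload["scope"],
        allowed_sensitivity=tuple(payload.get("allowed_sensitivity", ("internal",))),
        require_verified=bool(payload.get("require_verified", False)),
        limit=int(payload.get("limit", 8)),
        purpose=purpose,
        requester=requester,
        envelope_id=payload.get("envelope_id"),
    )
```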

5. Retrieval Event Completeness

Retrieval events should eventually record:

  • query
  • scope
  • returned memory ids
  • returned artifact ids
  • allowed sensitivities
  • require_verified
  • requester
  • purpose
  • envelope id
  • timestamp

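Together, these fields are the key replay hook for Anthesis: they record what was asked, by whom, under which filters, and what was returned.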

6. Memory Promotion

Add an explicit promotion workflow:

working -> episodic -> semantic -> repo doc / ADR / runbook

Rules:

  • working memory can be generated freely inside a session
  • episodic memory requires summarization and provenance
  • semantic memory requires confidence, review status, and scope
  • repo docs remain the highest-authority source for durable project truth
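The rules above can be expressed as a pre-promotion check. This sketch assumes promotion moves one step at a time and that rows are plain dicts; `summary` and `confidence` are hypothetical field names, and treating `validation_status` as the "review status" is also an assumption.

```python
PROMOTION_ORDER = ("working", "episodic", "semantic")

# Fields a row must carry before it may enter the target class.
# "summary" and "confidence" are illustrative names, not confirmed columns.
REQUIRED_FOR = {
    "episodic": ("summary", "provenance"),
    "semantic": ("confidence", "validation_status", "scope"),
}


def promotion_blockers(row: dict, target: str) -> list[str]:
    """Return reasons the row cannot be promoted to target; empty means allowed."""
    current = row.get("type")
    if current not in PROMOTION_ORDER or target not in PROMOTION_ORDER:
        return [f"unknown memory type: {current!r} -> {target!r}"]
    if PROMOTION_ORDER.index(target) != PROMOTION_ORDER.index(current) + 1:
        return [f"cannot promote {current} -> {target} in one step"]
    return [
        f"missing {name}"
        for name in REQUIRED_FOR[target]
        if row.get(name) in (None, "", {})
    ]
```

The final step of the workflow, promotion into a repo doc, ADR, or runbook, stays a human Git change and is deliberately not modeled here.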

7. Memory Rejection

Add a clear rejection path:

candidate memory -> rejected -> never retrieved unless explicitly requested for audit

Rejection reasons should include:

  • secret-like content
  • cross-scope contamination
  • hallucinated or unsupported claim
  • stale fact
  • prompt-injection residue
  • unsupported provenance
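A sketch of the rejection path as data plus a retrieval filter. It assumes rejection is recorded as `validation_status == "rejected"` on the row, which the document does not confirm; the enum value strings are likewise illustrative.

```python
from enum import Enum


class RejectionReason(str, Enum):
    """Reason codes mirroring the list above; the string values are assumptions."""

    SECRET_LIKE_CONTENT = "secret-like-content"
    CROSS_SCOPE_CONTAMINATION = "cross-scope-contamination"
    UNSUPPORTED_CLAIM = "hallucinated-or-unsupported-claim"
    STALE_FACT = "stale-fact"
    PROMPT_INJECTION_RESIDUE = "prompt-injection-residue"
    UNSUPPORTED_PROVENANCE = "unsupported-provenance"


def visible_candidates(rows: list[dict], *, include_rejected_for_audit: bool = False) -> list[dict]:
    """Hide rejected rows unless the caller explicitly asks for them for audit."""
    if include_rejected_for_audit:
        return list(rows)
    return [row for row in rows if row.get("validation_status") != "rejected"]
```

Making the audit path a keyword-only flag keeps the default retrieval path from ever returning rejected memory by accident.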

8. Prompt Assembly Boundary

The memory service should never return a final prompt. It should return candidates and event metadata.

The orchestrator owns:

  • prompt assembly
  • context ordering
  • final redaction
  • policy enforcement
  • provider selection
  • execution envelope capture

Implementation Tasks

Task 1: Add Governance-Oriented Request Metadata

Files:

  • pkgs/memory-service/src/dubnium_memory/models.py
  • pkgs/memory-service/src/dubnium_memory/serialization.py
  • pkgs/memory-service/tests/test_models.py
  • pkgs/memory-service/tests/test_api.py

Add optional fields to retrieval requests:

  • purpose
  • requester
  • envelope_id

Keep them optional so Phase 1 clients do not break.

Task 2: Extend Retrieval Events

Files:

  • pkgs/memory-service/src/dubnium_memory/migrations/003_retrieval_event_metadata.sql
  • pkgs/memory-service/src/dubnium_memory/postgres.py
  • pkgs/memory-service/tests/test_migrations.py
  • pkgs/memory-service/tests/test_postgres.py

Add nullable metadata columns or a metadata jsonb field to retrieval_events.

Recommended initial shape:

ALTER TABLE retrieval_events
  ADD COLUMN IF NOT EXISTS metadata jsonb NOT NULL DEFAULT '{}'::jsonb;

This avoids premature schema churn while keeping replay metadata available.
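On the Python side, the replay metadata can be assembled as a plain dict and serialized for the `metadata` column. The key names below follow the retrieval request contract, while `recorded_at`, `build_event_metadata`, and `to_jsonb_param` are hypothetical; how the string is bound to the query depends on the database driver in `postgres.py`.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_event_metadata(
    *,
    purpose: Optional[str] = None,
    requester: Optional[dict] = None,
    envelope_id: Optional[str] = None,
    allowed_sensitivity: tuple = (),
    require_verified: bool = False,
    now: Optional[datetime] = None,
) -> dict:
    """Collect governance metadata for one retrieval event."""
    metadata = {
        "allowed_sensitivity": list(allowed_sensitivity),
        "require_verified": bool(require_verified),
        "recorded_at": (now or datetime.now(timezone.utc)).isoformat(),
    }
    # Optional Phase 2 fields are omitted, not nulled, when a Phase 1
    # client did not send them.
    for key, value in (("purpose", purpose), ("requester", requester), ("envelope_id", envelope_id)):
        if value is not None:
            metadata[key] = value
    return metadata


def to_jsonb_param(metadata: dict) -> str:
    """Serialize deterministically so stored events diff cleanly during replay."""
    return json.dumps(metadata, sort_keys=True, separators=(",", ":"))
```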

Task 3: Add Scope Validation Helpers

Files:

  • pkgs/memory-service/src/dubnium_memory/scopes.py
  • pkgs/memory-service/tests/test_scopes.py

Add validation for scope prefixes:

  • personal:
  • project:
  • session:
  • agent:
  • workflow:

Do not enforce globally until existing tests and callers are migrated.

Task 4: Add Promotion/Rejection Contract Docs

Files:

  • docs/specs/memory-governance-contract.md
  • docs/runbooks/memory-service.md

Document:

  • memory promotion rules
  • rejection reasons
  • semantic memory expectations
  • Anthesis envelope handoff

Task 5: Add Policy Examples

Files:

  • docs/examples/memory-policy.project-dubnium.json
  • docs/examples/memory-retrieval-request.json
  • docs/examples/memory-retrieval-event.json

These examples should be data contracts, not active enforcement.

Acceptance Criteria

Phase 2 is complete when:

  • retrieval requests can carry optional governance metadata
  • retrieval events preserve that metadata for replay
  • scope conventions are documented and testable
  • memory promotion/rejection rules are documented
  • Anthesis can use memory ids and retrieval event ids in execution envelopes
  • no prompt assembly happens inside the memory service
  • memory remains opt-in on the workstation host

Risks

| Risk | Mitigation |
| --- | --- |
| Memory poisoning | require scope, provenance, validation status, and retrieval event logging |
| Cross-project leakage | enforce scoped retrieval and explicit sensitivity filters |
| Silent context injection | keep prompt assembly outside memory service |
| Governance coupling | expose metadata; let Anthesis decide policy |
| Schema churn | prefer additive migrations and metadata JSON for early governance fields |
| Stale semantic facts | use confidence, validation status, TTL, and promotion workflow |

The first Phase 2 PR should be small:

  1. Add metadata jsonb to retrieval_events
  2. Add optional retrieval request metadata fields
  3. Preserve metadata in retrieval event responses
  4. Add tests
  5. Add governance contract docs

Do not add Anthesis runtime wiring yet.

Architecture Overview

Status: living

This is the arc42-lite entrypoint for Dubnium. It describes the system shape, constraints, building blocks, runtime behavior, deployment view, and current risks without replacing lower-level implementation docs.

Purpose

Dubnium is a policy-driven NixOS workstation and AI node. It supports multiple host-local operational contracts on one physical machine:

  • desktop: interactive Hyprland workstation and development mode.
  • studio-local: conditional low-latency audio overlay on desktop.
  • compute: headless throughput-oriented AI/platform mode.

The architecture exists to make mode transitions explicit, observable, guard-driven, auditable, and reversible.

Constraints

  • Desired state is not current state.
  • Current state must be derived from runtime observation.
  • Runtime reconciliation is mandatory for mode changes.
  • systemd targets, services, and slices are the enforcement mechanism.
  • Runtime switching comes before NixOS specialisations.
  • studio-local is conditional and must not dominate the architecture.
  • Host-local modes must remain separate from capability placement.
  • Failure, degraded, and blocked states must be modeled explicitly.

System Context

Actors and adjacent systems:

  • Local operator: requests mode changes, checks status, recovers failures.
  • NixOS host: owns systemd enforcement, hardware, services, and runtime state.
  • GPU/display/audio hardware: shared resources with conflicting latency and throughput requirements.
  • vLLM: compute workload, active only in compute for v1.
  • k3s: platform workload, stable across modes for v1.
  • Possible external studio host: future placement for audio/studio capability.

Building Blocks

  • Nix flake: declares the host configuration and packaged tools.
  • modules/dubnium: mode policy, options, targets, slices, controller units, state files, and guard installation.
  • modules/workloads: workload-specific service definitions such as Hyprland, audio, NVIDIA, vLLM, and k3s.
  • mode CLI: operator surface for requests, status, desired/current state, and explanation.
  • Reconciler: privileged transition executor.
  • Observer: evidence-based classifier for current mode.
  • Guards: small checks that return pass, policy block, or execution error.
  • systemd: target, service, slice, and cgroup enforcement layer.

Runtime View

All mode changes follow the same control-loop shape:

  1. Authorize the request.
  2. Write desired state.
  3. Acquire the controller lock.
  4. Observe current state from runtime facts.
  5. Validate target and capability placement.
  6. Run transition guards.
  7. Execute bounded actions through systemd and helper scripts.
  8. Re-observe.
  9. Classify success, degraded state, blocked state, or failure.
  10. Write transition and guard records.

Success is never inferred from attempted actions. Success requires post-transition observation that satisfies the target mode predicates.

Deployment View

Primary deployment target:

  • one x86_64-linux NixOS workstation host named workstation
  • Hyprland desktop
  • NVIDIA/CUDA runtime
  • planned dual-GPU topology, with hardware-tolerant transitional config
  • vLLM model/cache state outside the Nix store
  • k3s control-node duties

Runtime state:

  • live state under /run/mode-controller
  • future persistent audit history under /var/lib/mode-controller, or under /persist/var/lib/mode-controller once impermanence lands

Cross-Cutting Concerns

  • Safety: guards block destructive transitions and distinguish policy blocks from execution errors.
  • Observability: status must show desired state, observed state, conflicts, guard failures, and latest transition result.
  • Auditability: every reconciliation attempt should produce structured records.
  • Resource ownership: GPU, CPU, memory, I/O, audio, AI, and platform planes must not silently overlap in conflicting ways.
  • Security: unprivileged users must not forge desired/current state or transition success.

Current Risks

  • NVIDIA/Wayland GPU release may not be reliable enough for runtime-only compute promotion.
  • Mixed runtime states may confuse a shell observer unless conflicts are handled conservatively.
  • systemctl isolate can stop required services if target dependencies are not explicit enough.
  • Rollback must prove restored desktop behavior through observation, not just successful systemd commands.

Control Plane

Status: living

The control plane reconciles requested mode intent with runtime facts. It is a local privileged authority, not a convenience shell script.

Authority Model

V1 decision:

  • transition execution is privileged
  • the initial operator path is sudo mode request <mode> or a root-owned mode-controller@.service
  • unprivileged users must not mutate observed state or forge transition success

Future options:

  • polkit-mediated request path
  • local service endpoint
  • richer automation integration

Those options should not be added until the root/sudo path proves the control loop on the target host.

State Model

Live state lives under /run/mode-controller:

desired
current
lock
last-transition.json
last-guards.json
capability-placement.json
hardware-topology.json

Persistent transition history lives under:

/var/lib/mode-controller/events.jsonl

Each line is an append-only JSON event emitted by the reconciler. Initial event types:

  • transition

Initial event fields:

  • timestamp
  • requested
  • prior
  • final
  • success
  • reason

This event stream is intended to become the basis for:

  • audit history
  • degraded transition diagnosis
  • future reconciliation analytics
  • operator replay/debug tooling
  • higher-level memory/context systems
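A minimal sketch of appending one transition event to the JSONL stream using the initial fields listed above. The reconciler itself may well be a shell script; this Python version only illustrates the record shape. Storing the event type under a `type` key is an assumption, as is the helper name `append_transition_event`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_transition_event(
    path: Path,
    *,
    requested: str,
    prior: str,
    final: str,
    success: bool,
    reason: str,
    now: Optional[datetime] = None,
) -> dict:
    """Append a single-line JSON event and return the event that was written."""
    event = {
        "type": "transition",  # assumed key for the event type
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
        "requested": requested,
        "prior": prior,
        "final": final,
        "success": success,
        "reason": reason,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode plus one write per event keeps the file append-only.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")
    return event
```

Because every line is an independent JSON object, later tooling can stream the file without loading the whole history.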

V1 accepts plain desired and current files for the first bootable milestone. The hardening path is either:

  • migrate to desired.json and current.json, or
  • explicitly document the plain-text files as stable interface and keep structured metadata in transition records.

When impermanence is introduced, the persistent event path can be mapped to:

/persist/var/lib/mode-controller/events.jsonl

Reconciliation Sequence

Every requested transition follows this sequence:

  1. acquire lock
  2. observe current state
  3. validate requested target
  4. validate capability placement
  5. run guards
  6. execute bounded actions
  7. re-observe
  8. classify final state
  9. record guard, action, timing, and outcome data
  10. release lock

If target predicates fail after mutation, the controller must attempt rollback, classify a degraded state, or report failed-transition.
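The ten steps can be sketched as one function with injected callables, which also makes the sequence testable without systemd. All parameter names are hypothetical, guard results are assumed to be `(name, exit_code)` pairs, and rollback and degraded classification are left out: this sketch only reports the failed outcome.

```python
from contextlib import nullcontext


def reconcile(target, *, observe, validate, run_guards, execute, lock=None):
    """Run one guarded transition and return a record of what happened."""
    record = {"requested": target, "prior": None, "final": None,
              "success": False, "reason": "", "guards": []}
    with (lock if lock is not None else nullcontext()):  # steps 1 and 10
        prior = observe()                                 # step 2
        record["prior"] = record["final"] = prior
        problem = validate(target, prior)                 # steps 3 and 4
        if problem:
            record["reason"] = problem
            return record
        record["guards"] = run_guards(prior, target)      # step 5
        failed = [name for name, code in record["guards"] if code != 0]
        if failed:
            record["reason"] = "guards did not pass: " + ", ".join(failed)
            return record
        execute(prior, target)                            # step 6
        record["final"] = observe()                       # step 7
        record["success"] = record["final"] == target     # step 8
        if not record["success"]:
            record["reason"] = "target predicates not satisfied after actions"
        return record                                     # step 9
```

Note that `success` is derived only from the second observation, never from `execute` returning without error.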

Observer Contract

The observer must derive current state from evidence only. It must not trust desired state as proof of success.

Required output fields for JSON mode:

{
  "observed_state": "desktop",
  "confidence": "high",
  "degraded": false,
  "signals": {},
  "conflicts": [],
  "timestamp": "..."
}

Required signal families:

  • graphical session presence
  • compositor/display-manager state
  • compute.target
  • vllm.service
  • studio-local-policy.service
  • PipeWire/JACK/REAPER indicators
  • GPU process/VRAM evidence when available
  • controller lock/transition marker
  • latest failed transition marker

Conservative rule: report unknown, transitioning, degraded-*, or failed-transition instead of pretending a stable target has been reached.
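A sketch of the conservative rule applied when building the JSON output. It assumes the observer has already computed a candidate state from its signals; the `low` confidence value and the exact set of accepted state names are assumptions, since the document only shows `high` and names `degraded-*` generically.

```python
from datetime import datetime, timezone

STABLE_STATES = {"desktop", "studio-local", "compute"}
# Assumed concrete names for the non-stable classifications.
OTHER_STATES = {"degraded-desktop", "degraded-compute", "failed-transition"}


def build_observation(candidate, signals, conflicts, *, transition_in_progress=False, now=None):
    """Return the observer's JSON-mode dict without ever overstating stability."""
    if transition_in_progress:
        state = "transitioning"
    elif candidate not in STABLE_STATES | OTHER_STATES:
        state = "unknown"
    elif conflicts and candidate in STABLE_STATES:
        # Never report a stable target while conflicting evidence exists.
        state = "unknown"
    else:
        state = candidate
    return {
        "observed_state": state,
        "confidence": "low" if conflicts or state == "unknown" else "high",
        "degraded": state.startswith("degraded-"),
        "signals": dict(signals),
        "conflicts": list(conflicts),
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
    }
```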

Guard Contract

Guards are small deterministic checks with stable exit classes:

0     pass
10-19 policy block
20+   execution error

Initial guard set:

  • check_target_reachable
  • check_audio_idle
  • check_graphical_session_terminable
  • check_gpu_display_released
  • check_vllm_drainable
  • check_compute_capability_local
  • check_studio_capability_local
  • check_memory_headroom
  • check_persistence_paths_ready

Each guard should emit a reason code and evidence payload suitable for mode explain and transition logs.
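A sketch of how a caller might map guard exit codes onto the three classes. The contract leaves codes 1 through 9 unspecified; this sketch conservatively treats them, and termination by signal, as execution errors. Using stdout as the evidence payload and the names `classify_guard_exit` and `run_guard` are assumptions.

```python
import subprocess


def classify_guard_exit(code: int) -> str:
    """Map a guard exit code to 'pass', 'policy-block', or 'execution-error'."""
    if code == 0:
        return "pass"
    if 10 <= code <= 19:
        return "policy-block"
    # 20+ per the contract; 1-9 and negative (signal) codes are treated the
    # same way here because the target cannot be considered safe.
    return "execution-error"


def run_guard(argv: list[str], timeout: float = 10.0) -> dict:
    """Run one guard command and return its classified result."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A guard that cannot run at all is an execution error, not a block.
        return {"guard": argv[0], "class": "execution-error",
                "exit_code": None, "evidence": str(exc)}
    return {"guard": argv[0], "class": classify_guard_exit(proc.returncode),
            "exit_code": proc.returncode, "evidence": proc.stdout.strip()}
```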

Failure Semantics

Blocked transition:

  • a guard returns a policy block
  • desired state may remain requested
  • current state must not be rewritten to target

Execution error:

  • a guard or action could not run reliably
  • target should not be considered safe

Degraded state:

  • system is usable but does not satisfy all target guarantees
  • must be surfaced directly in status

Failed transition:

  • no stable or acceptable degraded contract could be established
  • rollback failed or final observation remained unsafe/conflicted

Runtime Behavior

Status: living

This document describes how Dubnium behaves while switching between host-local modes.

Modes

desktop

Intent:

  • interactive workstation and development mode

Expected runtime facts:

  • graphical session available
  • ordinary audio available
  • display GPU protected for UI
  • vLLM inactive in v1
  • k3s may remain active with bounded platform pressure

studio-local

Intent:

  • low-latency local audio profile when studio capability remains on this host

V1 representation:

  • overlay on desktop
  • studio-local-policy.service
  • audio-priority.service
  • no first-class studio-local.target

Expected runtime facts:

  • graphical session available
  • audio-priority policy active
  • AI suppressed or inactive
  • heavy background pressure reduced

compute

Intent:

  • headless throughput mode for AI/platform work

Expected runtime facts:

  • graphical session absent or non-authoritative
  • compute target active
  • vLLM active when enabled
  • AI resources assigned according to configured compute GPU profile
  • k3s remains active with mode-appropriate platform budget

Supported V1 Transitions

desktop -> studio-local
studio-local -> desktop
desktop -> compute
compute -> desktop

studio-local -> compute should route through desktop policy unless a future transition contract explicitly allows direct promotion.
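The supported set and the one routed pair can be captured in a small lookup. This sketch encodes only what is stated above: four direct transitions, plus studio-local to compute via desktop. Any other pair, including compute to studio-local, is rejected here because the document does not define it; `plan_route` is a hypothetical name.

```python
SUPPORTED = {
    ("desktop", "studio-local"),
    ("studio-local", "desktop"),
    ("desktop", "compute"),
    ("compute", "desktop"),
}
# Pairs that are allowed only by passing through an intermediate mode.
ROUTED = {("studio-local", "compute"): "desktop"}


def plan_route(source: str, target: str) -> list[tuple[str, str]]:
    """Return the ordered list of direct transitions needed, or raise ValueError."""
    if source == target:
        return []
    if (source, target) in SUPPORTED:
        return [(source, target)]
    via = ROUTED.get((source, target))
    if via is not None:
        return [(source, via), (via, target)]
    raise ValueError(f"unsupported transition: {source} -> {target}")
```

Each hop returned by `plan_route` would still run through the full guarded sequence on its own.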

desktop -> studio-local

Actions:

  1. validate studio capability is local
  2. stop vLLM if active
  3. verify or isolate desktop.target
  4. start studio-local-policy.service
  5. start audio-priority.service
  6. re-observe

Success predicates:

  • observer reports studio-local
  • graphical session is available
  • studio policy marker is active
  • audio-priority overlay is active
  • vLLM is inactive

studio-local -> desktop

Actions:

  1. stop audio-priority.service
  2. stop studio-local-policy.service
  3. isolate or verify desktop.target
  4. re-observe

Success predicates:

  • observer reports desktop
  • studio policy marker is inactive
  • audio-priority overlay is inactive
  • graphical session remains available

desktop -> compute

Actions:

  1. observe source state
  2. validate local compute capability
  3. check audio idle
  4. check graphical session is terminable
  5. notify or terminate graphical session when configured
  6. wait for session exit
  7. check GPU display release predicate
  8. stop studio-local overlay services if active
  9. isolate compute.target
  10. start or verify vllm.service
  11. re-observe

Success predicates:

  • observer reports compute
  • graphical session is absent or non-authoritative
  • compute target is active
  • vLLM is active when enabled
  • GPU ownership evidence satisfies compute profile
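The success predicates above can be evaluated as a checklist over observed facts. The fact keys (`graphical_session_authoritative`, `compute_target_active`, `vllm_active`, `gpu_profile_satisfied`) are hypothetical names for observer signals; missing evidence counts as an unmet predicate.

```python
def unmet_compute_predicates(facts: dict, *, vllm_enabled: bool) -> list[str]:
    """Return the compute success predicates that the observed facts do not satisfy."""
    checks = {
        "observer reports compute": facts.get("observed_state") == "compute",
        # Default True: without evidence, assume the session is still authoritative.
        "graphical session absent or non-authoritative":
            not facts.get("graphical_session_authoritative", True),
        "compute target active": bool(facts.get("compute_target_active", False)),
        "vLLM active when enabled":
            (not vllm_enabled) or bool(facts.get("vllm_active", False)),
        "GPU ownership satisfies compute profile":
            bool(facts.get("gpu_profile_satisfied", False)),
    }
    return [name for name, ok in checks.items() if not ok]
```

An empty result means success; a non-empty result is input to the degraded-versus-failed classification rather than a verdict on its own.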

Acceptable degraded compute examples:

  • vLLM active on a reduced GPU profile while meeting minimum compute policy
  • non-critical desktop service remains but does not conflict with compute
  • residual display allocation is below configured threshold

Failed transition examples:

  • source cannot be classified
  • audio guard blocks transition
  • graphical session cannot terminate
  • GPU release predicate returns execution error or unsafe conflict
  • compute target starts but observer remains conflicted

compute -> desktop

Actions:

  1. observe source state
  2. check vLLM drainability
  3. stop vllm.service
  4. isolate desktop.target
  5. start or verify graphical/session path
  6. re-observe

Success predicates:

  • observer reports desktop
  • vLLM is inactive
  • graphical session is available
  • no compute-only conflict remains

Rollback must be validated through the same post-action observation rules.

ConfigCTL Home Layering Implementation Plan

Purpose

configctl is a generic home-configuration reconciliation CLI.

Dubnium may package and invoke it, but the CLI must not be Dubnium-specific. It should be usable on:

  • Dubnium bare metal
  • laptops
  • WSL
  • future NixOS machines
  • CI dry-run environments

Dubnium remains responsible for machine policy, runtime modes, services, and local AI infrastructure. configctl owns layered home configuration reconciliation.

Core Model

Per-tool home configuration is organized into ownership layers:

~/.config/<tool>/
├── managed.*      # generated by Home Manager/dotfiles; never edit directly
├── local.*        # machine-specific; never automatically promoted
├── custom.d/      # user-authored promotion candidates
└── adopted.d/     # fragments already promoted or represented by managed config

Ownership rules:

managed.*    -> governed source of truth
local.*      -> machine-specific, ignored by promotion
custom.d/*   -> promotion candidates
adopted.d/*  -> archived/adopted fragments, ignored during normal load
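The ownership rules amount to a classification of paths relative to `~/.config/<tool>/`. A sketch, with hypothetical layer labels and helper name:

```python
from pathlib import PurePosixPath


def classify_layer(relative_path: str) -> str:
    """Classify a path relative to ~/.config/<tool>/ into its ownership layer."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) == 1:
        if parts[0].startswith("managed."):
            return "managed"    # governed source of truth
        if parts[0].startswith("local."):
            return "local"      # machine-specific, never promoted
    elif len(parts) > 1:
        if parts[0] == "custom.d":
            return "candidate"  # eligible for promotion
        if parts[0] == "adopted.d":
            return "adopted"    # archived, ignored during normal load
    return "unknown"
```

Only paths classified as `candidate` should ever be offered to a promote step.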

Initial CLI Surface

Implemented commands:

configctl status [tool]
configctl doctor
configctl init <tool>
configctl promote <tool> <fragment>
configctl reconcile [tool]

Phase 0 — Documentation and Skeleton

Status: complete.

Tasks:

  • document the per-tool layering contract
  • add configctl package scaffold
  • add initial configctl script
  • expose configctl from the Dubnium flake packages
  • install configctl on the workstation target

Phase 1 — Local Layer Initialization

Goal: safe scaffolding of layer directories.

Status: complete.

Commands:

configctl init hypr
configctl init git
configctl init nvim
configctl init zsh

Behavior:

  • create custom.d/
  • create adopted.d/
  • create the tool-appropriate local.* file
  • do not overwrite existing files
  • do not modify managed files
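A sketch of the non-destructive scaffolding behavior. The real `configctl` is described as a script and may not be Python; `local.conf` stands in for the tool-appropriate `local.*` file, whose actual name per tool is not given here.

```python
from pathlib import Path


def init_tool(config_root: Path, tool: str, local_name: str = "local.conf") -> list[Path]:
    """Create missing layer paths for one tool and return what was created.

    Existing files and directories are left untouched, and managed.* is
    never read or written.
    """
    tool_dir = config_root / tool
    created = []
    for sub in ("custom.d", "adopted.d"):
        directory = tool_dir / sub
        if not directory.exists():
            directory.mkdir(parents=True)
            created.append(directory)
    local_file = tool_dir / local_name
    if not local_file.exists():
        local_file.touch()
        created.append(local_file)
    return created
```

Returning the list of created paths makes a second run visibly a no-op, which is a cheap way to demonstrate idempotence.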

Phase 2 — Status and Doctor

Goal: inspect local layer state without mutating anything.

Status: complete.

configctl status [tool] reports:

  • local layer presence
  • custom fragment count
  • adopted fragment count
  • missing expected directories
  • unpromoted files in custom.d/

configctl doctor reports:

  • whether essential tools (git, find) are available
  • whether the dotfiles repo is found
  • whether XDG state/cache/data roots exist

Phase 3 — Promote

Goal: move local configuration fragments into the dotfiles repository.

Status: complete.

configctl promote <tool> <fragment>:

  • identifies the fragment in custom.d/
  • copies it to the equivalent path in external/dotfiles/files/home/
  • stages the file in the dotfiles git repository

Promotion remains review-gated via Git (operator must commit and push).

Phase 4 — Reconcile

Goal: detect drift between local overlays and the dotfiles repository.

Status: initial version complete.

configctl reconcile [tool]:

  • compares custom.d/ locally with the dotfiles repository
  • reports files present in dotfiles but missing locally (suggesting a sync or adoption)
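The drift report described above is a set difference over relative file paths. A sketch, assuming the caller has already resolved the local `custom.d/` directory and its counterpart in the dotfiles checkout; `missing_locally` is a hypothetical name.

```python
from pathlib import Path


def _relative_files(root: Path) -> set[str]:
    """Collect file paths under root, relative to root, using forward slashes."""
    if not root.is_dir():
        return set()
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def missing_locally(local_dir: Path, repo_dir: Path) -> list[str]:
    """Files present in the dotfiles directory but absent from the local one."""
    return sorted(_relative_files(repo_dir) - _relative_files(local_dir))
```

The comparison is by path only; comparing content hashes would belong with the adoption manifest rather than this initial version.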

Future Phases

  • Adoption Manifest: track promoted fragments by hash across machines.
  • Governance Integration: link promotion to review workflows.
  • Cleanup: automated garbage collection of adopted fragments.

Non-Goals

configctl should not:

  • replace Home Manager
  • replace Git
  • replace NixOS modules
  • become Dubnium-specific
  • silently promote local configuration
  • automatically delete user-authored fragments without an adopted/archive path
  • treat runtime state as governed configuration

Diagrams

Status: living

These diagrams use a C4-inspired structure plus state/runtime views.

System Context

flowchart LR
    Operator[Local operator]
    Host[Dubnium NixOS host]
    GPUs[Display and compute GPUs]
    Audio[Audio interface]
    Studio[Optional external studio host]
    Micrantha[Micrantha / k3s workloads]
    Models[Local model bundles / runtime data]

    Operator -->|mode request/status| Host
    Host --> GPUs
    Host --> Audio
    Host -->|future placement| Studio
    Host --> Micrantha
    Host --> Models

Container View

flowchart TD
    CLI[mode CLI]
    Controller[mode-controller]
    Observer[observe-current]
    Guards[guard scripts]
    Systemd["systemd targets/services/slices"]
    Workloads["Hyprland, audio, vLLM, k3s"]
    Runtime["/run/mode-controller"]
    Audit["/var/lib/mode-controller"]

    CLI --> Runtime
    CLI --> Controller
    Controller --> Observer
    Controller --> Guards
    Controller --> Systemd
    Systemd --> Workloads
    Observer --> Systemd
    Observer --> Runtime
    Controller --> Runtime
    Controller --> Audit

Mode State View

stateDiagram-v2
    [*] --> bootstrapping
    bootstrapping --> desktop: boot default

    desktop --> studioLocal: request studio-local
    studioLocal --> desktop: request desktop

    desktop --> transitioning: request compute
    compute --> transitioning: request desktop

    transitioning --> desktop: observed desktop
    transitioning --> compute: observed compute
    transitioning --> studioLocal: observed studio-local
    transitioning --> degradedDesktop: partial desktop
    transitioning --> degradedCompute: partial compute
    transitioning --> failedTransition: unsafe/conflicted

    degradedDesktop --> desktop: reconcile
    degradedCompute --> compute: reconcile
    failedTransition --> desktop: rollback succeeds

Reconciliation Sequence

sequenceDiagram
    participant U as Operator
    participant C as mode CLI
    participant R as Reconciler
    participant O as Observer
    participant G as Guards
    participant S as systemd

    U->>C: mode request compute
    C->>R: start mode-controller@compute
    R->>R: acquire lock
    R->>O: observe current
    O-->>R: desktop with evidence
    R->>G: run transition guards
    G-->>R: pass/block/error results
    R->>S: terminate session / isolate target / start services
    R->>O: re-observe
    O-->>R: compute or degraded/failed
    R-->>C: transition result
    C-->>U: status

Rolling Implementation Design

Status: living draft

This file captures the current implementation design for Dubnium as a rolling reference. It should be updated as hardware facts, control-plane contracts, and mode-transition behavior are validated on the real host.

Documentation framework:

  • architecture docs live under docs/architecture/
  • accepted decisions live under docs/decisions/
  • operator procedures live under docs/runbooks/
  • this file remains the rolling synthesis, gap register, and implementation backlog

Architecture Summary

Dubnium is a NixOS host that must behave as one physical machine with multiple operational contracts:

  • desktop: normal Hyprland workstation/dev mode. GUI and ordinary audio are active. The display GPU is protected. AI is off or tightly bounded in v1.
  • studio-local: conditional low-latency audio profile. It is a policy overlay on desktop, not the center of the architecture. If studio/audio moves to a Mac mini, the host-local state machine should still make sense.
  • compute: headless throughput mode. GUI is absent or non-authoritative. vLLM and platform workloads may use more of the machine, including both GPUs when present.

The key design rule is that desired state and current state are different things:

  • Desired state is operator or automation intent, written under /run/mode-controller.
  • Current state is observation-derived from runtime facts, not copied from desired state.
  • A reconciler moves the system toward desired state through guarded transitions.
  • systemd targets, services, and slices are the enforcement layer.
  • Transitions must be bounded, logged, idempotent, and able to report blocked, degraded, or failed outcomes explicitly.

The normative source is the Dubnium control-plane specification. Desired state is authoritative intent, current state is observer output, no transition runs without a lock, and success requires post-action re-observation. The local docs and current repo scaffold already align with the main direction: runtime switching first, no specialisations yet, desktop.target and compute.target as first-class targets, studio-local as a desktop overlay, vLLM compute-only in v1, and k3s stable across modes.

Gaps / Risks

The goal is to keep this section operational. Items should either be resolved for v1, converted into implementation work, or left as explicit open questions with an owner before the first live build.

Contradictions to Resolve

Resolved for v1:

| Topic | Decision | Follow-up |
| --- | --- | --- |
| studio-local.target vs overlay | Do not create a first-class studio-local.target in v1. Use studio-local-policy.service and audio-priority.service as a desktop overlay. | Update older checklist wording when touching that file. |
| Root-on-RAM / impermanence | Defer Root-on-RAM, /persist, Home Manager, sops-nix, and impermanence until the base bootable control loop works. | Keep persistent path design compatible with adding /persist later. |
| modectl vs mode | Keep the local command name mode. | Treat modectl in upstream notes as an older name unless a rename is explicitly requested. |
| Desktop AI vs compute-only vLLM | Keep vLLM compute-only in v1. | Revisit bounded desktop AI only after reliable desktop <-> compute transitions. |
| Maintenance mode | Do not implement maintenance mode in the first milestone. | Reserve state names and avoid enum designs that make maintenance hard to add later. |

Open compatibility item:

  • Desired/current state format remains plain text in the current scaffold. This is acceptable for the first bootable milestone only if transition records carry structured metadata. The next hardening pass should move toward desired.json and current.json, or explicitly document why the plain-text files remain the stable interface.

Missing Decisions

Resolved for v1:

| Decision | V1 stance |
| --- | --- |
| Authority model | Require privileged transition execution. The initial operator path is sudo mode request <mode> or root-owned mode-controller@.service. Unprivileged users must not be able to forge desired/current state or transition success. |
| Reboot policy | Boot normalizes to desktop. Do not replay last desired mode across reboot in v1. |
| vLLM service shape | Use one vllm.service, compute-only. Keep the controller and options shaped so vllm@compute.service can replace it later. |
| k3s lifecycle | Keep k3s.service stable across modes in v1. Express mode pressure through platform.slice budgets before adding start/stop behavior. |

Still open before live compute testing:

| Open item | Concrete next step |
| --- | --- |
| GPU release predicate | Define a target-host predicate using loginctl, compositor absence, nvidia-smi process evidence, and an acceptable residual VRAM threshold. Record both pass and indeterminate outcomes. |
| Degraded thresholds | Define degraded-compute as safe but incomplete compute operation, such as vLLM active on a reduced GPU profile or residual non-critical display allocation below the configured threshold. Define failed-transition for unsafe, conflicting, or unclassified post-action states. |
| Persistent audit location | Choose /var/lib/mode-controller/events.jsonl now, with an option to move it under /persist/var/lib/mode-controller/events.jsonl when impermanence lands. |
| k3s compute policy | Decide whether v1 only changes platform.slice weights or also applies k3s labels/taints for workload intensity. Do not do both until there is a real workload that needs it. |

Risky Assumptions

| Risk | Failure mode | Mitigation |
| --- | --- | --- |
| NVIDIA/Wayland GPU release is sticky | Compute promotion terminates the GUI but leaves display GPU allocations or ambiguous CUDA/display ownership. | Treat GPU release as an observation predicate, not an assumption. Add bounded timeout, residual threshold, and escalation criteria for specialization/reboot-mediated compute. |
| systemctl isolate compute.target stops too much | Important baseline services disappear because target dependencies are incomplete. | Keep compute.target minimal and explicitly list required base services. Test with systemctl list-dependencies compute.target before live switching. |
| Shell observer misclassifies mixed states | Status reports compute while GUI, audio, or conflicting services are still active. | Prefer unknown, transitioning, degraded-*, or failed-transition over false success. Add JSON evidence output and snapshot tests. |
| Rollback does not restore a usable desktop | desktop.target starts but graphical session/audio/display remain broken. | Make rollback success require post-rollback observation, not just successful systemctl commands. Record degraded desktop if partially restored. |
| /run loses state on reboot | Recent desired/current files disappear and audit history is lost. | Keep live lock/current/desired in /run; write transition history to /var/lib/mode-controller/events.jsonl before introducing impermanence. |

Gap Closure Backlog

These are the smallest useful implementation/doc tasks to close the current gaps without broadening scope:

  1. Update older checklist references so studio-local is consistently described as a desktop overlay, not a v1 target.
  2. Add a short docs/control-plane-decisions.md or extend this file with a dated decision log for authority model, reboot policy, vLLM shape, and audit location.
  3. Define the exact observe-current --json schema before adding more transition logic.
  4. Define the GPU release predicate in docs, then implement it in check_gpu_display_released.
  5. Add persistent audit output to /var/lib/mode-controller/events.jsonl.
  6. Add observer classifications for degraded-compute, degraded-desktop, and failed-transition before relying on rollback.
  7. Keep k3s mode behavior limited to platform.slice weights until a concrete platform workload proves that labels, taints, or service restarts are needed.

Proposed Repo Structure

Use the existing scaffold and keep it simple:

.
├── flake.nix
├── hosts/
│   └── workstation/
│       ├── default.nix
│       └── hardware-configuration.nix
├── modules/
│   ├── dubnium/
│   │   ├── default.nix
│   │   ├── options.nix
│   │   ├── state.nix
│   │   ├── targets.nix
│   │   ├── slices.nix
│   │   ├── services.nix
│   │   ├── controller.nix
│   │   └── guards.nix
│   └── workloads/
│       ├── hyprland.nix
│       ├── audio.nix
│       ├── nvidia.nix
│       ├── vllm.nix
│       └── k3s.nix
├── pkgs/
│   └── mode-tools.nix
├── scripts/
│   ├── mode
│   ├── reconcile
│   ├── observe-current
│   ├── lib.sh
│   └── guards/
│       ├── check_audio_idle
│       ├── check_gpu_display_released
│       ├── check_graphical_session_terminable
│       ├── check_vllm_drainable
│       ├── check_compute_capability_local
│       ├── check_studio_capability_local
│       ├── check_memory_headroom
│       └── check_persistence_paths_ready
└── docs/

Flake Design

  • nixosConfigurations.workstation imports hosts/workstation/default.nix.
  • nixosModules.default exposes the Dubnium module.
  • packages.x86_64-linux.mode-tools packages the CLI, observer, reconciler, and guards.
  • Add home-manager, sops-nix, and impermanence later only when the base transition loop is proven.

Module Layout

  • options.nix: all host policy knobs (default mode, GPU topology, vLLM model/profile, studio placement, slice weights).
  • state.nix: creates /run/mode-controller, writes generated topology and placement files, initializes boot default.
  • targets.nix: defines desktop.target and compute.target; no v1 studio-local.target.
  • slices.nix: defines interactive.slice, ai.slice, platform.slice.
  • services.nix: marker/policy services like studio-local-policy.service, audio-priority.service, mode-observe.service.
  • controller.nix: mode-controller@.service, boot normalization unit, permissions.
  • guards.nix: installs guard scripts and documents exit-code contract.
  • workloads/*.nix: workload-specific units, not mode policy.

systemd Targets and Dependencies

desktop.target
  Wants=graphical.target
  After=graphical.target

compute.target
  Conflicts=graphical.target desktop.target
  Wants=vllm.service
  After=multi-user.target network-online.target

For studio-local, use:

studio-local-policy.service
  Type=oneshot
  RemainAfterExit=true
  Slice=interactive.slice

audio-priority.service
  Type=oneshot
  RemainAfterExit=true
  ExecStart=systemctl set-property --runtime ...
  ExecStop=reset slice weights

Slice Structure

  • interactive.slice: Hyprland/session-adjacent services, audio priority policy, desktop-critical work.
  • ai.slice: vLLM and future AI workloads.
  • platform.slice: k3s and platform/background services.
  • Optional later: maintenance.slice if maintenance mode becomes real.

Service Layout

  • vllm.service: compute-only in v1, Slice=ai.slice, WantedBy=compute.target, persistent model/cache path outside the Nix store.
  • k3s.service: stable across modes in v1, Slice=platform.slice; mode differences are resource budgets/policy, not start/stop.
  • Hyprland/display stack: owned by normal graphical/session machinery; desktop.target should depend on it but not become a giant desktop controller.
  • Audio/PipeWire: normal desktop user services; studio-local only applies priority policy and blocks compute promotion when active audio is detected.

Control Plane Shape

Mode CLI

mode status
mode request <desktop|studio-local|compute>
mode reconcile [--target <mode>]
mode current [--refresh] [--json]
mode desired
mode dry-run <mode>
mode explain [<mode>]

Recommended additions after the first scaffold:

mode guards <target>
mode history
mode last-transition
mode doctor

mode request should be synchronous in v1: return success only after post-transition observation satisfies the target. Otherwise it should return non-zero and show the failed or blocking reason.
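
One way to get the synchronous behavior is a bounded polling loop around the observer. This is a minimal sketch, not the controller's implementation: the function name `wait_for_mode` and its argument order are assumptions, and it assumes the observer command prints the bare mode name on stdout.

```shell
# Hypothetical sketch: poll an observer until it reports the target mode,
# or give up after a bounded number of attempts.
wait_for_mode() {
  # usage: wait_for_mode <target> <max_attempts> <sleep_seconds> <observer command...>
  target="$1"; attempts="$2"; delay="$3"
  shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    observed="$("$@" 2>/dev/null)"
    if [ "$observed" = "$target" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  # Report why the request did not succeed, then return non-zero.
  echo "timeout: wanted $target, last observed ${observed:-unknown}" >&2
  return 1
}
```

A call such as `wait_for_mode compute 30 1 mode current --refresh` would then return success only once observation matches, and non-zero with a reason on stderr otherwise.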

Observer / Classifier

The observer should be conservative and evidence-first. It should inspect:

  • active graphical sessions via loginctl
  • compositor/display-manager state
  • compute.target and vllm.service
  • studio-local-policy.service
  • PipeWire/JACK/REAPER indicators
  • NVIDIA process/VRAM evidence where available
  • controller lock/transition marker
  • last failed transition marker

Output should support plain mode for scripts and JSON for status/debug:

{
  "observed_state": "desktop",
  "confidence": "high",
  "degraded": false,
  "signals": {
    "graphical_session_active": true,
    "compute_target_active": false,
    "vllm_active": false,
    "studio_policy_active": false
  },
  "conflicts": [],
  "timestamp": "..."
}

Classification rule: if signals conflict, report transitioning, degraded-*, or failed-transition; do not pretend the desired target was reached.
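
The rule can be sketched as a pure function over the four signals in the JSON example plus a transition marker. The function name `classify_mode` and the exact mapping of conflicting combinations to degraded-desktop, degraded-compute, or unknown are assumptions for illustration; the real thresholds still need to be decided.

```shell
# Hypothetical classifier: each argument is the literal string true or false.
classify_mode() {
  # usage: classify_mode <graphical> <compute_target> <vllm> <studio_policy> <transition_in_progress>
  g="$1"; c="$2"; v="$3"; s="$4"; t="$5"
  # A held transition marker wins over any steady-state reading.
  if [ "$t" = true ]; then echo transitioning; return; fi
  case "$g:$c:$v:$s" in
    true:false:false:false) echo desktop ;;
    true:false:false:true)  echo studio-local ;;
    true:false:true:*)      echo degraded-desktop ;;  # vLLM active under a desktop session
    false:true:*:false)     echo compute ;;           # vLLM may be disabled, so either value passes
    true:true:*:*)          echo degraded-compute ;;  # GUI still active alongside compute.target
    *)                      echo unknown ;;           # anything else is not claimed as a mode
  esac
}
```

Keeping the classifier free of systemctl calls makes it easy to cover with snapshot tests: the observer gathers evidence, and this step only interprets it.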

Guard Layout

  • Guards are standalone scripts or subcommands.
  • Exit codes:
    • 0: pass
    • 10-19: policy block
    • 20+: execution/check error
  • Each guard emits structured JSON or stable key/value output.
  • Guards should check one thing each.
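
A caller can translate the exit-code contract into an outcome label with a few comparisons. The sketch below uses a hypothetical function name, `guard_outcome`. The contract above does not define codes 1-9; treating them as check errors here is an assumption.

```shell
# Hypothetical helper: map a guard exit code to an outcome label.
guard_outcome() {
  # usage: guard_outcome <exit_code>
  code="$1"
  if [ "$code" -eq 0 ]; then
    echo pass
  elif [ "$code" -ge 10 ] && [ "$code" -le 19 ]; then
    echo policy-block
  else
    # Covers 20+ per the contract, and 1-9 by assumption.
    echo check-error
  fi
}
```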

Initial guard set:

  • check_audio_idle: REAPER/PipeWire/JACK activity blocks compute.
  • check_graphical_session_terminable: pre-action check before killing GUI.
  • check_gpu_display_released: post-action validation after GUI teardown.
  • check_vllm_drainable: compute -> desktop.
  • check_compute_capability_local: placement check.
  • check_studio_capability_local: blocks studio-local if externalized.
  • check_memory_headroom: avoids launching compute under obvious pressure.
  • check_persistence_paths_ready: model store/runtime paths exist and are writable.
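
As an illustration of the one-check-per-guard shape, check_persistence_paths_ready might look roughly like this. The specific codes 10 and 11 and the key/value output format are assumptions chosen to fit the exit-code contract, not the shipped guard.

```shell
# Sketch of a single-purpose guard: every given directory must exist and be writable.
check_persistence_paths_ready() {
  # usage: check_persistence_paths_ready <dir>...
  if [ "$#" -eq 0 ]; then
    echo "result=error reason=no-paths-given"
    return 20   # execution/check error: nothing to check
  fi
  for p in "$@"; do
    if [ ! -d "$p" ]; then
      echo "result=block reason=missing path=$p"
      return 10   # policy block (code chosen for this sketch)
    fi
    if [ ! -w "$p" ]; then
      echo "result=block reason=not-writable path=$p"
      return 11   # policy block (code chosen for this sketch)
    fi
  done
  echo "result=pass"
  return 0
}
```

For example, `check_persistence_paths_ready /var/lib/dubnium/models` would block compute promotion when the model store is absent.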

First Milestone

The smallest bootable milestone should be narrower than “all modes implemented.”

Goal: boot the flake-managed workstation into desktop, expose the control plane, and prove an observable/auditable desktop baseline before deep workload switching.

  1. Generate real hardware config into hosts/workstation/hardware-configuration.nix.

  2. Confirm host options:

    • dubnium.boot.defaultMode = "desktop"
    • dubnium.hardware.presentGpus
    • dubnium.hardware.displayGpu
    • dubnium.hardware.computeGpus
    • vLLM disabled or compute-only
    • studio placement set to local only if local audio is still intended
  3. Build without switching:

    sudo nixos-rebuild build --flake .#workstation
    
  4. Switch only after evaluation succeeds:

    sudo nixos-rebuild switch --flake .#workstation
    
  5. Verify boot/control-plane files:

    mode status
    mode current
    mode desired
    sudo ls -la /run/mode-controller
    
  6. Verify systemd skeleton:

    systemctl status desktop.target
    systemctl status compute.target
    systemctl status studio-local-policy.service
    systemctl status audio-priority.service
    systemctl status vllm.service
    
  7. Prove observer honesty:

    • In desktop, mode current should say desktop.
    • vllm.service should be inactive.
    • studio-local-policy.service should be inactive unless requested.
    • If evidence conflicts, status should show conflict/degraded/failed rather than silently reporting success.
  8. Test the safe overlay first:

    sudo mode request studio-local
    mode status
    sudo mode request desktop
    mode status
    
  9. Only then test desktop -> compute with vLLM either disabled, stubbed, or known-good:

    sudo mode request compute
    mode status
    sudo mode request desktop
    mode status
    
  10. Milestone success criteria:

    • The machine boots from the flake.
    • mode status/current/desired work.
    • Desired/current separation is visible.
    • The controller lock prevents concurrent transitions.
    • Guard failures are reported distinctly from execution errors.
    • desktop -> studio-local -> desktop works as an overlay.
    • desktop -> compute -> desktop either works or fails with a clear guard/action/post-observation reason.
    • No failed transition is reported as a successful target mode.

The next milestone after that should be a real desktop <-> compute control loop with vLLM active, structured audit records, rollback to desktop, and explicit degraded-compute thresholds.
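
The lock criterion in the first milestone can be met with an atomic directory create. This is a sketch only: the function names and the idea of a lock directory (for example somewhere under /run/mode-controller) are assumptions, and stale-lock detection is deliberately left out.

```shell
# Hypothetical lock helpers built on mkdir, which either creates the
# directory or fails, so only one caller can win.
acquire_lock() {
  # usage: acquire_lock <lock_dir>
  if mkdir "$1" 2>/dev/null; then
    echo "$$" > "$1/pid"   # record the holder for later diagnosis
    return 0
  fi
  return 1
}

release_lock() {
  # usage: release_lock <lock_dir>
  rm -f "$1/pid" && rmdir "$1"
}
```

A second `acquire_lock` on the same path fails until `release_lock` runs, which is the behavior the concurrent-transition check needs to demonstrate.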

System Implementation Plan

Status: living plan

This plan is for implementing Dubnium on the actual workstation host. It expands the short bring-up checklist into a cautious, evidence-driven rollout. The goal is not to turn everything on at once. The goal is to prove one layer at a time: hardware facts, Nix evaluation, boot baseline, observer honesty, overlay mode, compute mode, rollback, then hardening.

Current V1 Assumptions

These assumptions come from the current repo configuration and should be confirmed before the first live switch:

| Area | Current assumption |
| --- | --- |
| Host flake target | .#workstation |
| Hostname | dubnium-workstation |
| Boot default | desktop |
| Studio placement | local |
| studio-local representation | desktop overlay using studio-local-policy.service and audio-priority.service |
| vLLM lifecycle | compute-only in v1 |
| vLLM model | Qwen/Qwen2.5-Coder-14B-Instruct |
| Current GPU phase | planned 2 GPUs, currently present [ 0 ] |
| Display GPU | 0 |
| Compute GPUs | [ 0 ] until second GPU is present |
| k3s | disabled in current host config |
| Bootloader | systemd-boot with EFI variable access |
| Runtime state | /run/mode-controller |

Do not proceed to live transition testing until the hardware facts are confirmed against the actual host.

Phase 0: Safety and Ground Truth

Objective: know enough about the machine to avoid destructive or confusing changes.

0.1 Confirm Installation Path

Decide which path applies:

  • existing NixOS machine: use nixos-rebuild build then switch
  • fresh install from live USB: use the fresh install runbook first
  • non-NixOS current OS: do not use this plan directly until disk/install strategy is decided

Exit criteria:

  • install path is explicit
  • target disk and boot mode are known if fresh installing
  • rollback access path is known

0.2 Confirm Remote/Recovery Access

Before switching system configuration:

ip addr
systemctl status sshd

Confirm:

  • local keyboard/display access works
  • SSH is enabled or a local console is available
  • you know how to select an older NixOS generation at boot
  • important local data is backed up

Failure mode to avoid:

  • switching into a broken graphical/session state with no recovery path

0.3 Capture Hardware Facts

Run on the target host:

lspci -nn | grep -E 'VGA|3D|Audio|USB'
nvidia-smi
lsblk -f
findmnt
bootctl status

Record:

  • actual GPU count
  • which GPU drives display
  • GPU PCI IDs
  • NVIDIA driver visibility through nvidia-smi
  • boot disk/filesystem layout
  • EFI/systemd-boot status
  • audio interface and whether REAPER/local studio is still needed on-host

Exit criteria:

  • dubnium.hardware.presentGpus matches real visible GPUs
  • dubnium.hardware.displayGpu matches the display path
  • dubnium.hardware.computeGpus only references present GPUs
  • bootloader assumptions match the host
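
The GPU exit criteria are simple set-membership checks, and can be rehearsed by hand before editing the host config. The function below is a hypothetical helper, not part of mode-tools; it takes GPU indices as space-separated strings.

```shell
# Hypothetical pre-check: display and compute GPUs must be among the present GPUs.
gpu_facts_consistent() {
  # usage: gpu_facts_consistent "<present ids>" <display id> "<compute ids>"
  present=" $1 "; display="$2"; compute="$3"
  case "$present" in
    *" $display "*) ;;
    *) echo "display GPU $display is not present"; return 1 ;;
  esac
  # Word splitting on $compute is intentional: one iteration per index.
  for id in $compute; do
    case "$present" in
      *" $id "*) ;;
      *) echo "compute GPU $id is not present"; return 1 ;;
    esac
  done
  echo ok
}
```

With one GPU installed, `gpu_facts_consistent "0" 0 "0"` passes, while listing the planned second GPU as a compute GPU fails.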

0.4 Decide First Compute Profile

For first live validation, choose the least surprising compute profile:

  • with one GPU: compute may terminate the desktop and use GPU 0
  • with two GPUs: compute can target both GPUs, but only after single-GPU behavior is proven
  • vLLM should stay compute-only

If VRAM is tight, add vLLM guardrails before compute testing:

dubnium.vllm.extraArgs = [
  "--max-model-len" "8192"
  "--gpu-memory-utilization" "0.70"
  "--enforce-eager"
];

Do not add desktop AI in the first rollout.

0.5 Seed Local Model Bundle

Preferred path:

  • copy the selected materialized model bundle from the Dubnium USB seed into /var/lib/dubnium/models
  • keep model weights out of Git and out of the Nix store

See docs/runbooks/model-seeding.md for the exact operator flow.

Phase 1: Repo and Host Configuration Review

Objective: make the flake match the real system before any switch.

1.1 Generate Hardware Configuration

On the target NixOS machine:

sudo nixos-generate-config --dir ./hosts/workstation

Review:

  • root filesystem and boot filesystem entries
  • EFI mount point
  • generated hardware imports
  • NVIDIA-related hardware detection

Do not preserve the placeholder hardware file if it does not match the target.

1.2 Review Host Config

Inspect:

sed -n '1,220p' hosts/workstation/default.nix

Confirm or update:

  • networking.hostName
  • bootloader settings
  • services.openssh.enable
  • dubnium.capabilityPlacement.studio
  • dubnium.vllm.enable
  • dubnium.vllm.model
  • dubnium.vllm.extraArgs
  • dubnium.hardware.presentGpus
  • dubnium.hardware.displayGpu
  • dubnium.hardware.computeGpus
  • dubnium.k3s.enable

Recommended first-system stance:

  • keep boot.defaultMode = "desktop"
  • keep enableDesktopProfile = false
  • keep k3s.enable = false until mode control is proven
  • keep computeGpus = [ 0 ] if only one GPU is currently installed

1.3 Confirm Module Assertions

The module already asserts:

  • display GPU must be present
  • desktop AI GPUs must be present
  • compute GPUs must be present
  • vLLM package and model must be set when vLLM is enabled

These assertions are useful. If they fail, fix the host facts rather than bypassing them.

Exit criteria:

  • host config expresses real hardware, not planned hardware
  • planned hardware is represented only in plannedGpuCount
  • actual services enabled match the first rollout scope

Phase 2: Build Without Switching

Objective: prove Nix evaluation and build before mutating the live system.

Run:

sudo nixos-rebuild build --flake .#workstation

If it fails, classify the failure:

  • hardware config mismatch
  • unfree/NVIDIA package issue
  • vLLM package evaluation issue
  • missing module import
  • syntax or option error

Do not run switch until build succeeds.

Useful follow-up checks:

nix flake check
nix build .#packages.x86_64-linux.mode-tools

Exit criteria:

  • flake builds successfully
  • mode-tools package builds
  • no host option assertion is failing

Phase 3: First Switch to Desktop Baseline

Objective: switch only into the safe desktop-default posture.

Run:

sudo nixos-rebuild switch --flake .#workstation

Immediately check:

hostname
mode status
mode current
mode desired
sudo ls -la /run/mode-controller
systemctl status desktop.target
systemctl status compute.target
systemctl status studio-local-policy.service
systemctl status audio-priority.service
systemctl status vllm.service

Expected:

  • host boots or remains usable
  • desired mode is desktop
  • current mode is desktop, or a clearly explained non-desktop state
  • vllm.service is inactive in desktop
  • studio-local-policy.service is inactive
  • audio-priority.service is inactive
  • /run/mode-controller exists

If mode current reports compute or studio-local unexpectedly, stop and fix observation before testing transitions.

Exit criteria:

  • desktop baseline is usable
  • mode CLI works
  • observer output matches visible reality

Phase 4: Control-Plane Inspection Before Transitions

Objective: prove the controller can explain the system before it mutates the system.

Run:

mode status
mode current --refresh
mode current --json
mode explain desktop
mode explain studio-local
mode explain compute
sudo cat /run/mode-controller/capability-placement.json
sudo cat /run/mode-controller/hardware-topology.json

Check that the JSON/evidence shape is useful enough to diagnose:

  • graphical session active or not
  • studio policy active or not
  • compute target active or not
  • vLLM active or not
  • last transition status

If mode current --json is too thin, harden observer output before running compute transitions. The observer is the foundation of safe switching.

Exit criteria:

  • status output distinguishes desired and current
  • current state is derived from facts
  • hardware and placement files match host configuration
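
A quick way to judge whether the JSON is "useful enough" is to check that the expected signal keys are present at all. The sketch below is a crude grep-based presence check with a hypothetical function name; it is not a JSON parser and does not validate values.

```shell
# Hypothetical check: report the first expected key missing from saved observer output.
json_has_signals() {
  # usage: json_has_signals <file> <key>...
  file="$1"; shift
  for key in "$@"; do
    # Matches "key" followed by a colon; presence only, not structure.
    if ! grep -q "\"$key\"[[:space:]]*:" "$file"; then
      echo "missing signal: $key"
      return 1
    fi
  done
  echo ok
}
```

For example, save `mode current --json` to a file and pass the keys you expect. A key for last transition status would need a name first; the example schema in this plan does not define one yet.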

Phase 5: Test desktop -> studio-local -> desktop

Objective: prove the low-risk overlay path before terminating the GUI for compute.

Run:

sudo mode request studio-local
mode status
systemctl status studio-local-policy.service
systemctl status audio-priority.service
systemctl show interactive.slice -p CPUWeight -p IOWeight
systemctl show ai.slice -p CPUWeight -p IOWeight
systemctl show platform.slice -p CPUWeight -p IOWeight

Expected:

  • observed mode becomes studio-local
  • studio-local-policy.service is active
  • audio-priority.service is active
  • interactive slice weights are raised
  • AI/platform slice weights are lowered
  • vLLM remains inactive

Return to desktop:

sudo mode request desktop
mode status
systemctl status studio-local-policy.service
systemctl status audio-priority.service
systemctl show interactive.slice -p CPUWeight -p IOWeight
systemctl show ai.slice -p CPUWeight -p IOWeight
systemctl show platform.slice -p CPUWeight -p IOWeight

Expected:

  • observed mode becomes desktop
  • overlay services are inactive
  • slice weights return to baseline

Exit criteria:

  • overlay activation and cleanup are repeatable
  • observer accurately distinguishes desktop and studio-local
  • failure records are useful if a command fails
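
To make "raised", "lowered", and "return to baseline" checkable rather than eyeballed, the `systemctl show` output can be captured before and after each request. The helper names below are assumptions for illustration. The comparison expects numeric weights, so an unset property (which systemd prints as a non-numeric placeholder) makes it fail rather than pass.

```shell
# Hypothetical helpers for comparing slice weights across a transition.
weight_of() {
  # usage: systemctl show interactive.slice -p CPUWeight -p IOWeight | weight_of CPUWeight
  # Prints the value of the named property from KEY=VALUE lines on stdin.
  sed -n "s/^$1=//p"
}

weight_changed_as_expected() {
  # usage: weight_changed_as_expected <raised|lowered|same> <before> <after>
  case "$1" in
    raised)  [ "$3" -gt "$2" ] ;;
    lowered) [ "$3" -lt "$2" ] ;;
    same)    [ "$3" -eq "$2" ] ;;
    *)       return 2 ;;
  esac
}
```

Recording the baseline once in desktop, then asserting `raised` for interactive.slice in studio-local and `same` after returning, turns the repeatability criterion into a pass/fail check.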

Phase 6: Pre-Compute Guard Validation

Objective: test compute guards without trusting the full transition yet.

Before running a real compute transition:

mode status
systemctl status vllm.service
loginctl list-sessions

Manually confirm:

  • no active REAPER project
  • no live audio session you care about
  • no long-running foreground job
  • model store path has enough space
  • vLLM model choice fits current GPU memory plan

Run or inspect the guards if they are exposed through the CLI. If they are not yet exposed, use the existing transition path cautiously and rely on /run/mode-controller/last-guards.json.

Compute should be blocked when:

  • audio is active
  • graphical session is not terminable
  • memory headroom is insufficient
  • target is not reachable
  • required persistence paths are missing

Exit criteria:

  • you know which guards are hard blocks
  • guard failures are visible in last-guards.json
  • no guard silently assumes success
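
A guard that never assumes success has to treat missing evidence as an error, not a pass. The memory headroom check below sketches that shape. The required amount, the codes 10 and 20, and the optional file argument (useful for testing) are assumptions; it reads the MemAvailable field of /proc/meminfo.

```shell
# Sketch of check_memory_headroom: block when MemAvailable is below a required amount.
check_memory_headroom() {
  # usage: check_memory_headroom <required_kib> [meminfo_file]
  required="$1"; meminfo="${2:-/proc/meminfo}"
  avail="$(awk '$1 == "MemAvailable:" { print $2 }' "$meminfo" 2>/dev/null)"
  if [ -z "$avail" ]; then
    # No evidence is an error, never a silent pass.
    echo "result=error reason=no-MemAvailable"
    return 20
  fi
  if [ "$avail" -lt "$required" ]; then
    echo "result=block available_kib=$avail required_kib=$required"
    return 10
  fi
  echo "result=pass available_kib=$avail required_kib=$required"
}
```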

Phase 7: First desktop -> compute Transition

Objective: prove one real promotion into compute, accepting that the first attempt may reveal NVIDIA/session behavior.

Preconditions:

  • desktop baseline has already been verified
  • studio overlay path has already been verified
  • no critical local work is running
  • local console or SSH recovery is available

Run:

sudo mode request compute

Then inspect:

mode status
systemctl status compute.target
systemctl status vllm.service
loginctl list-sessions
nvidia-smi
sudo cat /run/mode-controller/last-transition.json
sudo cat /run/mode-controller/last-guards.json
journalctl -u 'mode-controller@*' -b
journalctl -u vllm.service -b

Expected success:

  • observed mode is compute
  • graphical session is absent or non-authoritative
  • compute.target is active
  • vllm.service is active if enabled
  • GPU process evidence matches compute expectations
  • transition record says success

Acceptable first degraded outcomes:

  • vLLM starts, but only on a reduced GPU profile
  • residual display allocation remains below a documented threshold
  • non-critical desktop unit remains active without resource conflict

Hard failures:

  • observer cannot classify final state
  • audio or GUI conflict remains
  • GPU release is indeterminate
  • vLLM fails repeatedly, so the compute contract cannot be met
  • rollback cannot restore desktop

If the transition fails, do not keep retrying blindly. Diagnose the first failed predicate.
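
The difference between an acceptable residual allocation and an indeterminate release can be written down as a three-way predicate. The sketch below is an assumption about shape only: the function name, the state labels, and the exit codes are hypothetical, and how the residual MiB figure is measured is left open.

```shell
# Hypothetical GPU release predicate over a measured residual display allocation.
gpu_release_state() {
  # usage: gpu_release_state <residual_mib|unknown> <threshold_mib>
  residual="$1"; threshold="$2"
  case "$residual" in
    # Empty or non-numeric evidence is indeterminate, never treated as released.
    ''|*[!0-9]*) echo indeterminate; return 20 ;;
  esac
  if [ "$residual" -eq 0 ]; then
    echo released
  elif [ "$residual" -le "$threshold" ]; then
    echo residual-within-threshold
  else
    echo not-released
    return 10
  fi
}
```

Under this sketch, only `indeterminate` and `not-released` would count as failed predicates; a residual within the documented threshold maps to the acceptable degraded outcome.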

Phase 8: First compute -> desktop Return

Objective: prove rollback/restoration before treating compute as usable.

Run:

sudo mode request desktop

Then inspect:

mode status
systemctl status desktop.target
systemctl status vllm.service
loginctl list-sessions
nvidia-smi

Expected:

  • observed mode is desktop
  • vllm.service is inactive
  • graphical session path is usable
  • audio returns to ordinary desktop behavior
  • no compute-only state remains authoritative

If desktop is only partially restored, classify the result as degraded and fix the observer/controller before more compute testing.

Exit criteria:

  • one complete desktop -> compute -> desktop loop works or fails with a clear documented reason
  • rollback is evidence-backed

Phase 9: Repeatability and Soak

Objective: distinguish a one-time success from a reliable operating model.

Repeat:

sudo mode request studio-local
sudo mode request desktop
sudo mode request compute
sudo mode request desktop

For each run, record:

  • final mode status
  • transition duration
  • guard output
  • whether GPU release was clean
  • whether desktop restoration was clean
  • whether vLLM startup was reliable

Minimum repeatability bar before broader usage:

  • 3 clean studio overlay round trips
  • 3 clean compute round trips
  • no false-success observer classifications
  • no unexplained stale locks
  • no manual cleanup needed between runs
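
If each round trip is logged as one line, the bar can be checked mechanically. The log format here ("studio clean", "compute failed", one result per line) and the function name are assumptions for this sketch. It also assumes that any non-clean entry fails the bar, which is stricter than simply counting clean runs.

```shell
# Hypothetical check of the repeatability bar over a hand-kept results file.
repeatability_met() {
  # usage: repeatability_met <results_file> [min_clean]
  # each line: "<loop> <clean|failed>", where loop is studio or compute
  file="$1"; min="${2:-3}"
  awk -v min="$min" '
    NF == 0       { next }          # ignore blank lines
    $2 == "clean" { clean[$1]++ }
    $2 != "clean" { bad++ }
    END { exit !(clean["studio"] >= min && clean["compute"] >= min && bad == 0) }
  ' "$file"
}
```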

Phase 10: Hardening Backlog

Only after the first transition loop is proven, prioritize hardening in this order:

  1. Richer observe-current --json evidence and conflicts.
  2. Persistent audit log at /var/lib/mode-controller/events.jsonl.
  3. Explicit GPU release predicate and thresholds.
  4. Degraded state classification for desktop and compute.
  5. Guard CLI surface such as mode guards <target>.
  6. vLLM runtime guardrails and model store persistence.
  7. k3s enablement and platform.slice policy.
  8. Optional impermanence and /persist mapping.
  9. Bounded desktop AI after second GPU and stable transitions.
  10. Specialisation evaluation only if runtime switching fails repeatedly.

Stop Conditions

Stop implementation and return to planning if any of these occur:

  • the observer reports false success
  • desktop cannot be restored through the controller
  • GPU release is repeatedly indeterminate
  • target isolation stops recovery-critical services
  • vLLM causes repeated OOM or driver instability
  • failures require undocumented manual cleanup

The correct response to any stop condition is not more automation. First improve observation, logs, predicates, and rollback.

Evidence to Keep

For each major milestone, keep the following:

mode status
mode current --json
sudo cat /run/mode-controller/last-transition.json
sudo cat /run/mode-controller/last-guards.json
systemctl status desktop.target compute.target vllm.service
nvidia-smi
journalctl -u 'mode-controller@*' -b

For repeated failures, copy the relevant evidence into an issue, planning note, or future runbook update before changing more code.

External Sources

Dotfiles

Dubnium uses the external/dotfiles checkout for user-level Home Manager configuration.

The submodule contract is declared in .gitmodules.

ADR-0001: Runtime Switching First

Status: accepted

Context

Dubnium needs to move between interactive desktop behavior and headless compute behavior. NixOS specialisations may eventually provide stronger separation, but they require reboot-mediated workflows and would slow down early validation.

Decision

Use runtime switching first. Implement mode changes through a local reconciliation loop using systemd targets, services, slices, guards, and post-action observation.

Do not introduce NixOS specialisations in v1.

Consequences

  • Rebootless switching can be validated early.
  • The observer and guard layer must be conservative.
  • GPU release reliability becomes a live risk.
  • Specialisations remain an escalation path if runtime switching proves too brittle.

Escalation Criteria

Reconsider specialisations or reboot-mediated compute if:

  • display GPU release remains unreliable after bounded iteration
  • compute promotion frequently lands in degraded or ambiguous states
  • desktop restoration is unreliable
  • kernel/module settings diverge materially between modes

ADR-0002: Studio-Local Is a Desktop Overlay

Status: accepted

Context

The host may support local low-latency audio work, but studio capability may move to an external Mac mini or another host. The architecture must not overfit around local studio behavior.

Decision

Represent studio-local as a policy overlay on desktop in v1.

Use:

  • studio-local-policy.service
  • audio-priority.service

Do not create a first-class studio-local.target in v1.

Consequences

  • The host-local state model remains coherent if studio capability moves away.
  • Studio policy can be applied and removed without a separate top-level target.
  • The observer still reports studio-local as a mode when overlay predicates are satisfied.
  • Any direct studio-local -> compute path should be routed through desktop policy unless a future transition contract explicitly permits it.

ADR-0003: vLLM Is Compute-Only in V1

Status: accepted

Context

Desktop-mode AI is possible in the target architecture, especially when a second GPU is installed. For the first reliable control-loop milestone, desktop AI adds resource contention and observer complexity.

Decision

Keep vLLM compute-only in v1.

Use one vllm.service attached to compute behavior. Shape options and controller actions so vllm@compute.service and a future bounded desktop profile can be added later.

Consequences

  • desktop and studio-local should leave vLLM inactive.
  • compute owns vLLM activation.
  • The first milestone can focus on mode transitions and observation.
  • Bounded desktop AI is deferred until desktop <-> compute switching is reliable on real hardware.

ADR-0004: Boot Defaults to Desktop

Status: accepted

Context

The control-plane specification asks whether the system should replay the last desired mode after reboot or normalize to a safe default. Replaying compute after reboot could surprise the operator and re-enter a throughput posture without current evidence.

Decision

In v1, boot normalizes to desktop.

Do not replay the last desired mode across reboot.

Consequences

  • First boot behavior is predictable and operator-friendly.
  • /run/mode-controller can remain ephemeral for live state.
  • Persistent desired replay can be revisited after transition behavior and audit history are proven.

ADR-0005: k3s Stays Stable Across Modes in V1

Status: accepted

Context

k3s provides platform/control-node duties. Starting and stopping it during every mode transition would add operational churn before there is evidence that it is needed.

Decision

Keep k3s.service stable across desktop, studio-local, and compute in v1.

Express mode differences through platform.slice budgets first. Defer labels, taints, workload intensity policies, and service lifecycle changes until a real platform workload requires them.

Consequences

  • Mode switching has fewer moving parts.
  • k3s remains available during desktop and compute operation.
  • Platform pressure must be bounded through slice policy until richer k3s mode behavior is justified.

ADR-0006: Tailscale Platform Connectivity

Status: accepted

Context

Dubnium needs stable remote reachability for the workstation without moving user-level shell, editor, or agent configuration into the system repository. Tailscale is machine and network identity, so it belongs with Dubnium’s platform policy rather than dotfiles.

Tailscale can also provide subnet routing, exit-node behavior, automatic enrollment, and Tailscale SSH. Those features change routing, firewalling, access control, and trust boundaries, so they should not be enabled as an incidental side effect of installing the client daemon.

Decision

Enable Tailscale as workstation-only platform connectivity in v1.

Dubnium will enable tailscaled and the tailscale CLI on the workstation, but node enrollment remains manual with sudo tailscale up.

Do not enable auth-key or OAuth enrollment, subnet routing, exit-node behavior, or Tailscale SSH in v1. Document those as future options that require explicit routing, ACL, firewall, and secrets-policy review.

Consequences

  • The workstation can join the tailnet with a small, reviewable system change.
  • Dotfiles remains responsible for user-level tooling only.
  • First enrollment is an operator action instead of a rebuild side effect.
  • Future subnet router, exit-node, and Tailscale SSH support has a documented path without widening v1 network exposure.

ADR-0007: WSL Is a Headless Validation Target

Status: accepted

Context

Dubnium needs a fast way to validate shared flake composition, module wiring, and mode-controller behavior before every change has to run on the bare-metal workstation target.

WSL is useful for that loop, but it is not equivalent to the real workstation. It does not validate EFI, bootloader behavior, workstation hardware generation, Hyprland, audio/studio behavior, NVIDIA runtime details, or final GPU topology.

The upstream nix-community/NixOS-WSL project already owns the WSL-specific boot and integration layer. Reimplementing that locally would create another platform surface for Dubnium to maintain before there is evidence that it is needed.

Decision

Keep wsl as a first-class flake host target for headless validation, built on top of nix-community/NixOS-WSL.

Use .#wsl to validate shared Dubnium composition and headless services inside an existing NixOS-WSL distro. Set its default Dubnium mode to compute, enable the shared mode controller, and keep resource-heavy services such as vllm and k3s disabled by default. Enable those services intentionally when the task is specifically to validate their WSL runtime behavior.

Do not treat .#wsl as a replacement for .#workstation, the bare-metal install path, or workstation hardware validation.

Consequences

  • Shared module wiring and activation can be exercised from a faster Windows/WSL loop.
  • WSL-specific platform support stays delegated to the upstream NixOS-WSL module.
  • The WSL target remains intentionally headless, compute-biased, and lightweight.
  • Passing WSL validation does not prove workstation graphics, audio, bootloader, EFI, NVIDIA, or final GPU behavior.
  • WSL runbooks and checks must stay separate from bare-metal first-bring-up and fresh-install procedures.

Escalation Criteria

Reconsider the WSL target shape if:

  • upstream NixOS-WSL no longer supports the required system integration points
  • WSL behavior diverges enough from Dubnium’s shared module graph to make the target misleading
  • bare-metal validation becomes cheap and reliable enough that a separate WSL target no longer reduces risk or cycle time

ADR-0008: Seed Local vLLM Model Bundles

Status: accepted

Context

Dubnium’s first compute workload uses vLLM with a locally served model bundle. The exact model is host configuration, not part of the USB seed format.

Model weights are large mutable runtime artifacts. Keeping them in Git would inflate the repository and blur source policy with runtime state. Keeping them in the Nix store would make first install, rebuild, and recovery depend on large model fetches during system activation and would couple model bytes to immutable system generations.

Fresh install and recovery should work even when the machine does not yet have reliable network access. The seed format should not depend on Hugging Face hub cache internals such as refs, blobs, snapshots, or symlinks.

Decision

Keep model weights out of Git and out of the Nix store.

Treat /var/lib/dubnium/models as the Dubnium-owned runtime model store. Seed normal local model bundle directories from removable media as the preferred v1 provisioning path.

Use a materialized bundle directory for the selected compute model. The workstation vLLM service serves a path under:

/var/lib/dubnium/models

If a Hugging Face cache is used as the source of the seed, materialize the snapshot once before putting it on the USB. The runtime seed and installed model store should be ordinary directories with model files and SHA256SUMS.
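
Because the bundle is an ordinary directory with a SHA256SUMS file, verification needs nothing beyond coreutils. The sketch below assumes SHA256SUMS lists paths relative to the bundle directory; the function name, output format, and exit codes are hypothetical.

```shell
# Hypothetical bundle verification using the SHA256SUMS file shipped with the bundle.
verify_bundle() {
  # usage: verify_bundle <bundle_dir>
  dir="$1"
  if [ ! -f "$dir/SHA256SUMS" ]; then
    echo "result=block reason=missing-SHA256SUMS"
    return 10
  fi
  # sha256sum resolves listed paths relative to the current directory,
  # so run the check from inside the bundle (in a subshell).
  if (cd "$dir" && sha256sum -c SHA256SUMS) >/dev/null 2>&1; then
    echo "result=pass"
  else
    echo "result=block reason=checksum-mismatch"
    return 11
  fi
}
```

Running this against the seeded directory under /var/lib/dubnium/models before the first compute request catches an incomplete copy earlier than a vLLM startup failure would.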

Consequences

  • The Dubnium repository stays small and source-only.
  • Nix continues to own service policy and runtime configuration, not model artifact storage.
  • Fresh install and recovery can avoid depending on a large network download.
  • Runtime no longer depends on Hugging Face cache layout or symlink behavior.
  • Operators must manage the seed media and verify the local bundle before entering compute mode.
  • Reproducibility of model bytes depends on the seed contents until a specific model revision is selected and recorded.
  • vLLM startup failures may indicate an absent, incomplete, misplaced, or revision-mismatched local model bundle.

Escalation Criteria

Reconsider this policy if:

  • model revision pinning becomes mandatory for reproducible evaluation
  • a dedicated artifact mirror or cache service becomes available
  • install-time network access becomes reliable enough to remove the USB seed path
  • model storage needs to support multiple served models, quantized variants, or per-mode model selection

ADR-0009: Manage Runtime Secrets Outside Nix Source

Status: accepted

Context

Dubnium needs private material for several different lifetimes:

  • local source payloads for installing this private repository
  • host-local identities for services such as Tailscale
  • runtime tokens for workloads such as vLLM model downloads
  • user-runtime tokens for tools such as Codex and GitHub CLIs after install
  • large private or mutable artifacts such as model weights

These are not the same class of data. Treating all of them as Nix source would either leak secrets into Git, copy secret bytes into the Nix store, or make activation depend on external state that belongs to the operator.

Existing Dubnium policy already keeps the repository source-only and keeps vLLM model weights in the runtime model store (see ADR-0008). Installer bootstrap should use local source payloads, such as a git archive tarball or copied working tree, rather than GitHub credentials in the live installer.

Decision

Use sops-nix with age recipients as the preferred provider for runtime service secrets.

Commit only encrypted SOPS documents and non-secret policy. Decrypt secrets at activation into runtime paths under /run/secrets or into sops-nix generated environment files. Services consume those paths; Nix modules declare the consumer contract, not the secret value.

Keep install source bootstrap separate from runtime secrets. Install media should use a local source payload prepared before booting the target machine. Do not require GitHub credentials during install.

Allow user-runtime secrets after install. Tools such as Codex may need an OPENAI_API_KEY, and user workflows may later need a GITHUB_TOKEN. Those tokens belong to the user runtime, not installer bootstrap, and should be decrypted by Home Manager or another user-scoped secret mechanism at session or process launch time.

Keep host enrollment identities separate from ordinary workload tokens. Tailscale remains manually enrolled for v1. If unattended enrollment is added later, it must use a short-lived auth key passed once during enrollment rather than a long-lived key committed to source.

Keep model weights out of Git, out of SOPS, and out of the Nix store. The Dubnium model store under /var/lib/dubnium/models remains mutable runtime state, not secret state.

Consequences

  • The repository can contain secret wiring without containing secret values.
  • Host rebuilds can declare which services need secrets without exposing those secrets in derivations or module options.
  • Operators must manage age identities and encrypted SOPS files during bring-up.
  • Secret rotation is done by updating encrypted SOPS data and rebuilding or restarting affected services.
  • Source bootstrap, enrollment, runtime tokens, and model artifacts keep separate handling rules instead of sharing one overloaded mechanism.

Escalation Criteria

Reconsider this policy if:

  • Dubnium gains a dedicated external secret manager
  • unattended installation needs to handle many machines at once
  • secret rotation needs central audit or approval workflows
  • Kubernetes-hosted workloads become the primary secret consumers

ADR-0010: Keep Persistent Memory Separate From vLLM Runtime

Status: accepted

Context

Dubnium is evolving from a local vLLM compute node toward longer-lived conversational and agentic workflows. Those workflows need durable recall, replayability, externally observable metadata, lifecycle hooks, and scoped retrieval.

vLLM is already the inference runtime for Dubnium’s compute mode. It is built to serve tokens with batching, prefix caching, streaming, model lifecycle control, and GPU-aware scheduling. It is not the right owner for durable user memory, agent task state, retention policy, or governance metadata.

The target hardware is constrained. Dual 12 GB RTX 3060 GPUs leave limited room for oversized context windows, high concurrency, and unnecessary KV-cache pressure. Treating persistent memory as “keep all context in the model” would make latency, reliability, and recovery worse.

Persistent memory also changes the security posture. Model output, retrieved documents, tool results, artifacts, and prior conversation summaries are all untrusted inputs when they cross a new session boundary. Without structured metadata and lifecycle events, a future governance layer cannot inspect, constrain, attest, or replay memory behavior.

Decision

Keep vLLM as the inference runtime only.

Build persistent memory as a separate subsystem owned by orchestration, retrieval, storage, summarization, and compaction layers. Orchestrators assemble prompts from working context, retrieved memories, task state, and artifact references before calling vLLM.

Keep the future governance layer external to the memory/runtime architecture. Dubnium memory/runtime should expose structured records, metadata, and lifecycle hooks for governance to inspect later, but vLLM, vector stores, artifact stores, and MemGPT-style runtimes should not depend directly on that future substrate.

Do not persist transformer KV state as the durable memory mechanism. KV cache state can remain an inference optimization inside vLLM, but durable memory must be replayable from stored events, summaries, artifacts, metadata, and retrieval records.

Use separate memory classes:

  • working context for current session continuity
  • episodic memory for meaningful historical interactions
  • semantic memory for normalized stable facts and conventions
  • task state for active workflows, checkpoints, and execution graphs
  • artifacts for external files, logs, generated outputs, and large payloads
  • metadata for provenance, trust hints, retention hints, sensitivity hints, and scope

The first implementation milestone should use a conservative local stack:

  • Postgres for structured memory, sessions, tasks, artifacts, and provenance
  • pgvector for local vector search
  • Redis for transient working context and queues where useful
  • a small embedding model such as bge-small or nomic-embed
  • rolling summaries instead of transcript replay
  • scoped retrieval before prompt assembly

Treat MemGPT-style self-editing memory as a later orchestration upgrade path, not the first storage substrate. The current maintained framework from that lineage is Letta; evaluate it after Dubnium has stable local memory storage, retrieval filters, redaction, provenance, and replay checks. If adopted, it should sit above the persistent memory subsystem and vLLM runtime instead of replacing Dubnium’s metadata, lifecycle hooks, or runtime-secret boundaries.

Boundaries

The inference layer owns token generation, batching, streaming, prefix caching, model startup, GPU assignment, and service health.

The memory subsystem owns storage, retrieval, summarization, embedding, compaction, artifact references, provenance records, and replay inputs.

The orchestration layer owns prompt assembly, scoped retrieval requests, tool coordination, and task workflow progression.

The future governance layer is adjacent. It may later evaluate policy, provenance, trust, retention, audit, and replay concerns by inspecting the structured records emitted by the memory and orchestration layers, but it is not embedded in the vLLM runtime, memory database, vector store, artifact store, or MemGPT-style runtime.

Security Model

Assume all inputs are untrusted, including model output and retrieved memories.

Trust boundaries include:

  • user and agent prompts entering the orchestrator
  • model output entering summarization or memory extraction
  • tool output entering task state or memory storage
  • external documents entering retrieval indexes
  • retrieved memory entering prompt assembly
  • retrieval metadata controlling visibility and retention

Durable memory objects must carry enough metadata to support later policy and audit decisions:

  • source identity
  • provenance
  • validation status or validation hints
  • trust score
  • sensitivity classification
  • retention hint or TTL
  • namespace or project scope
  • agent boundary
  • replay lineage
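One way to picture the metadata requirement is a presence check that runs before a record is stored. The sketch below is illustrative only: the field spellings, the key=value record format, and the helper name missing_memory_fields are all assumptions, not a Dubnium schema.

```sh
# Sketch only: field names are hypothetical spellings of the metadata list above.
REQUIRED_MEMORY_FIELDS="source_identity provenance validation_status trust_score sensitivity retention_ttl namespace agent_boundary replay_lineage"

# Read key=value lines on stdin and print each required field that is absent or empty.
# The exit status is 0 only when nothing is missing.
missing_memory_fields() {
  record=$(cat)
  missing=""
  for field in $REQUIRED_MEMORY_FIELDS; do
    # A field counts as present only if its line carries a non-empty value.
    if ! printf '%s\n' "$record" | grep -q "^${field}=."; then
      missing="$missing $field"
    fi
  done
  [ -z "$missing" ] && return 0
  # Intentionally unquoted so each missing field prints on its own line.
  printf '%s\n' $missing
  return 1
}
```

A real implementation would more likely enforce this with database constraints in the structured store, but the same rule applies: a durable memory object without its metadata should be rejected rather than stored.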

The first milestone must emit enough structure to support mitigation of:

  • memory poisoning through confidence and validation metadata
  • persistent prompt injection through instruction classification metadata
  • cross-agent leakage through scoped namespaces and retrieval events
  • sensitive data retention through redaction markers and TTL metadata

Do not store credentials, raw secret payloads, or private tokens as memories. Secret values remain governed by the runtime-secret policy in ADR-0009.

Consequences

  • vLLM workers can stay mostly stateless and focused on low-latency inference.
  • Memory behavior can be tested, replayed, audited, and evolved without changing the inference service contract.
  • Prompt size stays bounded by retrieval and compression rather than by naive transcript replay.
  • Future governance remains possible because memory, retrieval, artifact, and runtime events are structured and externally observable.
  • Governance does not become an embedded runtime dependency.
  • More infrastructure is required before memory-backed agents are production ready.
  • Retrieval quality, memory drift, stale facts, and hallucinated recall become explicit validation targets.
  • Binary artifacts remain externalized and are referenced through metadata or on-demand multimodal inference rather than injected into prompts by default.

Escalation Criteria

Reconsider this policy if:

  • vLLM gains a production-grade durable memory interface with replayable external metadata
  • local hardware changes enough that long-context replay is cheaper than external memory retrieval
  • a dedicated Anthesis-aligned memory service becomes the primary Dubnium memory provider
  • Letta or another MemGPT-style agent framework can integrate with Dubnium’s storage, metadata, and replay contracts without becoming the source of truth
  • compliance requirements demand a concrete external governance authority, attestation system, or retention architecture

ADR-0011: External Ownership Boundaries

Status: accepted

Context

Dubnium is evolving into a machine orchestration and runtime policy layer for a hybrid workstation and AI-node environment.

The repository already integrates an external dotfiles source for Home Manager and user-scoped configuration. At the same time, the local k3s integration in Dubnium remains intentionally thin, and is partly a placeholder, while broader cluster automation work evolves separately.

Without an explicit ownership boundary, there is a risk that:

  • machine policy drifts into user-home concerns
  • cluster bootstrap logic becomes duplicated across repositories
  • recovery boundaries become unclear
  • operational responsibilities overlap
  • host rebuilds become fragile or non-reproducible

Decision

Dubnium adopts a layered ownership model.

Dubnium

Dubnium is the authoritative repository for:

  • machine identity
  • NixOS host composition
  • runtime mode control
  • systemd orchestration
  • hardware policy
  • GPU placement policy
  • runtime reconciliation
  • machine-scoped secrets and service contracts

Dubnium orchestrates external systems but should avoid duplicating their source of truth.

Dotfiles

ryjen/dotfiles is the authoritative repository for:

  • Home Manager configuration
  • user shell configuration
  • editor configuration
  • CLI tooling
  • user-scoped agent tooling
  • workstation UX preferences
  • user-scoped secrets materialization

Dubnium may consume dotfiles directly through flake inputs and local checkout paths.

Laboratory

hackelia-micrantha/laboratory is the intended authoritative repository for:

  • local cluster bootstrap
  • k3s deployment orchestration
  • Flux bootstrap and reconciliation
  • GitOps substrate configuration
  • cluster overlays and platform services
  • environment lifecycle workflows

Dubnium may invoke Laboratory entrypoints but should avoid embedding full cluster orchestration logic internally.

Consequences

Positive

  • cleaner recovery boundaries
  • reduced duplication
  • improved source-of-truth clarity
  • safer rebuild semantics
  • clearer operational ownership
  • easier future migration of cluster workflows

Negative

  • additional repository coordination
  • version pinning discipline becomes important
  • submodule or external checkout management complexity
  • bootstrap sequencing becomes more explicit

Current Implementation State

Current repository state:

  • dotfiles integration exists today
  • local k3s wiring remains host-local and intentionally thin
  • Laboratory integration is planned but not yet fully wired into runtime flows

The current v1 implementation keeps k3s operationally local while explicitly preparing for externalized cluster bootstrap ownership.

Operational Rules

  • machine boot must not depend on successful Laboratory reconciliation
  • machine boot must not depend on user-home customization success
  • dotfiles failure degrades user experience, not machine orchestration
  • Laboratory failure degrades cluster capabilities, not machine orchestration
  • Dubnium remains the root machine control plane

Follow-Up Work

  • add stable Laboratory bootstrap entrypoints
  • add optional external/laboratory checkout integration
  • add bootstrap and validation scripts
  • tighten version pinning and provenance validation
  • reduce placeholder local cluster assumptions over time

Dubctl Flake Input Manager

dubctl is Dubnium’s small helper for managing top-level flake inputs. It is intended for quick add, remove, search, list, and update operations without hand-editing the common inputs = { ... }; block every time.

dubctl manages only flake inputs. It does not wire new inputs into outputs, NixOS modules, package sets, overlays, or Home Manager arguments. Make those call-site changes explicitly after adding an input.

Install and Run

From this repository:

nix run .#dubctl -- list

Install into a profile:

nix profile install .#dubctl
dubctl --flake /path/to/dubnium list

For local development without Nix packaging:

scripts/dubctl list

Commands

List current inputs:

dubctl list

Search input names and definitions:

dubctl search nix

Show one input definition:

dubctl info nixpkgs

Add an input:

dubctl install foo github:owner/repo

Add an input that follows nixpkgs:

dubctl install foo github:owner/repo --follows nixpkgs

Remove an input:

dubctl remove foo

Update all lock entries:

dubctl update

Update one lock entry:

dubctl update nixpkgs

Use a specific flake directory or file:

dubctl --flake /path/to/repo list
dubctl --flake /path/to/repo/flake.nix info nixpkgs

Lockfile Behavior

install and remove run nix flake lock after editing flake.nix. Use --no-lock when staging or testing a source-only change:

dubctl install foo github:owner/repo --no-lock
dubctl remove foo --no-lock

update runs nix flake update, with an optional input name.

Safety Model

dubctl treats command arguments as untrusted input.

Controls:

  • input names must be Nix attr-safe names
  • URLs cannot be empty and cannot contain quotes or newlines
  • edits are limited to the top-level inputs = { ... }; block
  • mutations write flake.nix.bak before changing flake.nix
  • Nix commands are invoked with argv arrays, not shell string concatenation

The backup is local operator safety only. Review the diff before committing.
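The name and URL controls can be sketched as two small shell predicates. This is a minimal illustration of the stated rules, not dubctl's implementation: the function names are hypothetical, and the exact character set dubctl accepts for attr-safe names is an assumption.

```sh
# Sketch only: hypothetical helpers, not dubctl's real functions.

# Accept names that start with a letter or underscore and continue with
# letters, digits, underscores, or hyphens (assumed definition of "attr-safe").
is_attr_safe_name() {
  case $1 in
    '' | [!A-Za-z_]* | *[!A-Za-z0-9_-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Reject empty URLs and URLs that contain a quote or a newline.
is_safe_flake_url() {
  nl='
'
  case $1 in
    '' | *\"* | *\'* | *"$nl"*) return 1 ;;
    *) return 0 ;;
  esac
}
```

Checks like these run before any edit to flake.nix, so a rejected argument never reaches the file or a Nix command line.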

When Not To Use Dubctl

Do not use dubctl for:

  • changing outputs arguments
  • adding module imports
  • adding overlays
  • changing Home Manager extra arguments
  • editing nested flakes such as external/dotfiles unless you pass that flake path explicitly

Those changes are architectural wiring, not package-manager operations.

Runbook: Post-Install Source Reconciliation

Status: living

Use this after a fresh install when the installer source snapshot has produced local changes that should become normal Dubnium repo history.

The custom installer payload is an export-style source snapshot on the USB live system. It is suitable for running nixos-install, but it does not automatically become a durable checkout inside the installed OS. Even when the snapshot is copied into the target filesystem, it is not the long-term working copy because it does not include .git history.

Desired Shape

  • installed system has a normal Git checkout for Dubnium
  • install-time changes are reviewed as a Git diff
  • host-specific files are committed only when they belong in repo policy
  • secrets, tokens, model weights, local caches, and temporary installer state stay out of Git

1. Locate Or Recreate The Install Snapshot

After first boot, start by checking whether the installer source was copied into the installed filesystem:

test -e ~/local/src/dubnium/flake.nix

If it was not copied, boot the custom installer USB or mount the prepared source media again and import the same source snapshot into a temporary location, such as ~/local/src/dubnium-install-snapshot. The goal is to recover any install-time edits, especially the generated hardware config.

If the installed system already has the copied installer source at ~/local/src/dubnium, check whether it is a Git checkout:

cd ~/local/src/dubnium
git rev-parse --is-inside-work-tree

If that fails, keep the snapshot as evidence and make room for a real checkout:

cd ~/local/src
mv dubnium dubnium-install-snapshot

If the source was copied elsewhere, use that path as the snapshot path in the commands below. If there were no install-time source edits to preserve, skip the snapshot and create the canonical checkout directly.
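The decision in this step can be summarized as a three-way classification of the candidate directory. The helper name classify_source_dir is hypothetical; the sketch only combines the two checks already shown above.

```sh
# Sketch only: classify_source_dir is a hypothetical helper.
# Prints "missing", "snapshot", or "checkout" for a candidate Dubnium source directory.
classify_source_dir() {
  dir=$1
  if [ ! -e "$dir/flake.nix" ]; then
    # No copied source here; recover the snapshot from the install media if needed.
    echo missing
  elif git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    # Already a normal Git checkout.
    echo checkout
  else
    # Export-style snapshot without .git history; move it aside before cloning.
    echo snapshot
  fi
}
```

For example, `classify_source_dir ~/local/src/dubnium` printing snapshot corresponds to the case where the directory should be renamed to dubnium-install-snapshot before creating the canonical checkout.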

2. Create The Canonical Checkout

Clone the private Dubnium repo using the installed system’s normal operator credential path. Prefer SSH keys or an intentional short-lived HTTPS token; do not reuse live-installer credentials as a persistent access mechanism.

mkdir -p ~/local/src
git clone <dubnium-private-repo-url> ~/local/src/dubnium
cd ~/local/src/dubnium
git submodule update --init --recursive

If the installed machine should use a different source root, keep the same pattern: one normal Git checkout, and one preserved installer snapshot until the diff has been reconciled.

3. Bring Across Intentional Install Changes

Copy only the changes that should become repo state. The most common first install candidate is the generated hardware config:

cp ~/local/src/dubnium-install-snapshot/hosts/workstation/hardware-configuration.nix \
  hosts/workstation/hardware-configuration.nix

Review any optional host-local file before copying it. For example, hosts/workstation/user.nix may be useful on the installed machine, but it should be committed only if the repo is meant to carry that exact user policy.

For a broader comparison between the preserved snapshot and the canonical checkout:

diff -ruN \
  ~/local/src/dubnium-install-snapshot/hosts/workstation \
  ~/local/src/dubnium/hosts/workstation

Prefer copying specific files over bulk-syncing the snapshot into the checkout.

4. Review, Test, Commit, Push

From the canonical checkout:

git status --short
git diff -- hosts/workstation modules docs

nix --extra-experimental-features 'nix-command flakes' \
  eval .#nixosConfigurations.workstation.config.networking.hostName

git add hosts/workstation/hardware-configuration.nix
git commit -m "Record workstation hardware configuration"
git push

Use a broader validation command when the reconciled change touches modules, services, or shared policy. If evaluation or rebuild fails, keep the snapshot and the Git checkout separate until the failure is understood.

5. Rebuild From The Canonical Checkout

After the change is committed or intentionally kept as local-only state, rebuild from the normal checkout rather than the installer snapshot:

sudo nixos-rebuild switch --flake ~/local/src/dubnium#workstation

Once the canonical checkout has the needed changes and the system rebuilds from it, the preserved install snapshot can be archived or deleted.

Runbook: Laboratory Bootstrap

Status: living

This runbook describes the current intended integration boundary between:

  • Dubnium
  • ryjen/dotfiles
  • hackelia-micrantha/laboratory

Dubnium owns machine orchestration and runtime policy.

Laboratory is the intended source of truth for:

  • k3s bootstrap
  • Flux bootstrap
  • GitOps reconciliation
  • local cluster lifecycle operations

Current State

The current Dubnium repository still contains a thin local k3s integration for v1 bring-up.

The long-term intended direction is:

  • Dubnium owns host orchestration
  • Laboratory owns cluster orchestration

This runbook defines the current bootstrap contract without pretending the full migration is already complete.

Expected Repository Shape

Typical local source layout:

~/local/src/
└── dubnium/
    ├── external/dotfiles/
    └── external/laboratory/

The external/laboratory checkout may be:

  • a Git submodule
  • a manually managed checkout
  • another intentionally pinned local source path

The preferred integration ref today is:

feature/fresh

Bootstrap Flow

After the machine is operational:

  1. validate Dubnium host state
  2. validate user environment
  3. bootstrap Laboratory
  4. validate cluster state
  5. fetch kubeconfig
  6. validate Flux reconciliation

Prerequisites

Laboratory expects tooling such as:

  • tofu or terraform
  • ansible
  • kubectl
  • flux
  • jq

See the Laboratory repository for current authoritative prerequisites.

Bootstrap Command

Dubnium exposes a thin wrapper entrypoint:

scripts/bootstrap-lab

The wrapper intentionally:

  • validates the checkout exists
  • validates the repository shape looks correct
  • warns when the checkout ref differs from the preferred ref
  • delegates execution into Laboratory

The wrapper intentionally does not duplicate Laboratory internals.
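The checks the wrapper performs can be pictured with the sketch below. It is not the real scripts/bootstrap-lab: the function name check_lab_checkout is hypothetical, and treating "has a Makefile" as the repository shape test is an assumption based on the make targets used by the delegated flow.

```sh
# Sketch only: not the real scripts/bootstrap-lab.
# Usage: check_lab_checkout LAB_PATH PREFERRED_REF
check_lab_checkout() {
  lab_path=$1
  preferred_ref=$2
  if [ ! -d "$lab_path" ]; then
    echo "error: Laboratory checkout not found at $lab_path" >&2
    return 1
  fi
  # Assumed shape test: the delegated flow calls make, so expect a Makefile.
  if [ ! -f "$lab_path/Makefile" ]; then
    echo "error: $lab_path has no Makefile; unexpected repository shape" >&2
    return 2
  fi
  # symbolic-ref fails on a detached HEAD or outside Git; treat both as unknown.
  current_ref=$(git -C "$lab_path" symbolic-ref --short -q HEAD 2>/dev/null) || current_ref=unknown
  if [ "$current_ref" != "$preferred_ref" ]; then
    # A ref mismatch is a warning, not a failure.
    echo "warning: checkout ref is $current_ref, preferred ref is $preferred_ref" >&2
  fi
  return 0
}
```

A wrapper built this way would run the checks first and only then change into the checkout and hand control to Laboratory's own make targets, which keeps Laboratory internals out of Dubnium.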

Environment Overrides

Optional overrides:

export DUBNIUM_LAB_PATH=~/local/src/laboratory
export DUBNIUM_LAB_REF=feature/fresh

Override the delegated bootstrap command:

export DUBNIUM_LAB_BOOTSTRAP_CMD='make deploy ENV=local'

Default Delegated Flow

Current default delegated flow:

make deploy ENV=local && \
make local-kubeconfig ENV=local && \
make validate ENV=local

This is intentionally conservative while the integration boundary evolves.

Failure Boundaries

If Laboratory bootstrap fails:

  • Dubnium machine orchestration should still function
  • mode transitions should still function
  • user environment should still function
  • only cluster capabilities should be degraded

Machine boot must not require successful Laboratory reconciliation.

Recovery

To retry the bootstrap:

scripts/bootstrap-lab

To validate current cluster state directly through Laboratory:

cd external/laboratory
make validate ENV=local

Runtime Secrets

Dubnium uses sops-nix with age for runtime service secrets. Nix declares which services consume secrets; secret values stay out of Git, module options, and the Nix store.

Secret Classes

Use separate handling for each class:

  • Source bootstrap: prepare a local repo archive or copied working tree before install; do not require GitHub credentials in the installer.
  • Runtime service tokens: encrypt with SOPS and expose to services through /run/secrets or generated environment files.
  • User-runtime tokens: decrypt through the user profile after install for tools such as Codex, GitHub CLIs, or agent workflows.
  • Host enrollment identities: enroll interactively for v1 unless a future ADR accepts unattended enrollment.
  • Model weights: seed local model bundles into /var/lib/dubnium/models; do not store them in Git, SOPS, or the Nix store.

Host Age Identity

Create one age identity per host and keep it on that host:

sudo mkdir -p /var/lib/sops-nix
sudo age-keygen -o /var/lib/sops-nix/key.txt
sudo chmod 0600 /var/lib/sops-nix/key.txt
sudo age-keygen -y /var/lib/sops-nix/key.txt

Add the printed public recipient to .sops.yaml when the first encrypted secrets file is introduced.

Host Secret File

Keep encrypted host secret files under a dedicated, carefully reviewed path such as secrets/hosts/<host>.yaml. Commit encrypted files only after confirming that no cleartext values appear in the diff.

Example SOPS data shape:

service_name:
  token: example

vLLM Model Downloads

The default Dubnium install should not need a Hugging Face token. Dubnium points vLLM at local model bundle paths under /var/lib/dubnium/models, and the fresh install path seeds those bundles from USB.

Only add a model-provider token if you intentionally choose an online download workflow for a future host. In that case, prefer an environment file generated by sops-nix:

{ config, ... }:
{
  dubnium.secrets.defaultSopsFile = ../../secrets/hosts/workstation.yaml;

  sops.secrets.model-provider-token = {
    key = "model_provider/token";
  };
  sops.templates."vllm-model-provider.env".content = ''
    HF_TOKEN=${config.sops.placeholder.model-provider-token}
    HUGGINGFACE_HUB_TOKEN=${config.sops.placeholder.model-provider-token}
  '';

  dubnium.vllm.environmentFiles = [
    config.sops.templates."vllm-model-provider.env".path
  ];
}

Do not add provider tokens to the custom installer ISO or USB seed partition.

User Runtime Tokens

User tools are owned by the dotfiles Home Manager profile, not by Dubnium system services. Keep tokens such as these in the user SOPS file:

github_token: ghp_example
openai_api_key: sk-example

The dotfiles profile exposes secret file paths, for example GITHUB_TOKEN_PATH and OPENAI_API_KEY_PATH. It can also source a sops-generated shell fragment for interactive user sessions, so tools installed by the profile inherit variables such as OPENAI_API_KEY without per-tool wrappers and without putting plaintext values in Nix options.

Codex should get OPENAI_API_KEY this way. A later user workflow can use GITHUB_TOKEN the same way without changing the installer policy.
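A shell fragment of the kind described here could look roughly like the sketch below. The helper name export_secret_from_path is hypothetical, and the NAME_PATH convention is inferred from the GITHUB_TOKEN_PATH and OPENAI_API_KEY_PATH examples; the actual dotfiles profile may do this differently.

```sh
# Sketch only: export_secret_from_path is a hypothetical helper.
# Usage: export_secret_from_path OPENAI_API_KEY
export_secret_from_path() {
  var_name=$1
  # Only accept plain variable names, because the name is used with eval below.
  case $var_name in
    '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*) return 2 ;;
  esac
  # Look up <NAME>_PATH, for example OPENAI_API_KEY_PATH.
  eval "secret_path=\${${var_name}_PATH:-}"
  if [ -z "$secret_path" ] || [ ! -r "$secret_path" ]; then
    echo "warning: no readable secret file for $var_name" >&2
    return 1
  fi
  # Command substitution strips the trailing newline from the file contents.
  secret_value=$(cat "$secret_path")
  export "$var_name=$secret_value"
  unset secret_value
}
```

Called from an interactive session fragment, `export_secret_from_path OPENAI_API_KEY` makes the variable available to tools started from that shell while the plaintext stays in the decrypted runtime file rather than in Nix options.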

Rotation

  1. Edit the encrypted SOPS file with sops.
  2. Rebuild the target host.
  3. Restart any service that consumes the rotated secret if activation did not already restart it.
  4. Revoke the old token at the provider.

Checks

Before committing, inspect staged changes:

git diff --cached
git diff --cached --check

Do not commit plaintext tokens, private keys, generated age identities, model weights, or local decrypted files.
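As a supplement to reading the diff by hand, a small filter can flag added lines that look like plaintext credentials. The helper name and the pattern list below are illustrative assumptions: they cover age secret keys, GitHub personal access token and OpenAI-style key prefixes, and PEM private key headers, and they are not exhaustive.

```sh
# Sketch only: hypothetical helper with an illustrative, non-exhaustive pattern list.
# Reads a unified diff on stdin, prints suspicious added lines, and returns 1 if any match.
scan_added_lines_for_secrets() {
  # Keep added lines, drop the "+++ b/file" headers, then look for known token shapes.
  if grep -E '^\+' | grep -Ev '^\+\+\+ ' |
    grep -E 'AGE-SECRET-KEY-1|ghp_[A-Za-z0-9]{20,}|(^|[^A-Za-z0-9])sk-[A-Za-z0-9_-]{20,}|BEGIN [A-Z ]*PRIVATE KEY'; then
    return 1
  fi
  return 0
}
```

Typical use would be `git diff --cached | scan_added_lines_for_secrets`. A clean result does not prove the commit is safe, so the manual review above still applies.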

Tailscale

Tailscale is workstation-only platform connectivity in v1. Dubnium enables the daemon and CLI, but enrollment is manual until secrets and OAuth policy are settled.

First Activation

Build and switch the workstation configuration:

sudo nixos-rebuild switch --flake .#workstation

Enroll the node manually:

sudo tailscale up

Follow the browser/device login flow. Do not pass --ssh, --advertise-routes, or --advertise-exit-node for v1.

Verification

Check the daemon:

systemctl status tailscaled

Check tailnet state:

tailscale status
tailscale ip -4

Regular OpenSSH can be used over the assigned tailnet IP if SSH is allowed by the host firewall and OpenSSH configuration.

vLLM Over Tailnet

Dubnium exposes vllm.service on port 8000 over the Tailscale interface only. From another tailnet machine, use the node’s Tailscale IP or MagicDNS name:

curl http://<dubnium-tailnet-name>:8000/v1/models

The local alias ai.dubnium is a host-local convenience entry on Dubnium. To use that same name from other machines, add a tailnet DNS/hosts alias that points ai.dubnium at the Dubnium node’s Tailscale IP.
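For the hosts-file variant of that alias, the entry can be generated from the node's Tailscale IPv4 address. The helper below is a hypothetical sketch; it assumes the address falls in the 100.64.0.0/10 range that Tailscale assigns from, and it refuses anything else so that a LAN or public address is not aliased by mistake.

```sh
# Sketch only: tailnet_hosts_entry is a hypothetical helper.
# Usage: tailnet_hosts_entry IPV4 [ALIAS]
tailnet_hosts_entry() {
  ip=$1
  alias_name=${2:-ai.dubnium}
  # Allow only digits and dots, with no empty octets.
  case $ip in
    '' | *[!0-9.]* | .* | *. | *..*) return 1 ;;
  esac
  old_ifs=$IFS
  IFS=.
  set -- $ip
  IFS=$old_ifs
  [ $# -eq 4 ] || return 1
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 1
  done
  # 100.64.0.0/10 means first octet 100 and second octet 64 through 127.
  [ "$1" -eq 100 ] && [ "$2" -ge 64 ] && [ "$2" -le 127 ] || return 1
  printf '%s %s\n' "$ip" "$alias_name"
}
```

On the Dubnium node, `tailnet_hosts_entry "$(tailscale ip -4)"` prints a line that can be reviewed and then added to the hosts file of the other tailnet machine.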

Deferred Automation

Automatic enrollment should use services.tailscale.authKeyFile only after Dubnium has a settled secrets policy. The intended future shape is:

services.tailscale.authKeyFile = "/run/secrets/tailscale-auth-key";

OAuth or auth-key enrollment should be paired with explicit key scope, expiration, tagging, and rotation decisions.

Deferred Routing Options

Subnet router support would require:

  • services.tailscale.useRoutingFeatures = "server" or "both"
  • sudo tailscale up --advertise-routes=...
  • Tailscale admin approval for the advertised routes
  • firewall, forwarding, and reverse-path-filtering review

Exit-node support would require:

  • services.tailscale.useRoutingFeatures = "server" or "both"
  • sudo tailscale up --advertise-exit-node
  • Tailscale admin approval
  • stronger trust and privacy review, because the node can carry client traffic

Deferred Tailscale SSH

Tailscale SSH is not enabled in v1. If enabled later, it should be tied to a written Tailscale ACL policy and explicit operator intent.

Future manual enrollment would use:

sudo tailscale up --ssh

Future declarative enrollment could add:

services.tailscale.extraUpFlags = [ "--ssh" ];

Until that policy exists, use regular OpenSSH over the tailnet IP.

Runbook: Transition Testing

Status: living

Use this after the machine can boot the flake-managed desktop baseline.

Preflight

mode status
mode current
mode desired
systemctl status desktop.target
systemctl status compute.target
systemctl status vllm.service

The expected baseline is:

  • observed state is desktop
  • vLLM is inactive
  • no transition lock is held
  • latest transition is not failed

Test Studio Overlay

sudo mode request studio-local
mode status
systemctl status studio-local-policy.service
systemctl status audio-priority.service
sudo mode request desktop
mode status

Expected result:

  • studio-local is observed only while both overlay services are active
  • returning to desktop stops both overlay services
  • vLLM remains inactive

Test Compute Promotion

Before testing:

  • close REAPER and active low-latency audio work
  • avoid foreground long-running user jobs
  • expect the graphical session to terminate

sudo mode request compute
mode status
systemctl status compute.target
systemctl status vllm.service

Expected result:

  • observer reports compute or an explicit degraded/failed state
  • graphical session is absent or non-authoritative
  • vLLM is active if enabled
  • guard and transition records explain any block or failure

Test Desktop Return

sudo mode request desktop
mode status
systemctl status vllm.service

Expected result:

  • observer reports desktop
  • vLLM is inactive
  • graphical/session path is usable

If rollback only partially restores desktop, classify it as degraded rather than successful.

Runbook: Failed Transition Recovery

Status: living

Use this when mode status reports failed-transition, a degraded state, or a post-action observation mismatch.

Inspect State

mode status
mode current --refresh
sudo cat /run/mode-controller/last-transition.json
sudo cat /run/mode-controller/last-guards.json
journalctl -u 'mode-controller@*' -b

Classify the Failure

Common buckets:

  • guard policy block, such as active audio or unsafe user jobs
  • guard execution error, such as missing nvidia-smi
  • graphical session did not terminate
  • GPU release predicate did not pass
  • vLLM failed to start or stop
  • target isolation stopped required services
  • post-action observation remained conflicted

Recover to Desktop

If the system is not in the middle of an active transition:

sudo mode request desktop
mode status

Success requires observer confirmation, not just successful systemd commands.

If desktop recovery fails:

  • inspect journalctl -b
  • inspect display-manager/session logs
  • stop compute-only services manually only if their ownership is clear
  • consider rebooting to the v1 boot default, desktop

Record Evidence

For every failure worth keeping:

  • final mode status
  • last transition JSON
  • last guards JSON
  • relevant systemd unit status
  • whether rollback restored desktop
  • whether the failure suggests runtime switching is insufficient

Repeated GPU release or desktop restoration failures should trigger specialisation/reboot-mediated compute evaluation.
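Collecting the evidence listed above can be scripted. The sketch below is a hedged example, not part of the shipped tooling: `collect_evidence` and the destination layout are hypothetical, and only the two JSON file names and `mode status` come from this runbook.

```shell
# Hypothetical helper: copy transition evidence into a timestamped directory.
#   $1 runtime state directory (this runbook uses /run/mode-controller)
#   $2 destination root for saved evidence (an assumption)
collect_evidence() (
  state_dir=$1
  dest_root=$2
  out="$dest_root/evidence-$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$out" || exit 1
  for f in last-transition.json last-guards.json; do
    if [ -r "$state_dir/$f" ]; then
      cp "$state_dir/$f" "$out/$f"
    else
      # Record what could not be captured instead of failing silently.
      echo "missing: $state_dir/$f" >> "$out/MISSING"
    fi
  done
  # Capture the final mode status only when the mode CLI is on PATH.
  if command -v mode >/dev/null 2>&1; then
    mode status > "$out/mode-status.txt" 2>&1 || true
  fi
  echo "$out"
)
```

Run it as root, for example `collect_evidence /run/mode-controller /var/tmp`, because the runbook reads the JSON records with sudo. Unit status and the rollback verdict still need to be added by hand.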

WSL Documentation Boundary

Dubnium uses WSL in two different ways, and the docs should keep those roles separate.

WSL As Build Environment

Use WSL as a convenient Linux build environment. This includes:

  • building the custom installer ISO
  • preparing the local seed-model bundle
  • running Nix commands that do not need bare-metal hardware

This role does not imply the wsl host target is being installed or validated. For the installer flow, WSL prepares artifacts and the platform writer prepares the USB unless the USB disk is deliberately exposed to WSL.

WSL As Validation Target

Use the .#wsl host target only inside an existing nix-community/NixOS-WSL distro. This target validates shared Dubnium module composition and activation before touching the real workstation. It keeps resource-heavy services such as vllm and k3s disabled by default.

This role does not prove bare-metal behavior. Passing WSL validation does not prove EFI, bootloader, Hyprland, audio, or final GPU behavior for .#workstation.

Boundary Rules

  • Keep bare-metal install steps in fresh-install and custom-installer docs.
  • Keep .#wsl activation and validation steps in the WSL bring-up runbook.
  • Do not use the fresh-install checklist for WSL bring-up.
  • Do not use WSL results as proof that workstation hardware configuration is correct.
  • When a command is meant to run inside WSL, label it as WSL or Bash.

Build Installer Artifacts From WSL

Status: living

Use this when the Dubnium installer ISO and seed-model bundle should be prepared from an existing WSL distro.

This is only a build workflow. It is not .#wsl host activation and does not validate the WSL target.

Boundary

  • build the ISO and prepare the seed model here
  • write the USB with the platform’s guarded writer unless the USB disk is deliberately exposed to the WSL distro

Build

Enter the Nix-capable WSL distro:

wsl -d NixOS

Inside the distro:

cd /path/to/dubnium

git status --short
git -C external/dotfiles status --short

scripts/build-installer-iso.sh \
  --iso ./dubnium-installer.iso

The script prepares the current Dubnium default seed bundle when no existing materialized bundle is detected. Use --seed-model to point at a different bundle, --no-seed-download to require an existing bundle, or --no-seed-model to build installer-only media.

Write The USB

After the ISO exists in the shared checkout, use the platform writer from the custom installer runbook. For Windows PowerShell:

.\scripts\write-installer-usb.ps1 `
  -IsoPath .\dubnium-installer.iso `
  -DiskNumber 7 `
  -ExpectedFriendlyName "USB SanDisk 3.2Gen1" `
  -SeedModelPath ..\models\selected-model-bundle

Each writer still checks the USB disk identity and requires the typed erase confirmation.

Runbook: WSL Bring-Up

Status: living

Use this when the target environment is the wsl host, running inside an existing nix-community/NixOS-WSL distro.

This is separate from the bare-metal install and first-bring-up flow because the commands, platform assumptions, and validation steps are materially different.

This runbook assumes you are already using the community WSL base, nix-community/NixOS-WSL.

The dubnium .#wsl target layers on top of that base. It is not a replacement for the initial NixOS-WSL installation process.

When To Use This

Use this runbook when:

  • you are already inside the NixOS WSL distro
  • you want to switch that distro to dubnium’s .#wsl target
  • you want to validate shared Dubnium wiring in WSL before touching the bare-metal workstation target

Do not use this runbook for:

  • bare-metal install
  • hosts/workstation/hardware-configuration.nix generation
  • EFI or bootloader validation
  • Hyprland or audio/studio validation

Preconditions

  • WSL is installed on Windows
  • a NixOS WSL distro based on nix-community/NixOS-WSL already exists and boots successfully
  • this repo is available inside the distro, for example:
    • /mnt/c/Users/<user>/Projects/dubnium
    • ~/src/dubnium
  • flakes are available, either through system config or explicit flags

Success Criteria

  • nixos-rebuild switch --flake .#wsl succeeds inside the WSL distro
  • git is available from the switched system generation
  • mode status, mode current, and mode desired work
  • dubnium.k3s.enable and dubnium.vllm.enable evaluate to false
  • compute.target evaluates without pulling in k3s or vllm.service
  • the runtime state directory exists at /run/mode-controller

1. Enter The NixOS WSL Distro

If you do not already have a working nix-community/NixOS-WSL distro, stop here and install that first. This runbook starts after that base is already in place.

Enter the distro:

wsl -d NixOS

Inside the distro, go to the repo:

cd /path/to/dubnium
pwd
git status --short

Use the actual checkout path for the machine. Avoid hardcoding personal paths in reusable docs or scripts.

2. Evaluate The WSL Target

If your shell does not already have flakes enabled, use explicit flags:

nix --extra-experimental-features "nix-command flakes" flake show .

Confirm the new target exists:

  • nixosConfigurations.wsl

Optional targeted checks:

nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.wsl.config.wsl.enable
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.wsl.config.dubnium.boot.defaultMode
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.wsl.config.dubnium.k3s.enable
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.wsl.config.dubnium.vllm.enable
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.wsl.config.systemd.targets.compute.wants

Expected:

  • wsl.enable = true
  • default mode is compute
  • dubnium.k3s.enable = false
  • dubnium.vllm.enable = false
  • compute.target has no vllm.service dependency

The wsl.enable result confirms that the wsl host uses the upstream community WSL module, not an ad hoc local WSL implementation.
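The targeted checks can be compared against their expected values in one pass. This is a minimal sketch under stated assumptions: the helper names are hypothetical, `nix eval --json` is used so values compare as JSON literals, and `dubnium.boot.defaultMode` is assumed to evaluate to the string "compute".

```shell
# Compare one evaluated value against its expected JSON literal.
check_attr() (
  name=$1
  expected=$2
  actual=$3
  if [ "$actual" = "$expected" ]; then
    echo "ok: $name = $actual"
  else
    echo "MISMATCH: $name expected $expected, got $actual"
    exit 1
  fi
)

# Pass when the compute.target wants list does not mention vllm.service.
check_no_vllm() (
  case "$1" in
    *vllm.service*) echo "MISMATCH: compute.target wants vllm.service"; exit 1 ;;
    *) echo "ok: compute.target has no vllm.service dependency" ;;
  esac
)

# Evaluate one attribute of the wsl host as JSON.
eval_wsl() {
  nix --extra-experimental-features "nix-command flakes" \
    eval --json ".#nixosConfigurations.wsl.config.$1"
}

# Run every check from this step; call it from the repo root.
verify_wsl_target() {
  rc=0
  check_attr wsl.enable true "$(eval_wsl wsl.enable)" || rc=1
  check_attr dubnium.boot.defaultMode '"compute"' \
    "$(eval_wsl dubnium.boot.defaultMode)" || rc=1
  check_attr dubnium.k3s.enable false "$(eval_wsl dubnium.k3s.enable)" || rc=1
  check_attr dubnium.vllm.enable false "$(eval_wsl dubnium.vllm.enable)" || rc=1
  check_no_vllm "$(eval_wsl systemd.targets.compute.wants)" || rc=1
  return "$rc"
}
```

A passing `verify_wsl_target` only proves the flake expression. It says nothing about the running distro, which is checked after the switch.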

3. Switch The Running Distro To .#wsl

Use:

sudo nixos-rebuild switch --flake .#wsl

If flakes are not enabled globally in the current shell:

sudo nixos-rebuild switch --extra-experimental-features "nix-command flakes" --flake .#wsl

This is the main WSL install/activation command.

Keep config truth and live runtime truth separate. A targeted nix eval proves the flake expression, while nixos-rebuild switch and systemctl prove the running distro. If a full switch fails with an environmental WSL error, record that separately from whether the flake evaluated correctly.

4. Verify Dubnium Runtime Basics

Check mode/runtime state:

mode status
mode current
mode desired
sudo ls -la /run/mode-controller

Check that the heavy services are not part of the lightweight WSL profile:

systemctl status k3s --no-pager
systemctl status vllm --no-pager

Both units should be absent or inactive in the default WSL target. Enable them intentionally in a local override only when the task is specifically to validate their WSL runtime behavior.

Check the WSL target’s current interpretation:

  • wsl is headless
  • compute is the default desired mode
  • k3s and vllm are disabled by default to keep WSL activation lightweight
  • workstation-only graphics/audio expectations do not apply here

5. Known Differences From workstation

Important differences:

  • .#wsl assumes the distro itself was originally created with nix-community/NixOS-WSL
  • do not run nixos-generate-config --dir ./hosts/workstation for WSL testing
  • do not expect .#workstation to build cleanly until a real bare-metal hardware config has replaced the placeholder
  • do not use the fresh-install checklist for WSL bring-up
  • do not treat WSL results as proof of EFI, bootloader, Hyprland, or audio correctness

The wsl target is for:

  • flake composition
  • lightweight activation
  • shared Dubnium control-plane behavior

6. Common Failure Buckets

  • flakes not enabled in the current shell
  • repo is present but the running system has not been switched to .#wsl
  • Windows PATH injection adds noisy warnings during WSL startup
  • repo tooling looks missing after the switch (check git --version)
  • Home Manager activation can fail if the active WSL login user does not match the configured Home Manager home; check whoami, getent passwd, and /etc/wsl.conf before changing modules
  • mode desired/current state is seeded but not yet reconciled automatically at boot

If .#workstation fails during WSL development, first check whether the failure comes from the placeholder hosts/workstation/hardware-configuration.nix instead of the new wsl target.

Dual-Mode NixOS Workstation / AI Node

Unified Planning + Mode State Machine Document (v0.3 — Living)


1. Purpose

Design a single NixOS system that operates as a policy-driven multi-mode host with support for future workload externalization:

  • Desktop / Dev workstation
  • Optional local Studio / Audio profile
  • Compute / Headless AI node

The broader workstation environment may also externalize selected capabilities, especially Studio/Audio, to a separate machine such as a Mac mini.

The system must support:

  • low-latency audio workloads (DAW / live)
  • GUI desktop usage via Hyprland
  • GPU inference via vLLM
  • k3s control-plane duties for Micrantha Laboratory / Hyperion
  • explicit, auditable, reproducible transitions between modes

This document defines:

  • planning assumptions
  • architectural boundaries
  • host-local mode definitions
  • capability placement model
  • invariants
  • state machine
  • guards and guard functions
  • source-of-truth model
  • reconciliation model
  • implementation mapping to systemd
  • design alternatives and tradeoffs

2. Core Principles

2.1 Modes Are Operational Contracts

A mode is not just a set of enabled services. A mode defines:

  • resource ownership
  • permitted workloads
  • latency/throughput expectations
  • security posture
  • transition preconditions

2.2 Explicit Over Implicit

Mode transitions should be:

  • explicit when possible
  • observable
  • reversible
  • logged
  • idempotent

Automation may request a transition, but the controller must decide whether it is safe.

2.3 Latency and Throughput Are Competing Objectives

  • Desktop / Studio-Local optimize for responsiveness and bounded latency
  • Compute optimizes for throughput and hardware utilization

The design must not pretend both can be maximized simultaneously.

2.4 One Physical Host, Multiple Logical Planes

This system is treated as:

one shared substrate hosting multiple logical operating modes

2.5 Declarative First, Runtime Reconciliation Second

  • NixOS declares steady-state intent and system structure
  • a mode controller reconciles runtime state toward desired operational mode

2.6 Host-Local Modes Must Survive Capability Relocation

The host-local state model should remain coherent even if some capabilities, especially Studio/Audio, move to another machine.


3. System Overview

flowchart TD
    HW[Hardware]

    subgraph BaseOS[NixOS Base Layer]
        Kernel
        Drivers[NVIDIA / CUDA]
        Network
        Storage
        Nix
        systemd
    end

    subgraph Control[Mode Control Plane]
        Desired[Desired State]
        Current[Current State]
        Reconcile[Reconciler]
        Guards[Guard Checks]
    end

    subgraph LocalModes[Host-Local Modes]
        Desktop[Desktop / Dev]
        StudioLocal[Studio-Local / Audio-Priority]
        Compute[Compute / Headless]
    end

    subgraph Placement[Capability Placement]
        StudioCap[Studio Capability]
        AICap[AI Capability]
        PlatformCap[Platform Capability]
    end

    subgraph Workloads[Workloads]
        Hyprland
        PipeWire
        Reaper
        vLLM
        k3s
    end

    HW --> BaseOS
    BaseOS --> Control
    Control --> LocalModes
    LocalModes --> Workloads
    LocalModes --> Placement

4. Mode Definitions and Capability Placement

This document distinguishes between:

  • host-local operational modes for the NixOS machine
  • capability placement for functions that may later move to another machine

4.1 Host-Local Modes

Desktop / Dev Mode

Intent

Balanced interactive mode for programming, office work, light desktop use, and bounded AI.

Properties

  • GUI enabled
  • audio enabled for ordinary desktop use
  • GPU0 reserved for display/compositor
  • GPU1 may be used by AI workloads
  • vLLM constrained to single-GPU operation or disabled
  • k3s control plane may remain active
  • CPU/RAM contention must remain bounded

Studio-Local / Audio-Priority Profile

Intent

A stricter local operating profile for low-latency audio work when Studio remains on the NixOS host.

Properties

  • modeled as a protected interactive profile closely related to Desktop
  • GUI enabled
  • audio stack prioritized
  • display GPU reserved exclusively for desktop responsibilities
  • AI workloads disabled or reduced to near-zero
  • heavy I/O and background maintenance jobs disallowed
  • scheduler and system policy biased toward stable audio behavior

Design note

This profile is considered conditional and potentially temporary. It exists so the NixOS host can support local audio/studio workflows now, without assuming that Studio remains a permanent first-class local mode forever.

Implementation note

For the first implementation pass, studio-local should be modeled as a policy overlay on desktop, not as a first-class top-level systemd target. The operational state still exists in the controller/state model, but its enactment should initially be handled by marker/helper units layered onto the desktop path.


Compute / Headless Mode

Intent

Throughput-oriented headless mode for AI serving and platform duties.

Properties

  • GUI disabled
  • audio stack off or irrelevant
  • both GPUs available to AI workloads
  • vLLM may use both GPUs
  • k3s workloads may run more aggressively
  • CPU/RAM/storage can be utilized much more aggressively than in interactive modes

4.2 Capability Placement Model

Certain capabilities may be placed either:

  • locally on the NixOS host
  • externally on another machine

Capability: Studio / Audio

Possible placements:

  • local
  • external-mac-mini

Capability: AI / Inference

Expected placement:

  • primarily local-nixos-host

Capability: Platform / k3s Control

Expected placement:

  • primarily local-nixos-host

4.3 Design Implication

The host-local state machine should remain valid even if Studio/Audio is moved to a Mac mini. That means Studio-specific policy should be represented as a local profile or conditional mode, not as the permanent center of the entire host architecture.


5. Resource Ownership Model

5.0 Implementation Note — Hardware-Tolerant Bring-Up

The architecture should continue to plan for the intended dual-GPU topology, but the NixOS implementation should remain tolerant of transitional hardware states while the second GPU is not yet installed or configured.

That means:

  • the policy model may still describe the intended two-GPU end state
  • module options should encode planned GPU ownership explicitly
  • active service profiles must only reference GPUs that are currently present
  • missing future hardware must not cause ordinary evaluation or steady-state services to fail unnecessarily

5.1 GPU Ownership

| Mode | GPU0 | GPU1 |
|---|---|---|
| Desktop | Display / compositor | AI optional |
| Studio-Local | Display / compositor (protected) | AI off or minimal |
| Compute | AI | AI |
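The ownership table and the hardware-tolerant rule from 5.0 combine naturally: take the planned AI GPUs for a mode, then keep only the indices that are present. The sketch below assumes hypothetical helper names and treats Studio-Local as "AI off", although the table also allows "minimal".

```shell
# Planned AI GPU indices per mode, taken from the GPU ownership table.
planned_ai_gpus() {
  case "$1" in
    desktop)      echo "1" ;;
    studio-local) echo "" ;;      # assumption: AI off rather than minimal
    compute)      echo "0 1" ;;
    *)            return 1 ;;
  esac
}

# Keep only the planned indices that are actually present.
#   $1 space-separated planned indices
#   $2 space-separated present indices
# Prints a comma-separated list, or an empty line when nothing matches.
usable_gpus() (
  result=""
  for want in $1; do
    for have in $2; do
      if [ "$want" = "$have" ]; then
        result="${result:+$result,}$want"
      fi
    done
  done
  echo "$result"
)
```

On a host with NVIDIA drivers, the present list could come from `nvidia-smi --query-gpu=index --format=csv,noheader | tr '\n' ' '`. With only GPU 0 installed, `usable_gpus "$(planned_ai_gpus compute)" "0"` prints `0`, so a compute profile never references the missing second card.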

5.2 CPU Ownership

  • Shared via cgroups/systemd slices
  • interactive slices retain priority/headroom in Desktop and Studio-Local
  • compute slices may saturate cores in Compute

5.3 Memory Ownership

  • bounded AI memory usage in Desktop
  • stricter constraints in Studio-Local
  • relaxed/high utilization in Compute

5.4 Storage Ownership

  • heavy background I/O restricted in Studio-Local
  • permitted but bounded in Desktop
  • broadly permitted in Compute

5.5 Audio Ownership

  • effectively exclusive in Studio-Local
  • protected in Desktop
  • not guaranteed in Compute

6. Invariants

These are system-level properties that must remain true regardless of transition path or future Studio placement.

6.1 Safety Invariants

  1. At most one host-local operational mode is authoritative at a time.
  2. A transition must either complete to a stable target state or abort back to a known-safe prior state.
  3. Mode transitions must be idempotent. Re-running a transition toward an already-satisfied state must not cause harm.
  4. When Studio-Local is active, heavyweight compute workloads must not materially jeopardize audio latency.
  5. Compute mode must not require a running graphical session.
  6. GPU0 must not be simultaneously treated as both protected display GPU and unrestricted compute GPU.
  7. The controller must not promote the system into Compute if guard failures indicate active user/audio risk.
  8. The system must always expose a way to determine current mode, desired mode, and last transition result.
  9. The host-local mode model must remain coherent if Studio/Audio capability is externalized to another machine.

6.2 State Invariants

  1. Desired state is authoritative intent.
  2. Current state is observed runtime fact.
  3. Reconciliation moves current state toward desired state; it never rewrites observed state to match wishful intent.
  4. A guard failure blocks transition, but does not silently change desired state unless policy explicitly says so.

6.3 Operational Invariants

  1. Models and mutable runtime data must live outside the Nix store.
  2. Dotfiles may influence user experience, not machine-critical mode policy.
  3. Mode policy must remain expressible and inspectable via systemd and Nix configuration.
  4. Capability placement decisions must not silently invalidate host-local invariants.

7. Desired State vs Current State

7.1 Desired State

The host-local mode the user or automation wants the system to be in.

Examples:

  • desktop
  • studio-local
  • compute

7.2 Current State

The host-local mode the system is actually in, as determined by observation.

Examples:

  • graphical target active, PipeWire active, vLLM limited → likely desktop
  • graphical target inactive, compute services active, both GPUs exposed to AI → likely compute
  • GUI active, audio priority raised, compute services reduced → likely studio-local
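As an illustration, the observation examples can be reduced to a pure function over unit states. This is a sketch only: `observe_mode` is a hypothetical name, the inputs are the strings printed by `systemctl is-active`, and the overlay unit names are the ones this document uses for studio-local in section 12.

```shell
# Hypothetical observer: derive the likely mode from runtime facts.
#   $1 desktop/graphical target state ("active" or "inactive")
#   $2 compute.target state
#   $3 studio-local-policy.service state
#   $4 audio-priority.service state
observe_mode() (
  graphical=$1
  compute=$2
  policy=$3
  audio=$4
  if [ "$graphical" = "active" ] && [ "$compute" = "active" ]; then
    # Both families claim authority: never guess a winner.
    echo conflicted
  elif [ "$graphical" = "active" ]; then
    if [ "$policy" = "active" ] && [ "$audio" = "active" ]; then
      echo studio-local
    else
      echo desktop
    fi
  elif [ "$compute" = "active" ]; then
    echo compute
  else
    echo unknown
  fi
)
```

A real observer would likely also inspect GPU consumers and vLLM state. The point of the sketch is that the result is computed from facts, never copied from the desired mode.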

7.3 Why This Split Matters

Without this split, the system can lie to itself:

  • a command says “switch to compute”
  • but GPU is still held by compositor
  • vLLM failed to scale up
  • audio services are still active

In that case:

  • desired state = compute
  • current state = transitioning or desktop (degraded)

The control plane must detect and reconcile this rather than assuming success.


8. Source of Truth for Mode

The system needs one authoritative representation of requested host-local mode.

8.1 Options Considered

Option A — File-Based Source of Truth

Example:

  • /run/mode-controller/desired
  • /var/lib/mode-controller/desired

Pros

  • simple
  • easy to inspect
  • works outside active user session
  • easy for scripts and systemd units

Cons

  • can drift from actual runtime state
  • needs permissions and lifecycle handling

Option B — Environment Variable Source of Truth

Example:

  • MODE=compute

Pros

  • simple for one-shot commands
  • easy in shell contexts

Cons

  • poor system-wide authority
  • ephemeral
  • fragile across sessions/reboots
  • bad fit for authoritative machine state

Option C — systemd State as Source of Truth

Example:

  • compute.target active implies desired mode is compute

Pros

  • tightly aligned with implementation
  • introspectable
  • avoids duplicate state stores

Cons

  • desired state and current state can become conflated
  • harder to represent “requested but not yet achieved”
  • recovery/abort semantics become more awkward

8.2 Recommendation

Use a hybrid model:

  • Desired state source of truth: file in /run/mode-controller/desired
  • Current state source of truth: observed systemd/runtime facts
  • Transition machinery: systemd targets + controller service

This cleanly separates:

  • intent
  • observation
  • enforcement

8.3 Proposed Files

  • /run/mode-controller/desired
  • /run/mode-controller/current
  • /run/mode-controller/last-transition.json

current may be a cached observation, but observation should always be derivable from system state.
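A minimal sketch of how the desired-state file could be written and read safely. The helper names are hypothetical, and the file format (a single line holding the bare mode name) is an assumption, not something this document specifies.

```shell
# Validate a mode name against the modes this document defines.
valid_mode() {
  case "$1" in
    desktop|studio-local|compute) return 0 ;;
    *) return 1 ;;
  esac
}

# Write the desired mode atomically: temp file in the same directory, then rename.
#   $1 state directory (e.g. /run/mode-controller)
#   $2 requested mode
write_desired() (
  dir=$1
  mode=$2
  valid_mode "$mode" || { echo "invalid mode: $mode" >&2; exit 2; }
  tmp="$dir/.desired.$$"
  printf '%s\n' "$mode" > "$tmp" && mv "$tmp" "$dir/desired"
)

# Read the desired mode; fails when the file is missing or holds an unknown mode.
read_desired() (
  [ -r "$1/desired" ] || exit 1
  mode=$(cat "$1/desired")
  valid_mode "$mode" && echo "$mode"
)
```

Renaming within one directory means a reader sees either the old value or the new one, never a partial write. Rejecting unknown names keeps a typo from becoming authoritative intent.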


9. State Machine

9.1 States

S0: Boot

Initial state before default operating mode is established.

S1: Desktop

Interactive general-purpose mode.

S2: StudioLocal

Strict interactive low-latency local audio profile.

S3: Compute

Headless throughput-oriented mode.

S4: Transitioning

Ephemeral reconciliation state while moving toward desired mode.

S5: FailedTransition

A recoverable error state indicating that desired state was not achieved.

9.2 State Diagram

stateDiagram-v2
    [*] --> Boot

    Boot --> Desktop : default boot

    Desktop --> StudioLocal : request(studio-local)
    StudioLocal --> Desktop : request(desktop)

    Desktop --> Transitioning : request(compute)
    StudioLocal --> Transitioning : request(compute)
    Compute --> Transitioning : request(desktop)
    Desktop --> Transitioning : request(desktop) / reconcile
    StudioLocal --> Transitioning : request(studio-local) / reconcile
    Compute --> Transitioning : request(compute) / reconcile

    Transitioning --> Desktop : reached(desktop)
    Transitioning --> StudioLocal : reached(studio-local)
    Transitioning --> Compute : reached(compute)
    Transitioning --> FailedTransition : guard_fail / action_fail / timeout

    FailedTransition --> Desktop : recover(previous=desktop)
    FailedTransition --> StudioLocal : recover(previous=studio-local)
    FailedTransition --> Compute : recover(previous=compute)

9.3 Notes

  • Direct StudioLocal -> Compute may be allowed only through guarded reconciliation, not blind immediate promotion.
  • Reconciliation should be able to handle “already in desired mode” as a no-op success.
  • Externalized Studio capability must not require redesign of the host-local state machine; it should only disable or deprecate studio-local usage.
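The edges drawn in the 9.2 diagram can be expressed as a lookup function, which is handy for testing a controller against the diagram. This is a sketch: `next_state` and the `request-<mode>`, `reached-<mode>`, and `recover-<mode>` event spellings are assumptions. It encodes only edges that appear in the diagram, so a request from Compute to studio-local is rejected here.

```shell
# Print the next state for (state, event), or fail when the diagram has no such edge.
next_state() (
  case "$1:$2" in
    Boot:default-boot)                     echo Desktop ;;
    Desktop:request-studio-local)          echo StudioLocal ;;
    StudioLocal:request-desktop)           echo Desktop ;;
    Desktop:request-compute|Desktop:request-desktop)
                                           echo Transitioning ;;
    StudioLocal:request-compute|StudioLocal:request-studio-local)
                                           echo Transitioning ;;
    Compute:request-desktop|Compute:request-compute)
                                           echo Transitioning ;;
    Transitioning:reached-desktop)         echo Desktop ;;
    Transitioning:reached-studio-local)    echo StudioLocal ;;
    Transitioning:reached-compute)         echo Compute ;;
    Transitioning:guard_fail|Transitioning:action_fail|Transitioning:timeout)
                                           echo FailedTransition ;;
    FailedTransition:recover-desktop)      echo Desktop ;;
    FailedTransition:recover-studio-local) echo StudioLocal ;;
    FailedTransition:recover-compute)      echo Compute ;;
    *) exit 1 ;;
  esac
)
```

For instance, `next_state Desktop request-compute` prints `Transitioning`, while `next_state Desktop request-studio-local` prints `StudioLocal` directly, matching the two kinds of edge in the diagram.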

10. Guards

Guards are explicit check functions. They return exit codes and optionally structured diagnostics.

10.1 Guard Interface

Each guard function should follow a predictable interface:

check_<name>
exit 0     = pass
exit 10-19 = policy failure / guard blocked
exit 20-29 = check execution error / indeterminate

Structured output should ideally emit JSON or key=value diagnostics to stdout/stderr for logs.
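A controller needs to turn a raw guard exit code into one of these classes. The sketch below reads "10+" as 10 to 19 and "20+" as 20 to 29, which matches the codes used by the guard set. The function names are hypothetical.

```shell
# Map a guard exit code to its class under the interface above.
classify_guard_exit() {
  case "$1" in
    0)      echo pass ;;
    1[0-9]) echo blocked ;;   # policy failure / guard blocked
    2[0-9]) echo error ;;     # check execution error / indeterminate
    *)      echo unknown ;;   # outside the documented ranges
  esac
}

# Run one guard in a subshell (so a guard may use either return or exit),
# emit a key=value diagnostic, and pass the guard's exit code through.
run_guard() {
  code=0
  ( "$1" ) || code=$?
  echo "guard=$1 exit=$code class=$(classify_guard_exit "$code")"
  return "$code"
}
```

Treating `unknown` like `error` is the conservative choice: a guard that crashes with an unexpected code should block promotion rather than pass.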

10.2 Guard Set

G1: check_audio_idle

Purpose:

  • verify no active low-latency local audio session that would make compute transition unsafe

Possible checks:

  • no active REAPER process
  • no active PipeWire/JACK graph beyond baseline

Exit codes:

  • 0 pass
  • 10 audio active
  • 20 unable to inspect audio graph

G2: check_gpu_display_released

Purpose:

  • verify display/compositor has released GPU before compute promotion

Possible checks:

  • no active Hyprland session
  • no relevant graphical GPU consumers

Exit codes:

  • 0 pass
  • 11 display GPU still owned by GUI
  • 21 GPU inspection failure

G3: check_cpu_load_safe

Purpose:

  • ensure transition is not occurring during obviously unsafe heavy local activity when policy requires quieting first

Exit codes:

  • 0 pass
  • 12 CPU load too high
  • 22 unable to inspect load

G4: check_user_jobs_safe

Purpose:

  • detect known long-running interactive/user jobs that should block auto-transition

Possible checks:

  • selected process patterns
  • optional allowlist/denylist

Exit codes:

  • 0 pass
  • 13 user jobs active
  • 23 inspection failure

G5: check_memory_headroom

Purpose:

  • ensure sufficient memory exists to perform transition or launch target services

Exit codes:

  • 0 pass
  • 14 insufficient headroom
  • 24 inspection failure

G6: check_vllm_drainable

Purpose:

  • ensure compute workloads can be safely reduced when returning to Desktop/Studio-Local

Exit codes:

  • 0 pass
  • 15 compute workload not drainable
  • 25 inspection failure

G7: check_studio_capability_local

Purpose:

  • verify that local Studio capability is still available on the NixOS host before allowing studio-local

Possible checks:

  • local policy flag indicates studio capability still hosted locally
  • local audio stack and workflow prerequisites are not intentionally disabled due to externalization

Exit codes:

  • 0 pass
  • 19 requested local studio capability not available
  • 29 inspection failure

10.3 Guard Policy by Transition

| Transition | Required Guards |
|---|---|
| Desktop -> StudioLocal | check_target_reachable, check_studio_capability_local, check_user_jobs_safe (optional policy), compute downscale checks |
| StudioLocal -> Desktop | check_target_reachable |
| Desktop -> Compute | check_target_reachable, check_audio_idle, check_gpu_display_released, check_cpu_load_safe, check_user_jobs_safe, check_memory_headroom |
| StudioLocal -> Compute | check_target_reachable, check_audio_idle, check_gpu_display_released, check_cpu_load_safe, check_user_jobs_safe, check_memory_headroom |
| Compute -> Desktop | check_target_reachable, check_vllm_drainable, check_memory_headroom |
| Compute -> StudioLocal | check_target_reachable, check_studio_capability_local, check_vllm_drainable, check_memory_headroom |
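The policy table can be carried as data plus a small runner. In this sketch, `guards_for` and `run_guard_set` are hypothetical names, and guards are assumed to be shell functions or commands named as in the table. For Desktop -> StudioLocal, the optional check_user_jobs_safe and the unnamed compute downscale checks are left out.

```shell
# Print the ordered guard list for a transition, or fail if none is defined.
guards_for() {
  case "$1:$2" in
    desktop:studio-local)
      echo "check_target_reachable check_studio_capability_local" ;;
    studio-local:desktop)
      echo "check_target_reachable" ;;
    desktop:compute|studio-local:compute)
      echo "check_target_reachable check_audio_idle check_gpu_display_released check_cpu_load_safe check_user_jobs_safe check_memory_headroom" ;;
    compute:desktop)
      echo "check_target_reachable check_vllm_drainable check_memory_headroom" ;;
    compute:studio-local)
      echo "check_target_reachable check_studio_capability_local check_vllm_drainable check_memory_headroom" ;;
    *) return 1 ;;
  esac
}

# Run the guards in order; stop at the first non-zero exit and return it.
run_guard_set() {
  # A missing policy is reported as 20 (indeterminate), which is an assumption.
  list=$(guards_for "$1" "$2") || { echo "no guard policy for $1 -> $2"; return 20; }
  for guard in $list; do
    rc=0
    "$guard" || rc=$?
    if [ "$rc" -ne 0 ]; then
      echo "blocked_by=$guard exit=$rc"
      return "$rc"
    fi
  done
  echo "guards=pass"
}
```

Stopping at the first failure keeps the recorded reason unambiguous. A controller that wants a full report could instead run every guard and collect all non-zero results.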

11. Actions and Transition Semantics

Actions are the concrete operations used to move from one state to another.

11.1 Action Vocabulary

  • stop/terminate GUI session
  • isolate a target
  • stop/start units
  • wait for quiescence
  • update desired/current state files
  • restart services with different environment/policies

11.2 Action Interface

Each action should return:

  • 0 success
  • non-zero failure with logged reason
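A thin wrapper is enough to give every action this interface. The sketch below uses a hypothetical `run_action` helper and writes the reason as a key=value line to stderr. The log format is an assumption.

```shell
# Run one action; on failure, log the action name and exit code, then
# pass the exit code through unchanged.
#   $1 action name for the log, remaining arguments: the command to run
run_action() {
  name=$1
  shift
  rc=0
  "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "action=$name result=failed exit=$rc" >&2
  fi
  return "$rc"
}
```

For example, `run_action stop-vllm systemctl stop vllm.service` leaves a named failure line in the journal when the unit refuses to stop.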

12. Exact Transition Mapping to systemd Operations

This is the implementation-oriented mapping.

12.1 Assumptions

Systemd targets:

  • desktop.target
  • compute.target

studio-local is intentionally not a first-class target in v1. It is represented as a desktop overlay through studio-local-policy.service and audio-priority.service.

Supporting services:

  • mode-controller.service
  • vllm.service
  • k3s.service
  • pipewire.service / user session services
  • graphical session manager or direct Hyprland session

Helper oneshot services/scripts:

  • mode-prepare-compute.service
  • mode-prepare-desktop.service
  • mode-prepare-studio-local.service
  • mode-observe.service

12.2 Desktop -> StudioLocal

Desired change

  • desired mode file = studio-local

systemd operations

  1. systemctl start mode-controller@studio-local.service
  2. controller runs guard set for Desktop -> StudioLocal
  3. controller verifies local Studio capability still exists
  4. controller stops or constrains AI workloads as needed
    • v1 policy: systemctl stop vllm.service
  5. controller isolates or verifies desktop.target
  6. controller starts studio-local-policy.service
  7. controller starts audio-priority.service
  8. controller updates current state observation

Example exact operations

echo studio-local > /run/mode-controller/desired   # as root; assumes the file holds the bare mode name
systemctl start mode-controller@studio-local.service
systemctl stop vllm.service
systemctl isolate desktop.target
systemctl start studio-local-policy.service
systemctl start audio-priority.service

12.3 StudioLocal -> Desktop

Desired change

  • desired mode file = desktop

systemd operations

  1. write desired state
  2. start controller
  3. restore normal interactive policies
  4. optionally allow bounded AI services
  5. stop audio-priority.service
  6. stop studio-local-policy.service
  7. systemctl isolate desktop.target
  8. update current observation

Example exact operations

echo desktop > /run/mode-controller/desired   # as root; assumes the file holds the bare mode name
systemctl start mode-controller@desktop.service
systemctl stop audio-priority.service
systemctl stop studio-local-policy.service
systemctl isolate desktop.target

12.4 Desktop -> Compute

Desired change

  • desired mode file = compute

systemd operations

  1. write desired state
  2. start controller for compute
  3. run guards:
    • check_target_reachable
    • check_audio_idle
    • check_gpu_display_released (or prepare to release)
    • check_cpu_load_safe
    • check_user_jobs_safe
    • check_memory_headroom
  4. if interactive session exists, controller requests/forces session termination
    • loginctl terminate-session <id>
  5. wait until compositor releases GPU
  6. stop or de-prioritize audio services if needed
  7. stop desktop-specific services not wanted in compute
  8. set service environment/profile for dual-GPU vLLM
  9. systemctl isolate compute.target
  10. start/restart vllm.service
  11. verify current state

Example exact operations

echo compute > /run/mode-controller/desired   # as root; assumes the file holds the bare mode name
systemctl start mode-controller@compute.service
loginctl terminate-session <desktop-session>
systemctl stop graphical-session.target   # if such target exists in design
systemctl isolate compute.target
systemctl restart vllm.service
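Step 5, waiting until the compositor releases the GPU, needs a bounded wait so a stuck session ends in a failed transition instead of a hang. A minimal polling sketch follows. `wait_until` is a hypothetical helper, and the predicate it polls is supplied by the caller.

```shell
# Poll a command until it succeeds or the attempt budget runs out.
#   $1 maximum attempts
#   $2 seconds to sleep between attempts
#   remaining arguments: predicate command, exit 0 means "condition met"
wait_until() (
  attempts=$1
  interval=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      exit 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  exit 1
)
```

A controller might call `wait_until 30 1 check_gpu_display_released` after terminating the session and treat a non-zero result as a timeout.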

12.5 Compute -> Desktop

Desired change

  • desired mode file = desktop

systemd operations

  1. write desired state
  2. start controller for desktop
  3. run guards:
    • check_target_reachable
    • check_vllm_drainable
    • check_memory_headroom
  4. drain/stop or downscale vLLM
  5. constrain compute workloads
  6. systemctl isolate desktop.target
  7. start GUI path
  8. ensure GPU0 reserved for display
  9. start/restore audio path
  10. verify current state

Example exact operations

echo desktop > /run/mode-controller/desired   # as root; assumes the file holds the bare mode name
systemctl start mode-controller@desktop.service
systemctl stop vllm.service              # or restart single-GPU profile
systemctl isolate desktop.target

12.6 StudioLocal -> Compute

Two possible policies:

Policy A — direct guarded transition

Allowed if all compute guards pass and Studio-Local resources are cleanly relinquished.

Policy B — normalize through Desktop first

Transition path:

  • studio-local -> desktop -> compute

Recommendation: Use Policy A in implementation, but conceptually treat it as the same reconciliation pipeline with stricter guards.


13. Reconciliation Model

13.1 Motivation

A single mode request compute command should not blindly assume success. The system should:

  1. record desired mode
  2. observe current state
  3. compare desired vs current
  4. compute required transition plan
  5. execute actions
  6. re-observe
  7. either declare success or enter failed transition state

13.2 Reconciliation Loop

flowchart TD
    Req[Request mode] --> Write[Write desired state]
    Write --> Observe[Observe current state]
    Observe --> Compare{Desired == Current?}
    Compare -->|Yes| Done[No-op success]
    Compare -->|No| Plan[Select transition plan]
    Plan --> Guards[Run guards]
    Guards -->|Fail| Fail[Record failure]
    Guards -->|Pass| Act[Execute actions]
    Act --> Reobserve[Observe current state again]
    Reobserve --> Verify{Reached desired?}
    Verify -->|Yes| Success[Record success]
    Verify -->|No| RetryOrFail[Retry boundedly or fail]

13.3 Reconciliation Semantics

  • bounded retries only
  • no infinite loops
  • every failure is logged with:
    • desired state
    • prior state
    • failing guard or action
    • timestamp
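
The loop in 13.2 and the bounds in 13.3 can be sketched in shell as follows. This is a minimal sketch, not the controller itself: the function names, the default of three attempts, and the log format are assumptions made for illustration. The caller supplies two hypothetical hooks, one that prints the observed mode and one that runs guards and actions. A guard or action failure ends the attempt immediately, matching the flowchart; only a failed verification is retried.

```bash
MAX_ATTEMPTS="${MAX_ATTEMPTS:-3}"   # assumed bound; the design does not fix a number

log_failure() {
  # Fields from 13.3: desired state, prior state, failing step, timestamp.
  printf 'desired=%s prior=%s failing=%s ts=%s\n' \
    "$1" "$2" "$3" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >&2
}

reconcile_bounded() {
  # reconcile_bounded DESIRED OBSERVE_FN ACT_FN
  # OBSERVE_FN prints the observed mode; ACT_FN DESIRED runs guards and actions.
  local desired="$1" observe_fn="$2" act_fn="$3"
  local attempt=1 prior
  prior="$("$observe_fn")"
  if [ "$prior" = "$desired" ]; then return 0; fi   # no-op success
  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    if ! "$act_fn" "$desired"; then
      log_failure "$desired" "$prior" "guard-or-action (attempt $attempt)"
      return 1                                       # guard/action failure is not retried
    fi
    if [ "$("$observe_fn")" = "$desired" ]; then
      return 0                                       # converged
    fi
    log_failure "$desired" "$prior" "verify (attempt $attempt)"
    attempt=$((attempt + 1))
  done
  return 1                                           # bounded: enter failed-transition
}
```

A non-zero return is the signal to record the failed-transition state; the function never loops beyond MAX_ATTEMPTS.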

13.4 Why This Matters

This lets you support:

  • manual requests
  • idle-triggered auto-switching
  • boot-time default mode
  • recovery after partial failures

all through one mechanism.


14. Specialisations vs Runtime Switching

This is the main architectural fork.

14.1 Option A — Runtime Switching Only

Use one host definition with multiple systemd targets and runtime policies.

Pros

  • fast transitions
  • no reboot required
  • best UX for switching between Desktop and Studio-Local
  • simpler for day-to-day operation

Cons

  • weaker isolation
  • harder to fully guarantee all services/resources are cleanly re-bound
  • risk of state leakage between modes
  • some kernel/driver tuning differences are awkward live

Best fit

  • Desktop <-> Studio-Local
  • Desktop <-> Compute where flexibility matters more than hard isolation

14.2 Option B — NixOS Specialisations Only

Use separate NixOS specialisations for Desktop and Compute (and possibly Studio-Local).

Pros

  • stronger isolation between role profiles
  • easier to vary deeper system settings, kernel params, service sets
  • clearer recovery story
  • closer to “logical separate machines”

Cons

  • slower transitions, often reboot-oriented in practice
  • poorer UX for frequent switching
  • more configuration duplication risk if not structured well

Best fit

  • Desktop vs Compute if you want very strong separation
  • not ideal for rapid Studio-Local toggling

14.3 Option C — Hybrid Model

Use:

  • runtime switching for Desktop <-> Studio-Local
  • specialisation boundary between Interactive and Compute families

Example:

  • default specialisation = interactive
    • runtime modes inside it: desktop, studio-local
  • compute specialisation = headless compute

Pros

  • strongest overall architecture
  • preserves good UX for Studio-Local transitions
  • lets Compute differ more deeply if needed
  • handles future externalization of Studio more cleanly than treating Studio as a permanent top-level host identity

Cons

  • more design complexity
  • transition from interactive to compute may become reboot-oriented or at least heavier
  • more machinery to maintain

14.4 Recommendation

For your current goal, use runtime switching first, with the design shaped so it can later evolve into a hybrid model.

Reasoning

  • you need to learn actual contention boundaries first
  • Desktop <-> Studio-Local benefits heavily from live switching
  • Desktop <-> Compute can start as runtime-switched
  • if the system proves too “sticky” or leaky, you can later promote Compute into a specialisation without redesigning the higher-level state machine
  • if Studio moves to a Mac mini, the host-local model remains intact

Practical recommendation

Phase the design like this:

  1. Phase 1: one host, runtime switching only
  2. Phase 2: strong slices/targets/guards
  3. Phase 3: evaluate whether Compute should become a specialisation
  4. Phase 4: if Studio is externalized, deprecate or disable studio-local without changing the operator-facing control model

This preserves velocity while keeping the abstraction clean.


15. Service Placement

15.1 Host-Level Services

  • Hyprland
  • PipeWire
  • REAPER
  • NVIDIA drivers/runtime
  • mode controller
  • possibly vLLM initially
  • SSH / system services

15.2 k3s-Level Services

  • Hyperion services
  • platform/orchestration services
  • dashboards and supporting workloads
  • possibly model-serving abstractions later

First-pass implementation note

In v1, prefer keeping k3s.service continuously available while varying:

  • platform.slice resource budgets
  • which workloads are allowed to run aggressively
  • how much local compute capacity cluster workloads may consume

This is preferable to stopping and starting the cluster runtime during ordinary mode transitions.

15.3 Externalized Services (Possible Future)

  • Studio/Audio workflows on Mac mini
  • DAW/plugin-heavy sessions
  • live audio interfaces and controllers

15.4 Recommendation

Keep hardware-near, latency-sensitive, and GPU-debug-sensitive components on the host first. Move services into k3s only after the host-level mode model is stable. Treat Mac mini externalization as a placement decision, not as a redesign trigger for the host-local state machine.


16. Idle Detection Policy

16.1 Role of Idle Detection

Idle detection is an input signal to the reconciler, not an authority on its own.

16.2 Signals

  • input inactivity
  • audio activity
  • GPU utilization / ownership
  • CPU load
  • selected user-job checks

16.3 Policy

Idle-triggered promotion to Compute should:

  • update desired state to compute
  • run the normal reconciliation pipeline
  • abort safely if guards fail

It must never bypass guards.

16.4 Studio-Local Policy

Auto-promotion from studio-local to compute should generally be disabled unless explicitly requested. This remains true even if Studio capability later moves off-box.
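
The policy in 16.3 and 16.4 can be sketched as a pure decision function. The function name, the argument shape, and the 1800-second threshold are assumptions for illustration; the real detector would also weigh GPU, CPU, and user-job signals. The function only decides whether to update desired state. It does not run or skip guards, so the normal reconciliation pipeline still applies.

```bash
IDLE_THRESHOLD_SECONDS="${IDLE_THRESHOLD_SECONDS:-1800}"   # assumed value

idle_decision() {
  # idle_decision CURRENT_MODE INPUT_IDLE_SECONDS AUDIO_ACTIVE(yes|no)
  # Prints "request-compute" or "stay".
  local current="$1" idle_seconds="$2" audio_active="$3"
  case "$current" in
    studio-local) echo stay; return 0 ;;   # 16.4: no auto-promotion from studio-local
    compute)      echo stay; return 0 ;;   # already in the target mode
  esac
  if [ "$audio_active" = yes ] || [ "$idle_seconds" -lt "$IDLE_THRESHOLD_SECONDS" ]; then
    echo stay
  else
    echo request-compute                   # caller then runs: mode request compute
  fi
}
```

For example, `idle_decision desktop 3600 no` prints `request-compute`, while any call with `studio-local` as the current mode prints `stay`.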


17. Security Boundaries

Zones

  • user desktop zone
  • system service zone
  • AI workload zone
  • cluster service zone
  • optional external Studio zone

Controls

  • bind services to appropriate interfaces
  • keep secrets outside dotfiles, e.g. SOPS/agenix
  • keep mode control operations privileged and auditable
  • do not let externalized capability assumptions silently weaken host-local controls

18. Risks and Failure Modes

18.1 Audio Degradation

Cause:

  • background contention

Mitigation:

  • Studio-Local invariants
  • strict guard/action policy

18.2 GPU Contention

Cause:

  • compositor and AI workloads racing for ownership

Mitigation:

  • explicit GPU ownership model
  • guard checks before Compute promotion

18.3 Partial Transition

Cause:

  • GUI exits but vLLM fails to restart
  • desired state written but current state never converges

Mitigation:

  • reconciliation loop
  • bounded retries
  • failed-transition state

18.4 Configuration Drift

Cause:

  • policy split across ad hoc scripts and dotfiles

Mitigation:

  • keep mode policy in Nix + systemd-controlled scripts

18.5 Capability Drift

Cause:

  • Studio capability moved to Mac mini, but local state machine or guards still assume it is local

Mitigation:

  • explicit capability placement model
  • check_studio_capability_local
  • ADR-backed deprecation path for studio-local

19. Open Questions

  1. Should vLLM be host-managed or profile-switched through separate unit templates?
  2. When should Compute graduate into a NixOS specialisation?
  3. How strict should auto-transition be about user jobs and unsaved work heuristics?
  4. Should current state be derived on demand only, or also cached to /run/mode-controller/current?
  5. At what point should local Studio capability be considered officially externalized to a Mac mini?
  6. What data/project sync model is required if Studio is split across machines?

19.1 Resolved Near-Term Decision

For v1:

  • studio-local is not a first-class target
  • studio-local is represented as a protected interactive policy overlay on desktop
  • desktop and compute are the only first-class top-level target families

This keeps the first implementation smaller while preserving the higher-level operational model and leaving room to strengthen Studio semantics later if needed.

19.2 Future Alternatives

Alternative A — Keep studio-local as an overlay permanently

Pros:

  • less target duplication
  • easier future deprecation if Studio moves to a Mac mini
  • simpler runtime switching model

Cons:

  • weaker systemd-level separability
  • more policy encoded in helper units and controller logic

Alternative B — Promote studio-local into a first-class target later

Pros:

  • stronger explicitness in systemd
  • easier inspection of Studio-specific dependencies
  • potentially clearer resource-policy boundaries

Cons:

  • higher maintenance cost
  • more duplication with desktop
  • less aligned with the likely future externalization path

Recommendation

Start with the overlay model. Revisit only if empirical evidence shows that audio-protection policy is too hard to express or validate without a dedicated target.

19.3 Resolved Near-Term Decision — vLLM Service Shape

Target architecture:

  • vllm@desktop.service
  • vllm@compute.service

However, for the first implementation pass, a single vllm.service is acceptable if:

  • desktop and compute profiles are still modeled explicitly in configuration
  • controller actions remain profile-aware
  • observation logic can still determine which profile is active

This allows the first bootable milestone to stay small without locking the architecture into a monolithic service model.

19.4 Resolved Near-Term Decision — k3s Service Shape

For v1:

  • k3s.service should remain stable across host-local modes
  • mode differences should be expressed through:
    • slice/resource budgets
    • workload-placement or workload-intensity policy
    • optional node labels/taints later

This keeps the control plane smaller and avoids coupling every host-mode transition to cluster-runtime teardown and recovery.

Future alternative

If empirical operation shows that stable-across-modes k3s still creates unacceptable interference or ambiguity, stronger k3s mode switching can be introduced later. That should be treated as a deliberate escalation, not the default starting point.

19.5 Resolved Near-Term Decision — Desktop AI Policy

For v1:

  • keep vLLM off in desktop for the first convergence milestone
  • prove desktop <-> compute transitions before enabling bounded desktop-mode AI

Future alternative

After the control plane is reliable, bounded desktop-mode AI may be introduced as an explicit profile with clear GPU1 ownership and resource limits.

19.6 Resolved Near-Term Decision — studio-local Overlay Shape

For v1, represent studio-local with:

  • studio-local-policy.service
  • audio-priority.service

This gives the controller and observation logic a clear marker plus an explicit enforcement unit without promoting Studio into a first-class top-level target.

Future alternative

If this proves too implicit, studio-local can later be promoted into a stronger grouped target or target-like overlay.

19.7 Resolved Near-Term Decision — Capability Placement Source

For v1, capability-placement.json should be generated from Nix configuration rather than edited ad hoc at runtime.

Rationale

  • keeps placement policy reproducible
  • avoids silent runtime drift
  • matches the design goal that machine-critical policy remain inspectable in Nix and systemd-managed artifacts

Future alternative

If operational experimentation later requires it, an explicit runtime override layer may be added with well-defined precedence and auditability.

19.8 Resolved Near-Term Decision — mode force

For v1, defer mode force.

Rationale

  • keeps attention on making the ordinary reconciliation path correct
  • avoids masking immature guard or transition logic
  • reduces the chance of bypassing safety boundaries during initial bring-up

Future alternative

Add mode force later only after hard-vs-soft guard semantics are stable and well tested.

19.9 Resolved Near-Term Decision — GUI Teardown Semantics

For v1, compute promotion should require:

  • graphical session absence
  • explicit GPU-release verification

It should not initially depend on forcibly stopping every greeter or display-manager path unless empirical testing shows those components interfere with reliable GPU handoff.

19.10 Resolved Near-Term Decision — Desktop Target Ownership

For v1, desktop.target should not directly own the greeter/login path.

Rationale

  • keeps mode ownership focused on operational policy rather than full session-manager orchestration
  • reduces coupling to whichever login/session stack is chosen
  • lets session presence remain an observed fact rather than an aggressively managed requirement

Future alternative

If desktop recovery proves unreliable without tighter control, greeter or display-manager paths can later be pulled under stronger mode ownership.

19.11 Resolved Near-Term Decision — studio-local-policy.service Scope

For v1, studio-local-policy.service should be:

  • a reliable marker for observation/classification
  • a light policy-application unit
  • explicitly limited in scope

It should not become a giant all-in-one Studio behavior controller.

Rationale

  • preserves clear observability
  • avoids burying controller logic inside a catch-all helper unit
  • keeps Studio overlay behavior inspectable and decomposable

19.12 Resolved Near-Term Decision — observe-current Implementation Language

For v1, implement observe-current in shell.

Constraints

  • keep the output contract stable:
    • plain mode name for shell use
    • structured JSON for diagnostics
  • structure the implementation so it can later be replaced by a typed helper without changing callers

Future alternative

If classifier complexity or JSON handling becomes unwieldy, replace only the classifier implementation with a small typed helper while keeping the same external contract.

19.13 Resolved Near-Term Decision — mode CLI Packaging

For v1:

  • keep the script sources in the repository
  • package them in pkgs/
  • install them through the NixOS module

Rationale

  • keeps the tool packaging clean and testable
  • avoids scattering ad hoc scripts directly into module definitions
  • preserves a clean path to reuse across hosts later

19.14 Resolved Near-Term Decision — Reconciler Trigger Model

For v1:

  • use parameterized oneshot reconciliation only
  • do not enable timer-driven or path-triggered background reconciliation yet

Rationale

  • keeps failure behavior easier to understand during bring-up
  • avoids masking transition bugs behind background retries
  • lets manual transitions prove the model first

Future alternative

After manual transitions are reliable, add periodic or path-triggered reconciliation for self-healing behavior.

19.15 Resolved Near-Term Decision — Boot Policy

For v1:

  • normalize to desktop on boot
  • do not replay persistent desired mode across reboot

Rationale

  • gives the system a predictable safe recovery posture
  • avoids booting directly back into a problematic compute path while the controller is still maturing
  • keeps early operational behavior easier to reason about

Future alternative

Once transitions are reliable, desired-state persistence across reboot can be introduced as an explicit policy feature.


19A. Architectural Decision Record — Potential Studio Externalization

Context

There is a realistic possibility that low-latency Studio/Audio workloads will migrate from the NixOS machine to a Mac mini.

Decision

The NixOS host architecture should treat Studio as a conditional local profile (studio-local) rather than a permanently central host mode.

Consequences

  • the host-local state machine remains stable if Studio moves off-box
  • Compute and Desktop remain the durable primary host-local modes
  • Studio capability can be represented separately through workload placement decisions
  • local audio support can still exist now without overcommitting the architecture to a permanent local Studio role

Follow-on Design Implications

  • add check_studio_capability_local guard for any studio-local transition
  • keep local audio policy isolated from core Compute/Desktop mechanics where practical
  • document future sync, control, and workflow boundaries if Studio becomes externalized

20. Control Interface and Implementation Contract

20.1 mode CLI Contract

The system should expose a single operator-facing interface:

mode status
mode request <desktop|studio-local|compute>
mode reconcile
mode current
mode desired
mode explain <desktop|studio-local|compute>
mode dry-run <desktop|studio-local|compute>
mode force <desktop|studio-local|compute>

Command Semantics

mode status

Returns:

  • desired mode
  • observed current mode
  • whether reconciliation is needed
  • last transition result
  • blocking guard failures, if any

mode request <mode>

Behavior:

  1. write desired state
  2. invoke reconciliation
  3. return success only if reconciliation converged
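
A minimal sketch of the three steps above, assuming a `MODE_STATE_DIR` override for testing and a hypothetical `run_reconcile` hook. The return codes 18 and 20 are borrowed from the guard exit-code convention as an assumption; the contract does not define CLI exit codes.

```bash
MODE_STATE_DIR="${MODE_STATE_DIR:-/run/mode-controller}"

# Hypothetical hook: in v1 this would start the parameterized oneshot unit.
run_reconcile() { systemctl start "mode-controller@$1.service"; }

mode_request() {
  local requested="$1"
  case "$requested" in
    desktop|studio-local|compute) ;;
    *) echo "invalid mode: $requested" >&2; return 18 ;;   # assumed: invalid request
  esac
  # Step 1: write desired state via a temp file so readers never see a partial write.
  if ! printf '%s\n' "$requested" > "$MODE_STATE_DIR/desired.tmp" ||
     ! mv "$MODE_STATE_DIR/desired.tmp" "$MODE_STATE_DIR/desired"; then
    return 20                                               # assumed: execution error
  fi
  # Steps 2 and 3: the exit status is the reconciler's, so success means convergence.
  run_reconcile "$requested"
}
```

Desired state is written even when reconciliation later fails, which keeps the gap between desired and observed state visible to `mode status`.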

mode reconcile

Behavior:

  • observe current state
  • compare to desired
  • select transition plan
  • run guards
  • execute actions
  • record results

mode current

Returns only the observed current mode.

mode desired

Returns only the desired mode file contents.

mode explain <mode>

Prints:

  • target state properties
  • expected services
  • resource ownership rules
  • guards required for entering that mode
  • capability placement assumptions, where relevant

mode dry-run <mode>

Simulates the full reconciliation plan without mutating state.

mode force <mode>

Privileged path that bypasses selected non-safety guards, but must never bypass hard safety guards such as GPU/display or active audio protections unless explicitly designed to allow that.

Implementation note:

  • defer this command in v1
  • keep it in the long-term interface contract so the design remains forward-compatible

21. State Storage Layout

21.1 Runtime State Paths

/run/mode-controller/
  desired
  current
  lock
  last-transition.json
  last-guards.json
  reconcile.pid
  capability-placement.json
  hardware-topology.json

21.2 File Semantics

desired

Contains the requested mode:

  • desktop
  • studio-local
  • compute

current

Cached observation of current state. This is convenience state only; it must be derivable from system facts.

lock

Used to serialize reconciliation so only one transition runs at a time.

last-transition.json

Stores:

  • requested mode
  • prior observed mode
  • final observed mode
  • success/failure
  • guard results
  • action results
  • timestamps
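
One way to write this record from shell is sketched below. The JSON field names are assumptions, and guard and action results are left out to keep the example short. Values are expected to be simple tokens such as mode names, so no JSON escaping is applied.

```bash
write_transition_record() {
  # write_transition_record OUT_FILE REQUESTED PRIOR FINAL RESULT(success|failure)
  local out="$1" requested="$2" prior="$3" final="$4" result="$5"
  local ts
  ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  # Write to a temp file and rename so a reader never sees a half-written record.
  printf '{"requested":"%s","prior":"%s","final":"%s","result":"%s","timestamp":"%s"}\n' \
    "$requested" "$prior" "$final" "$result" "$ts" > "$out.tmp" &&
    mv "$out.tmp" "$out"
}
```

A call such as `write_transition_record /run/mode-controller/last-transition.json compute desktop compute success` replaces the previous record in one step.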

last-guards.json

Stores latest guard results for diagnostics.

capability-placement.json

Stores environment-level placement facts, for example:

  • studio: local
  • studio: external-mac-mini

This file is not the host-local mode source of truth. It is an environment metadata input used by guards and planning logic.

hardware-topology.json

Stores the currently configured hardware view, for example:

  • planned GPU count
  • currently present GPU indexes
  • display GPU assignment
  • desktop-mode AI GPU set
  • compute-mode AI GPU set

This allows the implementation to preserve the intended dual-GPU architecture while remaining tolerant of temporary single-GPU bring-up phases.


22. systemd Unit and Target Layout

22.1 Targets

desktop.target

Wants:

  • graphical-session target path
  • bounded interactive services
  • optional constrained AI services

First-pass implementation note:

  • do not make desktop.target directly own greeter/login-manager startup in v1
  • treat graphical session presence as an observed runtime fact
  • strengthen ownership later only if empirical recovery behavior requires it

compute.target

Wants:

  • headless service profile
  • vLLM compute profile
  • k3s compute-allowed policy/profile

22.2 Core Services

mode-controller@.service

Parameterized oneshot service.

Instance values:

  • mode-controller@desktop.service
  • mode-controller@studio-local.service
  • mode-controller@compute.service

Responsibilities:

  • load desired mode
  • observe current mode
  • run reconciliation
  • update state files and logs

First-pass implementation note:

  • use this parameterized oneshot service as the sole reconciler trigger in v1
  • defer timer/path-triggered background reconciliation until manual operation is proven reliable

mode-observe.service

Optional oneshot helper to compute observed current mode and refresh /run/mode-controller/current.

vllm@.service

Optional templated service for profile-specific operation:

  • vllm@desktop.service
  • vllm@studio-local.service
  • vllm@compute.service

Alternative:

  • single vllm.service with environment file switching

First-pass implementation guidance:

  • prefer separate desktop and compute profiles conceptually
  • studio-local should not require its own dedicated vLLM unit in v1 if Studio is implemented as a desktop overlay
  • a single vllm.service is acceptable initially if it preserves a clean migration path to templated units later
  • keep desktop-mode vLLM disabled for the first transition-proof milestone

mode-guard@.service

Optional wrapper pattern for reusable guard execution, though plain scripts may be simpler initially.

studio-local overlay units

Recommended first-pass representation:

  • audio-priority.service
  • studio-local-policy.service
  • optional environment/policy file consumed by observation and guard logic

These units should layer on top of desktop.target rather than replacing it with a distinct top-level target in v1.

Recommended scope for studio-local-policy.service:

  • expose a clear mode marker
  • apply only light, explicit Studio-specific policy
  • delegate heavyweight orchestration to the controller or dedicated helper units

22.3 Suggested Slice Layout

system.slice
├── interactive.slice
│   ├── graphical-session scope/services
│   ├── audio-related helpers
│   └── bounded desktop workloads
├── ai.slice
│   ├── vllm service
│   └── AI helpers
└── platform.slice
    ├── k3s service
    └── supporting infra services

Slice Intent

  • interactive.slice gets priority and headroom in Desktop/Studio-Local
  • ai.slice is heavily constrained in Studio-Local, moderately constrained in Desktop, relaxed in Compute
  • platform.slice remains comparatively stable but may have tighter resource budgets in interactive modes and relaxed budgets in Compute

23. Current State Observation Logic

Current state must be observed, not assumed.

23.1 Observation Inputs

GUI Indicators

  • graphical.target or session-specific equivalent active
  • active user session via loginctl
  • Hyprland process/session present

Audio Indicators

  • PipeWire user service active
  • active audio clients or REAPER process
  • optional JACK graph activity

AI Indicators

  • vllm*.service active
  • environment/profile indicates single-GPU or dual-GPU mode
  • optional nvidia-smi-based observation of active GPU usage

Platform Indicators

  • k3s.service active
  • optional workload-class indicators

23.2 Observation Heuristic

Observed mode should be derived using a deterministic classifier.

Proposed classifier logic

Observe compute

If all of the following are true:

  • no active graphical session
  • compute target active or compute service profile active
  • vLLM compute profile active or both GPUs assigned to AI policy

Then observed current mode = compute

Observe studio-local

If all of the following are true:

  • graphical session active
  • audio stack active
  • studio-local policy marker active
  • AI profile disabled or highly constrained

Then observed current mode = studio-local

Observe desktop

If all of the following are true:

  • graphical session active
  • desktop policy marker active
  • no studio-local policy marker

Then observed current mode = desktop

Observe transitioning

If:

  • desired != inferred stable mode
  • controller is running or lock file exists

Then observed current mode = transitioning

Observe failed-transition

If:

  • last transition failed
  • current does not match desired
  • no controller currently reconciling

Then observed current mode = failed-transition
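
The classifier rules above can be sketched as two pure functions. Evidence gathering (loginctl, systemctl, nvidia-smi) is deliberately left out; the yes/no variable names and the `unknown` fallback are assumptions for illustration. The first function infers a stable mode from evidence, and the second overlays the transitioning and failed-transition states.

```bash
classify_stable_mode() {
  # Reads yes/no evidence from variables; anything unset counts as "no".
  local gui="${GUI:-no}" audio="${AUDIO:-no}"
  local compute_target="${COMPUTE_TARGET:-no}" vllm_compute="${VLLM_COMPUTE:-no}"
  local studio_marker="${STUDIO_MARKER:-no}" desktop_marker="${DESKTOP_MARKER:-no}"
  local ai_constrained="${AI_CONSTRAINED:-no}"
  if [ "$gui" = no ] && [ "$compute_target" = yes ] && [ "$vllm_compute" = yes ]; then
    echo compute
  elif [ "$gui" = yes ] && [ "$audio" = yes ] &&
       [ "$studio_marker" = yes ] && [ "$ai_constrained" = yes ]; then
    echo studio-local
  elif [ "$gui" = yes ] && [ "$desktop_marker" = yes ] && [ "$studio_marker" = no ]; then
    echo desktop
  else
    echo unknown          # assumed fallback when no rule matches
  fi
}

classify_mode() {
  # classify_mode STABLE DESIRED RECONCILING(yes|no) LAST_FAILED(yes|no)
  local stable="$1" desired="$2" reconciling="$3" last_failed="$4"
  if [ "$stable" != "$desired" ] && [ "$reconciling" = yes ]; then
    echo transitioning
  elif [ "$stable" != "$desired" ] && [ "$last_failed" = yes ]; then
    echo failed-transition
  else
    echo "$stable"
  fi
}
```

Keeping the rules free of side effects makes the classifier deterministic for a given set of evidence and easy to exercise without a running desktop.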

23.3 Recommendation

Use a small classifier script:

/usr/local/libexec/mode-controller/observe-current

Outputs:

  • plain mode name for shell use
  • optional JSON with evidence for debugging

First-pass implementation note:

  • implement this in shell first
  • preserve a stable output contract so the implementation language can change later without changing the control plane

24. Guard Function Contract

24.1 Guard Naming

check_audio_idle
check_gpu_display_released
check_cpu_load_safe
check_user_jobs_safe
check_memory_headroom
check_vllm_drainable
check_graphical_session_absent
check_graphical_session_present
check_target_reachable
check_studio_capability_local

24.2 Exit Code Convention

0   pass
10  policy block: audio active
11  policy block: display GPU still owned
12  policy block: CPU load too high
13  policy block: user jobs active
14  policy block: insufficient memory headroom
15  policy block: vLLM not drainable
16  policy block: graphical session absent when required
17  policy block: graphical session present when forbidden
18  policy block: target unreachable / invalid request
19  policy block: requested local studio capability not available
20-29 execution/inspection errors
30+   internal controller misuse
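
A small helper can turn these codes back into readable reasons for logs and `mode status`. This is a sketch; the function names are assumptions, and it reads the execution-error range as 20 through 29 because 30 and above is assigned separately.

```bash
describe_guard_code() {
  case "$1" in
    0)  echo "pass" ;;
    10) echo "policy block: audio active" ;;
    11) echo "policy block: display GPU still owned" ;;
    12) echo "policy block: CPU load too high" ;;
    13) echo "policy block: user jobs active" ;;
    14) echo "policy block: insufficient memory headroom" ;;
    15) echo "policy block: vLLM not drainable" ;;
    16) echo "policy block: graphical session absent when required" ;;
    17) echo "policy block: graphical session present when forbidden" ;;
    18) echo "policy block: target unreachable / invalid request" ;;
    19) echo "policy block: requested local studio capability not available" ;;
    *)
      # Non-numeric input makes both tests fail and falls through to "unknown code".
      if [ "$1" -ge 30 ] 2>/dev/null; then
        echo "internal controller misuse"
      elif [ "$1" -ge 20 ] 2>/dev/null; then
        echo "execution/inspection error"
      else
        echo "unknown code"
      fi ;;
  esac
}

is_policy_block() {
  # Succeeds only for the policy-block range 10..19.
  [ "$1" -ge 10 ] 2>/dev/null && [ "$1" -le 19 ]
}
```

Separating policy blocks from execution errors lets the controller treat "guard said no" differently from "guard could not be evaluated".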

24.3 Guard Output Contract

Each guard should emit a concise structured line or JSON object such as:

{"guard":"check_audio_idle","ok":false,"code":10,"reason":"reaper process active"}
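
A guard script could produce that line with a helper like the one below. This is a sketch: the helper names are assumptions, and the escaping covers only backslashes and double quotes, which is enough for short reason strings but not for arbitrary text.

```bash
json_escape() {
  # Escape backslashes first, then double quotes. Control characters are out of scope.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

emit_guard_result() {
  # emit_guard_result GUARD_NAME EXIT_CODE REASON
  # Prints one JSON line and returns EXIT_CODE so the guard can end with this call.
  local guard="$1" code="$2" reason="$3" ok=false
  [ "$code" -eq 0 ] && ok=true
  printf '{"guard":"%s","ok":%s,"code":%d,"reason":"%s"}\n' \
    "$guard" "$ok" "$code" "$(json_escape "$reason")"
  return "$code"
}
```

Calling `emit_guard_result check_audio_idle 10 "reaper process active"` prints the example line shown above and returns 10, so the structured output and the exit code cannot drift apart.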

24.4 Hard vs Soft Guards

Hard guards

Must never be bypassed by ordinary automation:

  • active audio protection for Studio-Local -> Compute or Desktop -> Compute
  • GPU/display ownership guard
  • target validity checks
  • local Studio capability checks for studio-local

Soft guards

May be bypassed by privileged operator action or policy:

  • generic CPU load threshold
  • selected user-job heuristics
  • non-critical memory thresholds
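
The hard/soft split can be encoded as a lookup so the controller never decides bypass eligibility ad hoc. In this sketch the mapping from guard names to classes is an assumption derived from the lists above, and guards the document does not classify (for example check_vllm_drainable) are treated as hard so that unknown guards fail closed.

```bash
guard_class() {
  case "$1" in
    check_audio_idle|check_gpu_display_released|check_target_reachable|check_studio_capability_local)
      echo hard ;;
    check_cpu_load_safe|check_user_jobs_safe|check_memory_headroom)
      echo soft ;;
    *)
      echo hard ;;   # assumption: unclassified guards are never bypassable
  esac
}

may_bypass() {
  # may_bypass GUARD_NAME PRIVILEGED(yes|no)
  # Succeeds only for a soft guard under privileged operator action.
  [ "$(guard_class "$1")" = soft ] && [ "$2" = yes ]
}
```

With this shape, ordinary automation always calls `may_bypass` with `no`, so it can never skip a guard of either class.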

25. Transition Plans with Exact Operations

This section normalizes each transition into explicit steps.

25.1 Common Transition Framework

All transitions should follow:

  1. acquire lock
  2. observe current state
  3. validate requested mode
  4. if current == desired, exit success
  5. select transition plan
  6. run transition guards
  7. execute pre-actions
  8. isolate or start target
  9. execute post-actions
  10. re-observe current state
  11. record success/failure
  12. release lock

25.2 Plan: Desktop -> StudioLocal

Preconditions

  • desktop currently observed
  • request = studio-local
  • local Studio capability is still hosted on the NixOS machine

Guards

  • check_target_reachable
  • check_studio_capability_local
  • optional check_user_jobs_safe

Exact operations

write desired=studio-local
flock /run/mode-controller/lock
observe current
run guards
systemctl start audio-priority.service      # if modeled separately
systemctl start studio-local-policy.service
observe current
record result

Notes

  • GUI remains up
  • audio policy is strengthened
  • AI capacity is reduced or removed
  • if Studio capability has been externalized, this transition must fail cleanly with an explanatory reason

25.3 Plan: StudioLocal -> Desktop

Guards

  • check_target_reachable

Exact operations

write desired=desktop
flock /run/mode-controller/lock
observe current
run guards
systemctl stop audio-priority.service       # if separate helper exists
systemctl stop studio-local-policy.service
systemctl isolate desktop.target
observe current
record result

25.4 Plan: Desktop -> Compute

Guards

  • check_target_reachable
  • check_audio_idle
  • check_cpu_load_safe
  • check_user_jobs_safe
  • check_memory_headroom

Pre-actions

  • terminate graphical session
  • wait for GUI disappearance
  • verify GPU/display release

Exact operations

write desired=compute
flock /run/mode-controller/lock
observe current
run initial guards
loginctl terminate-session <session-id>
wait until observe-current no longer sees graphical session
run check_gpu_display_released
systemctl isolate compute.target
systemctl start vllm@compute.service
observe current
record result

Additional notes

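  • compute.target should conflict with interactive/graphical targets in your target design, so that systemctl isolate compute.target stops them; the target also needs AllowIsolate=yes to be usable with systemctl isolate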
  • GPU release must be verified after GUI shutdown, not merely assumed

25.5 Plan: Compute -> Desktop

Guards

  • check_target_reachable
  • check_vllm_drainable
  • check_memory_headroom

Exact operations

write desired=desktop
flock /run/mode-controller/lock
observe current
run guards
systemctl stop vllm@compute.service         # or downscale path
systemctl isolate desktop.target
systemctl start vllm@desktop.service        # optional bounded single-GPU profile
observe current
record result

Notes

  • graphical session may be started by display manager or login path depending on design
  • GPU0 becomes protected for display once Desktop converges

25.6 Plan: StudioLocal -> Compute

Preferred behavior

Treat as a direct guarded transition using the same compute-entry pipeline.

Guards

  • check_target_reachable
  • check_audio_idle
  • check_cpu_load_safe
  • check_user_jobs_safe
  • check_memory_headroom

Exact operations

write desired=compute
flock /run/mode-controller/lock
observe current
run guards
loginctl terminate-session <session-id>
wait until graphical session absent
run check_gpu_display_released
systemctl isolate compute.target
systemctl start vllm@compute.service
observe current
record result

Policy note

Because Studio-Local is the most protected interactive mode, auto-promotion from Studio-Local to Compute should generally be disabled unless explicitly requested.
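
The plans in 25.2 through 25.6 can be collected into two lookups: one from the observed/desired pair to a plan, and one from a plan to its initial guards. The plan names are hypothetical labels, and returning 18 for a pair with no plan is an assumption based on the "target unreachable / invalid request" code. The post-teardown check_gpu_display_released step and the optional check_user_jobs_safe in 25.2 are left to the plan body.

```bash
select_plan() {
  # select_plan CURRENT DESIRED -> prints a plan name, or returns 18 if none exists.
  # The caller handles CURRENT == DESIRED (step 4 of the common framework).
  case "$1:$2" in
    desktop:studio-local) echo desktop-to-studio-local ;;
    studio-local:desktop) echo studio-local-to-desktop ;;
    desktop:compute)      echo desktop-to-compute ;;
    compute:desktop)      echo compute-to-desktop ;;
    studio-local:compute) echo studio-local-to-compute ;;
    *) return 18 ;;
  esac
}

plan_guards() {
  # plan_guards PLAN -> prints the space-separated initial guard list for that plan.
  case "$1" in
    desktop-to-studio-local)
      echo "check_target_reachable check_studio_capability_local" ;;
    studio-local-to-desktop)
      echo "check_target_reachable" ;;
    desktop-to-compute|studio-local-to-compute)
      echo "check_target_reachable check_audio_idle check_cpu_load_safe check_user_jobs_safe check_memory_headroom" ;;
    compute-to-desktop)
      echo "check_target_reachable check_vllm_drainable check_memory_headroom" ;;
    *) return 18 ;;
  esac
}
```

There is no entry for compute to studio-local because no such plan is defined; under this sketch that request is rejected rather than routed implicitly through desktop.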


26. NixOS Specialisations vs Runtime Switching — Decision Guidance

26.1 Decision Matrix

| Criterion | Runtime Switching | Specialisations | Hybrid |
|---|---|---|---|
| Desktop <-> Studio-Local speed | Excellent | Poor | Excellent |
| Desktop <-> Compute isolation | Moderate | Strong | Stronger |
| Complexity | Lower | Moderate | Highest |
| Early experimentation | Best | Slower | Moderate |
| Deep kernel/boot divergence | Weak | Strong | Strong |
| Operational convenience | High | Lower | Moderate |
| Future externalization of Studio | Good | Good | Best |

Adopt runtime switching now unless one or more of the following become true:

  1. compute mode needs materially different kernel parameters or boot-time config
  2. graphical/interactive teardown proves unreliable in practice
  3. GPU role handoff remains too leaky under runtime-only switching
  4. you want Compute to be operationally closer to a dedicated server persona than a temporary mode

If any two of the above become persistent problems, promote Compute into a specialisation.

Phase 1

  • single NixOS host definition
  • runtime switching only
  • targets + slices + controller + guards

Phase 2

  • strengthen target separation
  • gather empirical failure/latency data

Phase 3

  • if needed, introduce specialisation.compute
  • preserve same desired/current/reconcile interface so operator UX does not change

Phase 4

  • if Studio is externalized, deprecate or disable studio-local
  • retain the same operator-facing control model for the host-local system

That means mode request compute could later choose:

  • runtime reconcile, or
  • request/reboot into compute specialisation

without changing the higher-level model.


27. Recommended Next Implementation Steps

  1. define exact systemd target dependencies/conflicts in Nix
  2. implement mode CLI wrapper script
  3. implement observe-current
  4. implement guard scripts with fixed exit-code contract
  5. choose between:
    • vllm@desktop.service / vllm@compute.service
    • one service with profile env file
  6. define slice resource policies for interactive vs AI
  7. wire idle detector to mode request compute
  8. validate transition behavior manually before enabling automation
  9. add a capability-placement flag/model for future Studio externalization

28. Summary

This system should behave like a reconciled state machine for host-local operational modes.

The core model is:

  • desired mode is explicit runtime intent
  • current mode is observed reality
  • reconciliation closes the gap
  • guards prevent unsafe transitions
  • systemd targets/services perform the actual mode enactment

The implementation should start with runtime switching, but preserve a clean path to hybrid specialisation if operational evidence justifies stronger separation later.

Studio/Audio should be treated as a conditional local profile plus a capability-placement decision, so that a future move to a Mac mini does not invalidate the host-local architecture.

Mode State Machine Design (v0.1 — Living)

Purpose

Define an explicit, enforceable state machine governing operational modes for a dual-use NixOS system (desktop + AI compute), including states, transitions, guards, and actions.


1. State Definitions

S0: Boot

  • Initial system state
  • Minimal services active
  • Transitions automatically to default mode

S1: Desktop (Dev)

  • Interactive workstation mode
  • Balanced resource usage
  • GUI + audio enabled
  • Limited AI workloads allowed

S2: Studio (Audio)

  • Strict low-latency mode
  • Audio prioritized
  • AI workloads disabled or near-zero

S3: Compute (Headless)

  • Throughput-oriented mode
  • No GUI
  • Full AI utilization (multi-GPU)

S4: Transitioning

  • Temporary state
  • Ensures safe handoff between modes

2. State Diagram

stateDiagram-v2
    [*] --> Boot

    Boot --> Desktop : default

    Desktop --> Studio : enter_studio
    Studio --> Desktop : exit_studio

    Desktop --> Transitioning : to_compute
    Transitioning --> Compute : success
    Transitioning --> Desktop : abort

    Compute --> Transitioning : to_desktop
    Transitioning --> Desktop : success

    Studio --> Desktop : enforced_exit

3. State Properties

Desktop

  • GUI: ON
  • Audio: ON
  • GPU0: Display
  • GPU1: AI (optional)
  • vLLM: constrained (1 GPU)
  • k3s: control plane only

Studio

  • GUI: ON
  • Audio: RT priority
  • GPU0: Display (exclusive)
  • GPU1: disabled or minimal
  • vLLM: OFF
  • k3s: minimal

Compute

  • GUI: OFF
  • Audio: OFF/minimal
  • GPU0 + GPU1: AI
  • vLLM: multi-GPU
  • k3s: full workloads

4. Transitions

T1: Desktop → Studio

Trigger: user command

Guards:

  • No active compute jobs above threshold

Actions:

  • Reduce/stop vLLM
  • Raise audio priority
  • Restrict background jobs

T2: Studio → Desktop

Trigger: user command

Guards: none

Actions:

  • Restore normal scheduling
  • Allow background workloads

T3: Desktop → Compute

Trigger:

  • manual command
  • idle-triggered event

Guards:

  • No active audio sessions (PipeWire graph empty)
  • No REAPER process OR project inactive
  • GPU not held by compositor
  • CPU load below threshold
  • No long-running user jobs

Actions:

  1. Notify user (if interactive)
  2. Terminate GUI session
  3. Wait for GPU release
  4. Stop audio services
  5. Expand vLLM to multi-GPU
  6. Enable compute services (k3s workloads)

T4: Compute → Desktop

Trigger: user command

Guards:

  • vLLM can scale down OR be stopped
  • GPU memory can be reclaimed

Actions:

  1. Drain or stop AI workloads
  2. Reduce vLLM to single GPU or stop
  3. Start graphical target
  4. Reassign GPU0 to display
  5. Start audio stack

T5: Studio → Compute

Trigger: (not allowed)

Policy:

  • Must transition via Desktop
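
The transition rules T1 through T5 can be captured as a small lookup. The sketch below is illustrative only: `transition_allowed` and `plan_route` are hypothetical helpers, and whether the controller performs the intermediate Desktop hop automatically or simply rejects the request is a design choice this section leaves open.

```shell
#!/usr/bin/env bash
# Sketch only: transition_allowed and plan_route are illustrative helpers.
# Returns 0 when the state machine permits a direct from -> to transition.
transition_allowed() {
  case "$1:$2" in
    desktop:studio|studio:desktop) return 0 ;;
    desktop:compute|compute:desktop) return 0 ;;
    *) return 1 ;;   # includes studio:compute, which must route via desktop
  esac
}

# Expand a request into the hops needed to honor the "via Desktop" policy.
plan_route() {
  local from="$1" to="$2"
  if [ "$from" = "$to" ]; then
    echo "$from"
  elif transition_allowed "$from" "$to"; then
    echo "$from $to"
  elif transition_allowed "$from" desktop && transition_allowed desktop "$to"; then
    echo "$from desktop $to"
  else
    return 1
  fi
}
```

For example, `plan_route studio compute` prints `studio desktop compute`, making the forced Desktop hop explicit.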

5. Guards (Detailed)

G1: Audio Idle

  • PipeWire graph contains no active nodes
  • No JACK clients

G2: GPU Availability

  • No compositor process using GPU
  • Low GPU utilization

G3: CPU Load

  • Load average below threshold (configurable)

G4: User Workload Safety

  • No known long-running dev tasks
  • Optional: no foreground terminals

G5: Memory Headroom

  • Sufficient free RAM for mode switch
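
As one concrete instance, G3 could be sketched as a shell guard. The function name follows the `check_*` naming used later in this documentation; the default threshold, the optional file-path argument, and the exit-code meanings are assumptions for illustration, since the exact guard contract is not fixed here.

```shell
#!/usr/bin/env bash
# Sketch only: threshold default, file argument, and exit codes are assumptions.
# Exit status: 0 = guard passes, 1 = guard blocks, 2 = guard could not be evaluated.
check_cpu_load_safe() {
  local threshold="${1:-2.0}" loadavg_file="${2:-/proc/loadavg}" load1
  # The first field of /proc/loadavg is the 1-minute load average.
  read -r load1 _ < "$loadavg_file" || return 2
  # awk handles the floating-point comparison that plain test cannot.
  if awk -v l="$load1" -v t="$threshold" 'BEGIN { exit !(l < t) }'; then
    echo "pass load1=$load1 threshold=$threshold"
    return 0
  fi
  echo "block load1=$load1 threshold=$threshold"
  return 1
}
```

Passing the file path as an argument keeps the guard testable without depending on the live load of the machine.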

6. Actions (Atomic Steps)

A1: Stop GUI

  • loginctl terminate-session

A2: Release GPU

  • Wait until no graphical processes hold GPU

A3: Adjust Services

  • systemctl isolate <target>

A4: Adjust Resource Limits

  • Modify cgroups/slices

A5: Scale AI Services

  • Adjust CUDA_VISIBLE_DEVICES
  • Restart vLLM

7. Failure Handling

Abort Conditions

  • Guard failure
  • Timeout waiting for GPU release
  • Service failure

Behavior

  • Log reason
  • Return to previous stable state
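
The "timeout waiting for GPU release" abort condition implies a bounded polling loop. A minimal sketch, assuming a hypothetical helper named `wait_until` that takes a timeout, a poll interval, and a condition command:

```shell
#!/usr/bin/env bash
# Sketch only: wait_until is an illustrative helper. It polls a condition
# command until it succeeds or the timeout (in seconds) elapses.
# Exit status: 0 = condition met, 1 = timed out.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  # SECONDS is a bash built-in counting seconds since the shell started.
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    [ "$SECONDS" -ge "$deadline" ] && return 1
    sleep "$interval"
  done
  return 0
}
```

A controller could call it as `wait_until 30 1 gpu_released`, where `gpu_released` is a hypothetical check, and treat a non-zero result as an abort that returns to the previous stable state.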

8. Observability

Required Signals

  • Current mode
  • Last transition
  • Guard evaluation results
  • Resource usage snapshot

Interfaces

  • CLI: mode status
  • Logs: journald

9. Extensibility

Future states may include:

  • Maintenance mode
  • Remote-only desktop mode
  • GPU-partitioned mode

10. Notes

  • This state machine should be implemented via systemd targets + controller script
  • Transitions must be idempotent
  • Guards should be configurable
  • Prefer dry-run capability before execution
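
The dry-run note can be sketched as a thin wrapper around each action step. `run_step` and the `DRY_RUN` variable are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: run_step and DRY_RUN are illustrative names.
# With DRY_RUN=1 each step is printed instead of executed, so a transition
# plan can be reviewed before anything changes.
DRY_RUN="${DRY_RUN:-0}"

run_step() {
  if [ "$DRY_RUN" = "1" ]; then
    printf 'would run:'
    printf ' %q' "$@"   # %q shell-quotes each argument for safe copy and paste
    printf '\n'
    return 0
  fi
  "$@"
}
```

Routing every action through one wrapper means the dry-run path and the real path share the same plan, which reduces the chance that they drift apart.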

Summary

This system treats operational modes as a formal state machine with:

  • explicit states
  • guarded transitions
  • deterministic actions

This enables safe coexistence of:

  • low-latency desktop workloads
  • high-throughput AI services

Dual-Mode NixOS Workstation AI Node — Unified Planning and Mode State Machine


Implementation Checklist Plan

This is structured to get you from doc → bootable system with minimal thrash.


Phase 0 — Ground Truth (before touching Nix)

Hardware + constraints

  • Confirm GPU topology (which is GPU0 vs GPU1)
  • Confirm display wiring (which GPU drives monitor)
  • Confirm audio interface + latency requirements
  • Validate NVIDIA driver compatibility with NixOS + Wayland/Hyprland

Decisions to lock

  • Use runtime switching (no specialisations yet)

  • Studio = studio-local (conditional policy overlay on desktop, not a first-class target in v1)

  • Source of truth = /run/mode-controller/desired

  • mode request is synchronous: return success only after convergence

  • Choose vLLM unit model for v1:

    • v1 fast path: single compute-only vllm.service
    • target architecture: vllm@desktop.service and vllm@compute.service
  • k3s policy for v1:

    • keep k3s.service running across modes
    • change slice budgets and allowed workload intensity by mode
    • defer full k3s mode switching unless operational evidence justifies it
  • desktop-mode AI policy for v1:

    • keep vLLM off in desktop for the first convergence milestone
    • only add bounded desktop-mode AI after desktop ↔ compute switching is reliable
  • studio-local overlay representation for v1:

    • studio-local-policy.service
    • audio-priority.service
  • capability-placement.json source for v1:

    • generated from Nix configuration
    • no runtime override unless a real need emerges
  • defer mode force in v1

  • GUI teardown policy for compute transitions:

    • require graphical session absence
    • require explicit GPU-release verification
    • only add display-manager/greeter stop logic if testing proves it necessary
  • desktop.target should not directly own greeter/login in v1

  • studio-local-policy.service should be:

    • a reliable marker for observation
    • a light policy-application unit
    • not a giant all-in-one Studio controller
  • observe-current implementation for v1:

    • shell first
    • stable plain-text + JSON output contract
    • replace with typed helper later only if complexity justifies it
  • package mode tools in pkgs/ and install them through the module

  • controller trigger model for v1:

    • parameterized oneshot only
    • no timer/path-triggered reconcile until manual transitions are proven
  • boot policy for v1:

    • normalize to desktop on boot
    • defer persistent desired-state replay across reboot
  • Define hard vs soft guards before automation


Phase 0.5 — Control Contract (before full workload integration)

Runtime state contract

  • Define /run/mode-controller/

    • desired
    • current
    • lock
    • last-transition.json
    • last-guards.json
    • capability-placement.json
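
Because several tools read these files, writes should not leave a half-written value visible. One way to do that, sketched under the assumption of a hypothetical `write_state` helper, is to write a temporary file in the same directory and rename it into place:

```shell
#!/usr/bin/env bash
# Sketch only: write_state is an illustrative helper.
# A rename within one filesystem is atomic, so readers see either the old
# value or the new one, never a partial write.
MODE_STATE_DIR="${MODE_STATE_DIR:-/run/mode-controller}"

write_state() {
  local name="$1" value="$2" tmp
  mkdir -p "$MODE_STATE_DIR" || return 1
  tmp="$(mktemp "$MODE_STATE_DIR/.${name}.XXXXXX")" || return 1
  printf '%s\n' "$value" > "$tmp" && mv -f "$tmp" "$MODE_STATE_DIR/$name"
}
```

Usage would look like `write_state desired compute`.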

CLI contract

  • Implement or stub:

    • mode request
    • mode status
    • mode reconcile
    • mode current
    • mode desired
    • mode dry-run
    • mode explain
  • defer mode force until guard policy is battle-tested
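
A stub for this CLI contract might look like the following. It is a sketch, not the packaged tool: the function name `mode_cli` is hypothetical, the return codes borrow `EX_USAGE` (64) and `EX_UNAVAILABLE` (69) from `sysexits.h` as an assumption, and `request` here only records intent rather than waiting for convergence as the locked decision requires.

```shell
#!/usr/bin/env bash
# Sketch only: a stub dispatcher for the subcommands listed above.
MODE_STATE_DIR="${MODE_STATE_DIR:-/run/mode-controller}"

mode_cli() {
  local cmd="${1:-status}"
  case "$cmd" in
    desired|current)
      cat "$MODE_STATE_DIR/$cmd"
      ;;
    status)
      echo "desired: $(cat "$MODE_STATE_DIR/desired" 2>/dev/null || echo unknown)"
      echo "current: $(cat "$MODE_STATE_DIR/current" 2>/dev/null || echo unknown)"
      ;;
    request)
      case "${2:-}" in
        desktop|studio-local|compute)
          # Stub: records intent only; the real command would also reconcile.
          printf '%s\n' "$2" > "$MODE_STATE_DIR/desired"
          ;;
        *)
          echo "usage: mode request desktop|studio-local|compute" >&2
          return 64
          ;;
      esac
      ;;
    reconcile|dry-run|explain)
      echo "mode $cmd: not implemented in this stub" >&2
      return 69
      ;;
    *)
      echo "unknown subcommand: $cmd" >&2
      return 64
      ;;
  esac
}
```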

Observation contract

  • Classifier can return:

    • desktop
    • studio-local
    • compute
    • transitioning
    • failed-transition

Guard contract

  • Add check_target_reachable
  • Standardize exit codes
  • Standardize structured output
  • Mark guards as hard vs soft

Phase 1 — Base NixOS System

Core system

  • Create flake repo (if not already)
  • Install NixOS (minimal)
  • Enable flakes + nix-command
  • Add SSH + basic hardening

GPU + CUDA

  • Install NVIDIA drivers (matching kernel)
  • Validate nvidia-smi
  • Validate CUDA runtime

Desktop

  • Install Hyprland
  • Configure login/session (greetd or similar)
  • Validate Wayland stability with NVIDIA

Audio

  • Install PipeWire + WirePlumber
  • Validate low-latency config
  • Test REAPER baseline

Phase 2 — systemd Mode Skeleton

Targets / policy markers

  • Define first-class targets:

    • desktop.target
    • compute.target
  • Define studio-local as a policy overlay on desktop

  • Add explicit policy marker/service for studio-local

  • Decide whether studio-local is represented by:

    • audio-priority.service
    • studio-local-policy.service layered over desktop
    • another lightweight marker unit

Relationships

  • Add Conflicts= between:

    • compute ↔ graphical targets
  • Add Wants= / After= dependencies

Slices

  • Define:

    • interactive.slice
    • ai.slice
    • platform.slice
  • Assign services to slices


Phase 3 — Mode Controller (Core)

Core controller

  • mode-controller@.service
  • observe-current
  • reconcile
  • lock handling
  • state-file updates
  • dry-run path
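
Lock handling can be sketched with an atomic `mkdir`. This is one possible approach, assumed for illustration; `acquire_lock` and `release_lock` are hypothetical names, and a `flock(1)`-based lock file would be an equally reasonable choice.

```shell
#!/usr/bin/env bash
# Sketch only: acquire_lock / release_lock are illustrative helpers.
# mkdir either creates the directory or fails, so only one caller can win.
MODE_STATE_DIR="${MODE_STATE_DIR:-/run/mode-controller}"

acquire_lock() {
  if mkdir "$MODE_STATE_DIR/lock" 2>/dev/null; then
    echo "$$" > "$MODE_STATE_DIR/lock/pid"   # record the holder for diagnostics
    return 0
  fi
  echo "transition already in progress (pid $(cat "$MODE_STATE_DIR/lock/pid" 2>/dev/null))" >&2
  return 1
}

release_lock() {
  rm -rf "$MODE_STATE_DIR/lock"
}
```

A controller would typically pair `acquire_lock` with `trap release_lock EXIT` so that a failed transition does not leave the lock behind.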

Failure model

  • Record failed-transition
  • Record prior mode
  • Record guard/action failures
  • Verify abort-to-safe-state behavior

Phase 4 — Workload Layer

AI / vLLM

  • Package or install vLLM

  • Create profile-specific config/env for:

    • desktop profile
    • compute profile
  • Implement either:

    • v1 fast path: single vllm.service
    • target path: vllm@desktop.service + vllm@compute.service
  • keep vLLM disabled in desktop for the first bootable transition milestone

  • Validate single-GPU mode

  • Validate dual-GPU mode

  • Keep controller actions profile-aware so later split is mechanical

Platform / k3s

  • Install k3s

  • Configure control node

  • Validate cluster health

  • Deploy minimal workload

  • Keep k3s.service stable across desktop and compute in v1

  • Express mode differences via:

    • platform.slice budgets
    • workload policy / allowed intensity
    • optional node labels / taints later

Phase 5 — State Observation

Implement classifier

  • observe-current script

Detect:

  • graphical session (loginctl / process)
  • PipeWire / audio activity
  • vLLM service state
  • GPU usage (optional: nvidia-smi)

Output

  • plain mode
  • optional JSON (debug)
  • classify transitioning
  • classify failed-transition
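
The classification step can be kept as a pure function, separate from signal gathering. In the sketch below, `classify_mode` is a hypothetical name, the inputs are passed as `yes`/`no` arguments instead of being read from `loginctl` or `systemctl`, and mapping conflicting signals to `failed-transition` is an assumed policy.

```shell
#!/usr/bin/env bash
# Sketch only: classify_mode is an illustrative pure function.
# Arguments (each "yes" or "no"):
#   $1 transition lock held      $2 last transition failed
#   $3 graphical session present $4 studio-local-policy.service active
#   $5 compute.target active
classify_mode() {
  local locked="$1" failed="$2" gui="$3" studio="$4" compute="$5"
  if [ "$locked" = yes ]; then
    echo transitioning
  elif [ "$failed" = yes ]; then
    echo failed-transition
  else
    case "$gui:$studio:$compute" in
      no:no:yes)  echo compute ;;
      yes:yes:no) echo studio-local ;;
      yes:no:no)  echo desktop ;;
      *)          echo failed-transition ;;  # conflicting or incomplete signals (assumed policy)
    esac
  fi
}
```

Keeping the decision table free of side effects makes it straightforward to test every combination before wiring in real signals.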

Phase 6 — Guards

Implement guards (scripts)

  • check_target_reachable
  • check_audio_idle
  • check_gpu_display_released
  • check_cpu_load_safe
  • check_user_jobs_safe
  • check_memory_headroom
  • check_vllm_drainable
  • check_studio_capability_local

Standardize

  • exit codes
  • JSON output
  • logging
  • hard vs soft guard policy
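
The hard versus soft distinction can be sketched as a small runner. `run_guards` and the `hard:` / `soft:` prefix convention are assumptions made for this example; each guard is any command or function that exits 0 to pass.

```shell
#!/usr/bin/env bash
# Sketch only: run_guards and the "hard:"/"soft:" prefixes are illustrative.
# A failing hard guard blocks the transition; a failing soft guard only warns.
run_guards() {
  local spec kind guard blocked=0
  for spec in "$@"; do
    kind="${spec%%:*}"    # text before the first colon
    guard="${spec#*:}"    # text after the first colon
    if "$guard" >/dev/null 2>&1; then
      echo "pass $kind $guard"
    elif [ "$kind" = hard ]; then
      echo "block $kind $guard"
      blocked=1
    else
      echo "warn $kind $guard"
    fi
  done
  return "$blocked"
}
```

A call such as `run_guards hard:check_audio_idle soft:check_cpu_load_safe` would evaluate every guard and report all results, rather than stopping at the first failure, so the operator sees the full picture.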

Phase 7 — Transition Execution

Implement transition flows

  • Desktop → StudioLocal
  • StudioLocal → Desktop
  • Desktop → Compute
  • Compute → Desktop
  • StudioLocal → Compute (no direct transition; must route via Desktop)

Verify explicitly

  • graphical session absence before compute promotion
  • GPU release after GUI shutdown
  • vLLM profile switching
  • audio protection works
  • transitions are idempotent
  • failed guard returns to prior safe state
  • failed action records failed-transition
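
Recording a transition outcome could look like the sketch below. The file name `last-transition.json` comes from the runtime state contract; the helper name `record_transition` and every JSON field name are assumptions, since the schema is not defined in this documentation.

```shell
#!/usr/bin/env bash
# Sketch only: record_transition and the JSON field names are assumptions.
MODE_STATE_DIR="${MODE_STATE_DIR:-/run/mode-controller}"

record_transition() {
  local from="$1" to="$2" result="$3" reason="${4:-}"
  # Escape backslashes and double quotes so the reason stays valid JSON.
  reason=${reason//\\/\\\\}
  reason=${reason//\"/\\\"}
  printf '{"from":"%s","to":"%s","result":"%s","reason":"%s","time":"%s"}\n' \
    "$from" "$to" "$result" "$reason" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    > "$MODE_STATE_DIR/last-transition.json"
}
```

For example, `record_transition desktop compute failed-transition "timeout waiting for GPU release"` leaves a single-line record that `mode status` could display.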

Phase 8 — Idle + Automation

Idle detection

  • implement idle signal (input + audio + load)
  • threshold tuning

Policy

  • idle → mode request compute
  • guard failures → no transition

Safety

  • never auto-promote from studio-local
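
The idle policy and its safety rule can be expressed as one decision function. `idle_should_promote` and its arguments are hypothetical, used only to illustrate the rule.

```shell
#!/usr/bin/env bash
# Sketch only: idle_should_promote and its arguments are illustrative.
# Decides whether an idle signal may issue `mode request compute`.
# $1 = current mode, $2 = idle seconds observed, $3 = idle threshold in seconds
idle_should_promote() {
  local current="$1" idle="$2" threshold="$3"
  case "$current" in
    desktop) [ "$idle" -ge "$threshold" ] ;;
    *) return 1 ;;   # studio-local, compute, transitioning: never auto-promote
  esac
}
```

Even when this returns success, the request still passes through the normal guards, so a guard failure results in no transition.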

Phase 9 — Observability

Logging

  • structured logs for:

    • transitions
    • guards
    • failures

Status

  • mode status shows:

    • desired
    • current
    • last transition
    • blocking guards
    • capability placement

Phase 10 — Hardening

Failure handling

  • retry logic (bounded)
  • failed-transition state handling
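
Bounded retry can be sketched as a generic wrapper. `retry_bounded` is a hypothetical helper; the attempt count and delay are left to the caller.

```shell
#!/usr/bin/env bash
# Sketch only: retry_bounded is an illustrative helper.
# Runs a command up to max_attempts times with a fixed delay between attempts,
# then gives up and returns the last exit status instead of retrying forever.
retry_bounded() {
  local max_attempts="$1" delay="$2" attempt=1 status=0
  shift 2
  while :; do
    "$@" && return 0
    status=$?
    [ "$attempt" -ge "$max_attempts" ] && return "$status"
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}
```

When the wrapper finally returns non-zero, the controller would record `failed-transition` rather than loop again.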

Resource tuning

  • CPU quotas per slice
  • memory limits
  • I/O priority
  • tune platform.slice conservatively for desktop / studio-local, relaxed for compute

Security

  • restrict mode controller to root
  • audit transitions
  • isolate AI services

Phase 11 — Optional Evolution

If runtime switching is insufficient

  • introduce specialisation.compute
  • keep same mode interface
  • optionally promote studio-local overlay into a stronger first-class target only if operational evidence justifies the added complexity
  • consider stronger k3s mode-switching only if slice-governed steady-state behavior is inadequate

If Studio moves to Mac mini

  • set capability-placement.json
  • disable studio-local
  • keep controller intact

Critical Path (short version)

If you want the fastest path to something real:

  1. Base NixOS + GPU + Hyprland
  2. vLLM working (single GPU)
  3. Define targets (desktop, compute)
  4. Simple mode CLI + desired file
  5. Hardcoded transitions (no guards yet)
  6. Add guards + observation
  7. Add idle automation
  8. Add studio-local last

Where this can go wrong (worth calling out)

  • GPU release is the hardest boundary → don’t assume, always verify

  • Audio is fragile → treat StudioLocal invariants as strict

  • systemctl isolate can surprise you → test with minimal configs first

  • too much cleverness early → get a dumb working version first, then refine

First Bring-Up Checklist

This is the shortest practical path to getting the first live build onto a real NixOS machine.

It assumes:

  • this repo is available on the target machine
  • the target machine is the intended workstation host
  • the current v1 policy remains:
    • boot default = desktop
    • studio-local is an overlay on desktop
    • vLLM is compute-only when explicitly enabled

1. Put the Repo on the Target Machine

git clone <repo-url> /path/to/dubnium
cd /path/to/dubnium

If the repo is already local:

cd /path/to/dubnium

2. Generate Real Hardware Configuration

The scaffold currently contains a placeholder hardware file.

On the target NixOS machine:

sudo nixos-generate-config --dir ./hosts/workstation

This should populate:

  • hosts/workstation/hardware-configuration.nix

Review that file and make sure:

  • it matches the actual boot disk/filesystem layout
  • it does not remove the existing import structure in hosts/workstation/default.nix

3. Review Host-Specific Settings Before First Build

Check hosts/workstation/default.nix.

Important values to confirm:

  • networking.hostName
  • dubnium.hardware.presentGpus
  • dubnium.hardware.displayGpu
  • dubnium.hardware.computeGpus
  • dubnium.vllm.enable
  • dubnium.vllm.model

Current intended first live model:

  • Qwen/Qwen2.5-Coder-14B-Instruct

Current intended first hardware phase:

  • planned architecture: 2 GPUs
  • currently present: GPU 0
  • compute GPU set: [ 0 ]

4. Build Without Switching First

Build the configuration without activating it first:

sudo nixos-rebuild build --flake .#workstation

If this fails:

  • fix Nix evaluation issues first
  • do not jump into switch

Common first-failure areas:

  • hardware configuration mismatch
  • NVIDIA options
  • package evaluation problems
  • typos in host-local settings

5. Switch to the New Configuration

If the build succeeds:

sudo nixos-rebuild switch --flake .#workstation

6. Verify Core Pieces After Switch

Check the mode CLI:

mode status
mode current
mode desired

Check runtime state files:

sudo ls -la /run/mode-controller
sudo cat /run/mode-controller/desired
sudo cat /run/mode-controller/current
sudo cat /run/mode-controller/capability-placement.json
sudo cat /run/mode-controller/hardware-topology.json

Check systemd units:

systemctl status desktop.target
systemctl status compute.target
systemctl status studio-local-policy.service
systemctl status audio-priority.service
systemctl status vllm.service

Notes:

  • vllm.service should not be active in desktop
  • with default workstation settings, vllm.service should not exist until dubnium.vllm.enable = true
  • studio-local-policy.service and audio-priority.service should not be active unless studio-local is requested

7. Test desktop -> studio-local

sudo mode request studio-local
mode status
systemctl status studio-local-policy.service
systemctl status audio-priority.service

Expected result:

  • current mode becomes studio-local
  • studio-local-policy.service is active
  • audio-priority.service is active

Then return:

sudo mode request desktop
mode status

8. Test desktop -> compute

Before testing:

  • close REAPER
  • avoid active audio work
  • avoid long-running foreground development jobs
  • seed the local model bundle from USB
  • explicitly enable dubnium.vllm.enable = true if this test should exercise the vLLM service

Then:

sudo mode request compute
mode status
systemctl status compute.target
systemctl status vllm.service

Expected result:

  • graphical session is terminated
  • system converges to compute
  • if vLLM is enabled, vllm.service is started by compute.target

Important caveat:

Seed the local model bundle from USB before the first compute transition. If the bundle is absent, vLLM should fail clearly rather than relying on a first-run network download.


9. Test compute -> desktop

sudo mode request desktop
mode status
systemctl status vllm.service

Expected result:

  • vllm.service is stopped
  • system converges back to desktop

10. If Something Fails

Check:

mode status
sudo cat /run/mode-controller/last-transition.json
sudo cat /run/mode-controller/last-guards.json
journalctl -u 'mode-controller@*' -b
journalctl -u vllm.service -b

Most useful first diagnosis buckets:

  • guard blocked transition
  • graphical session did not terminate cleanly
  • GPU did not look released
  • vLLM service failed to start
  • model/runtime/CUDA issue

11. First Successful Milestone

You should consider first bring-up successful when all of the following are true:

  • nixos-rebuild switch --flake .#workstation succeeds
  • mode status works
  • desktop -> studio-local -> desktop works
  • desktop -> compute -> desktop works
  • last-transition.json and last-guards.json are useful for failures

At that point, the next iteration is:

  • tighten NVIDIA/vLLM runtime behavior
  • improve observe-current
  • tune audio-priority.service
  • refine slice policy
  • add second GPU when ready

Fresh Install Checklist

This checklist is for installing dubnium onto a machine from scratch using a NixOS live USB.

Use this when:

  • the target machine does not already run NixOS
  • you are replacing the current OS
  • you want the flake to be the source of truth from first boot

If the machine already runs NixOS, use docs/first-bring-up-checklist.md instead.

Each top-level step has:

  • Start when: what must already be true before starting the step
  • Outcomes: what should be true when the step is complete

1. Prepare a NixOS Installer USB

Current preferred path: use the Dubnium custom installer USB, not a stock ISO. The custom installer bakes a source export of this private repo plus external/dotfiles into the live image. Write it to USB as a raw disk image, matching Rufus “DD image mode”. Use separate writable media for a local model seed bundle.

Build the ISO and prepare the seed model:

scripts/build-installer-iso.sh \
  --iso ./dubnium-installer.iso

This writes ./dubnium-installer.iso into the checkout. By default the helper uses the current Dubnium default model bundle, but the USB layout only requires a materialized model directory with config.json and SHA256SUMS. Pass --seed-model when using a different local bundle.

Then prepare the USB with the guarded writer for the current platform.

Windows PowerShell:

.\scripts\write-installer-usb.ps1 `
  -IsoPath .\dubnium-installer.iso `
  -DiskNumber 7 `
  -ExpectedFriendlyName "USB SanDisk 3.2Gen1"

The writer requires the disk identity check and final y/N confirmation. It overwrites the whole USB disk with the ISO image.

Optional one-shot Windows path:

.\scripts\build-installer-usb.ps1 `
  -DiskNumber 7 `
  -ExpectedFriendlyName "USB SanDisk 3.2Gen1"

Optional one-shot Linux or macOS path:

bash scripts/build-installer-usb.sh \
  --disk /dev/sdX \
  --expected SanDisk

On macOS, use a whole disk such as /dev/diskN. On Linux, use a whole USB disk such as /dev/sdX, not a partition.

Manual Linux USB write path:

scripts/write-installer-usb.sh \
  --iso ./dubnium-installer.iso \
  --disk /dev/sdX \
  --expected SanDisk

Expected USB layout:

dubnium-installer.iso -> whole USB disk

Verify the installer media from whichever drive letter Windows assigns:

Test-Path I:\EFI\BOOT\BOOTX64.EFI
Test-Path I:\nix-store.squashfs
Get-Volume -DriveLetter I

Separate seed media should contain:

models/selected-model-bundle/

See docs/runbooks/custom-installer-iso.md for the full USB process and docs/runbooks/model-seeding.md for the seed bundle commands.
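
Before trusting a seed bundle, a quick local integrity check can help. The sketch below assumes that `SHA256SUMS` uses the format produced by `sha256sum` with paths relative to the bundle directory; this documentation does not specify the file format, and `verify_model_bundle` is a hypothetical helper, not one of the repo scripts.

```shell
#!/usr/bin/env bash
# Sketch only: verify_model_bundle is illustrative and assumes SHA256SUMS is
# in sha256sum format with paths relative to the bundle directory.
verify_model_bundle() {
  local dir="$1"
  [ -f "$dir/config.json" ] || { echo "missing config.json" >&2; return 1; }
  [ -f "$dir/SHA256SUMS" ]  || { echo "missing SHA256SUMS" >&2; return 1; }
  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$dir" && sha256sum -c SHA256SUMS >/dev/null )
}
```

Usage would be `verify_model_bundle /path/to/seed/models/selected-model-bundle`, with a non-zero exit status meaning the bundle should not be used.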

Start when

  • existing Nix-capable build machine with the Dubnium repo checkout
  • USB stick that can be erased
  • materialized model bundle is available locally, if seeding the model now
  • permission to run the guarded USB writer for the platform

Outcomes

  • custom Dubnium ISO is built from the intended flake source
  • USB device identity was checked by the platform helper before erase
  • USB has a bootable raw-written Dubnium installer image
  • separate seed media has the local model bundle, if seeding now
  • the install path requires no GitHub token, private SSH key, or Hugging Face download in the live installer

1.1 USB Security And Drift Check

Before leaving the build machine, confirm:

git status --short
git -C external/dotfiles status --short

The ISO bakes tracked flake source, including external/dotfiles, into the installer. Stage or commit intentional changes before building, and do not bake decrypted secrets, long-lived tokens, SSH private keys, local caches, or model weights into the repo.

The USB is private media. The installer payload contains private source, and separate seed media contains unencrypted model files.

1.2 Seamless USB Acceptance Check

Before booting the target, verify the prepared stick:

EFI/BOOT/BOOTX64.EFI
nix-store.squashfs
seed-media/models/selected-model-bundle/config.json
seed-media/models/selected-model-bundle/SHA256SUMS

If the model bundle is not on the USB yet, use docs/runbooks/model-seeding.md before booting the target.
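
On a Linux or macOS build machine, the acceptance check can be scripted once both media are mounted. `check_install_media` is a hypothetical helper, and the two mount points are arguments because they depend on where the media is mounted.

```shell
#!/usr/bin/env bash
# Sketch only: check_install_media is an illustrative helper.
# $1 = mount point of the installer USB, $2 = mount point of the seed media
check_install_media() {
  local installer_root="$1" seed_root="$2" missing=0 path
  for path in \
    "$installer_root/EFI/BOOT/BOOTX64.EFI" \
    "$installer_root/nix-store.squashfs" \
    "$seed_root/models/selected-model-bundle/config.json" \
    "$seed_root/models/selected-model-bundle/SHA256SUMS"
  do
    if [ ! -e "$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}
```

The function reports every missing path rather than stopping at the first one, so a single run shows everything that still needs to be prepared.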

1.3 Stock ISO Fallback

A stock NixOS ISO remains useful for rescue, but it is not the preferred fresh Dubnium install path. If using stock media, you must bring the Dubnium source and initialized dotfiles submodule on separate private media, then install from that local checkout. Do not depend on live-session GitHub credentials for a private-repo install.


2. Boot the Target Machine From USB

Start when

  • prepared NixOS installer USB
  • physical access to the target machine
  • firmware access or boot-menu access

Outcomes

  • target machine is booted into the NixOS live environment
  • firmware boot mode and target disk visibility are confirmed
  • keyboard, display, disk visibility, and network are usable
  • source import tools are available before repo setup steps begin
  • private repo source is reachable from the live environment
  • optional SSH access to the live environment is available if needed

2.1 Confirm Firmware Settings

Before booting the installer, review firmware settings:

  • boot mode should be UEFI, not legacy/CSM
  • Secure Boot should be disabled unless you intentionally handle it
  • internal install disk should be visible
  • primary display GPU should be the one you expect
  • Above 4G decoding should be enabled if the firmware exposes it and you plan to use multiple GPUs
  • virtualization/IOMMU can be enabled if you expect to use it later

Do not proceed if the firmware cannot see the target install disk.

2.2 Enter the Boot Menu

Insert the USB stick into the target machine, power it on, and enter the firmware boot menu.

Common boot-menu keys:

  • F8
  • F11
  • F12
  • Esc
  • Del

Choose the USB entry. Prefer the UEFI entry if the firmware shows both legacy and UEFI options.

2.3 Confirm Live Environment Basics

After the NixOS live environment boots, open a terminal.

Check that the machine sees CPU, memory, disks, and network devices:

lscpu | head
free -h
lsblk -o NAME,SIZE,MODEL,TYPE,MOUNTPOINTS
ip link

Check network connectivity:

ip addr
ping -c 3 1.1.1.1
ping -c 3 github.com

If networking is not up:

  • connect Ethernet if available
  • use the graphical network manager in the GNOME ISO
  • on the minimal ISO, use nmtui if available:
sudo nmtui

Exit criteria:

  • keyboard works
  • display works
  • target install disk is visible
  • internet access works

2.4 Ensure Source Import Tools Are Available In the Live Environment

On the custom Dubnium installer USB, confirm the baked source helper exists:

command -v unpack-dubnium
tar --version

If unpack-dubnium is available, section 3 can use the baked source snapshot directly and does not need GitHub credentials.

Before importing the repo source from the live USB session, confirm git and basic archive tools are available:

git --version
tar --version

If git is missing and you need it for validation, run it through a temporary nix-shell in the live environment:

nix-shell -p git --run 'git --version'

If you need git for more than one command, enter a shell with it available:

nix-shell -p git
git --version

Exit criteria:

  • git --version succeeds in the current shell or in the shell you will use to inspect the repo
  • tar --version succeeds if you are extracting an archive

2.5 Optional: Enable SSH Into the Live Environment

Use this if the target machine is easier to drive from another computer. Note: If using the Custom Installer, your SSH keys may already be authorized. Otherwise:

  1. Set a temporary password: passwd
  2. Or add your key: mkdir -p ~/.ssh && echo "ssh-ed25519 ..." >> ~/.ssh/authorized_keys

Start SSH:

sudo systemctl start sshd
ip addr

Then connect from another machine using the live environment IP address:

ssh nixos@<target-ip>

This access is temporary and only applies to the live USB environment.


3. Make the Repo Available in the Live Environment

Start when

  • live NixOS environment is running
  • the custom installer source snapshot is available, or a separate private source export is attached to the machine

Outcomes

  • Dubnium repo exists in the live environment
  • repo contains flake.nix
  • repo contains hosts/workstation/default.nix
  • repo contains external/dotfiles/flake.nix
  • commands are being run from the repo root

3.0 Preferred: Unpack From Custom Installer Media

For the current one-shot install path, run the guarded installer helper:

install-dubnium-from-usb

This replaces the manual section 3 through section 9 flow for the simple unencrypted layout. The helper prints lsblk, prompts for the target whole disk, and asks for final y/N confirmation before erasing anything. Defaults are btrfs, dubnium home profile, passwd password mode, and copying the install snapshot into the installed system. Use --password-mode hash to write a host-local initial password hash before install, or --password-mode skip when another login path already exists. Use --dry-run first if disk identity is not yet obvious.

If booted from the Dubnium custom installer USB, use the baked source snapshot:

unpack-dubnium
cd ~/local/src/dubnium

This is the token-free private repo path. It does not clone from GitHub during install.

To choose the installed normal user, create hosts/workstation/user.nix before install:

{
  dubnium.user.name = "alice";
  dubnium.user.description = "Example User";
}

3.1 Alternate: Copy Source From Local Media

If you brought the repo on separate media, attach it now and identify it:

lsblk -o NAME,SIZE,MODEL,TRAN,TYPE,MOUNTPOINTS

Mount the removable media read-only if practical, then extract or copy the exported source into your working directory.

Example for a separate git archive-style export:

mkdir -p ~/installer-src
cd ~/installer-src
tar -xzf /path/to/dubnium-installer-src.tgz
cd dubnium

Example for a plain copied export tree:

mkdir -p ~/Projects
cp -a /path/to/dubnium ~/Projects/dubnium
cd ~/Projects/dubnium

This path avoids depending on live-session GitHub credentials. Prefer the custom installer payload when available, because it keeps the source path and helper behavior consistent.

3.2 Alternate: Extract A Separate Source Archive

If you are not using the current custom installer payload, bring a separate source archive and extract it to the same live-session path:

mkdir -p ~/local/src
tar -xzf /path/to/dubnium-installer-src.tgz -C ~/local/src
cd ~/local/src/dubnium

3.3 Verify Repo Contents

pwd
ls
git status --short
test -f flake.nix
test -f hosts/workstation/default.nix
test -f external/dotfiles/flake.nix

Exit criteria:

  • repo is present locally, whether copied, extracted, or imported
  • flake.nix exists
  • hosts/workstation/default.nix exists
  • external/dotfiles/flake.nix exists
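
The bare `test -f` commands print nothing, so a failure is easy to miss in an interactive shell. A small wrapper, sketched here with the hypothetical name `verify_repo_root`, makes the result visible:

```shell
#!/usr/bin/env bash
# Sketch only: verify_repo_root is an illustrative helper.
# Checks the three files this section requires and reports each result.
verify_repo_root() {
  local root="${1:-.}" f status=0
  for f in flake.nix hosts/workstation/default.nix external/dotfiles/flake.nix; do
    if [ -f "$root/$f" ]; then
      echo "ok      $f"
    else
      echo "MISSING $f"
      status=1
    fi
  done
  return "$status"
}
```

Run it from the repo root as `verify_repo_root`, or pass the checkout path as the first argument.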

4. Partition the Target Disk

Start when

  • target install disk is visible in lsblk
  • disk encryption decision is made
  • swap/hibernation decision is made
  • target disk has been positively identified and is safe to erase

Outcomes

  • target disk has a new GPT partition table
  • EFI system partition exists
  • root partition exists
  • EFI_PART and ROOT_PART point to real block devices
  • no partitioning commands have touched the USB installer

This repo does not yet prescribe a disk layout.

The example below uses a simple UEFI layout:

  • EFI system partition: 1 GiB, FAT32, mounted at /boot
  • root partition: rest of disk, ext4, mounted at /

This example does not create a separate /home partition. Add one only if you deliberately want it.

This example does not enable disk encryption. If you want LUKS or a separate encrypted data layout, stop here and use a different partition/filesystem plan.

This example also does not create a swap partition. If hibernation is required, stop here and design swap explicitly. If hibernation is not required, zram can be handled later in NixOS configuration.

4.1 Identify the Install Disk

List disks:

lsblk -o NAME,SIZE,MODEL,TRAN,TYPE,MOUNTPOINTS

Example NVMe disk:

nvme0n1  1.8T Samsung_SSD disk

Example SATA/SAS/USB-style disk:

sda  1.8T Samsung_SSD disk

Set the target disk variable:

DISK=/dev/nvme0n1

or:

DISK=/dev/sda

Important:

  • this must be the internal install disk
  • this must not be the USB installer
  • all data on this disk will be destroyed once partitioning begins

4.2 Confirm Existing Layout

Before touching the disk:

echo "$DISK"
lsblk -o NAME,SIZE,MODEL,TYPE,FSTYPE,MOUNTPOINTS "$DISK"
sudo fdisk -l "$DISK"

Also confirm that these decisions are settled before partitioning:

  • disk device name
  • EFI size
  • root filesystem choice
  • whether you want swap or zram only
  • whether you want a separate /home

Minimum sane layout:

  • EFI system partition
  • root partition

Example tools:

  • lsblk
  • blkid
  • fdisk
  • parted
  • gdisk

Do not proceed until you are sure which disk you are installing to.

4.3 Preview and Clear Existing Signatures

Preview existing filesystem and partition signatures:

sudo wipefs -n "$DISK"

If the disk is definitely the install target, clear old signatures:

sudo wipefs -a "$DISK"

This is destructive. Do not run it against the USB installer or any disk you intend to preserve.

4.4 Create a GPT Partition Table

This is destructive. Only run it after confirming DISK.

echo "About to partition: $DISK"
lsblk -o NAME,SIZE,MODEL,TYPE,MOUNTPOINTS "$DISK"

Create the partition table and partitions:

sudo parted "$DISK" -- mklabel gpt
sudo parted "$DISK" -- mkpart ESP fat32 1MiB 1025MiB
sudo parted "$DISK" -- set 1 esp on
sudo parted "$DISK" -- mkpart primary ext4 1025MiB 100%

Ask the kernel to re-read the partition table:

sudo partprobe "$DISK"
sleep 2
lsblk -o NAME,SIZE,MODEL,TYPE,FSTYPE,MOUNTPOINTS "$DISK"

4.5 Set Partition Variables

For NVMe disks, partitions are usually named with p1 / p2:

EFI_PART="${DISK}p1"
ROOT_PART="${DISK}p2"

For SATA/SAS-style disks, partitions are usually named 1 / 2:

EFI_PART="${DISK}1"
ROOT_PART="${DISK}2"

Verify:

echo "EFI_PART=$EFI_PART"
echo "ROOT_PART=$ROOT_PART"
test -b "$EFI_PART"
test -b "$ROOT_PART"
lsblk -o NAME,SIZE,MODEL,TYPE,FSTYPE,MOUNTPOINTS "$DISK"
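
The naming rule above follows from how Linux names partitions: when the disk name ends in a digit, a "p" separates the disk name from the partition number. A small sketch of that rule, using a hypothetical helper named part_path, avoids choosing between the two variants by hand:

```shell
# Hypothetical helper: build a partition device path from a disk path
# and a partition number. Disks whose name ends in a digit (nvme0n1,
# mmcblk0) get a "p" separator; others (sda, vdb) do not.
part_path() {
  disk=$1
  num=$2
  case "$disk" in
    *[0-9]) printf '%sp%s\n' "$disk" "$num" ;;
    *)      printf '%s%s\n' "$disk" "$num" ;;
  esac
}

# Usage in the live environment (not executed here):
#   EFI_PART=$(part_path "$DISK" 1)
#   ROOT_PART=$(part_path "$DISK" 2)
```

The test -b checks above are still needed afterwards, since the helper only builds a name and does not prove the device exists.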

5. Create Filesystems and Mount Them

Start when

  • EFI_PART points to the EFI partition
  • ROOT_PART points to the root partition
  • both partition variables have been verified with test -b

Outcomes

  • EFI partition is formatted FAT32
  • root partition is formatted ext4
  • root partition is mounted at /mnt
  • EFI partition is mounted at /mnt/boot
  • mount layout matches the future NixOS filesystem config

5.1 Format the Partitions

This is destructive to the selected partitions.

sudo mkfs.fat -F 32 -n NIXBOOT "$EFI_PART"
sudo mkfs.ext4 -L nixos "$ROOT_PART"

5.2 Mount the Root Filesystem

sudo mount "$ROOT_PART" /mnt

5.3 Mount the EFI Filesystem

The current host config expects systemd-boot, so mount the EFI filesystem at /mnt/boot:

sudo mkdir -p /mnt/boot
sudo mount "$EFI_PART" /mnt/boot

5.4 Verify Mount Layout

Once mounted, verify:

findmnt /mnt
findmnt /mnt/boot
lsblk -o NAME,SIZE,FSTYPE,LABEL,MOUNTPOINTS "$DISK"

Expected:

  • root partition mounted at /mnt
  • EFI partition mounted at /mnt/boot
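
The expected layout can also be checked mechanically instead of by eye. The sketch below reads the kernel mount table and prints the device mounted at a given mount point; mount_source is a hypothetical helper name, and the optional second argument exists only so the function can be exercised against a sample file.

```shell
# Hypothetical helper: print the source device mounted at a mount point.
# Reads /proc/self/mounts by default; a different file in the same
# format can be passed as the second argument.
mount_source() {
  mountpoint=$1
  mounts_file=${2:-/proc/self/mounts}
  # If several entries share the mount point, the last one wins,
  # which is the mount currently visible on top.
  awk -v mp="$mountpoint" \
    '$2 == mp { src = $1 } END { if (src != "") print src }' \
    "$mounts_file"
}

# Usage in the live environment (not executed here):
#   [ "$(mount_source /mnt)" = "$ROOT_PART" ] && echo "root ok"
#   [ "$(mount_source /mnt/boot)" = "$EFI_PART" ] && echo "efi ok"
```

This comparison assumes the partitions were mounted by device path, as in sections 5.2 and 5.3; a mount by label or UUID may still be listed under the resolved device name.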

6. Generate Hardware Configuration Into the Repo

Start when

  • repo is available and current shell is at repo root
  • target root filesystem is mounted at /mnt
  • target EFI filesystem is mounted at /mnt/boot

Outcomes

  • hosts/workstation/hardware-configuration.nix reflects the target hardware
  • generated filesystem entries match /mnt and /mnt/boot
  • placeholder hardware config has been replaced
  • git diff shows the hardware config change

6.1 Generate Config

From the repo root:

sudo nixos-generate-config --root /mnt --dir ./hosts/workstation

This should populate:

  • hosts/workstation/hardware-configuration.nix

Important:

  • this file must reflect the real disk layout you just mounted
  • this replaces the scaffold placeholder currently in the repo

6.2 Review Generated Hardware Config

sed -n '1,220p' hosts/workstation/hardware-configuration.nix

Confirm:

  • root filesystem points at the root partition or its filesystem label/UUID
  • /boot points at the EFI partition
  • generated imports look normal
  • no obvious reference to the USB installer disk exists
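
To make the filesystem review quicker, the mount point and device pairs can be pulled out of the generated file. This is a rough sketch that assumes the usual nixos-generate-config layout, where a fileSystems."..." line is followed by a device = "..." line; list_fs_devices is a hypothetical helper, and a hand-edited file may not match this pattern.

```shell
# Hypothetical helper: print "MOUNTPOINT DEVICE" for each fileSystems
# entry in a generated hardware-configuration.nix.
list_fs_devices() {
  awk '
    # Remember the mount point from a line like: fileSystems."/boot" =
    /fileSystems\."/ { split($0, a, "\""); mp = a[2] }
    # Pair it with the next device = "..." line that follows.
    /device *=/ && mp != "" { split($0, b, "\""); print mp, b[2]; mp = "" }
  ' "$1"
}

# Usage from the repo root (not executed here):
#   list_fs_devices hosts/workstation/hardware-configuration.nix
```

The printed devices can then be compared against blkid output for ROOT_PART and EFI_PART.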

6.3 Confirm Git Diff

git diff -- hosts/workstation/hardware-configuration.nix

Exit criteria:

  • hardware config changed from placeholder to real host config
  • filesystem entries match the mounted target disk

7. Review Host Config Before Install

Start when

  • generated hardware config exists
  • host config exists at hosts/workstation/default.nix
  • hardware facts are known well enough to set GPU options accurately
  • login/access strategy is known

Outcomes

  • hostname, bootloader, SSH, GPU, vLLM, and k3s settings are reviewed
  • GPU settings reference only installed/visible GPUs
  • vLLM first-install stance is explicit
  • k3s first-install stance is explicit
  • at least one installed-system login path is known

7.1 Inspect Host Config

Check hosts/workstation/default.nix.

sed -n '1,240p' hosts/workstation/default.nix

At minimum confirm:

  • hostname
  • current GPU assumptions
  • vLLM model choice
  • any network or SSH expectations
  • bootloader settings
  • k3s enablement

Current scaffold assumptions:

  • boot default is desktop
  • studio-local is a desktop overlay
  • vLLM is compute-only
  • planned topology is 2 GPUs
  • currently present GPU set defaults to [ 0 ]

7.2 Confirm GPU Settings

If the target currently has only one NVIDIA GPU:

dubnium.hardware.presentGpus = [ 0 ];
dubnium.hardware.displayGpu = 0;
dubnium.hardware.computeGpus = [ 0 ];

If the target has two NVIDIA GPUs and you are ready to expose both to compute, update only after confirming nvidia-smi ordering:

dubnium.hardware.presentGpus = [ 0 1 ];
dubnium.hardware.displayGpu = 0;
dubnium.hardware.computeGpus = [ 0 1 ];

For first bring-up, prefer the most conservative accurate setting. Do not list a GPU that is not installed and visible.
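
The "do not list a GPU that is not installed and visible" rule can be sketched as a comparison between the configured indices and the indices nvidia-smi reports. missing_gpus is a hypothetical helper; the nvidia-smi call in the comment only works where the NVIDIA driver is loaded, which may not be the case in the live installer.

```shell
# Hypothetical helper: print every configured GPU index that does not
# appear in the list of visible indices. Both arguments are
# whitespace-separated lists.
missing_gpus() {
  configured=$1
  visible=$2
  for idx in $configured; do
    found=no
    for v in $visible; do
      if [ "$idx" = "$v" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      printf '%s\n' "$idx"
    fi
  done
}

# Usage where the NVIDIA driver is available (not executed here):
#   visible=$(nvidia-smi --query-gpu=index --format=csv,noheader)
#   missing_gpus "0 1" "$visible"
```

Empty output means every configured index is visible; any printed index should be removed from presentGpus and computeGpus before install.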

7.3 Confirm vLLM Settings

Current host config disables vLLM by default so the workstation can prove the base desktop system before model/runtime work:

dubnium.vllm.enable = false;

If opting into vLLM for compute testing, set dubnium.vllm.enable = true and consider explicit first-run guardrails:

dubnium.vllm.extraArgs = [
  "--max-model-len" "8192"
  "--gpu-memory-utilization" "0.70"
  "--enforce-eager"
];

7.4 Confirm k3s Settings

Current host config has:

dubnium.k3s.enable = false;

Keep k3s disabled for the first install unless you specifically want to validate k3s during the first boot.

7.5 Confirm User and Access Settings

Before installing, confirm how you will log into the installed system:

rg -n "users\\.users|openssh|authorizedKeys|initialPassword|hashedPassword" hosts modules

The current host config enables SSH, but do not assume a normal user account exists unless the NixOS config declares one.

Choose one access strategy before install:

  • root password set by nixos-install
  • declared normal user with password or SSH key
  • SSH key access configured in NixOS

For the default workstation user, keep the password hash local by adding hosts/workstation/user.nix before install:

{
  users.users.ryjen.initialHashedPassword = "$y$j9T$...";
}

Generate the hash in the live environment with:

mkpasswd -m yescrypt

Do not reboot into the installed system without knowing at least one login path.


8. Optional Dry Evaluation Before Install

Start when

  • repo is at install-ready state
  • generated hardware config exists
  • network access is working in the live environment
  • Nix can evaluate flakes in the live environment

Outcomes

  • flake evaluation has been attempted
  • mode-tools package build has been attempted
  • any evaluation/build failure is understood before install
  • no unknown evaluation error is carried into nixos-install

8.1 Build the Target System

If the live environment has working Nix daemon support and networking, try:

sudo nixos-rebuild build --flake .#workstation

This is optional but useful. If it fails, fix the evaluation problem, or understand and accept the failure, before running the installer.

8.2 Build the Mode Tools Package

nix build .#packages.x86_64-linux.mode-tools

8.3 Inspect Common Evaluation Failures

Common buckets:

  • hardware configuration references the wrong disk
  • NVIDIA package/options fail to evaluate
  • vLLM package is unavailable or expensive to build in the live environment
  • unfree packages are blocked
  • host option assertions fail

Exit criteria:

  • the flake evaluates
  • the system build either succeeds or fails for a known reason you have decided to accept before nixos-install
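
When a full build is too expensive in the live environment, evaluation alone can be checked by asking Nix for the derivation path of the system without building it. The sketch below only constructs the attribute path; toplevel_drv_attr is a hypothetical helper, and it assumes the flake exposes the host as nixosConfigurations.workstation, which is what the .#workstation form used elsewhere in this checklist resolves to.

```shell
# Hypothetical helper: build the flake attribute path for a host's
# system derivation. Assumes the host lives under nixosConfigurations.
toplevel_drv_attr() {
  printf '.#nixosConfigurations.%s.config.system.build.toplevel.drvPath\n' "$1"
}

# Usage from the repo root (not executed here):
#   nix eval --raw "$(toplevel_drv_attr workstation)"
# A printed /nix/store/...drv path means the configuration evaluated;
# an error at this point is an evaluation failure, not a build failure.
```

This separates the "flake evaluates" exit criterion from the heavier "system builds" question.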

9. Install From the Flake

Start when

  • /mnt and /mnt/boot are mounted correctly
  • hardware config and host config are reviewed
  • dirty repo state is intentional
  • installed-system login path is known
  • repo persistence plan is explicit

Outcomes

  • NixOS is installed from .#workstation
  • bootloader installation result is known
  • root password or equivalent access path is established
  • repo is copied into the installed filesystem or a post-boot source import plan is explicit

9.1 Final Preinstall Check

Before installing:

findmnt /mnt
findmnt /mnt/boot
lsblk -o NAME,SIZE,FSTYPE,LABEL,MOUNTPOINTS "$DISK"
git status --short

Confirm:

  • /mnt is the target root filesystem
  • /mnt/boot is the target EFI filesystem
  • generated hardware config is present
  • host config is reviewed
  • any dirty repo state is intentional

9.2 Confirm Repo Persistence Plan

The live USB environment is temporary. The install itself uses the live checkout at ~/local/src/dubnium, but that path does not automatically become an installed-system checkout.

If you want the flake source to be available immediately after first boot, copy the current repo into the target filesystem before installing.

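Example target location. Set TARGET_USER to the user declared for the installed system (ryjen is the default workstation user):

TARGET_USER=ryjen
sudo mkdir -p "/mnt/home/$TARGET_USER/Projects"
sudo cp -a "$(pwd)" "/mnt/home/$TARGET_USER/Projects/dubnium"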

If the installed system will have a different user or home path, adjust the destination.

If you prefer not to copy from the live environment, plan how you will import the repo source again after first boot. Do not assume the live-environment checkout survives reboot.

The custom installer source payload belongs to the USB live system. It is enough to install from, but it does not automatically become a checkout on the installed system. If install-time changes need to go back to the private Dubnium repo, reconcile them after first boot using Post-Install Source Reconciliation.

9.3 Run Installer

From the repo root:

sudo nixos-install --flake .#workstation

If the installer asks for a root password, set one unless you have already configured another access path.

9.4 Capture Install Result

If install succeeds, note:

  • whether bootloader installation succeeded
  • whether any warnings appeared
  • whether a root password was set

If install fails, do not reboot yet. Inspect the error while still in the live environment.


10. Reboot Into the Installed System

Start when

  • nixos-install --flake .#workstation completed successfully
  • bootloader result is known
  • root password or other access path exists
  • no unresolved install error remains

Outcomes

  • machine boots from the internal disk
  • USB installer is removed or not selected
  • installed NixOS system reaches a login/session path
  • if boot fails, rescue path is known and documented

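10.1 Unmount and Reboot

If install succeeded, flush pending writes, unmount the target, and reboot:

sync
sudo umount -R /mnt
sudo reboot

If umount reports the target as busy, check for a shell or process still working under /mnt before rebooting.

Remove the USB stick once the machine has powered off or reached the firmware screen, so that it boots from the internal disk.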

10.2 Select Installed Disk

If the machine boots back into the USB installer:

  • remove the USB stick
  • enter firmware boot menu
  • select the internal disk or Linux Boot Manager

10.3 Recovery If Boot Fails

If the installed system does not boot:

  • boot the USB installer again
  • mount root and EFI partitions back under /mnt
  • inspect /mnt/etc/nixos and the generated hardware config
  • check firmware boot entries with bootctl from a chroot if needed

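Concrete rescue mount. The ROOT_PART and EFI_PART variables from section 4.5 do not survive the reboot, so either set them again or mount by the filesystem labels created in section 5.1:

sudo mount /dev/disk/by-label/nixos /mnt
sudo mount /dev/disk/by-label/NIXBOOT /mnt/boot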

Enter the installed system:

sudo nixos-enter --root /mnt

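Inside the chroot, set TARGET_USER to the user whose home holds the checkout (ryjen is the default workstation user):

bootctl status
TARGET_USER=ryjen
nixos-rebuild boot --flake "/home/$TARGET_USER/Projects/dubnium#workstation"
exit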

If the repo was not copied into the installed filesystem, use the path where it actually exists or import it again from your prepared source media.


11. First Boot Verification

Start when

  • installed system has booted from internal disk
  • operator can log in locally or over SSH
  • repo exists on the installed system or can be imported immediately

Outcomes

  • installed system identity is verified
  • repo source is available on the installed system
  • mode CLI works
  • runtime state files exist
  • first observed mode is desktop
  • vLLM and studio overlay services are inactive in desktop
  • NVIDIA basics are verified before any compute testing

11.1 Verify Basic System Identity

After booting the installed system:

hostname
uname -a
ip addr

11.2 Verify Repo Location

If you copied the repo before install:

test -d ~/Projects/dubnium
cd ~/Projects/dubnium
git status --short

If the repo is missing, import it now before treating the system as fully owned by the flake source.

11.3 Verify Mode CLI

mode status
mode current
mode desired

11.4 Verify systemd Units

systemctl status desktop.target
systemctl status compute.target
systemctl status vllm.service
sudo ls -la /run/mode-controller

11.5 Verify Runtime State Files

sudo cat /run/mode-controller/desired
sudo cat /run/mode-controller/current
sudo cat /run/mode-controller/capability-placement.json
sudo cat /run/mode-controller/hardware-topology.json

Expected first-boot posture:

  • current mode should be desktop
  • vLLM should not be active in desktop
  • studio-local-policy.service should not be active
  • audio-priority.service should not be active
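
The expected posture can be sketched as a single pass/fail check. desktop_posture_ok is a hypothetical helper that takes the current mode followed by any number of unit states as reported by systemctl is-active; the usage comment assumes that mode current prints the bare mode name, which this checklist does not confirm.

```shell
# Hypothetical helper: succeed only if the mode is "desktop" and none
# of the given unit states is "active".
desktop_posture_ok() {
  mode=$1
  shift
  [ "$mode" = "desktop" ] || return 1
  for state in "$@"; do
    if [ "$state" = "active" ]; then
      return 1
    fi
  done
  return 0
}

# Usage on the installed system (not executed here):
#   desktop_posture_ok "$(mode current)" \
#     "$(systemctl is-active vllm.service)" \
#     "$(systemctl is-active studio-local-policy.service)" \
#     "$(systemctl is-active audio-priority.service)" \
#     && echo "first-boot posture ok"
```

A failure here only says that something deviates; systemctl status on the individual unit shows which one and why.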

11.6 Verify NVIDIA Before Compute Testing

Before testing compute, verify NVIDIA basics:

nvidia-smi
lsmod | grep nvidia

Do not run mode request compute from the fresh-install checklist. Compute transition testing belongs in the bring-up and transition-testing runbooks after the desktop baseline, observer, and NVIDIA runtime all look correct.


12. Continue With Bring-Up

Start when

  • fresh install success criteria are satisfied
  • desktop baseline is usable
  • mode CLI and runtime state files work

Outcomes

  • ownership transfers to the first bring-up checklist
  • transition testing is not started from the fresh-install checklist
  • compute testing is gated behind the bring-up/transition runbooks

After the machine is installed and boots correctly, continue with the first bring-up checklist.

That covers:

  • dry build vs switch
  • mode transition tests
  • studio-local checks
  • compute checks
  • failure inspection paths

13. Common Failure Areas

Start when

  • an install, boot, or first verification step failed
  • error output or observed failure is available

Outcomes

  • failure is categorized before more changes are made
  • recovery work targets the likely failure bucket
  • repeated failures are recorded with evidence

Fresh installs usually fail in one of these buckets:

  • wrong disk selected during partitioning
  • incorrect mount layout before nixos-generate-config
  • hardware config not regenerated into the repo
  • bootloader/EFI mismatch
  • NVIDIA/runtime issues after first boot
  • vLLM/model/runtime issues once compute mode is exercised

14. Success Criteria

Start when

  • all previous steps either passed or were intentionally skipped with a reason
  • first boot verification has been completed

Outcomes

  • the machine is installed from the flake
  • the system boots from disk
  • the repo-based configuration owns the machine
  • the machine is ready for first bring-up, not yet full compute operation

A successful fresh install means:

  • the machine boots from disk into the flake-managed system
  • mode status works
  • the repo-based configuration owns the system from first boot
  • you can move on to the bring-up checklist without reinstalling