Architecture Overview
Status: living
This is the arc42-lite entrypoint for Dubnium. It describes the system shape, constraints, building blocks, runtime behavior, deployment view, and current risks without replacing lower-level implementation docs.
Purpose
Dubnium is a policy-driven NixOS workstation and AI node. It supports multiple host-local operational contracts on one physical machine:
- desktop: interactive Hyprland workstation and development mode.
- studio-local: conditional low-latency audio overlay on desktop.
- compute: headless throughput-oriented AI/platform mode.
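The following minimal sketch shows how a request parser might encode these contracts. It assumes studio-local is an overlay flag on the desktop base mode rather than a third independent mode. BaseMode and ModeRequest are hypothetical names and are not part of the mode CLI.

```python
from dataclasses import dataclass
from enum import Enum


class BaseMode(Enum):
    """Host-local base modes; exactly one is desired at a time (hypothetical model)."""
    DESKTOP = "desktop"
    COMPUTE = "compute"


@dataclass(frozen=True)
class ModeRequest:
    base: BaseMode
    studio_local: bool = False  # low-latency audio overlay on top of the base mode

    def validate(self) -> None:
        # studio-local is an overlay on desktop, never a standalone or compute mode.
        if self.studio_local and self.base is not BaseMode.DESKTOP:
            raise ValueError("studio-local requires the desktop base mode")

    @classmethod
    def parse(cls, name: str) -> "ModeRequest":
        """Parse 'desktop', 'compute', or 'studio-local' (which implies desktop)."""
        if name == "studio-local":
            request = cls(BaseMode.DESKTOP, studio_local=True)
        else:
            request = cls(BaseMode(name))  # ValueError for unknown mode names
        request.validate()
        return request
```

Rejecting compute plus studio-local at parse time keeps the audio overlay from leaking into the throughput-oriented mode.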
The architecture exists to make mode transitions explicit, observable, guard-driven, auditable, and reversible.
Constraints
- Desired state is not current state.
- Current state must be derived from runtime observation.
- Runtime reconciliation is mandatory for mode changes.
- systemd targets, services, and slices are the enforcement mechanism.
- Runtime switching comes before NixOS specialisations.
- studio-local is conditional and must not dominate the architecture.
- Host-local modes must remain separate from capability placement.
- Failure, degraded, and blocked states must be modeled explicitly.
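An explicit outcome type makes the last constraint concrete, so a transition result is never reduced to an implicit boolean. This is only a sketch. The four outcome names mirror the runtime view. Splitting target predicates into critical and optional ones is an assumption about how degraded might be told apart from failed.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"    # re-observation satisfies every target predicate
    DEGRADED = "degraded"  # critical predicates hold, some optional ones do not
    BLOCKED = "blocked"    # a guard refused by policy; nothing was executed
    FAILED = "failed"      # execution error, or critical predicates do not hold


def classify(blocked_by_policy: bool, execution_error: bool,
             critical_ok: bool, optional_ok: bool) -> Outcome:
    """Classify a transition from guard results and re-observed predicates.

    Attempted actions are never evidence of success; only the observation flags are.
    """
    if blocked_by_policy:
        return Outcome.BLOCKED
    if execution_error or not critical_ok:
        return Outcome.FAILED
    if not optional_ok:
        return Outcome.DEGRADED
    return Outcome.SUCCESS
```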
System Context
Actors and adjacent systems:
- Local operator: requests mode changes, checks status, recovers failures.
- NixOS host: owns systemd enforcement, hardware, services, and runtime state.
- GPU/display/audio hardware: shared resources with conflicting latency and throughput requirements.
- vLLM: compute workload, active only in compute for v1.
- k3s: platform workload, stable across modes for v1.
- Possible external studio host: future placement for audio/studio capability.
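The v1 placement rules above, together with the headless nature of compute, can be written as per-mode predicates over active systemd units. In this minimal sketch, all unit names are assumptions, and the real service definitions belong in modules/workloads. The display-manager unit stands in for "a graphical desktop session is running".

```python
# Hypothetical unit names; the real service definitions live in modules/workloads.
# True = must be active in that mode, False = must be inactive.
PLACEMENT: dict[str, dict[str, bool]] = {
    "desktop": {"display-manager.service": True, "k3s.service": True, "vllm.service": False},
    "compute": {"display-manager.service": False, "k3s.service": True, "vllm.service": True},
}


def placement_violations(mode: str, active_units: set[str]) -> list[str]:
    """List every unit whose observed activity breaks the v1 rules for `mode`."""
    violations = []
    for unit, must_run in PLACEMENT[mode].items():
        running = unit in active_units
        if running != must_run:
            want = "active" if must_run else "inactive"
            have = "active" if running else "inactive"
            violations.append(f"{unit}: {have}, expected {want} in {mode}")
    return violations
```

Returning readable violations rather than a single boolean lets status output show exactly which workload is misplaced.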
Building Blocks
- Nix flake: declares the host configuration and packaged tools.
- modules/dubnium: mode policy, options, targets, slices, controller units, state files, and guard installation.
- modules/workloads: workload-specific service definitions such as Hyprland, audio, NVIDIA, vLLM, and k3s.
- mode CLI: operator surface for requests, status, desired/current state, and explanation.
- Reconciler: privileged transition executor.
- Observer: evidence-based classifier for current mode.
- Guards: small checks that return pass, policy block, or execution error.
- systemd: target, service, slice, and cgroup enforcement layer.
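The tri-state guard contract is easiest to keep when a policy refusal and a crash take different code paths. This sketch assumes guards are Python callables that raise a dedicated PolicyBlock exception to refuse. PolicyBlock, Verdict, GuardResult, and run_guard are illustrative names. Script-based guards could instead map exit codes onto the same three verdicts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    PASS = "pass"
    POLICY_BLOCK = "policy-block"   # the guard ran and refused the transition
    EXEC_ERROR = "execution-error"  # the guard itself could not complete


@dataclass
class GuardResult:
    name: str
    verdict: Verdict
    detail: str = ""


class PolicyBlock(Exception):
    """Raised by a guard to refuse a transition on policy grounds."""


def run_guard(name: str, check: Callable[[], None]) -> GuardResult:
    """Run one guard, keeping policy refusals distinct from execution errors."""
    try:
        check()
    except PolicyBlock as exc:
        return GuardResult(name, Verdict.POLICY_BLOCK, str(exc))
    except Exception as exc:  # anything else is an execution error, not a block
        return GuardResult(name, Verdict.EXEC_ERROR, f"{type(exc).__name__}: {exc}")
    return GuardResult(name, Verdict.PASS)
```

An unexpected exception is classified as an execution error rather than as a pass or a block. That way, a broken probe neither silently allows nor silently vetoes a transition.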
Runtime View
All mode changes follow the same control-loop shape:
- Authorize the request.
- Write desired state.
- Acquire the controller lock.
- Observe current state from runtime facts.
- Validate target and capability placement.
- Run transition guards.
- Execute bounded actions through systemd and helper scripts.
- Re-observe.
- Classify success, degraded state, blocked state, or failure.
- Write transition and guard records.
Success is never inferred from attempted actions. Success requires post-transition observation that satisfies the target mode predicates.
Deployment View
Primary deployment target:
- one x86_64-linux NixOS workstation host named workstation
- Hyprland desktop
- NVIDIA/CUDA runtime
- planned dual-GPU topology, with hardware-tolerant transitional config
- vLLM model/cache state outside the Nix store
- k3s control-node duties
Runtime state:
- live state under /run/mode-controller
- future persistent audit history under /var/lib/mode-controller, or /persist/var/lib/mode-controller when impermanence lands
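The sketch below shows how the controller might write these files. It assumes one small file per live value under the run directory and an append-only JSON Lines log for audit history. The file names desired and transitions.jsonl, the helper names, and the impermanence flag are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path

RUN_DIR = Path("/run/mode-controller")  # live state; /run is a tmpfs, cleared on boot


def audit_dir(impermanence: bool) -> Path:
    """Persistent audit location, depending on whether impermanence has landed."""
    return Path("/persist/var/lib/mode-controller" if impermanence
                else "/var/lib/mode-controller")


def write_state(directory: Path, name: str, value: str) -> None:
    """Atomically replace a live state file so readers never see a partial write."""
    directory.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=f".{name}.")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(value + "\n")
            f.flush()
            os.fsync(f.fileno())
            os.fchmod(f.fileno(), 0o644)  # readable by status queries, owner-writable only
        os.replace(tmp, directory / name)  # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp)
        raise


def append_audit(directory: Path, record: dict) -> None:
    """Append one transition record as a single JSON line."""
    directory.mkdir(parents=True, exist_ok=True)
    with open(directory / "transitions.jsonl", "a") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
```

Writing to a temporary file in the same directory and renaming it with os.replace means a reader sees either the old or the new value, never a torn one.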
Cross-Cutting Concerns
- Safety: guards block destructive transitions and distinguish policy blocks from execution errors.
- Observability: status must show desired state, observed state, conflicts, guard failures, and latest transition result.
- Auditability: every reconciliation attempt should produce structured records.
- Resource ownership: GPU, CPU, memory, I/O, audio, AI, and platform planes must not silently overlap in conflicting ways.
- Security: unprivileged users must not forge desired/current state or transition success.
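For the security concern, one conservative approach is for the status path to trust a state file only when the reconciler's account owns it and neither the file nor its directory is writable by anyone else. This minimal sketch assumes the reconciler runs as root (uid 0). trusted_state_file is a hypothetical helper. It narrows the forgery risk rather than removing it, since checking and then reading still leaves a small race.

```python
import os
import stat
from pathlib import Path


def trusted_state_file(path: Path, owner_uid: int = 0) -> bool:
    """Accept a state file only if `owner_uid` owns it and nobody else can write it.

    Checks both the file and its parent directory, without following symlinks.
    """
    for p in (path.parent, path):
        st = os.lstat(p)
        if stat.S_ISLNK(st.st_mode):
            return False  # a symlink could point at attacker-controlled data
        if st.st_uid != owner_uid:
            return False  # not written by the reconciler's account
        if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            return False  # group/world-writable: forgeable by other users
    return True
```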
Current Risks
- NVIDIA/Wayland GPU release may not be reliable enough for runtime-only compute promotion.
- Mixed runtime states may confuse a shell observer unless conflicts are handled conservatively.
- systemctl isolate stops every unit the target does not pull in (unless the unit sets IgnoreOnIsolate=yes), so it can stop required services if target dependencies are not explicit enough.
- Rollback must prove restored desktop behavior through observation, not just successful systemd commands.
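One way to handle mixed states conservatively is for the observer to name a mode only when exactly one mode's evidence matches, and to report a conflict otherwise. This sketch is written in Python rather than shell, and the evidence units are hypothetical. The default probe relies on systemctl is-active --quiet, which exits with status 0 when the unit is active.

```python
import subprocess
from typing import Callable

# Hypothetical evidence: units whose activity distinguishes the base modes.
MODE_EVIDENCE: dict[str, dict[str, bool]] = {
    "desktop": {"display-manager.service": True, "vllm.service": False},
    "compute": {"display-manager.service": False, "vllm.service": True},
}


def systemd_is_active(unit: str) -> bool:
    """Probe a unit via `systemctl is-active --quiet` (exit status 0 means active)."""
    return subprocess.run(["systemctl", "is-active", "--quiet", unit]).returncode == 0


def observe_mode(is_active: Callable[[str], bool] = systemd_is_active) -> str:
    """Classify the current mode from runtime facts, refusing to guess on conflicts."""
    facts = {unit: is_active(unit) for rules in MODE_EVIDENCE.values() for unit in rules}
    matches = [mode for mode, rules in MODE_EVIDENCE.items()
               if all(facts[unit] == want for unit, want in rules.items())]
    # Zero or several matches is a conflict to report, not something to resolve by guessing.
    return matches[0] if len(matches) == 1 else "conflict"
```

Injecting the probe keeps the classifier testable without a running systemd and makes the evidence rules easy to audit.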
See also: