Architecture Overview

Status: living

This is the arc42-lite entry point for Dubnium. It describes the system shape, constraints, building blocks, runtime behavior, deployment view, and current risks; it does not replace lower-level implementation docs.

Purpose

Dubnium is a policy-driven NixOS workstation and AI node. It supports multiple host-local operational contracts on one physical machine:

  • desktop: interactive Hyprland workstation and development mode.
  • studio-local: conditional low-latency audio overlay on desktop.
  • compute: headless throughput-oriented AI/platform mode.

The architecture exists to make mode transitions explicit, observable, guard-driven, auditable, and reversible.

Constraints

  • Desired state is not current state.
  • Current state must be derived from runtime observation.
  • Runtime reconciliation is mandatory for mode changes.
  • systemd targets, services, and slices are the enforcement mechanism.
  • Runtime switching comes before NixOS specialisations.
  • studio-local is conditional and must not dominate the architecture.
  • Host-local modes must remain separate from capability placement.
  • Failure, degraded, and blocked states must be modeled explicitly.
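
The first two and the last constraint can be illustrated with a small data model. This is a minimal sketch, not the project's actual schema: the type names (Mode, Health, DesiredState, ObservedState) and the converged helper are hypothetical, and treating studio-local as a boolean overlay that is valid only on desktop is an assumption drawn from the mode list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    DESKTOP = "desktop"
    COMPUTE = "compute"


class Health(Enum):
    # Non-success states are first-class values, not missing data.
    OK = "ok"
    DEGRADED = "degraded"
    BLOCKED = "blocked"
    FAILED = "failed"


@dataclass(frozen=True)
class DesiredState:
    """What the operator asked for. Says nothing about what is running."""
    mode: Mode
    studio_local: bool = False  # assumption: an overlay, valid only on desktop

    def __post_init__(self) -> None:
        if self.studio_local and self.mode is not Mode.DESKTOP:
            raise ValueError("studio-local is an overlay on desktop only")


@dataclass(frozen=True)
class ObservedState:
    """What runtime observation found. Never copied from DesiredState."""
    mode: Optional[Mode]  # None when evidence is mixed or insufficient
    studio_local: bool
    health: Health


def converged(desired: DesiredState, observed: ObservedState) -> bool:
    """True only when observation matches the request and health is OK."""
    return (
        observed.health is Health.OK
        and observed.mode is desired.mode
        and observed.studio_local == desired.studio_local
    )
```

Keeping the two records as separate types makes it harder to report a requested mode as the current one by accident.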

System Context

Actors and adjacent systems:

  • Local operator: requests mode changes, checks status, recovers failures.
  • NixOS host: owns systemd enforcement, hardware, services, and runtime state.
  • GPU/display/audio hardware: shared resources with conflicting latency and throughput requirements.
  • vLLM: compute workload, active only in compute for v1.
  • k3s: platform workload, stable across modes for v1.
  • Possible external studio host: future placement for audio/studio capability.

Building Blocks

  • Nix flake: declares the host configuration and packaged tools.
  • modules/dubnium: mode policy, options, targets, slices, controller units, state files, and guard installation.
  • modules/workloads: workload-specific service definitions such as Hyprland, audio, NVIDIA, vLLM, and k3s.
  • mode CLI: operator surface for requests, status, desired/current state, and explanation.
  • Reconciler: privileged transition executor.
  • Observer: evidence-based classifier for current mode.
  • Guards: small checks that return pass, policy block, or execution error.
  • systemd: target, service, slice, and cgroup enforcement layer.
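
The three-valued guard contract listed above can be sketched as follows. The names (GuardOutcome, GuardResult, run_guard, summarize) are hypothetical, and the precedence in summarize, where an execution error outranks a policy block, is an assumption rather than documented behavior.

```python
from enum import Enum
from typing import Callable, Iterable, NamedTuple, Tuple


class GuardOutcome(Enum):
    PASS = "pass"
    POLICY_BLOCK = "policy-block"        # the check ran and said no
    EXECUTION_ERROR = "execution-error"  # the check itself could not run


class GuardResult(NamedTuple):
    name: str
    outcome: GuardOutcome
    detail: str


def run_guard(name: str, check: Callable[[], Tuple[bool, str]]) -> GuardResult:
    """Run one guard. `check` returns (allowed, detail); any exception is
    reported as an execution error instead of being read as a policy answer."""
    try:
        allowed, detail = check()
    except Exception as exc:
        return GuardResult(name, GuardOutcome.EXECUTION_ERROR, repr(exc))
    outcome = GuardOutcome.PASS if allowed else GuardOutcome.POLICY_BLOCK
    return GuardResult(name, outcome, detail)


def summarize(results: Iterable[GuardResult]) -> GuardOutcome:
    """Collapse guard results (assumed precedence: error > block > pass)."""
    outcomes = {result.outcome for result in results}
    if GuardOutcome.EXECUTION_ERROR in outcomes:
        return GuardOutcome.EXECUTION_ERROR
    if GuardOutcome.POLICY_BLOCK in outcomes:
        return GuardOutcome.POLICY_BLOCK
    return GuardOutcome.PASS
```

Separating "the policy said no" from "the check broke" lets status output tell the operator whether to change the request or repair the guard.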

Runtime View

All mode changes follow the same control-loop shape:

  1. Authorize the request.
  2. Write desired state.
  3. Acquire the controller lock.
  4. Observe current state from runtime facts.
  5. Validate target and capability placement.
  6. Run transition guards.
  7. Execute bounded actions through systemd and helper scripts.
  8. Re-observe.
  9. Classify success, degraded state, blocked state, or failure.
  10. Write transition and guard records.

Success is never inferred from attempted actions. Success requires post-transition observation that satisfies the target mode predicates.
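
The ten steps above can be sketched as a single function with injected collaborators. This is an illustration under stated assumptions, not the reconciler's real interface: the Controller class and its field names are hypothetical, guards are reduced to booleans for brevity, an in-process lock stands in for the controller lock, and labelling an unsatisfied post-observation as "degraded" is a simplification.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Facts = Dict[str, bool]


@dataclass
class Controller:
    authorize: Callable[[str], bool]
    observe: Callable[[], Facts]                 # runtime facts only
    validate: Callable[[str], bool]              # target and placement
    guards: List[Callable[[Facts], bool]]        # True means pass
    execute: Callable[[str], None]               # bounded actions
    satisfied: Callable[[str, Facts], bool]      # target mode predicates
    desired: Dict[str, str] = field(default_factory=dict)
    records: List[Dict[str, str]] = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def reconcile(self, target: str) -> str:
        if not self.authorize(target):                       # step 1
            return self._record(target, "blocked", "unauthorized")
        self.desired["mode"] = target                        # step 2
        if not self.lock.acquire(blocking=False):            # step 3
            return self._record(target, "blocked", "controller busy")
        try:
            before = self.observe()                          # step 4
            if not self.validate(target):                    # step 5
                return self._record(target, "blocked", "invalid target")
            if not all(guard(before) for guard in self.guards):  # step 6
                return self._record(target, "blocked", "guard")
            try:
                self.execute(target)                         # step 7
            except Exception as exc:
                return self._record(target, "failed", repr(exc))
            after = self.observe()                           # step 8
            # Step 9: the verdict depends on `after`, never on the fact
            # that execute() returned without raising.
            ok = self.satisfied(target, after)
            return self._record(target, "success" if ok else "degraded",
                                "post-observation")
        finally:
            self.lock.release()

    def _record(self, target: str, result: str, reason: str) -> str:
        self.records.append(                                 # step 10
            {"target": target, "result": result, "reason": reason})
        return result
```

An execute step that returns cleanly but changes nothing yields "degraded" here, because the verdict is computed from the second observation alone.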

Deployment View

Primary deployment target:

  • one x86_64-linux NixOS workstation host named workstation
  • Hyprland desktop
  • NVIDIA/CUDA runtime
  • planned dual-GPU topology, with hardware-tolerant transitional config
  • vLLM model/cache state outside the Nix store
  • k3s control-node duties

Runtime state:

  • live state under /run/mode-controller
  • future persistent audit history under /var/lib/mode-controller, or under /persist/var/lib/mode-controller once impermanence lands

Cross-Cutting Concerns

  • Safety: guards block destructive transitions and distinguish policy blocks from execution errors.
  • Observability: status must show desired state, observed state, conflicts, guard failures, and latest transition result.
  • Auditability: every reconciliation attempt should produce structured records.
  • Resource ownership: GPU, CPU, memory, I/O, audio, AI, and platform planes must not silently overlap in conflicting ways.
  • Security: unprivileged users must not forge desired/current state or transition success.
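
One way to produce the structured records mentioned under Auditability is an atomic JSON write, so a reader never sees a half-written file. This is a minimal sketch: write_record, the record fields, and the file naming are hypothetical, and the state directory is passed in so the example does not assume write access to /run/mode-controller. It does not address ownership or permissions of the directory itself, which the Security concern would also require.

```python
import json
import os
import tempfile
from pathlib import Path


def write_record(state_dir: Path, name: str, record: dict) -> Path:
    """Write `record` as JSON to state_dir/name.json via temp file + rename."""
    state_dir.mkdir(parents=True, exist_ok=True)
    final = state_dir / f"{name}.json"
    # The temp file lives in the same directory so os.replace stays on one
    # filesystem, where the rename is atomic on POSIX systems.
    fd, tmp = tempfile.mkstemp(dir=state_dir, prefix=f".{name}.")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(record, handle, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp, final)
    except BaseException:
        # Leave no stray temp file behind if anything went wrong.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return final
```

A caller might write one file per reconciliation attempt, for example `write_record(Path("/run/mode-controller"), "last-transition", {...})`, with the record contents decided by the reconciler.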

Current Risks

  • NVIDIA/Wayland GPU release may not be reliable enough for runtime-only compute promotion.
  • Mixed runtime states may cause a shell-based observer to misclassify the current mode unless conflicts are handled conservatively.
  • systemctl isolate can stop required services if target dependencies are not explicit enough.
  • Rollback must prove restored desktop behavior through observation, not just successful systemd commands.
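
Conservative handling of mixed runtime states, as raised in the second risk, could look like the sketch below. The fact names (hyprland_active, vllm_active) and the classify_mode function are hypothetical, and real mode predicates would need more evidence than two flags. The point shown is only that contradictory evidence yields an explicit conflict instead of a best guess.

```python
from typing import Mapping


def classify_mode(facts: Mapping[str, bool]) -> str:
    """Classify the current mode from runtime facts, refusing to guess.

    Missing facts count as absent evidence, not as evidence for a mode.
    """
    desktop_evidence = facts.get("hyprland_active", False)
    compute_evidence = facts.get("vllm_active", False)

    if desktop_evidence and compute_evidence:
        # Both planes look live: report the contradiction explicitly.
        return "conflict"
    if desktop_evidence:
        return "desktop"
    if compute_evidence:
        return "compute"
    # No positive evidence for either mode.
    return "unknown"
```

A rollback check could reuse the same classifier: rollback counts as proven only when it returns "desktop" after the rollback actions have run.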

See also: