Dual-Mode NixOS Workstation / AI Node

Unified Planning + Mode State Machine Document (v0.3 — Living)


1. Purpose

Design a single NixOS system that operates as a policy-driven multi-mode host with support for future workload externalization:

  • Desktop / Dev workstation
  • Optional local Studio / Audio profile
  • Compute / Headless AI node

The broader workstation environment may also externalize selected capabilities, especially Studio/Audio, to a separate machine such as a Mac mini.

The system must support:

  • low-latency audio workloads (DAW / live)
  • GUI desktop usage via Hyprland
  • GPU inference via vLLM
  • k3s control-plane duties for Micrantha Laboratory / Hyperion
  • explicit, auditable, reproducible transitions between modes

This document defines:

  • planning assumptions
  • architectural boundaries
  • host-local mode definitions
  • capability placement model
  • invariants
  • state machine
  • guards and guard functions
  • source-of-truth model
  • reconciliation model
  • implementation mapping to systemd
  • design alternatives and tradeoffs

2. Core Principles

2.1 Modes Are Operational Contracts

A mode is not just a set of enabled services. A mode defines:

  • resource ownership
  • permitted workloads
  • latency/throughput expectations
  • security posture
  • transition preconditions

2.2 Explicit Over Implicit

Mode transitions should be:

  • explicit when possible
  • observable
  • reversible
  • logged
  • idempotent

Automation may request a transition, but the controller must decide whether it is safe.

2.3 Latency and Throughput Are Competing Objectives

  • Desktop / Studio-Local optimize for responsiveness and bounded latency
  • Compute optimizes for throughput and hardware utilization

The design must not pretend both can be maximized simultaneously.

2.4 One Physical Host, Multiple Logical Planes

This system is treated as:

one shared substrate hosting multiple logical operating modes

2.5 Declarative First, Runtime Reconciliation Second

  • NixOS declares steady-state intent and system structure
  • a mode controller reconciles runtime state toward desired operational mode

2.6 Host-Local Modes Must Survive Capability Relocation

The host-local state model should remain coherent even if some capabilities, especially Studio/Audio, move to another machine.


3. System Overview

flowchart TD
    HW[Hardware]

    subgraph BaseOS[NixOS Base Layer]
        Kernel
        Drivers[NVIDIA / CUDA]
        Network
        Storage
        Nix
        systemd
    end

    subgraph Control[Mode Control Plane]
        Desired[Desired State]
        Current[Current State]
        Reconcile[Reconciler]
        Guards[Guard Checks]
    end

    subgraph LocalModes[Host-Local Modes]
        Desktop[Desktop / Dev]
        StudioLocal[Studio-Local / Audio-Priority]
        Compute[Compute / Headless]
    end

    subgraph Placement[Capability Placement]
        StudioCap[Studio Capability]
        AICap[AI Capability]
        PlatformCap[Platform Capability]
    end

    subgraph Workloads[Workloads]
        Hyprland
        PipeWire
        Reaper
        vLLM
        k3s
    end

    HW --> BaseOS
    BaseOS --> Control
    Control --> LocalModes
    LocalModes --> Workloads
    LocalModes --> Placement

4. Mode Definitions and Capability Placement

This document distinguishes between:

  • host-local operational modes for the NixOS machine
  • capability placement for functions that may later move to another machine

4.1 Host-Local Modes

Desktop / Dev Mode

Intent

Balanced interactive mode for programming, office work, light desktop use, and bounded AI.

Properties

  • GUI enabled
  • audio enabled for ordinary desktop use
  • GPU0 reserved for display/compositor
  • GPU1 may be used by AI workloads
  • vLLM constrained to single-GPU operation or disabled
  • k3s control plane may remain active
  • CPU/RAM contention must remain bounded

Studio-Local / Audio-Priority Profile

Intent

A stricter local operating profile for low-latency audio work when Studio remains on the NixOS host.

Properties

  • modeled as a protected interactive profile closely related to Desktop
  • GUI enabled
  • audio stack prioritized
  • display GPU reserved exclusively for desktop responsibilities
  • AI workloads disabled or reduced to near-zero
  • heavy I/O and background maintenance jobs disallowed
  • scheduler and system policy biased toward stable audio behavior

Design note

This profile is considered conditional and potentially temporary. It exists so the NixOS host can support local audio/studio workflows now, without assuming that Studio remains a permanent first-class local mode forever.

Implementation note

For the first implementation pass, studio-local should be modeled as a policy overlay on desktop, not as a first-class top-level systemd target. The operational state still exists in the controller/state model, but its enactment should initially be handled by marker/helper units layered onto the desktop path.


Compute / Headless Mode

Intent

Throughput-oriented headless mode for AI serving and platform duties.

Properties

  • GUI disabled
  • audio stack off or irrelevant
  • both GPUs available to AI workloads
  • vLLM may use both GPUs
  • k3s workloads may run more aggressively
  • CPU/RAM/storage can be utilized much more aggressively than in interactive modes

4.2 Capability Placement Model

Certain capabilities may be placed either:

  • locally on the NixOS host
  • externally on another machine

Capability: Studio / Audio

Possible placements:

  • local
  • external-mac-mini

Capability: AI / Inference

Expected placement:

  • primarily local-nixos-host

Capability: Platform / k3s Control

Expected placement:

  • primarily local-nixos-host

4.3 Design Implication

The host-local state machine should remain valid even if Studio/Audio is moved to a Mac mini. That means Studio-specific policy should be represented as a local profile or conditional mode, not as the permanent center of the entire host architecture.


5. Resource Ownership Model

5.0 Implementation Note — Hardware-Tolerant Bring-Up

The architecture should continue to plan for the intended dual-GPU topology, but the NixOS implementation should remain tolerant of transitional hardware states while the second GPU is not yet installed or configured.

That means:

  • the policy model may still describe the intended two-GPU end state
  • module options should encode planned GPU ownership explicitly
  • active service profiles must only reference GPUs that are currently present
  • missing future hardware must not cause ordinary evaluation or steady-state services to fail unnecessarily

5.1 GPU Ownership

| Mode | GPU0 | GPU1 |
|---|---|---|
| Desktop | Display / compositor | AI optional |
| Studio-Local | Display / compositor (protected) | AI off or minimal |
| Compute | AI | AI |
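
The ownership table above and the hardware-tolerance rule from 5.0 can be combined in a small lookup. The following is a minimal sketch, not the actual module: the role labels and the function names `active_gpu_ownership` and `ai_gpus` are hypothetical, and the set of present GPUs is assumed to be supplied by some separate detection step.

```python
# Planned GPU ownership per mode, mirroring the ownership table.
# Role labels are illustrative, not names used by any real tool.
PLANNED_GPU_OWNERSHIP = {
    "desktop": {0: "display", 1: "ai-optional"},
    "studio-local": {0: "display-protected", 1: "ai-off"},
    "compute": {0: "ai", 1: "ai"},
}


def active_gpu_ownership(mode, present_gpus):
    """Return planned ownership restricted to GPUs that are installed."""
    planned = PLANNED_GPU_OWNERSHIP[mode]
    return {gpu: role for gpu, role in planned.items() if gpu in present_gpus}


def ai_gpus(mode, present_gpus):
    """GPU indices an AI service profile may reference right now."""
    owned = active_gpu_ownership(mode, present_gpus)
    return sorted(
        gpu for gpu, role in owned.items() if role in ("ai", "ai-optional")
    )
```

With only GPU0 installed, `ai_gpus("desktop", {0})` yields an empty list, so a desktop AI profile would reference no GPU rather than fail on missing hardware.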

5.2 CPU Ownership

  • Shared via cgroups/systemd slices
  • interactive slices retain priority/headroom in Desktop and Studio-Local
  • compute slices may saturate cores in Compute

5.3 Memory Ownership

  • bounded AI memory usage in Desktop
  • stricter constraints in Studio-Local
  • relaxed/high utilization in Compute

5.4 Storage Ownership

  • heavy background I/O restricted in Studio-Local
  • permitted but bounded in Desktop
  • broadly permitted in Compute

5.5 Audio Ownership

  • effectively exclusive in Studio-Local
  • protected in Desktop
  • not guaranteed in Compute

6. Invariants

These are system-level properties that must remain true regardless of transition path or future Studio placement.

6.1 Safety Invariants

  1. At most one host-local operational mode is authoritative at a time.
  2. A transition must either complete to a stable target state or abort back to a known-safe prior state.
  3. Mode transitions must be idempotent. Re-running a transition toward an already-satisfied state must not cause harm.
  4. When Studio-Local is active, heavyweight compute workloads must not materially jeopardize audio latency.
  5. Compute mode must not require a running graphical session.
  6. GPU0 must not be simultaneously treated as both protected display GPU and unrestricted compute GPU.
  7. The controller must not promote the system into Compute if guard failures indicate active user/audio risk.
  8. The system must always expose a way to determine current mode, desired mode, and last transition result.
  9. The host-local mode model must remain coherent if Studio/Audio capability is externalized to another machine.

6.2 State Invariants

  1. Desired state is authoritative intent.
  2. Current state is observed runtime fact.
  3. Reconciliation moves current state toward desired state; it never rewrites observed state to match wishful intent.
  4. A guard failure blocks transition, but does not silently change desired state unless policy explicitly says so.

6.3 Operational Invariants

  1. Models and mutable runtime data must live outside the Nix store.
  2. Dotfiles may influence user experience, not machine-critical mode policy.
  3. Mode policy must remain expressible and inspectable via systemd and Nix configuration.
  4. Capability placement decisions must not silently invalidate host-local invariants.

7. Desired State vs Current State

7.1 Desired State

The host-local mode the user or automation wants the system to be in.

Examples:

  • desktop
  • studio-local
  • compute

7.2 Current State

The host-local mode the system is actually in, as determined by observation.

Examples:

  • graphical target active, PipeWire active, vLLM limited → likely desktop
  • graphical target inactive, compute services active, both GPUs exposed to AI → likely compute
  • GUI active, audio priority raised, compute services reduced → likely studio-local
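
The examples above suggest that current state is a classification over observed facts. A minimal sketch of such a classifier follows; the `Facts` fields and the `infer_mode` name are assumptions made for illustration, and how each fact is collected (for example by querying systemd) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facts:
    """Observed runtime facts; collection is out of scope for this sketch."""
    graphical_active: bool
    studio_overlay_active: bool   # e.g. a studio-local marker unit is active
    compute_target_active: bool


def infer_mode(facts):
    """Classify facts into a mode name, or 'unknown' if they are inconsistent."""
    if facts.compute_target_active and not facts.graphical_active:
        return "compute"
    if facts.graphical_active and not facts.compute_target_active:
        if facts.studio_overlay_active:
            return "studio-local"
        return "desktop"
    # Neither or both families active: do not guess.
    return "unknown"
```

Returning `unknown` for contradictory facts keeps observation honest: the classifier never reports a mode just because it was requested.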

7.3 Why This Split Matters

Without this split, the system can lie to itself:

  • a command says “switch to compute”
  • but GPU is still held by compositor
  • vLLM failed to scale up
  • audio services are still active

In that case:

  • desired state = compute
  • current state = transitioning or desktop (degraded)

The control plane must detect and reconcile this rather than assuming success.


8. Source of Truth for Mode

The system needs one authoritative representation of requested host-local mode.

8.1 Options Considered

Option A — File-Based Source of Truth

Example:

  • /run/mode-controller/desired
  • /var/lib/mode-controller/desired

Pros

  • simple
  • easy to inspect
  • works outside active user session
  • easy for scripts and systemd units

Cons

  • can drift from actual runtime state
  • needs permissions and lifecycle handling

Option B — Environment Variable Source of Truth

Example:

  • MODE=compute

Pros

  • simple for one-shot commands
  • easy in shell contexts

Cons

  • poor system-wide authority
  • ephemeral
  • fragile across sessions/reboots
  • bad fit for authoritative machine state

Option C — systemd State as Source of Truth

Example:

  • compute.target active implies desired mode is compute

Pros

  • tightly aligned with implementation
  • introspectable
  • avoids duplicate state stores

Cons

  • desired state and current state can become conflated
  • harder to represent “requested but not yet achieved”
  • recovery/abort semantics become more awkward

8.2 Recommendation

Use a hybrid model:

  • Desired state source of truth: file in /run/mode-controller/desired
  • Current state source of truth: observed systemd/runtime facts
  • Transition machinery: systemd targets + controller service

This cleanly separates:

  • intent
  • observation
  • enforcement

8.3 Proposed Files

  • /run/mode-controller/desired
  • /run/mode-controller/current
  • /run/mode-controller/last-transition.json

current may be a cached observation, but observation should always be derivable from system state.
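
As an illustration of the lifecycle handling these files need, here is a minimal sketch of reading and writing them. The helper names, the one-line text format for `desired`, and the keys of the transition record are assumptions; only the file names come from the list above. Writes go through a temporary file and a rename so that a reader never sees a half-written file.

```python
import json
import os
import tempfile
from pathlib import Path

VALID_MODES = ("desktop", "studio-local", "compute")


def write_atomic(path, text):
    """Write text via a temp file in the same directory, then rename into place."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def write_desired(state_dir, mode):
    """Record the requested mode; reject names outside the known set."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    write_atomic(Path(state_dir) / "desired", mode + "\n")


def read_desired(state_dir):
    """Return the requested mode, or None if no request has been recorded."""
    try:
        return (Path(state_dir) / "desired").read_text().strip()
    except FileNotFoundError:
        return None


def write_last_transition(state_dir, record):
    """Persist the most recent transition result as JSON."""
    text = json.dumps(record, sort_keys=True) + "\n"
    write_atomic(Path(state_dir) / "last-transition.json", text)
```

In use, `state_dir` would be `/run/mode-controller`; taking it as a parameter keeps the helpers testable against a scratch directory.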


9. State Machine

9.1 States

S0: Boot

Initial state before default operating mode is established.

S1: Desktop

Interactive general-purpose mode.

S2: StudioLocal

Strict interactive low-latency local audio profile.

S3: Compute

Headless throughput-oriented mode.

S4: Transitioning

Ephemeral reconciliation state while moving toward desired mode.

S5: FailedTransition

A recoverable error state indicating that desired state was not achieved.

9.2 State Diagram

stateDiagram-v2
    [*] --> Boot

    Boot --> Desktop : default boot

    Desktop --> StudioLocal : request(studio-local)
    StudioLocal --> Desktop : request(desktop)

    Desktop --> Transitioning : request(compute)
    StudioLocal --> Transitioning : request(compute)
    Compute --> Transitioning : request(desktop)
    Desktop --> Transitioning : request(desktop) / reconcile
    StudioLocal --> Transitioning : request(studio-local) / reconcile
    Compute --> Transitioning : request(compute) / reconcile

    Transitioning --> Desktop : reached(desktop)
    Transitioning --> StudioLocal : reached(studio-local)
    Transitioning --> Compute : reached(compute)
    Transitioning --> FailedTransition : guard_fail / action_fail / timeout

    FailedTransition --> Desktop : recover(previous=desktop)
    FailedTransition --> StudioLocal : recover(previous=studio-local)
    FailedTransition --> Compute : recover(previous=compute)

9.3 Notes

  • Direct StudioLocal -> Compute may be allowed only through guarded reconciliation, not blind immediate promotion.
  • Reconciliation should be able to handle “already in desired mode” as a no-op success.
  • Externalized Studio capability must not require redesign of the host-local state machine; it should only disable or deprecate studio-local usage.
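
The diagram in 9.2 can be encoded as a transition function. This is a sketch under assumptions: the state and event spellings (`default_boot`, `request`, `reached`, `recover`) are chosen for illustration, and only the edges drawn in the diagram are accepted.

```python
MODES = ("desktop", "studio-local", "compute")


def next_state(state, event, arg=None):
    """Return the successor state for (state, event, arg), or raise ValueError.

    arg carries the mode name for request, reached, and recover events.
    """
    if state == "boot" and event == "default_boot":
        return "desktop"
    if state in MODES and event == "request" and arg in MODES:
        # Desktop <-> StudioLocal are drawn as direct edges in the diagram.
        if {state, arg} == {"desktop", "studio-local"}:
            return arg
        if state == "compute" and arg == "studio-local":
            raise ValueError("edge not present in the diagram")
        # Everything else, including same-mode reconcile, goes via Transitioning.
        return "transitioning"
    if state == "transitioning":
        if event == "reached" and arg in MODES:
            return arg
        if event in ("guard_fail", "action_fail", "timeout"):
            return "failed-transition"
    if state == "failed-transition" and event == "recover" and arg in MODES:
        return arg
    raise ValueError(f"no edge from {state!r} on {event!r}({arg!r})")
```

Compute -> StudioLocal is rejected here because the diagram does not draw that edge, even though the guard policy in 10.3 lists it. Whichever way that is resolved, the function and the diagram should change together.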

10. Guards

Guards are explicit check functions. They return exit codes and optionally structured diagnostics.

10.1 Guard Interface

Each guard function should follow a predictable interface:

check_<name>
exit 0     = pass
exit 10-19 = policy failure / guard blocked
exit 20-29 = check execution error / indeterminate

Structured output should ideally emit JSON or key=value diagnostics to stdout/stderr for logs.
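
A controller consuming this interface needs to tell a policy block apart from a broken check. The sketch below assumes the policy band is 10 through 19 and the error band is 20 through 29, which matches the codes listed for the individual guards in this document. The function name and the three result labels are hypothetical.

```python
def classify_guard_exit(code):
    """Map a guard's exit code to 'pass', 'blocked', or 'error'."""
    if code == 0:
        return "pass"
    if 10 <= code <= 19:
        return "blocked"
    # 20-29 and any unexpected code are treated as indeterminate. A cautious
    # controller would handle that like a failure to verify safety.
    return "error"
```

Treating unknown codes as `error` rather than `pass` means a crashing guard script (for example exit code 1 or 127) cannot accidentally allow a transition.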

10.2 Guard Set

G1: check_audio_idle

Purpose:

  • verify no active low-latency local audio session that would make compute transition unsafe

Possible checks:

  • no active REAPER process
  • no active PipeWire/JACK graph beyond baseline

Exit codes:

  • 0 pass
  • 10 audio active
  • 20 unable to inspect audio graph

G2: check_gpu_display_released

Purpose:

  • verify display/compositor has released GPU before compute promotion

Possible checks:

  • no active Hyprland session
  • no relevant graphical GPU consumers

Exit codes:

  • 0 pass
  • 11 display GPU still owned by GUI
  • 21 GPU inspection failure

G3: check_cpu_load_safe

Purpose:

  • ensure transition is not occurring during obviously unsafe heavy local activity when policy requires quieting first

Exit codes:

  • 0 pass
  • 12 CPU load too high
  • 22 unable to inspect load

G4: check_user_jobs_safe

Purpose:

  • detect known long-running interactive/user jobs that should block auto-transition

Possible checks:

  • selected process patterns
  • optional allowlist/denylist

Exit codes:

  • 0 pass
  • 13 user jobs active
  • 23 inspection failure

G5: check_memory_headroom

Purpose:

  • ensure sufficient memory exists to perform transition or launch target services

Exit codes:

  • 0 pass
  • 14 insufficient headroom
  • 24 inspection failure

G6: check_vllm_drainable

Purpose:

  • ensure compute workloads can be safely reduced when returning to Desktop/Studio-Local

Exit codes:

  • 0 pass
  • 15 compute workload not drainable
  • 25 inspection failure

G7: check_studio_capability_local

Purpose:

  • verify that local Studio capability is still available on the NixOS host before allowing studio-local

Possible checks:

  • local policy flag indicates studio capability still hosted locally
  • local audio stack and workflow prerequisites are not intentionally disabled due to externalization

Exit codes:

  • 0 pass
  • 19 requested local studio capability not available
  • 29 inspection failure
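
As one possible shape for G7, the sketch below reads a placement document and maps it to the exit codes above. The JSON layout (a top-level `studio` key holding `local` or `external-mac-mini`) is an assumption made for illustration; the document does not define the schema of the placement file.

```python
import json


def studio_exit_code(placement):
    """Map a parsed placement document to a G7 exit code."""
    if not isinstance(placement, dict) or "studio" not in placement:
        return 29  # cannot determine placement
    return 0 if placement["studio"] == "local" else 19


def check_studio_capability_local(placement_path):
    """Return 0 if Studio is placed locally, 19 if not, 29 if unreadable."""
    try:
        with open(placement_path) as handle:
            return studio_exit_code(json.load(handle))
    except (OSError, ValueError):
        # Missing file, permission problem, or malformed JSON.
        return 29
```

Splitting the pure mapping from the file access keeps the policy decision testable without touching the filesystem.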

10.3 Guard Policy by Transition

| Transition | Required Guards |
|---|---|
| Desktop -> StudioLocal | check_target_reachable, check_studio_capability_local, check_user_jobs_safe (optional policy), compute downscale checks |
| StudioLocal -> Desktop | check_target_reachable |
| Desktop -> Compute | check_target_reachable, check_audio_idle, check_gpu_display_released, check_cpu_load_safe, check_user_jobs_safe, check_memory_headroom |
| StudioLocal -> Compute | check_target_reachable, check_audio_idle, check_gpu_display_released, check_cpu_load_safe, check_user_jobs_safe, check_memory_headroom |
| Compute -> Desktop | check_target_reachable, check_vllm_drainable, check_memory_headroom |
| Compute -> StudioLocal | check_target_reachable, check_studio_capability_local, check_vllm_drainable, check_memory_headroom |
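
The guard policy above is naturally a lookup from transition to an ordered guard list. The following sketch encodes it and runs the guards in order, stopping at the first non-zero exit code. It is illustrative only: `run_guards` and its return shape are hypothetical, guards are passed in as plain callables, and the optional and unspecified entries for Desktop -> StudioLocal (`check_user_jobs_safe`, "compute downscale checks") are left out.

```python
COMPUTE_ENTRY = [
    "check_target_reachable", "check_audio_idle", "check_gpu_display_released",
    "check_cpu_load_safe", "check_user_jobs_safe", "check_memory_headroom",
]

REQUIRED_GUARDS = {
    ("desktop", "studio-local"): [
        "check_target_reachable", "check_studio_capability_local",
    ],
    ("studio-local", "desktop"): ["check_target_reachable"],
    ("desktop", "compute"): COMPUTE_ENTRY,
    ("studio-local", "compute"): COMPUTE_ENTRY,
    ("compute", "desktop"): [
        "check_target_reachable", "check_vllm_drainable", "check_memory_headroom",
    ],
    ("compute", "studio-local"): [
        "check_target_reachable", "check_studio_capability_local",
        "check_vllm_drainable", "check_memory_headroom",
    ],
}


def run_guards(source, target, guards):
    """Run required guards in order; return (True, None, 0) or (False, name, code).

    guards maps a guard name to a zero-argument callable returning an exit code.
    A guard with no implementation is reported as an execution error (20).
    """
    for name in REQUIRED_GUARDS[(source, target)]:
        check = guards.get(name)
        code = check() if check is not None else 20
        if code != 0:
            return (False, name, code)
    return (True, None, 0)
```

Reporting a missing guard as an error, instead of skipping it, keeps an incomplete deployment from silently weakening the policy.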

11. Actions and Transition Semantics

Actions are the concrete operations used to move from one state to another.

11.1 Action Vocabulary

  • stop/terminate GUI session
  • isolate a target
  • stop/start units
  • wait for quiescence
  • update desired/current state files
  • restart services with different environment/policies

11.2 Action Interface

Each action should return:

  • 0 success
  • non-zero failure with logged reason

12. Exact Transition Mapping to systemd Operations

This is the implementation-oriented mapping.

12.1 Assumptions

Systemd targets:

  • desktop.target
  • compute.target

studio-local is intentionally not a first-class target in v1. It is represented as a desktop overlay through studio-local-policy.service and audio-priority.service.

Supporting services:

  • mode-controller.service
  • vllm.service
  • k3s.service
  • pipewire.service / user session services
  • graphical session manager or direct Hyprland session

Helper oneshot services/scripts:

  • mode-prepare-compute.service
  • mode-prepare-desktop.service
  • mode-prepare-studio-local.service
  • mode-observe.service

12.2 Desktop -> StudioLocal

Desired change

  • desired mode file = studio-local

systemd operations

  1. systemctl start mode-controller@studio-local.service (controller instance for target studio-local)
  2. controller runs guard set for Desktop -> StudioLocal
  3. controller verifies local Studio capability still exists
  4. controller stops or constrains AI workloads as needed
    • v1 policy: systemctl stop vllm.service
  5. controller isolates or verifies desktop.target
  6. controller starts studio-local-policy.service
  7. controller starts audio-priority.service
  8. controller updates current state observation

Example exact operations

write /run/mode-controller/desired = studio-local
systemctl start mode-controller@studio-local.service
systemctl stop vllm.service
systemctl isolate desktop.target
systemctl start studio-local-policy.service
systemctl start audio-priority.service

12.3 StudioLocal -> Desktop

Desired change

  • desired mode file = desktop

systemd operations

  1. write desired state
  2. start controller
  3. restore normal interactive policies
  4. optionally allow bounded AI services
  5. stop audio-priority.service
  6. stop studio-local-policy.service
  7. systemctl isolate desktop.target
  8. update current observation

Example exact operations

write /run/mode-controller/desired = desktop
systemctl start mode-controller@desktop.service
systemctl stop audio-priority.service
systemctl stop studio-local-policy.service
systemctl isolate desktop.target

12.4 Desktop -> Compute

Desired change

  • desired mode file = compute

systemd operations

  1. write desired state
  2. start controller for compute
  3. run guards:
    • check_target_reachable
    • check_audio_idle
    • check_gpu_display_released (or prepare to release)
    • check_cpu_load_safe
    • check_user_jobs_safe
    • check_memory_headroom
  4. if interactive session exists, controller requests/forces session termination
    • loginctl terminate-session <id>
  5. wait until compositor releases GPU
  6. stop or de-prioritize audio services if needed
  7. stop desktop-specific services not wanted in compute
  8. set service environment/profile for dual-GPU vLLM
  9. systemctl isolate compute.target
  10. start/restart vllm.service
  11. verify current state

Example exact operations

write /run/mode-controller/desired = compute
systemctl start mode-controller@compute.service
loginctl terminate-session <desktop-session>
systemctl stop graphical-session.target   # if such target exists in design
systemctl isolate compute.target
systemctl restart vllm.service

12.5 Compute -> Desktop

Desired change

  • desired mode file = desktop

systemd operations

  1. write desired state
  2. start controller for desktop
  3. run guards:
    • check_target_reachable
    • check_vllm_drainable
    • check_memory_headroom
  4. drain/stop or downscale vLLM
  5. constrain compute workloads
  6. systemctl isolate desktop.target
  7. start GUI path
  8. ensure GPU0 reserved for display
  9. start/restore audio path
  10. verify current state

Example exact operations

write /run/mode-controller/desired = desktop
systemctl start mode-controller@desktop.service
systemctl stop vllm.service              # or restart single-GPU profile
systemctl isolate desktop.target
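
The example operations in 12.2 through 12.5 can be kept as data so the controller can log or dry-run a plan before executing it. This is a sketch, not the controller: `PLANS` and `plan_commands` are hypothetical names, the plans cover only the steps the controller itself performs after it has been started, and the optional `graphical-session.target` stop is omitted.

```python
# Ordered argv lists per transition, taken from the example operations.
# "{session}" is a placeholder for a logind session id.
PLANS = {
    ("desktop", "studio-local"): [
        ["systemctl", "stop", "vllm.service"],
        ["systemctl", "isolate", "desktop.target"],
        ["systemctl", "start", "studio-local-policy.service"],
        ["systemctl", "start", "audio-priority.service"],
    ],
    ("studio-local", "desktop"): [
        ["systemctl", "stop", "audio-priority.service"],
        ["systemctl", "stop", "studio-local-policy.service"],
        ["systemctl", "isolate", "desktop.target"],
    ],
    ("desktop", "compute"): [
        ["loginctl", "terminate-session", "{session}"],
        ["systemctl", "isolate", "compute.target"],
        ["systemctl", "restart", "vllm.service"],
    ],
    ("compute", "desktop"): [
        ["systemctl", "stop", "vllm.service"],
        ["systemctl", "isolate", "desktop.target"],
    ],
}


def plan_commands(source, target, session=None):
    """Return the argv lists for a transition without running anything."""
    plan = []
    for argv in PLANS[(source, target)]:
        if "{session}" in argv:
            if session is None:
                continue  # no interactive session to terminate
            argv = [session if part == "{session}" else part for part in argv]
        plan.append(list(argv))
    return plan
```

Executing a plan would be a separate step, for example passing each argv to `subprocess.run(argv, check=True)`. Note that `systemctl isolate` only works on units that set `AllowIsolate=yes`, so `desktop.target` and `compute.target` would need that setting.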

12.6 StudioLocal -> Compute

Two possible policies:

Policy A — direct guarded transition

Allowed if all compute guards pass and Studio-Local resources are cleanly relinquished.

Policy B — normalize through Desktop first

Transition path:

  • studio-local -> desktop -> compute

Recommendation: Use Policy A in implementation, but conceptually treat it as the same reconciliation pipeline with stricter guards.


13. Reconciliation Model

13.1 Motivation

A single “mode request compute” command must not assume success. The system should:

  1. record desired mode
  2. observe current state
  3. compare desired vs current
  4. compute required transition plan
  5. execute actions
  6. re-observe
  7. either declare success or enter failed transition state

13.2 Reconciliation Loop

flowchart TD
    Req[Request mode] --> Write[Write desired state]
    Write --> Observe[Observe current state]
    Observe --> Compare{Desired == Current?}
    Compare -->|Yes| Done[No-op success]
    Compare -->|No| Plan[Select transition plan]
    Plan --> Guards[Run guards]
    Guards -->|Fail| Fail[Record failure]
    Guards -->|Pass| Act[Execute actions]
    Act --> Reobserve[Observe current state again]
    Reobserve --> Verify{Reached desired?}
    Verify -->|Yes| Success[Record success]
    Verify -->|No| RetryOrFail[Retry boundedly or fail]

13.3 Reconciliation Semantics

  • bounded retries only
  • no infinite loops
  • every failure is logged with:
    • desired state
    • prior state
    • failing guard or action
    • timestamp
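
The loop and its semantics can be sketched as a single function with injected dependencies. Everything here is illustrative: the `reconcile` signature, the callback contracts, and the record keys are assumptions, and writing the desired-state file is assumed to have happened before the call.

```python
import time


def reconcile(desired, observe, run_guards, execute, max_attempts=3, now=time.time):
    """Drive current state toward desired and return a result record.

    observe() -> current mode name
    run_guards(current, desired) -> (ok, failing_guard_name_or_None)
    execute(current, desired) -> None, raising an exception on action failure
    """
    prior = observe()
    record = {"desired": desired, "prior": prior, "timestamp": now()}
    if prior == desired:
        return dict(record, result="noop")
    ok, failing = run_guards(prior, desired)
    if not ok:
        return dict(record, result="failed", failed_at=failing)
    # Bounded retries: act, re-observe, and stop after max_attempts.
    for attempt in range(1, max_attempts + 1):
        try:
            execute(observe(), desired)
        except Exception as exc:
            return dict(record, result="failed", failed_at=f"action: {exc}")
        if observe() == desired:
            return dict(record, result="success", attempts=attempt)
    return dict(record, result="failed", failed_at="verify", attempts=max_attempts)
```

The returned record carries the fields listed above (desired state, prior state, failing guard or action, timestamp), so it could be written as the last-transition record unchanged. Success is declared only after re-observation, never because the actions returned without error.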

13.4 Why This Matters

This lets you support:

  • manual requests
  • idle-triggered auto-switching
  • boot-time default mode
  • recovery after partial failures

all through one mechanism.


14. Specialisations vs Runtime Switching

This is the main architectural fork.

14.1 Option A — Runtime Switching Only

Use one host definition with multiple systemd targets and runtime policies.

Pros

  • fast transitions
  • no reboot required
  • best UX for switching between Desktop and Studio-Local
  • simpler for day-to-day operation

Cons

  • weaker isolation
  • harder to fully guarantee all services/resources are cleanly re-bound
  • risk of state leakage between modes
  • some kernel/driver tuning differences are awkward live

Best fit

  • Desktop <-> Studio-Local
  • Desktop <-> Compute where flexibility matters more than hard isolation

14.2 Option B — NixOS Specialisations Only

Use separate NixOS specialisations for Desktop and Compute (and possibly Studio-Local).

Pros

  • stronger isolation between role profiles
  • easier to vary deeper system settings, kernel params, service sets
  • clearer recovery story
  • closer to “logical separate machines”

Cons

  • slower transitions, often reboot-oriented in practice
  • poorer UX for frequent switching
  • more configuration duplication risk if not structured well

Best fit

  • Desktop vs Compute if you want very strong separation
  • not ideal for rapid Studio-Local toggling

14.3 Option C — Hybrid Model

Use:

  • runtime switching for Desktop <-> Studio-Local
  • specialisation boundary between Interactive and Compute families

Example:

  • default specialisation = interactive
    • runtime modes inside it: desktop, studio-local
  • compute specialisation = headless compute

Pros

  • strongest overall architecture
  • preserves good UX for Studio-Local transitions
  • lets Compute differ more deeply if needed
  • handles future externalization of Studio more cleanly than treating Studio as a permanent top-level host identity

Cons

  • more design complexity
  • transition from interactive to compute may become reboot-oriented or at least heavier
  • more machinery to maintain

14.4 Recommendation

For your current goal, use runtime switching first, with the design shaped so it can later evolve into a hybrid model.

Reasoning

  • you need to learn actual contention boundaries first
  • Desktop <-> Studio-Local benefits heavily from live switching
  • Desktop <-> Compute can start as runtime-switched
  • if the system proves too “sticky” or leaky, you can later promote Compute into a specialisation without redesigning the higher-level state machine
  • if Studio moves to a Mac mini, the host-local model remains intact

Practical recommendation

Phase the design like this:

  1. Phase 1: one host, runtime switching only
  2. Phase 2: strong slices/targets/guards
  3. Phase 3: evaluate whether Compute should become a specialisation
  4. Phase 4: if Studio is externalized, deprecate or disable studio-local without changing the operator-facing control model

This preserves velocity while keeping the abstraction clean.


15. Service Placement

15.1 Host-Level Services

  • Hyprland
  • PipeWire
  • Reaper
  • NVIDIA drivers/runtime
  • mode controller
  • possibly vLLM initially
  • SSH / system services

15.2 k3s-Level Services

  • Hyperion services
  • platform/orchestration services
  • dashboards and supporting workloads
  • possibly model-serving abstractions later

First-pass implementation note

In v1, prefer keeping k3s.service continuously available while varying:

  • platform.slice resource budgets
  • which workloads are allowed to run aggressively
  • how much local compute capacity cluster workloads may consume

This is preferable to stopping and starting the cluster runtime during ordinary mode transitions.
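
One way to vary slice budgets at runtime is `systemctl set-property --runtime`, which adjusts resource-control properties such as `CPUWeight` and `MemoryHigh` until the next reboot. The sketch below only builds the command. The budget numbers are placeholders, not measured values, and `PLATFORM_SLICE_BUDGETS` and `slice_budget_command` are hypothetical names.

```python
# Hypothetical per-mode budgets for platform.slice; the numbers are placeholders.
PLATFORM_SLICE_BUDGETS = {
    "desktop": {"CPUWeight": "50", "MemoryHigh": "25%"},
    "studio-local": {"CPUWeight": "10", "MemoryHigh": "10%"},
    "compute": {"CPUWeight": "500", "MemoryHigh": "75%"},
}


def slice_budget_command(mode, unit="platform.slice"):
    """Build a `systemctl set-property --runtime` argv for the given mode."""
    budgets = PLATFORM_SLICE_BUDGETS[mode]
    props = [f"{key}={value}" for key, value in sorted(budgets.items())]
    return ["systemctl", "set-property", "--runtime", unit, *props]
```

Because k3s.service itself is untouched, a mode change under this approach would amount to one property update on the slice rather than a cluster restart.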

15.3 Externalized Services (Possible Future)

  • Studio/Audio workflows on Mac mini
  • DAW/plugin-heavy sessions
  • live audio interfaces and controllers

15.4 Recommendation

Keep hardware-near, latency-sensitive, and GPU-debug-sensitive components on the host first. Move services into k3s only after the host-level mode model is stable. Treat Mac mini externalization as a placement decision, not as a redesign trigger for the host-local state machine.


16. Idle Detection Policy

16.1 Role of Idle Detection

Idle detection is an input signal to the reconciler, not an authority on its own.

16.2 Signals

  • input inactivity
  • audio activity
  • GPU utilization / ownership
  • CPU load
  • selected user-job checks

16.3 Policy

Idle-triggered promotion to Compute should:

  • update desired state to compute
  • run the normal reconciliation pipeline
  • abort safely if guards fail

It must never bypass guards.

16.4 Studio-Local Policy

Auto-promotion from studio-local to compute should generally be disabled unless explicitly requested. This remains true even if Studio capability later moves off-box.
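
The idle policy in 16.3 and 16.4 reduces to a small decision that produces a request, never a transition. A minimal sketch, with `idle_request` and its `explicit_studio_optin` parameter as hypothetical names:

```python
def idle_request(current_mode, idle, explicit_studio_optin=False):
    """Return the mode an idle trigger should request, or None for no request."""
    if not idle or current_mode == "compute":
        return None
    if current_mode == "studio-local" and not explicit_studio_optin:
        # Auto-promotion out of studio-local is off unless explicitly enabled.
        return None
    return "compute"
```

The result would be written as desired state and handed to the normal reconciliation pipeline, so guards still decide whether the promotion actually happens.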


17. Security Boundaries

Zones

  • user desktop zone
  • system service zone
  • AI workload zone
  • cluster service zone
  • optional external Studio zone

Controls

  • bind services to appropriate interfaces
  • keep secrets outside dotfiles, e.g. SOPS/agenix
  • keep mode control operations privileged and auditable
  • do not let externalized capability assumptions silently weaken host-local controls

18. Risks and Failure Modes

18.1 Audio Degradation

Cause:

  • background contention

Mitigation:

  • Studio-Local invariants
  • strict guard/action policy

18.2 GPU Contention

Cause:

  • compositor and AI workloads racing for ownership

Mitigation:

  • explicit GPU ownership model
  • guard checks before Compute promotion

18.3 Partial Transition

Cause:

  • GUI exits but vLLM fails to restart
  • desired state written but current state never converges

Mitigation:

  • reconciliation loop
  • bounded retries
  • failed-transition state

18.4 Configuration Drift

Cause:

  • policy split across ad hoc scripts and dotfiles

Mitigation:

  • keep mode policy in Nix + systemd-controlled scripts

18.5 Capability Drift

Cause:

  • Studio capability moved to Mac mini, but local state machine or guards still assume it is local

Mitigation:

  • explicit capability placement model
  • check_studio_capability_local
  • ADR-backed deprecation path for studio-local

19. Open Questions

  1. Should vLLM be host-managed or profile-switched through separate unit templates?
  2. When should Compute graduate into a NixOS specialisation?
  3. How strict should auto-transition be about user jobs and unsaved work heuristics?
  4. Should current state be derived on demand only, or also cached to /run/mode-controller/current?
  5. At what point should local Studio capability be considered officially externalized to a Mac mini?
  6. What data/project sync model is required if Studio is split across machines?

19.1 Resolved Near-Term Decision

For v1:

  • studio-local is not a first-class target
  • studio-local is represented as a protected interactive policy overlay on desktop
  • desktop and compute are the only first-class top-level target families

This keeps the first implementation smaller while preserving the higher-level operational model and leaving room to strengthen Studio semantics later if needed.

19.2 Future Alternatives

Alternative A — Keep studio-local as an overlay permanently

Pros:

  • less target duplication
  • easier future deprecation if Studio moves to a Mac mini
  • simpler runtime switching model

Cons:

  • weaker systemd-level separability
  • more policy encoded in helper units and controller logic

Alternative B — Promote studio-local into a first-class target later

Pros:

  • stronger explicitness in systemd
  • easier inspection of Studio-specific dependencies
  • potentially clearer resource-policy boundaries

Cons:

  • higher maintenance cost
  • more duplication with desktop
  • less aligned with the likely future externalization path

Recommendation

Start with the overlay model. Revisit only if empirical evidence shows that audio-protection policy is too hard to express or validate without a dedicated target.

19.3 Resolved Near-Term Decision — vLLM Service Shape

Target architecture:

  • vllm@desktop.service
  • vllm@compute.service

However, for the first implementation pass, a single vllm.service is acceptable if:

  • desktop and compute profiles are still modeled explicitly in configuration
  • controller actions remain profile-aware
  • observation logic can still determine which profile is active

This allows the first bootable milestone to stay small without locking the architecture into a monolithic service model.

19.4 Resolved Near-Term Decision — k3s Service Shape

For v1:

  • k3s.service should remain stable across host-local modes
  • mode differences should be expressed through:
    • slice/resource budgets
    • workload-placement or workload-intensity policy
    • optional node labels/taints later

This keeps the control plane smaller and avoids coupling every host-mode transition to cluster-runtime teardown and recovery.

Future alternative

If empirical operation shows that stable-across-modes k3s still creates unacceptable interference or ambiguity, stronger k3s mode switching can be introduced later. That should be treated as a deliberate escalation, not the default starting point.

19.5 Resolved Near-Term Decision — Desktop AI Policy

For v1:

  • keep vLLM off in desktop for the first convergence milestone
  • prove desktop <-> compute transitions before enabling bounded desktop-mode AI

Future alternative

After the control plane is reliable, bounded desktop-mode AI may be introduced as an explicit profile with clear GPU1 ownership and resource limits.

19.6 Resolved Near-Term Decision — studio-local Overlay Shape

For v1, represent studio-local with:

  • studio-local-policy.service
  • audio-priority.service

This gives the controller and observation logic a clear marker plus an explicit enforcement unit without promoting Studio into a first-class top-level target.

Future alternative

If this proves too implicit, studio-local can later be promoted into a stronger grouped target or target-like overlay.

19.7 Resolved Near-Term Decision — Capability Placement Source

For v1, capability-placement.json should be generated from Nix configuration rather than edited ad hoc at runtime.

Rationale

  • keeps placement policy reproducible
  • avoids silent runtime drift
  • matches the design goal that machine-critical policy remain inspectable in Nix and systemd-managed artifacts

Future alternative

If operational experimentation later requires it, an explicit runtime override layer may be added with well-defined precedence and auditability.

19.8 Resolved Near-Term Decision — mode force

For v1, defer mode force.

Rationale

  • keeps attention on making the ordinary reconciliation path correct
  • avoids masking immature guard or transition logic
  • reduces the chance of bypassing safety boundaries during initial bring-up

Future alternative

Add mode force later only after hard-vs-soft guard semantics are stable and well tested.

19.9 Resolved Near-Term Decision — GUI Teardown Semantics

For v1, compute promotion should require:

  • graphical session absence
  • explicit GPU-release verification

It should not initially depend on forcibly stopping every greeter or display-manager path unless empirical testing shows those components interfere with reliable GPU handoff.

19.10 Resolved Near-Term Decision — Desktop Target Ownership

For v1, desktop.target should not directly own the greeter/login path.

Rationale

  • keeps mode ownership focused on operational policy rather than full session-manager orchestration
  • reduces coupling to whichever login/session stack is chosen
  • lets session presence remain an observed fact rather than an aggressively managed requirement

Future alternative

If desktop recovery proves unreliable without tighter control, greeter or display-manager paths can later be pulled under stronger mode ownership.

19.11 Resolved Near-Term Decision — studio-local-policy.service Scope

For v1, studio-local-policy.service should be:

  • a reliable marker for observation/classification
  • a light policy-application unit
  • explicitly limited in scope

It should not become a giant all-in-one Studio behavior controller.

Rationale

  • preserves clear observability
  • avoids burying controller logic inside a catch-all helper unit
  • keeps Studio overlay behavior inspectable and decomposable

19.12 Resolved Near-Term Decision — observe-current Implementation Language

For v1, implement observe-current in shell.

Constraints

  • keep the output contract stable:
    • plain mode name for shell use
    • structured JSON for diagnostics
  • structure the implementation so it can later be replaced by a typed helper without changing callers

Future alternative

If classifier complexity or JSON handling becomes unwieldy, replace only the classifier implementation with a small typed helper while keeping the same external contract.

19.13 Resolved Near-Term Decision — mode CLI Packaging

For v1:

  • keep the script sources in the repository
  • package them in pkgs/
  • install them through the NixOS module

Rationale

  • keeps the tool packaging clean and testable
  • avoids scattering ad hoc scripts directly into module definitions
  • preserves a clean path to reuse across hosts later

19.14 Resolved Near-Term Decision — Reconciler Trigger Model

For v1:

  • use parameterized oneshot reconciliation only
  • do not enable timer-driven or path-triggered background reconciliation yet

Rationale

  • keeps failure behavior easier to understand during bring-up
  • avoids masking transition bugs behind background retries
  • lets manual transitions prove the model first

Future alternative

After manual transitions are reliable, add periodic or path-triggered reconciliation for self-healing behavior.

19.15 Resolved Near-Term Decision — Boot Policy

For v1:

  • normalize to desktop on boot
  • do not replay persistent desired mode across reboot

Rationale

  • gives the system a predictable safe recovery posture
  • avoids booting directly back into a problematic compute path while the controller is still maturing
  • keeps early operational behavior easier to reason about

Future alternative

Once transitions are reliable, desired-state persistence across reboot can be introduced as an explicit policy feature.


19A. Architectural Decision Record — Potential Studio Externalization

Context

There is a realistic possibility that low-latency Studio/Audio workloads will migrate from the NixOS machine to a Mac mini.

Decision

The NixOS host architecture should treat Studio as a conditional local profile (studio-local) rather than a permanently central host mode.

Consequences

  • the host-local state machine remains stable if Studio moves off-box
  • Compute and Desktop remain the durable primary host-local modes
  • Studio capability can be represented separately through workload placement decisions
  • local audio support can still exist now without overcommitting the architecture to a permanent local Studio role

Follow-on Design Implications

  • add check_studio_capability_local guard for any studio-local transition
  • keep local audio policy isolated from core Compute/Desktop mechanics where practical
  • document future sync, control, and workflow boundaries if Studio becomes externalized

20. Control Interface and Implementation Contract

20.1 mode CLI Contract

The system should expose a single operator-facing interface:

mode status
mode request <desktop|studio-local|compute>
mode reconcile
mode current
mode desired
mode explain <desktop|studio-local|compute>
mode dry-run <desktop|studio-local|compute>
mode force <desktop|studio-local|compute>

Command Semantics

mode status

Returns:

  • desired mode
  • observed current mode
  • whether reconciliation is needed
  • last transition result
  • blocking guard failures, if any

mode request <mode>

Behavior:

  1. write desired state
  2. invoke reconciliation
  3. return success only if reconciliation converged

mode reconcile

Behavior:

  • observe current state
  • compare to desired
  • select transition plan
  • run guards
  • execute actions
  • record results

mode current

Returns only the observed current mode.

mode desired

Returns only the desired mode file contents.

mode explain <mode>

Prints:

  • target state properties
  • expected services
  • resource ownership rules
  • guards required for entering that mode
  • capability placement assumptions, where relevant

mode dry-run <mode>

Simulates the full reconciliation plan without mutating state.

mode force <mode>

Privileged path that bypasses selected soft guards (see 24.4). It must never bypass hard safety guards such as GPU/display ownership or active audio protection; any exception must be designed explicitly rather than implied.

Implementation note:

  • defer this command in v1
  • keep it in the long-term interface contract so the design remains forward-compatible
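
The command surface in 20.1 can be sketched as a thin dispatch layer. This is a minimal sketch, not the actual controller: the function names mode_validate and mode_main are hypothetical, the handlers are echo placeholders, and the use of exit code 30 for usage errors and for the deferred force command is an assumption based on the "internal controller misuse" range in 24.2.

```shell
#!/bin/sh
# Sketch of the `mode` CLI dispatch layer. Handler bodies are placeholders.

# Return 0 for a known mode name, 18 (invalid request, per 24.2) otherwise.
mode_validate() {
  case "$1" in
    desktop|studio-local|compute) return 0 ;;
    *) echo "invalid mode: ${1:-<none>}" >&2; return 18 ;;
  esac
}

# Dispatch one subcommand; real handlers would replace the echo placeholders.
mode_main() {
  cmd="${1:-}"
  [ $# -gt 0 ] && shift
  case "$cmd" in
    status|reconcile|current|desired)
      echo "would run: $cmd" ;;
    request|explain|dry-run)
      mode_validate "${1:-}" || return $?
      echo "would run: $cmd $1" ;;
    force)
      # Deferred in v1: kept in the interface, but always refused.
      echo "mode force is not implemented in v1" >&2
      return 30 ;;
    *)
      echo "usage: mode <status|request|reconcile|current|desired|explain|dry-run|force> [mode]" >&2
      return 30 ;;
  esac
}
```

Validating the mode name once, before any handler runs, keeps an invalid request from ever reaching the desired-state file.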

21. State Storage Layout

21.1 Runtime State Paths

/run/mode-controller/
  desired
  current
  lock
  last-transition.json
  last-guards.json
  reconcile.pid
  capability-placement.json
  hardware-topology.json

21.2 File Semantics

desired

Contains the requested mode:

  • desktop
  • studio-local
  • compute
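
Because other tools read the desired file, writing it atomically avoids exposing a half-written value. The following is a minimal sketch under assumptions: write_desired and read_desired are hypothetical helper names, STATE_DIR is an illustrative override for testing, and exit codes 18 and 20 follow the convention in 24.2.

```shell
#!/bin/sh
# STATE_DIR is overridable so the sketch can be exercised outside /run.
STATE_DIR="${STATE_DIR:-/run/mode-controller}"

# Validate the mode, write it to a temp file, then rename it into place.
# A rename within one directory is atomic, so readers never see a partial file.
write_desired() {
  case "$1" in
    desktop|studio-local|compute) ;;
    *) echo "refusing to write invalid mode: $1" >&2; return 18 ;;
  esac
  mkdir -p "$STATE_DIR" || return 20
  tmp="$STATE_DIR/.desired.$$"
  printf '%s\n' "$1" > "$tmp" || return 20
  mv -f "$tmp" "$STATE_DIR/desired"
}

# Print the desired mode; fails quietly if no request has been written yet.
read_desired() {
  [ -r "$STATE_DIR/desired" ] && cat "$STATE_DIR/desired"
}
```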

current

Cached observation of current state. This is convenience state only; it must be derivable from system facts.

lock

Used to serialize reconciliation so only one transition runs at a time.

last-transition.json

Stores:

  • requested mode
  • prior observed mode
  • final observed mode
  • success/failure
  • guard results
  • action results
  • timestamps
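
One possible shape for this record is sketched below. The field names and the format_transition helper are hypothetical, guard and action results are omitted for brevity, and success is assumed to mean that the final observed mode equals the requested mode.

```shell
#!/bin/sh
# Print a last-transition.json document on stdout.
# Arguments are assumed to be plain mode names and an ISO-8601 timestamp,
# so no JSON escaping is attempted.
format_transition() {
  # $1 requested  $2 prior observed  $3 final observed  $4 timestamp
  if [ "$1" = "$3" ]; then ok=true; else ok=false; fi
  printf '{"requested":"%s","prior":"%s","final":"%s","success":%s,"finished_at":"%s"}\n' \
    "$1" "$2" "$3" "$ok" "$4"
}
```

A caller would redirect the output to a temporary file and rename it over last-transition.json so that readers never see a partial record.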

last-guards.json

Stores latest guard results for diagnostics.

capability-placement.json

Stores environment-level placement facts, for example:

  • studio: local
  • studio: external-mac-mini

This file is not the host-local mode source of truth. It is an environment metadata input used by guards and planning logic.

hardware-topology.json

Stores the currently configured hardware view, for example:

  • planned GPU count
  • currently present GPU indexes
  • display GPU assignment
  • desktop-mode AI GPU set
  • compute-mode AI GPU set

This allows the implementation to preserve the intended dual-GPU architecture while remaining tolerant of temporary single-GPU bring-up phases.


22. systemd Unit and Target Layout

22.1 Targets

desktop.target

Wants:

  • graphical-session target path
  • bounded interactive services
  • optional constrained AI services

First-pass implementation note:

  • do not make desktop.target directly own greeter/login-manager startup in v1
  • treat graphical session presence as an observed runtime fact
  • strengthen ownership later only if empirical recovery behavior requires it

compute.target

Wants:

  • headless service profile
  • vLLM compute profile
  • k3s compute-allowed policy/profile

22.2 Core Services

mode-controller@.service

Parameterized oneshot service.

Instance values:

  • mode-controller@desktop.service
  • mode-controller@studio-local.service
  • mode-controller@compute.service

Responsibilities:

  • load desired mode
  • observe current mode
  • run reconciliation
  • update state files and logs

First-pass implementation note:

  • use this parameterized oneshot service as the sole reconciler trigger in v1
  • defer timer/path-triggered background reconciliation until manual operation is proven reliable

mode-observe.service

Optional oneshot helper to compute observed current mode and refresh /run/mode-controller/current.

vllm@.service

Optional templated service for profile-specific operation:

  • vllm@desktop.service
  • vllm@studio-local.service
  • vllm@compute.service

Alternative:

  • single vllm.service with environment file switching

First-pass implementation guidance:

  • prefer separate desktop and compute profiles conceptually
  • studio-local should not require its own dedicated vLLM unit in v1 if Studio is implemented as a desktop overlay
  • a single vllm.service is acceptable initially if it preserves a clean migration path to templated units later
  • keep desktop-mode vLLM disabled for the first transition-proof milestone

mode-guard@.service

Optional wrapper pattern for reusable guard execution, though plain scripts may be simpler initially.

studio-local overlay units

Recommended first-pass representation:

  • audio-priority.service
  • studio-local-policy.service
  • optional environment/policy file consumed by observation and guard logic

These units should layer on top of desktop.target rather than replacing it with a distinct top-level target in v1.

Recommended scope for studio-local-policy.service:

  • expose a clear mode marker
  • apply only light, explicit Studio-specific policy
  • delegate heavyweight orchestration to the controller or dedicated helper units

22.3 Suggested Slice Layout

system.slice
├── interactive.slice
│   ├── graphical-session scope/services
│   ├── audio-related helpers
│   └── bounded desktop workloads
├── ai.slice
│   ├── vllm service
│   └── AI helpers
└── platform.slice
    ├── k3s service
    └── supporting infra services

Slice Intent

  • interactive.slice gets priority and headroom in Desktop/Studio-Local
  • ai.slice is heavily constrained in Studio-Local, moderately constrained in Desktop, relaxed in Compute
  • platform.slice remains comparatively stable but may have tighter resource budgets in interactive modes and relaxed budgets in Compute
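
The slice intent above could be applied at runtime with systemctl set-property. This is a sketch only: every numeric value is a placeholder rather than a tuned recommendation, slice_budget and apply_slice_budgets are hypothetical helper names, and the slice unit names are taken from the layout in 22.3 as written.

```shell
#!/bin/sh
# Print the systemd resource properties for one slice in one mode.
# All values are illustrative placeholders.
slice_budget() {
  # $1 = mode, $2 = slice unit name
  case "$1" in
    desktop|studio-local|compute) ;;
    *) return 30 ;;
  esac
  case "$1:$2" in
    studio-local:ai.slice)     echo "CPUWeight=10 MemoryHigh=4G" ;;
    desktop:ai.slice)          echo "CPUWeight=50 MemoryHigh=16G" ;;
    compute:ai.slice)          echo "CPUWeight=800 MemoryHigh=infinity" ;;
    compute:interactive.slice) echo "CPUWeight=100" ;;
    *:interactive.slice)       echo "CPUWeight=800" ;;
    compute:platform.slice)    echo "CPUWeight=200" ;;
    *:platform.slice)          echo "CPUWeight=100" ;;
    *) return 30 ;;
  esac
}

# Apply the budgets for a mode. --runtime keeps the change non-persistent,
# so a reboot returns to whatever the Nix configuration declares.
apply_slice_budgets() {
  for s in interactive.slice ai.slice platform.slice; do
    budget="$(slice_budget "$1" "$s")" || return 30
    # Word splitting of $budget into separate PROPERTY=VALUE args is intended.
    # shellcheck disable=SC2086
    systemctl set-property --runtime "$s" $budget || return 20
  done
}
```

Keeping the table in a pure function means mode explain can print the same budgets that reconciliation applies.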

23. Current State Observation Logic

Current state must be observed, not assumed.

23.1 Observation Inputs

GUI Indicators

  • graphical.target or session-specific equivalent active
  • active user session via loginctl
  • Hyprland process/session present

Audio Indicators

  • PipeWire user service active
  • active audio clients or REAPER process
  • optional JACK graph activity

AI Indicators

  • vllm*.service active
  • environment/profile indicates single-GPU or dual-GPU mode
  • optional nvidia-smi-based observation of active GPU usage

Platform Indicators

  • k3s.service active
  • optional workload-class indicators
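
A subset of these inputs can be gathered with small probe functions, sketched below. The probe names, the key=value output format, and the process name reaper are assumptions. User-level PipeWire state is left out because it has to be queried from inside the user session.

```shell
#!/bin/sh
# Probes are small functions so each can be replaced or stubbed on its own.
unit_active() { systemctl is-active --quiet "$1"; }

# True if any logind session has type wayland or x11.
graphical_session_present() {
  for id in $(loginctl list-sessions --no-legend | awk '{print $1}'); do
    case "$(loginctl show-session "$id" -p Type --value)" in
      wayland|x11) return 0 ;;
    esac
  done
  return 1
}

# Assumes the DAW process is named exactly "reaper".
audio_busy() { pgrep -x reaper >/dev/null 2>&1; }

# Emit one key=value line per fact; 1 means true, 0 means false.
collect_facts() {
  for probe in "gui graphical_session_present" \
               "audio audio_busy" \
               "studio_marker unit_active studio-local-policy.service" \
               "compute_target unit_active compute.target" \
               "k3s unit_active k3s.service"; do
    key=${probe%% *}
    # The remainder of the entry is the probe command plus its arguments;
    # it is left unquoted on purpose so it splits into words.
    # shellcheck disable=SC2086
    if ${probe#* }; then echo "$key=1"; else echo "$key=0"; fi
  done
}
```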

23.2 Observation Heuristic

Observed mode should be derived using a deterministic classifier.

Proposed classifier logic

Observe compute

If all of the following are true:

  • no active graphical session
  • compute target active or compute service profile active
  • vLLM compute profile active or both GPUs assigned to AI policy

Then observed current mode = compute

Observe studio-local

If all of the following are true:

  • graphical session active
  • audio stack active
  • studio-local policy marker active
  • AI profile disabled or highly constrained

Then observed current mode = studio-local

Observe desktop

If all of the following are true:

  • graphical session active
  • desktop policy marker active
  • no studio-local policy marker

Then observed current mode = desktop

Observe transitioning

If both of the following are true:

  • desired != inferred stable mode
  • controller is running or the lock is held

Then observed current mode = transitioning

Observe failed-transition

If all of the following are true:

  • last transition failed
  • current does not match desired
  • no controller currently reconciling

Then observed current mode = failed-transition
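
The classifier rules above can be expressed as a pure function over already-collected facts. This is a sketch under assumptions: the fact names are hypothetical, the precedence (stable mode first, then transitioning, then failed-transition) is one reasonable reading of the rules, and the unknown result for unmatched facts is an addition that the rules do not define.

```shell
#!/bin/sh
# Classify the observed mode from key=value facts passed as arguments.
# Boolean facts use 1 for true and 0 for false; omitted facts default to 0.
classify_mode() {
  gui=0 audio=0 studio_marker=0 desktop_marker=0 compute_target=0
  vllm_compute=0 ai_constrained=0 reconciling=0 last_failed=0 desired=""
  for kv in "$@"; do
    case "$kv" in
      gui=*|audio=*|studio_marker=*|desktop_marker=*|compute_target=*|vllm_compute=*|ai_constrained=*|reconciling=*|last_failed=*|desired=*)
        # The key is whitelisted above, so this assignment is safe.
        eval "${kv%%=*}=\${kv#*=}" ;;
      *) echo "unknown fact: $kv" >&2; return 30 ;;
    esac
  done

  # Infer the stable mode first.
  stable=unknown
  if [ "$gui" = 0 ] && [ "$compute_target" = 1 ] && [ "$vllm_compute" = 1 ]; then
    stable=compute
  elif [ "$gui" = 1 ] && [ "$audio" = 1 ] && [ "$studio_marker" = 1 ] \
       && [ "$ai_constrained" = 1 ]; then
    stable=studio-local
  elif [ "$gui" = 1 ] && [ "$desktop_marker" = 1 ] && [ "$studio_marker" = 0 ]; then
    stable=desktop
  fi

  # Then layer the two non-stable states on top when desired disagrees.
  if [ -n "$desired" ] && [ "$desired" != "$stable" ]; then
    if [ "$reconciling" = 1 ]; then echo transitioning; return 0; fi
    if [ "$last_failed" = 1 ]; then echo failed-transition; return 0; fi
  fi
  echo "$stable"
}
```

Keeping classification separate from fact collection makes the rules testable without systemd and keeps the result deterministic for a given set of facts.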

23.3 Recommendation

Use a small classifier script:

/usr/local/libexec/mode-controller/observe-current

Outputs:

  • plain mode name for shell use
  • optional JSON with evidence for debugging

First-pass implementation note:

  • implement this in shell first
  • preserve a stable output contract so the implementation language can change later without changing the control plane

24. Guard Function Contract

24.1 Guard Naming

check_audio_idle
check_gpu_display_released
check_cpu_load_safe
check_user_jobs_safe
check_memory_headroom
check_vllm_drainable
check_graphical_session_absent
check_graphical_session_present
check_target_reachable
check_studio_capability_local

24.2 Exit Code Convention

0   pass
10  policy block: audio active
11  policy block: display GPU still owned
12  policy block: CPU load too high
13  policy block: user jobs active
14  policy block: insufficient memory headroom
15  policy block: vLLM not drainable
16  policy block: graphical session absent when required
17  policy block: graphical session present when forbidden
18  policy block: target unreachable / invalid request
19  policy block: requested local studio capability not available
20-29 execution/inspection errors
30+ internal controller misuse

24.3 Guard Output Contract

Each guard should emit a concise structured line or JSON object such as:

{"guard":"check_audio_idle","ok":false,"code":10,"reason":"reaper process active"}
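
A guard that honours both the exit-code convention and this output contract might look like the sketch below. The helper names json_escape and emit_guard, the default threshold of 4.0, and the optional second argument used to inject a load value are all assumptions for illustration.

```shell
#!/bin/sh
# Escape backslashes and double quotes so the reason is a valid JSON string.
# Control characters are not handled; reasons are assumed to be single-line.
json_escape() { printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'; }

# Print one guard result line and return the guard's exit code.
emit_guard() {
  # $1 guard name  $2 exit code  $3 reason
  if [ "$2" -eq 0 ]; then ok=true; else ok=false; fi
  printf '{"guard":"%s","ok":%s,"code":%s,"reason":"%s"}\n' \
    "$1" "$ok" "$2" "$(json_escape "$3")"
  return "$2"
}

# Example guard: block when the 1-minute load average exceeds a threshold.
check_cpu_load_safe() {
  # $1 threshold (placeholder default), $2 optional load value for testing
  max="${1:-4.0}"
  load="${2:-$(cut -d' ' -f1 /proc/loadavg)}"
  if awk -v l="$load" -v m="$max" 'BEGIN { exit !(l <= m) }'; then
    emit_guard check_cpu_load_safe 0 "load $load <= $max"
  else
    emit_guard check_cpu_load_safe 12 "load $load > $max"
  fi
}
```

Routing every result through one emitter keeps the JSON shape and the exit code from drifting apart across guards.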

24.4 Hard vs Soft Guards

Hard guards

Must never be bypassed by ordinary automation:

  • active audio protection for Studio-Local -> Compute or Desktop -> Compute
  • GPU/display ownership guard
  • target validity checks
  • local Studio capability checks for studio-local

Soft guards

May be bypassed by privileged operator action or policy:

  • generic CPU load threshold
  • selected user-job heuristics
  • non-critical memory thresholds
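
The hard/soft split can be reduced to a single decision function, sketched here. The names guard_class and guard_blocks are hypothetical. Treating every guard not listed as soft as hard, and treating execution errors (20 and above) as always blocking, are conservative assumptions rather than rules stated above.

```shell
#!/bin/sh
# Classify a guard by name. Anything not explicitly soft is treated as hard.
guard_class() {
  case "$1" in
    check_cpu_load_safe|check_user_jobs_safe|check_memory_headroom) echo soft ;;
    *) echo hard ;;
  esac
}

# Return 0 if the guard result blocks the transition, 1 if it does not.
# $1 guard name, $2 guard exit code, $3 "privileged" for an operator override
guard_blocks() {
  [ "$2" -eq 0 ] && return 1      # guard passed: never blocks
  [ "$2" -ge 20 ] && return 0     # execution errors always block
  [ "$(guard_class "$1")" = soft ] && [ "${3:-}" = privileged ] && return 1
  return 0
}
```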

25. Transition Plans with Exact Operations

This section normalizes each transition into explicit steps.

25.1 Common Transition Framework

All transitions should follow:

  1. acquire lock
  2. observe current state
  3. validate requested mode
  4. if current == desired, exit success
  5. select transition plan
  6. run transition guards
  7. execute pre-actions
  8. isolate or start target
  9. execute post-actions
  10. re-observe current state
  11. record success/failure
  12. release lock
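
The framework above maps onto a short skeleton. This is a sketch, not the controller itself: the four hook functions are placeholders, STATE_DIR is an illustrative override, pre-actions, target enactment, and post-actions are collapsed into one enact hook, and using exit code 20 for lock contention and non-convergence is an assumption.

```shell
#!/bin/sh
STATE_DIR="${STATE_DIR:-/run/mode-controller}"

# Placeholder hooks; the real controller would replace all four.
observe_current() { cat "$STATE_DIR/current" 2>/dev/null || echo unknown; }
run_guards()      { return 0; }                            # $1 from, $2 to
enact()           { echo "$2" > "$STATE_DIR/current"; }    # $1 from, $2 to
record()          { echo "transition $1 -> $2: $3"; }

run_transition() {
  desired="$1"
  mkdir -p "$STATE_DIR" || return 20
  (
    # Step 1: non-blocking lock on fd 9. Step 12 is implicit: the lock is
    # released when this subshell exits and fd 9 closes.
    flock -n 9 || { echo "another transition holds the lock" >&2; exit 20; }
    from="$(observe_current)"                               # step 2
    case "$desired" in                                      # step 3
      desktop|studio-local|compute) ;;
      *) exit 18 ;;
    esac
    [ "$from" = "$desired" ] && exit 0                      # step 4
    run_guards "$from" "$desired" || {                      # steps 5-6
      rc=$?; record "$from" "$desired" "blocked by guard ($rc)"; exit "$rc"
    }
    enact "$from" "$desired" || {                           # steps 7-9
      record "$from" "$desired" "enactment failed"; exit 20
    }
    if [ "$(observe_current)" = "$desired" ]; then          # steps 10-11
      record "$from" "$desired" ok
    else
      record "$from" "$desired" "did not converge"; exit 20
    fi
  ) 9>"$STATE_DIR/lock"
}
```

Holding the lock on a file descriptor means a crashed controller cannot leave a stale lock behind: the lock file persists, but the lock itself ends with the process.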

25.2 Plan: Desktop -> Studio-Local

Preconditions

  • desktop currently observed
  • request = studio-local
  • local Studio capability is still hosted on the NixOS machine

Guards

  • check_target_reachable
  • check_studio_capability_local
  • optional check_user_jobs_safe

Exact operations

write desired=studio-local
flock /run/mode-controller/lock
observe current
run guards
systemctl start audio-priority.service      # if modeled separately
systemctl start studio-local-policy.service
observe current
record result

Notes

  • GUI remains up
  • audio policy is strengthened
  • AI capacity is reduced or removed
  • if Studio capability has been externalized, this transition must fail cleanly with an explanatory reason

25.3 Plan: Studio-Local -> Desktop

Guards

  • check_target_reachable

Exact operations

write desired=desktop
flock /run/mode-controller/lock
observe current
run guards
systemctl stop audio-priority.service       # if separate helper exists
systemctl stop studio-local-policy.service
systemctl isolate desktop.target
observe current
record result

25.4 Plan: Desktop -> Compute

Guards

  • check_target_reachable
  • check_audio_idle
  • check_cpu_load_safe
  • check_user_jobs_safe
  • check_memory_headroom

Pre-actions

  • terminate graphical session
  • wait for GUI disappearance
  • verify GPU/display release

Exact operations

write desired=compute
flock /run/mode-controller/lock
observe current
run initial guards
loginctl terminate-session <session-id>
wait until observe-current no longer sees graphical session
run check_gpu_display_released
systemctl isolate compute.target
systemctl start vllm@compute.service
observe current
record result

Additional notes

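  • compute.target should declare conflicts with the interactive/graphical targets (for example via Conflicts=) so that systemctl isolate compute.target reliably stops them; isolate also requires AllowIsolate=yes on the target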
  • GPU release must be verified after GUI shutdown, not merely assumed
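
The "wait until" steps in this plan need a bounded poll rather than an open-ended loop. A minimal sketch, assuming a hypothetical helper named wait_until with whole-second timeout and interval arguments:

```shell
#!/bin/sh
# Poll a command until it succeeds or the timeout elapses.
# $1 timeout in whole seconds, $2 poll interval in whole seconds,
# remaining arguments: the command to run.
wait_until() {
  timeout_s="$1"; interval_s="$2"; shift 2
  waited=0
  while ! "$@"; do
    [ "$waited" -ge "$timeout_s" ] && return 1
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  return 0
}
```

For example, wait_until 30 1 check_graphical_session_absent would give the session up to 30 seconds to disappear. If it returns non-zero, the controller can abort with code 17 instead of proceeding to the GPU-release check.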

25.5 Plan: Compute -> Desktop

Guards

  • check_target_reachable
  • check_vllm_drainable
  • check_memory_headroom

Exact operations

write desired=desktop
flock /run/mode-controller/lock
observe current
run guards
systemctl stop vllm@compute.service         # or downscale path
systemctl isolate desktop.target
systemctl start vllm@desktop.service        # optional bounded single-GPU profile; deferred in v1 (see 19.5)
observe current
record result

Notes

  • graphical session may be started by display manager or login path depending on design
  • GPU0 becomes protected for display once Desktop converges

25.6 Plan: Studio-Local -> Compute

Preferred behavior

Treat as a direct guarded transition using the same compute-entry pipeline as Desktop -> Compute.

Guards

  • check_target_reachable
  • check_audio_idle
  • check_cpu_load_safe
  • check_user_jobs_safe
  • check_memory_headroom

Exact operations

write desired=compute
flock /run/mode-controller/lock
observe current
run guards
loginctl terminate-session <session-id>
wait until graphical session absent
run check_gpu_display_released
systemctl isolate compute.target
systemctl start vllm@compute.service
observe current
record result

Policy note

Because Studio-Local is the most protected interactive mode, auto-promotion from Studio-Local to Compute should generally be disabled unless explicitly requested.
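
The initial guard sets from plans 25.2 through 25.6 can be collected into one lookup that both mode dry-run and the reconciler consult. This is a sketch: plan_guards is a hypothetical name, optional guards are omitted, and returning 18 for pairs without a defined plan (such as compute to studio-local) is an assumption. check_gpu_display_released is not listed because it runs after GUI teardown, not as an initial guard.

```shell
#!/bin/sh
# Print the initial guard names for a transition, one per line, in run order.
# Returns 18 when no plan is defined for the pair.
plan_guards() {
  compute_entry="check_target_reachable check_audio_idle check_cpu_load_safe check_user_jobs_safe check_memory_headroom"
  case "$1>$2" in
    "desktop>studio-local")
      set -- check_target_reachable check_studio_capability_local ;;
    "studio-local>desktop")
      set -- check_target_reachable ;;
    "desktop>compute"|"studio-local>compute")
      # Both interactive modes share the same compute-entry pipeline.
      # shellcheck disable=SC2086
      set -- $compute_entry ;;
    "compute>desktop")
      set -- check_target_reachable check_vllm_drainable check_memory_headroom ;;
    *)
      return 18 ;;
  esac
  printf '%s\n' "$@"
}
```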


26. NixOS Specialisations vs Runtime Switching — Decision Guidance

26.1 Decision Matrix

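| Criterion                        | Runtime Switching | Specialisations | Hybrid    |
|----------------------------------|-------------------|-----------------|-----------|
| Desktop <-> Studio-Local speed   | Excellent         | Poor            | Excellent |
| Desktop <-> Compute isolation    | Moderate          | Strong          | Stronger  |
| Complexity                       | Lower             | Moderate        | Highest   |
| Early experimentation            | Best              | Slower          | Moderate  |
| Deep kernel/boot divergence      | Weak              | Strong          | Strong    |
| Operational convenience          | High              | Lower           | Moderate  |
| Future externalization of Studio | Good              | Good            | Best      |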

Adopt runtime switching now. Revisit that choice if one or more of the following become true:

  1. compute mode needs materially different kernel parameters or boot-time config
  2. graphical/interactive teardown proves unreliable in practice
  3. GPU role handoff remains too leaky under runtime-only switching
  4. you want Compute to be operationally closer to a dedicated server persona than a temporary mode

If any two of the above become persistent problems, promote Compute into a specialisation.

Phase 1

  • single NixOS host definition
  • runtime switching only
  • targets + slices + controller + guards

Phase 2

  • strengthen target separation
  • gather empirical failure/latency data

Phase 3

  • if needed, introduce specialisation.compute
  • preserve same desired/current/reconcile interface so operator UX does not change

Phase 4

  • if Studio is externalized, deprecate or disable studio-local
  • retain the same operator-facing control model for the host-local system

That means mode request compute could later choose:

  • runtime reconcile, or
  • request/reboot into compute specialisation

without changing the higher-level model.


27. Recommended Next Implementation Steps

  1. define exact systemd target dependencies/conflicts in Nix
  2. implement mode CLI wrapper script
  3. implement observe-current
  4. implement guard scripts with fixed exit-code contract
  5. choose between:
    • vllm@desktop.service / vllm@compute.service
    • one service with profile env file
  6. define slice resource policies for interactive vs AI
  7. validate transition behavior manually before enabling automation
  8. wire idle detector to mode request compute
  9. add a capability-placement flag/model for future Studio externalization

28. Summary

This system should behave like a reconciled state machine for host-local operational modes.

The core model is:

  • desired mode is explicit runtime intent
  • current mode is observed reality
  • reconciliation closes the gap
  • guards prevent unsafe transitions
  • systemd targets/services perform the actual mode enactment

The implementation should start with runtime switching, but preserve a clean path to hybrid specialisation if operational evidence justifies stronger separation later.

Studio/Audio should be treated as a conditional local profile plus a capability-placement decision, so that a future move to a Mac mini does not invalidate the host-local architecture.