ADR-0003: vLLM Is Compute-Only in V1
Status: accepted
Context
Desktop-mode AI is possible in the target architecture, especially when a second GPU is installed. For the first reliable control-loop milestone, desktop AI adds resource contention and observer complexity.
Decision
Keep vLLM compute-only in v1.
Use a single vllm.service tied to compute mode. Shape options and
controller actions so that vllm@compute.service and a future bounded
desktop profile can be added later.
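
One way to keep controller actions open to a later vllm@compute.service is to resolve the unit name in a single place. The sketch below is illustrative only: VllmTarget and systemctl_args are hypothetical names, not part of the existing controller. The `@` form follows systemd's template-instance naming. The function only builds the argument list and never runs it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VllmTarget:
    """Hypothetical description of the systemd unit a controller action targets."""

    # None selects the single v1 unit; a profile name selects a template instance.
    profile: Optional[str] = None

    @property
    def unit(self) -> str:
        if self.profile is None:
            return "vllm.service"
        return f"vllm@{self.profile}.service"


def systemctl_args(action: str, target: VllmTarget) -> list[str]:
    """Build the argv for a start or stop action without executing it."""
    if action not in {"start", "stop"}:
        raise ValueError(f"unsupported action: {action}")
    return ["systemctl", action, target.unit]
```

In v1 every call site would pass VllmTarget(), which resolves to vllm.service. Moving to a per-profile unit would then change only the profile argument, not the action code.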
Consequences
- desktop and studio-local should leave vLLM inactive.
- compute owns vLLM activation.
- The first milestone can focus on mode transitions and observation.
- Bounded desktop AI is deferred until desktop <-> compute switching is
  reliable on real hardware.
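
The activation rule in these consequences can be written as a small policy table. This is a minimal sketch, not the controller's actual code: Mode, VLLM_ACTIVE_IN, and plan_vllm_action are hypothetical names, and only the three modes named in this ADR are covered.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    DESKTOP = "desktop"
    STUDIO_LOCAL = "studio-local"
    COMPUTE = "compute"


# v1 policy from this ADR: only compute activates vLLM.
VLLM_ACTIVE_IN = frozenset({Mode.COMPUTE})


def vllm_should_be_active(mode: Mode) -> bool:
    """Return True when the given mode owns vLLM activation."""
    return mode in VLLM_ACTIVE_IN


def plan_vllm_action(previous: Mode, target: Mode) -> Optional[str]:
    """Return "start", "stop", or None for a transition between two modes."""
    was_active = vllm_should_be_active(previous)
    will_be_active = vllm_should_be_active(target)
    if was_active == will_be_active:
        return None  # no change in vLLM state, e.g. desktop -> studio-local
    return "start" if will_be_active else "stop"
```

Under this sketch, a later bounded desktop profile would need a richer policy than a set of active modes, since it would also have to say which profile runs in which mode.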