vLLM Memory Phase 1 Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Build a minimal local persistent memory prototype around Dubnium’s existing vLLM service without coupling durable memory to transformer KV state.
Architecture: vLLM remains the inference runtime. A separate memory workload provides Postgres/pgvector storage, optional Redis working context, summarization and embedding workers, and a scoped retrieval API that an orchestrator can use before calling vLLM. A future governance layer remains external; Phase 1 records metadata and lifecycle events but does not implement the governance authority.
Tech Stack: NixOS modules, Postgres, pgvector, Redis, Python service code, pytest, systemd services.
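As a rough sketch of the call order in the architecture above, an orchestrator could retrieve scoped memories first and only then build the prompt it sends to vLLM. The function names, the delimiter text, and the dictionary shape of a memory are assumptions made for illustration; the plan does not define an orchestrator interface.

```python
from typing import Callable

# Hypothetical shape of a retrieved memory; the real API returns richer rows.
Memory = dict[str, str]


def assemble_prompt(user_prompt: str, memories: list[Memory]) -> str:
    """Wrap retrieved summaries in a delimited block so they read as untrusted context."""
    lines = ["[retrieved memory: untrusted context]"]
    lines += [f"- ({m['scope']}) {m['summary']}" for m in memories]
    lines += ["[end retrieved memory]", user_prompt]
    return "\n".join(lines)


def answer(
    user_prompt: str,
    scope: str,
    retrieve: Callable[[str, str], list[Memory]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve scoped memory first, then call the inference runtime once."""
    memories = retrieve(user_prompt, scope)
    return generate(assemble_prompt(user_prompt, memories))
```

Passing `retrieve` and `generate` as callables keeps the memory service and vLLM decoupled in this sketch, which mirrors the plan's intent that vLLM stays a plain inference runtime.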
Scope
This plan implements the Phase 1 prototype described in ADR-0010 and vLLM Persistent Memory Prototype.
Do not implement multi-agent federation, Temporal, MinIO, cryptographic attestation, a production policy DSL, or durable KV-cache persistence in this phase.
Do not implement Letta or another MemGPT-style framework in Phase 1. Keep it as an incremental upgrade candidate after storage, retrieval filters, redaction, provenance, and replay checks are stable.
Do not implement MinIO, OCI artifact publishing, VLM artifact resolution, or binary artifact extraction in Phase 1. Store artifact references and metadata only where needed; binary artifact pipelines are a later architecture phase.
Trust Boundaries
Risk: medium.
Attacker-controlled inputs include user prompts, agent messages, model output, tool output, retrieved artifacts, imported documents, and model-generated summaries. Treat all of them as untrusted before storage and before prompt assembly.
The Phase 1 implementation must enforce:
- validation at API boundaries
- scoped retrieval before prompt assembly
- redaction before durable storage
- TTL filtering
- sensitivity metadata and filters
- provenance on every memory row and artifact reference
- retrieval event logging for later replay
- no secret values in logs or memory payloads
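One way to approach the "no secret values in logs" requirement is a standard-library logging filter that rewrites token-like values before a record is emitted. This is a minimal sketch, not part of the planned files: the class name `RedactingFilter` is hypothetical, and the pattern is borrowed from the redaction step later in this plan.

```python
import logging
import re

# Same token-like pattern the plan uses for text redaction (assumed to be reused here).
TOKEN_LIKE = re.compile(r"(?i)(api[_-]?key|token|secret|password)\s*[:=]\s*\S+")


class RedactingFilter(logging.Filter):
    """Redact token-like values from the fully formatted log message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so secrets passed as %-style arguments are also covered.
        message = record.getMessage()
        record.msg = TOKEN_LIKE.sub(r"\1=[REDACTED]", message)
        record.args = ()
        return True  # keep the record, only its text changes
```

The filter would typically be attached to a handler with `handler.addFilter(RedactingFilter())`. Pattern-based redaction is conservative by design and will miss secrets that do not follow a `name=value` shape.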
Planned Files
Create:
- modules/workloads/memory.nix: NixOS workload module for Postgres, pgvector, Redis, memory API, and workers.
- pkgs/memory-service/default.nix: package the local Python memory service.
- pkgs/memory-service/pyproject.toml: Python package metadata.
- pkgs/memory-service/src/dubnium_memory/__init__.py: package marker.
- pkgs/memory-service/src/dubnium_memory/api.py: HTTP API boundary and input validation.
- pkgs/memory-service/src/dubnium_memory/config.py: environment parsing.
- pkgs/memory-service/src/dubnium_memory/db.py: database connection and migrations runner.
- pkgs/memory-service/src/dubnium_memory/models.py: typed request and memory models.
- pkgs/memory-service/src/dubnium_memory/filters.py: retrieval scope, TTL, and sensitivity filters.
- pkgs/memory-service/src/dubnium_memory/redaction.py: secret and sensitive payload redaction.
- pkgs/memory-service/src/dubnium_memory/retrieval.py: scoped query and ranking logic.
- pkgs/memory-service/src/dubnium_memory/storage.py: memory persistence.
- pkgs/memory-service/src/dubnium_memory/workers.py: summarization and embedding worker entrypoints.
- pkgs/memory-service/migrations/001_initial.sql: schema for sessions, memories, embeddings, tasks, artifacts, retrieval events, and provenance.
- pkgs/memory-service/tests/test_filters.py: retrieval filter tests.
- pkgs/memory-service/tests/test_redaction.py: redaction tests.
- pkgs/memory-service/tests/test_storage.py: storage contract tests.
- pkgs/memory-service/tests/test_retrieval.py: retrieval filter tests.
- docs/runbooks/memory-service.md: operator runbook for the prototype.
Modify:
- modules/dubnium/options.nix: add dubnium.memory options and assertions.
- hosts/workstation/default.nix: import and enable the memory workload for the workstation only after the module evaluates.
- flake.nix: expose the memory-service package.
- docs/README.md: link the memory service runbook.
- docs/SUMMARY.md: link the memory service runbook.
Implementation Tasks
Task 1: Add Memory Options
Files:
- Modify: modules/dubnium/options.nix
- Step 1: Add a disabled-by-default dubnium.memory option set
Add this next to the existing dubnium.vllm and dubnium.k3s options:
memory = {
  enable = mkEnableOption "persistent memory services for local vLLM orchestration";
  api = {
    host = mkOption {
      type = types.str;
      default = "127.0.0.1";
      description = "Host address bound by the Dubnium memory API.";
    };
    port = mkOption {
      type = types.port;
      default = 8090;
      description = "Port bound by the Dubnium memory API.";
    };
  };
  database = {
    name = mkOption {
      type = types.str;
      default = "dubnium_memory";
      description = "Postgres database used by the Dubnium memory subsystem.";
    };
    user = mkOption {
      type = types.str;
      default = "dubnium_memory";
      description = "Postgres role used by the Dubnium memory service.";
    };
  };
  redis = {
    enable = mkOption {
      type = types.bool;
      default = true;
      description = "Whether Redis is enabled for transient working context and worker queues.";
    };
  };
  retention = {
    defaultTtlDays = mkOption {
      type = types.nullOr types.int;
      default = null;
      description = "Default TTL in days for memory objects without an explicit TTL.";
    };
  };
};
- Step 2: Add assertions for safe local defaults
Add these to the existing assertions list:
{
  assertion = (!config.dubnium.memory.enable) || (config.dubnium.memory.api.host == "127.0.0.1");
  message = "dubnium.memory.api.host must stay local-only for the Phase 1 prototype";
}
{
  assertion =
    (config.dubnium.memory.retention.defaultTtlDays == null)
    || (config.dubnium.memory.retention.defaultTtlDays > 0);
  message = "dubnium.memory.retention.defaultTtlDays must be positive when set";
}
- Step 3: Verify option evaluation
Run:
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.dubnium.memory.enable
Expected:
false
Task 2: Package The Memory Service Skeleton
Files:
- Create: pkgs/memory-service/default.nix
- Create: pkgs/memory-service/pyproject.toml
- Create: pkgs/memory-service/src/dubnium_memory/__init__.py
- Create: pkgs/memory-service/src/dubnium_memory/config.py
- Create: pkgs/memory-service/src/dubnium_memory/api.py
- Modify: flake.nix
- Step 1: Create package metadata
Create pkgs/memory-service/pyproject.toml:
[project]
name = "dubnium-memory"
version = "0.1.0"
description = "Local persistent memory service for Dubnium vLLM orchestration"
requires-python = ">=3.12"
dependencies = [
"fastapi",
"pydantic",
"psycopg[binary]",
"uvicorn",
]
[project.scripts]
dubnium-memory-api = "dubnium_memory.api:main"
- Step 2: Create the Nix package
Create pkgs/memory-service/default.nix:
{ python312Packages }:
python312Packages.buildPythonApplication {
  pname = "dubnium-memory";
  version = "0.1.0";
  pyproject = true;
  src = ./.;
  build-system = [
    python312Packages.setuptools
    python312Packages.wheel
  ];
  dependencies = [
    python312Packages.fastapi
    python312Packages.pydantic
    python312Packages.psycopg
    python312Packages.uvicorn
  ];
}
- Step 3: Add minimal app entrypoint
Create pkgs/memory-service/src/dubnium_memory/__init__.py:
"""Dubnium persistent memory service."""
Create pkgs/memory-service/src/dubnium_memory/config.py:
from pydantic import BaseModel


class Settings(BaseModel):
    database_url: str
    host: str = "127.0.0.1"
    port: int = 8090
Create pkgs/memory-service/src/dubnium_memory/api.py:
import os

import uvicorn
from fastapi import FastAPI

from dubnium_memory.config import Settings

app = FastAPI(title="Dubnium Memory API")


@app.get("/healthz")
def healthz() -> dict[str, str]:
    return {"status": "ok"}


def settings_from_env() -> Settings:
    # DATABASE_URL is required; a missing value fails fast with KeyError at startup.
    return Settings(
        database_url=os.environ["DATABASE_URL"],
        host=os.environ.get("DUBNIUM_MEMORY_HOST", "127.0.0.1"),
        port=int(os.environ.get("DUBNIUM_MEMORY_PORT", "8090")),
    )


def main() -> None:
    settings = settings_from_env()
    uvicorn.run(app, host=settings.host, port=settings.port)
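The NixOS assertion keeps the bind address local, but the service itself accepts whatever `DUBNIUM_MEMORY_HOST` contains. As a defense-in-depth sketch, the entrypoint could refuse non-loopback addresses before starting. The helper name `is_loopback_host` is hypothetical, and this check is looser than the Nix assertion, which allows only the literal `127.0.0.1`.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True only for IP literals in a loopback range (127.0.0.0/8 or ::1)."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Hostnames such as "localhost" are rejected: resolution could change
        # underneath the service, so only explicit IP literals are accepted.
        return False
```

A caller could raise at startup when `is_loopback_host(settings.host)` is false, so a misconfigured environment never exposes the prototype on the network.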
- Step 4: Expose the package from the flake
Modify flake.nix under packages.${system}:
memory-service = pkgs.callPackage ./pkgs/memory-service { };
- Step 5: Verify package build
Run:
nix --extra-experimental-features "nix-command flakes" build .#memory-service
Expected:
result/bin/dubnium-memory-api exists
Task 3: Add Schema And Storage Contracts
Files:
- Create: pkgs/memory-service/migrations/001_initial.sql
- Create: pkgs/memory-service/src/dubnium_memory/models.py
- Create: pkgs/memory-service/src/dubnium_memory/storage.py
- Create: pkgs/memory-service/tests/test_storage.py
- Step 1: Create the first migration
Create pkgs/memory-service/migrations/001_initial.sql:
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS sessions (
  id uuid PRIMARY KEY,
  scope text NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS memories (
  id uuid PRIMARY KEY,
  session_id uuid REFERENCES sessions(id),
  memory_type text NOT NULL CHECK (memory_type IN ('working', 'episodic', 'semantic')),
  summary text NOT NULL,
  scope text NOT NULL,
  importance double precision NOT NULL DEFAULT 0.0,
  confidence double precision NOT NULL DEFAULT 0.0,
  sensitivity text NOT NULL DEFAULT 'internal',
  validation_status text NOT NULL DEFAULT 'unverified',
  ttl timestamptz,
  source text NOT NULL,
  provenance jsonb NOT NULL DEFAULT '{}'::jsonb,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS memory_embeddings (
  memory_id uuid PRIMARY KEY REFERENCES memories(id) ON DELETE CASCADE,
  embedding vector(384) NOT NULL,
  model text NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS tasks (
  id uuid PRIMARY KEY,
  scope text NOT NULL,
  status text NOT NULL,
  state jsonb NOT NULL DEFAULT '{}'::jsonb,
  created_at timestamptz NOT NULL DEFAULT now(),
  updated_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS artifacts (
  id uuid PRIMARY KEY,
  scope text NOT NULL,
  uri text NOT NULL,
  media_type text,
  sensitivity text NOT NULL DEFAULT 'internal',
  provenance jsonb NOT NULL DEFAULT '{}'::jsonb,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS provenance (
  id uuid PRIMARY KEY,
  memory_id uuid REFERENCES memories(id) ON DELETE CASCADE,
  source_identity text NOT NULL,
  source_event jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS retrieval_events (
  id uuid PRIMARY KEY,
  scope text NOT NULL,
  query text NOT NULL,
  returned_memory_ids uuid[] NOT NULL DEFAULT '{}',
  returned_artifact_ids uuid[] NOT NULL DEFAULT '{}',
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX IF NOT EXISTS memories_scope_created_at_idx
  ON memories (scope, created_at DESC);

CREATE INDEX IF NOT EXISTS memories_ttl_idx
  ON memories (ttl);
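The schema stores 384-dimensional embeddings, but this plan does not yet show how they would rank results. pgvector's `<=>` operator computes cosine distance, and the pure-Python sketch below illustrates the same quantity on small vectors. The function names are hypothetical and the code is an illustration only; a real query would let Postgres do this work.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance (1 - cosine similarity), the quantity pgvector's <=> returns."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def rank_by_distance(
    query: Sequence[float], candidates: dict[str, Sequence[float]]
) -> list[str]:
    """Return candidate ids ordered from nearest to farthest."""
    return sorted(candidates, key=lambda mid: cosine_distance(query, candidates[mid]))
```

Scope, TTL, and sensitivity filters would still apply before any similarity ordering, so ranking never widens what a caller is allowed to see.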
- Step 2: Define typed storage input
Create pkgs/memory-service/src/dubnium_memory/models.py:
from datetime import datetime
from typing import Literal
from uuid import UUID

from pydantic import BaseModel, Field

MemoryType = Literal["working", "episodic", "semantic"]
ValidationStatus = Literal["unverified", "verified", "rejected"]


class MemoryIn(BaseModel):
    id: UUID
    session_id: UUID | None = None
    memory_type: MemoryType
    summary: str = Field(min_length=1, max_length=8000)
    scope: str = Field(min_length=1, max_length=256)
    importance: float = Field(default=0.0, ge=0.0, le=1.0)
    confidence: float = Field(default=0.0, ge=0.0, le=1.0)
    sensitivity: str = Field(default="internal", max_length=64)
    validation_status: ValidationStatus = "unverified"
    ttl: datetime | None = None
    source: str = Field(min_length=1, max_length=128)
    provenance: dict
- Step 3: Implement storage with parameterized SQL
Create pkgs/memory-service/src/dubnium_memory/storage.py:
from psycopg import Connection
from psycopg.types.json import Jsonb

from dubnium_memory.models import MemoryIn


def store_memory(conn: Connection, memory: MemoryIn) -> None:
    params = memory.model_dump()
    # psycopg 3 does not adapt a plain dict to jsonb; wrap it explicitly.
    params["provenance"] = Jsonb(params["provenance"])
    conn.execute(
        """
        INSERT INTO memories (
            id, session_id, memory_type, summary, scope, importance, confidence,
            sensitivity, validation_status, ttl, source, provenance
        )
        VALUES (
            %(id)s, %(session_id)s, %(memory_type)s, %(summary)s, %(scope)s,
            %(importance)s, %(confidence)s, %(sensitivity)s, %(validation_status)s,
            %(ttl)s, %(source)s, %(provenance)s
        )
        """,
        params,
    )
- Step 4: Add a storage test
Create pkgs/memory-service/tests/test_storage.py:
from uuid import uuid4

import pytest
from pydantic import ValidationError

from dubnium_memory.models import MemoryIn


def test_memory_requires_summary() -> None:
    payload = {
        "id": uuid4(),
        "memory_type": "episodic",
        "summary": "",
        "scope": "project:dubnium",
        "source": "conversation",
        "provenance": {"origin": "test"},
    }
    # An empty summary violates min_length=1 and must be rejected at the model boundary.
    with pytest.raises(ValidationError, match="summary"):
        MemoryIn(**payload)
Task 4: Add Redaction And Retrieval Filters
Files:
- Create: pkgs/memory-service/src/dubnium_memory/redaction.py
- Create: pkgs/memory-service/src/dubnium_memory/filters.py
- Create: pkgs/memory-service/tests/test_redaction.py
- Create: pkgs/memory-service/tests/test_filters.py
- Step 1: Implement conservative redaction
Create pkgs/memory-service/src/dubnium_memory/redaction.py:
import re

SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|secret|password)\s*[:=]\s*([^\s]+)"),
]


def redact_text(value: str) -> str:
    redacted = value
    for pattern in SECRET_PATTERNS:
        # Keep the key name, drop the value; the separator is normalized to "=".
        redacted = pattern.sub(r"\1=[REDACTED]", redacted)
    return redacted
- Step 2: Test redaction
Create pkgs/memory-service/tests/test_redaction.py:
from dubnium_memory.redaction import redact_text


def test_redacts_api_key_like_values() -> None:
    text = "OPENAI_API_KEY=sk-test-value"
    assert redact_text(text) == "OPENAI_API_KEY=[REDACTED]"
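The redaction step above works on a single string, while memory rows also carry a provenance dictionary that may nest untrusted text. A minimal sketch of walking such a payload before durable storage follows; `redact_payload` is a hypothetical helper, not one of the planned functions, and it reuses the same pattern as the step above.

```python
import re
from typing import Any

# Same pattern as the redaction step, repeated so the sketch is self-contained.
SECRET_PATTERN = re.compile(r"(?i)(api[_-]?key|token|secret|password)\s*[:=]\s*([^\s]+)")


def redact_payload(value: Any) -> Any:
    """Recursively redact token-like values in strings, dicts, and lists."""
    if isinstance(value, str):
        return SECRET_PATTERN.sub(r"\1=[REDACTED]", value)
    if isinstance(value, dict):
        # Keys are left unchanged; only values are inspected.
        return {key: redact_payload(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact_payload(item) for item in value]
    return value  # numbers, booleans, and None pass through
```

The function returns a new structure instead of mutating its input, which keeps the unredacted original out of anything that is later stored or logged.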
- Step 3: Implement retrieval filtering
Create pkgs/memory-service/src/dubnium_memory/filters.py:
from datetime import datetime, timezone
from typing import TypedDict


class MemoryCandidate(TypedDict):
    id: str
    scope: str
    sensitivity: str
    validation_status: str
    ttl: datetime | None


def is_retrievable(
    memory: MemoryCandidate,
    *,
    scope: str,
    allowed_sensitivity: set[str],
    require_verified: bool,
) -> bool:
    if memory["scope"] != scope:
        return False
    if memory["sensitivity"] not in allowed_sensitivity:
        return False
    if require_verified and memory["validation_status"] != "verified":
        return False
    # A TTL equal to the current time counts as expired.
    if memory["ttl"] is not None and memory["ttl"] <= datetime.now(timezone.utc):
        return False
    return True
- Step 4: Test scope and TTL enforcement
Create pkgs/memory-service/tests/test_filters.py:
from datetime import datetime, timedelta, timezone

from dubnium_memory.filters import is_retrievable


def test_rejects_cross_scope_memory() -> None:
    memory = {
        "id": "m1",
        "scope": "project:other",
        "sensitivity": "internal",
        "validation_status": "verified",
        "ttl": None,
    }
    assert not is_retrievable(
        memory,
        scope="project:dubnium",
        allowed_sensitivity={"internal"},
        require_verified=True,
    )


def test_rejects_expired_memory() -> None:
    memory = {
        "id": "m1",
        "scope": "project:dubnium",
        "sensitivity": "internal",
        "validation_status": "verified",
        "ttl": datetime.now(timezone.utc) - timedelta(days=1),
    }
    assert not is_retrievable(
        memory,
        scope="project:dubnium",
        allowed_sensitivity={"internal"},
        require_verified=True,
    )
Task 5: Add Retrieval API Boundary
Files:
- Modify: pkgs/memory-service/src/dubnium_memory/api.py
- Create: pkgs/memory-service/src/dubnium_memory/retrieval.py
- Create: pkgs/memory-service/tests/test_retrieval.py
- Step 1: Add request and response models
Add to models.py:
class RetrieveRequest(BaseModel):
    query: str = Field(min_length=1, max_length=4000)
    scope: str = Field(min_length=1, max_length=256)
    allowed_sensitivity: list[str] = Field(default_factory=lambda: ["internal"])
    require_verified: bool = False
    limit: int = Field(default=8, ge=1, le=32)


class RetrievedMemory(BaseModel):
    id: UUID
    summary: str
    scope: str
    sensitivity: str
    validation_status: ValidationStatus
    provenance: dict
- Step 2: Implement retrieval query contract
Create pkgs/memory-service/src/dubnium_memory/retrieval.py:
from psycopg import Connection
from psycopg.rows import dict_row

from dubnium_memory.models import RetrieveRequest, RetrievedMemory


def retrieve_memories(conn: Connection, request: RetrieveRequest) -> list[RetrievedMemory]:
    # The default row factory returns tuples; dict_row yields column-name mappings
    # that model_validate can consume directly.
    with conn.cursor(row_factory=dict_row) as cur:
        rows = cur.execute(
            """
            SELECT id, summary, scope, sensitivity, validation_status, provenance
            FROM memories
            WHERE scope = %(scope)s
              AND sensitivity = ANY(%(allowed_sensitivity)s)
              AND (%(require_verified)s = false OR validation_status = 'verified')
              AND (ttl IS NULL OR ttl > now())
            ORDER BY importance DESC, created_at DESC
            LIMIT %(limit)s
            """,
            request.model_dump(),
        ).fetchall()
    return [RetrievedMemory.model_validate(row) for row in rows]
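The trust boundaries call for retrieval event logging, which the query above does not yet do. One possible shape for the row, following the retrieval_events columns from the first migration, is sketched below. The helper name `build_retrieval_event` is hypothetical, and whether the query text should be redacted before it is recorded is left open here.

```python
from datetime import datetime, timezone
from typing import Any, Iterable
from uuid import UUID, uuid4


def build_retrieval_event(
    scope: str,
    query: str,
    memory_ids: Iterable[UUID],
    artifact_ids: Iterable[UUID] = (),
) -> dict[str, Any]:
    """Build a row for retrieval_events so a later replay can see what was returned."""
    return {
        "id": uuid4(),
        "scope": scope,
        "query": query,
        # Lists map to the uuid[] columns; order is kept as returned to the caller.
        "returned_memory_ids": list(memory_ids),
        "returned_artifact_ids": list(artifact_ids),
        "created_at": datetime.now(timezone.utc),
    }
```

Writing this row in the same transaction as the SELECT would keep the log consistent with what the caller actually received.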
- Step 3: Add API endpoint
Add to api.py:
from dubnium_memory.models import RetrieveRequest, RetrievedMemory


@app.post("/memory/retrieve")
def retrieve(request: RetrieveRequest) -> list[RetrievedMemory]:
    raise NotImplementedError("database connection wiring is added in the service module task")
Keep this endpoint local-only until the database dependency is wired. Do not expose it on the network in Phase 1.
Task 6: Add NixOS Workload Module
Files:
- Create: modules/workloads/memory.nix
- Modify: hosts/workstation/default.nix
- Step 1: Create the workload module
Create modules/workloads/memory.nix:
{ lib, config, pkgs, ... }:
let
  cfg = config.dubnium.memory;
  memoryPackage = pkgs.callPackage ../../pkgs/memory-service { };
in
{
  config = lib.mkIf cfg.enable {
    services.postgresql = {
      enable = true;
      extensions = ps: [ ps.pgvector ];
      ensureDatabases = [ cfg.database.name ];
      ensureUsers = [
        {
          # ensureDBOwnership requires the role name to match the database name.
          name = cfg.database.user;
          ensureDBOwnership = true;
        }
      ];
    };
    services.redis.servers.dubnium-memory = lib.mkIf cfg.redis.enable {
      enable = true;
      bind = "127.0.0.1";
      port = 6379;
    };
    systemd.services.dubnium-memory-api = {
      description = "Dubnium persistent memory API";
      wantedBy = [ "multi-user.target" ];
      after = [ "postgresql.service" ];
      requires = [ "postgresql.service" ];
      environment = {
        DUBNIUM_MEMORY_HOST = cfg.api.host;
        DUBNIUM_MEMORY_PORT = toString cfg.api.port;
        DATABASE_URL = "postgresql:///${cfg.database.name}?host=/run/postgresql";
      };
      serviceConfig = {
        Type = "simple";
        ExecStart = "${memoryPackage}/bin/dubnium-memory-api";
        Restart = "always";
        RestartSec = "5s";
        # Run as the database role name so peer authentication over the Unix
        # socket succeeds; without User= the service would run as root.
        DynamicUser = true;
        User = cfg.database.user;
        NoNewPrivileges = true;
        PrivateTmp = true;
        ProtectHome = true;
        Slice = "platform.slice";
      };
    };
  };
}
- Step 2: Import the module without enabling it
Modify hosts/workstation/default.nix imports:
../../modules/workloads/memory.nix
Do not set dubnium.memory.enable = true until package build and module eval
pass.
- Step 3: Verify disabled module eval
Run:
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.systemd.services.dubnium-memory-api.enable
Expected: the attribute should be absent or evaluation should show the service
is not defined while dubnium.memory.enable = false.
Task 7: Enable Prototype Locally
Files:
- Modify: hosts/workstation/default.nix
- Step 1: Enable the memory workload
Add under dubnium:
memory = {
  enable = true;
  api.host = "127.0.0.1";
  api.port = 8090;
};
- Step 2: Verify generated service contracts
Run:
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.services.postgresql.enable
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.services.redis.servers.dubnium-memory.enable
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.systemd.services.dubnium-memory-api.environment.DUBNIUM_MEMORY_HOST
Expected:
true
true
"127.0.0.1"
Task 8: Add Operator Runbook
Files:
- Create: docs/runbooks/memory-service.md
- Modify: docs/README.md
- Modify: docs/SUMMARY.md
- Step 1: Create the runbook
Create docs/runbooks/memory-service.md with:
# Runbook: Memory Service
Status: prototype
Use this after `dubnium.memory.enable = true`.
## Verify Services
```bash
systemctl status postgresql
systemctl status redis-dubnium-memory
systemctl status dubnium-memory-api
curl http://127.0.0.1:8090/healthz
```
Expected:
```json
{"status":"ok"}
```
## Security Checks
- the API binds to 127.0.0.1
- memories include scope, sensitivity, validation status, and provenance
- expired memories are not returned
- sensitive memories are not returned unless explicitly allowed
- retrieval events are logged with memory ids and artifact references
- logs do not contain raw token-like values
- Step 2: Link the runbook
Add Memory Service to the Runbooks lists in docs/README.md and docs/SUMMARY.md.
- Step 3: Build docs
Run:
mdbook build
Expected: docs build succeeds. Generated web/docs changes may be reverted if the review scope is source docs only.
Final Verification
Before committing Phase 1 implementation:
git diff --check
nix --extra-experimental-features "nix-command flakes" build .#memory-service
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.dubnium.memory.enable
pytest pkgs/memory-service/tests
mdbook build
If a full workstation build still fails on the known placeholder hardware configuration, report that separately from targeted memory-module evaluation.
Follow-Up: MemGPT-Style Agent Upgrade
After Phase 1 is stable, create a separate ADR or spike plan for evaluating Letta as the maintained framework lineage from MemGPT. That spike should be read-only against existing memory rows at first, then test controlled agent-managed memory writes only after external governance hooks and replay evidence are in place.
Follow-Up: Artifact And OCI Architecture
After Phase 1 is stable, create a separate implementation plan for artifact handling. That work should start with filesystem content-addressed storage and metadata extraction, then evaluate MinIO and OCI-style exported cognition artifacts only after memory rows, retrieval events, and artifact references have stable ids.