vLLM Memory Phase 1 Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Build a minimal local persistent memory prototype around Dubnium’s existing vLLM service without coupling durable memory to transformer KV state.

Architecture: vLLM remains the inference runtime. A separate memory workload provides Postgres/pgvector storage, optional Redis working context, summarization and embedding workers, and a scoped retrieval API that an orchestrator can use before calling vLLM. A future governance layer remains external; Phase 1 records metadata and lifecycle events but does not implement the governance authority.

Tech Stack: NixOS modules, Postgres, pgvector, Redis, Python service code, pytest, systemd services.


Scope

This plan implements the Phase 1 prototype described in ADR-0010 and vLLM Persistent Memory Prototype.

Do not implement multi-agent federation, Temporal, MinIO, cryptographic attestation, a production policy DSL, or durable KV-cache persistence in this phase.

Do not implement Letta or another MemGPT-style framework in Phase 1. Keep it as an incremental upgrade candidate after storage, retrieval filters, redaction, provenance, and replay checks are stable.

Do not implement MinIO, OCI artifact publishing, VLM artifact resolution, or binary artifact extraction in Phase 1. Store artifact references and metadata only where needed; binary artifact pipelines are a later architecture phase.

Trust Boundaries

Risk: medium.

Attacker-controlled inputs include user prompts, agent messages, model output, tool output, retrieved artifacts, imported documents, and model-generated summaries. Treat all of them as untrusted before storage and before prompt assembly.

The Phase 1 implementation must enforce:

  • validation at API boundaries
  • scoped retrieval before prompt assembly
  • redaction before durable storage
  • TTL filtering
  • sensitivity metadata and filters
  • provenance on every memory row and artifact reference
  • retrieval event logging for later replay
  • no secret values in logs or memory payloads
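For the last item, one option is a logging.Filter that scrubs each record before any handler formats it. The sketch below assumes the service uses the standard logging module; the class name RedactingFilter and the TOKEN_LIKE pattern are hypothetical, loosely mirroring the redaction pattern planned in Task 4.

```python
import logging
import re

# Hypothetical pattern, modeled on SECRET_PATTERNS in Task 4, but keeping the
# original separator (":" or "=") in the redacted output.
TOKEN_LIKE = re.compile(r"(?i)(api[_-]?key|token|secret|password)(\s*[:=]\s*)(\S+)")


class RedactingFilter(logging.Filter):
    """Scrub token-like values from a log record before handlers format it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() merges %-style args into the text first, so a secret
        # passed as an argument is redacted as well as one in the format string.
        record.msg = TOKEN_LIKE.sub(r"\1\2[REDACTED]", record.getMessage())
        record.args = ()
        return True
```

Attach it with handler.addFilter(RedactingFilter()) on each handler rather than on a logger: logger-level filters are not consulted for records propagated from child loggers. It does not scrub exception tracebacks, so those still need separate handling.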

Planned Files

Create:

  • modules/workloads/memory.nix: NixOS workload module for Postgres, pgvector, Redis, memory API, and workers.
  • pkgs/memory-service/default.nix: package the local Python memory service.
  • pkgs/memory-service/pyproject.toml: Python package metadata.
  • pkgs/memory-service/src/dubnium_memory/__init__.py: package marker.
  • pkgs/memory-service/src/dubnium_memory/api.py: HTTP API boundary and input validation.
  • pkgs/memory-service/src/dubnium_memory/config.py: environment parsing.
  • pkgs/memory-service/src/dubnium_memory/db.py: database connection and migrations runner.
  • pkgs/memory-service/src/dubnium_memory/models.py: typed request and memory models.
  • pkgs/memory-service/src/dubnium_memory/filters.py: retrieval scope, TTL, and sensitivity filters.
  • pkgs/memory-service/src/dubnium_memory/redaction.py: secret and sensitive payload redaction.
  • pkgs/memory-service/src/dubnium_memory/retrieval.py: scoped query and ranking logic.
  • pkgs/memory-service/src/dubnium_memory/storage.py: memory persistence.
  • pkgs/memory-service/src/dubnium_memory/workers.py: summarization and embedding worker entrypoints.
  • pkgs/memory-service/migrations/001_initial.sql: schema for sessions, memories, embeddings, tasks, artifacts, retrieval events, and provenance.
  • pkgs/memory-service/tests/test_filters.py: retrieval filter tests.
  • pkgs/memory-service/tests/test_redaction.py: redaction tests.
  • pkgs/memory-service/tests/test_storage.py: storage contract tests.
  • pkgs/memory-service/tests/test_retrieval.py: retrieval query contract tests.
  • docs/runbooks/memory-service.md: operator runbook for the prototype.

Modify:

  • modules/dubnium/options.nix: add dubnium.memory options and assertions.
  • hosts/workstation/default.nix: import and enable the memory workload for the workstation only after the module evaluates.
  • flake.nix: expose the memory-service package.
  • docs/README.md: link the memory service runbook.
  • docs/SUMMARY.md: link the memory service runbook.

Implementation Tasks

Task 1: Add Memory Options

Files:

  • Modify: modules/dubnium/options.nix

  • Step 1: Add a disabled-by-default dubnium.memory option set

Add this next to the existing dubnium.vllm and dubnium.k3s options:

```nix
memory = {
  enable = mkEnableOption "persistent memory services for local vLLM orchestration";

  api = {
    host = mkOption {
      type = types.str;
      default = "127.0.0.1";
      description = "Host address bound by the Dubnium memory API.";
    };

    port = mkOption {
      type = types.port;
      default = 8090;
      description = "Port bound by the Dubnium memory API.";
    };
  };

  database = {
    name = mkOption {
      type = types.str;
      default = "dubnium_memory";
      description = "Postgres database used by the Dubnium memory subsystem.";
    };

    user = mkOption {
      type = types.str;
      default = "dubnium_memory";
      description = "Postgres role used by the Dubnium memory service.";
    };
  };

  redis = {
    enable = mkOption {
      type = types.bool;
      default = true;
      description = "Whether Redis is enabled for transient working context and worker queues.";
    };
  };

  retention = {
    defaultTtlDays = mkOption {
      type = types.nullOr types.int;
      default = null;
      description = "Default TTL in days for memory objects without an explicit TTL.";
    };
  };
};
```

  • Step 2: Add assertions for safe local defaults

Add these to the existing assertions list:

```nix
{
  assertion = (!config.dubnium.memory.enable) || (config.dubnium.memory.api.host == "127.0.0.1");
  message = "dubnium.memory.api.host must stay local-only for the Phase 1 prototype";
}
{
  assertion =
    (config.dubnium.memory.retention.defaultTtlDays == null)
    || (config.dubnium.memory.retention.defaultTtlDays > 0);
  message = "dubnium.memory.retention.defaultTtlDays must be positive when set";
}
```

  • Step 3: Verify option evaluation

Run:

```bash
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.dubnium.memory.enable
```

Expected:

```text
false
```

Task 2: Package The Memory Service Skeleton

Files:

  • Create: pkgs/memory-service/default.nix

  • Create: pkgs/memory-service/pyproject.toml

  • Create: pkgs/memory-service/src/dubnium_memory/__init__.py

  • Create: pkgs/memory-service/src/dubnium_memory/config.py

  • Create: pkgs/memory-service/src/dubnium_memory/api.py

  • Modify: flake.nix

  • Step 1: Create package metadata

Create pkgs/memory-service/pyproject.toml:

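```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "dubnium-memory"
version = "0.1.0"
description = "Local persistent memory service for Dubnium vLLM orchestration"
requires-python = ">=3.12"
dependencies = [
  "fastapi",
  "pydantic",
  "psycopg[binary]",
  "uvicorn",
]

[project.scripts]
dubnium-memory-api = "dubnium_memory.api:main"

# Lets `pytest pkgs/memory-service/tests` import the src-layout package
# without installing it first.
[tool.pytest.ini_options]
pythonpath = ["src"]
```
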
  • Step 2: Create the Nix package

Create pkgs/memory-service/default.nix:

```nix
{ python312Packages }:

python312Packages.buildPythonApplication {
  pname = "dubnium-memory";
  version = "0.1.0";
  pyproject = true;

  src = ./.;

  build-system = [
    python312Packages.setuptools
    python312Packages.wheel
  ];

  dependencies = [
    python312Packages.fastapi
    python312Packages.pydantic
    python312Packages.psycopg
    python312Packages.uvicorn
  ];
}
```

  • Step 3: Add a minimal app entrypoint

Create pkgs/memory-service/src/dubnium_memory/__init__.py:

```python
"""Dubnium persistent memory service."""
```

Create pkgs/memory-service/src/dubnium_memory/config.py:

```python
from pydantic import BaseModel


class Settings(BaseModel):
    database_url: str
    host: str = "127.0.0.1"
    port: int = 8090
```

Create pkgs/memory-service/src/dubnium_memory/api.py:

```python
import os

from fastapi import FastAPI
import uvicorn

from dubnium_memory.config import Settings


app = FastAPI(title="Dubnium Memory API")


@app.get("/healthz")
def healthz() -> dict[str, str]:
    return {"status": "ok"}


def settings_from_env() -> Settings:
    return Settings(
        database_url=os.environ["DATABASE_URL"],
        host=os.environ.get("DUBNIUM_MEMORY_HOST", "127.0.0.1"),
        port=int(os.environ.get("DUBNIUM_MEMORY_PORT", "8090")),
    )


def main() -> None:
    settings = settings_from_env()
    uvicorn.run(app, host=settings.host, port=settings.port)
```

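The Nix assertion in Task 1 keeps the configured host on 127.0.0.1, but settings_from_env reads DUBNIUM_MEMORY_HOST directly, so running the binary by hand could still bind a public address. A possible defense-in-depth check is sketched below; the helper name require_loopback_host is hypothetical, and ipaddress.is_loopback accepts all of 127.0.0.0/8 and ::1, which is slightly broader than the Nix assertion.

```python
import ipaddress


def require_loopback_host(host: str) -> str:
    """Return host unchanged if it is a literal loopback IP address, else raise ValueError."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError as exc:
        # Names such as "localhost" are rejected on purpose: their resolution
        # depends on /etc/hosts and DNS, which could widen the bind address.
        raise ValueError(f"bind host must be a literal loopback IP, got {host!r}") from exc
    if not address.is_loopback:
        raise ValueError(f"bind host must be loopback-only in Phase 1, got {host!r}")
    return host
```

It could be applied inside settings_from_env as host=require_loopback_host(os.environ.get("DUBNIUM_MEMORY_HOST", "127.0.0.1")), so a misconfigured host fails at startup instead of silently listening on the network.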
  • Step 4: Expose the package from the flake

Modify flake.nix under packages.${system}:

```nix
memory-service = pkgs.callPackage ./pkgs/memory-service { };
```

  • Step 5: Verify package build

Run:

```bash
nix --extra-experimental-features "nix-command flakes" build .#memory-service
```

Expected:

result/bin/dubnium-memory-api exists

Task 3: Add Schema And Storage Contracts

Files:

  • Create: pkgs/memory-service/migrations/001_initial.sql

  • Create: pkgs/memory-service/src/dubnium_memory/models.py

  • Create: pkgs/memory-service/src/dubnium_memory/storage.py

  • Create: pkgs/memory-service/tests/test_storage.py

  • Step 1: Create the first migration

Create pkgs/memory-service/migrations/001_initial.sql:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS sessions (
  id uuid PRIMARY KEY,
  scope text NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS memories (
  id uuid PRIMARY KEY,
  session_id uuid REFERENCES sessions(id),
  memory_type text NOT NULL CHECK (memory_type IN ('working', 'episodic', 'semantic')),
  summary text NOT NULL,
  scope text NOT NULL,
  importance double precision NOT NULL DEFAULT 0.0,
  confidence double precision NOT NULL DEFAULT 0.0,
  sensitivity text NOT NULL DEFAULT 'internal',
  validation_status text NOT NULL DEFAULT 'unverified',
  ttl timestamptz,
  source text NOT NULL,
  provenance jsonb NOT NULL DEFAULT '{}'::jsonb,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS memory_embeddings (
  memory_id uuid PRIMARY KEY REFERENCES memories(id) ON DELETE CASCADE,
  embedding vector(384) NOT NULL,
  model text NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS tasks (
  id uuid PRIMARY KEY,
  scope text NOT NULL,
  status text NOT NULL,
  state jsonb NOT NULL DEFAULT '{}'::jsonb,
  created_at timestamptz NOT NULL DEFAULT now(),
  updated_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS artifacts (
  id uuid PRIMARY KEY,
  scope text NOT NULL,
  uri text NOT NULL,
  media_type text,
  sensitivity text NOT NULL DEFAULT 'internal',
  provenance jsonb NOT NULL DEFAULT '{}'::jsonb,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS provenance (
  id uuid PRIMARY KEY,
  memory_id uuid REFERENCES memories(id) ON DELETE CASCADE,
  source_identity text NOT NULL,
  source_event jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS retrieval_events (
  id uuid PRIMARY KEY,
  scope text NOT NULL,
  query text NOT NULL,
  returned_memory_ids uuid[] NOT NULL DEFAULT '{}',
  returned_artifact_ids uuid[] NOT NULL DEFAULT '{}',
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX IF NOT EXISTS memories_scope_created_at_idx
  ON memories (scope, created_at DESC);

CREATE INDEX IF NOT EXISTS memories_ttl_idx
  ON memories (ttl);
```

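Planned Files lists db.py as the migrations runner, but no task creates it. A minimal sketch follows, assuming migrations are applied in filename order and tracked in a hypothetical schema_migrations table that is not part of 001_initial.sql. It also assumes the migrations directory is reachable at runtime; the current default.nix does not install it.

```python
from pathlib import Path
from typing import Any, Protocol


class MigrationConnection(Protocol):
    """The small part of a psycopg 3 Connection that this sketch relies on."""

    def execute(self, query: str, params: Any = None) -> Any: ...

    def commit(self) -> None: ...


# Hypothetical bookkeeping table; not part of 001_initial.sql.
TRACKING_TABLE = """
CREATE TABLE IF NOT EXISTS schema_migrations (
  name text PRIMARY KEY,
  applied_at timestamptz NOT NULL DEFAULT now()
)
"""


def pending_migrations(directory: Path, applied: set[str]) -> list[Path]:
    """Return *.sql files not yet applied, sorted so 001_, 002_, ... run in order."""
    return [path for path in sorted(directory.glob("*.sql")) if path.name not in applied]


def apply_migrations(conn: MigrationConnection, directory: Path) -> list[str]:
    """Apply pending migrations, committing each file together with its tracking row."""
    conn.execute(TRACKING_TABLE)
    conn.commit()
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations").fetchall()}
    newly_applied: list[str] = []
    for path in pending_migrations(directory, applied):
        # Passed without parameters, so a file containing several statements
        # can run in a single execute() call.
        conn.execute(path.read_text())
        conn.execute("INSERT INTO schema_migrations (name) VALUES (%s)", (path.name,))
        conn.commit()
        newly_applied.append(path.name)
    return newly_applied
```

Committing per file means a failure leaves earlier migrations recorded and the failing one uncommitted, so a rerun resumes at the file that broke.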
  • Step 2: Define typed storage input

Create pkgs/memory-service/src/dubnium_memory/models.py:

```python
from datetime import datetime
from typing import Literal
from uuid import UUID

from pydantic import BaseModel, Field


MemoryType = Literal["working", "episodic", "semantic"]
ValidationStatus = Literal["unverified", "verified", "rejected"]


class MemoryIn(BaseModel):
    id: UUID
    session_id: UUID | None = None
    memory_type: MemoryType
    summary: str = Field(min_length=1, max_length=8000)
    scope: str = Field(min_length=1, max_length=256)
    importance: float = Field(default=0.0, ge=0.0, le=1.0)
    confidence: float = Field(default=0.0, ge=0.0, le=1.0)
    sensitivity: str = Field(default="internal", max_length=64)
    validation_status: ValidationStatus = "unverified"
    ttl: datetime | None = None
    source: str = Field(min_length=1, max_length=128)
    provenance: dict
```

  • Step 3: Implement storage with parameterized SQL

Create pkgs/memory-service/src/dubnium_memory/storage.py:

```python
from psycopg import Connection
from psycopg.types.json import Jsonb

from dubnium_memory.models import MemoryIn


def store_memory(conn: Connection, memory: MemoryIn) -> None:
    params = memory.model_dump()
    # psycopg 3 does not adapt a plain dict; wrap it so it is sent as jsonb.
    params["provenance"] = Jsonb(memory.provenance)
    conn.execute(
        """
        INSERT INTO memories (
          id, session_id, memory_type, summary, scope, importance, confidence,
          sensitivity, validation_status, ttl, source, provenance
        )
        VALUES (
          %(id)s, %(session_id)s, %(memory_type)s, %(summary)s, %(scope)s,
          %(importance)s, %(confidence)s, %(sensitivity)s, %(validation_status)s,
          %(ttl)s, %(source)s, %(provenance)s
        )
        """,
        params,
    )
```

  • Step 4: Add a storage test

Create pkgs/memory-service/tests/test_storage.py:

```python
from uuid import uuid4

import pytest
from pydantic import ValidationError

from dubnium_memory.models import MemoryIn


def test_memory_requires_summary() -> None:
    payload = {
        "id": uuid4(),
        "memory_type": "episodic",
        "summary": "",
        "scope": "project:dubnium",
        "source": "conversation",
        "provenance": {"origin": "test"},
    }

    # An empty summary violates min_length=1, and the error names the field.
    with pytest.raises(ValidationError, match="summary"):
        MemoryIn(**payload)
```

Task 4: Add Redaction And Retrieval Filters

Files:

  • Create: pkgs/memory-service/src/dubnium_memory/redaction.py

  • Create: pkgs/memory-service/src/dubnium_memory/filters.py

  • Create: pkgs/memory-service/tests/test_redaction.py

  • Create: pkgs/memory-service/tests/test_filters.py

  • Step 1: Implement conservative redaction

Create pkgs/memory-service/src/dubnium_memory/redaction.py:

```python
import re


SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|secret|password)\s*[:=]\s*([^\s]+)"),
]


def redact_text(value: str) -> str:
    redacted = value
    for pattern in SECRET_PATTERNS:
        # Keep the key name so the redaction stays visible, drop the value.
        redacted = pattern.sub(r"\1=[REDACTED]", redacted)
    return redacted
```

  • Step 2: Test redaction

Create pkgs/memory-service/tests/test_redaction.py:

```python
from dubnium_memory.redaction import redact_text


def test_redacts_api_key_like_values() -> None:
    text = "OPENAI_API_KEY=sk-test-value"

    assert redact_text(text) == "OPENAI_API_KEY=[REDACTED]"
```

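Trust Boundaries require redaction before durable storage, but redact_text handles only one string, and a bare value such as {"password": "hunter2"} has no "password=" prefix for it to match. A sketch of walking nested payloads such as provenance follows; the helper name redact_payload and the SECRET_KEY heuristic are hypothetical, and the key check deliberately errs toward over-redaction (a key like token_count is also blanked).

```python
import re
from collections.abc import Callable
from typing import Any

# Hypothetical key-name heuristic, reusing the names from SECRET_PATTERNS.
SECRET_KEY = re.compile(r"(?i)(api[_-]?key|token|secret|password)")


def redact_payload(value: Any, redact: Callable[[str], str]) -> Any:
    """Return a redacted copy of a JSON-like value; the input is not modified."""
    if isinstance(value, str):
        return redact(value)
    if isinstance(value, dict):
        return {
            # Values under secret-looking keys are replaced outright.
            key: "[REDACTED]"
            if isinstance(key, str) and SECRET_KEY.search(key)
            else redact_payload(item, redact)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        # Tuples come back as lists, matching what JSON storage would produce.
        return [redact_payload(item, redact) for item in value]
    return value
```

Before calling store_memory, a caller could apply redact_text to summary and redact_payload(memory.provenance, redact_text) to provenance.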
  • Step 3: Implement retrieval filtering

Create pkgs/memory-service/src/dubnium_memory/filters.py:

```python
from datetime import datetime, timezone
from typing import TypedDict


class MemoryCandidate(TypedDict):
    id: str
    scope: str
    sensitivity: str
    validation_status: str
    ttl: datetime | None


def is_retrievable(
    memory: MemoryCandidate,
    *,
    scope: str,
    allowed_sensitivity: set[str],
    require_verified: bool,
) -> bool:
    if memory["scope"] != scope:
        return False
    if memory["sensitivity"] not in allowed_sensitivity:
        return False
    if require_verified and memory["validation_status"] != "verified":
        return False
    # TTLs must be timezone-aware; comparing a naive datetime raises TypeError.
    if memory["ttl"] is not None and memory["ttl"] <= datetime.now(timezone.utc):
        return False
    return True
```

  • Step 4: Test scope and TTL enforcement

Create pkgs/memory-service/tests/test_filters.py:

```python
from datetime import datetime, timedelta, timezone

from dubnium_memory.filters import is_retrievable


def test_rejects_cross_scope_memory() -> None:
    memory = {
        "id": "m1",
        "scope": "project:other",
        "sensitivity": "internal",
        "validation_status": "verified",
        "ttl": None,
    }

    assert not is_retrievable(
        memory,
        scope="project:dubnium",
        allowed_sensitivity={"internal"},
        require_verified=True,
    )


def test_rejects_expired_memory() -> None:
    memory = {
        "id": "m1",
        "scope": "project:dubnium",
        "sensitivity": "internal",
        "validation_status": "verified",
        "ttl": datetime.now(timezone.utc) - timedelta(days=1),
    }

    assert not is_retrievable(
        memory,
        scope="project:dubnium",
        allowed_sensitivity={"internal"},
        require_verified=True,
    )
```

Task 5: Add Retrieval API Boundary

Files:

  • Modify: pkgs/memory-service/src/dubnium_memory/api.py

  • Create: pkgs/memory-service/src/dubnium_memory/retrieval.py

  • Create: pkgs/memory-service/tests/test_retrieval.py

  • Step 1: Add request and response models

Add to models.py:

```python
class RetrieveRequest(BaseModel):
    query: str = Field(min_length=1, max_length=4000)
    scope: str = Field(min_length=1, max_length=256)
    allowed_sensitivity: list[str] = Field(default_factory=lambda: ["internal"])
    require_verified: bool = False
    limit: int = Field(default=8, ge=1, le=32)


class RetrievedMemory(BaseModel):
    id: UUID
    summary: str
    scope: str
    sensitivity: str
    validation_status: ValidationStatus
    provenance: dict
```

  • Step 2: Implement retrieval query contract

Create pkgs/memory-service/src/dubnium_memory/retrieval.py:

```python
from psycopg import Connection
from psycopg.rows import dict_row

from dubnium_memory.models import RetrieveRequest, RetrievedMemory


def retrieve_memories(conn: Connection, request: RetrieveRequest) -> list[RetrievedMemory]:
    # dict_row yields mappings keyed by column name; the default tuple rows
    # cannot be converted into the dicts that model_validate expects.
    with conn.cursor(row_factory=dict_row) as cur:
        cur.execute(
            """
            SELECT id, summary, scope, sensitivity, validation_status, provenance
            FROM memories
            WHERE scope = %(scope)s
              AND sensitivity = ANY(%(allowed_sensitivity)s)
              AND (%(require_verified)s = false OR validation_status = 'verified')
              AND (ttl IS NULL OR ttl > now())
            ORDER BY importance DESC, created_at DESC
            LIMIT %(limit)s
            """,
            request.model_dump(),
        )
        rows = cur.fetchall()
    return [RetrievedMemory.model_validate(row) for row in rows]
```

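Trust Boundaries require retrieval event logging for later replay, and the schema includes retrieval_events, but retrieve_memories does not write to it. One possible shape is sketched below, assuming the caller passes redact_text as redact and records the event in the same transaction as the read; record_retrieval_event and SqlConnection are hypothetical names.

```python
from collections.abc import Callable, Sequence
from typing import Any, Protocol
from uuid import UUID, uuid4


class SqlConnection(Protocol):
    """The part of a psycopg 3 Connection this sketch uses."""

    def execute(self, query: str, params: Any = None) -> Any: ...


def record_retrieval_event(
    conn: SqlConnection,
    *,
    scope: str,
    query: str,
    memory_ids: Sequence[UUID],
    artifact_ids: Sequence[UUID] = (),
    redact: Callable[[str], str],
) -> UUID:
    """Insert one retrieval_events row and return its id."""
    event_id = uuid4()
    conn.execute(
        """
        INSERT INTO retrieval_events (id, scope, query, returned_memory_ids, returned_artifact_ids)
        VALUES (%(id)s, %(scope)s, %(query)s, %(memory_ids)s, %(artifact_ids)s)
        """,
        {
            "id": event_id,
            "scope": scope,
            # The query is attacker-controlled input; redact it before it is persisted.
            "query": redact(query),
            "memory_ids": list(memory_ids),
            "artifact_ids": list(artifact_ids),
        },
    )
    return event_id
```

A caller could pass [m.id for m in results] from retrieve_memories as memory_ids so that a replay can reconstruct exactly what reached prompt assembly.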
  • Step 3: Add API endpoint

Add to api.py:

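```python
# Place these imports at the top of api.py.
from fastapi import HTTPException

from dubnium_memory.models import RetrieveRequest, RetrievedMemory


@app.post("/memory/retrieve")
def retrieve(request: RetrieveRequest) -> list[RetrievedMemory]:
    # Answer with an explicit 501 rather than an unhandled NotImplementedError,
    # which FastAPI would turn into a 500, until the database dependency is wired.
    raise HTTPException(status_code=501, detail="memory retrieval is not wired to the database yet")
```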

Keep this endpoint local-only until the database dependency is wired. Do not expose it on the network in Phase 1.

Task 6: Add NixOS Workload Module

Files:

  • Create: modules/workloads/memory.nix

  • Modify: hosts/workstation/default.nix

  • Step 1: Create the workload module

Create modules/workloads/memory.nix:

```nix
{ lib, config, pkgs, ... }:
let
  cfg = config.dubnium.memory;
  memoryPackage = pkgs.callPackage ../../pkgs/memory-service { };
in
{
  config = lib.mkIf cfg.enable {
    services.postgresql = {
      enable = true;
      extensions = ps: [ ps.pgvector ];
      ensureDatabases = [ cfg.database.name ];
      ensureUsers = [
        {
          # ensureDBOwnership requires this role name to match a database
          # listed in ensureDatabases.
          name = cfg.database.user;
          ensureDBOwnership = true;
        }
      ];
    };

    # Run the API as an OS user named after its Postgres role so local peer
    # authentication over /run/postgresql maps the process to that role.
    users.users.${cfg.database.user} = {
      isSystemUser = true;
      group = cfg.database.user;
    };
    users.groups.${cfg.database.user} = { };

    services.redis.servers.dubnium-memory = lib.mkIf cfg.redis.enable {
      enable = true;
      bind = "127.0.0.1";
      port = 6379;
    };

    systemd.services.dubnium-memory-api = {
      description = "Dubnium persistent memory API";
      wantedBy = [ "multi-user.target" ];
      after = [ "postgresql.service" ];
      requires = [ "postgresql.service" ];
      environment = {
        DUBNIUM_MEMORY_HOST = cfg.api.host;
        DUBNIUM_MEMORY_PORT = toString cfg.api.port;
        DATABASE_URL = "postgresql:///${cfg.database.name}?host=/run/postgresql";
      };
      serviceConfig = {
        Type = "simple";
        User = cfg.database.user;
        Group = cfg.database.user;
        ExecStart = "${memoryPackage}/bin/dubnium-memory-api";
        Restart = "always";
        RestartSec = "5s";
        NoNewPrivileges = true;
        PrivateTmp = true;
        ProtectHome = true;
        Slice = "platform.slice";
      };
    };
  };
}
```

  • Step 2: Import the module without enabling it

Modify hosts/workstation/default.nix imports:

```nix
../../modules/workloads/memory.nix
```

Do not set dubnium.memory.enable = true until package build and module eval pass.

  • Step 3: Verify disabled module eval

Run:

```bash
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.systemd.services --apply 'services: services ? dubnium-memory-api'
```

Expected: false. While dubnium.memory.enable = false the service is not defined at all, so evaluating systemd.services.dubnium-memory-api.enable directly fails with a missing-attribute error instead of printing false.

Task 7: Enable Prototype Locally

Files:

  • Modify: hosts/workstation/default.nix

  • Step 1: Enable the memory workload

Add under dubnium:

```nix
memory = {
  enable = true;
  api.host = "127.0.0.1";
  api.port = 8090;
};
```

  • Step 2: Verify generated service contracts

Run:

```bash
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.services.postgresql.enable
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.services.redis.servers.dubnium-memory.enable
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.systemd.services.dubnium-memory-api.environment.DUBNIUM_MEMORY_HOST
```

Expected:

```text
true
true
"127.0.0.1"
```

Task 8: Add Operator Runbook

Files:

  • Create: docs/runbooks/memory-service.md

  • Modify: docs/README.md

  • Modify: docs/SUMMARY.md

  • Step 1: Create the runbook

Create docs/runbooks/memory-service.md with:

````markdown
# Runbook: Memory Service

Status: prototype

Use this after `dubnium.memory.enable = true`.

## Verify Services

```bash
systemctl status postgresql
systemctl status redis-dubnium-memory
systemctl status dubnium-memory-api
curl http://127.0.0.1:8090/healthz
```

Expected:

```json
{"status":"ok"}
```

## Security Checks

- the API binds to 127.0.0.1
- memories include scope, sensitivity, validation status, and provenance
- expired memories are not returned
- sensitive memories are not returned unless explicitly allowed
- retrieval events are logged with memory ids and artifact references
- logs do not contain raw token-like values
````

  • Step 2: Link the runbook

Add Memory Service to the Runbooks lists in docs/README.md and docs/SUMMARY.md.

  • Step 3: Build docs

Run:

```bash
mdbook build
```

Expected: docs build succeeds. Generated web/docs changes may be reverted if the review scope is source docs only.

Final Verification

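Before committing the Phase 1 implementation, run:

```bash
git diff --check
nix --extra-experimental-features "nix-command flakes" build .#memory-service
nix --extra-experimental-features "nix-command flakes" eval .#nixosConfigurations.workstation.config.dubnium.memory.enable
pytest pkgs/memory-service/tests
mdbook build
```

Run pytest from a shell that provides pytest and the service dependencies (fastapi, pydantic, psycopg); on NixOS they are not on the host Python path by default.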

If a full workstation build still fails on the known placeholder hardware configuration, report that separately from targeted memory-module evaluation.

Follow-Up: MemGPT-Style Agent Upgrade

After Phase 1 is stable, create a separate ADR or spike plan for evaluating Letta, the maintained framework that grew out of MemGPT. That spike should start read-only against existing memory rows, and should test controlled agent-managed memory writes only after external governance hooks and replay evidence are in place.

Follow-Up: Artifact And OCI Architecture

After Phase 1 is stable, create a separate implementation plan for artifact handling. That work should start with filesystem content-addressed storage and metadata extraction, then evaluate MinIO and OCI-style exported cognition artifacts only after memory rows, retrieval events, and artifact references have stable ids.