Memory Data Model Specification

Status: draft

This document is the canonical data model and requirements specification for the Dubnium memory service prototype. It reconciles the architecture direction, the API/domain models, and the current Postgres migrations.

Implementation references:

pkgs/memory-service/src/dubnium_memory/models.py
pkgs/memory-service/src/dubnium_memory/embeddings.py
pkgs/memory-service/src/dubnium_memory/migrations/001_initial.sql
pkgs/memory-service/src/dubnium_memory/migrations/002_pgvector_embeddings.sql

Goals

The data model must support:

durable episodic, semantic, and working memory records
scoped retrieval for projects, sessions, and agents
externalized artifacts and evidence references
retrieval event capture for audit and replay
metadata needed by a future external governance layer
local Postgres and pgvector evolution without coupling to vLLM internals

Non-Goals

The data model does not define:

transformer KV-cache persistence
prompt assembly format
future governance authority behavior
autonomous memory mutation rules
object storage implementation details
a Letta or MemGPT internal schema

Trust Boundary

All stored content is untrusted when it enters the system and when it is retrieved later. This includes user input, agent output, model-generated summaries, tool output, artifact-derived text, and database rows.

Boundary requirements:

validate API payloads before constructing domain objects
redact secret-like values before persistence
use parameterized SQL for all request-derived values
keep secrets out of logs and durable memory summaries
store enough provenance, validation, sensitivity, scope, and TTL metadata for external policy systems to inspect later
return retrieval candidates and identifiers, not assembled prompts
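
The redaction requirement above could be sketched roughly as follows. This is a minimal illustration, not the prototype's actual rule set: the `redact_secrets` name, the `REDACTED` marker, and both patterns are assumptions, and a real deployment would need a broader, reviewed pattern list.

```python
import re

# Hypothetical patterns for illustration only; not the prototype's real rules.
_SECRET_PATTERNS = [
    # key=value or key: value pairs whose key name looks secret-like
    re.compile(r"(?i)\b(password|passwd|secret|token|api[_-]?key)\b(\s*[:=]\s*)(\S+)"),
    # Bearer tokens pasted from HTTP headers
    re.compile(r"(?i)\b(bearer)(\s+)([A-Za-z0-9._~+/-]+=*)"),
]

REDACTED = "[REDACTED]"  # assumed marker text


def redact_secrets(text: str) -> str:
    """Replace the value part of each secret-like match, keeping its label."""
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub(lambda m: f"{m.group(1)}{m.group(2)}{REDACTED}", text)
    return text
```

Applying such a function before both persistence and logging would keep the two requirements consistent, since the same text often reaches both paths.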

Domain Objects

Memory

Memory is a normalized working, episodic, or semantic record. It is not raw transcript storage and should not contain binary artifact data.

Required fields:

Field	Type	Requirement
`id`	UUID	Stable identifier generated before persistence
`memory_type`	enum	One of `working`, `episodic`, `semantic`
`summary`	string	Non-empty, max 8000 chars, redacted before persistence
`scope`	string	Non-empty, max 256 chars
`source`	string	Non-empty source label, max 128 chars
`provenance`	object	JSON object, empty object allowed

Optional or defaulted fields:

Field	Type	Default	Requirement
`session_id`	UUID or null	null	References `sessions.id` when durable
`importance`	float	0.0	Range 0.0 to 1.0
`confidence`	float	0.0	Range 0.0 to 1.0
`sensitivity`	string	`internal`	Non-empty, max 64 chars
`validation_status`	enum	`unverified`	One of `unverified`, `verified`, `rejected`
`ttl`	timestamp or null	null	Expired records are excluded from retrieval and eligible for removal
`artifact_refs`	list	empty	Each artifact scope must match memory scope

Durable table: memories.

Current gap: artifact refs are represented in domain/API objects but are not yet persisted as a relationship table.
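
The field requirements for Memory can be expressed as a validated domain object. The sketch below is an assumption about shape only; it is not the contents of `models.py`. The helper names are hypothetical, `artifact_refs` is omitted for brevity, and the summary is assumed to have been redacted before construction.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Dict, Optional
from uuid import UUID, uuid4


class MemoryType(str, Enum):
    WORKING = "working"
    EPISODIC = "episodic"
    SEMANTIC = "semantic"


class ValidationStatus(str, Enum):
    UNVERIFIED = "unverified"
    VERIFIED = "verified"
    REJECTED = "rejected"


def _require_text(name: str, value: str, max_len: int) -> None:
    # Hypothetical helper: enforces "non-empty, max N chars".
    if not isinstance(value, str) or not value or len(value) > max_len:
        raise ValueError(f"{name} must be a non-empty string of at most {max_len} chars")


def _require_score(name: str, value: float) -> None:
    # Hypothetical helper: enforces the 0.0 to 1.0 range.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0.0 and 1.0")


@dataclass(frozen=True)
class Memory:
    memory_type: MemoryType
    summary: str  # assumed to be redacted already
    scope: str
    source: str
    provenance: Dict[str, Any] = field(default_factory=dict)
    id: UUID = field(default_factory=uuid4)  # generated before persistence
    session_id: Optional[UUID] = None
    importance: float = 0.0
    confidence: float = 0.0
    sensitivity: str = "internal"
    validation_status: ValidationStatus = ValidationStatus.UNVERIFIED
    ttl: Optional[datetime] = None

    def __post_init__(self) -> None:
        if not isinstance(self.memory_type, MemoryType):
            raise ValueError("memory_type must be a MemoryType")
        if not isinstance(self.provenance, dict):
            raise ValueError("provenance must be a JSON object")
        _require_text("summary", self.summary, 8000)
        _require_text("scope", self.scope, 256)
        _require_text("source", self.source, 128)
        _require_text("sensitivity", self.sensitivity, 64)
        _require_score("importance", self.importance)
        _require_score("confidence", self.confidence)
```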

Retrieved Memory

Retrieved memory is a context candidate returned by retrieval. It must contain only the fields needed by callers to decide whether and how to assemble context.

Fields:

id
summary
scope
sensitivity
validation_status
provenance
artifact_refs

Retrieval responses must not construct prompts. Prompt assembly remains outside the memory service.

Retrieve Request

Retrieve requests define caller intent and visibility constraints.

Fields:

Field	Type	Default	Requirement
`query`	string	none	Non-empty, max 4000 chars
`scope`	string	none	Non-empty, max 256 chars
`allowed_sensitivity`	string list	`["internal"]`	Must not be empty
`require_verified`	bool	false	Filters to verified memories when true
`limit`	int	8	Range 1 to 32
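
As an illustration of the defaults and bounds in the table above, a request object might validate itself on construction. The class layout and the `from_payload` helper are assumptions, not the prototype's API; a missing required key surfaces here as a `KeyError`, which a real boundary would translate into a JSON error.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class RetrieveRequest:
    query: str
    scope: str
    allowed_sensitivity: Tuple[str, ...] = ("internal",)
    require_verified: bool = False
    limit: int = 8

    def __post_init__(self) -> None:
        if not isinstance(self.query, str) or not self.query or len(self.query) > 4000:
            raise ValueError("query must be non-empty and at most 4000 chars")
        if not isinstance(self.scope, str) or not self.scope or len(self.scope) > 256:
            raise ValueError("scope must be non-empty and at most 256 chars")
        if not self.allowed_sensitivity:
            raise ValueError("allowed_sensitivity must not be empty")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.limit, bool) or not isinstance(self.limit, int):
            raise ValueError("limit must be an integer")
        if not 1 <= self.limit <= 32:
            raise ValueError("limit must be between 1 and 32")

    @classmethod
    def from_payload(cls, payload: Dict[str, Any]) -> "RetrieveRequest":
        """Build a request from an already-decoded JSON object (hypothetical helper)."""
        return cls(
            query=payload["query"],
            scope=payload["scope"],
            allowed_sensitivity=tuple(payload.get("allowed_sensitivity", ("internal",))),
            require_verified=payload.get("require_verified", False),
            limit=payload.get("limit", 8),
        )
```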

Retrieval Event

Retrieval events record what was available to a caller at retrieval time.

Fields:

Field	Type	Requirement
`id`	UUID	Generated for each retrieval
`scope`	string	Request scope
`query`	string	Request query
`returned_memory_ids`	UUID list	Ordered returned memory ids
`returned_artifact_ids`	UUID list	Artifact ids referenced by returned memories
`created_at`	timestamp	Durable database timestamp

Durable table: retrieval_events.

Replay requirements:

preserve returned memory ids
preserve returned artifact ids
preserve query and scope
preserve timestamp
later replay surfaces should reconstruct candidate availability from these identifiers and persisted records
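
One way to read the replay requirements: a replay surface resolves the stored identifiers against persisted records, keeps the original order, and reports anything that no longer exists. The sketch below is an assumption about how that might look. `replay_candidates` is a hypothetical helper, and the client-side `created_at` default only stands in for the durable database timestamp.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple
from uuid import UUID, uuid4


@dataclass(frozen=True)
class RetrievalEvent:
    scope: str
    query: str
    returned_memory_ids: Tuple[UUID, ...]  # order as returned to the caller
    returned_artifact_ids: Tuple[UUID, ...]
    id: UUID = field(default_factory=uuid4)
    # Stand-in only: the spec assigns this timestamp in the database.
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def replay_candidates(
    event: RetrievalEvent, memories_by_id: Dict[UUID, Any]
) -> Tuple[List[Any], List[UUID]]:
    """Return (records still present, ids now missing), both in original order."""
    found = [memories_by_id[m] for m in event.returned_memory_ids if m in memories_by_id]
    missing = [m for m in event.returned_memory_ids if m not in memories_by_id]
    return found, missing
```

Reporting missing identifiers separately lets an auditor distinguish "was never returned" from "was returned but has since expired or been removed".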

Artifact Reference

Artifact refs are lightweight pointers from memory records to external evidence. They do not embed raw binary content.

Fields:

Field	Type	Requirement
`id`	UUID	Artifact identifier
`scope`	string	Must match containing memory scope
`sha256`	string	Content hash
`storage_uri`	string	External storage pointer
`artifact_type`	string	Type such as `image`, `document`, `log`

Durable table: artifacts.

Current gap: memory-to-artifact relationship persistence is not implemented.
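
The scope-matching rule for artifact refs can be checked before a memory is accepted. A minimal sketch follows, with assumptions labeled: `check_artifact_refs` is a hypothetical helper, and the 64-character lowercase hex format for `sha256` is an assumption, since the table only says "content hash".

```python
import re
from dataclasses import dataclass
from typing import Iterable
from uuid import UUID, uuid4

_SHA256_HEX = re.compile(r"[0-9a-f]{64}")  # assumed encoding of the content hash


@dataclass(frozen=True)
class ArtifactRef:
    id: UUID
    scope: str
    sha256: str
    storage_uri: str
    artifact_type: str


def check_artifact_refs(memory_scope: str, refs: Iterable[ArtifactRef]) -> None:
    """Raise ValueError if any ref crosses scopes or carries a malformed hash."""
    for ref in refs:
        if ref.scope != memory_scope:
            raise ValueError(f"artifact {ref.id} scope does not match memory scope")
        if not _SHA256_HEX.fullmatch(ref.sha256):
            raise ValueError(f"artifact {ref.id} has a malformed sha256")
```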

Embedding

Embeddings are model-specific vector representations. They are separate from memory records so memory facts remain portable across embedding model changes.

Fields:

Field	Type	Requirement
`model`	string	Non-empty, max 128 chars
`dimensions`	int	Positive
`vector`	float list	Length must match `dimensions`

Current durable table: memory_embeddings.

Current durable fields:

memory_id
embedding_model
embedding_ref
embedding
embedding_dimensions
created_at

Current implementation can persist embedding references and pgvector values for a memory. The application service can embed stored summaries when configured with an embedder and an embedding-capable store. The Postgres store can query vectors behind the storage boundary.
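
To illustrate why embeddings live apart from memory records, the sketch below keys vectors by memory id and model name, so a new embedding model adds rows without touching the memory itself. `EmbeddingIndex` is an in-memory stand-in, and the `(memory_id, model)` key is an assumption; the actual key of `memory_embeddings` is not stated in this document.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Embedding:
    model: str
    dimensions: int
    vector: Tuple[float, ...]

    def __post_init__(self) -> None:
        if not self.model or len(self.model) > 128:
            raise ValueError("model must be non-empty and at most 128 chars")
        if self.dimensions <= 0:
            raise ValueError("dimensions must be positive")
        if len(self.vector) != self.dimensions:
            raise ValueError("vector length must match dimensions")


class EmbeddingIndex:
    """In-memory stand-in for the embeddings table (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[UUID, str], Embedding] = {}

    def put(self, memory_id: UUID, embedding: Embedding) -> None:
        # Assumed key: one vector per memory per model.
        self._rows[(memory_id, embedding.model)] = embedding

    def get(self, memory_id: UUID, model: str) -> Optional[Embedding]:
        return self._rows.get((memory_id, model))
```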

Session

Sessions group conversational or agentic work under a scope.

Durable table: sessions.

Fields:

id
scope
created_at

Current gap: session creation and lookup APIs are not implemented.

Task State

Task state is active execution state, not memory. It should remain structured and queryable instead of being embedded in vector stores.

Durable table: tasks.

Fields:

id
scope
status
state
created_at
updated_at

Current gap: task-state domain objects and APIs are not implemented.

Provenance

Provenance records attach lineage to one memory, artifact, or retrieval event.

Durable table: provenance.

Fields:

id
memory_id
artifact_id
retrieval_event_id
source_identity
source_event
created_at

Constraint: exactly one of memory_id, artifact_id, or retrieval_event_id must be set.

Current gap: provenance has initial schema support but no write path beyond memory-local JSON metadata.
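
When a provenance write path is added, the exactly-one constraint could also be checked in application code before the row reaches the database. This is a sketch with a hypothetical function name; it does not describe how the migration enforces the constraint.

```python
from typing import Optional
from uuid import UUID, uuid4


def provenance_target(
    memory_id: Optional[UUID] = None,
    artifact_id: Optional[UUID] = None,
    retrieval_event_id: Optional[UUID] = None,
) -> str:
    """Return the name of the single populated target column, or raise ValueError."""
    targets = {
        "memory_id": memory_id,
        "artifact_id": artifact_id,
        "retrieval_event_id": retrieval_event_id,
    }
    populated = [name for name, value in targets.items() if value is not None]
    if len(populated) != 1:
        raise ValueError(
            "exactly one of memory_id, artifact_id, retrieval_event_id must be set"
        )
    return populated[0]
```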

Durable Tables

Table	Purpose	Status
`sessions`	Session metadata	Schema only
`memories`	Normalized memory records	Implemented for store/retrieve/expire
`memory_embeddings`	Embedding references and vectors	Implemented for persistence
`tasks`	Active workflow state	Schema only
`artifacts`	Externalized artifact metadata	Schema only
`retrieval_events`	Retrieval audit/replay records	Implemented for retrieval event persistence
`provenance`	Lineage records	Schema only

API Requirements

The API boundary must:

reject non-JSON write requests
reject oversized payloads
validate UUIDs, timestamps, enum values, scores, and bounds
redact secret-like values before storing memory summaries
return JSON errors without stack traces
expose retrieval events for local replay/audit inspection
keep durable storage implementation behind the application service contract
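
The first few boundary checks can be sketched as a framework-independent function that returns a status code and a JSON-serializable body. Everything named here is an assumption: the `parse_write_request` helper, the 64 KiB limit, the error codes, and the choice of HTTP statuses 415, 413, and 400.

```python
import json
from typing import Any, Dict, Tuple

MAX_BODY_BYTES = 64 * 1024  # hypothetical limit; the spec does not fix a size


def parse_write_request(content_type: str, body: bytes) -> Tuple[int, Dict[str, Any]]:
    """Return (status, payload) on success or (status, error body) on rejection."""
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/json":
        return 415, {"error": "unsupported_media_type"}
    if len(body) > MAX_BODY_BYTES:
        return 413, {"error": "payload_too_large"}
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes; no traceback is exposed.
        return 400, {"error": "invalid_json"}
    if not isinstance(payload, dict):
        return 400, {"error": "expected_json_object"}
    return 200, payload
```

Returning fixed error codes instead of exception text keeps stack traces and internal details out of responses.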

Retrieval Requirements

Retrieval must filter by:

scope
allowed sensitivity
validation status when require_verified is true
TTL expiration

Retrieval should rank by:

lexical or vector relevance
importance
confidence
recency

Current implementation supports scope, sensitivity, verification, TTL, lexical matching, vector relevance in the Postgres store, importance, and confidence. Recency ranking is future work.
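
The filter-then-rank flow can be sketched in memory as below. This is illustrative only and does not reflect the Postgres store's query: the `Candidate` type, exact-match scope comparison, the word-overlap lexical score, and the 0.6/0.25/0.15 weights are all assumptions. Recency is left out, matching its future-work status.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    id: str
    summary: str
    scope: str
    sensitivity: str
    validation_status: str
    importance: float
    confidence: float
    ttl: Optional[datetime] = None


def lexical_score(query: str, summary: str) -> float:
    """Fraction of query words that appear in the summary (toy relevance measure)."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(summary.lower().split())) / len(query_words)


def retrieve(
    candidates: Sequence[Candidate],
    *,
    query: str,
    scope: str,
    allowed_sensitivity: Sequence[str] = ("internal",),
    require_verified: bool = False,
    limit: int = 8,
    now: Optional[datetime] = None,
) -> List[Candidate]:
    now = now or datetime.now(timezone.utc)
    # Filters decide visibility; they are applied before any ranking.
    visible = [
        c for c in candidates
        if c.scope == scope
        and c.sensitivity in allowed_sensitivity
        and (not require_verified or c.validation_status == "verified")
        and (c.ttl is None or c.ttl > now)
    ]

    def score(c: Candidate) -> float:
        # Hypothetical weights, chosen only for illustration.
        return 0.6 * lexical_score(query, c.summary) + 0.25 * c.importance + 0.15 * c.confidence

    return sorted(visible, key=score, reverse=True)[:limit]
```

Keeping the filters separate from the score means a high-relevance record can never bypass scope, sensitivity, verification, or TTL constraints.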

Evolution Requirements

Future changes should preserve:

vLLM runtime statelessness
memory/runtime separation from governance authority
external artifact references instead of binary prompt memory
replayable retrieval events
replaceable embedding providers
MemGPT/Letta integration above Dubnium memory APIs, not as source of truth

Autonomous memory writes must not be added until durable storage, redaction, retrieval filters, provenance, expiration, and replay evidence have passed local validation.
