文件预览

SKILL.md

查看 VM Memory Oracle 技能包中的文件内容。

文件内容

SKILL.md

---
name: vm-memory-oracle
description: >
  Production-grade memory persistence and lifecycle management for VM-hosted
  OpenClaw agents. Implements structured 4-layer memory (knowledge graph,
  semantic index, daily summaries, canonical MEMORY.md), activation/decay
  scoring, nightly consolidation, disk-health monitoring, and self-healing
  maintenance. Fully local — zero network calls, zero cloud dependencies.
version: 2.0.0
author: ssharif
license: MIT-0
tags:
  - memory
  - persistence
  - vm
  - infrastructure
  - knowledge-graph
  - lifecycle
  - maintenance
metadata:
  openclaw:
    requires:
      bins:
        - jq
        - cron
    primaryEnv: null
  compatibility:
    openclaw: ">=1.8.0"
    platforms:
      - linux
  permissions:
    - filesystem
  category: memory-management
  quality-signals:
    no-network: true
    no-credentials: true
    no-sudo: true
    local-only: true
---

# VM Memory Oracle

Production-grade memory persistence and lifecycle management for VM-hosted OpenClaw agents.

You are a memory management specialist. Your job is to maintain a structured, persistent memory system that survives reboots, context compaction, and VM redeployment. You operate entirely on local files — you never make network requests, access credentials, or require elevated permissions.

## Memory Architecture

You manage a 4-layer memory system stored under the agent's data directory (default: `/data/memory/`). Each layer serves a distinct purpose:

```
Layer 0 — Knowledge Graph    : Durable facts, relationships, entities
Layer 1 — Semantic Index     : Embedding vectors for similarity search
Layer 2 — Daily Summaries    : Per-day session digests
Layer 3 — Canonical Memory   : MEMORY.md — the single source of truth
```

### Directory Layout

```
/data/memory/
  knowledge-graph/
    facts.jsonl              # One JSON object per line: {id, subject, predicate, object, source, created, activation}
    entities.jsonl           # Unique entities extracted from facts
    relations.jsonl          # Relationship types and counts
  embeddings/
    index.bin                # FAISS or ONNX-exported vector index
    metadata.jsonl           # Maps vector IDs to fact IDs
  daily/
    YYYY-MM-DD.md            # Daily session summary
  sessions/
    YYYY-MM-DD-HHMMSS.jsonl  # Raw session logs (pre-summarization)
  activation-metadata.json   # Activation scores and last-access timestamps
  MEMORY.md                  # Canonical long-term memory
  health.json                # Latest health check results
```

## Core Operations

### 1. Fact Ingestion

When the agent learns something new during a session, store it as a structured fact:

```json
{
  "id": "fact-<uuid>",
  "subject": "deployment project",
  "predicate": "started_on",
  "object": "2026-05-15",
  "source": "user-stated",
  "created": "2026-05-15T14:30:00Z",
  "activation": 1.0
}
```

Append to `knowledge-graph/facts.jsonl`. Update `entities.jsonl` and `relations.jsonl` if new entities or relation types appear.

Rules:
- Deduplicate before appending. If a fact with the same subject+predicate+object exists, update its activation score instead of adding a duplicate.
- Never overwrite the file. Always append or update in place.
- Validate JSON before writing. Malformed lines corrupt the graph.

### 2. Activation and Decay

Every fact has an activation score between 0.0 and 1.0. This controls recall priority.

**Decay formula** (applied nightly):
```
new_activation = current_activation * (0.5 ^ (days_since_last_access / half_life))
```

**Default parameters:**
- `half_life`: 30 days
- `recall_boost`: 0.3 (added on each recall, capped at 1.0)
- `search_threshold`: 0.15 (facts below this are excluded from search results)
- `prune_threshold`: 0.05 (facts below this are eligible for archival)
- `max_facts`: 10000 (hard cap; lowest-activation facts archived first)

**On every recall:** When a fact is used to answer a query, increase its activation:
```
activation = min(1.0, activation + recall_boost)
last_accessed = now()
```

Update `activation-metadata.json` after every recall or decay pass.

### 3. Daily Summarization

At the end of each day (or when triggered manually), produce a daily summary:

1. Read all session files from `sessions/` for the current date.
2. Extract key facts, decisions, preferences, and action items.
3. Write a structured summary to `daily/YYYY-MM-DD.md` with sections:
   - **Facts Learned** — new information stated by the user or discovered
   - **Decisions Made** — choices, approvals, rejections
   - **Preferences Noted** — how the user likes things done
   - **Action Items** — pending tasks or follow-ups
4. For each fact in the summary, ensure it exists in the knowledge graph.

### 4. Nightly Consolidation

Run the full maintenance pipeline in sequence:

**Step 1 — Summarize** (if not already done):
Generate today's daily summary from session logs.

**Step 2 — Decay**:
Apply the decay formula to all facts in `activation-metadata.json`.

**Step 3 — Index**:
Rebuild the embedding index from all facts above `search_threshold`.

**Step 4 — Prune**:
Archive facts below `prune_threshold` to `knowledge-graph/archived-facts.jsonl`.
Remove them from the active `facts.jsonl` and the embedding index.

**Step 5 — Reconcile MEMORY.md**:
Read all facts with activation > 0.5. Compare against current MEMORY.md content.
Add any missing high-activation facts. Remove any entries whose underlying facts have decayed below 0.15.
Keep MEMORY.md under 200 lines.

**Step 6 — Clean sessions**:
Delete session files older than 30 days.
Delete daily summaries older than 365 days.

**Step 7 — Health check**:
Write results to `health.json` (see Monitoring section).

### 5. Recall and Search

When the agent needs to remember something:

1. **Exact match**: Search `facts.jsonl` for matching subject/predicate/object.
2. **Semantic search**: Query the embedding index for the top-K most similar facts (K=10).
3. **Activation filter**: Exclude results below `search_threshold` (0.15).
4. **Boost accessed facts**: Update activation scores for all returned facts.
5. **Return**: Merge and deduplicate results, sorted by activation score descending.

Always prefer facts from the knowledge graph over raw daily files. MEMORY.md is a summary — the graph is the source of truth.

## Monitoring and Health

### Health Check Output

Write to `health.json` after every consolidation run:

```json
{
  "timestamp": "2026-05-15T00:45:00Z",
  "status": "healthy",
  "disk_usage_bytes": 2147483648,
  "disk_usage_percent": 3.3,
  "total_facts": 2847,
  "active_facts": 2103,
  "archived_facts": 744,
  "avg_activation": 0.42,
  "daily_files_count": 128,
  "session_files_count": 45,
  "embedding_index_size_bytes": 52428800,
  "memory_md_lines": 87,
  "last_consolidation": "2026-05-15T00:30:00Z",
  "consolidation_duration_seconds": 142,
  "warnings": []
}
```

### Warning Conditions

Flag these in `health.json` warnings array:
- `disk_usage_percent > 80` — "Disk usage high"
- `total_facts > 9000` — "Approaching fact limit"
- `avg_activation < 0.2` — "Most facts are decaying; consider lowering half_life"
- `avg_activation > 0.8` — "Facts not decaying enough; consider raising half_life"
- `memory_md_lines > 180` — "MEMORY.md approaching 200-line limit"
- `consolidation_duration_seconds > 600` — "Consolidation taking too long"

### Quality Probe

Maintain a set of canary facts in `knowledge-graph/canary-facts.json`:

```json
[
  {
    "query": "When did the fleet deployment project start?",
    "expected_contains": "May 15, 2026"
  }
]
```

Periodically (weekly), run each canary query through the recall pipeline. Log the pass/fail ratio. If accuracy drops below 70%, add a warning to `health.json`.

## Cron Schedule (for VM deployments)

Set up these cron jobs for automated lifecycle management:

```
# Daily summarization at 23:00
0 23 * * * openclaw skill run vm-memory-oracle --action summarize

# Full consolidation at 00:30
30 0 * * * openclaw skill run vm-memory-oracle --action consolidate

# Health check every 6 hours
0 */6 * * * openclaw skill run vm-memory-oracle --action health-check

# Quality probe every Sunday at 03:00
0 3 * * 0 openclaw skill run vm-memory-oracle --action quality-probe
```

## Configuration

Override defaults by setting values in the agent's configuration:

```yaml
memory_oracle:
  data_path: /data/memory
  half_life_days: 30
  recall_boost: 0.3
  search_threshold: 0.15
  prune_threshold: 0.05
  max_facts: 10000
  session_retention_days: 30
  daily_retention_days: 365
  memory_md_max_lines: 200
  canary_check_interval: weekly
  embedding_model: multilingual-e5-large
  embedding_device: cpu
```

## Safety Guarantees

- **Local-only**: This skill never makes network requests. All data stays on the local filesystem.
- **No credentials**: This skill never reads, writes, or transmits API keys, tokens, passwords, or any authentication material.
- **No elevation**: This skill never uses sudo, su, or any privilege escalation.
- **Append-only writes**: Facts are appended, never bulk-overwritten. Archival moves facts to a separate file rather than deleting them.
- **Idempotent**: Running any operation twice produces the same result. Safe to retry after failures.
- **Transparent**: All operations write human-readable files (JSONL, Markdown, JSON). No binary blobs except the embedding index, which is rebuildable from source facts.