# VM Memory Oracle

Production-grade memory persistence and lifecycle management for VM-hosted OpenClaw agents.

## What It Does

VM Memory Oracle gives your OpenClaw agent a structured, persistent memory that survives reboots, context compaction, and VM redeployment. It replaces fragile flat-file memory with a 4-layer architecture:

| Layer | Purpose | Storage |
|---|---|---|
| Knowledge Graph | Durable facts, entities, relationships | `facts.jsonl`, `entities.jsonl` |
| Semantic Index | Vector similarity search | `index.bin` (rebuildable) |
| Daily Summaries | Per-day session digests | `daily/YYYY-MM-DD.md` |
| Canonical Memory | Single source of truth | `MEMORY.md` |

Every fact has an activation score that decays over time. Frequently recalled facts stay prominent; unused facts gradually fade and get archived. This keeps your agent's memory focused and relevant.
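The decay-and-boost behavior can be sketched in a few lines. This is an illustration only, not the skill's actual code: it assumes plain exponential decay with the documented half-life, an additive recall boost, and a cap at 1.0 (the cap and the function names are assumptions).

```python
HALF_LIFE_DAYS = 30.0  # documented default for half_life_days
RECALL_BOOST = 0.3     # documented default for recall_boost


def decayed(activation: float, days_elapsed: float,
            half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay: the score halves every half_life_days."""
    return activation * 0.5 ** (days_elapsed / half_life_days)


def recalled(activation: float, boost: float = RECALL_BOOST) -> float:
    """Additive boost on recall. Capping at 1.0 is an assumption."""
    return min(1.0, activation + boost)


print(decayed(1.0, 30))  # 0.5
print(decayed(1.0, 60))  # 0.25
```

Under these assumptions, a fact that is recalled every few weeks never drifts far from the top of the range, while an untouched fact loses half its score each month.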

## Key Features

- **Activation/Decay System** — Facts decay with a configurable half-life (default: 30 days). Recalled facts get boosted. Stale facts get archived, not deleted.
- **Nightly Consolidation** — Automated pipeline: summarize sessions, apply decay, rebuild index, prune stale facts, reconcile MEMORY.md, clean old files.
- **Health Monitoring** — Disk usage, fact count, activation distribution, consolidation duration, and quality probes all tracked in `health.json`.
- **Quality Probes** — Canary facts tested weekly to detect memory recall degradation.
- **Fully Local** — Zero network calls. Zero cloud dependencies. Zero credential handling. All data stays on your VM's filesystem.

## Requirements

- OpenClaw >= 1.8.0
- Linux (designed for VM deployments)
- `jq` (JSON processing)
- `cron` (scheduled tasks)

## Installation

```bash
openclaw skill install vm-memory-oracle
```

Or install from ClawHub:

```bash
clawhub install ssharif/vm-memory-oracle
```

## Quick Start

### 1. Configure your data path

```yaml
# In your OpenClaw agent config
memory_oracle:
  data_path: /data/memory
```

### 2. Initialize the directory structure

```bash
openclaw skill run vm-memory-oracle --action init
```

This creates:
```
/data/memory/
  knowledge-graph/
  embeddings/
  daily/
  sessions/
  activation-metadata.json
  MEMORY.md
  health.json
```

### 3. Set up automated maintenance

```bash
openclaw skill run vm-memory-oracle --action install-cron
```

This registers four cron jobs:
- **23:00** — Daily session summarization
- **00:30** — Full consolidation (decay, index, prune, reconcile)
- **Every 6h** — Health check
- **Sunday 03:00** — Quality probe
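For reference, the four schedules map onto standard five-field cron expressions roughly as follows. The exact entries that `install-cron` writes are not documented here, so treat these as an assumed equivalent; in particular, the minute offset of the 6-hourly health check is a guess.

```python
# Fields: minute hour day-of-month month day-of-week
# Assumed equivalents of the documented schedule, not the installed crontab.
SCHEDULE = {
    "daily session summarization": "0 23 * * *",
    "full consolidation": "30 0 * * *",
    "health check (every 6h)": "0 */6 * * *",
    "quality probe (Sunday)": "0 3 * * 0",
}

for job, expr in SCHEDULE.items():
    print(f"{expr:<12} # {job}")
```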

### 4. Verify

```bash
openclaw skill run vm-memory-oracle --action health-check
jq . /data/memory/health.json
```

## Configuration Reference

| Parameter | Default | Description |
|---|---|---|
| `data_path` | `/data/memory` | Root directory for all memory files |
| `half_life_days` | `30` | Days until a fact's activation halves |
| `recall_boost` | `0.3` | Activation increase on each recall |
| `search_threshold` | `0.15` | Minimum activation for search results |
| `prune_threshold` | `0.05` | Activation below which facts get archived |
| `max_facts` | `10000` | Hard cap on active facts |
| `session_retention_days` | `30` | Days to keep raw session logs |
| `daily_retention_days` | `365` | Days to keep daily summaries |
| `memory_md_max_lines` | `200` | Maximum lines in MEMORY.md |
| `canary_check_interval` | `weekly` | How often to run quality probes |
| `embedding_model` | `multilingual-e5-large` | Model for semantic embeddings |
| `embedding_device` | `cpu` | Device for embedding inference (`cpu` or `gpu`) |
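To see how `half_life_days`, `search_threshold`, and `prune_threshold` interact, it helps to work out how long a never-recalled fact stays visible. The sketch below assumes plain exponential decay from an initial activation of 1.0; `days_until_below` is a hypothetical helper, not part of the skill.

```python
import math


def days_until_below(threshold: float, half_life_days: float = 30.0,
                     start: float = 1.0) -> float:
    """Days of uninterrupted decay before activation drops to `threshold`."""
    return half_life_days * math.log2(start / threshold)


print(round(days_until_below(0.15), 1))  # 82.1  -> leaves search results
print(round(days_until_below(0.05), 1))  # 129.7 -> becomes eligible for archival
```

With the defaults, an unrecalled fact drops out of search after about 82 days and is archived after about 130 days. Because decay is applied nightly, the actual transition would happen at the first consolidation run after each point.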

## VM Deployment Guide

This skill is designed for persistent VM environments where the memory directory lives on a dedicated data disk.

### Azure VM Setup

1. **Attach a separate managed disk** for `/data/memory/` with `deleteOption: Detach` so the disk survives VM redeployment.
2. **Format only on first boot** — use `overwrite: false` in cloud-init to prevent reformatting existing data.
3. **Mount at `/data/memory/`** in `/etc/fstab` with `nofail` option.

### Disk Sizing

| Workload | Recommended Size | Notes |
|---|---|---|
| Light (< 1K facts) | 16 GB | Personal agent |
| Medium (1K-5K facts) | 32 GB | Team agent |
| Heavy (5K-10K facts) | 64 GB | Production fleet |
| With large RAG corpus | 128-256 GB | Domain-specific knowledge |

### Backup

Pair with Azure disk snapshots or local tarballs. The consolidation pipeline creates a local backup before pruning:

```
/data/memory/backups/pre-maintenance-YYYYMMDD.tar.gz
```

Backups older than 7 days are automatically cleaned up.
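The 7-day retention rule can be expressed as a filter over backup file names. This is a sketch under the assumption that age is derived from the `YYYYMMDD` stamp in the name rather than from file modification time; `expired_backups` is a hypothetical helper.

```python
import re
from datetime import date, datetime, timedelta

BACKUP_RE = re.compile(r"^pre-maintenance-(\d{8})\.tar\.gz$")


def expired_backups(names, today: date, keep_days: int = 7) -> list:
    """Return backup file names whose date stamp is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in sorted(names):
        match = BACKUP_RE.match(name)
        if not match:
            continue  # ignore files that are not maintenance backups
        try:
            stamp = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that do not form a valid date
        if stamp < cutoff:
            expired.append(name)
    return expired
```

For example, with `today = date(2024, 1, 12)`, a backup stamped `20240101` is returned as expired while one stamped `20240110` is kept.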

## How It Works

### Fact Lifecycle

```
Session → Ingestion → Knowledge Graph → Embedding Index
                                ↓
                         Activation: 1.0
                                ↓
                    Decay applied nightly (half-life: 30d)
                                ↓
              ┌─────────────────┼─────────────────┐
              ↓                 ↓                 ↓
        Active (>0.15)   Fading (0.05-0.15)  Archived (<0.05)
        Appears in        Excluded from       Moved to
        search results    search, still       archived-facts.jsonl
                          in graph
```
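The three states in the diagram reduce to two threshold comparisons. The sketch below follows the diagram's ranges; how a score that lands exactly on a threshold is treated is an assumption, and `lifecycle_state` is a hypothetical name.

```python
def lifecycle_state(activation: float,
                    search_threshold: float = 0.15,
                    prune_threshold: float = 0.05) -> str:
    """Classify a fact as 'active', 'fading', or 'archived' by activation."""
    if activation > search_threshold:
        return "active"    # appears in search results
    if activation < prune_threshold:
        return "archived"  # moved to archived-facts.jsonl
    return "fading"        # excluded from search, still in the graph


print(lifecycle_state(1.0))   # active
print(lifecycle_state(0.10))  # fading
print(lifecycle_state(0.01))  # archived
```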

### Daily Pipeline

```
23:00  Summarize today's sessions → daily/YYYY-MM-DD.md
00:30  Apply decay → activation-metadata.json
       Rebuild embedding index → embeddings/index.bin
       Prune stale facts → archived-facts.jsonl
       Reconcile → MEMORY.md
       Clean old sessions (>30d) and dailies (>365d)
       Health check → health.json
```

## Safety

| Property | Guarantee |
|---|---|
| Network access | None. Zero outbound connections. |
| Credentials | Never read, written, or transmitted. |
| Privilege escalation | Never uses sudo or su. |
| Write behavior | Append-only for facts. Archival instead of deletion. |
| Idempotency | All operations safe to retry. |
| Transparency | All files human-readable (JSONL, Markdown, JSON). |

## Monitoring Integration

`health.json` is designed for easy ingestion into monitoring systems:

- **Azure Log Analytics** — Ship with Azure Monitor Agent using a custom log table.
- **Prometheus** — Parse with a JSON exporter or node-exporter textfile collector.
- **Grafana** — Dashboard the `health.json` fields directly.
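For the Prometheus textfile route, one possible approach is to flatten the numeric fields of `health.json` into `name value` lines. The schema of `health.json` is not documented here, so the keys in the example and the `memory_oracle_` prefix are hypothetical; the sketch also assumes keys are already valid metric-name fragments.

```python
def to_prometheus(health: dict, prefix: str = "memory_oracle_") -> str:
    """Render top-level numeric fields as Prometheus text-format lines."""
    lines = []
    for key, value in sorted(health.items()):
        # Skip strings, nested objects, and booleans (bool is a subclass of int).
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        lines.append(f"{prefix}{key} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical health.json contents, for illustration only.
sample = {"active_facts": 120, "disk_usage_percent": 41.5, "status": "ok"}
print(to_prometheus(sample), end="")
```

The output could be written to a `.prom` file in the directory that node-exporter's textfile collector is configured to read.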

### Alert Thresholds

| Metric | Warning | Critical |
|---|---|---|
| Disk usage | > 80% | > 90% |
| Active facts | > 9,000 | > 9,500 |
| Avg activation | < 0.2 or > 0.8 | N/A |
| Quality probe accuracy | < 80% | < 70% |
| Consolidation duration | > 600s | > 1200s |
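The table translates directly into a small evaluation routine. In this sketch the thresholds come from the table above, while the `health.json` field names are hypothetical placeholders for whatever the file actually contains.

```python
def high_is_bad(value: float, warning: float, critical: float) -> str:
    """Severity for metrics where larger values are worse."""
    if value > critical:
        return "critical"
    return "warning" if value > warning else "ok"


def low_is_bad(value: float, warning: float, critical: float) -> str:
    """Severity for metrics where smaller values are worse."""
    if value < critical:
        return "critical"
    return "warning" if value < warning else "ok"


def evaluate(health: dict) -> dict:
    """Apply the documented thresholds. Field names are assumptions."""
    avg = health["avg_activation"]
    return {
        "disk_usage": high_is_bad(health["disk_usage_percent"], 80, 90),
        "active_facts": high_is_bad(health["active_facts"], 9000, 9500),
        "avg_activation": "ok" if 0.2 <= avg <= 0.8 else "warning",
        "probe_accuracy": low_is_bad(health["probe_accuracy_percent"], 80, 70),
        "consolidation": high_is_bad(health["consolidation_seconds"], 600, 1200),
    }
```

Average activation has no critical level in the table, so the sketch only ever reports `ok` or `warning` for it.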

## Troubleshooting

**Memory not persisting across reboots:**
Check that `/data/memory` is mounted on a separate disk, not the OS disk. Verify the mount with `df -h /data/memory`.

**Consolidation taking too long:**
Reduce `max_facts` or increase `prune_threshold` to archive more aggressively. Check embedding index size — if it exceeds 4 GB, consider switching `embedding_model` to a smaller variant.

**Quality probe failing:**
Canary facts may have decayed. Re-add them with high activation, or raise `half_life_days` to keep important facts active longer.

**MEMORY.md too large:**
Lower `memory_md_max_lines` or increase `search_threshold` so fewer facts qualify for inclusion.

## License

MIT-0 — Use, modify, and redistribute freely. No attribution required.