There is no shortage of agent memory systems. mem0, Letta, Zep, Voltropy's LCM, Claude Code's built-in memory, Cursor's context engine. My colleague Topi's Remind is a recent addition to the space. The common pattern across most of these is LLM-based distillation: raw data goes in, an LLM extracts higher-value memories, generalizations, or concepts. mem0 has been doing this for a while. Remind pushes it further with spreading activation retrieval, entity graphs, and outcome tracking, but the core technique is the same.

vipune does not use an LLM. That is a deliberate choice. You store text, it gets embedded locally (ONNX, bge-small-en-v1.5), and you search by meaning. No distillation step, no external model calls, no token cost for memory operations. The tradeoff is obvious: you do not get automatic generalization. What you get is a single binary with no dependencies that runs the same everywhere, offline, with zero configuration.

The 0.5 release adds CLI flags that make this simpler tool useful for things the larger systems were not designed for: scoped multi-agent memory within a single session, typed retrieval, and recency-weighted search that lets you use the same store as both long-term knowledge and short-term working memory.

(I have been toying with the idea of adding an optional distillation step via apfel, the CLI that exposes Apple Intelligence's on-device model on macOS 26+. A local 3B model with no API keys could handle memory consolidation without breaking vipune's zero-dependency, zero-cost model. I have been experimenting with apfel for other things and the on-device inference is fast enough to be practical. But that is future work, not a promise.)

Before getting into the new flags, it is worth looking at one of the more interesting architectural approaches in this space.

What Volt LCM gets right

Voltropy's Volt introduced Lossless Context Management earlier this year. The LCM paper is worth reading. The core insight: stop asking the model to manage its own memory and let the engine do it deterministically. LCM maintains a DAG of hierarchical summaries in a persistent store. Compaction happens asynchronously between turns. Nothing is lost. Sessions can run indefinitely.

The approach is sound. Volt performs well on long-horizon tasks because the model does not have to invent a memory strategy on the fly.

The limitation is structural. Volt is a complete terminal-based coding agent, forked from OpenCode. You get LCM, but you also get the entire agent runtime. If you are building your own harness, or running Claude Code, or using Cursor, Volt's memory is not something you can pull out and use separately.

vipune: just the memory

vipune is the memory layer without the agent. Single binary. No API keys, no daemon, no database server. Everything runs locally using ONNX embeddings (bge-small-en-v1.5). Install it, start using it.

cargo install vipune

Or grab the binary directly:

curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/randomm/vipune/releases/download/v0.5.0/vipune-installer.sh | sh

With 0.5, the feature set covers the main things needed for multi-agent memory workflows. Three capabilities in particular.

Multi-agent scoping

Multiple agents can share a single vipune instance. Inside a git repo, vipune infers the project scope from the repository, so agents working in the same repo share memory by default:

# Both agents are in the same repo — memories are shared automatically
vipune add "Auth service uses JWT with RSA-256"
vipune search "authentication flow"

When you need isolation between agents in the same session, --project overrides the default:

# Agent A: backend scope
vipune --project "myapp/backend" add "Auth service uses JWT with RSA-256"

# Agent B: frontend scope
vipune --project "myapp/frontend" add "Token refresh handled in useAuth hook"

Outside a git repo, set VIPUNE_PROJECT to scope manually. Either way, each agent gets its own memory namespace, or shares one deliberately.

Typed memories

Not all memories are the same. A design decision is not the same thing as a guardrail. vipune now has five memory types:

Type          Purpose
fact          Default. Statements about the world.
preference    How the user or system prefers things done.
procedure     Step-by-step processes.
guard         Things that must not happen.
observation   Transient notes, intermediate findings.

vipune add "Never deploy to prod on Fridays" --memory-type guard
vipune add "Run migrations before schema tests" --memory-type procedure
vipune search "deployment rules" --memory-type guard,procedure

Agents can filter by type on search. A coding agent looking for guardrails does not need to wade through every factual observation from the last three days. This keeps search results relevant as the memory store grows.

The --status flag adds another axis: memories start as active or candidate, and searches can filter on status. The --supersedes flag atomically replaces an old memory with a new one:

vipune add "Alice now works at Google" --supersedes abc123-old-memory-id

One transaction. The old memory is marked superseded, the new one is active. No window where both are live.
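vipune's store is internal, so the schema below is purely illustrative, but the shape of the atomic swap can be sketched with a hypothetical SQLite-style table: one transaction flips the old memory to superseded and inserts the replacement, so a crash between the two statements leaves neither applied.

```python
import sqlite3

# Hypothetical schema for illustration only; not vipune's actual store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id TEXT PRIMARY KEY, text TEXT, status TEXT)")
db.execute("INSERT INTO memories VALUES ('abc123', 'Alice works at Initech', 'active')")

def supersede(conn, old_id, new_id, new_text):
    # "with conn" wraps both statements in one transaction:
    # commit if both succeed, roll back if either fails.
    with conn:
        conn.execute(
            "UPDATE memories SET status = 'superseded' WHERE id = ?", (old_id,)
        )
        conn.execute(
            "INSERT INTO memories VALUES (?, ?, 'active')", (new_id, new_text)
        )

supersede(db, "abc123", "def456", "Alice now works at Google")
rows = dict(db.execute("SELECT id, status FROM memories"))
print(rows)  # {'abc123': 'superseded', 'def456': 'active'}
```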

Tunable recency

Search results combine semantic similarity with a time decay score:

score = (1 - recency_weight) * similarity + recency_weight * time_score

The default is 70% semantic, 30% recency. For long-term project memory, that balance works. For a single programming session where you want recent context to dominate:

vipune search "what did I just decide about the API" --recency 0.8

Now 80% of the ranking comes from how recent the memory is. This is what turns vipune from a knowledge base into working memory. Same binary, same data, different search behaviour depending on what you need right now.
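The blend is small enough to reproduce directly. The scoring formula is the one documented above; the time_score function here is an assumed exponential decay with a 24-hour half-life, which may differ from vipune's actual curve:

```python
import time

def time_score(created_at, now, half_life_hours=24.0):
    # Assumed decay curve: newer memories score closer to 1.0,
    # halving every half_life_hours.
    age_hours = (now - created_at) / 3600.0
    return 0.5 ** (age_hours / half_life_hours)

def combined_score(similarity, created_at, now, recency_weight=0.3):
    # vipune's documented blend:
    # (1 - recency_weight) * similarity + recency_weight * time_score
    return (1 - recency_weight) * similarity + recency_weight * time_score(created_at, now)

now = time.time()
week_old_strong = (0.90, now - 7 * 24 * 3600)  # strong semantic match, a week old
fresh_weak = (0.40, now - 600)                 # weak match, ten minutes old

# Default weight (0.3): the strong semantic match still wins.
assert combined_score(*week_old_strong, now) > combined_score(*fresh_weak, now)

# Session mode (--recency 0.8): the fresh memory wins instead.
assert combined_score(*fresh_weak, now, 0.8) > combined_score(*week_old_strong, now, 0.8)
```

The crossover is the whole point: the same two memories rank differently depending on whether you are doing long-term recall or tracking the last ten minutes of a session.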

For full-text matching, hybrid mode adds BM25 keyword scoring alongside semantic search, enabled via the --hybrid flag or the VIPUNE_HYBRID environment variable:

VIPUNE_HYBRID=true vipune search "JWT RSA-256"

Useful when you need exact keyword hits, not just meaning.

MCP server

vipune runs as an MCP server out of the box:

vipune mcp

This exposes store_memory, search_memories, list_memories, and supersede_memory as native tools. Claude Code, Cursor, and anything else that speaks MCP can use vipune as a memory provider without shell command wrappers. The MCP tools accept the same type, status, and filter parameters as the CLI.
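Under the hood, an MCP tool invocation is an ordinary JSON-RPC 2.0 tools/call request. A sketch of what a client sends for store_memory; the argument names here mirror the CLI flags and are an assumption about the exact schema:

```python
import json

# JSON-RPC 2.0 "tools/call" request per the MCP spec.
# Argument names mirror vipune's CLI flags; the exact schema is assumed.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "store_memory",
        "arguments": {
            "text": "Never deploy to prod on Fridays",
            "memory_type": "guard",
        },
    },
}
payload = json.dumps(request)
print(payload)
```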

Where this fits

The bigger memory systems do more than vipune. Letta has structured agents with memory tiers. Volt LCM has hierarchical DAG compaction. mem0 has managed cloud infrastructure. If you need those things, use those tools.

The niche vipune occupies is narrower. You have an agent (or several), you want them to remember things across turns or sessions, and you do not want to add a service, a daemon, or an account to make that happen. You want a binary you can call from a shell or expose over MCP. The --project, --memory-type, and --recency flags are what make that narrow niche practical for real workflows instead of just toy examples.

# In your CLAUDE.md or agent instructions:
# Use `vipune search` before starting work to check for relevant context.
# Use `vipune add` to store decisions, discoveries, and guardrails.
# Use `vipune add --memory-type guard` for things that must not be forgotten.

That is the whole integration.

GitHub | crates.io | CLI Reference