Upgrades to Brain and Memory: Bringing QMD Back Online
How a deliberate hardware upgrade to support local LLM inference exposed a silent dependency — and why QMD is the right fix for Max’s memory problems.
If you’ve spent any time running a persistent AI agent, you’ve probably noticed something frustrating: the agent forgets things it shouldn’t. Not because the underlying model is incapable, but because the tooling for surfacing the right memories at the right moment is harder than it looks. Max — my home AI agent running on OpenClaw on a server called Elmore — has been suffering from exactly this problem. The plan to fix it properly required new hardware. Getting the hardware in place exposed a problem we didn’t know we had.
The Problem with AI Agent Memory
OpenClaw’s builtin memory engine uses SQLite with vector embeddings. It works, but it lacks two things that matter enormously in practice: query expansion (reformulating a vague question into something more searchable) and reranking (re-scoring results by actual relevance rather than raw vector similarity). The result is an agent that can have a rich conversation history and still draw a blank on something it absolutely should remember.
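To make the query-expansion idea concrete, here is a toy sketch: a vague query gets widened with related terms before it ever hits the index. The synonym mapping below is an illustration I made up; QMD uses a local LLM for this step.

```shell
# Toy query expansion: widen a vague query with related terms before
# searching. The synonym mapping here is illustrative, not QMD's.
query="video card"
case "$query" in
  *"video card"*) expanded="$query gpu graphics vram" ;;
  *)              expanded="$query" ;;
esac
echo "expanded query: $expanded"
```

A query that mentions only "video card" can now match notes that talk about the GPU or VRAM, which is exactly the gap raw vector similarity tends to leave open.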
The fix is a tool called QMD, built by Tobi Lütke (founder of Shopify). QMD is a local-first search sidecar that runs alongside OpenClaw, combining BM25 full-text search, vector semantic search, and LLM-powered reranking — all running locally via node-llama-cpp with GGUF models. No API calls, no cloud dependency, no per-query costs.
“A mini CLI search engine for your docs, knowledge bases, meeting notes, whatever. Tracking current SOTA approaches while being all local.” — QMD README
Where OpenClaw’s builtin engine asks “what vectors are close to this query?”, QMD asks “what documents actually answer this question?” — and uses a local LLM to figure out the difference. OpenClaw supports QMD as an optional memory backend and manages the sidecar lifecycle automatically. If QMD fails for any reason, OpenClaw falls back to the builtin engine gracefully.
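The fallback behavior can be sketched as a simple health probe. The probe logic below is my assumption for illustration; OpenClaw's actual check is internal to the gateway.

```shell
# Hedged sketch: prefer qmd when the binary exists and answers --version;
# otherwise fall back to the builtin engine instead of erroring out.
pick_backend() {
  if command -v "$1" >/dev/null 2>&1 && "$1" --version >/dev/null 2>&1; then
    echo "qmd"
  else
    echo "builtin"
  fi
}
echo "memory backend: $(pick_backend qmd)"
```

The important property is that a missing or broken sidecar degrades to the builtin engine rather than taking memory down entirely, which is also why the regression described below went unnoticed.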
Why Elmore Needed New Hardware
Running QMD’s reranking and embedding pipeline locally requires real compute. The larger goal — giving Max a fully local LLM as its reasoning brain rather than routing every inference through a cloud API — requires even more. The old system couldn’t support it, so Elmore was rebuilt:
- Motherboard: ASUS TUF Gaming B650E-E WiFi
- GPU: NVIDIA RTX 5060 Ti (16 GB VRAM)
- RAM: 32 GB Corsair Vengeance DDR5 @ 6000 MHz (CL38)
- Cooler: Thermalright Phantom Spirit 120SE
The OS migrated cleanly. OpenClaw came back up without complaint. Max was responding. Everything looked fine — but under the hood, QMD was silently gone. The binary had been installed on the old system and the new environment had no trace of it. OpenClaw had fallen back to the builtin engine without raising an alarm.
Finding the Problem
The investigation started with a simple file search to see what SQLite databases were present:
find ~/.openclaw -name "*.sqlite" 2>/dev/null
That returned an index file at ~/.openclaw/agents/main/qmd/xdg-cache/qmd/index.sqlite — proof QMD had been running before the upgrade. But:
which qmd # (no output)
Gone. The binary wasn’t on the PATH anywhere.
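A slightly more explicit check than `which` is to ask the shell where a command resolves from and report a miss out loud (`command -v` is the portable equivalent; the wrapper is just a convenience I added for the sketch):

```shell
# Report where a command resolves from, or say so if it is missing.
locate_cmd() {
  command -v "$1" 2>/dev/null || echo "$1: not on PATH"
}
locate_cmd qmd   # on the rebuilt Elmore this reported a miss
```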
The Reinstall: More Complicated Than Expected
The QMD README suggests installing via Bun. That turned out to be the wrong path on this system: installing from the GitHub URL pulled the raw TypeScript source and tried to compile it, which failed with hundreds of type errors. Even when the postinstall scripts reported success, the compiled dist/ directory never appeared.
The root cause is a known ABI mismatch: Bun compiles native modules against its own internal ABI, but the QMD CLI shebang is #!/usr/bin/env node. When the system’s Node.js runs it, the versions don’t match and every command fails. The fix is straightforward — install via npm instead, which compiles against the system Node correctly:
npm install -g @tobilu/qmd
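The mismatch mechanism is visible directly in the interpreter line of the installed entry point. The sketch below reads a shebang from a throwaway file standing in for the real qmd script:

```shell
# Read the shebang of a CLI entry point to see which runtime will run it.
# The temp file stands in for the real installed qmd script.
tmp=$(mktemp)
printf '#!/usr/bin/env node\nconsole.log("qmd")\n' > "$tmp"
shebang=$(head -n 1 "$tmp")
echo "interpreter line: $shebang"
rm -f "$tmp"
```

A script whose shebang names node but whose native modules were compiled against Bun's ABI will fail at load time, no matter which tool installed it.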
There was one more wrinkle: a stale Bun shim was still cached in the shell and continued intercepting the qmd command even after the npm install succeeded. Removing it cleared the path:
rm -f ~/.bun/bin/qmd
hash -r
qmd --version # qmd 2.1.0
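Why was `hash -r` needed at all? The shell caches the path of each command it has run, so deleting a binary is not enough until the cache is cleared. A reproducible toy version of the stale-cache effect, using a throwaway command name:

```shell
# Reproduce the stale-shim effect with a throwaway command name.
workdir=$(mktemp -d)
mkdir -p "$workdir/a" "$workdir/b"
printf '#!/bin/sh\necho old\n' > "$workdir/a/demo"
printf '#!/bin/sh\necho new\n' > "$workdir/b/demo"
chmod +x "$workdir/a/demo" "$workdir/b/demo"
PATH="$workdir/a:$workdir/b:$PATH"
demo >/dev/null        # shell caches a/demo
rm "$workdir/a/demo"   # the "stale shim" is deleted...
hash -r                # ...but the cache must be cleared too
out=$(demo)            # fresh PATH search now finds b/demo
echo "$out"
rm -rf "$workdir"
```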
The Complete Fix
- Install Bun (needed for other OpenClaw tooling, but not for QMD):
  curl -fsSL https://bun.sh/install | bash
- Install QMD via npm:
  npm install -g @tobilu/qmd
- Symlink the binary so the OpenClaw gateway service can find it:
  sudo ln -sf ~/.npm-global/bin/qmd /usr/local/bin/qmd
- Enable QMD as Max’s memory backend:
  openclaw config set memory.backend qmd
- Increase the status probe timeout for first-run model loading:
  openclaw config set memory.qmd.limits.timeoutMs 120000
- Pre-warm QMD using the same XDG directories OpenClaw uses:
  XDG_CACHE_HOME=~/.openclaw/agents/main/qmd/xdg-cache qmd embed
- Restart the gateway and verify:
  openclaw gateway restart && openclaw memory status
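Of the steps above, the symlink is the one most likely to rot silently on a future rebuild, so it is worth checking that the link resolves and that the linked binary actually runs. A self-contained sketch of that check, using a temp directory and a fake binary rather than /usr/local/bin:

```shell
# Verify a symlinked CLI: the link must resolve and the target must execute.
workdir=$(mktemp -d)
printf '#!/bin/sh\necho "qmd 2.1.0"\n' > "$workdir/qmd-real"
chmod +x "$workdir/qmd-real"
ln -sf "$workdir/qmd-real" "$workdir/qmd"
target=$(readlink "$workdir/qmd")
version=$("$workdir/qmd")
echo "link -> $target"
echo "version: $version"
rm -rf "$workdir"
```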
What’s Different Now — and What’s Next
With QMD active, Max’s memory search runs through a three-stage pipeline: BM25 keyword retrieval, vector similarity search, and LLM reranking — all local, all on Elmore’s hardware. Query expansion means a vague question gets reformulated into something that actually finds the right notes. Reranking means the top result is the most relevant one, not just the nearest vector.
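To get a feel for the first stage alone, here is a deliberately crude keyword scorer over two toy notes: count how many query terms each note contains and keep the best file. Real BM25 also weighs term rarity and document length; this only shows the shape of the retrieval step.

```shell
# Crude stand-in for the keyword-retrieval stage: score each note by how
# many query terms it contains and keep the highest-scoring file.
workdir=$(mktemp -d)
printf 'elmore gpu upgrade rtx vram\n' > "$workdir/note1"
printf 'grocery list milk eggs\n'      > "$workdir/note2"
query="gpu vram"
best=""; bestscore=-1
for f in "$workdir"/note*; do
  score=0
  for term in $query; do
    hits=$(grep -c -w "$term" "$f" || true)
    score=$((score + hits))
  done
  if [ "$score" -gt "$bestscore" ]; then bestscore=$score; best=$f; fi
done
echo "top hit: $(basename "$best") (score $bestscore)"
rm -rf "$workdir"
```

The vector and reranking stages then decide among the keyword candidates, which is where the local LLM earns its keep.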
QMD also detected the RTX 5060 Ti automatically via Vulkan, with full GPU offloading enabled. That’s the first sign the new hardware is doing what it was built to do. Whether it fully resolves Max’s memory problems remains to be seen in practice — but the infrastructure is now correct.
The bigger prize is still ahead: running a fully local LLM on Elmore as Max’s reasoning brain. Sixteen gigabytes of VRAM makes that possible in a way it simply wasn’t before. That’s the next chapter.
Max runs on OpenClaw 2026.4.10 on Elmore · QMD 2.1.0 · RTX 5060 Ti · April 2026