Architecture
System internals for contributors.
This page describes how mx is built. It covers the module structure, dispatch model, storage backends, and encoding pipeline. The audience is contributors reading the source code, not users running commands.
Overview
mx is a single-binary Rust CLI built on three pillars:
- clap derive for the command tree – every subcommand, flag, and validation rule is expressed as Rust types in src/cli.rs.
- SurrealDB for the knowledge graph – an embedded SurrealKV database (or optional network WebSocket connection) stores entries, relationships, tags, embeddings, and metadata.
- base-d for commit encoding – a separate crate that hashes, compresses, and encodes commit messages through randomly selected dictionaries.
The binary is mx. There is no library crate;
main.rs declares modules and calls into handlers. The
Rust edition is 2024.
Key dependencies:
| Crate | Version | Role |
|---|---|---|
| clap | 4 | CLI parsing with derive macros |
| surrealdb | 2 | Embedded + WebSocket knowledge store |
| base-d | 3 | Dictionary-based hash/compress encoding |
| tokio | 1 | Async runtime for SurrealDB (multi-thread) |
| tract-onnx | 0.22 | Local vector embeddings via ONNX inference (BGE-Base-EN-v1.5, 768-dim) |
| | 1 / 1 / 0.8 / 0.9 | Serialization across JSON, TOML, YAML |
| | 0.4 | Timestamps with serde support |
| anyhow / thiserror | 1 / 2 | Error handling (anyhow for handlers, thiserror for typed errors) |
| | 0.12 | HTTP client for GitHub API calls |
| | 10 | JWT signing for GitHub App auth |
| | 0.13 | Fence-aware heading extraction |
| | 2 | Terminal colors |
Module structure
All source lives under src/. The top-level modules
declared in main.rs are:
src/
    main.rs # entry point, Cli::parse(), match on Commands
    cli.rs # the full command tree (clap derive enums)
    paths.rs # single source of path truth
    handlers/ # command handler routing
        mod.rs # top-level dispatchers (pr, github, codex, log, show, etc.)
        memory.rs # mx memory subcommand handler
        kv.rs # mx kv subcommand handler
        metadata.rs # metadata subcommand handler (categories, tags, etc.)
        state.rs # mx state subcommand handler (deprecated)
    commit.rs # encoding pipeline (hash + compress + encode)
    knowledge.rs # KnowledgeEntry struct (the core data model)
    store.rs # KnowledgeStore trait (abstract storage interface)
    surreal_db/ # SurrealDB implementation of KnowledgeStore
        mod.rs # SurrealDatabase struct, with_db! macro, RecordId
        connection.rs # SurrealMode, SurrealConfig, SurrealConnection enum
        knowledge.rs # SurrealKnowledgeRecord DTO, query hydration
        queries.rs # backup operations, query helpers
        lookups.rs # lookup table CRUD (categories, agents, projects, etc.)
        relationships.rs # graph edge operations (relates_to)
        trait_impl.rs # KnowledgeStore impl for SurrealDatabase
        tests.rs # integration tests
    codex/ # session conversation archival
        mod.rs # manifest types, re-exports
        archive/ # the archive pipeline
            mod.rs # ArchiveRequest, ArchiveOptions, entry points
            include.rs # IncludeSet (--include flag parser)
            write.rs # per-session writer, --all driver loop
            sources.rs # source walkers (subagent discovery, etc.)
            paths.rs # archive-folder naming, short-ID extraction
            backfill.rs # vault backfill (--backfill flag)
        export/ # mx codex export pipeline
        index/ # codex indexing
        images.rs # base64 image extraction from JSONL
        transcript.rs # conversation.md rendering
        read.rs # list, read, search operations
        migrate.rs # v1->v2 archive migration
        notices.rs # vault-present warnings
    chunking.rs # token-aware text chunking for embeddings
    embeddings.rs # EmbeddingProvider trait, TractProvider
    kv.rs # KV store engine (schema TOML + data JSON)
    types.rs # shared domain types (Agent, Category, Project, etc.)
    display.rs # safe_truncate, formatting helpers
    tensor.rs # emotional state tensor encode/decode (deprecated, serves mx state)
    github.rs # GitHub API operations (cleanup, comments)
    sync/ # GitHub sync (issues, wiki)
    convert.rs # md2yaml / yaml2md conversion
    session.rs # deprecated session export (forwards to codex)
    index.rs # legacy index operations
    helpers.rs # shared utilities
    wake_chunk.rs # wake ritual chunking
    wake_ritual.rs # wake ritual flow
    wake_token.rs # HMAC-signed wake session tokens
    engage.rs # interactive wake engage mode
    content_ops.rs # content editing operations (find/replace, append, etc.)
Module boundaries
The codebase follows a layered pattern:
- CLI layer (cli.rs) – pure data. No logic, no imports beyond clap. Every command variant, flag, and validation constraint is a type.
- Handler layer (handlers/) – orchestration. Reads CLI args, calls into domain modules, formats output. Handlers own println! and eprintln!. They do not own business logic.
- Domain layer (commit.rs, knowledge.rs, store.rs, kv.rs, codex/, embeddings.rs, tensor.rs) – the actual work. Pure functions where possible, side effects isolated to well-defined boundaries (git subprocesses, database calls, filesystem writes).
- Infrastructure layer (surreal_db/, paths.rs, github.rs) – external integrations. SurrealDB, filesystem, GitHub API.
Command dispatch
The dispatch path is:
main() -> Cli::parse() -> match cli.command { ... }
main.rs is small by design. It does three things:

1. Emits a legacy-path deprecation note if MX_MEMORY_PATH is set.
2. Parses the CLI with clap::Parser::parse().
3. Pattern-matches on the top-level Commands enum and calls the appropriate handler.
Some commands dispatch directly to domain functions from
main.rs:

Commands::Commit { .. } => commit::upload_commit(..),
Commands::Log { .. } => handle_log(..),
Commands::Show { .. } => handle_show(..),

Others dispatch through handlers/mod.rs:

Commands::Memory { command } => handle_memory(command, cli.verbose),
Commands::Kv { command } => handle_kv(command, cli.verbose),
Commands::Codex { command } => handle_codex(command),

The handler functions in handlers/mod.rs then match
on the subcommand enum and call into domain modules. For example,
handle_codex matches on
CodexCommands::Archive,
CodexCommands::Export, etc., and routes each to the
appropriate function in codex::archive,
codex::export, or codex::read.
The Commit command
The Commit variant is handled inline in
main.rs rather than through a handler, because it has
two distinct modes selected by the --encode-only
flag:
- Normal mode: calls commit::upload_commit() with the message, stage/push flags, and display preferences.
- Encode-only mode: calls commit::encode_commit_message() with explicit title and body text, prints the result, and exits. No git state is touched.
Exit codes
Most commands exit 0 on success or propagate an
anyhow::Error (which prints the error chain to stderr
and exits non-zero). The kv subcommand is the
exception: it uses typed exit codes (0 = OK, 1 = key not found, 2 =
type mismatch, 3 = schema missing, 4 = invalid input) so callers can
distinguish failure modes programmatically. The KvError
enum covers five typed variants: KeyNotFound,
TypeMismatch, SchemaMissing,
EntryNotFound (a specific entry ID was not found within
a key), and AmbiguousId (an ID prefix matched multiple
entries). Both EntryNotFound and
AmbiguousId map to exit code 4.
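
As a minimal sketch of the mapping described above, the typed exit codes could be expressed as a method on the error enum. The variant names come from the text; the payload-free variants and the `exit_code` method name are assumptions made for illustration, not the actual definition in src/kv.rs.

```rust
/// Illustrative mirror of the five typed variants named in the text.
/// The real enum may carry payloads (key names, IDs); they are omitted here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KvError {
    KeyNotFound,
    TypeMismatch,
    SchemaMissing,
    EntryNotFound,
    AmbiguousId,
}

impl KvError {
    /// Exit code table from the text:
    /// 1 = key not found, 2 = type mismatch, 3 = schema missing, 4 = invalid input.
    /// `exit_code` is a hypothetical method name.
    pub fn exit_code(self) -> i32 {
        match self {
            KvError::KeyNotFound => 1,
            KvError::TypeMismatch => 2,
            KvError::SchemaMissing => 3,
            // Both ID-level failures share the "invalid input" code.
            KvError::EntryNotFound | KvError::AmbiguousId => 4,
        }
    }
}
```

A handler following this shape would print the error to stderr and then call `std::process::exit(err.exit_code())`, so a calling script can branch on the code without parsing messages.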
Path management
src/paths.rs is the single source of truth for every
filesystem path mx touches. The module is deliberately the
only file in the codebase that calls
dirs::home_dir(). Every other module that needs a path
calls a function from paths.rs.
The base directory
All paths derive from mx_home(), which resolves once
per process via OnceLock:
1. If MX_HOME is set and non-empty, use it.
2. Otherwise, use ~/.mx/.
The result is cached for the lifetime of the process.
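
The resolve-once behaviour can be sketched with the standard library alone. `resolve_mx_home_with` is named later on this page, but its parameter list here is an assumption, and `mx_home_cached` is a hypothetical stand-in that takes the user's home directory as an argument so the sketch does not depend on the `dirs` crate.

```rust
use std::path::{Path, PathBuf};
use std::sync::OnceLock;

/// Pure resolution step: MX_HOME wins when set and non-empty, else ~/.mx.
/// The signature is assumed for illustration.
fn resolve_mx_home_with(env_val: Option<&str>, user_home: &Path) -> PathBuf {
    match env_val {
        Some(path) if !path.is_empty() => PathBuf::from(path),
        _ => user_home.join(".mx"),
    }
}

/// Cached wrapper (hypothetical name): the closure runs at most once per
/// process, so later calls return the first result even if inputs change.
fn mx_home_cached(user_home: &Path) -> &'static Path {
    static MX_HOME: OnceLock<PathBuf> = OnceLock::new();
    MX_HOME
        .get_or_init(|| {
            resolve_mx_home_with(std::env::var("MX_HOME").ok().as_deref(), user_home)
        })
        .as_path()
}
```

Because the value is fixed on first use, changing MX_HOME later in the same process has no effect in this sketch, which matches the caching behaviour described above.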
Derived paths
Each subsystem has its own function in paths.rs:
| Function | Returns |
|---|---|
| | |
| | |
| | |
| | |
| | |
| | XDG cache or |
| | |
| | |
| | |
| | |
| | |
The _with() test-seam pattern
Pure resolution logic is factored into _with
variants that take env-var values as explicit parameters instead of
reading std::env:
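
use std::path::{Path, PathBuf};

// Pure resolution: the env-var value arrives as a parameter.
fn codex_dir_with(env_val: Option<&str>, home: &Path) -> PathBuf {
    // A set, non-empty override wins over the default location.
    if let Some(path) = env_val && !path.is_empty() {
        return PathBuf::from(path);
    }
    home.join("codex")
}

// Thin public wrapper: reads MX_CODEX_PATH and delegates.
pub fn codex_dir() -> PathBuf {
    codex_dir_with(
        std::env::var("MX_CODEX_PATH").ok().as_deref(),
        mx_home(),
    )
}

Tests call the _with variant directly with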
controlled inputs. The public function is a thin wrapper that reads
the env var and passes it in. This keeps tests parallel-safe (no
env-var mutation) and the resolution logic unit-testable in
isolation.
The same pattern is used by surreal_root_with,
model_cache_dir_with,
resolve_mx_home_with, and
resolve_kv_path_with.
External paths (read-only)
paths.rs also provides helpers for locations owned
by other tools that mx reads but never writes:
- claude_dir() – ~/.claude/
- claude_projects_dir() – ~/.claude/projects/ (override: MX_CLAUDE_PROJECTS_DIR for tests)
- claude_subagents_dir(slug, session) – subagent JSONL location
- claude_sessions_dir() – per-PID liveness JSONs
- claude_history_jsonl() – slash-command history
- claude_mcp_logs_dir(slug) – MCP server log parent directory
- wonka_vault_archives_dir() – legacy vault snapshots (~/.wonka/vault/archives/)
These are centralized in paths.rs so the codex
archive source walkers have a single source of truth for Claude’s
on-disk layout.
SurrealDB integration
The knowledge graph is backed by SurrealDB. The integration supports two connection modes:
Embedded mode (default)
Uses the SurrealKV engine – a local, file-based
key-value store compiled into the mx binary. No external server
process is required. The database files live at
$MX_HOME/memory/surreal/ (override with
MX_SURREAL_ROOT).
On first connection, the schema file
(schema/surrealdb-schema.surql) is applied via
include_str!. This is compiled into the binary – there
is no runtime file read. The schema uses
DEFINE ... IF NOT EXISTS and UPSERT
throughout, making it safe to re-apply on every startup.
Network mode
When MX_SURREAL_MODE=network, mx connects to an
external SurrealDB instance over WebSocket (ws:// or
wss://). The local surreal_root path is
unused. Authentication supports three levels (root, namespace,
database), configured via MX_SURREAL_AUTH_LEVEL.
Password can be provided directly (MX_SURREAL_PASS) or
read from a file (MX_SURREAL_PASS_FILE, useful for
agenix-managed secrets on NixOS).
Connection architecture
The connection is represented as an enum:

pub enum SurrealConnection {
    Embedded(Surreal<surrealdb::engine::local::Db>),
    Network(Surreal<WsClient>),
}

A with_db! macro dispatches across both
variants:

macro_rules! with_db {
    ($self:expr, $db:ident, $body:expr) => {
        match &$self.conn {
            SurrealConnection::Embedded($db) => $body,
            SurrealConnection::Network($db) => $body,
        }
    };
}

This allows every query function to be written once and work
against both backends. The SurrealDatabase struct wraps
the connection and exposes synchronous methods that internally use a
block_on bridge over a global
OnceLock<Runtime> tokio runtime.
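
To show why a macro suits this dispatch, here is a dependency-free analog. `EmbeddedDb`, `NetworkDb`, `Connection`, `Database`, and `with_conn!` are all hypothetical stand-ins for the SurrealDB types. The two client types share a method name but no trait, so the macro pastes the same body into both match arms and the compiler type-checks it once per variant.

```rust
// Two unrelated client types standing in for the embedded and network handles.
struct EmbeddedDb;
struct NetworkDb;

impl EmbeddedDb {
    fn query(&self, q: &str) -> String {
        format!("embedded:{q}")
    }
}

impl NetworkDb {
    fn query(&self, q: &str) -> String {
        format!("network:{q}")
    }
}

enum Connection {
    Embedded(EmbeddedDb),
    Network(NetworkDb),
}

struct Database {
    conn: Connection,
}

// Same shape as with_db!: $body is expanded in both arms, with $db bound
// to a different concrete type in each.
macro_rules! with_conn {
    ($self:expr, $db:ident, $body:expr) => {
        match &$self.conn {
            Connection::Embedded($db) => $body,
            Connection::Network($db) => $body,
        }
    };
}

impl Database {
    // Written once, works against either backend.
    fn run(&self, q: &str) -> String {
        with_conn!(self, db, db.query(q))
    }
}
```

The async `block_on` bridge is left out of this sketch because it depends on tokio.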
The KnowledgeStore trait
src/store.rs defines the KnowledgeStore
trait – the abstract interface for knowledge storage.
SurrealDatabase implements this trait in
surreal_db/trait_impl.rs. The trait surface
includes:
- CRUD: upsert_knowledge, get, delete
- Search: search (full-text BM25), semantic_search (vector cosine similarity)
- Listing: list_by_category, count_by_category, list_all, count
- Wake cascade: wake_cascade (layered identity retrieval)
- Lookups: categories, agents, projects, sessions, relationships, tags
- Reinforcement: reinforce (increment resonance, update activation metadata), update_activations (batch-reset decay clocks for search activation)
- Backups: pre-mutation content snapshots
The trait exists to decouple handler logic from the storage
backend. In practice, SurrealDatabase is the only
implementation.
Knowledge graph data model
The schema lives in schema/surrealdb-schema.surql
and is compiled into the binary. It defines a SCHEMAFULL
relational-graph model.
Core entity: knowledge
The central table is knowledge. Each row represents
one knowledge entry with the following field groups:
Identity and content:

- title (string), body (optional string), summary (optional string)
- content_hash (string) – for change detection during seed/import
- format – markdown, json, or stele:* variants

Classification (record links):

- category (record<category>) – pattern, technique, insight, gotcha, reference, decision, bloom, session
- source_type (record<source_type>) – manual, ram, cache, agent_session
- entry_type (record<entry_type>) – primary, summary, synthesis
- content_type (record<content_type>) – text, code, config, data, binary
- source_project, source_agent, session – optional record links

Visibility:

- visibility – public or private (ASSERT constraint)
- owner – agent ID for private entries

Resonance (wake-up cascade):

- resonance (int) – importance level, 1–10 with overflow for transcendent
- resonance_type – foundational, transformative, relational, operational, ephemeral, session
- last_activated (datetime), activation_count (int)
- decay_rate (float, 0.0–1.0) – some memories fade, some do not
- anchors (array<string>) – IDs of related blooms this entry connects to
- wake_phrases (array<string>) – verification phrases for the wake ritual
- wake_order (optional int) – custom sequence position

Embeddings:

- embedding (optional array<float>) – 768-dim vector (BGE-Base-EN-v1.5). For chunked entries, this holds a normalized mean vector of all chunk embeddings (used by auto-anchor).
- embedding_model (optional string), embedded_at (optional datetime)
- chunk_count (int, default 0) – number of embedding chunks. Zero means the entry is unchunked (single embedding). A positive value means the entry was split into overlapping chunks stored in the embedding_chunk table.
Graph relations
SurrealDB’s graph relations replace traditional junction tables:
| Relation table | Direction | Purpose |
|---|---|---|
| | knowledge -> tag | Freeform labels |
| | knowledge -> applicability_type | Scope constraints (language, platform, domain) |
| relates_to | knowledge -> knowledge | Inter-entry graph edges |
| | project -> tag | Project-level tags |
| | project -> applicability_type | Project scope |
The relates_to relation carries a
relationship_type field
(record<relationship_type>) and is uniquely indexed on the
triple (from, to, type). Relationship types are: related,
supersedes, extends, implements, contradicts, example_of.
Lookup tables
Ten lookup tables provide controlled vocabularies:
category, project, agent,
applicability_type, source_type,
entry_type, content_type,
relationship_type, session_type,
tag. Default seed data is applied via
UPSERT in the schema file. Users can extend them
through mx memory categories add,
mx memory agents add, etc.
Full-text search
A simple analyzer (blank + class tokenizers,
lowercase filter) powers BM25 search indexes on title,
body, and summary. Searches via
mx memory search query all three indexes.
Vector search
Embeddings are 768-dimensional float arrays generated by tract-onnx (BGE-Base-EN-v1.5, local inference). The search strategy is brute-force cosine similarity – no HNSW index. This is deliberate at the current scale; the schema comment notes to reconsider when the store exceeds 50K vectors or 100ms query latency.
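
The scan itself runs inside SurrealDB, but what brute-force cosine ranking computes can be sketched in plain Rust. The function names and the `(id, vector)` store layout below are assumptions for illustration only.

```rust
/// Cosine similarity of two vectors. Returns 0.0 when the lengths differ
/// or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Brute-force scan: score every stored vector, sort by score descending,
/// keep the first `limit` hits. Cost is linear in the number of vectors.
fn top_k<'a>(
    query: &[f32],
    store: &'a [(String, Vec<f32>)],
    limit: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = store
        .iter()
        .map(|(id, vector)| (id.as_str(), cosine_similarity(query, vector)))
        .collect();
    scored.sort_by(|x, y| y.1.total_cmp(&x.1));
    scored.truncate(limit);
    scored
}
```

Every query touches every vector, which is why the text ties the decision to store size and latency thresholds rather than treating it as permanent.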
The EmbeddingProvider trait in
embeddings.rs abstracts the embedding backend.
TractProvider is the sole implementation. The model
cache location is controlled by
paths::model_cache_dir().
Two-phase semantic search
Semantic search uses a two-phase strategy to cover both unchunked entries and chunked entries:
- Phase 1a: Query unchunked entries (those with chunk_count <= 0 or absent) by cosine similarity against their embedding field. Returns up to limit results.
- Phase 1b: Query the embedding_chunk table by cosine similarity. Returns up to limit * 3 results (over-fetching for deduplication).

Both queries run in a single SurrealDB request (chained statements).

- Phase 2 (merge): Chunk results are deduplicated by entry_id, keeping the maximum similarity score per entry. For each unique chunk entry, the full knowledge record is fetched (with visibility, category, and resonance filters applied). The unchunked and chunk results are merged into a single scored map: if an entry appears in both result sets, the higher score wins. The final list is sorted by score descending and truncated to limit.
This design means a long entry surfaces in search results if any 400-token section is semantically relevant, rather than only when the mean vector (which averages over all sections) happens to score well.
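
The merge step can be sketched as a max-by-key fold over both result sets. This is an illustration under assumptions: `merge_hits` is a hypothetical name, hits are reduced to `(entry_id, score)` pairs, and the record fetch with visibility, category, and resonance filters is omitted.

```rust
use std::collections::HashMap;

/// Merge phase sketch. `unchunked` holds one (entry_id, score) per entry,
/// `chunks` holds one (entry_id, score) per chunk hit, so the same entry
/// may appear several times.
fn merge_hits(
    unchunked: &[(String, f32)],
    chunks: &[(String, f32)],
    limit: usize,
) -> Vec<(String, f32)> {
    let mut best: HashMap<String, f32> = HashMap::new();
    for (id, score) in unchunked.iter().chain(chunks) {
        // Keep the highest score seen for each entry ID.
        best.entry(id.clone())
            .and_modify(|s| *s = s.max(*score))
            .or_insert(*score);
    }
    let mut merged: Vec<(String, f32)> = best.into_iter().collect();
    // Score descending; ties broken by ID so the order is deterministic.
    merged.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    merged.truncate(limit);
    merged
}
```

Over-fetching chunk hits before this step matters because several of the top chunk rows can collapse into a single entry once they are deduplicated.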
Embedding chunks
The embedding_chunk table stores per-chunk
embeddings for long entries (those exceeding 400 tokens). Each row
represents one chunk of a chunked entry:
- entry_id (string) – the kn- prefixed ID of the parent knowledge entry
- chunk_index (int) – zero-based position within the entry's chunk sequence
- chunk_text (string) – the decoded text of this chunk
- token_offset (int) – token offset from the start of the original text
- token_count (int) – number of tokens in this chunk
- embedding (array<float>) – 768-dim vector for this chunk
- embedding_model (string) – model ID that generated the embedding
- created_at (datetime)
The table is indexed on entry_id (for bulk deletion)
and uniquely indexed on (entry_id, chunk_index) (for
upsert). Chunks are deleted and re-created on every re-embed of the
parent entry. When a knowledge entry is deleted, its chunks are
cleaned up on a best-effort basis.
Chunking parameters: 400 tokens per chunk, 100-token overlap
(stride 300). These are defined in
ChunkConfig::default() in
src/chunking.rs.
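
The window arithmetic behind these parameters can be sketched as follows. The field names on `ChunkConfig` and the `chunk_ranges` helper are assumptions, and the handling of the final short window is one plausible choice, not necessarily what src/chunking.rs does.

```rust
/// Hypothetical mirror of the defaults described above.
struct ChunkConfig {
    chunk_size: usize,
    overlap: usize,
}

impl Default for ChunkConfig {
    fn default() -> Self {
        ChunkConfig { chunk_size: 400, overlap: 100 }
    }
}

/// Returns half-open token ranges (start, end) that cover `total_tokens`.
/// Consecutive windows start `chunk_size - overlap` tokens apart.
fn chunk_ranges(total_tokens: usize, cfg: &ChunkConfig) -> Vec<(usize, usize)> {
    assert!(cfg.overlap < cfg.chunk_size, "overlap must be smaller than chunk size");
    let stride = cfg.chunk_size - cfg.overlap;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total_tokens {
        let end = (start + cfg.chunk_size).min(total_tokens);
        ranges.push((start, end));
        if end == total_tokens {
            break; // the last window reached the end of the text
        }
        start += stride;
    }
    ranges
}
```

With the defaults, a 1000-token entry yields the windows 0..400, 300..700, and 600..1000, so each boundary region is embedded twice.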
Backups
The memory_backup table stores pre-mutation content
snapshots. Before any update, edit, append, prepend, or delete
operation, the current content is written to a backup row. Backups
reference entries by plain string ID (not a record link) so they
survive entry deletion.
Codex archive format
The codex is the session conversation archive.
mx codex archive captures Claude Code sessions from
~/.claude/projects/ into permanent storage at
$MX_HOME/codex/.
Archive directory layout
Each archive is a directory named with the pattern:
{date}_{short-session-id}[_{counter}]
For example: 2026-04-30_abc12345 or
2026-04-30_abc12345_2 for incremental saves.
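
A small sketch of the naming pattern, assuming the short session ID is the first eight characters of the session UUID (inferred from the `abc12345` example, not confirmed by the text). `archive_dir_name` is a hypothetical helper.

```rust
/// Builds `{date}_{short-session-id}[_{counter}]`.
/// Assumption: the short ID is the first 8 characters of the session ID.
fn archive_dir_name(date: &str, session_id: &str, counter: Option<u32>) -> String {
    let short: String = session_id.chars().take(8).collect();
    match counter {
        Some(n) => format!("{date}_{short}_{n}"),
        None => format!("{date}_{short}"),
    }
}
```

Under that assumption, a first save of a session whose ID starts with `abc12345` on 2026-04-30 would produce `2026-04-30_abc12345`, and a later incremental save with counter 2 would produce `2026-04-30_abc12345_2`.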
Inside each archive directory:
{archive}/
    manifest.json # metadata (version, timestamps, counts, checksums)
    session.jsonl # raw session JSONL (unless --clean)
    conversation.md # clean markdown transcript (when --clean or migrated)
    images/ # extracted base64 images (v2+)
        image_001.png
        image_002.png
    agents/ # subagent session JSONLs (when --include subagents)
        agent-{uuid}.jsonl
Manifest
The manifest is a JSON file tracking archive metadata. The
current write version is 5. All fields added since v2 are
Option so older archives deserialize cleanly.
Key fields:
- version – manifest format version (2–5)
- session_id – the Claude session UUID
- archived_at, session_start, session_end – timestamps
- project_path – the working directory of the session
- message_count, agent_count – summary statistics
- agents – array of AgentInfo (id, file, message count)
- size_bytes, checksum – integrity data
- image_count, images – v2: extracted image metadata
- has_clean_transcript – v3: whether conversation.md exists
- user_name, assistant_name – v4: configurable speaker names
- source_breakdown – v5: per-sidecar byte counts
The IncludeSet
The --include flag on mx codex archive
controls which optional source artifacts are captured. It parses a
comma-separated string into a struct with boolean fields:
- subagents (default: true) – capture subagent session JSONLs
- mcp – capture MCP server logs
- tool_output – capture /tmp tool outputs
- history – capture history.jsonl slice
- all / none – shortcuts
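
A parser of this shape could look roughly like the sketch below. Several details are assumptions rather than documented behaviour: the non-subagent defaults being false, an explicit list starting from an empty set, and unknown tokens being rejected. The real include.rs may differ on each point.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct IncludeSet {
    subagents: bool,
    mcp: bool,
    tool_output: bool,
    history: bool,
}

impl Default for IncludeSet {
    /// Used when --include is absent. Only `subagents: true` is stated in
    /// the text; the other defaults are assumed.
    fn default() -> Self {
        IncludeSet { subagents: true, mcp: false, tool_output: false, history: false }
    }
}

impl IncludeSet {
    const NONE: IncludeSet =
        IncludeSet { subagents: false, mcp: false, tool_output: false, history: false };
    const ALL: IncludeSet =
        IncludeSet { subagents: true, mcp: true, tool_output: true, history: true };

    /// Parses a comma-separated list such as "mcp,history".
    /// Assumption: an explicit list enables exactly the named sources.
    fn parse(spec: &str) -> Result<IncludeSet, String> {
        let mut set = IncludeSet::NONE;
        for token in spec.split(',').map(str::trim).filter(|t| !t.is_empty()) {
            match token {
                "all" => set = IncludeSet::ALL,
                "none" => set = IncludeSet::NONE,
                "subagents" => set.subagents = true,
                "mcp" => set.mcp = true,
                "tool_output" => set.tool_output = true,
                "history" => set.history = true,
                other => return Err(format!("unknown --include source: {other}")),
            }
        }
        Ok(set)
    }
}
```

Keeping the parsed result as a plain struct of booleans lets the archive writer check `set.mcp` or `set.history` directly instead of re-parsing strings.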
Source walkers
The archive pipeline uses source walkers to discover files for
capture. Currently sources.rs implements subagent
discovery (find_agent_sessions). The other source types
(MCP, tool-output, history) are declared in the
IncludeSet but their walkers are pending implementation
in future PRs.
KV store format
The KV store (src/kv.rs) is a lightweight local
state engine for agents. No networking, no database – just a TOML
schema file and a JSON data file per agent.
Schema (TOML)
Each agent’s schema lives at
$MX_HOME/kv/schema/{agent}.toml and declares the keys,
types, constraints, and defaults:
[keys.commit_count]
type = "counter"
min = 0
[keys.recent_files]
type = "history"
max_entries = 50
[keys.current_task]
type = "string"
default = ""
[keys.focus_areas]
type = "list"
description = "Areas of active focus"
[keys.session_state]
type = "state"
fields = ["mode", "context", "priority"]

Supported types:

| Type | Behavior |
|---|---|
| counter | Integer with optional |
| string | Simple string value. Supports |
| history | Timestamped append-only log with optional |
| list | Ordered list with timestamps. Supports |
| state | Named fields (like a struct). Supports single-field set ( |
Data (JSON)
The data file at $MX_HOME/kv/data/{agent}.json holds
current values. All writes are atomic: serialize to a temp file,
fsync, rename. The format is a flat JSON object keyed by the key
names from the schema.
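
The write sequence (temp file, fsync, rename) can be sketched with the standard library. `atomic_write` and the `.tmp` suffix are assumptions; the text does not name the helper or the temp file.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes `bytes` to `path` so readers see either the old file or the new
/// one, never a partial write. The sibling `.tmp` name is an assumption.
fn atomic_write(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Place the temp file next to the target so both share a filesystem.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);
    {
        let mut file = File::create(&tmp)?;
        file.write_all(bytes)?;
        // Flush contents and metadata to disk before the rename.
        file.sync_all()?;
    }
    // Replace the target in one step.
    fs::rename(&tmp, path)
}
```

The rename only gives an all-or-nothing swap when the temp file and the target live on the same filesystem, which is why the sketch creates the temp file beside the target.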
History and list entries are stored as objects with the on-disk
fields id (numeric sequence number), hash (stable entry ID),
value, ts, an optional data field (arbitrary JSON object for
structured metadata), and an optional memory field (a kn- ID
linking the entry to a knowledge node in the memory graph). The
Rust structs use different names: the numeric sequence number is
the index field (serialized as id on disk) and the stable base58
identifier is the id field (serialized as hash on disk). The
on-disk names are preserved via serde(rename) for backward
compatibility, so no data migration is needed. The entry ID is a
short base58 string generated from blake3(key + timestamp + index)
via base-d, providing a stable identifier independent of numeric
ordering. The id (entry ID), data, and memory fields all use
#[serde(default)] for backward compatibility: files written before
these fields existed are back-filled on first load (IDs are
generated, data and memory default to None) and saved
automatically.
Schema mutation
The KvStore struct holds a schema_path
field alongside the existing data_path. The
add_key_to_schema() method validates the key name
(alphanumeric, underscores, hyphens; max 128 chars; no dots),
appends a [keys.<name>] block to the TOML file
without reformatting existing content, and re-parses the file to
update the in-memory Schema. This is exposed through
push --create <type> at the CLI layer, where the
handler calls add_key_to_schema before the normal push
path. If the key already exists, the method is a no-op.
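
The key-name rules can be written as a small validator. This is a sketch: `validate_key_name` is a hypothetical name, and two details are assumptions, namely that "alphanumeric" means ASCII and that an empty name is rejected.

```rust
/// Checks a KV key name against the rules in the text: letters, digits,
/// underscores, and hyphens only; at most 128 characters. Dots are
/// rejected by the character check.
fn validate_key_name(name: &str) -> Result<(), String> {
    if name.is_empty() {
        // Assumption: the text does not say how empty names are handled.
        return Err("key name must not be empty".to_string());
    }
    if name.chars().count() > 128 {
        return Err("key name exceeds 128 characters".to_string());
    }
    let bad = name
        .chars()
        .find(|c| !(c.is_ascii_alphanumeric() || *c == '_' || *c == '-'));
    match bad {
        Some(c) => Err(format!("invalid character {c:?} in key name")),
        None => Ok(()),
    }
}
```

A plausible reason for rejecting dots is that the name is spliced into a `[keys.<name>]` TOML header, where a dot would start a nested table.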
The rename_key() method moves a key from one name to
another in both the schema and data files. It validates the new
name, checks that the old key exists and the new key does not, then
atomically swaps the in-memory entries before persisting. Data is
written first (higher-value file), then schema. If the data write
fails, in-memory mutations are rolled back. Entry IDs are stable
across renames – they were hashed from the original key name at
creation time and are never regenerated. This is exposed through
mx kv rename <old> <new> at the CLI
layer.
Per-agent keying
The active agent is determined by the
MX_CURRENT_AGENT environment variable. Schema and data
files are resolved via paths::kv_schema_path(agent) and
paths::kv_data_path(agent). The path resolution
includes a legacy fallback to ~/.crewu/kv/ for
migration purposes.
Memory pointers
KV keys can optionally link to a knowledge entry in the SurrealDB
store via a kn- ID reference. This allows an agent to
associate fast local state with richer knowledge graph entries. The
--memory flag on get, last,
since, search, random, and
dump resolves these references and displays the linked
entry.
Memory links exist at two levels: key-level (one pointer per key)
and per-entry (one pointer per history or list entry). Per-entry
links are set via push --memory at creation time or
set --id --memory on existing entries. When resolving,
per-entry memory wins over a legacy kn- value prefix,
which wins over the key-level fallback. The SearchHit
struct (returned by last, random,
search, since, and get --id)
carries the per-entry memory field for the handler to
resolve.
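
The resolution order can be expressed as a chain of fallbacks. `resolve_memory` is a hypothetical helper, and treating a value that starts with `kn-` as the ID itself is an assumption about what the legacy prefix means.

```rust
/// Precedence from the text: per-entry memory, then a legacy `kn-` value
/// prefix, then the key-level pointer.
fn resolve_memory<'a>(
    entry_memory: Option<&'a str>,
    value: &'a str,
    key_memory: Option<&'a str>,
) -> Option<&'a str> {
    entry_memory
        // Assumption: a legacy entry stores the knowledge ID as its value.
        .or_else(|| value.starts_with("kn-").then_some(value))
        .or(key_memory)
}
```

Each step is consulted only when the one before it yields nothing, so setting a per-entry link never requires clearing the key-level pointer.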
SearchHit derives serde::Serialize to
support the --json output flag. The serialized field
names are the Rust struct names (index,
id, value, ts,
data, memory) – deliberately different
from the on-disk serde(rename) aliases used by
HistoryEntry and ListEntry. The
data and memory fields use
#[serde(skip_serializing_if = "Option::is_none")] so
they are omitted from JSON output when not set.
Base-d integration
The base-d crate (version 3) provides the encoding
layer. It is used in three places:
commit.rs – the encoding pipeline
When mx commit runs:
1. get_staged_diff() captures the output of git diff --staged.
2. encode_hash_with_registry() hashes the diff bytes with a random hash algorithm and encodes the hash through a random dictionary. This produces the commit title.
3. encode_compress_with_registry() compresses the commit message with a random compression algorithm and encodes the compressed bytes through a second random dictionary. This produces the commit body.
4. A footer tag is assembled: [hash_algo:title_dict|compress_algo:body_dict].
5. If both dictionaries are the same (dejavu), the marker whoa. is appended.
6. All parts are validated for unsafe characters (NUL, C0/C1 controls). If validation fails, the entire encode is retried with freshly rolled dictionaries, up to 5 attempts.
7. git_commit() writes the three-part message (title, body, footer) as the commit message.
The EncodedCommit struct captures all parts:

pub struct EncodedCommit {
    pub title: String,
    pub body: String,
    pub footer: String,
    pub dejavu: bool,
    pub title_dict: String,
    pub body_dict: String,
}

handlers/mod.rs – the decoding pipeline
mx log uses a four-phase architecture:
1. Parse – raw CLI arguments (received as trailing varargs) are parsed into a structured LogOptions with separate fields for count, display mode (Compact, Full, Oneline, format presets, or custom format string), diff mode (None, Stat, ShortStat, Patch), decorate preference, and filter arguments. Custom --format strings and --graph are detected here and trigger a passthrough to raw git log with a stderr note.
2. Harvest – a single git log call with a structured format string retrieves commit metadata (full hash, short hash, decorations, parents, author, date, committer, commit date, subject, body). Each commit body is decoded via try_decode_commit_body().
3. Attach diffs – if a diff mode was requested, a second git log call retrieves the diff output. Each diff block is matched to its corresponding commit by hash and attached as a string field.
4. Render – the display mode selects a renderer. Each renderer prints the decoded message with the appropriate header format, followed by any attached diff output.
The -n/--count and --full
flags are not clap-managed – they are parsed internally from the
trailing varargs, following the same pattern as
mx show.
try_decode_commit_body() scans for the last
footer-shaped line (validated against the known compression
algorithm vocabulary). Everything above the footer is the encoded
payload; everything below is trailing content (dejavu markers,
user-appended notes). commit::decode_body() looks up
the dictionary from the footer, decodes, and decompresses. The scan
uses a “last wins” heuristic: if multiple footer-shaped lines appear
(e.g., from a user-amended commit that quotes a prior footer), the
last one is used.
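
A rough sketch of the "last wins" scan, using the footer shape `[hash_algo:title_dict|compress_algo:body_dict]` given earlier. The algorithm vocabulary, the `Footer` struct, and both function names are hypothetical; the real check is driven by the compression algorithms base-d supports.

```rust
/// Hypothetical vocabulary for the sketch; not the real base-d list.
const KNOWN_COMPRESS_ALGOS: &[&str] = &["gzip", "zstd", "brotli", "lz4"];

#[derive(Debug, PartialEq)]
struct Footer<'a> {
    hash_algo: &'a str,
    title_dict: &'a str,
    compress_algo: &'a str,
    body_dict: &'a str,
}

/// Parses one line as a footer. Lines whose compression algorithm is not
/// in the vocabulary are rejected, which filters out bracketed user text.
fn parse_footer(line: &str) -> Option<Footer<'_>> {
    let inner = line.trim().strip_prefix('[')?.strip_suffix(']')?;
    let (left, right) = inner.split_once('|')?;
    let (hash_algo, title_dict) = left.split_once(':')?;
    let (compress_algo, body_dict) = right.split_once(':')?;
    if !KNOWN_COMPRESS_ALGOS.iter().any(|a| *a == compress_algo) {
        return None;
    }
    Some(Footer { hash_algo, title_dict, compress_algo, body_dict })
}

/// Splits a commit body at the last footer-shaped line and returns
/// (payload above, footer, trailing content below).
fn split_at_last_footer(body: &str) -> Option<(String, Footer<'_>, String)> {
    let lines: Vec<&str> = body.lines().collect();
    let idx = lines.iter().rposition(|l| parse_footer(l).is_some())?;
    let footer = parse_footer(lines[idx])?;
    Some((lines[..idx].join("\n"), footer, lines[idx + 1..].join("\n")))
}
```

Scanning from the end means an earlier quoted footer becomes part of the payload region instead of being mistaken for the real one.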
handle_show() uses a two-pass approach: Pass 1
retrieves commit metadata and the encoded message (with
--no-patch), decodes it, and prints the header. Pass 2
retrieves the diff output (with --format="") and
streams it as-is. Passthrough detection skips decoding entirely for
ref:path syntax (file content viewing) and
--format/--pretty (user-controlled
output).
commit.rs – PR merge encoding
mx pr merge follows the same pipeline but sources
the diff from gh pr diff and the message from the PR
title and body. The encoded message is passed to
gh pr merge --subject ... --body ....
knowledge.rs – content hashing
KnowledgeEntry uses base-d’s hash encoding for
content hashing (via base_d::hash and
base_d::encode), producing the
content_hash field used for change detection during
seed/import operations.
Testing patterns
The _with() seam
The primary testing pattern in the codebase is the
_with() test seam described in Path management. Any function that reads
from the environment or calls dirs::home_dir() is split
into:
- A _with(...) variant that takes all external inputs as parameters (pure function).
- A public wrapper that reads the environment and delegates.
Tests call the _with variant directly, avoiding all
process-global state. This means the test suite runs safely in
parallel without #[serial] except for the handful of
tests that must observe the public wrapper’s env-var behavior.
serial_test
Tests that mutate process environment (e.g., clearing
MX_CLAUDE_PROJECTS_DIR to observe the default fallback)
are marked with #[serial] from the
serial_test crate. These are a small minority – the
_with() pattern eliminates the need for serialization
in most cases.
proptest
The proptest crate is available in dev-dependencies
for property-based testing. It is used selectively where input
domains are large (e.g., Unicode boundary testing for
safe_truncate).
Round-trip encoder tests
The try_decode_commit_body_tests module in
handlers/mod.rs tests the encode-decode round trip by
calling encode_commit() with known inputs and verifying
that try_decode_commit_body() recovers the original
message. An encode_until helper retries encoding with
different random dictionaries until a predicate is satisfied (e.g.,
dejavu vs. non-dejavu), filtering out dictionary/codec pairings that
produce unsafe output or fail round-trip.
KV store tests
The KV engine uses the same _with() approach for
path resolution (resolve_kv_path_with). Store tests
operate on temp directories and never touch the user’s real
~/.mx/kv/ state.
SurrealDB integration tests
The surreal_db/tests.rs module contains integration
tests that open a temporary embedded SurrealKV database, apply the
schema, and exercise the full KnowledgeStore trait
surface. Each test gets an isolated database directory.