Search engine analysis · 2026-04-27

fr vs pickbrain

Two CLI tools that index your AI coding sessions and call themselves "search". One does keyword matching. The other does meaning matching.

Both tools point at the same corpus — Claude, Codex, and other coding agent transcripts. Both call themselves "search". But only one of them does semantic search. fr is a Tantivy BM25 engine with edit-distance-1 fuzzy fallback — purely lexical. pickbrain is a multi-vector neural retriever built on Stanford's XTR-WARP — actually semantic.

Two different problems wearing the same shape.

fr

fast-resume — Python + Tantivy · keyword search

  Backend: Tantivy — Rust BM25 engine (Lucene-style)
  Storage: Tantivy index directory
  Query: BM25 (boosted ×5) ⊕ fuzzy term query (edit distance 1, prefix mode)
  Filters: agent:, dir:, date: with ! negation and comma-OR
  Adapters: Claude · Codex · Copilot CLI · Copilot VS Code · Crush · OpenCode · OMP · Pi · Vibe
  Model: none — pure lexical
pickbrain

witchcraft — Rust + Candle (XTR-WARP) · semantic + hybrid

  Backend: Witchcraft — clean-room reimplementation of Stanford's XTR-WARP
  Storage: single SQLite file with FTS5 + 4-bit packed residual blobs
  Query: T5 encoder → 128-dim per-token vectors → MaxSim over centroid-pruned candidates → optional FTS5 fusion via RRF
  Filters: --session UUID, programmatic SQL filter
  Adapters: Claude Code · Codex · Slack history
  Model: Google DeepMind's XTR — T5 encoder, GGUF-quantized; Metal/OpenVINO/fbgemm backends
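
That "4-bit packed residual blobs" line is the WARP-style storage trick: each token vector is stored as a centroid ID plus a coarsely quantized residual. A minimal numpy sketch of the idea (the symmetric scale and int8 staging are illustrative assumptions, not witchcraft's actual on-disk layout):

```python
import numpy as np

def quantize_residual(vec, centroid):
    """Encode vec as a 4-bit residual around its nearest centroid."""
    residual = vec - centroid
    scale = max(float(np.abs(residual).max()) / 7.0, 1e-12)  # map to [-7, 7]
    codes = np.clip(np.round(residual / scale), -8, 7).astype(np.int8)
    return codes, scale  # on disk, two 4-bit codes would be packed per byte

def dequantize(codes, scale, centroid):
    return centroid + codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
vec = rng.normal(size=128).astype(np.float32)
centroid = vec + rng.normal(scale=0.1, size=128).astype(np.float32)  # nearby k-means centroid
codes, scale = quantize_residual(vec, centroid)
print(float(np.abs(vec - dequantize(codes, scale, centroid)).max()))  # small reconstruction error
```

At 128 dims that is 64 bytes of residual plus a centroid ID per token, versus 512 bytes for raw f32, which is the kind of compression that lets a whole transcript corpus live in one SQLite file.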

What happens between query and result.

fr · query path

01 · parse_query()
  Strip agent:, dir:, and date: keywords from the input; the remainder is free text, fed to two branches joined by boolean OR (both lexical):
  • BM25 ×5: Tantivy parse_query over title + content, boosted
  • Fuzzy term: edit distance ≤ 1, prefix=true, per token
02 · Apply filters
  Term-set / regex / range queries on the agent, dir, and timestamp fields.
03 · Sort & return
  Auto: empty query → newest, otherwise relevance. Or explicit newest / oldest / relevance.
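
The BM25 branch is classic Okapi scoring over term statistics. A self-contained sketch of the formula (k1 = 1.2 and b = 0.75 are the conventional defaults; Tantivy's real implementation is Rust, this is only the math):

```python
import math
from collections import Counter

def bm25(query_terms, doc_tokens, corpus, k1=1.2, b=0.75):
    """Classic Okapi BM25 of one document against a bag of query terms."""
    n, avgdl = len(corpus), sum(len(d) for d in corpus) / len(corpus)
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc_tokens) / avgdl))
    return score

corpus = [d.split() for d in ("resolved JWT validator bypass",
                              "refactored the Zustand store")]
print(bm25("auth middleware fix".split(), corpus[0], corpus))  # 0.0: no shared tokens
print(bm25("JWT validator".split(), corpus[0], corpus))        # > 0: lexical overlap
```

The zero on the first line is the vocabulary-mismatch story in miniature: the fuzzy branch rescues near-spellings at edit distance 1, never paraphrases.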

pickbrain · query path

01 · embedder.embed(q)
  Tokenize → T5 encoder forward pass → drop low-norm tokens (norm < 1.0) → L2-normalize → matrix [n_tokens × 128]. The embedded query then feeds two branches:
  • match_centroids(): top-32 centroids per query token; ≤ 40k candidates; decompress 4-bit residuals; MaxSim aggregate
  • fulltext_search(): SQLite FTS5 BM25 (hybrid mode only)
  ⊕ fused via Reciprocal Rank Fusion (k = 60)
02 · Apply SQL filter
  Optional rowid filter via a temporary-table join.
03 · Hydrate & return
  Score-sorted document chunks with a sub-index for span highlighting.
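
The scoring core is compact: for each query token, take its best dot product against the chunk's token vectors, then sum those maxima. A minimal numpy rendering of MaxSim (witchcraft runs the same computation over centroid-pruned, dequantized candidates rather than full matrices):

```python
import numpy as np

def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: the best doc-token match per query token, summed.

    Both matrices are row-wise L2-normalized, so dot product = cosine.
    """
    sim = query_vecs @ doc_vecs.T        # [n_query_tokens, n_doc_tokens]
    return float(sim.max(axis=1).sum())  # best match per query token, summed

rng = np.random.default_rng(2)
q = rng.normal(size=(4, 128));   q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(300, 128)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim(q, d))  # one scalar per (query, chunk) pair; chunks are ranked by it
```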

The detail underneath each box.

Search type
  fr: lexical BM25 + 1-edit fuzzy
  pickbrain: semantic XTR multi-vector + optional BM25
Embedding model
  fr: none
  pickbrain: XTR (Google DeepMind), T5 encoder, 128-dim per token
Document representation
  fr: tokenized text via Tantivy's default tokenizer; raw on filter fields
  pickbrain: matrix of token embeddings per chunk; 4-bit packed residuals around k-means centroids
Index structure
  fr: Tantivy inverted index (Lucene-style postings)
  pickbrain: LSM-tree of centroid generations · L0=1024, fanout=16
Scoring algorithm
  fr: BM25 from Tantivy + Jaro-Winkler post-rerank for sub-chunks
  pickbrain: per-token MaxSim (à la ColBERT/XTR), with missing-token imputation for pruned candidates
Hybrid mode
  fr: same engine; "hybrid" = exact ⊕ fuzzy, both branches still lexical
  pickbrain: two engines fused via RRF: semantic + SQLite FTS5 BM25
Long-doc strategy
  fr: whole content field, BM25 length-normalized
  pickbrain: sliding window (2048 tokens, 256 stride); overlapping embeddings summed before normalizing (see the sketch after this table)
Typo tolerance
  fr: yes (fuzzy distance=1, prefix=true)
  pickbrain: yes (via subword tokenization)
Synonym matching
  fr: no
  pickbrain: yes (embedding similarity is the synonym mechanism)
Concept matching
  fr: no
  pickbrain: yes (token-level late interaction)
Filter syntax
  fr: agent:claude,!codex dir:proj date:<1d (rich, native)
  pickbrain: --session UUID + SQL filter (sparse, programmatic)
Build cost
  fr: low (no model, no GPU)
  pickbrain: one T5 encoder pass per chunk (Metal/OpenVINO accelerates)
Query latency
  fr: sub-millisecond Tantivy lookup
  pickbrain: ~21 ms p95 on NFCorpus (M2 Max), end-to-end incl. encoder
Quality benchmark
  fr: not measured against IR benchmarks
  pickbrain: ~0.33 NDCG@10 on NFCorpus (matches reference XTR-WARP)
Source corpora
  fr: 9 coding agents
  pickbrain: Claude Code, Codex, Slack history
UI
  fr: Textual TUI with live indexing & preview
  pickbrain: plain CLI; piped output
Operational footprint
  fr: tiny (pip install, runs anywhere)
  pickbrain: ~250 MB T5 weights · Cargo + Make build · GPU helpful
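
For the long-doc row above: a sketch of the sliding-window bookkeeping, assuming a per-token encoder pass is available as encode(), a hypothetical stand-in for the T5 forward pass:

```python
import numpy as np

def embed_long(tokens, encode, window=2048, stride=256, dim=128):
    """Overlapping-window embedding: tokens covered by several windows get
    their vectors summed, then every row is L2-normalized (as in the table)."""
    acc = np.zeros((len(tokens), dim), dtype=np.float32)
    for start in range(0, len(tokens), stride):
        piece = tokens[start:start + window]
        acc[start:start + len(piece)] += encode(piece)  # [len(piece), dim]
        if start + window >= len(tokens):
            break  # tail fully covered
    return acc / np.maximum(np.linalg.norm(acc, axis=1, keepdims=True), 1e-9)

# toy encoder: random vectors stand in for the T5 pass
rng = np.random.default_rng(1)
encode = lambda piece: rng.normal(size=(len(piece), 128)).astype(np.float32)
print(embed_long(list(range(5000)), encode).shape)  # (5000, 128)
```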

Same query, different results.

Each row pairs a query you might type while looking for a past session with the wording the transcript actually contains. Where the query and the transcript share no common tokens, only embedding-space similarity can recover the match.

"auth middleware fix" → "resolved JWT validator bypass"
  fr: miss (no shared tokens) · pickbrain: hit (embedding similarity)
"react state management" → "refactored the Zustand store"
  fr: miss · pickbrain: hit
"off-by-one" → "the loop terminated one iteration early"
  fr: miss · pickbrain: hit
"that thing about caching" → "memoized the embedder output across calls"
  fr: miss · pickbrain: hit
"auth midware" → "authentication middleware"
  fr: hit (fuzzy, edit distance 1) · pickbrain: hit (subword tokens)
"getUserById" → "defined getUserById in user.service.ts"
  fr: hit (exact term, BM25 is fast) · pickbrain: hit (via the hybrid FTS5 path)
"agent:claude dir:foo api auth" → filter syntax, not transcript wording
  fr: native (parse_query handles the keywords) · pickbrain: partial (SQL filter only, no DSL)
(empty query) → browse mode, list everything
  fr: native (newest-first sort) · pickbrain: n/a (the embedder needs ≥ 4 chars)
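
On the fr side, the miss/hit column reduces to a set intersection. A quick check over rows from this table (ignoring fr's fuzzy branch, which additionally rescues one-typo forms like "auth midware"):

```python
pairs = [
    ("auth middleware fix", "resolved JWT validator bypass"),
    ("react state management", "refactored the Zustand store"),
    ("getUserById", "defined getUserById in user.service.ts"),
]
for query, transcript in pairs:
    shared = set(query.lower().split()) & set(transcript.lower().split())
    print(f"{query!r}: {'hit' if shared else 'miss'} {sorted(shared)}")
# 'auth middleware fix': miss []
# 'react state management': miss []
# 'getUserById': hit ['getuserbyid']
```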

Reach for which one, when.

fr when you remember the literal words
  • You remember an exact identifier — getUserById, RUST_BACKTRACE, a CLI flag.
  • You want to slice by metadata. The agent:claude,!codex dir:proj date:<1d grammar is fluent and has no equivalent on the pickbrain side (a toy parser for it follows this list).
  • You want broad agent coverage — nine adapters out of the box.
  • You want to scroll, skim, and live-preview. The Textual TUI is built for that.
  • You don't want to set up a Rust toolchain, download model weights, or use any GPU.
  • You don't want false positives from semantic drift cluttering the result list.
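
A toy parser for that filter grammar, to make the negation and comma-OR semantics concrete (the function and names are illustrative, not fr's actual source):

```python
import re

FILTER_KEYS = ("agent", "dir", "date")

def parse_query(raw):
    """Split 'agent:claude,!codex dir:proj auth' into filters + free text."""
    filters, free = {}, []
    for token in raw.split():
        m = re.match(rf"({'|'.join(FILTER_KEYS)}):(\S+)", token)
        if m:
            key, spec = m.groups()
            # comma joins OR-alternatives; a leading ! negates that value
            filters[key] = [(v.lstrip("!"), v.startswith("!")) for v in spec.split(",")]
        else:
            free.append(token)
    return filters, " ".join(free)

print(parse_query("agent:claude,!codex dir:proj date:<1d auth midware"))
# ({'agent': [('claude', False), ('codex', True)],
#   'dir': [('proj', False)],
#   'date': [('<1d', False)]}, 'auth midware')
```
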
pickbrain when you remember the meaning
  • You phrase the query conceptually — "the auth thing", "where I fixed the caching bug", "the React state refactor".
  • The transcript may use a different vocabulary than you do now (synonyms, paraphrases, alternate framings).
  • You want a single ranked list combining semantic + BM25 evidence. RRF hybrid mode delivers that natively.
  • You're calling search from inside Claude Code or Codex as a skill — programmatic, not interactive.
  • You want token-level highlighting that points at the matching span inside a long chunk.
  • You're OK paying ~21 ms per query and ~250 MB of model weights on disk.

Different tools for different recalls.

Calling fr a "semantic search" engine overstates it — it's a polished BM25-with-fuzzy index over a wide range of agent transcripts, with the richer filter syntax of the two and the only TUI. What it cannot do is bridge vocabulary mismatch: if your query and the transcript share no tokens, fr never finds the match.

pickbrain, riding on Witchcraft's XTR-WARP reimplementation, does actual neural multi-vector retrieval. MaxSim aggregation over per-token embeddings is the architecture that gives ColBERT and XTR their strong scores on IR benchmarks; WARP centroid pruning + 4-bit residual storage keeps it under 25 ms end-to-end on a laptop. Hybrid mode adds BM25 evidence via RRF, so you don't lose lexical precision for exact identifiers.
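
Reciprocal Rank Fusion itself is a one-liner per list: with the quoted k = 60, the fused score of a document is the sum of 1/(60 + rank) over every ranked list it appears in. A sketch:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank(d)) per list."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["c17", "c04", "c99"]      # MaxSim order
lexical  = ["c04", "c88", "c17"]      # FTS5 BM25 order
print(rrf_fuse([semantic, lexical]))  # ['c04', 'c17', 'c88', 'c99']
```

Because only ranks matter, RRF never has to reconcile MaxSim scores with BM25 scores on a shared scale, which is exactly why it works as glue between two unrelated engines.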

Recommendation: if you want the best semantic search across AI sessions specifically, pickbrain isn't merely better — it's the only contender. Use fr for fast scrolling, exact-identifier lookups, and metadata slicing across all your coding agents. The two tools answer different questions over the same corpus.