Deterministic retrieval glossary
Plain-language definitions for the terms behind deterministic retrieval, RAG, and AI agents, and how they connect to ColdState. Each entry links to the guide that goes deeper.
Approximate nearest neighbor (ANN)
The search technique vector databases use to find the closest embeddings to a query without comparing against every entry. It trades exactness for speed — which means results can vary between runs or index builds, one reason embedding search is non-deterministic. See: RAG alternatives.
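The speed-for-exactness trade can be illustrated with a toy example. This is a minimal sketch, not how any real vector database works: it replaces a proper ANN index with a deliberately crude stand-in (searching a random subset of the data), and the names `exact_nearest` and `sampled_nearest` are made up for illustration. It shows that skipping comparisons can return a different, possibly worse, neighbor depending on the seed.

```python
import random

def exact_nearest(query, points):
    # Exact search: compare the query against every point.
    return min(points, key=lambda p: abs(p - query))

def sampled_nearest(query, points, sample_size, seed):
    # Crude stand-in for an ANN index (an assumption for illustration):
    # only a random subset is inspected, so the answer depends on the seed.
    rng = random.Random(seed)
    candidates = rng.sample(points, sample_size)
    return min(candidates, key=lambda p: abs(p - query))

points = [float(i) for i in range(100)]
query = 41.6

exact = exact_nearest(query, points)
approx_by_seed = {
    seed: sampled_nearest(query, points, sample_size=10, seed=seed)
    for seed in range(5)
}

print("exact:", exact)
for seed, found in approx_by_seed.items():
    print(f"seed {seed}: {found}")
```

The exact search always returns the same point. The sampled search is never closer than the exact answer and may differ from one seed to the next, which is the kind of run-to-run variation described above.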
Content hash
A fingerprint of a knowledge entry's exact text. ColdState returns a content_hash with every fetched or cited fact, so an AI agent can store it and later call verify to confirm the source hasn't changed — the basis of reproducible, checkable citations. See: The AI for AI.
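The store-then-verify idea can be sketched locally. This example assumes SHA-256 over the UTF-8 text, which is a common choice but not confirmed as ColdState's algorithm; the `content_hash` and `verify` functions here are local illustrations, not the ColdState API.

```python
import hashlib

def content_hash(text: str) -> str:
    # Assumption: SHA-256 of the exact UTF-8 bytes. The real service
    # may use a different algorithm or normalization.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def verify(current_text: str, stored_hash: str) -> bool:
    # True only if the text is byte-for-byte what was originally cited.
    return content_hash(current_text) == stored_hash

original = "Water boils at 100 degrees Celsius at sea level."
stored = content_hash(original)  # the agent keeps this next to its citation

unchanged = verify(original, stored)
changed = verify(original + " (edited)", stored)

print("unchanged source verifies:", unchanged)
print("edited source verifies:", changed)
```

Because the hash covers the exact text, even a one-character edit to the source makes verification fail.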
Cross-domain retrieval
Searching across many subject areas at once rather than a single siloed corpus, and surfacing the connections between them. ColdState searches 35 domains together and flags when a related entry comes from a different field. See: Silo vs Cross-Domain.
Deterministic retrieval
A search method that returns the same ranked results for the same query against the same version of the knowledge base, every time: no embeddings, no approximate search, no drift between runs. Because it is reproducible, results can be cached, snapshotted, and audited. See: Deterministic retrieval.
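One way to get this property is a fixed scoring rule plus an explicit tie-break. The sketch below is a toy term-overlap ranker, not ColdState's scoring method; the entry ids and texts are invented for illustration. The point is that the ranking depends only on the query and the entries, never on insertion order or randomness.

```python
def score(query: str, text: str) -> int:
    # Toy relevance: count words in the text that also appear in the query.
    query_terms = set(query.lower().split())
    return sum(1 for word in text.lower().split() if word in query_terms)

def search(query: str, entries: dict) -> list:
    # Sort by score (descending), then by entry id, so ties always
    # resolve the same way regardless of how entries were stored.
    ranked = sorted(
        entries.items(),
        key=lambda item: (-score(query, item[1]), item[0]),
    )
    return [entry_id for entry_id, text in ranked if score(query, text) > 0]

entries = {
    "b2": "vector search uses embeddings",
    "a1": "deterministic search returns stable results",
    "c3": "stable search results can be cached",
}

first = search("stable search results", entries)
# Same entries in reversed storage order: the ranking must not change.
second = search("stable search results", dict(reversed(list(entries.items()))))

print(first)
print(second)
```

Entries `a1` and `c3` tie on score, and the id tie-break puts `a1` first in both runs.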
Embedding
A numeric vector a neural model produces to represent the meaning of text. Embedding search compares these vectors to find similar content. It is powerful but probabilistic: embeddings change with model versions, so rankings can shift. ColdState scores relevance without a query-time embedding step. See: Deterministic retrieval.
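The way rankings shift with model versions can be shown with cosine similarity, a standard way to compare embeddings. The vectors below are hand-written toy values standing in for the output of two hypothetical model versions; no real embedding model is involved.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    # Highest similarity first; ties broken by document name.
    return sorted(doc_vecs, key=lambda name: (-cosine(query_vec, doc_vecs[name]), name))

query_vec = [1.0, 0.0]

# Toy embeddings of the same two documents under two hypothetical model versions.
docs_v1 = {"doc_a": [0.9, 0.1], "doc_b": [0.6, 0.8]}
docs_v2 = {"doc_a": [0.2, 0.9], "doc_b": [0.8, 0.3]}

ranking_v1 = rank(query_vec, docs_v1)
ranking_v2 = rank(query_vec, docs_v2)

print("model v1:", ranking_v1)
print("model v2:", ranking_v2)
```

The documents and the query are unchanged, yet the order flips because the vectors changed.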
Grounding
Giving a language model real, retrieved facts to base its answer on instead of relying on its parameters alone. Grounding is how you reduce hallucination — and why a retrieval tool is one of the most valuable things an AI agent can call. See: Knowledge vs Gap.
Hallucination
When a language model produces fluent, confident text that isn't true — typically because it had no grounded source and filled the gap with a guess. Deterministic retrieval with verifiable citations is one defense. See: Knowledge vs Gap.
KB snapshot
A version fingerprint of ColdState's knowledge base. Pin a kb_snapshot and the same query returns identical results, so an agent can reproduce an entire run — or audit it later. See: The AI for AI.
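Pinning a snapshot is also what makes caching safe. The sketch below keys a cache on the pair (snapshot, query); it is an illustration only. `fake_search` is a hypothetical stand-in for a real search call, and the snapshot strings are invented, not real `kb_snapshot` values.

```python
class SnapshotCache:
    """Cache search results per (kb_snapshot, query) pair."""

    def __init__(self, search_fn):
        self._search_fn = search_fn
        self._store = {}
        self.backend_calls = 0

    def search(self, query: str, kb_snapshot: str) -> list:
        key = (kb_snapshot, query)
        if key not in self._store:
            # Cache miss: ask the backend once and remember the answer.
            self.backend_calls += 1
            self._store[key] = self._search_fn(query, kb_snapshot)
        return self._store[key]

def fake_search(query: str, kb_snapshot: str) -> list:
    # Hypothetical backend: results depend only on the query and snapshot.
    return [f"{kb_snapshot}/{query}/result-{i}" for i in range(3)]

cache = SnapshotCache(fake_search)

run_1 = cache.search("content hash", kb_snapshot="snap-A")
run_2 = cache.search("content hash", kb_snapshot="snap-A")  # served from cache
run_3 = cache.search("content hash", kb_snapshot="snap-B")  # new snapshot, new lookup

print(run_1 == run_2, cache.backend_calls)
```

Including the snapshot in the key means a cached answer is never reused after the knowledge base version changes.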
Knowledge API
An API that gives applications and AI agents structured access to a knowledge base. ColdState's Knowledge API exposes 48.4M entries across 35 domains over REST and MCP, with deterministic search and citation tools. See: API docs.
MCP (Model Context Protocol)
An open standard for connecting AI assistants to external tools and data. An MCP server advertises tools a model can call with structured arguments — letting assistants like Claude search, fetch, and verify instead of guessing. See: What is an MCP server?.
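A tool call with structured arguments looks roughly like the message built below. MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method; the tool name `search` and its `query` and `limit` arguments are assumptions for illustration, not a documented ColdState tool schema.

```python
import json

def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    # JSON-RPC 2.0 request asking an MCP server to run one of its tools.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Hypothetical tool name and arguments, chosen only for this example.
request = build_tool_call(
    request_id=1,
    tool_name="search",
    arguments={"query": "approximate nearest neighbor", "limit": 5},
)

wire_message = json.dumps(request)
print(wire_message)
```

A real client would first list the server's tools to learn their names and argument schemas before sending a call like this.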
RAG (retrieval-augmented generation)
A pattern where a system retrieves relevant documents and feeds them to an LLM to ground its answer. Classic RAG uses an embedding model, a vector database, and often a reranker — a probabilistic stack with per-query inference cost. See: RAG alternatives.
Reproducibility
The property that the same inputs always produce the same outputs. For AI agents it means stable evals and replayable failures. ColdState supports it with a pinnable kb_snapshot and content hashes you can re-verify. See: Deterministic retrieval.
Reranker
A second-stage model that reorders an initial set of search results by relevance. It improves quality but adds latency, cost, and another non-deterministic step to a RAG pipeline. See: RAG alternatives.
Topology states (CRYSTALLINE · FLUID · REACTIVE)
ColdState's signal describing the shape of a result set: CRYSTALLINE for a single tight cluster of matches, FLUID for several relevant groups, REACTIVE for many novel cross-domain connections. It tells an agent how confident or exploratory a match is. See: API docs.
Vector database
A database built to store and search embeddings using approximate nearest-neighbor search. It is the backbone of most RAG systems — and the component a deterministic approach removes. See: RAG alternatives.
See determinism in practice
These ideas are available today as a deterministic search API and MCP server: free to start, no embeddings, no per-query inference.