Corpora
A Corpus represents a document collection that one or more search endpoints query against — for example, a product catalogue, a knowledge base, or a movie database. The corpus is the underlying data; endpoints are the different ways your search infrastructure exposes that data.
A corpus has just two properties: a name and an optional description. Its value comes from what it groups together.
Why corpora matter
The most important effect of a corpus is judgment portability. Judgments and judgment lists are scoped per corpus, not per endpoint, which means:
- A judgment of "Air Zoom Pegasus is highly relevant for 'running shoes'" can be reused across every endpoint that points at the same product corpus.
- When you spin up a new endpoint to test a different ranking model on the same data, its evaluation runs reuse the existing judgments, so the same candidates don't need to be re-judged.
- Imported judgment lists (from logs, prior campaigns, or expert annotation) attach to a corpus and are immediately available to every endpoint sharing it.
A judgment is uniquely keyed by (corpus, query, candidate), so the same query and candidate
in a different corpus is treated as a separate judgment.
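The keying rule above can be sketched with a small in-memory model. This is only an illustration, not the product's storage layer: the `JudgmentKey` type, the 0-3 grade scale, the candidate ID, and the endpoint-to-corpus mapping are all hypothetical.

```python
from typing import Dict, NamedTuple, Optional


class JudgmentKey(NamedTuple):
    """Hypothetical key type mirroring the (corpus, query, candidate) triple."""
    corpus: str
    query: str
    candidate: str


# Hypothetical judgment store; the integer grade scale (0-3) is an assumption.
judgments: Dict[JudgmentKey, int] = {
    JudgmentKey("products-uk", "running shoes", "air-zoom-pegasus"): 3,
}

# Hypothetical mapping of endpoints to the corpus they query.
endpoint_corpus: Dict[str, str] = {
    "prod-elasticsearch": "products-uk",
    "prod-rerank-v2": "products-uk",
    "support-vector": "support-articles",
}


def lookup(endpoint: str, query: str, candidate: str) -> Optional[int]:
    """Resolve the endpoint to its corpus, then look the judgment up by key."""
    key = JudgmentKey(endpoint_corpus[endpoint], query, candidate)
    return judgments.get(key)


# Both endpoints on products-uk see the same judgment; support-vector does not.
for name in endpoint_corpus:
    print(name, lookup(name, "running shoes", "air-zoom-pegasus"))
```

Because the endpoint never appears in the key, adding another endpoint on `products-uk` needs no new judgments, while the same query and candidate under `support-articles` would be a separate entry.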
Modelling your corpora
A common pattern: one corpus per logical document collection, multiple endpoints per corpus. For example:
| Corpus | Endpoints sharing it |
|---|---|
| products-uk | prod-elasticsearch, prod-elasticsearch-bm25-tuned, prod-rerank-v2 |
| support-articles | support-elasticsearch, support-vector |
| movies-demo | movies-elasticsearch, movies-opensearch |
Use a separate corpus when:
- The underlying documents differ (a product catalogue vs. a knowledge base).
- The document IDs aren't compatible across systems (judgments would mismatch).
- You want judgments isolated for compliance or experimentation reasons.
Reuse an existing corpus when you're trying different search configurations against the same data.
Creating a corpus
In the UI
- Navigate to Corpora and click Create Corpus.
- Enter a Name and optional Description.
- Click Create.
You can then assign endpoints to it from the endpoint creation or edit page.
Using the API
curl -X POST "https://${RELEVAL_HOST}/api/v1/corpora" \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${TOKEN}" \
  -d '{
    "name": "products-uk",
    "description": "UK product catalogue, used by all production search variants."
  }'
The response includes the new corpus ID, which you supply as corpus_id when creating an
endpoint.
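If you script corpus creation rather than calling curl by hand, the same request can be built with Python's standard library. This is a minimal sketch: the helper name `build_create_corpus_request` is hypothetical, and it assumes the API accepts a payload without `description`, since the field is described as optional. The response is not parsed, because its field names are not documented here.

```python
import json
import urllib.request
from typing import Optional


def build_create_corpus_request(
    host: str, token: str, name: str, description: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {"name": name}
    # Assumption: an absent description is acceptable because it is optional.
    if description is not None:
        payload["description"] = description
    return urllib.request.Request(
        f"https://{host}/api/v1/corpora",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder host and token; pass the request to urllib.request.urlopen to send it.
request = build_create_corpus_request(
    "releval.example.com", "example-token", "products-uk"
)
print(request.get_method(), request.full_url)
```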
Managing corpora
List corpora
curl "https://${RELEVAL_HOST}/api/v1/corpora" \
  -H "Authorization: Bearer ${TOKEN}"
Each entry includes an endpoint_count field showing how many endpoints are attached to that corpus.
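One use for endpoint_count is finding corpora that no endpoint uses. The sketch below works on a hypothetical response body: only `endpoint_count` is named in this page, so the `id` and `name` fields and the top-level JSON array are assumptions that may not match the real payload.

```python
import json
from typing import List

# Hypothetical response body; the shape and the `id`/`name` fields are assumed.
sample_response = """
[
  {"id": "c1", "name": "products-uk", "endpoint_count": 3},
  {"id": "c2", "name": "support-articles", "endpoint_count": 2},
  {"id": "c3", "name": "scratch", "endpoint_count": 0}
]
"""


def unused_corpora(body: str) -> List[str]:
    """Return the names of corpora that have no endpoints attached."""
    return [
        corpus["name"]
        for corpus in json.loads(body)
        if corpus.get("endpoint_count", 0) == 0
    ]


print(unused_corpora(sample_response))  # prints ['scratch']
```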
Update a corpus
curl -X PUT "https://${RELEVAL_HOST}/api/v1/corpora/${CORPUS_ID}" \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${TOKEN}" \
  -d '{
    "name": "products-uk",
    "description": "UK product catalogue, including refurbished items."
  }'
Delete a corpus
curl -X DELETE "https://${RELEVAL_HOST}/api/v1/corpora?corpus_id=${CORPUS_ID}" \
  -H "Authorization: Bearer ${TOKEN}"
A corpus that still has endpoints attached cannot be deleted — the API returns
409 Conflict. Delete or reassign the endpoints first, then delete the corpus.
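A script that deletes corpora can treat 409 as an expected outcome rather than a failure. The following is a sketch using Python's standard library against the DELETE call shown above. The function names and the returned messages are hypothetical, and it assumes any 2xx status means the corpus was deleted.

```python
import urllib.error
import urllib.parse
import urllib.request


def describe_delete_status(status: int) -> str:
    """Map an HTTP status from the DELETE call to a short outcome message."""
    if 200 <= status < 300:  # assumption: any 2xx means the corpus is gone
        return "deleted"
    if status == 409:
        return "blocked: endpoints still attached"
    return f"unexpected status {status}"


def delete_corpus(host: str, token: str, corpus_id: str) -> str:
    """Send the DELETE request and summarise the result."""
    query = urllib.parse.urlencode({"corpus_id": corpus_id})
    request = urllib.request.Request(
        f"https://{host}/api/v1/corpora?{query}",
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return describe_delete_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, including 409.
        return describe_delete_status(err.code)
```

On a "blocked" result, the caller would delete or reassign the attached endpoints and then retry the delete.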