Corpora

A Corpus represents a document collection that one or more search endpoints query against — for example, a product catalogue, a knowledge base, or a movie database. The corpus is the underlying data; endpoints are the different ways your search infrastructure exposes that data.

You set only two properties on a corpus: a name and an optional description. Its value comes from what it groups together: the endpoints that query it and the judgments recorded against it.

Why corpora matter

The most important effect of a corpus is judgment portability. Judgments and judgment lists are scoped per corpus, not per endpoint, which means:

  • A judgment of "Air Zoom Pegasus is highly relevant for 'running shoes'" can be reused across every endpoint that points at the same product corpus.
  • When you spin up a new endpoint to test a different ranking model on the same data, its evaluation runs automatically reuse existing judgments, so you don't have to re-judge the same candidates.
  • Imported judgment lists (from logs, prior campaigns, or expert annotation) attach to a corpus and are immediately available to every endpoint sharing it.

A judgment is uniquely keyed by (corpus, query, candidate), so the same query and candidate in a different corpus is treated as a separate judgment.
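
The keying rule can be illustrated with a small offline sketch. It assumes a hypothetical tab-separated file of judgments (corpus, query, candidate, grade) and a made-up numeric grade; neither is the product's actual storage format or grading scale. The only point is that a lookup has to match all three parts of the key.

```shell
# Hypothetical judgment store: one row per judgment, tab-separated as
# corpus, query, candidate, grade. This is an illustration, not the real format.

# lookup_grade FILE CORPUS QUERY CANDIDATE
# Prints the grade for the exact (corpus, query, candidate) triple,
# or "unjudged" when no row matches all three parts.
lookup_grade() {
  awk -F '\t' -v c="$2" -v q="$3" -v d="$4" '
    $1 == c && $2 == q && $3 == d { print $4; found = 1; exit }
    END { if (!found) print "unjudged" }
  ' "$1"
}

store=$(mktemp)
printf 'products-uk\trunning shoes\tair-zoom-pegasus\t3\n' > "$store"

# Same corpus, query, and candidate: the existing judgment is found.
lookup_grade "$store" products-uk "running shoes" air-zoom-pegasus   # prints 3

# Same query and candidate in a different corpus: a separate, missing judgment.
lookup_grade "$store" movies-demo "running shoes" air-zoom-pegasus   # prints unjudged

rm -f "$store"
```

Because the endpoint is not part of the key, any endpoint attached to products-uk would resolve to the same row.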

Modelling your corpora

A common pattern: one corpus per logical document collection, multiple endpoints per corpus. For example:

Corpus            | Endpoints sharing it
products-uk       | prod-elasticsearch, prod-elasticsearch-bm25-tuned, prod-rerank-v2
support-articles  | support-elasticsearch, support-vector
movies-demo       | movies-elasticsearch, movies-opensearch

Use a separate corpus when:

  • The underlying documents differ (a product catalogue vs. a knowledge base).
  • The document IDs aren't compatible across systems (judgments keyed to one system's IDs would not match the other's documents).
  • You want judgments isolated for compliance or experimentation reasons.

Reuse an existing corpus when you're trying different search configurations against the same data.

Creating a corpus

In the UI

  1. Navigate to Corpora and click Create Corpus.
  2. Enter a Name and optional Description.
  3. Click Create.

You can then assign endpoints to it from the endpoint creation or edit page.

Using the API

curl -X POST "https://${RELEVAL_HOST}/api/v1/corpora" \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${TOKEN}" \
  -d '{
    "name": "products-uk",
    "description": "UK product catalogue, used by all production search variants."
  }'

The response includes the new corpus ID, which you supply as corpus_id when creating an endpoint.
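
To reuse the ID in later calls, you might capture it in a shell variable. The sketch below assumes jq is installed and that the response is a JSON object whose ID field is named "id"; the field name is a guess, so check it against the actual response. The helper names create_corpus and corpus_id_from_response are made up for this example.

```shell
# Assumption: the create response looks like {"id": "...", "name": "...", ...}.
# Reads a JSON response on stdin and prints the value of its "id" field.
corpus_id_from_response() {
  jq -r '.id'
}

# create_corpus NAME DESCRIPTION
# Builds the JSON body with jq so quotes in the arguments are escaped safely,
# then posts it to the corpora endpoint shown above.
create_corpus() {
  body=$(jq -n --arg name "$1" --arg description "$2" \
    '{name: $name, description: $description}')
  curl -sS --fail -X POST "https://${RELEVAL_HOST}/api/v1/corpora" \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer ${TOKEN}" \
    -d "$body"
}

# Usage (needs a reachable host and a valid token):
#   CORPUS_ID=$(create_corpus products-uk "UK product catalogue" | corpus_id_from_response)
#   echo "Created corpus ${CORPUS_ID}"
```

The captured value can then be passed as corpus_id when you create an endpoint.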

Managing corpora

List corpora

curl "https://${RELEVAL_HOST}/api/v1/corpora" \
  -H "Authorization: Bearer ${TOKEN}"

Each entry includes an endpoint_count so you can see how many endpoints attach to it.
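
One use for endpoint_count is finding corpora that no endpoint uses any more. The sketch below assumes jq is installed and that the list response is a plain JSON array of objects with name and endpoint_count fields. The array shape is an assumption; if the real response wraps the array in an envelope, adjust the jq path. The helper name unused_corpora is made up.

```shell
# Assumption: the list response is a JSON array such as
#   [{"name": "products-uk", "endpoint_count": 3}, ...]
# Reads that array on stdin and prints the name of every corpus
# with no endpoints attached, one per line.
unused_corpora() {
  jq -r '.[] | select(.endpoint_count == 0) | .name'
}

# Usage (needs a reachable host and a valid token):
#   curl -sS --fail "https://${RELEVAL_HOST}/api/v1/corpora" \
#     -H "Authorization: Bearer ${TOKEN}" | unused_corpora
```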

Update a corpus

curl -X PUT "https://${RELEVAL_HOST}/api/v1/corpora/${CORPUS_ID}" \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${TOKEN}" \
  -d '{
    "name": "products-uk",
    "description": "UK product catalogue, including refurbished items."
  }'

Delete a corpus

curl -X DELETE "https://${RELEVAL_HOST}/api/v1/corpora?corpus_id=${CORPUS_ID}" \
  -H "Authorization: Bearer ${TOKEN}"

A corpus that still has endpoints attached cannot be deleted — the API returns 409 Conflict. Delete or reassign the endpoints first, then delete the corpus.
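
In a script, you may want to tell the 409 case apart from other failures. The sketch below wraps the delete call and reports the outcome. Only the 409 response is documented above; treating any 2xx status as success is an assumption, and the helper names describe_delete_status and delete_corpus are made up for this example.

```shell
# describe_delete_status HTTP_STATUS
# Maps the HTTP status of a delete call to a short message.
# Only 409 is documented; the 2xx-means-success rule is an assumption.
describe_delete_status() {
  case "$1" in
    2??) echo "deleted" ;;
    409) echo "blocked: endpoints still attached" ;;
    *)   echo "failed: HTTP $1" ;;
  esac
}

# delete_corpus CORPUS_ID
# Sends the DELETE request shown above, discards the body, and captures
# only the status code through curl's --write-out variable.
delete_corpus() {
  status=$(curl -sS -o /dev/null -w '%{http_code}' -X DELETE \
    "https://${RELEVAL_HOST}/api/v1/corpora?corpus_id=$1" \
    -H "Authorization: Bearer ${TOKEN}")
  describe_delete_status "$status"
  # Return success only for a 2xx status so callers can branch on it.
  case "$status" in
    2??) return 0 ;;
    *)   return 1 ;;
  esac
}

# Usage (needs a reachable host and a valid token):
#   delete_corpus "${CORPUS_ID}" || echo "Reassign or delete the endpoints, then retry."
```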