Getting Started
This guide takes you from zero to your first measured evaluation run. By the end you'll have Releval running locally, a search endpoint configured, a query set executing against it, and NDCG / MAP / Precision metrics on the results.
Releval is self-hosted. The quickest install path is Docker Compose, which is what this guide uses.
Prerequisites
- Docker and Docker Compose — recent versions; Compose v2 syntax is assumed throughout
- A search system to evaluate. Releval supports Elasticsearch, OpenSearch, Solr, Vespa, any HTTP-based search API, and any rendered search results page. See Search Endpoints for the full list.
- Roughly 2 GB of free disk for the database volumes after initial setup, plus headroom for ClickHouse if you intend to ingest user behaviour events.
Run with Docker Compose
Save the following as docker-compose.yaml:
    services:
      releval:
        image: releval:latest
        container_name: releval
        ports:
          - "8080:8080"
        environment:
          # Must be set to Y to indicate acceptance of the EULA
          - ACCEPT_EULA=Y
          - CONNECTIONSTRINGS__DEFAULTCONNECTION=Host=postgres;Port=5432;Database=releval;Username=postgres;Password=password
          - CONNECTIONSTRINGS__CLICKHOUSE=Host=clickhouse;Port=8123;Username=default;password=;Database=default;Compression=false
          - DATAPROTECTION__APPLICATIONNAME=Releval
          - DATAPROTECTION__KEYSDIRECTORY=/app/keys
        volumes:
          - releval-files:/app/files
          - releval-keys:/app/keys
        depends_on:
          postgres:
            condition: service_healthy
          clickhouse:
            condition: service_healthy
      postgres:
        image: postgres:18.3
        container_name: postgres
        ports:
          - "5432:5432"
        environment:
          - POSTGRES_PASSWORD=password
        volumes:
          - postgres-data:/var/lib/postgresql/data
        shm_size: '2gb'
        healthcheck:
          test: ["CMD-SHELL", "pg_isready -U postgres"]
          interval: 5s
          timeout: 5s
          retries: 5
      clickhouse:
        image: clickhouse/clickhouse-server:25.8.22
        container_name: clickhouse
        hostname: clickhouse
        ports:
          - "8123:8123"
        volumes:
          - clickhouse-data:/var/lib/clickhouse
        ulimits:
          nofile:
            soft: 262144
            hard: 262144
        healthcheck:
          test: ["CMD-SHELL", "clickhouse-client --query 'SELECT 1'"]
          interval: 5s
          timeout: 5s
          retries: 5
        depends_on:
          clickhouse-keeper:
            condition: service_started
      clickhouse-keeper:
        image: clickhouse/clickhouse-keeper:25.8.22-alpine
        container_name: clickhouse-keeper
        hostname: clickhouse-keeper
        ports:
          - "9181:9181"
        volumes:
          - clickhouse-keeper-data:/var/lib/clickhouse-keeper
    volumes:
      releval-files:
      releval-keys:
      postgres-data:
      clickhouse-data:
      clickhouse-keeper-data:
Three services back the app:
- PostgreSQL stores evaluations, query sets, judgments, and member accounts.
- ClickHouse (with ClickHouse Keeper) stores high-volume user behaviour events for the Workspace.
- Releval is the API + UI server. The schema and a default administrator account are created automatically on first startup.
Bring everything up:
docker compose up -d
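The app waits for both databases' healthchecks to pass before it starts. To watch progress, run
docker compose ps and confirm that postgres and clickhouse report healthy and that the releval
container is up. The clickhouse-keeper service defines no healthcheck in this file, so it only
shows as running. Releval is then available at http://localhost:8080.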
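If you script the setup, it can help to wait until the app answers on its port before moving on. The following is a minimal sketch that polls the URL using only the Python standard library. The function names, attempt count, and delay are illustrative assumptions, not part of Releval.

```python
import time
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (for example with 401 or 404), so it is listening.
        return True
    except OSError:
        # Connection refused, DNS failure, or timeout: not up yet.
        return False


def wait_until_up(url: str, attempts: int = 30, delay: float = 2.0, probe=is_up) -> bool:
    """Poll `url` until `probe` succeeds or `attempts` run out."""
    for _ in range(attempts):
        if probe(url):
            return True
        time.sleep(delay)
    return False
```

Calling wait_until_up("http://localhost:8080") after docker compose up -d returns True once the port responds, or False after roughly a minute with the defaults shown.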
If startup fails or hangs, the most likely cause is that the releval:latest image could not be
pulled. If your registry requires authentication, log in and run docker compose pull first, or
build the image locally before bringing the stack up.
Sign in
Use the bootstrap administrator credentials on first login:
| Field | Value |
|---|---|
| Username | Admin |
| Password | Password1234! |
The username field is the literal string Admin, not an email address. The default account
is intentional so the very first sign-in works without configuration. Because these credentials
are publicly documented, rotate the password before exposing the instance.
Change the default password immediately after first login. Open the Account menu
and update it. While you're there, configure an authentication provider
(GitHub, Google, OIDC) and invite your team
rather than continuing to share Admin.
Your First Evaluation
The shortest path from a fresh install to numbers you can act on is five steps. Each one links to the page that covers it in depth — skim those if you want context, or follow the shortcut version here.
Connect a search endpoint
A search endpoint tells Releval where to send queries. Create one from Endpoints → New with:
- A name (anything memorable)
- The base URL of your search system
- The endpoint type: elasticsearch, opensearch, solr, vespa, api, or page
- An authentication method if your system requires it
- A candidates mapping describing how to extract result IDs and titles from the response
Use Test Endpoint to send a sample query and inspect the parsed candidates before saving.
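To make the idea of a candidates mapping concrete, here is a minimal sketch of the extraction it describes, written against the standard Elasticsearch response shape (hits.hits[], with _id and _source). The function name, the output keys, and the title field are assumptions for illustration. Releval's own mapping syntax is configured in the UI and is not shown here.

```python
from typing import Any


def extract_candidates(response: dict[str, Any], title_field: str = "title") -> list[dict[str, Any]]:
    """Pull id, title, and 1-based rank out of an Elasticsearch-style response."""
    hits = response.get("hits", {}).get("hits", [])
    return [
        {
            "id": hit["_id"],
            # A missing title field yields None rather than an error.
            "title": hit.get("_source", {}).get(title_field),
            "rank": rank,
        }
        for rank, hit in enumerate(hits, start=1)
    ]


# Hand-written sample response, trimmed to the parts the extraction reads.
sample = {
    "hits": {
        "hits": [
            {"_id": "doc-7", "_score": 3.1, "_source": {"title": "Trail running shoes"}},
            {"_id": "doc-2", "_score": 2.4, "_source": {"title": "Road running shoes"}},
        ]
    }
}

candidates = extract_candidates(sample)
```

Other engines return different shapes (Solr nests documents under response.docs, for instance), which is why the mapping is defined per endpoint.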
Define your queries
A query set is the list of searches you want to score. Create one from Query Sets → New and paste real searches users run (or upload them as JSONL).
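JSONL means one JSON object per line. As a rough sketch, a list of query strings can be serialised like this. The field name "query" is a hypothetical placeholder, so check the Query Sets page for the keys Releval actually expects.

```python
import json


def to_jsonl(queries: list[str], key: str = "query") -> str:
    """Serialise one JSON object per line; `key` is a hypothetical field name."""
    return "\n".join(json.dumps({key: q}, ensure_ascii=False) for q in queries) + "\n"


# Illustrative queries only; use searches your users really run.
queries = ["running shoes", "waterproof jacket", "trail map"]
jsonl_text = to_jsonl(queries)
```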
Then create a query template — a parameterised request body that
embeds each query into the format your endpoint expects (e.g. an Elasticsearch
multi_match, a Solr q parameter). Templates are reusable across endpoints, so you can
A/B-test ranking changes by swapping templates without touching the queries themselves.
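As an illustration of what a template does, the sketch below renders an Elasticsearch multi_match request body from a query string. It is a plain Python stand-in: the function name and field names are assumptions, and Releval's own placeholder syntax is not reproduced here.

```python
import json


def render_multi_match(query: str, fields: list[str], size: int = 10) -> dict:
    """Embed one query string into an Elasticsearch multi_match request body."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": query,
                "fields": fields,
            }
        },
    }


# Two variants of the same query: the second boosts title matches (field^boost).
baseline = render_multi_match("running shoes", ["title", "description"])
boosted = render_multi_match("running shoes", ["title^3", "description"])

# The rendered body is what would be sent to the endpoint as JSON.
body_json = json.dumps(boosted)
```

Swapping baseline for boosted while keeping the query set fixed is the kind of comparison the template mechanism is meant to support.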
Create and run the evaluation
An evaluation ties an endpoint, a query set, and a template together. Create one, then start a run — Releval executes every query in parallel, captures the responses, and prepares them for judgment.
Judge the results
Releval can't compute relevance metrics without knowing which results are actually relevant. There are two ways to provide that signal:
- Manually — open the run, walk through the candidates, and rate each one. See Judging results.
- With an AI Judge — configure a provider (OpenAI, Anthropic, Bedrock, Azure OpenAI, Ollama, or any OpenAI-compatible endpoint) and have an LLM rate the candidates against your prompt template. Useful when query sets are too large to judge by hand.
Pick a scale — binary, graded, or detailed — that matches
how nuanced your judgments need to be.
Review the metrics
Once judgments are in, the run shows your chosen metrics (NDCG, MAP, MRR, ERR, Precision, Recall, …) at the run level and per-query. Clone the run to compare it against a tweaked endpoint or template — this is the main loop for measuring ranking changes.
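For intuition about what these numbers mean, here is a minimal sketch of three of them computed from one query's judged results. It uses common textbook definitions (exponential gain for DCG, a grade threshold for binary relevance) and normalises only over the results that were retrieved and judged. Releval's exact variants and defaults may differ, so treat this as an illustration rather than its implementation.

```python
import math


def precision_at_k(grades: list[int], k: int, threshold: int = 1) -> float:
    """Fraction of the top-k results whose grade is at least `threshold`."""
    return sum(g >= threshold for g in grades[:k]) / k


def dcg_at_k(grades: list[int], k: int) -> float:
    """Discounted cumulative gain with the exponential gain 2^grade - 1."""
    return sum((2**g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def ndcg_at_k(grades: list[int], k: int) -> float:
    """DCG divided by the DCG of the same grades sorted in descending order."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


def average_precision(grades: list[int], threshold: int = 1) -> float:
    """Mean of precision@i over the ranks i that hold a relevant result."""
    hits, total = 0, 0.0
    for i, g in enumerate(grades, start=1):
        if g >= threshold:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


# Illustrative grades for one query, in ranked order (0 = irrelevant, 3 = best).
grades = [3, 0, 2, 1, 0]

p5 = precision_at_k(grades, 5)   # 3 of the top 5 are relevant
ap = average_precision(grades)   # relevant results sit at ranks 1, 3, and 4
ndcg5 = ndcg_at_k(grades, 5)     # penalises the irrelevant result at rank 2
```

MAP is the mean of average_precision across all queries in the set, which is why the per-query view is useful for finding the queries that drag the run-level number down.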
Beyond the UI
Everything you can do in the UI is also available via the REST API, the gRPC API, and an MCP server that exposes the same operations to AI agents. Set up an App Client to authenticate API calls without sharing your member credentials, and you can drive evaluations entirely from CI to gate ranking changes on relevance regressions.
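A CI gate of this kind boils down to comparing a candidate run's metrics against a baseline and failing the job on a drop. The sketch below shows only that comparison. Fetching the metrics from Releval's API is left out, and the metric names, values, and tolerance are made up for illustration.

```python
def regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.01,
) -> dict[str, float]:
    """Return the change for each metric that dropped more than `tolerance`."""
    return {
        name: candidate.get(name, 0.0) - base
        for name, base in baseline.items()
        if base - candidate.get(name, 0.0) > tolerance
    }


# Made-up values standing in for two runs' run-level metrics.
baseline = {"ndcg@10": 0.71, "map": 0.54, "precision@5": 0.62}
candidate = {"ndcg@10": 0.73, "map": 0.49, "precision@5": 0.62}

failed = regressions(baseline, candidate)

# In a CI job, a non-zero exit code would fail the pipeline, for example:
#   import sys; sys.exit(1 if failed else 0)
```

A small tolerance avoids failing builds on noise. How large it should be depends on the size of the query set and how stable the judgments are.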
Next Steps
- Get production-ready: authentication providers, email, data protection keys, and data storage configuration.
- Pick the right scale and metrics for your evaluation methodology.
- Capture real user clicks via the User Behaviour Insights Workspace and use them as implicit signals.
- Add teammates from Administration → Members and Roles.