Getting Started

This guide takes you from zero to your first measured evaluation run. By the end you'll have Releval running locally, a search endpoint configured, a query set executing against it, and NDCG / MAP / Precision metrics on the results.

Releval is self-hosted. The quickest install path is Docker Compose, which is what this guide uses.

Prerequisites

Docker and Docker Compose — recent versions; Compose v2 syntax is assumed throughout
A search system to evaluate. Releval supports Elasticsearch, OpenSearch, Solr, Vespa, any HTTP-based search API, and any rendered search results page. See Search Endpoints for the full list.
Roughly 2 GB of free disk space for the database volumes after initial setup, plus extra headroom for ClickHouse if you intend to ingest user behaviour events.

Run with Docker Compose

Save the following as docker-compose.yaml:

services:
  releval:
    image: releval:latest
    container_name: releval
    ports:
      - "8080:8080"
    environment:
      # Must be set to Y to indicate acceptance of the EULA
      - ACCEPT_EULA=Y
      - CONNECTIONSTRINGS__DEFAULTCONNECTION=Host=postgres;Port=5432;Database=releval;Username=postgres;Password=password
      - CONNECTIONSTRINGS__CLICKHOUSE=Host=clickhouse;Port=8123;Username=default;password=;Database=default;Compression=false
      - DATAPROTECTION__APPLICATIONNAME=Releval
      - DATAPROTECTION__KEYSDIRECTORY=/app/keys
    volumes:
      - releval-files:/app/files
      - releval-keys:/app/keys
    depends_on:
      postgres:
        condition: service_healthy
      clickhouse:
        condition: service_healthy

  postgres:
    image: postgres:18.3
    container_name: postgres
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql
    shm_size: '2gb'
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  clickhouse:
    image: clickhouse/clickhouse-server:25.8.22
    container_name: clickhouse
    hostname: clickhouse
    ports:
      - "8123:8123"
    volumes:
      - clickhouse-data:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD-SHELL", "clickhouse-client --query 'SELECT 1'"]
      interval: 5s
      timeout: 5s
      retries: 5
    depends_on:
      clickhouse-keeper:
        condition: service_started

  clickhouse-keeper:
    image: clickhouse/clickhouse-keeper:25.8.22-alpine
    container_name: clickhouse-keeper
    hostname: clickhouse-keeper
    ports:
      - "9181:9181"
    volumes:
      - clickhouse-keeper-data:/var/lib/clickhouse-keeper

volumes:
  releval-files:
  releval-keys:
  postgres-data:
  clickhouse-data:
  clickhouse-keeper-data:

The stack has three parts:

PostgreSQL stores evaluations, query sets, judgments, and member accounts.
ClickHouse (with ClickHouse Keeper) stores high-volume user behaviour events for the Workspace.
Releval is the API + UI server. The schema and a default administrator account are created automatically on first startup.

Bring everything up:

docker compose up -d

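The releval service waits for the postgres and clickhouse healthchecks to pass before it starts. To watch progress, run docker compose ps: postgres and clickhouse should report healthy, while releval and clickhouse-keeper (which define no healthcheck) should show as running. Releval is then available at http://localhost:8080.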
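If you bring the stack up from a script (for example in CI), you may want to block until the server accepts HTTP connections instead of checking docker compose ps by hand. Below is a minimal Python sketch using only the standard library; the function name wait_for_http and the timeout values are illustrative, and it treats any HTTP response, including an error status, as a sign that the server is up.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until it returns any HTTP response or `timeout_s` elapses.

    Hypothetical helper: any status code (even 401 or 404) counts as "up",
    because it proves the server is accepting connections.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True
        except urllib.error.HTTPError:
            # The server answered with a 4xx/5xx status, so it is running.
            return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            # Not accepting connections yet (refused, reset, or timed out); retry.
            time.sleep(interval_s)
    return False
```

After docker compose up -d, a CI step could call wait_for_http("http://localhost:8080") and fail the job if it returns False.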

Tip

If startup fails or hangs, the most likely cause is that the releval:latest image has not been pulled. If your registry requires authentication, run docker login and then docker compose pull before bringing the stack up, or build the image locally.

Sign in

Use the bootstrap administrator credentials on first login:

Field	Value
Username	`Admin`
Password	`Password1234!`

The username is the literal string Admin, not an email address. The default account exists so that the very first sign-in works without any configuration, which is also why you should change its password before exposing the instance.

Caution

Change the default password immediately after first login. Open the Account menu and update it. While you're there, configure an authentication provider (GitHub, Google, OIDC) and invite your team rather than continuing to share Admin.

Your First Evaluation

The shortest path from a fresh install to numbers you can act on is five steps. Each one links to the page that covers it in depth — skim those if you want context, or follow the shortcut version here.

Connect a search endpoint

A search endpoint tells Releval where to send queries. Create one from Endpoints → New with:

A name (anything memorable)
The base URL of your search system
The endpoint type — elasticsearch, opensearch, solr, vespa, api, or page
An authentication method if your system requires it
A candidates mapping describing how to extract result IDs and titles from the response

Use Test Endpoint to send a sample query and inspect the parsed candidates before saving.
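The candidates mapping is configured in the UI, and its exact syntax isn't covered on this page. To show what it has to express, here is a minimal Python sketch that performs the same extraction by hand for an Elasticsearch or OpenSearch _search response, where matches live under hits.hits and each hit carries an _id and a _source. The title field name is an assumption; use whichever field your index stores titles in.

```python
from typing import Any


def extract_candidates(response: dict[str, Any], title_field: str = "title") -> list[tuple[str, str]]:
    """Return (id, title) pairs from an Elasticsearch/OpenSearch _search response.

    Result order is preserved, since rank position matters for the metrics.
    `title_field` is an assumed field name inside each hit's _source.
    """
    candidates = []
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        candidates.append((hit["_id"], str(source.get(title_field, ""))))
    return candidates
```

If the parsed candidates shown by Test Endpoint don't look like this list of IDs and titles, the mapping is pointing at the wrong part of the response.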

Define your queries

A query set is the list of searches you want to score. Create one from Query Sets → New and paste in real searches your users run (or upload them as JSONL).

Then create a query template — a parameterised request body that embeds each query into the format your endpoint expects (e.g. an Elasticsearch multi_match, a Solr q parameter). Templates are reusable across endpoints, so you can A/B-test ranking changes by swapping templates without touching the queries themselves.
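Releval's own template placeholder syntax isn't shown in this guide, so the sketch below uses plain Python to illustrate what a rendered template produces for a single query: an Elasticsearch multi_match request body. The field names, the title^2 boost, and the size value are illustrative assumptions.

```python
import json
from typing import Any


def render_multi_match(query: str, fields: list[str], size: int = 10) -> str:
    """Embed one query string into an Elasticsearch multi_match request body.

    The query is placed into a dict and serialised, rather than spliced into
    a JSON string, so quotes or backslashes in the query can't break the body.
    """
    body: dict[str, Any] = {
        "size": size,
        "query": {"multi_match": {"query": query, "fields": fields}},
    }
    return json.dumps(body)
```

For example, render_multi_match('27" monitor', ["title^2", "description"]) yields valid JSON even though the query contains a double quote. A ranking experiment would then be a second template that differs only in, say, the fields or boosts, applied to the same query set.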

Create and run the evaluation

An evaluation ties an endpoint, a query set, and a template together. Create one, then start a run — Releval executes every query in parallel, captures the responses, and prepares them for judgment.

Judge the results

Releval can't compute relevance metrics without knowing which results are actually relevant. There are two ways to provide that signal:

Manually — open the run, walk through the candidates, and rate each one. See Judging results.
With an AI Judge — configure a provider (OpenAI, Anthropic, Bedrock, Azure OpenAI, Ollama, or any OpenAI-compatible endpoint) and have an LLM rate the candidates against your prompt template. Useful when query sets are too large to judge by hand.

Pick a scale — binary, graded, or detailed — that matches how nuanced your judgments need to be.

Review the metrics

Once judgments are in, the run shows your chosen metrics (NDCG, MAP, MRR, ERR, Precision, Recall, …) at the run level and per-query. Clone the run to compare it against a tweaked endpoint or template — this is the main loop for measuring ranking changes.
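Releval computes these metrics for you, and this page doesn't specify its exact conventions (gain function, cutoff k, how unjudged results are treated, or the relevance threshold used for binary metrics on a graded scale). As a reference point, here is a minimal Python sketch of the common textbook definitions, assuming judgments are integer grades keyed by document ID and that unjudged results count as non-relevant (grade 0).

```python
import math


def precision_at_k(ranked: list[str], grades: dict[str, int], k: int, threshold: int = 1) -> float:
    """Fraction of the top-k results whose grade is at least `threshold`."""
    return sum(grades.get(doc, 0) >= threshold for doc in ranked[:k]) / k


def average_precision(ranked: list[str], grades: dict[str, int], threshold: int = 1) -> float:
    """Sum of precision@i at each rank i holding a relevant result, divided by
    the number of relevant documents in the judgments (retrieved or not)."""
    total_relevant = sum(g >= threshold for g in grades.values())
    if total_relevant == 0:
        return 0.0
    hits, score = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if grades.get(doc, 0) >= threshold:
            hits += 1
            score += hits / i
    return score / total_relevant


def reciprocal_rank(ranked: list[str], grades: dict[str, int], threshold: int = 1) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for i, doc in enumerate(ranked, start=1):
        if grades.get(doc, 0) >= threshold:
            return 1.0 / i
    return 0.0


def ndcg_at_k(ranked: list[str], grades: dict[str, int], k: int) -> float:
    """NDCG@k with exponential gain (2**grade - 1) and a log2(rank + 1) discount.
    The ideal ranking is built from every judged document for the query."""
    def dcg(gs: list[int]) -> float:
        return sum((2 ** g - 1) / math.log2(i + 1) for i, g in enumerate(gs, start=1))

    ideal = dcg(sorted(grades.values(), reverse=True)[:k])
    if ideal == 0:
        return 0.0
    return dcg([grades.get(doc, 0) for doc in ranked[:k]]) / ideal
```

MAP and MRR are the means of average_precision and reciprocal_rank over all queries in the set. Some tools use linear gain (the grade itself instead of 2**grade - 1) for NDCG, which changes the absolute numbers, so only compare runs that were scored the same way.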

Beyond the UI

Everything you can do in the UI is also available via the REST API, the gRPC API, and an MCP server that exposes the same operations to AI agents. Set up an App Client to authenticate API calls without sharing your member credentials, and you can drive evaluations entirely from CI to gate ranking changes on relevance regressions.
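The REST and gRPC calls themselves aren't documented on this page, so the sketch below starts after them. It assumes you have already fetched per-query NDCG for a baseline run and a candidate run (for example, a cloned run using a changed template) into dicts keyed by query text, and it decides whether a CI job should pass. The function name relevance_gate and the 0.01 tolerance are hypothetical choices.

```python
import statistics


def relevance_gate(baseline: dict[str, float], candidate: dict[str, float],
                   max_mean_drop: float = 0.01) -> tuple[bool, list[str]]:
    """Compare per-query scores from two runs and decide whether to pass.

    Returns (passed, regressed_queries). `passed` is False when the candidate's
    mean over the shared queries falls more than `max_mean_drop` below the
    baseline's mean. Queries present in only one run are ignored.
    """
    shared = sorted(baseline.keys() & candidate.keys())
    if not shared:
        raise ValueError("the two runs have no queries in common")
    base_mean = statistics.fmean(baseline[q] for q in shared)
    cand_mean = statistics.fmean(candidate[q] for q in shared)
    # Individual queries that got worse, reported so a failure is actionable.
    regressed = [q for q in shared if candidate[q] < baseline[q]]
    return cand_mean >= base_mean - max_mean_drop, regressed
```

In a CI script, print the regressed queries and exit with sys.exit(0 if passed else 1). Gating on the mean with a small tolerance avoids failing builds on noise in a single query, while the regressed list still points reviewers at specific losses.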

Next Steps

Get production-ready: authentication providers, email, data protection keys, and data storage configuration.
Pick the right scale and metrics for your evaluation methodology.
Capture real user clicks via the User Behaviour Insights Workspace and use them as implicit signals.
Add teammates from Administration → Members and Roles.
