Analysis

Analysis lets you compare search results across different configurations to understand how changes impact result rankings. It answers questions like:

  • How much do results change when I modify my ranking model?
  • Are the top results stable between my baseline and candidate configurations?
  • Which queries are most affected by a configuration change?

How Analysis Works

An analysis compares query executions between a baseline configuration and one or more candidate configurations. For each query, Releval:

  1. Executes the query using each configuration (endpoint + query template)
  2. Collects the ranked results
  3. Computes overlap and similarity metrics between baseline and candidate results
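
The three steps above can be sketched as a simple loop. This is an illustration of the flow, not Releval's implementation: `run_analysis`, `execute`, and `compare` are hypothetical names, and the fake results and set-overlap score exist only to make the sketch runnable.

```python
from typing import Callable, Dict, List, Sequence

Ranking = List[str]  # result IDs in ranked order


def run_analysis(
    queries: Sequence[str],
    baseline: str,
    candidates: Sequence[str],
    execute: Callable[[str, str], Ranking],       # (configuration, query) -> ranked results
    compare: Callable[[Ranking, Ranking], float],  # (baseline results, candidate results) -> score
) -> Dict[str, Dict[str, float]]:
    """Score every candidate configuration against the baseline, per query."""
    scores: Dict[str, Dict[str, float]] = {}
    for query in queries:
        # Steps 1 and 2: execute the query with the baseline and collect its ranking.
        baseline_results = execute(baseline, query)
        # Step 3: execute each candidate and compare its ranking to the baseline's.
        scores[query] = {
            candidate: compare(baseline_results, execute(candidate, query))
            for candidate in candidates
        }
    return scores


# Hypothetical stand-ins for a real endpoint and a real metric.
fake_results = {
    ("baseline", "running shoes"): ["a", "b", "c"],
    ("candidate", "running shoes"): ["a", "c", "d"],
}


def fake_execute(config: str, query: str) -> Ranking:
    return fake_results[(config, query)]


def set_overlap(a: Ranking, b: Ranking) -> float:
    return len(set(a) & set(b)) / max(len(a), len(b))


scores = run_analysis(["running shoes"], "baseline", ["candidate"], fake_execute, set_overlap)
```

Passing `execute` and `compare` in as callables keeps the sketch independent of any particular endpoint or metric.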

Creating an Analysis

Live Execution

Create an analysis that executes queries in real-time against your endpoint:

curl -X POST "https://${RELEVAL_HOST}/api/v1/analysis" \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer ${TOKEN}" \
-d @- <<EOF
{
"name": "Ranking Model Comparison",
"endpoint_id": "${ENDPOINT_ID}",
"query_set_id": "${QUERY_SET_ID}",
"baseline": "${BASELINE_TEMPLATE_ID}",
"query_template_ids": ["${BASELINE_TEMPLATE_ID}", "${CANDIDATE_TEMPLATE_ID}"]
}
EOF

You nominate one of the query templates as the baseline; the others are scored against it. Every query in the query set runs through every template, then the candidate results are compared against the baseline's.

From File Upload

Upload pre-computed execution results for analysis. The file must be JSON Lines; each line contains a query execution and its results.
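
Before uploading, it can help to confirm that the file really is JSON Lines. The sketch below only checks that every line parses as a JSON object. It does not validate Releval's schema, which is not described here, and the field names in the sample (`query`, `results`) are hypothetical. Treating each record as an object and rejecting blank lines are conservative assumptions.

```python
import json
from typing import Iterable, List, Tuple


def find_invalid_jsonl_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for every line that is not a JSON object."""
    problems: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
    return problems


# Hypothetical sample: line 2 is truncated, line 3 is an array rather than an object.
sample = [
    '{"query": "running shoes", "results": ["sku-1", "sku-2"]}',
    '{"query": "trail shoes", "results": [',
    '["not", "an", "object"]',
]
problems = find_invalid_jsonl_lines(sample)
```

For a real file, pass the open file handle, for example `find_invalid_jsonl_lines(open("executions.jsonl"))`, and upload only when the returned list is empty.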

curl -X POST "https://${RELEVAL_HOST}/api/v1/analysis/upload" \
-H "Authorization: Bearer ${TOKEN}" \
-F 'name=Imported Analysis' \
-F 'input=@executions.jsonl'

Analysis Metrics

Rank-Biased Overlap (RBO)

RBO measures how similar two ranked lists are, with a bias toward the top of the list. It returns a value between 0 and 1:

  • 1.0 — identical rankings
  • 0.0 — completely different rankings

RBO uses a persistence parameter p (default 0.9) that controls how quickly the weight falls off with rank. A higher p spreads weight deeper into the list; a lower p concentrates it on the very top.
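
To make the role of p concrete, here is a minimal sketch of one common way to compute RBO: the extrapolated form, with both lists evaluated to the same depth. The variant Releval uses is the one defined in Metrics and may differ. The function name, the truncation to the shorter list, and the handling of empty lists are assumptions of this sketch.

```python
from typing import Hashable, Sequence


def rbo_extrapolated(
    baseline: Sequence[Hashable],
    candidate: Sequence[Hashable],
    p: float = 0.9,
) -> float:
    """Extrapolated RBO over the common depth of two duplicate-free rankings."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    depth = min(len(baseline), len(candidate))
    if depth == 0:
        return 0.0  # nothing to compare; treated as no overlap in this sketch

    seen_baseline = set()
    seen_candidate = set()
    weighted_sum = 0.0
    agreement = 0.0
    for d in range(1, depth + 1):
        seen_baseline.add(baseline[d - 1])
        seen_candidate.add(candidate[d - 1])
        # Agreement at depth d: fraction of the top-d items the two lists share.
        agreement = len(seen_baseline & seen_candidate) / d
        weighted_sum += agreement * p ** d

    # Weighted agreement over the observed prefix, plus the assumption that
    # agreement at the final depth continues for the unseen tail.
    return agreement * p ** depth + (1 - p) / p * weighted_sum


reference = ["a", "b", "c", "d"]
top_swap = ["b", "a", "c", "d"]     # ranks 1 and 2 swapped
bottom_swap = ["a", "b", "d", "c"]  # ranks 3 and 4 swapped

rbo_top = rbo_extrapolated(reference, top_swap)
rbo_bottom = rbo_extrapolated(reference, bottom_swap)
```

Under this sketch, `rbo_top` is lower than `rbo_bottom`, because a swap at the top of the list costs more than the same swap further down. Lowering p makes that penalty larger.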

See Metrics for the mathematical definition.

Exact Top-N Matches

Counts how many results appear at exactly the same position in both the baseline and candidate result lists. This is a strict measure: a result counts only if it has the same rank in both lists.
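
A minimal sketch of this count, assuming results are compared by ID over the first n positions. The function name and the `n` parameter are illustrative, not part of the Releval API.

```python
from typing import Hashable, Sequence


def exact_top_n_matches(
    baseline: Sequence[Hashable],
    candidate: Sequence[Hashable],
    n: int,
) -> int:
    """Count positions in the top n where both lists hold the same result."""
    return sum(1 for b, c in zip(baseline[:n], candidate[:n]) if b == c)


baseline_results = ["a", "b", "c", "d"]
candidate_results = ["a", "c", "b", "d"]

matches_top_3 = exact_top_n_matches(baseline_results, candidate_results, 3)  # 1: only "a" at rank 1
matches_top_4 = exact_top_n_matches(baseline_results, candidate_results, 4)  # 2: "a" at rank 1, "d" at rank 4
```

Although "b" and "c" appear in both top-3 lists, neither counts because their ranks differ.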

Viewing Analysis Results

List Analyses

curl "https://${RELEVAL_HOST}/api/v1/analysis" \
-H "Authorization: Bearer ${TOKEN}"

Get Analysis Details

curl "https://${RELEVAL_HOST}/api/v1/analysis/${ANALYSIS_ID}" \
-H "Authorization: Bearer ${TOKEN}"

Check Analysis Status

Analysis runs asynchronously. Check its progress:

curl "https://${RELEVAL_HOST}/api/v1/analysis/${ANALYSIS_ID}/status" \
-H "Authorization: Bearer ${TOKEN}"
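
Because the analysis runs asynchronously, a client usually polls the status endpoint until the run finishes. The helper below is a generic sketch. It makes no assumption about the status response: you supply `fetch` (for example, a function that calls the status endpoint) and `is_done` (a predicate over whatever that response contains). The `state` field and its values in the demo are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() repeatedly until is_done(result) is true, then return the result."""
    for attempt in range(max_attempts):
        status = fetch()
        if is_done(status):
            return status
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next check
    raise TimeoutError(f"not finished after {max_attempts} attempts")


# Demo with canned responses; the "state" field and its values are made up.
responses = iter([{"state": "running"}, {"state": "running"}, {"state": "finished"}])
final = poll_until(
    fetch=lambda: next(responses),
    is_done=lambda status: status["state"] == "finished",
    sleep=lambda _seconds: None,  # skip real waiting in the demo
)
```

Bounding the number of attempts keeps a stalled analysis from blocking a script indefinitely.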

View Query-Level Metrics

See how an individual query performed by filtering on query_name:

curl "https://${RELEVAL_HOST}/api/v1/analysis/${ANALYSIS_ID}/metrics?query_name=running+shoes" \
-H "Authorization: Bearer ${TOKEN}"

Browse Executions

View the actual query executions (paginated):

curl "https://${RELEVAL_HOST}/api/v1/analysis/${ANALYSIS_ID}/executions?page=1&page_size=20" \
-H "Authorization: Bearer ${TOKEN}"

Use Cases

A/B Testing Ranking Models

  1. Create two query templates — one for your current ranking logic, one for the new model
  2. Run an analysis with the current template as baseline
  3. Review RBO scores to see how much results differ
  4. Focus on queries with low RBO to understand where the new model diverges most
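
Step 4 amounts to sorting queries by RBO in ascending order. A minimal sketch, assuming you have already collected one RBO score per query into a dictionary. The function name is illustrative and the scores are made-up values.

```python
from typing import Dict, List, Tuple


def most_affected_queries(rbo_by_query: Dict[str, float], limit: int = 10) -> List[Tuple[str, float]]:
    """Return up to `limit` (query, rbo) pairs, lowest RBO first."""
    return sorted(rbo_by_query.items(), key=lambda item: item[1])[:limit]


# Illustrative scores only, not real output.
example_scores = {"running shoes": 0.92, "trail shoes": 0.41, "sandals": 0.77}
worst_two = most_affected_queries(example_scores, limit=2)
# worst_two == [("trail shoes", 0.41), ("sandals", 0.77)]
```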

Validating Index Changes

Before deploying a schema or mapping change:

  1. Run an analysis against the current index
  2. Apply the change to a staging index
  3. Run another analysis against the staging index
  4. Compare metrics to verify results are stable or improved