Analysis
Analysis lets you compare search results across different configurations to understand how changes impact result rankings. It answers questions like:
- How much do results change when I modify my ranking model?
- Are the top results stable between my baseline and candidate configurations?
- Which queries are most affected by a configuration change?
How Analysis Works
An analysis compares query executions between a baseline configuration and one or more candidate configurations. For each query, Releval:
- Executes the query using each configuration (endpoint + query template)
- Collects the ranked results
- Computes overlap and similarity metrics between baseline and candidate results
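The per-query flow above can be sketched in Python. This is a conceptual model, not Releval's implementation: `execute` is a hypothetical callable standing in for "run this query with this configuration", and a simple top-k set overlap stands in for the real metrics.

```python
from typing import Callable, Dict, List

# Hypothetical: runs `query` with the named configuration and returns ranked result IDs.
Executor = Callable[[str, str], List[str]]


def overlap_at_k(baseline: List[str], candidate: List[str], k: int = 10) -> float:
    """Number of results shared by both top-k lists, divided by k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(baseline[:k]) & set(candidate[:k])) / k


def compare_configurations(
    queries: List[str],
    baseline: str,
    candidates: List[str],
    execute: Executor,
    k: int = 10,
) -> Dict[str, Dict[str, float]]:
    """Return {query: {candidate_config: overlap score against the baseline}}."""
    scores: Dict[str, Dict[str, float]] = {}
    for query in queries:
        # Execute the query with every configuration and collect the ranked results.
        results = {cfg: execute(query, cfg) for cfg in [baseline, *candidates]}
        # Score each candidate's ranking against the baseline's.
        scores[query] = {
            cfg: overlap_at_k(results[baseline], results[cfg], k) for cfg in candidates
        }
    return scores
```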
Creating an Analysis
Live Execution
Create an analysis that executes queries in real time against your endpoint:
curl -X POST "https://${RELEVAL_HOST}/api/v1/analysis" \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer ${TOKEN}" \
-d "{
\"name\": \"Ranking Model Comparison\",
\"endpoint_id\": \"${ENDPOINT_ID}\",
\"query_set_id\": \"${QUERY_SET_ID}\",
\"baseline\": \"${BASELINE_TEMPLATE_ID}\",
\"query_template_ids\": [\"${BASELINE_TEMPLATE_ID}\", \"${CANDIDATE_TEMPLATE_ID}\"]
}"
Set baseline to one of the template IDs listed in query_template_ids; the remaining templates are the candidates and are scored against it. Every query in the query set runs through every template, and each candidate's results are then compared against the baseline's.
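Building the request body in code sidesteps shell quoting. The sketch below uses only Python's standard library to construct (but not send) the same POST request; the function name is hypothetical, and the check that the baseline appears among the template IDs is a client-side convenience based on the description above, not a documented server rule.

```python
import json
import urllib.request
from typing import List


def build_create_analysis_request(
    host: str,
    token: str,
    name: str,
    endpoint_id: str,
    query_set_id: str,
    baseline: str,
    query_template_ids: List[str],
) -> urllib.request.Request:
    """Build (but do not send) the POST /api/v1/analysis request."""
    # The baseline is nominated from among the templates being compared.
    if baseline not in query_template_ids:
        raise ValueError("baseline must be one of query_template_ids")
    payload = {
        "name": name,
        "endpoint_id": endpoint_id,
        "query_set_id": query_set_id,
        "baseline": baseline,
        "query_template_ids": query_template_ids,
    }
    return urllib.request.Request(
        f"https://{host}/api/v1/analysis",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Pass the returned request to urllib.request.urlopen to send it; the response format is not covered in this section.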
From File Upload
Upload pre-computed execution results for analysis. The file must be JSON Lines; each line contains a query execution and its results.
curl -X POST "https://${RELEVAL_HOST}/api/v1/analysis/upload" \
-H "Authorization: Bearer ${TOKEN}" \
-F 'name=Imported Analysis' \
-F 'input=@executions.jsonl'
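Before uploading, it can help to confirm that the file really is JSON Lines. The sketch below checks only the framing: every line must parse as JSON, and (an assumption based on "each line contains a query execution") must be a JSON object. The fields an execution record needs are not documented in this section, so they are not validated; check_jsonl is a hypothetical helper.

```python
import json
from typing import List


def check_jsonl(text: str) -> List[str]:
    """Return a list of problems found; an empty list means every line is a JSON object."""
    problems: List[str] = []
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a single trailing newline is allowed
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r")  # tolerate CRLF line endings
        if not line.strip():
            problems.append(f"line {lineno}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        # Assumption: each execution record is a JSON object, not an array or scalar.
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
    return problems
```

For example, read executions.jsonl with open(..., encoding="utf-8") and upload only if check_jsonl returns an empty list.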
Analysis Metrics
Rank-Biased Overlap (RBO)
RBO measures how similar two ranked lists are, with a bias toward the top of the list. It returns a value between 0 and 1:
- 1.0: identical rankings
- 0.0: no results in common
RBO uses a persistence parameter p (default 0.9) that controls how much weight is given to top-ranked results versus deeper results. A higher p considers more of the list; a lower p focuses on the very top.
See Metrics for the mathematical definition.
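The effect of p is easiest to see in code. The sketch below implements the extrapolated form of RBO from Webber, Moffat and Zobel (2010), "A similarity measure for indefinite rankings", evaluated only to the depth of the shorter list. It is an illustration under those assumptions, not necessarily the exact variant Releval computes (see Metrics for that); the name rbo_ext is ours, and the code assumes neither list contains duplicate results.

```python
from typing import Sequence


def rbo_ext(baseline: Sequence[str], candidate: Sequence[str], p: float = 0.9) -> float:
    """Extrapolated rank-biased overlap of two rankings, truncated to the shorter one.

    RBO_ext = (X_k / k) * p**k + (1 - p) / p * sum_{d=1..k} (X_d / d) * p**d,
    where X_d is the number of items shared by the two top-d prefixes.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    k = min(len(baseline), len(candidate))
    if k == 0:
        # Nothing to compare: treat two empty lists as identical, otherwise as disjoint.
        return 1.0 if len(baseline) == len(candidate) else 0.0
    seen_base, seen_cand = set(), set()
    overlap = 0  # X_d, updated incrementally as the depth d grows
    weighted_sum = 0.0
    for d in range(1, k + 1):
        x, y = baseline[d - 1], candidate[d - 1]
        if x == y:
            overlap += 1
        else:
            # Each new item adds to the overlap if it already appeared in the other prefix.
            overlap += (x in seen_cand) + (y in seen_base)
        seen_base.add(x)
        seen_cand.add(y)
        weighted_sum += (overlap / d) * p**d
    return (overlap / k) * p**k + (1 - p) / p * weighted_sum
```

With p = 0.9 and a four-item ranking, swapping the top two results scores 0.9, while swapping the bottom two scores about 0.97. Lowering p to 0.5 drops the top-swap score to 0.5 but leaves the bottom-swap score at about 0.96, which is the "focus on the very top" behavior described above.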
Exact Top-N Matches
Counts how many of the top N results appear at exactly the same position in both the baseline and candidate result lists. This is a strict measure: a result must be at the same rank in both lists to count.
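A minimal sketch of this count, assuming each result is identified by an ID string; exact_top_n_matches is a hypothetical name, not part of Releval's API.

```python
from typing import Sequence


def exact_top_n_matches(baseline: Sequence[str], candidate: Sequence[str], n: int) -> int:
    """Count positions among the first n where both lists hold the same result."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # zip() stops at the shorter list, so positions missing from either list never match.
    return sum(1 for b, c in zip(baseline[:n], candidate[:n]) if b == c)
```

Swapping the first two of four results leaves only 2 exact matches in the top 4, even though the same four results are present; that is what makes this metric stricter than an overlap measure.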
Viewing Analysis Results
List Analyses
curl "https://${RELEVAL_HOST}/api/v1/analysis" \
-H "Authorization: Bearer ${TOKEN}"
Get Analysis Details
curl "https://${RELEVAL_HOST}/api/v1/analysis/${ANALYSIS_ID}" \
-H "Authorization: Bearer ${TOKEN}"
Check Analysis Status
An analysis runs asynchronously. Check its progress:
curl "https://${RELEVAL_HOST}/api/v1/analysis/${ANALYSIS_ID}/status" \
-H "Authorization: Bearer ${TOKEN}"
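A simple way to wait for completion is to poll the status endpoint. The body of the status response is not documented in this section, so this sketch assumes a JSON object with a "status" field and uses "completed" and "failed" as placeholder terminal values; substitute the real field and values. The status fetcher is injected as a callable so the polling loop can be exercised without a network.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable


def status_request(host: str, token: str, analysis_id: str) -> urllib.request.Request:
    """Build the GET .../analysis/{id}/status request."""
    return urllib.request.Request(
        f"https://{host}/api/v1/analysis/{analysis_id}/status",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_state(req: urllib.request.Request) -> str:
    """Hypothetical: assumes the status endpoint returns JSON with a "status" field."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["status"]


def wait_for_analysis(
    fetch_status: Callable[[], str],
    terminal_states: Iterable[str] = ("completed", "failed"),  # placeholder values
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns a terminal state or the timeout elapses."""
    terminal = set(terminal_states)
    waited = 0.0  # approximated by summed sleep intervals; ignores request latency
    while True:
        state = fetch_status()
        if state in terminal:
            return state
        if waited >= timeout:
            raise TimeoutError(f"analysis still {state!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

In use, this looks like wait_for_analysis(lambda: fetch_state(status_request(host, token, analysis_id))).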
View Query-Level Metrics
See how an individual query performed by passing its name in the query_name parameter:
curl "https://${RELEVAL_HOST}/api/v1/analysis/${ANALYSIS_ID}/metrics?query_name=running+shoes" \
-H "Authorization: Bearer ${TOKEN}"
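The query_name value must be URL-encoded; in the example, the space in "running shoes" becomes "+". This sketch builds the URL with the standard library's urllib.parse.urlencode, which also escapes reserved characters such as "&"; query_metrics_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def query_metrics_url(host: str, analysis_id: str, query_name: str) -> str:
    """Build the per-query metrics URL, encoding query_name safely."""
    # urlencode() uses quote_plus(), so spaces become '+' and '&' becomes '%26'.
    params = urlencode({"query_name": query_name})
    return f"https://{host}/api/v1/analysis/{analysis_id}/metrics?{params}"
```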
Browse Executions
View the individual query executions. The list is paginated with the page and page_size parameters:
curl "https://${RELEVAL_HOST}/api/v1/analysis/${ANALYSIS_ID}/executions?page=1&page_size=20" \
-H "Authorization: Bearer ${TOKEN}"
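To walk through every execution, request pages until one comes back short. The response format is not described here, so fetch_page is a hypothetical callable that returns the list of executions on one page (for instance, by fetching executions_url and parsing the body). Treating a short or empty page as the last one is an assumption; if the API reports a total count, prefer that.

```python
from typing import Callable, Iterator, List, TypeVar
from urllib.parse import urlencode

T = TypeVar("T")


def executions_url(host: str, analysis_id: str, page: int, page_size: int) -> str:
    """Build the paginated executions URL."""
    query = urlencode({"page": page, "page_size": page_size})
    return f"https://{host}/api/v1/analysis/{analysis_id}/executions?{query}"


def iter_all(fetch_page: Callable[[int, int], List[T]], page_size: int = 20) -> Iterator[T]:
    """Yield every item, requesting pages 1, 2, ... until a short or empty page arrives."""
    page = 1  # pages are 1-based, matching page=1 in the example
    while True:
        items = fetch_page(page, page_size)
        yield from items
        # Assumption: a page with fewer than page_size items is the last one.
        if len(items) < page_size:
            return
        page += 1
```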
Use Cases
A/B Testing Ranking Models
- Create two query templates — one for your current ranking logic, one for the new model
- Run an analysis with the current template as baseline
- Review RBO scores to see how much results differ
- Focus on queries with low RBO to understand where the new model diverges most
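Picking out the lowest-RBO queries is a sort. This sketch assumes you have already collected per-query RBO scores into a dict keyed by query name (the metrics response format is not shown in this section); most_divergent is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def most_divergent(rbo_by_query: Dict[str, float], limit: int = 10) -> List[Tuple[str, float]]:
    """Return the `limit` queries with the lowest RBO, i.e. where the candidate diverges most."""
    # Sort ascending by score; ties are broken by query name so the output is deterministic.
    ranked = sorted(rbo_by_query.items(), key=lambda item: (item[1], item[0]))
    return ranked[:limit]
```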
Validating Index Changes
Before deploying a schema or mapping change:
- Run an analysis against the current index
- Apply the change to a staging index
- Run another analysis against the staging index
- Compare the metrics from both analyses to verify that results are stable or improved
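The final comparison can be sketched as a per-query diff. This assumes you have extracted one per-query score (for example RBO) from each analysis into a dict keyed by query name; how to extract it depends on the metrics response, which is not shown here. compare_runs and its 0.05 default threshold are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple


def compare_runs(
    current: Dict[str, float],
    staging: Dict[str, float],
    max_drop: float = 0.05,
) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Compare one per-query metric between the current-index and staging-index analyses.

    Returns (regressions, missing): queries whose score fell by more than max_drop,
    largest drop first, and queries that appear in only one of the two analyses.
    """
    regressions = [
        (query, current[query] - staging[query])
        for query in current.keys() & staging.keys()
        if current[query] - staging[query] > max_drop
    ]
    regressions.sort(key=lambda item: (-item[1], item[0]))
    missing = sorted(current.keys() ^ staging.keys())
    return regressions, missing
```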