Evaluation Runs
An evaluation run represents a single execution of an evaluation at a specific point in time.
It captures the full set of query executions, results, and metrics generated by that execution. Each run is tied to a particular evaluation configuration, so runs can be compared against one another. Runs provide the data needed to analyze performance trends, validate changes, and support reproducible testing.
Create Evaluation Run
Creates an evaluation run for the evaluation specified by the given ID.
List Evaluation Runs
Gets the evaluation runs for the evaluation specified by the given ID.
Get Evaluation Run
Gets the evaluation run for the given unique identifier.
Clone Evaluation Run
Clones an existing evaluation run with the given unique identifier and starts it.
Start Evaluation Run
Starts an evaluation run.
Lock Evaluation Run
Locks a completed evaluation run, preventing any further judgment modifications.
Stop Evaluation Run
Stops a queued or running evaluation run. Queries that have already completed
Update Evaluation Run Tags
Updates the tags on an evaluation run. Tags can be updated on runs in any status.
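The status rules stated in the Lock, Stop, and Update Tags descriptions can be sketched as simple guards. This is only an illustration: the RunStatus enum, its string values, and the helper names are hypothetical, and the service may define statuses beyond the four mentioned on this page.

```python
from enum import Enum


class RunStatus(Enum):
    # Hypothetical names for the statuses mentioned on this page;
    # the real API may use different values or additional statuses.
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    LOCKED = "locked"


def can_stop(status: RunStatus) -> bool:
    """Stop applies to queued or running runs."""
    return status in {RunStatus.QUEUED, RunStatus.RUNNING}


def can_lock(status: RunStatus) -> bool:
    """Lock applies to completed runs."""
    return status is RunStatus.COMPLETED


def can_update_tags(status: RunStatus) -> bool:
    """Tags can be updated on runs in any status."""
    return True
```

A client could call these guards before issuing a request to avoid a round trip that the server would be expected to reject.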
Get Metric Trends
Returns metric values for all completed or locked runs in an evaluation, ordered by creation date.
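As a rough sketch of what a trend query computes, the snippet below filters an in-memory list of runs to those that are completed or locked and orders them by creation date. The Run dataclass, its field names, and the ascending sort direction are assumptions made for illustration, not the service's actual response schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Run:
    # Hypothetical shape of a run record, for illustration only.
    id: str
    status: str
    created_at: datetime
    metrics: dict[str, float]


def metric_trend(runs: list[Run], metric: str) -> list[tuple[str, float]]:
    """Return (run id, metric value) pairs for completed or locked runs,
    ordered by creation date (oldest first is an assumption)."""
    eligible = [
        r for r in runs
        if r.status in ("completed", "locked") and metric in r.metrics
    ]
    eligible.sort(key=lambda r: r.created_at)
    return [(r.id, r.metrics[metric]) for r in eligible]
```

Runs that are still queued or running are left out, since their metrics are not yet final.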
Compare Evaluation Runs
Compares two evaluation runs, showing run-level metric deltas and per-query metric breakdowns.
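To make "metric deltas" concrete, here is a minimal sketch that computes run-level and per-query differences between two runs. The dictionary layout (a "metrics" map for the run and a "queries" map keyed by query text), the function names, and the choice to report candidate minus baseline are all assumptions; the actual response format may differ.

```python
def metric_deltas(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Return candidate minus baseline for every metric present in both."""
    shared = baseline.keys() & candidate.keys()
    return {name: candidate[name] - baseline[name] for name in sorted(shared)}


def compare_runs(baseline_run: dict, candidate_run: dict) -> dict:
    """Compare two runs at the run level and per query.

    Each run is assumed to look like:
    {"metrics": {metric: value}, "queries": {query: {metric: value}}}
    Only queries present in both runs are compared.
    """
    shared_queries = baseline_run["queries"].keys() & candidate_run["queries"].keys()
    per_query = {
        query: metric_deltas(baseline_run["queries"][query],
                             candidate_run["queries"][query])
        for query in sorted(shared_queries)
    }
    return {
        "run": metric_deltas(baseline_run["metrics"], candidate_run["metrics"]),
        "per_query": per_query,
    }
```

A positive delta means the candidate run scored higher than the baseline on that metric.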
Delete Evaluation Runs
Deletes the evaluation runs specified by the given unique identifiers.
List Queries
Lists the evaluation queries for a specific evaluation run.
List Results
Lists the results for a specific query in an evaluation run. Results are paginated.
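Because results are paginated, a client typically loops until it has read every page. The sketch below assumes page-number pagination with a fixed page size and a caller-supplied fetch_page function; both are hypothetical, and the real endpoint may use cursors or different parameter names.

```python
from typing import Any, Callable, Iterator


def iter_results(
    fetch_page: Callable[[int, int], list[Any]],
    page_size: int = 50,
) -> Iterator[Any]:
    """Yield every result by requesting pages until one comes back short.

    fetch_page(page, page_size) is a hypothetical callable that returns the
    list of results for a zero-based page number.
    """
    page = 0
    while True:
        items = fetch_page(page, page_size)
        if not items:
            return
        yield from items
        if len(items) < page_size:
            # A short page means there is nothing left to fetch.
            return
        page += 1
```

Wrapping the loop in a generator lets callers stop early without fetching pages they do not need.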
Get Next Result for Evaluation
Gets the next unrated result for evaluation by the current member. Results are prioritized
List Metrics
Lists the ranking metrics available for evaluation runs.