Evaluation Runs

Evaluation Runs represent the execution of an evaluation at a specific point in time.

An evaluation run captures the full set of query executions, results, and metrics generated when an evaluation is run. Each run is tied to a particular evaluation configuration, enabling comparisons across runs. Runs provide the data needed to analyze performance trends, validate changes, and support reproducible testing.
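The relationship described above can be pictured as a small data model. This sketch is illustrative only. The class and field names (`EvaluationRun`, `QueryExecution`, `evaluation_id`, `queries`, `metrics`, `tags`) and the metric name `ndcg@10` are assumptions, not the documented schema. The statuses are only the ones mentioned on this page.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class RunStatus(Enum):
    # Statuses mentioned on this page; the real API may define others.
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    LOCKED = "locked"


@dataclass
class QueryExecution:
    # One executed query and the metric values computed for it (hypothetical shape).
    query: str
    metrics: dict[str, float] = field(default_factory=dict)


@dataclass
class EvaluationRun:
    # Hypothetical fields; see the individual endpoint pages for the real schema.
    id: str
    evaluation_id: str  # the evaluation configuration this run belongs to
    status: RunStatus
    created: datetime
    queries: list[QueryExecution] = field(default_factory=list)
    metrics: dict[str, float] = field(default_factory=dict)  # run-level metrics
    tags: list[str] = field(default_factory=list)


run = EvaluationRun(
    id="run-2",
    evaluation_id="eval-1",
    status=RunStatus.COMPLETED,
    created=datetime(2024, 5, 1),
    queries=[QueryExecution("red shoes", {"ndcg@10": 0.71})],
    metrics={"ndcg@10": 0.71},
)
```

Because every run carries the `evaluation_id` of its configuration, runs that share that value can be compared with one another.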

Create Evaluation Run

Creates an evaluation run for the evaluation specified by the given ID.

List Evaluation Runs

Gets the evaluation runs for an evaluation specified by the given ID.

Get Evaluation Run

Gets the evaluation run for the given unique identifier.

Clone Evaluation Run

Clones an existing evaluation run with the given unique identifier and starts it.

Start Evaluation Run

Starts an evaluation run.

Lock Evaluation Run

Locks a completed evaluation run, preventing any further judgment modifications.
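The locking rule can be modeled client-side with a minimal sketch, assuming run statuses are plain strings such as `"completed"` and `"locked"`. The names `lock`, `judgments_editable`, and `InvalidTransition` are hypothetical helpers, not part of the API. The sketch also assumes that the service rejects re-locking a run that is already locked.

```python
class InvalidTransition(Exception):
    """Raised when an operation is not allowed for the run's current status."""


def lock(status: str) -> str:
    # Only a completed run can be locked. Assumption: an already locked run is rejected too.
    if status != "completed":
        raise InvalidTransition(f"cannot lock a run in status {status!r}")
    return "locked"


def judgments_editable(status: str) -> bool:
    # Encodes only the rule stated on this page: a locked run's judgments are frozen.
    return status != "locked"


print(lock("completed"))  # locked
```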

Stop Evaluation Run

Stops a queued or running evaluation run. Queries that have already completed…
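A client might check whether a run can be stopped before calling the endpoint. In this sketch, `can_stop` is a hypothetical helper. It models only the statuses named on this page, and the API may define others.

```python
# Only queued or running runs can be stopped, per the description above.
STOPPABLE_STATUSES = frozenset({"queued", "running"})


def can_stop(status: str) -> bool:
    """Return True if a run in this status may be stopped."""
    return status in STOPPABLE_STATUSES
```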

Update Evaluation Run Tags

Updates the tags on an evaluation run. Tags can be updated on runs in any status.

Get Metric Trends

Returns metric values for all completed or locked runs in an evaluation, ordered by creation date.
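The selection rule behind metric trends can be sketched locally. The sketch assumes each run is a dict with hypothetical `id`, `status`, `created`, and `metrics` keys. The page does not say whether runs are returned in ascending or descending order, so this sketch assumes oldest first. It also skips runs that lack the requested metric, which is another assumption.

```python
from datetime import datetime

# Only completed or locked runs contribute to a trend.
TREND_STATUSES = {"completed", "locked"}


def metric_trend(runs: list[dict], metric: str) -> list[tuple[str, float]]:
    """Return (run id, metric value) pairs for completed or locked runs, oldest first."""
    eligible = [r for r in runs if r["status"] in TREND_STATUSES and metric in r["metrics"]]
    eligible.sort(key=lambda r: r["created"])
    return [(r["id"], r["metrics"][metric]) for r in eligible]


runs = [
    {"id": "r3", "status": "locked", "created": datetime(2024, 3, 1), "metrics": {"ndcg": 0.7}},
    {"id": "r2", "status": "running", "created": datetime(2024, 2, 1), "metrics": {}},
    {"id": "r1", "status": "completed", "created": datetime(2024, 1, 1), "metrics": {"ndcg": 0.6}},
]
print(metric_trend(runs, "ndcg"))  # [('r1', 0.6), ('r3', 0.7)]
```

The running run `r2` is excluded even though it falls between the other two by date.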

Compare Evaluation Runs

Compares two evaluation runs, showing run-level metric deltas and per-query metric breakdowns.
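A comparison can be thought of as two sets of differences: run-level metric deltas and per-query metric deltas. This sketch assumes a hypothetical run shape with `metrics` and `queries` maps. It computes each delta as candidate minus baseline, and the API's actual sign convention and response format may differ.

```python
def compare_runs(baseline: dict, candidate: dict) -> tuple[dict, dict]:
    """Return (run-level deltas, per-query deltas), each computed as candidate minus baseline.

    Each run is a hypothetical dict:
    {"metrics": {metric: value}, "queries": {query: {metric: value}}}
    Only metrics and queries present in both runs are compared.
    """
    run_deltas = {
        m: candidate["metrics"][m] - baseline["metrics"][m]
        for m in baseline["metrics"].keys() & candidate["metrics"].keys()
    }
    per_query = {}
    for q in baseline["queries"].keys() & candidate["queries"].keys():
        before, after = baseline["queries"][q], candidate["queries"][q]
        per_query[q] = {m: after[m] - before[m] for m in before.keys() & after.keys()}
    return run_deltas, per_query


baseline = {"metrics": {"ndcg": 0.75}, "queries": {"q1": {"ndcg": 0.5}, "q2": {"ndcg": 1.0}}}
candidate = {"metrics": {"ndcg": 0.625}, "queries": {"q1": {"ndcg": 0.75}, "q2": {"ndcg": 0.5}}}
run_deltas, per_query = compare_runs(baseline, candidate)
```

The per-query breakdown helps explain a run-level change. In this example, the small overall drop hides a gain on `q1` and a larger loss on `q2`.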

Delete Evaluation Runs

Deletes the evaluation runs specified by the given unique identifiers.

List Queries

Lists the evaluation queries for a specific evaluation run.

List Results

Lists the results for a specific query in an evaluation run. Results are paginated.
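Because results are paginated, a client typically keeps requesting pages until it reaches the last one. This sketch assumes cursor-style pagination, where each response carries a token for the next page. `iter_all_results` and the page tuple shape are hypothetical. If the API uses page numbers or offsets instead, only the way the next request is built would change.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page shape: (results on this page, token for the next page or None).
Page = Tuple[List[str], Optional[str]]


def iter_all_results(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Yield every result across pages, requesting the next page until no token remains."""
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        yield from items
        if token is None:
            break


# Stand-in for the List Results call, returning two canned pages.
fake_pages = {None: (["res-1", "res-2"], "page-2"), "page-2": (["res-3"], None)}
print(list(iter_all_results(lambda token: fake_pages[token])))  # ['res-1', 'res-2', 'res-3']
```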

Get Next Result for Evaluation

Gets the next unrated result for evaluation by the current member. Results are prioritized…

List Metrics

Lists the ranking metrics available for evaluation runs.