Prompt templates
Each AI judge has a required Handlebars prompt template that defines what the model is asked, and the shape of the response it must return. The UI pre-fills a sensible default when you create a judge, which you can take as-is or adapt to your domain.
Available variables
The Handlebars context exposes the following variables:
| Variable | Type | Description |
|---|---|---|
| query | string or object | The query value from the query set. A string for plain text query sets, or the JSON object for structured query sets. |
| candidates | array | Candidates being judged in this prompt. Length is 1 unless batch_size > 1. |
| candidates[].id | string | The candidate's ID. Must be referenced in the response so grades can be matched back to candidates. |
| candidates[].title | string | The candidate's title (from candidates mapping). |
| candidates[].fields | object | Additional mapped fields (price, brand, description, etc.). May be absent. |
| candidates[].image | string | The candidate's image. Only present when include_images: true and the candidate has a mapped image. |
| scale_min | number | The lowest grade the evaluation scale accepts. |
| scale_max | number | The highest grade the scale accepts. |
| scale_labels | array | Each entry has grade (number) and label (human-readable name like "Highly Relevant"). |
| evaluation_name | string | The parent evaluation's name. Useful for context-setting prompts. |
| evaluation_description | string | The parent evaluation's description, if set. |
Standard Handlebars syntax is supported, including {{#each}}, {{#if}}, {{else}}, and the
helpers Releval registers for query templates.
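To make the shape of this context concrete, here is a minimal Python sketch of the data a template might receive for a single-candidate prompt. Only the key names come from the variables listed above; the type names (`Candidate`, `ScaleLabel`, `PromptContext`) and every sample value are illustrative assumptions, not part of Releval.

```python
from typing import Any, Dict, List, TypedDict, Union


class Candidate(TypedDict, total=False):
    id: str
    title: str
    fields: Dict[str, Any]  # may be absent
    image: str              # only present when images are included and mapped


class ScaleLabel(TypedDict):
    grade: int
    label: str


class PromptContext(TypedDict, total=False):
    query: Union[str, Dict[str, Any]]
    candidates: List[Candidate]
    scale_min: int
    scale_max: int
    scale_labels: List[ScaleLabel]
    evaluation_name: str
    evaluation_description: str  # only if a description is set


# Sample values are made up for illustration.
example_context: PromptContext = {
    "query": "running shoes",
    "candidates": [
        {
            "id": "abc123",
            "title": "Trail hiking boot",
            "fields": {"brand": "ExampleBrand", "price": 89.0},
        }
    ],
    "scale_min": 0,
    "scale_max": 4,
    "scale_labels": [
        {"grade": 0, "label": "Irrelevant"},
        {"grade": 1, "label": "Marginally Relevant"},
        {"grade": 2, "label": "Somewhat Relevant"},
        {"grade": 3, "label": "Relevant"},
        {"grade": 4, "label": "Highly Relevant"},
    ],
    "evaluation_name": "Footwear search quality",
    "evaluation_description": "Relevance of product search results.",
}
```

Because `fields`, `image`, and `evaluation_description` can be missing, a template that uses them is safer when it wraps them in `{{#if}}`.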
When to customise the default
The pre-filled template works well for most domains. Adapt it when you need to:
- Inject domain-specific guidelines (e.g. "treat refurbished items as Marginally Relevant").
- Constrain the model to a stricter rubric.
- Localise the prompt (the default is English-only).
- Emphasise particular fields (e.g. brand, category) that appear in
candidates[].fields.
Required response format
Whatever your template asks, the model's response must contain one <candidate> block per
candidate sent in the prompt:
<candidate id="abc123">
<reasoning>
The query asked for "running shoes" and this candidate is a hiking boot.
The product description does not mention running.
</reasoning>
<grade>
1
</grade>
</candidate>
Three requirements apply. If any fails, the affected candidates are not judged:
- Every candidate ID sent in the prompt must appear in exactly one <candidate> block.
- Each <grade> must be an integer.
- Each grade must be within the evaluation scale's range (e.g. 0–4 for graded).
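If you want to check a model response against these three rules before relying on it, for example while iterating on a template, the checks can be sketched in a few lines of Python. This is a hypothetical helper built on regular expressions, not Releval's own parser; the function name `validate_response` and its return shape are assumptions made for illustration.

```python
import re
from typing import Dict, Iterable, Tuple

_CANDIDATE_RE = re.compile(
    r'<candidate\s+id="([^"]*)"\s*>(.*?)</candidate>', re.DOTALL
)
_GRADE_RE = re.compile(r"<grade>\s*(.*?)\s*</grade>", re.DOTALL)


def validate_response(
    response: str, sent_ids: Iterable[str], scale_min: int, scale_max: int
) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Return (grades, errors), both keyed by candidate ID."""
    # Group block bodies by ID so duplicates can be detected.
    blocks: Dict[str, list] = {}
    for cid, body in _CANDIDATE_RE.findall(response):
        blocks.setdefault(cid, []).append(body)

    grades: Dict[str, int] = {}
    errors: Dict[str, str] = {}
    for cid in sent_ids:
        bodies = blocks.get(cid, [])
        # Rule 1: exactly one <candidate> block per ID sent.
        if len(bodies) != 1:
            errors[cid] = f"expected exactly one <candidate> block, found {len(bodies)}"
            continue
        # Rule 2: the grade must be an integer.
        match = _GRADE_RE.search(bodies[0])
        raw = match.group(1) if match else ""
        if not re.fullmatch(r"-?\d+", raw):
            errors[cid] = "grade is missing or not an integer"
            continue
        # Rule 3: the grade must fall inside the evaluation scale.
        grade = int(raw)
        if not scale_min <= grade <= scale_max:
            errors[cid] = f"grade {grade} is outside {scale_min}..{scale_max}"
            continue
        grades[cid] = grade
    return grades, errors


sample = """<candidate id="abc123">
<reasoning>
The query asked for "running shoes" and this candidate is a hiking boot.
</reasoning>
<grade>
1
</grade>
</candidate>"""

grades, errors = validate_response(sample, ["abc123", "def456"], 0, 4)
print(grades)  # {'abc123': 1}
print(errors)  # def456 is reported because it has no block
```

Returning errors per candidate mirrors the rule that only the affected candidates go unjudged; the rest of the response remains usable.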
The <reasoning> block is optional but strongly recommended — it is stored on the judgment
and shown in the UI alongside the grade.
The response must consist of <candidate id="...">…</candidate> blocks, each with a <grade> child and, ideally, a <reasoning> child. Don't ask the model to return JSON or Markdown tables.
A worked custom template
Here's a custom template that adds an e-commerce rubric and explicitly asks the model to penalise out-of-stock items:
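The sketch below is illustrative rather than a shipped default. It holds one possible template as a Python string so that it can be sanity-checked before being pasted into the judge. The `brand` and `in_stock` field names are assumptions about what your candidates mapping exposes, the template assumes a plain text query set, and `referenced_roots` is a hypothetical helper that lists the top-level variables a template uses.

```python
import re

# Variable names documented for the judge prompt context.
DOCUMENTED_VARIABLES = {
    "query", "candidates", "scale_min", "scale_max",
    "scale_labels", "evaluation_name", "evaluation_description",
}

# Illustrative Handlebars template. `brand` and `in_stock` are hypothetical
# mapped fields; substitute whatever your candidates mapping provides.
ECOMMERCE_TEMPLATE = """\
You are grading search results for the evaluation "{{evaluation_name}}".
{{#if evaluation_description}}Context: {{evaluation_description}}{{/if}}

Shopper query: {{query}}

Grade scale ({{scale_min}} to {{scale_max}}):
{{#each scale_labels}}
- {{this.grade}}: {{this.label}}
{{/each}}

Rubric:
- Grade how well the product satisfies the shopper's intent.
- Prefer an exact brand or category match over a loose keyword match.
- If a product is out of stock, lower its grade by one step,
  but never below the lowest grade on the scale.

Products:
{{#each candidates}}
Product id: {{this.id}}
Title: {{this.title}}
{{#if this.fields}}
Brand: {{this.fields.brand}}
In stock: {{this.fields.in_stock}}
{{/if}}
{{/each}}

For every product, reply with exactly one block in this form:
<candidate id="PRODUCT_ID">
<reasoning>One or two sentences.</reasoning>
<grade>INTEGER</grade>
</candidate>
"""


def referenced_roots(template: str) -> set:
    """Collect top-level variable names used in {{...}} expressions."""
    roots = set()
    for expr in re.findall(r"\{\{([^{}]+)\}\}", template):
        tokens = expr.strip().lstrip("#/").split()
        if not tokens:
            continue
        # In "#each candidates" or "#if x" the variable is the last token.
        name = tokens[-1]
        if name in {"each", "if", "else"}:
            continue  # closing tags and bare keywords carry no variable
        root = name.split(".")[0]
        if root != "this":
            roots.add(root)
    return roots


# Any name left over here would be a typo or an undocumented variable.
unknown = referenced_roots(ECOMMERCE_TEMPLATE) - DOCUMENTED_VARIABLES
print(unknown)  # set()
```

The scale bounds are referenced outside the `{{#each}}` blocks on purpose: inside a block the context changes to the current item, so top-level variables would need a parent path there.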
Iterating on a template
The prompt template lives on the judge, so changes apply to all future runs that use it. To test a template change without burning credits on a full run:
- Clone the judge (or create a sibling using a free local provider such as ollama).
- Run it on a small evaluation run (10–20 queries).
- Inspect the per-candidate reasoning in the UI.
- Refine the template, repeat.
- Once stable, point your full evaluations at the production judge.
When you're ready, see Running an AI judging run to start judging.