GET /v3/evals/experiments/{experiment_id}/metrics/rows
Get Experiment Row Metrics
curl --request GET \
  --url https://{host}/v3/evals/experiments/{experiment_id}/metrics/rows \
  --header 'Authorization: Bearer <token>'
{
  "api_version": "3.0",
  "kind": "NORMAL",
  "data": {
    "top": [
      {
        "experiment_item_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "dataset_item_id": "ds-item-001",
        "inputs": {
          "question": "What is Python?"
        },
        "flags": [
          {
            "evaluator_name": "coherence",
            "value": 0.95,
            "label": "positive",
            "reasoning": "The response is highly coherent and on-topic."
          }
        ]
      }
    ],
    "bottom": [
      {
        "experiment_item_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "dataset_item_id": "ds-item-001",
        "inputs": {
          "question": "What is Python?"
        },
        "flags": [
          {
            "evaluator_name": "coherence",
            "value": 0.95,
            "label": "positive",
            "reasoning": "The response is highly coherent and on-topic."
          }
        ]
      }
    ]
  }
}
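
As a rough illustration of consuming the response above, the following Python sketch walks the `top` and `bottom` lists and collects the evaluator names attached to each row. The helper name `summarize_rows` and the trimmed `SAMPLE` payload are hypothetical and used only for illustration; the field names are taken from the example response.

```python
import json

# Trimmed copy of the example response; values are illustrative only.
SAMPLE = """
{
  "api_version": "3.0",
  "kind": "NORMAL",
  "data": {
    "top": [
      {
        "experiment_item_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "dataset_item_id": "ds-item-001",
        "inputs": {"question": "What is Python?"},
        "flags": [
          {"evaluator_name": "coherence", "value": 0.95, "label": "positive"}
        ]
      }
    ],
    "bottom": [
      {
        "experiment_item_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "dataset_item_id": "ds-item-001",
        "inputs": {"question": "What is Python?"},
        "flags": [
          {"evaluator_name": "coherence", "value": 0.95, "label": "positive"}
        ]
      }
    ]
  }
}
"""


def summarize_rows(payload: dict) -> list[tuple[str, str, list[str]]]:
    """Return (list name, experiment_item_id, evaluator names) for each row."""
    summary = []
    data = payload.get("data", {})
    for list_name in ("top", "bottom"):
        for row in data.get(list_name, []):
            evaluators = [flag["evaluator_name"] for flag in row.get("flags", [])]
            summary.append((list_name, row["experiment_item_id"], evaluators))
    return summary


payload = json.loads(SAMPLE)
rows = summarize_rows(payload)
for list_name, item_id, evaluators in rows:
    print(list_name, item_id, evaluators)
```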

Authorizations

Authorization · string · header · required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Path Parameters

experiment_id · string<uuid> · required

Unique identifier of the experiment.
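
A minimal Python sketch of assembling the request, assuming only the standard library. The helper name `build_row_metrics_request`, the host `fiddler.example.com`, and the token `my-token` are placeholders rather than real values; the UUID check simply mirrors the `string<uuid>` type of the path parameter.

```python
import urllib.request
import uuid


def build_row_metrics_request(
    host: str, experiment_id: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) the GET request for row metrics."""
    # uuid.UUID raises ValueError if experiment_id is not a valid UUID string.
    canonical_id = str(uuid.UUID(experiment_id))
    url = f"https://{host}/v3/evals/experiments/{canonical_id}/metrics/rows"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


request = build_row_metrics_request(
    "fiddler.example.com",                   # placeholder host
    "3fa85f64-5717-4562-b3fc-2c963f66afa6",  # placeholder experiment ID
    "my-token",                              # placeholder auth token
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen(request)` would send it; that call is left out here so the sketch runs without network access.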

Response

Top and bottom performing rows with evaluator flags

Response object for standard API responses.

api_version · enum<string> · default: 3.0

API version.

Available options: 2.0, 3.0

kind · enum<string> · default: NORMAL

Type of response, indicating a normal response.

Available options: NORMAL

data · Experiment Row Metrics Response · object

Top and bottom performing rows identified via percentile-based outlier detection. Numeric evaluators use P10/P90 thresholds; categorical evaluators flag labels that differ from the mode. Rows are ranked by how many evaluators flagged them as outliers. At most 3 rows are returned in each list.
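
The description above can be illustrated with a small Python sketch. This is not the service's implementation: the quantile interpolation method, tie-breaking, and the way flagged rows are divided between `top` and `bottom` are not documented here, so the sketch only shows P10/P90 flagging, mode-based flagging, and ranking by flag count. All function names and sample data are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def numeric_flags(scores: dict[str, float]) -> dict[str, str]:
    """Flag rows whose score is below P10 ("low") or above P90 ("high")."""
    # Assumption: inclusive interpolation; the service may compute percentiles differently.
    cuts = quantiles(list(scores.values()), n=10, method="inclusive")
    p10, p90 = cuts[0], cuts[-1]
    flags = {}
    for row_id, value in scores.items():
        if value < p10:
            flags[row_id] = "low"
        elif value > p90:
            flags[row_id] = "high"
    return flags


def categorical_flags(labels: dict[str, str]) -> set[str]:
    """Flag rows whose label differs from the most common label (the mode)."""
    mode_label, _ = Counter(labels.values()).most_common(1)[0]
    return {row_id for row_id, label in labels.items() if label != mode_label}


def rank_by_flag_count(flag_sets, limit: int = 3) -> list[tuple[str, int]]:
    """Rank rows by how many evaluators flagged them; keep at most `limit` rows."""
    counts = Counter()
    for flagged in flag_sets:
        counts.update(set(flagged))  # set() keeps row IDs only, one count per evaluator
    # Assumption: ties are broken by row ID so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Hypothetical data: 11 rows, one numeric evaluator and one categorical evaluator.
scores = {f"r{i}": float(i) for i in range(11)}
labels = {f"r{i}": "positive" for i in range(11)}
labels["r0"] = "negative"
labels["r5"] = "negative"

numeric = numeric_flags(scores)
categorical = categorical_flags(labels)
ranking = rank_by_flag_count([numeric, categorical])
print(ranking)
```

In this sample, `r0` is flagged by both evaluators and therefore ranks first, which matches the rule that rows are ordered by how many evaluators flagged them.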