Evaluate your RAG application’s retrieval and generation quality using Fiddler’s built-in evaluators. This cookbook demonstrates the direct .score() API for rapid iteration on test cases before scaling to full experiments.
Use this cookbook when: You have a RAG application and want to quickly assess whether responses are faithful to retrieved documents and relevant to user queries.
Time to complete: ~15 minutes
Prerequisites
- Fiddler account with API access
- LLM credential configured in Settings > LLM Gateway
pip install fiddler-evals pandas
Connect and Initialize Evaluators
Replace URL, TOKEN, and credential names with your Fiddler account details. Find your credentials in Settings > Access Tokens and Settings > LLM Gateway.
Create Test Cases
Define representative test cases that cover both successful and failing RAG scenarios:
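As a minimal sketch, the test cases could be held in a plain list of dictionaries. The field names user_query, rag_response, and retrieved_documents follow the evaluator inputs described later in this cookbook; the scenario key, the validate_case helper, and all of the example text are illustrative assumptions, not data from the source notebook.

```python
# Illustrative test cases. All queries, documents, and responses are made-up
# example data chosen to produce one healthy case and two failure modes.
DOCS = [
    "France is a country in Western Europe.",
    "Paris is the capital of France.",
]

test_cases = [
    {
        "scenario": "Perfect Match",
        "user_query": "What is the capital of France?",
        "retrieved_documents": DOCS,
        # Grounded in the documents and answers the question.
        "rag_response": "The capital of France is Paris.",
    },
    {
        "scenario": "Hallucination",
        "user_query": "What is the capital of France?",
        "retrieved_documents": DOCS,
        # On topic, but the founding date is not supported by the documents.
        "rag_response": "The capital of France is Paris, founded in 250 BC.",
    },
    {
        "scenario": "Irrelevant Answer",
        "user_query": "What is the capital of France?",
        "retrieved_documents": DOCS,
        # Supported by the documents, but does not answer the question.
        "rag_response": "France is a country in Western Europe.",
    },
]

REQUIRED_KEYS = {"scenario", "user_query", "retrieved_documents", "rag_response"}


def validate_case(case: dict) -> None:
    """Hypothetical helper: fail early if a test case is missing a field."""
    missing = REQUIRED_KEYS - case.keys()
    if missing:
        raise ValueError(f"test case is missing fields: {sorted(missing)}")


for case in test_cases:
    validate_case(case)
```

Keeping one case per failure mode makes it easier to see which evaluator reacts to which problem.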
Evaluate Each Test Case
Use the .score() method to evaluate each test case directly. Each evaluator returns a Score object with value, label, and reasoning:
View Results
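One way to turn per-evaluator scores into table rows is sketched here. The Score dataclass is a local stand-in that mirrors only the three fields named above (value, label, reasoning); it is not imported from fiddler-evals, and the scores are hand-written for illustration rather than produced by an evaluator. The to_rows helper is likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Score:
    """Local stand-in with the three documented fields; not the library class."""
    value: float
    label: str
    reasoning: str


# Hand-written example scores, keyed by scenario and then by evaluator name.
results = {
    "Perfect Match": {
        "Faithfulness": Score(1.0, "yes", "Response is supported by the documents."),
        "Relevance": Score(1.0, "high", "Response answers the query directly."),
    },
    "Hallucination": {
        "Faithfulness": Score(0.0, "no", "Response adds an unsupported claim."),
        "Relevance": Score(1.0, "high", "Response answers the query directly."),
    },
}


def to_rows(results: dict) -> list:
    """Flatten nested scores into one dict per scenario, keeping only labels."""
    rows = []
    for scenario, scores in results.items():
        row = {"scenario": scenario}
        for evaluator_name, score in scores.items():
            row[evaluator_name] = score.label
        rows.append(row)
    return rows


rows = to_rows(results)
for row in rows:
    print(" | ".join(str(cell) for cell in row.values()))
```

A list of dictionaries in this shape can also be passed to pandas.DataFrame if a tabular view is preferred.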
| scenario | Faithfulness | Relevance | Status |
|---|---|---|---|
| Perfect Match | yes | high | HEALTHY |
| Hallucination | no | high | ISSUE DETECTED |
| Irrelevant Answer | yes | low | ISSUE DETECTED |
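The Status column in the table above is consistent with a simple rule: a case is healthy only when the response is both faithful and highly relevant. The sketch below encodes that rule. The rag_status function name is hypothetical, and treating a medium relevance label as an issue is an assumption, since the table shows no medium row.

```python
def rag_status(faithfulness_label: str, relevance_label: str) -> str:
    """Return HEALTHY only when the response is faithful and highly relevant.

    Assumption: any relevance label other than "high" (including "medium")
    is flagged. Adjust the threshold to match your own quality bar.
    """
    faithful = faithfulness_label.strip().lower() == "yes"
    relevant = relevance_label.strip().lower() == "high"
    return "HEALTHY" if faithful and relevant else "ISSUE DETECTED"


# Labels copied from the results table above.
table_rows = [
    ("Perfect Match", "yes", "high"),
    ("Hallucination", "no", "high"),
    ("Irrelevant Answer", "yes", "low"),
]

for scenario, faithfulness, relevance in table_rows:
    print(f"{scenario}: {rag_status(faithfulness, relevance)}")
```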
Understanding the Evaluators
RAG Faithfulness
RAG Faithfulness checks whether the response is grounded in the retrieved documents.
- Inputs: user_query, rag_response, retrieved_documents
- Scoring: Binary (Yes = 1.0, No = 0.0)
- Use for: Detecting hallucinations where the LLM generates plausible but unsupported claims
Answer Relevance
Answer Relevance measures how well the response addresses the user’s query.
- Inputs: user_query, rag_response (+ optional retrieved_documents)
- Scoring: Ordinal (High = 1.0, Medium = 0.5, Low = 0.0)
- Use for: Detecting off-topic responses where the LLM answers a different question
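Because both evaluators map labels to numeric values, scores can be averaged across a batch of test cases. The following sketch uses the label-to-value mappings listed above and the labels from the results table; the dictionary names and the mean_score helper are illustrative assumptions, not part of fiddler-evals.

```python
# Label-to-value mappings as described for each evaluator.
FAITHFULNESS_VALUES = {"yes": 1.0, "no": 0.0}
RELEVANCE_VALUES = {"high": 1.0, "medium": 0.5, "low": 0.0}


def mean_score(labels: list, mapping: dict) -> float:
    """Average the numeric values for a list of labels (case-insensitive)."""
    if not labels:
        raise ValueError("labels must not be empty")
    values = [mapping[label.strip().lower()] for label in labels]
    return sum(values) / len(values)


# Labels from the three scenarios in the results table.
faithfulness_labels = ["yes", "no", "yes"]
relevance_labels = ["high", "high", "low"]

print(f"Mean faithfulness: {mean_score(faithfulness_labels, FAITHFULNESS_VALUES):.2f}")
print(f"Mean relevance: {mean_score(relevance_labels, RELEVANCE_VALUES):.2f}")
```

Averages like these hide which cases failed, so they work best alongside the per-case table rather than in place of it.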
Next Steps
- Running RAG Experiments at Scale — Use Datasets and Experiments to evaluate systematically across larger test sets
- Detecting Hallucinations in RAG — Set up continuous hallucination monitoring in production
- RAG Health Diagnostics — Conceptual guide to the diagnostic triad
Source notebook: Fiddler Cookbook: RAG Evaluation Fundamentals