Detecting Hallucinations in RAG - Fiddler Documentation

Build a hallucination detection pipeline that combines pre-deployment evaluation with the Evals SDK and continuous production monitoring through LLM Observability enrichments and Evaluator Rules.

Use this cookbook when: You want to monitor your RAG application for hallucinations across both testing and production environments.

Time to complete: ~25 minutes

Prerequisites

Fiddler account with API access
LLM credential configured in Settings > LLM Gateway
pip install fiddler-evals fiddler-client pandas

The Two-Layer Approach

Hallucination detection works best as a two-layer pipeline:

Layer	Tool	Purpose
Pre-deployment	Evals SDK	Test against known scenarios, validate with golden labels
Production	LLM Observability + Evaluator Rules	Continuous monitoring of live traffic

Layer 1: Pre-Deployment Evaluation

Use the RAG Health Metrics triad (RAG Faithfulness, Answer Relevance, and Context Relevance) to distinguish hallucinations from other failure modes.

Set Up and Connect

Replace URL, TOKEN, and the credential and model names with your Fiddler account details. Find your access token in Settings > Access Tokens and your LLM credential in Settings > LLM Gateway.

import pandas as pd
from fiddler_evals import init, evaluate, Project, Application, Dataset
from fiddler_evals.evaluators import (
    AnswerRelevance,
    ContextRelevance,
    RAGFaithfulness,
)

URL = 'https://your-org.fiddler.ai'
TOKEN = 'your-access-token'
LLM_CREDENTIAL_NAME = 'your-llm-credential'
LLM_MODEL_NAME = 'openai/gpt-4o'

init(url=URL, token=TOKEN)

project = Project.get_or_create(name='hallucination_detection')
app = Application.get_or_create(
    name='rag-hallucination-test',
    project_id=project.id,
)
dataset = Dataset.get_or_create(
    name='hallucination-scenarios',
    application_id=app.id,
)
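To keep the access token out of source control, the connection values could be read from the environment instead of being hard-coded. The following is a minimal sketch, not part of the Fiddler SDK: the helper `load_fiddler_settings` and the variable names `FIDDLER_URL` and `FIDDLER_TOKEN` are hypothetical and chosen only for illustration.

```python
import os


def load_fiddler_settings(env=None) -> dict:
    """Read connection settings from environment variables.

    The variable names FIDDLER_URL and FIDDLER_TOKEN are assumptions
    made for this sketch, not names defined by Fiddler.
    """
    env = os.environ if env is None else env
    settings = {
        'url': env.get('FIDDLER_URL', ''),
        'token': env.get('FIDDLER_TOKEN', ''),
    }
    # Fail early with a clear message instead of a later authentication error.
    missing = [key for key, value in settings.items() if not value]
    if missing:
        raise RuntimeError(f'missing Fiddler settings: {missing}')
    return settings


# A plain dict stands in for os.environ so the example runs anywhere.
settings = load_fiddler_settings({
    'FIDDLER_URL': 'https://your-org.fiddler.ai',
    'FIDDLER_TOKEN': 'your-access-token',
})
print(settings['url'])
```

The resulting values could then be passed along as `init(url=settings['url'], token=settings['token'])`.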

Create Hallucination-Focused Test Cases

Design test cases that specifically probe for hallucination patterns:

hallucination_scenarios = pd.DataFrame(
    [
        {
            'scenario': 'Grounded response',
            'user_query': 'What is the return policy?',
            'retrieved_documents': [
                'Returns accepted within 30 days with receipt.',
            ],
            'rag_response': 'You can return items within 30 days '
                'if you have a receipt.',
        },
        {
            'scenario': 'Fabricated details',
            'user_query': 'What is the return policy?',
            'retrieved_documents': [
                'Returns accepted within 30 days with receipt.',
            ],
            'rag_response': 'You can return items within 60 days. '
                'No receipt needed. We also offer free shipping on returns.',
        },
        {
            'scenario': 'Insufficient context',
            'user_query': 'What are the shipping costs?',
            'retrieved_documents': [
                'We ship to all 50 US states.',
            ],
            'rag_response': 'Standard shipping is $5.99 and express '
                'shipping is $12.99.',
        },
    ]
)

dataset.insert_from_pandas(
    df=hallucination_scenarios,
    input_columns=['user_query', 'retrieved_documents', 'rag_response'],
    metadata_columns=['scenario'],
)
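A malformed row (for example, an empty document list) can make a scenario meaningless, so it may be worth checking the DataFrame before calling `insert_from_pandas`. The sketch below uses only pandas and assumes the column names used in this cookbook; `validate_scenarios` is a hypothetical helper, not an Evals SDK function.

```python
import pandas as pd

REQUIRED_COLUMNS = ['scenario', 'user_query', 'retrieved_documents', 'rag_response']


def validate_scenarios(df: pd.DataFrame) -> list:
    """Return human-readable problems; an empty list means the frame looks usable."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        return [f'missing columns: {missing}']

    problems = []
    for idx, row in df.iterrows():
        docs = row['retrieved_documents']
        # Each scenario needs at least one retrieved document to judge faithfulness against.
        if not isinstance(docs, list) or not docs:
            problems.append(f'row {idx}: retrieved_documents must be a non-empty list')
        for col in ('user_query', 'rag_response'):
            if not isinstance(row[col], str) or not row[col].strip():
                problems.append(f'row {idx}: {col} is empty')
    return problems


sample = pd.DataFrame([
    {
        'scenario': 'Grounded response',
        'user_query': 'What is the return policy?',
        'retrieved_documents': ['Returns accepted within 30 days with receipt.'],
        'rag_response': 'You can return items within 30 days if you have a receipt.',
    },
    {
        'scenario': 'Broken row',  # deliberately invalid, for illustration
        'user_query': 'What are the shipping costs?',
        'retrieved_documents': [],
        'rag_response': '',
    },
])
problems = validate_scenarios(sample)
for problem in problems:
    print(problem)
```

Here the first row passes and the second row is reported twice, once for the empty document list and once for the empty response.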

Run the Diagnostic Evaluation

def passthrough_task(inputs, extras, metadata):
    return {
        'rag_response': inputs['rag_response'],
        'retrieved_documents': inputs['retrieved_documents'],
    }

result = evaluate(
    dataset=dataset,
    task=passthrough_task,
    evaluators=[
        RAGFaithfulness(model=LLM_MODEL_NAME, credential=LLM_CREDENTIAL_NAME),
        AnswerRelevance(model=LLM_MODEL_NAME, credential=LLM_CREDENTIAL_NAME),
        ContextRelevance(model=LLM_MODEL_NAME, credential=LLM_CREDENTIAL_NAME),
    ],
    score_fn_kwargs_mapping={
        'user_query': lambda x: x['inputs']['user_query'],
        'retrieved_documents': 'retrieved_documents',
        'rag_response': 'rag_response',
    },
)

Interpret Results

Use the diagnostic workflow to classify failures:

for r in result.results:
    scores = {s.evaluator_name: s for s in r.scores}
    scenario = r.dataset_item.metadata.get('scenario', 'unknown')

    faithfulness = scores.get('rag_faithfulness')
    relevance = scores.get('answer_relevance')
    context = scores.get('context_relevance')

    # Classify the failure mode
    if faithfulness and faithfulness.value == 0:
        diagnosis = 'HALLUCINATION'
    elif context and context.value < 0.5:
        diagnosis = 'BAD RETRIEVAL'
    elif relevance and relevance.value < 0.5:
        diagnosis = 'OFF-TOPIC'
    else:
        diagnosis = 'HEALTHY'

    print(f'{scenario}: {diagnosis}')
    if faithfulness:
        print(f'  Faithfulness: {faithfulness.label} — {faithfulness.reasoning}')

Expected output:

Grounded response: HEALTHY
  Faithfulness: yes — The response accurately reflects the return policy
  stated in the retrieved document.

Fabricated details: HALLUCINATION
  Faithfulness: no — The response claims a 60-day return window and no
  receipt requirement, but the source document states 30 days with receipt.

Insufficient context: HALLUCINATION
  Faithfulness: no — The response provides specific prices ($5.99, $12.99)
  that are not supported by the retrieved document.

Reading the diagnosis: The triad distinguishes why a response failed:

HALLUCINATION = Faithfulness fails (response fabricates information)
BAD RETRIEVAL = Context Relevance fails (wrong documents retrieved)
OFF-TOPIC = Answer Relevance fails (response doesn’t address the question)
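The order of the checks matters: faithfulness is tested first, so a response that fails several metrics at once is still reported as a hallucination. The sketch below restates the same decision order as a standalone function that takes plain numbers instead of SDK score objects. The function name `diagnose` and the sample scores are illustrative assumptions, not SDK output.

```python
from typing import Optional


def diagnose(
    faithfulness: Optional[float],
    context_relevance: Optional[float],
    answer_relevance: Optional[float],
    threshold: float = 0.5,
) -> str:
    """Apply the same if/elif order as the loop above; None means the score is missing."""
    if faithfulness is not None and faithfulness == 0:
        return 'HALLUCINATION'
    if context_relevance is not None and context_relevance < threshold:
        return 'BAD RETRIEVAL'
    if answer_relevance is not None and answer_relevance < threshold:
        return 'OFF-TOPIC'
    return 'HEALTHY'


# Hypothetical score combinations (faithfulness, context relevance, answer relevance).
print(diagnose(1, 1.0, 1.0))  # all metrics pass
print(diagnose(0, 0.0, 1.0))  # unfaithful AND poorly retrieved: faithfulness wins
print(diagnose(1, 0.0, 1.0))  # faithful, but the retrieved context is off
print(diagnose(1, 1.0, 0.0))  # faithful and well retrieved, but does not answer the question
```

This is why the "Insufficient context" scenario is labeled HALLUCINATION even though its retrieved document does not answer the question. If retrieval problems should be surfaced first, the first two checks could be swapped.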

Layer 2: Production Monitoring

Option A: Evaluator Rules (Agentic)

For applications using Agentic Monitoring, configure Evaluator Rules to continuously evaluate production spans:

Navigate to your application’s Evaluator Rules tab
Add a rule for RAG Faithfulness
Map evaluator inputs to your span attributes:
- user_query → your query span attribute
- rag_response → your response span attribute
- retrieved_documents → your context span attribute
Set alert thresholds (e.g., alert when faithfulness drops below 80%)

See Evaluator Rules for step-by-step instructions.

Option B: LLM Observability Enrichments

For applications using LLM Observability, configure enrichments during model onboarding to monitor for hallucinations:

import fiddler as fdl

fiddler_enrichments = [
    # FTL Faithfulness for low-latency hallucination detection
    fdl.Enrichment(
        name='Faithfulness',
        enrichment='ftl_response_faithfulness',
        columns=['source_docs', 'response'],
        config={
            'context_field': 'source_docs',
            'response_field': 'response',
            'threshold': 0.5,
        },
    ),
    # Safety enrichments
    fdl.Enrichment(
        name='FTL Safety',
        enrichment='ftl_prompt_safety',
        columns=['question', 'response'],
    ),
    # Embeddings for drift detection
    fdl.TextEmbedding(
        name='Prompt TextEmbedding',
        source_column='question',
        column='Enrichment Prompt Embedding',
    ),
]

LLM Observability uses FTL Faithfulness (ftl_response_faithfulness), a proprietary Fast Trust Model for low-latency scoring. This is a different evaluator from the RAG Faithfulness used in Layer 1 — it has different inputs (context, response) and outputs probability scores (faithful_prob 0.0–1.0) rather than binary labels. For detailed diagnostic reasoning, use RAG Faithfulness via Evaluator Rules or the Evals SDK.
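Because FTL Faithfulness returns a probability rather than a label, a threshold is needed to turn scores into pass/fail decisions. The sketch below shows one way this could be summarized offline with pandas. It assumes that scores are available in a column named `faithful_prob` and that a score at or above the threshold counts as faithful. The 0.80 alert level borrows the 80% example from Option A, the sample scores are invented, and `faithfulness_summary` is a hypothetical helper rather than a Fiddler API.

```python
import pandas as pd

FAITHFUL_THRESHOLD = 0.5  # mirrors the 'threshold' value in the enrichment config
ALERT_LEVEL = 0.80        # assumed alert level: alert when fewer than 80% of responses are faithful


def faithfulness_summary(events: pd.DataFrame, prob_col: str = 'faithful_prob') -> dict:
    """Binarize probability scores and summarize the share of faithful responses."""
    # Assumption: scores at or above the threshold are treated as faithful.
    is_faithful = events[prob_col] >= FAITHFUL_THRESHOLD
    rate = float(is_faithful.mean())
    return {
        'faithful_rate': rate,
        'flagged_rows': events.index[~is_faithful].tolist(),
        'alert': rate < ALERT_LEVEL,
    }


# Invented scores for illustration only.
events = pd.DataFrame({'faithful_prob': [0.93, 0.12, 0.78, 0.41, 0.88]})
summary = faithfulness_summary(events)
print(summary)
```

Rows listed in `flagged_rows` are natural candidates for a follow-up check with RAG Faithfulness, which returns reasoning alongside its label.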

Combining Both Layers

The most effective hallucination detection pipeline uses both layers:

Stage	What to Do	Tool
Development	Test against known hallucination scenarios	Evals SDK + RAG Faithfulness
Pre-release	Run experiments comparing pipeline changes	Evals SDK + full diagnostic triad
Production	Continuous monitoring with alerting	Evaluator Rules or LLM Obs enrichments
Investigation	Deep-dive into flagged events	Evals SDK `.score()` on specific cases

Next Steps

RAG Health Diagnostics — Conceptual guide to failure mode diagnosis
RAG Evaluation Fundamentals — Direct evaluation with .score() API
Evaluator Rules — Configure production monitoring rules
