Multimodal Evaluators
Build evaluators that analyze images and documents alongside text using CustomJudge with vision-capable models. This cookbook covers common scenarios for monitoring document processing pipelines.
Use this cookbook when: You need to verify that a GenAI application correctly extracted, summarized, or described content from images or documents.
Time to complete: ~25 minutes
Prerequisites
- Fiddler account with API access
- Vision-capable model configured in LLM Gateway:
  - Fiddler-hosted: fiddler/ministral3-8b (available by default)
  - Third-party: Configure provider credentials (OpenAI, Anthropic, etc.)
pip install fiddler-evals requests
Tip: When using Fiddler-hosted models, use the Test Connection button on the LLM Gateway page to warm up the model before running evaluations. This reduces cold-start latency on your first requests.
Private Preview — Multimodal evaluation is currently in private preview. To inquire about access, contact your Fiddler Customer Success Manager or email sales@fiddler.ai.
Example 1: Data Extraction Verification
This example verifies that data extracted from a table matches the source document.
Connect to Fiddler and Load Helper
import base64
import json
from pathlib import Path
import requests
from fiddler_evals import init
from fiddler_evals.evaluators import CustomJudge
URL = 'https://your-org.fiddler.ai'
TOKEN = 'your-access-token'
init(url=URL, token=TOKEN)
def load_document(source: str) -> tuple[str, str]:
    """
    Load a document from a file path or URL.
    :param source: Local file path or HTTP(S) URL
    :returns: Tuple of (base64_data, mime_type)
    """
    mime_types = {
        '.pdf': 'application/pdf',
        '.jpg': 'image/jpeg',
        '.jpeg': 'image/jpeg',
        '.png': 'image/png',
        '.gif': 'image/gif',
        '.webp': 'image/webp',
    }
    if source.startswith(('http://', 'https://')):
        headers = {'User-Agent': 'FiddlerEvals/1.0'}
        response = requests.get(source, headers=headers, timeout=10)
        response.raise_for_status()
        content = response.content
        ext = Path(source).suffix.lower()
    else:
        path = Path(source)
        ext = path.suffix.lower()
        content = path.read_bytes()
    mime_type = mime_types.get(ext, 'application/octet-stream')
    b64_data = base64.b64encode(content).decode('utf-8')
    return b64_data, mime_type
Image Input Format — Images and PDFs must be passed as a list containing a structured object with media_type, encoding, and data fields. Passing a data URL string directly will cause errors. The load_document helper returns these components separately so you can construct the correct format.
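The construction described above can be sketched as a small wrapper. This is a minimal illustration that assumes only the three-field shape named in the note; the helper name `to_media_input` is hypothetical and not part of fiddler_evals, and the sample bytes stand in for a real file.

```python
import base64


def to_media_input(b64_data: str, mime_type: str) -> list[dict[str, str]]:
    """Wrap base64 data in the list-of-objects shape described above.

    Hypothetical helper, not part of fiddler_evals.
    """
    return [
        {
            'media_type': mime_type,
            'encoding': 'base64',
            'data': b64_data,
        }
    ]


# Stand-in bytes so the sketch runs without a real file.
sample_b64 = base64.b64encode(b'not a real image').decode('utf-8')
document_input = to_media_input(sample_b64, 'image/png')
print(document_input[0]['media_type'])  # image/png
```

In practice the two arguments would come straight from `load_document`, and the returned list would be passed as the value of the document template variable.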
Base64 Payload Size — Base64 representation adds ~33% to the original file size. A 20KB image becomes ~27KB in the API request. Keep this overhead in mind when working near size limits.
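The overhead follows from how base64 works: every 3 input bytes become 4 output characters, with the final group padded. A short sketch of the arithmetic for the 20KB case, using only the standard library (the function name `base64_size` is illustrative):

```python
import base64


def base64_size(raw_bytes: int) -> int:
    """Return the length of the padded base64 encoding of raw_bytes bytes."""
    # Each group of up to 3 input bytes becomes 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


raw = 20 * 1024  # a 20 KB image
encoded = base64_size(raw)
print(encoded, round(encoded / 1024, 1))  # 27308 26.7

# Cross-check the formula against the real encoder.
assert encoded == len(base64.b64encode(b'\0' * raw))
```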
Create the Extraction Verification Judge
This evaluator compares extracted fields against the source document:
extraction_judge = CustomJudge(
    prompt_template="""
You are verifying data extraction accuracy. Compare the extracted data
against the source document and determine if the extraction is correct.
Verify fields "metric" and "outputType" accurately match the source document.
Respond with:
- extraction_accurate: True if all extracted fields match the source document
- errors_found: Briefly list any extraction errors, or "None" if accurate
Source Document:
{{ document }}
Extracted Data:
{{ extracted_data }}
""",
    output_fields={
        'extraction_accurate': {'type': 'boolean'},
        'errors_found': {'type': 'string'},
    },
    model='fiddler/ministral3-8b',
)
Evaluate the Extraction
Load a sample document and verify the extracted data:
Sample document: Text Statistics evaluators table
# Load the document
b64_data, mime_type = load_document('https://media.githubusercontent.com/media/fiddler-labs/fiddler-examples/main/cookbooks/assets/multimodal-text-statistics-table.png')
# Extracted data to verify against the source document
# Deliberately incomplete: only 1 of the 4 table rows is included, to simulate a bad extraction
extracted_json = [{'metric': 'Textstat', 'outputType': 'float'}]
scores = extraction_judge.score(
    inputs={
        'document': [
            {
                'media_type': mime_type,
                'encoding': 'base64',
                'data': b64_data,
            }
        ],
        'extracted_data': json.dumps(extracted_json),
    }
)
scores_dict = {s.name: s for s in scores}
print(f'Extraction accurate: {scores_dict["extraction_accurate"].value}')
print(f'Errors found: {scores_dict["errors_found"].label}')
# Example output:
# Extraction accurate: 0.0
# Errors found: Incomplete extraction: Missing 'Evaluate', 'Sentiment', and 'Token Count' metrics. The 'outputType' for 'Textstat' is correct, but the extraction only includes one entry instead of all four metrics listed in the source document.
Example 2: Document Summarization Faithfulness
This example verifies that a summary accurately represents the source document.
Source document: Fiddler Platform Release 26.7 notes (PDF)
Create the Summarization Judge
summarization_judge = CustomJudge(
    prompt_template="""
You are evaluating whether a summary is FAITHFUL to its source document.
IMPORTANT: A summary is meant to be brief. Do NOT penalize the summary
for omitting details, examples, or supporting context from the source.
Only flag information that is "Missing" if it is so essential that the
summary fundamentally misrepresents the source's main message.
Categories (in priority order):
- "Introduced Errors": Summary contains claims that contradict or
hallucinate facts not in the source. THIS IS THE MOST IMPORTANT
CATEGORY — flag any factual inaccuracies here.
- "Missing Key Information": Summary omits a fundamental message of
the source (rare — only use if the summary's main thrust is incomplete)
- "Missing Details": Avoid this category unless absolutely necessary.
Summarization inherently omits details.
- "Faithful": Summary accurately represents the source's main points
without introducing errors
Respond with:
- faithfulness_result: Choose the most severe applicable category
- reasoning: Briefly identify any factual errors. Do not
enumerate omitted details unless they fundamentally distort meaning.
Source Document:
{{ document }}
Summary:
{{ summary }}
""",
    output_fields={
        'faithfulness_result': {
            'type': 'string',
            'choices': [
                'Introduced Errors',
                'Missing Key Information',
                'Missing Details',
                'Faithful',
            ],
        },
        'reasoning': {'type': 'string'},
    },
    model='fiddler/ministral3-8b',
)
Evaluate the Summary
# Load the document
# Use the example document or replace with your own
url = 'https://raw.githubusercontent.com/fiddler-labs/fiddler-examples/main/cookbooks/assets/multimodal-summarization-doc.pdf'
b64_data, mime_type = load_document(url)
# Example of an UNFAITHFUL summary (contains subtle errors)
unfaithful_summary = """
Fiddler Release 26.7 (released March 31, 2026) introduces several improvements
to the LLM Gateway and evaluation tooling.
The LLM Gateway now supports Google Vertex AI as a provider, enabling teams
to route evaluator and LLM-as-a-Judge requests through GCP. Supported models
include Gemini, Claude, Llama, Mistral, and more.
Multi-target event updates for LLM models now correctly preserve all target
columns during event updates, fixing a bug where additional targets were
dropped from classification and regression models.
Evaluation datasets can now be created and managed directly from the UI
with CSV upload support for up to 5,000 rows per upload.
"""
scores = summarization_judge.score(inputs={
    'document': [{
        'media_type': mime_type,
        'encoding': 'base64',
        'data': b64_data,
    }],
    'summary': unfaithful_summary,
})
[score] = scores
print(f'Faithfulness result: {score.label}')
print(f'Reasoning: {score.reasoning}')
# Example output:
# Faithfulness result: Introduced Errors
# Reasoning: The summary contains two key inaccuracies: (1) It incorrectly
# states that multi-target event updates fix a bug in classification and
# regression models, when the source explicitly states this update only
# affects LLM and NOT_SET model types (which are the only ones supporting
# multiple targets). (2) It claims CSV uploads are limited to 5,000 rows,
# when the source limits them to 1,000 rows per upload.
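The prompt asks the judge to choose the most severe applicable category. If you score several summaries, or several sections of one long document, the same ordering can roll the per-call labels up into a single result. The following is a minimal sketch under that assumption; `SEVERITY_ORDER` and `most_severe` are hypothetical names, not part of fiddler_evals, and the labels are passed in as plain strings (for example, collected from `score.label`).

```python
# Category order copied from the judge prompt above, most severe first.
SEVERITY_ORDER = [
    'Introduced Errors',
    'Missing Key Information',
    'Missing Details',
    'Faithful',
]


def most_severe(labels: list[str]) -> str:
    """Return the most severe label among several faithfulness results.

    Hypothetical helper; raises ValueError for an empty list or an unknown label.
    """
    if not labels:
        raise ValueError('labels must not be empty')
    # A lower index in SEVERITY_ORDER means a more severe category.
    return min(labels, key=SEVERITY_ORDER.index)


print(most_severe(['Faithful', 'Missing Details', 'Introduced Errors']))
# Introduced Errors
```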
Tips
Stay Within Size Limits
| Context | Limit |
|---|---|
| Production Monitoring | 10 MB per span |
| Evals SDK | 20 MB per request |
| Fiddler Ministral (context window) | 32K tokens (~25KB images recommended) |
| Fiddler Ministral (PDF pages) | 8 pages max |
| Fiddler Ministral (images) | 8 images max |
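A pre-flight check against the first two rows of the table might look like the sketch below. It rests on two assumptions that the table does not state: that 1 MB means 1024 * 1024 bytes, and that the limit is compared against the base64-encoded payload. It also measures only the encoded document, not the rest of the request. The names `LIMITS_BYTES` and `check_payload` are illustrative.

```python
import base64

# Limits copied from the table above; 1 MB = 1024 * 1024 bytes is an assumption.
MB = 1024 * 1024
LIMITS_BYTES = {
    'production_monitoring': 10 * MB,  # per span
    'evals_sdk': 20 * MB,              # per request
}


def check_payload(b64_data: str, context: str = 'evals_sdk') -> tuple[bool, int]:
    """Return (fits, encoded_size_bytes) for a base64 payload in the given context."""
    size = len(b64_data.encode('utf-8'))
    return size <= LIMITS_BYTES[context], size


# 9 MB of raw bytes grows to 12 MB once base64-encoded.
payload = base64.b64encode(b'\0' * (9 * MB)).decode('utf-8')
print(check_payload(payload, 'evals_sdk'))              # (True, 12582912)
print(check_payload(payload, 'production_monitoring'))  # (False, 12582912)
```

The second call shows why the encoding overhead matters: a file that is under 10 MB on disk can still exceed a 10 MB limit after encoding.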
Optimize Large Documents
- Compress images before encoding — reduce resolution if full quality isn’t needed
- Split large PDFs — evaluate sections separately if exceeding page limits
- Use appropriate DPI — higher DPI means larger file sizes but better text recognition
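For the PDF-splitting tip, the page ranges can be planned before any file is touched. The sketch below only computes the ranges, assuming the 8-page limit from the table above; the function name `page_chunks` is hypothetical, and writing each range out as its own PDF would need a PDF library of your choice.

```python
def page_chunks(total_pages: int, max_pages: int = 8) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first, last) page ranges of at most max_pages each."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError('total_pages must be >= 0 and max_pages >= 1')
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


print(page_chunks(20))  # [(1, 8), (9, 16), (17, 20)]
```

Each range can then be evaluated as a separate document with the same judge.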
Use Multiple Images
Fiddler Ministral supports up to 8 images per evaluation. To evaluate content with multiple images, add a separate template variable for each image in your prompt template (e.g., {{ image_1 }}, {{ image_2 }}) and pass each as a structured input list following the same format as Example 1’s document field.
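A sketch of that pattern follows, assuming the `image_1`, `image_2` naming from the paragraph above and the structured input shape from Example 1. Both helper names are hypothetical and not part of fiddler_evals; the short strings stand in for real base64 data.

```python
def build_image_inputs(
    images: list[tuple[str, str]], max_images: int = 8
) -> dict[str, list[dict[str, str]]]:
    """Map (b64_data, mime_type) pairs to image_1, image_2, ... template variables."""
    if len(images) > max_images:
        raise ValueError(f'got {len(images)} images, limit is {max_images}')
    return {
        f'image_{i}': [{'media_type': mime, 'encoding': 'base64', 'data': data}]
        for i, (data, mime) in enumerate(images, start=1)
    }


def image_placeholders(count: int) -> str:
    """Return one labelled template placeholder per image for the prompt template."""
    return '\n'.join(f'Image {i}:\n{{{{ image_{i} }}}}' for i in range(1, count + 1))


inputs = build_image_inputs([('AAAA', 'image/png'), ('BBBB', 'image/jpeg')])
print(list(inputs))  # ['image_1', 'image_2']
print(image_placeholders(2))
```

The placeholder text would be pasted into the judge's prompt template, and the dictionary merged with any text inputs before calling `score`.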
Next Steps