Ensure quality, safety, and brand compliance in content generation agents by combining Fiddler’s built-in evaluators for baseline quality with custom evaluators, built with CustomJudge, for domain-specific governance.
Use this cookbook when: You have content generation agents (writing reports, customer communications, marketing copy) and need automated quality gates to replace manual review of every draft.
Time to complete: ~20 minutes
Prerequisites
- Fiddler account with API access
- LLM credential configured in Settings > LLM Gateway
pip install fiddler-evals pandas
The Content Generation Challenge
Enterprise content generation agents produce volume that exceeds human review capacity. Without automated quality gates, teams face:
- Reviewer fatigue — manually reviewing hundreds of drafts per day
- Inconsistent quality — different reviewers apply different standards
- Brand drift — subtle changes in tone or style go undetected
The solution: combine Fiddler’s built-in evaluators (quality, safety) with custom LLM-as-a-Judge evaluators (brand voice, compliance) for automated governance.
Recommended Evaluators
Built-In Evaluators (Baseline Quality)
| Evaluator | What It Measures | Value |
|---|---|---|
| Answer Relevance | Does the output address the input instruction? | Instruction adherence |
| Coherence | Logical flow and clarity | Narrative quality |
| Conciseness | Brevity without losing meaning | Message clarity |
| Sentiment | Positive, negative, or neutral tone | Brand alignment |
| Prompt Safety | 11 safety dimensions (toxicity, bias, etc.) | Risk mitigation |
Custom Evaluators (Domain-Specific Governance)
| Evaluator | What It Measures | Value |
|---|---|---|
| Brand Voice Match | Adherence to company style guide | Automated brand governance |
| Bias Detection | Potential bias across multiple dimensions | Compliance and risk mitigation |
Set Up Built-In Evaluators
Replace URL, TOKEN, and credential names with your Fiddler account details. Find your credentials in Settings > Access Tokens and Settings > LLM Gateway.
import pandas as pd
from fiddler_evals import init
from fiddler_evals.evaluators import (
    AnswerRelevance,
    Coherence,
    Conciseness,
    CustomJudge,
)
URL = 'https://your-org.fiddler.ai'
TOKEN = 'your-access-token'
LLM_CREDENTIAL_NAME = 'your-llm-credential'
LLM_MODEL_NAME = 'openai/gpt-4o'
init(url=URL, token=TOKEN)
# Built-in evaluators for baseline quality
relevance = AnswerRelevance(model=LLM_MODEL_NAME, credential=LLM_CREDENTIAL_NAME)
coherence = Coherence(model=LLM_MODEL_NAME, credential=LLM_CREDENTIAL_NAME)
conciseness = Conciseness(model=LLM_MODEL_NAME, credential=LLM_CREDENTIAL_NAME)
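Hard-coded tokens are easy to leak when a notebook or script is shared. A minimal sketch of one alternative, assuming you export the values as environment variables: the variable names (`FIDDLER_URL`, `FIDDLER_TOKEN`, `FIDDLER_LLM_CREDENTIAL`, `FIDDLER_LLM_MODEL`) and the helper `load_settings` are hypothetical and are not names the Fiddler SDK reads on its own.

```python
import os


def load_settings(env=None):
    """Read connection settings from environment variables.

    The variable names are hypothetical; use whatever your team prefers.
    """
    env = os.environ if env is None else env
    required = ("FIDDLER_URL", "FIDDLER_TOKEN", "FIDDLER_LLM_CREDENTIAL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "url": env["FIDDLER_URL"],
        "token": env["FIDDLER_TOKEN"],
        "credential": env["FIDDLER_LLM_CREDENTIAL"],
        # Falls back to the model name used in this cookbook.
        "model": env.get("FIDDLER_LLM_MODEL", "openai/gpt-4o"),
    }
```

The returned values can then stand in for the URL, TOKEN, LLM_CREDENTIAL_NAME, and LLM_MODEL_NAME constants.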
Create a Brand Voice Match Judge
Use CustomJudge to evaluate content against your company’s style guide:
brand_voice_judge = CustomJudge(
    prompt_template="""
    Determine whether the provided content adheres to the provided
    brand guidelines.
    Content: {{ content }}
    Brand Guidelines: {{ brand_guidelines }}
    """,
    output_fields={
        'voice_match_score': {
            'type': 'string',
            'choices': ['Perfect Match', 'Minor Deviations', 'Off-Brand'],
        },
        'reasoning': {'type': 'string'},
    },
    model=LLM_MODEL_NAME,
    credential=LLM_CREDENTIAL_NAME,
)
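The template's `{{ content }}` and `{{ brand_guidelines }}` placeholders have to line up with the keys later passed as `inputs`. A small pre-flight check can catch a misspelled or missing key before any LLM call is made. This is a sketch that is independent of the SDK: it assumes simple `{{ name }}` placeholders, and `missing_inputs` is a hypothetical helper, not a Fiddler function.

```python
import re

# Matches simple placeholders such as {{ content }} or {{brand_guidelines}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def missing_inputs(prompt_template, inputs):
    """Return placeholder names in the template that `inputs` does not supply."""
    names = set(PLACEHOLDER.findall(prompt_template))
    return sorted(names - set(inputs))


template = """
Determine whether the provided content adheres to the provided
brand guidelines.
Content: {{ content }}
Brand Guidelines: {{ brand_guidelines }}
"""

# Only 'content' is supplied, so 'brand_guidelines' is reported as missing.
print(missing_inputs(template, {"content": "Welcome to our platform."}))
```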
Evaluate Generated Content
# Example: your brand guidelines
brand_guidelines = (
    "Use professional, approachable tone. "
    "Address customers as 'you'. "
    "Avoid jargon, slang, and exclamation marks. "
    "Keep sentences under 25 words."
)
# Sample content from your agent
generated_content = [
    {
        'instruction': 'Write a welcome email for new customers',
        'content': 'Welcome to our platform. We are glad you chose us. '
                   'Your account is ready and you can start exploring features '
                   'right away.',
    },
    {
        'instruction': 'Write a welcome email for new customers',
        'content': 'OMG WELCOME!!! You are going to LOVE this!! '
                   'Our platform is literally the BEST thing ever!!!',
    },
    {
        'instruction': 'Explain the refund process',
        'content': 'To request a refund, navigate to your order history, '
                   'select the item, and click Request Refund. Processing takes '
                   '3-5 business days.',
    },
]
# Evaluate each piece of content
for item in generated_content:
    # Built-in evaluators
    rel_score = relevance.score(
        user_query=item['instruction'],
        rag_response=item['content'],
    )
    coh_score = coherence.score(prompt=item['instruction'], response=item['content'])
    con_score = conciseness.score(response=item['content'])
    # Custom brand voice judge
    brand_scores = brand_voice_judge.score(inputs={
        'content': item['content'],
        'brand_guidelines': brand_guidelines,
    })
    # Index the judge's output fields by name for easy lookup
    brand_dict = {s.name: s for s in brand_scores}
    print(f"\nInstruction: {item['instruction'][:50]}...")
    print(f" Relevance: {rel_score.label} ({rel_score.value})")
    print(f" Coherence: {coh_score.label}")
    print(f" Conciseness: {con_score.label}")
    print(f" Brand Voice: {brand_dict['voice_match_score'].label}")
    print(f" Reason: {brand_dict['reasoning'].label}")
Expected output:

Instruction: Write a welcome email for new customers...
 Relevance: high (1.0)
 Coherence: high
 Conciseness: high
 Brand Voice: Perfect Match
 Reason: Professional tone, addresses customer directly, no jargon or
 exclamation marks, sentences are concise.

Instruction: Write a welcome email for new customers...
 Relevance: medium (0.5)
 Coherence: low
 Conciseness: low
 Brand Voice: Off-Brand
 Reason: Uses all-caps, multiple exclamation marks, slang ("OMG",
 "literally"), and informal tone — violates all brand guidelines.

Instruction: Explain the refund process...
 Relevance: high (1.0)
 Coherence: high
 Conciseness: high
 Brand Voice: Perfect Match
 Reason: Clear, professional instructions with appropriate tone and
 sentence length.
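Once more than a handful of drafts are evaluated, per-item printouts become hard to scan. A minimal sketch of an aggregate view, using only the standard library: the rows below are copied from the expected output above rather than produced by a live run, and `summarize` is a hypothetical helper. In practice you would append one row per item inside the evaluation loop.

```python
from collections import Counter

# Labels copied from the expected output; not the result of a live run.
results = [
    {"relevance": "high", "coherence": "high",
     "conciseness": "high", "brand_voice": "Perfect Match"},
    {"relevance": "medium", "coherence": "low",
     "conciseness": "low", "brand_voice": "Off-Brand"},
    {"relevance": "high", "coherence": "high",
     "conciseness": "high", "brand_voice": "Perfect Match"},
]


def summarize(rows):
    """Count how often each label appears for every metric."""
    summary = {}
    for row in rows:
        for metric, label in row.items():
            summary.setdefault(metric, Counter())[label] += 1
    return summary


for metric, counts in summarize(results).items():
    print(metric, dict(counts))
```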
Build a Quality Gate
Combine evaluator scores into an automated quality gate that flags content for human review:
def quality_gate(instruction, content, brand_guidelines):
    """Automated quality gate for content generation agents.
    Returns 'APPROVED', 'REVIEW', or 'REJECTED' with reasons.
    """
    issues = []
    # Check relevance
    rel = relevance.score(user_query=instruction, rag_response=content)
    if rel.value < 0.5:
        issues.append(f'Low relevance ({rel.label})')
    # Check coherence
    coh = coherence.score(prompt=instruction, response=content)
    if coh.value < 0.5:
        issues.append(f'Low coherence ({coh.label})')
    # Check brand voice
    brand = brand_voice_judge.score(inputs={
        'content': content,
        'brand_guidelines': brand_guidelines,
    })
    brand_dict = {s.name: s for s in brand}
    voice = brand_dict['voice_match_score'].label
    if voice == 'Off-Brand':
        issues.append('Off-brand content')
    elif voice == 'Minor Deviations':
        issues.append('Minor brand deviations')
    # Low scores and off-brand content are rejected; minor deviations go to
    # human review. The match string must use the same casing as the issue
    # text appended above ('Off-brand'), otherwise off-brand drafts with no
    # other issue would only be sent to review.
    if not issues:
        return 'APPROVED', []
    elif any('Off-brand' in i or 'Low' in i for i in issues):
        return 'REJECTED', issues
    else:
        return 'REVIEW', issues
# Run the quality gate
for item in generated_content:
    status, issues = quality_gate(
        item['instruction'], item['content'], brand_guidelines,
    )
    print(f"{status}: {item['content'][:60]}...")
    if issues:
        print(f" Issues: {', '.join(issues)}")
Expected output:
APPROVED: Welcome to our platform. We are glad you chose us. Your ...
REJECTED: OMG WELCOME!!! You are going to LOVE this!! Our platform...
 Issues: Low coherence (low), Off-brand content
APPROVED: To request a refund, navigate to your order history, sel...
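Every call to the gate invokes LLM-backed evaluators, which makes the routing rules slow and costly to unit test. One option is to keep the decision in a pure function that takes already-computed results. The sketch below is an assumption about how you might structure this, not part of the Fiddler SDK: `decide` and its parameters are hypothetical, and the 0.5 threshold mirrors the gate above.

```python
def decide(relevance_value, coherence_value, voice_label, threshold=0.5):
    """Map evaluator results to (status, issues) without calling any evaluator."""
    blocking = []  # issues that reject the draft outright
    advisory = []  # issues that only route the draft to a human reviewer
    if relevance_value < threshold:
        blocking.append("Low relevance")
    if coherence_value < threshold:
        blocking.append("Low coherence")
    if voice_label == "Off-Brand":
        blocking.append("Off-brand content")
    elif voice_label == "Minor Deviations":
        advisory.append("Minor brand deviations")

    if blocking:
        return "REJECTED", blocking + advisory
    if advisory:
        return "REVIEW", advisory
    return "APPROVED", []


print(decide(1.0, 1.0, "Perfect Match"))
print(decide(1.0, 1.0, "Minor Deviations"))
print(decide(0.5, 0.0, "Off-Brand"))
```

Keeping blocking and advisory issues in separate lists avoids matching on substrings of the issue text, so rewording a message cannot silently change the routing.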
Production Monitoring
To deploy these evaluators in production:
- Evaluator Rules: Configure built-in evaluators (Answer Relevance, Coherence, Conciseness) as Evaluator Rules in your Agentic Monitoring application. See Evaluator Rules.
- Custom Judges in Experiments: Run the Brand Voice Match judge as a recurring experiment against sampled production outputs to track brand compliance over time.
- Alerting: Set up alerts on evaluator score degradation to catch systemic quality drift after model updates or prompt changes.
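Before configuring alerts, it can help to prototype the degradation rule offline. A minimal sketch, assuming you have exported a chronological list of numeric evaluator scores: the function name, window size, and drop threshold are illustrative assumptions and do not correspond to Fiddler alert settings.

```python
from statistics import mean


def score_degraded(scores, window=50, max_drop=0.1):
    """Return True if the latest window's mean fell more than `max_drop`
    below the mean of the window immediately before it."""
    if len(scores) < 2 * window:
        return False  # not enough history to compare two full windows
    baseline = mean(scores[-2 * window:-window])
    recent = mean(scores[-window:])
    return (baseline - recent) > max_drop


# Fifty high scores followed by fifty lower ones, e.g. after a prompt change.
history = [1.0] * 50 + [0.5] * 50
print(score_degraded(history))
```

Comparing two adjacent windows rather than single scores keeps one bad draft from triggering an alert, at the cost of reacting more slowly.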
Next Steps
Related: Evaluator Rules — Configure evaluators for production monitoring