Ensure quality, safety, and brand compliance in content generation agents by combining Fiddler’s built-in evaluators for baseline quality with CustomJudge evaluators for domain-specific governance.

Use this cookbook when: You have content generation agents (writing reports, customer communications, marketing copy) and need automated quality gates to replace manual review of every draft.

Time to complete: ~20 minutes

Prerequisites
  • Fiddler account with API access
  • LLM credential configured in Settings > LLM Gateway
  • pip install fiddler-evals pandas

The Content Generation Challenge

Enterprise content generation agents produce volume that exceeds human review capacity. Without automated quality gates, teams face:
  • Reviewer fatigue — manually reviewing hundreds of drafts per day
  • Inconsistent quality — different reviewers apply different standards
  • Brand drift — subtle changes in tone or style go undetected
The solution: combine Fiddler’s built-in evaluators (quality, safety) with custom LLM-as-a-Judge evaluators (brand voice, compliance) for automated governance.

Built-In Evaluators (Baseline Quality)

Evaluator | What It Measures | Value
Answer Relevance | Does the output address the input instruction? | Instruction adherence
Coherence | Logical flow and clarity | Narrative quality
Conciseness | Brevity without losing meaning | Message clarity
Sentiment | Positive, negative, or neutral tone | Brand alignment
Prompt Safety | 11 safety dimensions (toxicity, bias, etc.) | Risk mitigation

Custom Evaluators (Domain-Specific Governance)

Evaluator | What It Measures | Value
Brand Voice Match | Adherence to company style guide | Automated brand governance
Bias Detection | Potential bias across multiple dimensions | Compliance and risk mitigation

Step 1: Set Up Built-In Evaluators

Replace URL, TOKEN, and credential names with your Fiddler account details. Find your credentials in Settings > Access Tokens and Settings > LLM Gateway.
import pandas as pd
from fiddler_evals import init
from fiddler_evals.evaluators import (
    AnswerRelevance,
    Coherence,
    Conciseness,
    CustomJudge,
)

URL = 'https://your-org.fiddler.ai'
TOKEN = 'your-access-token'
LLM_CREDENTIAL_NAME = 'your-llm-credential'
LLM_MODEL_NAME = 'openai/gpt-4o'

init(url=URL, token=TOKEN)

# Built-in evaluators for baseline quality
relevance = AnswerRelevance(model=LLM_MODEL_NAME, credential=LLM_CREDENTIAL_NAME)
coherence = Coherence(model=LLM_MODEL_NAME, credential=LLM_CREDENTIAL_NAME)
conciseness = Conciseness(model=LLM_MODEL_NAME, credential=LLM_CREDENTIAL_NAME)

Step 2: Create a Brand Voice Match Judge

Use CustomJudge to evaluate content against your company’s style guide:
brand_voice_judge = CustomJudge(
    prompt_template="""
        Determine whether the provided content adheres to the provided
        brand guidelines.

        Content: {{ content }}
        Brand Guidelines: {{ brand_guidelines }}
    """,
    output_fields={
        'voice_match_score': {
            'type': 'string',
            'choices': ['Perfect Match', 'Minor Deviations', 'Off-Brand'],
        },
        'reasoning': {'type': 'string'},
    },
    model=LLM_MODEL_NAME,
    credential=LLM_CREDENTIAL_NAME,
)
See Building Custom Judge Evaluators for a deep-dive into prompt_template, output_fields, and iterative prompt improvement.
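
The Bias Detection evaluator listed under Custom Evaluators is not built out in this cookbook. As a rough sketch, it could reuse the same prompt_template and output_fields shape as the brand voice judge. The prompt wording, the `bias_level` field, and its choices below are assumptions for illustration, not a Fiddler-provided definition. The helpers `template_placeholders` and `missing_inputs` are hypothetical; they only check that an inputs dict supplies every `{{ placeholder }}` before a judge is called.

```python
import re

# Hypothetical prompt and output schema for a bias detection judge.
# The wording, field names, and choices are illustrative assumptions.
BIAS_PROMPT_TEMPLATE = """
    Review the content for biased or exclusionary language
    (for example: gender, age, ethnicity, disability).

    Content: {{ content }}
"""

BIAS_OUTPUT_FIELDS = {
    'bias_level': {
        'type': 'string',
        'choices': ['None Detected', 'Possible Bias', 'Clear Bias'],
    },
    'reasoning': {'type': 'string'},
}


def template_placeholders(template):
    """Return the set of {{ name }} placeholders found in a template."""
    return set(re.findall(r'\{\{\s*(\w+)\s*\}\}', template))


def missing_inputs(template, inputs):
    """Return placeholder names that the inputs dict does not supply."""
    return sorted(template_placeholders(template) - set(inputs))


# With fiddler_evals initialised as in the setup code, the two constants
# would be passed to CustomJudge as prompt_template and output_fields,
# alongside the same model and credential arguments used for brand voice.
print(missing_inputs(BIAS_PROMPT_TEMPLATE, {'content': 'Hello'}))  # []
print(missing_inputs(BIAS_PROMPT_TEMPLATE, {}))                    # ['content']
```

Checking placeholders up front is a small guard against a judge being scored with an incomplete inputs dict, which can otherwise be hard to spot in the model's reasoning text.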

Step 3: Evaluate Generated Content

# Example: your brand guidelines
brand_guidelines = (
    "Use professional, approachable tone. "
    "Address customers as 'you'. "
    "Avoid jargon, slang, and exclamation marks. "
    "Keep sentences under 25 words."
)

# Sample content from your agent
generated_content = [
    {
        'instruction': 'Write a welcome email for new customers',
        'content': 'Welcome to our platform. We are glad you chose us. '
            'Your account is ready and you can start exploring features '
            'right away.',
    },
    {
        'instruction': 'Write a welcome email for new customers',
        'content': 'OMG WELCOME!!! You are going to LOVE this!! '
            'Our platform is literally the BEST thing ever!!!',
    },
    {
        'instruction': 'Explain the refund process',
        'content': 'To request a refund, navigate to your order history, '
            'select the item, and click Request Refund. Processing takes '
            '3-5 business days.',
    },
]

# Evaluate each piece of content
for item in generated_content:
    # Built-in evaluators
    rel_score = relevance.score(
        user_query=item['instruction'],
        rag_response=item['content'],
    )
    coh_score = coherence.score(prompt=item['instruction'], response=item['content'])
    con_score = conciseness.score(response=item['content'])

    # Custom brand voice judge
    brand_scores = brand_voice_judge.score(inputs={
        'content': item['content'],
        'brand_guidelines': brand_guidelines,
    })
    brand_dict = {s.name: s for s in brand_scores}

    print(f"\nInstruction: {item['instruction'][:50]}...")
    print(f"  Relevance:  {rel_score.label} ({rel_score.value})")
    print(f"  Coherence:  {coh_score.label}")
    print(f"  Conciseness: {con_score.label}")
    print(f"  Brand Voice: {brand_dict['voice_match_score'].label}")
    print(f"    Reason: {brand_dict['reasoning'].label}")
Expected output (exact labels and reasoning text may vary by model and run):
Instruction: Write a welcome email for new customers...
  Relevance:  high (1.0)
  Coherence:  high
  Conciseness: high
  Brand Voice: Perfect Match
    Reason: Professional tone, addresses customer directly, no jargon or
    exclamation marks, sentences are concise.

Instruction: Write a welcome email for new customers...
  Relevance:  medium (0.5)
  Coherence:  low
  Conciseness: low
  Brand Voice: Off-Brand
    Reason: Uses all-caps, multiple exclamation marks, slang ("OMG",
    "literally"), and informal tone — violates all brand guidelines.

Instruction: Explain the refund process...
  Relevance:  high (1.0)
  Coherence:  high
  Conciseness: high
  Brand Voice: Perfect Match
    Reason: Clear, professional instructions with appropriate tone and
    sentence length.
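
When more than a handful of drafts are evaluated, a per-batch summary is usually easier to read than per-item prints. The sketch below tallies labels using only the standard library. The `results` records are copied from the expected output above rather than produced by live evaluator calls, and the helper names `summarize` and `on_brand_rate` are hypothetical.

```python
from collections import Counter

# Labels copied from the expected output above; in practice these would be
# collected inside the evaluation loop instead of being hard-coded.
results = [
    {'relevance': 'high', 'coherence': 'high', 'brand_voice': 'Perfect Match'},
    {'relevance': 'medium', 'coherence': 'low', 'brand_voice': 'Off-Brand'},
    {'relevance': 'high', 'coherence': 'high', 'brand_voice': 'Perfect Match'},
]


def summarize(records, field):
    """Count how often each label appears for one evaluator field."""
    return Counter(r[field] for r in records)


def on_brand_rate(records):
    """Fraction of records whose brand voice label is 'Perfect Match'."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if r['brand_voice'] == 'Perfect Match')
    return hits / len(records)


brand_counts = summarize(results, 'brand_voice')
print(dict(brand_counts))                          # {'Perfect Match': 2, 'Off-Brand': 1}
print(f"On-brand rate: {on_brand_rate(results):.0%}")  # On-brand rate: 67%
```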

Step 4: Build a Quality Gate

Combine evaluator scores into an automated quality gate that flags content for human review:
def quality_gate(instruction, content, brand_guidelines):
    """Automated quality gate for content generation agents.

    Returns 'APPROVED', 'REVIEW', or 'REJECTED' with reasons.
    """
    issues = []

    # Check relevance
    rel = relevance.score(user_query=instruction, rag_response=content)
    if rel.value < 0.5:
        issues.append(f'Low relevance ({rel.label})')

    # Check coherence
    coh = coherence.score(prompt=instruction, response=content)
    if coh.value < 0.5:
        issues.append(f'Low coherence ({coh.label})')

    # Check brand voice
    brand = brand_voice_judge.score(inputs={
        'content': content,
        'brand_guidelines': brand_guidelines,
    })
    brand_dict = {s.name: s for s in brand}
    voice = brand_dict['voice_match_score'].label
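    if voice == 'Off-Brand':
        issues.append('Off-brand content')
    elif voice == 'Minor Deviations':
        issues.append('Minor brand deviations')

    # Severity is inferred from the issue text, so the substrings checked
    # here must match the strings appended above exactly ('Off-brand', 'Low').
    if not issues:
        return 'APPROVED', []
    elif any('Off-brand' in i or 'Low' in i for i in issues):
        return 'REJECTED', issues
    else:
        return 'REVIEW', issues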

# Run the quality gate
for item in generated_content:
    status, issues = quality_gate(
        item['instruction'], item['content'], brand_guidelines,
    )
    print(f"{status}: {item['content'][:60]}...")
    if issues:
        print(f"  Issues: {', '.join(issues)}")
Expected output:
APPROVED: Welcome to our platform. We are glad you chose us. Your ...
REJECTED: OMG WELCOME!!! You are going to LOVE this!! Our platform...
  Issues: Low coherence (low), Off-brand content
APPROVED: To request a refund, navigate to your order history, sel...
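
Because the gate mixes LLM calls with decision logic, it can help to isolate the decision step so it can be unit-tested without calling a model. The sketch below is one way to do that, not part of the Fiddler SDK: `decide` is a hypothetical helper that takes already-computed relevance and coherence values plus the brand voice label, and it tracks severity in separate lists instead of matching text inside issue strings. The 0.5 threshold and the three brand voice labels come from the code above; the numeric inputs in the usage lines are made-up examples.

```python
def decide(relevance_value, coherence_value, voice_label, threshold=0.5):
    """Map evaluator results to a (status, issues) pair.

    status is 'APPROVED', 'REVIEW', or 'REJECTED'.
    """
    blocking = []   # any entry here forces REJECTED
    advisory = []   # entries here only request human REVIEW

    if relevance_value < threshold:
        blocking.append('Low relevance')
    if coherence_value < threshold:
        blocking.append('Low coherence')

    if voice_label == 'Off-Brand':
        blocking.append('Off-brand content')
    elif voice_label == 'Minor Deviations':
        advisory.append('Minor brand deviations')

    if blocking:
        return 'REJECTED', blocking + advisory
    if advisory:
        return 'REVIEW', advisory
    return 'APPROVED', []


# Example inputs are illustrative values, not real evaluator output.
print(decide(1.0, 1.0, 'Perfect Match'))     # ('APPROVED', [])
print(decide(1.0, 1.0, 'Minor Deviations'))  # ('REVIEW', ['Minor brand deviations'])
print(decide(1.0, 1.0, 'Off-Brand'))         # ('REJECTED', ['Off-brand content'])
print(decide(0.5, 0.0, 'Off-Brand'))         # ('REJECTED', ['Low coherence', 'Off-brand content'])
```

Keeping blocking and advisory issues in separate lists means a change to the wording of an issue message cannot silently change the gate's outcome.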

Production Monitoring

To deploy these evaluators in production:
  1. Evaluator Rules: Configure built-in evaluators (Answer Relevance, Coherence, Conciseness) as Evaluator Rules in your Agentic Monitoring application. See Evaluator Rules.
  2. Custom Judges in Experiments: Run the Brand Voice Match judge as a recurring experiment against sampled production outputs to track brand compliance over time.
  3. Alerting: Set up alerts on evaluator score degradation to catch systemic quality drift after model updates or prompt changes.
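
To make "score degradation" concrete, the sketch below compares the mean of a recent window of evaluator scores against a baseline window. It illustrates the idea only and is not how Fiddler alerts are configured; the function name `score_degraded`, the 0.1 `max_drop` threshold, and the sample scores are all assumptions.

```python
from statistics import mean


def score_degraded(baseline, recent, max_drop=0.1):
    """Return True when the recent mean is more than max_drop below the baseline mean."""
    if not baseline or not recent:
        return False  # not enough data to compare
    return mean(baseline) - mean(recent) > max_drop


# Made-up coherence scores for two sampling windows.
baseline_scores = [1.0, 1.0, 0.5, 1.0]   # mean 0.875
recent_scores = [0.5, 0.5, 1.0, 0.5]     # mean 0.625

if score_degraded(baseline_scores, recent_scores):
    print('ALERT: coherence scores dropped since the baseline window')
```

A fixed drop threshold is the simplest possible rule; with small samples it will be noisy, so window sizes and thresholds would need tuning against real traffic.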

Next Steps

Related: Evaluator Rules (configure evaluators for production monitoring)