Fiddler provides a comprehensive set of enrichments for monitoring LLM applications in production. Enrichments augment your application data with automatically generated trust, safety, and quality metrics during model onboarding. These metrics integrate directly with Fiddler’s monitoring dashboards, alerting systems, and analytics tools.
Configure enrichments using the fdl.Enrichment() class in the Python Client SDK. For detailed configuration examples, see the Enrichments Guide. For help choosing the right enrichment, see Selecting Enrichments.
Safety metrics
Safety enrichments detect and flag unsafe, harmful, or policy-violating content in your LLM application’s inputs and outputs.
| Metric | Enrichment Key | LLM Required? | Output Type | Description |
|---|---|---|---|---|
| Fast Safety | ftl_prompt_safety | Yes (Fiddler FTL) | bool + float per dimension | Evaluates text safety across 11 dimensions using Fiddler’s Fast Trust Model |
| PII Detection | pii | No | bool + matches + entities | Detects personally identifiable information using Presidio |
| Profanity | profanity | No | bool | Flags offensive or inappropriate language |
| Banned Keywords | banned_keywords | No | bool | Detects user-defined restricted terms |
| Regex Match | regex_match | No | category | Matches text against a user-defined regular expression |
| Language Detection | language_detection | No | string + float | Identifies the language of the source text |
| Topic Classification | topic_model | No | list[float] + string | Classifies text into user-defined topics using zero-shot classification |
Fast Safety
The Fast Safety enrichment evaluates text safety across 11 dimensions using Fiddler’s proprietary Fast Trust Model. Each dimension produces a boolean flag and a confidence probability score.
Enrichment key: ftl_prompt_safety
| Dimension | Output Columns | Score Range | Description |
|---|---|---|---|
| illegal | illegal, illegal score | 0.0 to 1.0 | Content promoting illegal activities |
| hateful | hateful, hateful score | 0.0 to 1.0 | Hateful or discriminatory content |
| harassing | harassing, harassing score | 0.0 to 1.0 | Harassing or bullying content |
| racist | racist, racist score | 0.0 to 1.0 | Racist content |
| sexist | sexist, sexist score | 0.0 to 1.0 | Sexist content |
| violent | violent, violent score | 0.0 to 1.0 | Content promoting violence |
| sexual | sexual, sexual score | 0.0 to 1.0 | Sexually explicit content |
| harmful | harmful, harmful score | 0.0 to 1.0 | Generally harmful content |
| unethical | unethical, unethical score | 0.0 to 1.0 | Unethical content |
| jailbreaking | jailbreaking, jailbreaking score | 0.0 to 1.0 | Jailbreaking or prompt injection attempts |
| roleplaying | roleplaying, roleplaying score | 0.0 to 1.0 | Roleplaying attempts to bypass safety |
An aggregate max_risk_prob output is also generated, representing the maximum probability across all 11 dimensions.
For configuration details, see Enrichments: Fast Safety.
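The aggregate can be pictured as a plain maximum over the per-dimension probabilities. The following is a minimal sketch, not Fiddler's implementation: the `max_risk_prob` helper, the `DIMENSIONS` list, and the example values are hypothetical, and only the dimension names are taken from the table above.

```python
# Dimension names from the Fast Safety table; the scores below are made up.
DIMENSIONS = [
    "illegal", "hateful", "harassing", "racist", "sexist", "violent",
    "sexual", "harmful", "unethical", "jailbreaking", "roleplaying",
]


def max_risk_prob(scores: dict[str, float]) -> float:
    """Return the highest probability across all 11 safety dimensions."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise KeyError(f"missing dimensions: {missing}")
    return max(scores[d] for d in DIMENSIONS)


# Hypothetical scores for a single prompt.
example_scores = {d: 0.01 for d in DIMENSIONS}
example_scores["harmful"] = 0.42
example_scores["jailbreaking"] = 0.87

print(max_risk_prob(example_scores))  # 0.87
```

A single aggregate like this is convenient for alerting, because one threshold covers every dimension at once.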
PII Detection
Detects and flags personally identifiable information using Presidio. Generates a boolean flag, matched text spans, and detected entity types.
Enrichment key: pii
Commonly used entity types: CREDIT_CARD, CRYPTO, DATE_TIME, EMAIL_ADDRESS, IBAN_CODE, IP_ADDRESS, LOCATION, PERSON, PHONE_NUMBER, URL, US_SSN, US_DRIVER_LICENSE, US_ITIN, US_PASSPORT
Fiddler supports 32 entity types in total, including international identifiers for Australia, India, Singapore, and the UK. For the full list, see the Presidio supported entities.
For configuration details, see Enrichments: PII.
Profanity
Flags offensive or inappropriate language using curated word lists from SurgeAI and Google.
Enrichment key: profanity
For configuration details, see Enrichments: Profanity.
Banned Keywords
Detects user-defined restricted terms in text inputs. The list of banned keywords is specified in the enrichment configuration.
Enrichment key: banned_keywords
For configuration details, see Enrichments: Banned Keywords.
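To illustrate the kind of check this enrichment performs, here is a minimal sketch using only the Python standard library. The matching rules (case-insensitive, whole-word) and the `contains_banned_keyword` helper are assumptions made for the example; the enrichment's actual matching behavior may differ.

```python
import re


def contains_banned_keyword(text: str, banned: list[str]) -> bool:
    """Return True if any banned term occurs as a whole word or phrase.

    Assumption: matching is case-insensitive and respects word boundaries.
    """
    for term in banned:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


# Hypothetical keyword list and inputs.
banned_terms = ["project falcon", "internal only"]
print(contains_banned_keyword("Share the Project Falcon roadmap", banned_terms))  # True
print(contains_banned_keyword("Falconry is a hobby", ["falcon"]))  # False
```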
Regex Match
Matches text against a user-defined regular expression pattern. Produces a categorical output of “Match” or “No Match”.
Enrichment key: regex_match
For configuration details, see Enrichments: Regex Match.
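The categorical output can be sketched with Python's `re` module. This example assumes the pattern may match anywhere in the text (`re.search`); whether the enrichment searches or requires a full match is not stated here, and the `regex_match_label` helper is hypothetical.

```python
import re


def regex_match_label(text: str, pattern: str) -> str:
    """Map a regex search result to the two category labels."""
    # Assumption: a match anywhere in the text counts as "Match".
    return "Match" if re.search(pattern, text) else "No Match"


# Hypothetical pattern: a seven-digit phone number such as 555-0134.
phone_pattern = r"\b\d{3}-\d{4}\b"
print(regex_match_label("Call 555-0134 today", phone_pattern))  # Match
print(regex_match_label("No digits here", phone_pattern))       # No Match
```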
Language Detection
Identifies the language of the source text using fasttext models. Produces the detected language and a confidence probability.
Enrichment key: language_detection
For configuration details, see Enrichments: Language Detection.
Topic Classification
Classifies text into user-defined topics using a zero-shot classification model. Produces per-topic probability scores and the top-scoring topic.
Enrichment key: topic_model
For configuration details, see Enrichments: Topic.
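The relationship between the two outputs (per-topic probabilities and the top-scoring topic) can be sketched as an argmax. The `top_topic` helper and the topic names are hypothetical, and the probabilities are made up rather than produced by a real classifier.

```python
def top_topic(topics: list[str], probabilities: list[float]) -> str:
    """Return the topic whose probability is highest."""
    if not topics or len(topics) != len(probabilities):
        raise ValueError("topics and probabilities must be non-empty and aligned")
    best_index = max(range(len(topics)), key=probabilities.__getitem__)
    return topics[best_index]


# Hypothetical user-defined topics and made-up scores for one prompt.
topics = ["billing", "shipping", "returns"]
probabilities = [0.12, 0.71, 0.17]
print(top_topic(topics, probabilities))  # shipping
```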
Quality and hallucination metrics
Quality enrichments assess the accuracy, groundedness, and relevance of LLM-generated responses.
| Metric | Enrichment Key | LLM Required? | Output Type | Description |
|---|---|---|---|---|
| Fast Faithfulness | ftl_response_faithfulness | Yes (Fiddler FTL) | bool + float | Evaluates factual groundedness using Fiddler’s Fast Trust Model |
| RAG Faithfulness | faithfulness | Yes (OpenAI) | bool | Evaluates factual accuracy of responses against provided context |
| Answer Relevance | answer_relevance | Yes (OpenAI) | bool | Evaluates whether responses address the input prompt |
| Coherence | coherence | Yes (OpenAI) | bool | Assesses logical flow and clarity of responses |
| Conciseness | conciseness | Yes (OpenAI) | bool | Evaluates brevity and clarity of responses |
Fast Faithfulness
Evaluates the factual groundedness of AI-generated responses against provided context using Fiddler’s proprietary Fast Trust Model. Produces a boolean faithfulness flag and a confidence probability score.
Enrichment key: ftl_response_faithfulness
The faithfulness threshold defaults to 0.5 and can be adjusted in the configuration to control scoring sensitivity. Higher thresholds result in stricter faithfulness detection (fewer responses labeled as faithful).
For configuration details, see Enrichments: Fast Faithfulness.
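The effect of the threshold can be seen in a small sketch. It assumes a response is labeled faithful when its score is greater than or equal to the threshold; the exact comparison, the `is_faithful` helper, and the scores are illustrative assumptions.

```python
def is_faithful(score: float, threshold: float = 0.5) -> bool:
    """Label a response faithful when its score reaches the threshold."""
    # Assumption: the comparison is inclusive (>=).
    return score >= threshold


# Made-up faithfulness scores for four responses.
scores = [0.35, 0.55, 0.72, 0.91]

for threshold in (0.5, 0.8):
    faithful_count = sum(is_faithful(s, threshold) for s in scores)
    print(f"threshold={threshold}: {faithful_count} of {len(scores)} faithful")
```

Raising the threshold from 0.5 to 0.8 drops the number of faithful responses in this sample from three to one, which is the stricter behavior described above.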
RAG Faithfulness
Evaluates the accuracy and reliability of facts presented in AI-generated responses by checking whether the information aligns with the provided context documents. Uses an OpenAI LLM for evaluation.
Enrichment key: faithfulness
RAG Faithfulness vs Fast Faithfulness: This enrichment uses OpenAI for evaluation. Fast Faithfulness uses Fiddler’s Fast Trust Model for lower latency. See LLM-Based Metrics for a detailed comparison.
For configuration details, see Enrichments: Faithfulness.
Answer Relevance
Evaluates whether AI-generated responses address the input prompt. Produces a binary relevant/not-relevant result.
Enrichment key: answer_relevance
For configuration details, see Enrichments: Answer Relevance.
Coherence
Assesses the logical flow and clarity of AI-generated responses, checking whether the content maintains a consistent theme and argument structure.
Enrichment key: coherence
For configuration details, see Enrichments: Coherence.
Conciseness
Evaluates whether AI-generated responses communicate their message efficiently without unnecessary elaboration or redundancy.
Enrichment key: conciseness
For configuration details, see Enrichments: Conciseness.
Text statistics metrics
Text statistics enrichments provide quantitative analysis of text properties, including readability, length, and n-gram-based evaluation scores.
| Metric | Enrichment Key | LLM Required? | Output Type | Description |
|---|---|---|---|---|
| Textstat | textstat | No | float | Generates up to 19 text readability and complexity statistics |
| Evaluate | evaluate | No | float | Computes n-gram-based evaluation scores (BLEU, ROUGE, METEOR) |
| Sentiment | sentiment | No | float + string | Provides sentiment analysis using VADER |
| Token Count | token_count | No | int | Counts the number of tokens in a string |
Textstat
Generates text readability and complexity statistics using the textstat library. You can select specific statistics or use all 19 available metrics.
Enrichment key: textstat
| Sub-metric | Range | Description |
|---|---|---|
| char_count | 0 to 64,000 | Character count |
| letter_count | 0 to 64,000 | Letter count (alphabetical characters) |
| miniword_count | 0 to 64,000 | Count of short words |
| words_per_sentence | 0 to 1,000 | Average words per sentence |
| polysyllabcount | 0 to 64,000 | Polysyllabic word count |
| lexicon_count | 0 to 64,000 | Word count |
| syllable_count | 0 to 96,000 | Total syllable count |
| sentence_count | 0 to 32,000 | Sentence count |
| flesch_reading_ease | -100 to 100 | Flesch Reading Ease score (higher = easier to read) |
| smog_index | 0 to 30 | SMOG readability index |
| flesch_kincaid_grade | -3.4 to 100 | Flesch-Kincaid Grade Level |
| coleman_liau_index | 0 to 20 | Coleman-Liau readability index |
| automated_readability_index | -3.4 to 100 | Automated Readability Index |
| dale_chall_readability_score | 0 to 10 | Dale-Chall readability score |
| difficult_words | 0 to 64,000 | Count of difficult words |
| linsear_write_formula | 0 to 20 | Linsear Write readability formula |
| gunning_fog | 0 to 20 | Gunning Fog readability index |
| long_word_count | 0 to 64,000 | Count of long words |
| monosyllabcount | 0 to 64,000 | Monosyllabic word count |
If no statistics are specified in the configuration, the default statistic is flesch_kincaid_grade.
For configuration details, see Enrichments: Textstat.
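As background for the default statistic, the standard Flesch-Kincaid Grade Level formula can be written out directly. This sketch takes pre-computed counts as input; the textstat library does its own word, sentence, and syllable counting, so its results on real text may differ slightly. The function below is an illustration, not the library's code.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Made-up counts: 100 words, 5 sentences, 150 syllables.
print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91

# A single one-syllable word gives the formula's minimum.
print(round(flesch_kincaid_grade(1, 1, 1), 2))  # -3.4
```

The second call is consistent with the lower bound of -3.4 listed for flesch_kincaid_grade in the table above.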
Evaluate
Computes n-gram-based evaluation metrics for comparing two text passages, such as an AI-generated response and a reference answer. Scores are highest when the generated text shares many word sequences with the reference.
Enrichment key: evaluate
| Sub-metric | Output Column | Score Range | Description |
|---|---|---|---|
| BLEU | bleu | 0.0 to 1.0 | Precision of word n-grams between generated and reference text |
| ROUGE-1 | rouge1 | 0.0 to 1.0 | Unigram recall between generated and reference text |
| ROUGE-2 | rouge2 | 0.0 to 1.0 | Bigram recall between generated and reference text |
| ROUGE-L | rougeL | 0.0 to 1.0 | Longest common subsequence between generated and reference text |
| ROUGE-Lsum | rougeLsum | 0.0 to 1.0 | ROUGE-L applied at the summary level |
| METEOR | meteor | 0.0 to 1.0 | Combines precision, recall, and semantic matching |
For configuration details, see Enrichments: Evaluate.
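The idea of n-gram overlap behind these metrics can be sketched in a few lines. This is a simplified illustration, not the BLEU or ROUGE implementation used by the enrichment: it tokenizes on whitespace, lowercases, and reports clipped n-gram precision and recall, leaving out details such as BLEU's brevity penalty and stemming. The `ngram_overlap` helper is hypothetical.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(generated: str, reference: str, n: int = 1) -> tuple[float, float]:
    """Return (precision, recall) of clipped n-gram matches."""
    gen = ngrams(generated.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((gen & ref).values())
    precision = overlap / sum(gen.values()) if gen else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


generated = "the cat sat on the mat"
reference = "the cat is on the mat"
print(ngram_overlap(generated, reference, n=1))  # 5 of 6 unigrams shared
print(ngram_overlap(generated, reference, n=2))  # 3 of 5 bigrams shared
```

Precision divides the shared n-grams by the generated text's n-grams (the BLEU-style view), while recall divides by the reference's n-grams (the ROUGE-style view).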
Sentiment
Provides sentiment analysis using NLTK’s VADER lexicon. Produces a compound score and a categorical sentiment label.
Enrichment key: sentiment
| Output Column | Type | Description |
|---|---|---|
| compound | float | Raw compound sentiment score |
| sentiment | string | One of positive, negative, or neutral |
For configuration details, see Enrichments: Sentiment.
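How a compound score maps to a label can be sketched as follows. The cutoffs of 0.05 and -0.05 are the thresholds conventionally used with VADER; whether the enrichment uses exactly these values is an assumption, and the `sentiment_label` helper is hypothetical.

```python
def sentiment_label(compound: float, cutoff: float = 0.05) -> str:
    """Map a VADER compound score (between -1 and 1) to a label."""
    # Assumption: the conventional VADER cutoffs of +/-0.05 apply.
    if compound >= cutoff:
        return "positive"
    if compound <= -cutoff:
        return "negative"
    return "neutral"


# Made-up compound scores.
for score in (0.64, -0.48, 0.01):
    print(score, sentiment_label(score))
```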
Token Count
Counts the number of tokens in a string using the tiktoken library.
Enrichment key: token_count
For configuration details, see Enrichments: Token Count.
Text validation metrics
Text validation enrichments verify the structural correctness of generated text outputs such as SQL queries and JSON payloads.
| Metric | Enrichment Key | LLM Required? | Output Type | Description |
|---|---|---|---|---|
| SQL Validation | sql_validation | No | bool + string | Validates SQL syntax for a specified dialect |
| JSON Validation | json_validation | No | bool + string | Validates JSON syntax and optionally against a schema |
SQL Validation
Validates SQL query syntax for a specified dialect. Supports 25+ SQL dialects including MySQL, PostgreSQL, Snowflake, BigQuery, and others.
Enrichment key: sql_validation
Validation is syntax-only: queries are not checked against any existing schema or database.
For configuration details, see Enrichments: SQL Validation.
JSON Validation
Validates that text is well-formed JSON and, optionally, that it conforms to a user-defined JSON Schema.
Enrichment key: json_validation
For configuration details, see Enrichments: JSON Validation.
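The syntax half of this check can be sketched with Python's built-in `json` module. The `validate_json` helper is hypothetical, and the example assumes the string part of the "bool + string" output carries an error message. Schema validation is omitted because it needs a separate JSON Schema library.

```python
import json


def validate_json(text: str) -> tuple[bool, str]:
    """Return (is_valid, error_message) for a JSON string."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return False, f"{exc.msg} (line {exc.lineno}, column {exc.colno})"
    return True, ""


print(validate_json('{"name": "widget", "price": 9.5}'))  # (True, '')

# A trailing comma is not allowed in JSON, so this fails.
is_valid, message = validate_json('{"name": "widget",}')
print(is_valid, message)
```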
Embedding metrics
Embedding enrichments convert text into vector representations for drift detection and visualization.
| Metric | Enrichment Key | LLM Required? | Output Type | Description |
|---|---|---|---|---|
| Text Embedding | TextEmbedding | No | vector + float | Generates text embeddings for UMAP visualization and drift detection |
| Centroid Distance | (auto-generated) | No | float | Distance from the nearest cluster centroid |
Text Embedding
Converts unstructured text into high-dimensional vector representations for semantic analysis. Enables Fiddler’s 3D UMAP visualizations and embedding-based drift detection.
Class: fdl.TextEmbedding()
TextEmbedding is configured using fdl.TextEmbedding() rather than fdl.Enrichment(). See the Enrichments Guide for usage examples.
Centroid Distance
Measures the distance between a data point’s embedding and the nearest cluster centroid. This metric is automatically generated when a TextEmbedding enrichment is created.
For configuration details, see Enrichments: Centroid Distance.
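The metric can be pictured as the smallest distance from an embedding to any centroid. This sketch assumes Euclidean distance and uses tiny two-dimensional vectors for readability; the distance function Fiddler uses and the `nearest_centroid_distance` helper are assumptions for illustration.

```python
import math


def nearest_centroid_distance(
    embedding: list[float], centroids: list[list[float]]
) -> float:
    """Return the Euclidean distance to the closest centroid."""
    if not centroids:
        raise ValueError("at least one centroid is required")
    return min(math.dist(embedding, centroid) for centroid in centroids)


# Made-up 2D embedding and cluster centroids.
embedding = [3.0, 4.0]
centroids = [[0.0, 0.0], [3.0, 1.0], [10.0, 10.0]]
print(nearest_centroid_distance(embedding, centroids))  # 3.0
```

A large value suggests the data point sits far from every known cluster, which is why this kind of distance is useful as a drift or outlier signal.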