Evaluator to assess prompt safety using Fiddler’s Trust Model.

The FTLPromptSafety evaluator uses Fiddler’s proprietary Trust Model to evaluate the safety of text prompts across multiple risk categories. It helps identify potentially harmful, inappropriate, or unsafe content before it reaches users or downstream systems.

Key Features:
- Multi-Dimensional Safety Assessment: Evaluates 10 risk categories plus an overall maximum (11 scores in total)
- Probability-Based Scoring: Returns probability scores (0.0-1.0) for each risk category
- Comprehensive Risk Coverage: Covers illegal, hateful, harassing, and other harmful content
- Fiddler Trust Model: Uses Fiddler’s proprietary safety evaluation model
- Batch Scoring: Returns all category scores from a single call for comprehensive safety analysis

Score Categories:

- illegal_prob: Probability of containing illegal content or activities
- hateful_prob: Probability of containing hate speech or discriminatory language
- harassing_prob: Probability of containing harassing or threatening content
- racist_prob: Probability of containing racist language or content
- sexist_prob: Probability of containing sexist language or content
- violent_prob: Probability of containing violent or graphic content
- sexual_prob: Probability of containing inappropriate sexual content
- harmful_prob: Probability of containing content that could cause harm
- unethical_prob: Probability of containing unethical or manipulative content
- jailbreaking_prob: Probability of containing prompt injection or jailbreaking attempts
- max_risk_prob: Maximum risk probability across all categories
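
The relationship between the ten category probabilities and `max_risk_prob` can be sketched as follows. This assumes `max_risk_prob` is simply the largest of the ten category values, which is how the description above reads; the `CATEGORIES` tuple and the `max_risk_prob` helper are illustrative names, not part of the Fiddler SDK, and the example probabilities are made up.

```python
# Hypothetical helper: recompute max_risk_prob from the ten category scores.
# Assumption: max_risk_prob is the plain maximum of the category probabilities.
CATEGORIES = (
    "illegal_prob",
    "hateful_prob",
    "harassing_prob",
    "racist_prob",
    "sexist_prob",
    "violent_prob",
    "sexual_prob",
    "harmful_prob",
    "unethical_prob",
    "jailbreaking_prob",
)


def max_risk_prob(probs: dict[str, float]) -> float:
    """Return the largest probability among the ten risk categories."""
    missing = [c for c in CATEGORIES if c not in probs]
    if missing:
        raise KeyError(f"missing categories: {missing}")
    return max(probs[c] for c in CATEGORIES)


# Made-up probabilities for illustration (not real model output).
example = {c: 0.05 for c in CATEGORIES}
example["jailbreaking_prob"] = 0.82
print(max_risk_prob(example))  # 0.82
```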

Use Cases:

- Content Moderation: Filtering user-generated content for safety
- Prompt Validation: Ensuring user prompts are safe before processing
- AI Safety: Protecting AI systems from harmful or manipulative inputs
- Compliance: Meeting regulatory requirements for content safety
- Risk Assessment: Evaluating potential risks in text content
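
For the prompt-validation use case, one way to gate a prompt before it is processed is to compare each category score against a threshold. The sketch below assumes the scores have already been collected into a `{name: value}` dict. `gate_prompt` is a hypothetical function, and the thresholds (a default of 0.7, matching the start of the high-risk band, and a stricter 0.5 for jailbreaking) are illustrative values, not Fiddler recommendations.

```python
# Hypothetical pre-processing gate for the "Prompt Validation" use case.
# Thresholds below are illustrative only; tune them for your own policy.
DEFAULT_THRESHOLD = 0.7
CATEGORY_THRESHOLDS = {
    "jailbreaking_prob": 0.5,  # example of a stricter limit for one category
}


def gate_prompt(probs: dict[str, float]) -> tuple[bool, list[str]]:
    """Return (allowed, flagged_categories) for a dict of category probabilities.

    The aggregate max_risk_prob is skipped so that only individual
    categories are reported as reasons for blocking.
    """
    flagged = [
        name
        for name, value in probs.items()
        if name != "max_risk_prob"
        and value >= CATEGORY_THRESHOLDS.get(name, DEFAULT_THRESHOLD)
    ]
    return (not flagged, sorted(flagged))


allowed, reasons = gate_prompt(
    {"illegal_prob": 0.1, "jailbreaking_prob": 0.55, "max_risk_prob": 0.55}
)
print(allowed, reasons)  # False ['jailbreaking_prob']
```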

Score Interpretation:

- 0.0-0.3: Low risk (safe content)
- 0.3-0.7: Medium risk (requires review)
- 0.7-1.0: High risk (likely unsafe content)
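
The interpretation bands above can be expressed as a small lookup. Because adjacent bands share their endpoints (0.3 and 0.7), this sketch assumes a value exactly on a boundary belongs to the higher band. `risk_band` is a hypothetical helper, and the bands are guidance for review rather than fixed cut-offs.

```python
def risk_band(prob: float) -> str:
    """Map a probability in [0.0, 1.0] to the documented risk band.

    Assumption: a value exactly on a boundary (0.3 or 0.7) is placed in
    the higher band.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    if prob < 0.3:
        return "low"  # safe content
    if prob < 0.7:
        return "medium"  # requires review
    return "high"  # likely unsafe content


for p in (0.12, 0.3, 0.69, 0.7, 0.95):
    print(p, risk_band(p))
```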

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | str | ✗ | None | The text prompt to evaluate for safety. |

Returns

A list of Score objects, one for each safety category. Each Score has:
- name: The safety category name (e.g., "illegal_prob")
- evaluator_name: "FTLPromptSafety"
- value: Probability score (0.0-1.0) for that category
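
Because `score()` returns a flat list, it is often convenient to index the results by category name. The sketch below uses a stand-in `Score` dataclass limited to the three documented fields (`name`, `evaluator_name`, `value`); the real SDK class may carry more attributes. `scores_by_name` and `highest_category` are hypothetical helpers, and the scores are made up.

```python
from dataclasses import dataclass


@dataclass
class Score:
    """Stand-in for the SDK's Score object, limited to the documented fields."""

    name: str
    evaluator_name: str
    value: float


def scores_by_name(scores: list[Score]) -> dict[str, float]:
    """Index a list of Score objects by category name."""
    return {s.name: s.value for s in scores}


def highest_category(scores: list[Score]) -> tuple[str, float]:
    """Return the riskiest individual category, ignoring the max_risk_prob aggregate."""
    per_category = [s for s in scores if s.name != "max_risk_prob"]
    top = max(per_category, key=lambda s: s.value)
    return top.name, top.value


# Made-up results for illustration only.
results = [
    Score("illegal_prob", "FTLPromptSafety", 0.04),
    Score("violent_prob", "FTLPromptSafety", 0.61),
    Score("max_risk_prob", "FTLPromptSafety", 0.61),
]
print(scores_by_name(results)["violent_prob"])  # 0.61
print(highest_category(results))  # ('violent_prob', 0.61)
```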

Raises

ValueError: If the text is empty or None.

This evaluator is designed for prompt safety assessment and should be used as part of a comprehensive content moderation strategy. Interpret the probability scores in context and combine them with other safety measures for robust content filtering.

name = 'ftl_prompt_safety'

score()

Score the safety of a text prompt.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | str | ✗ | None | The text prompt to evaluate for safety. |
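
When unit-testing code that depends on this evaluator, a local stand-in that mirrors the documented interface can avoid calls to the Trust Model. The sketch below reproduces only what this page describes: a `name` attribute, a `score(text)` method that raises `ValueError` on empty or `None` input, and one score per category plus `max_risk_prob`. `FakePromptSafety`, its fixed return values, and the `Score` named tuple are hypothetical and are not part of the Fiddler SDK.

```python
from typing import NamedTuple, Optional

CATEGORIES = (
    "illegal_prob", "hateful_prob", "harassing_prob", "racist_prob",
    "sexist_prob", "violent_prob", "sexual_prob", "harmful_prob",
    "unethical_prob", "jailbreaking_prob",
)


class Score(NamedTuple):
    """Minimal stand-in for the SDK's Score object (documented fields only)."""

    name: str
    evaluator_name: str
    value: float


class FakePromptSafety:
    """Hypothetical test double mirroring the documented FTLPromptSafety interface."""

    name = "ftl_prompt_safety"

    def __init__(self, fixed: Optional[dict[str, float]] = None) -> None:
        # Category probabilities to return; unspecified categories default to 0.0.
        self.fixed = fixed or {}

    def score(self, text: Optional[str] = None) -> list[Score]:
        if not text:  # documented behavior: empty or None text raises ValueError
            raise ValueError("text must be a non-empty string")
        values = {c: self.fixed.get(c, 0.0) for c in CATEGORIES}
        scores = [Score(c, "FTLPromptSafety", v) for c, v in values.items()]
        # Append the aggregate, assumed here to be the maximum category value.
        scores.append(Score("max_risk_prob", "FTLPromptSafety", max(values.values())))
        return scores


fake = FakePromptSafety({"hateful_prob": 0.8})
results = fake.score("example prompt")
print(len(results), results[-1].value)  # 11 0.8
```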