Fiddler Guardrails provide real-time protection for your LLM applications by detecting and preventing harmful content, PII leaks, and hallucinations before they reach your users.
Time to complete: ~15 minutes
What You’ll Learn
- How to set up Fiddler Guardrails
- How to use the three main guardrail types (Safety, PII, Faithfulness)
- How to interpret risk scores
- How to integrate guardrails into your LLM application
Prerequisites
- Fiddler Guardrails Account: Sign up for Free Guardrails
- API Key: Generated from your Fiddler Guardrails dashboard
- Python 3.8+ (or any HTTP client)
Quick Start: Setting Up Guardrails
Step 1: Get Your API Key
- Sign up at fiddler.ai/free-guardrails
- Activate your account via email
- Generate your API key from the dashboard
For detailed setup instructions, see the Guardrails Getting Started Guide.
Step 2: Install Required Libraries (Optional)
# For Python
pip install requests
# Or use any HTTP client in your preferred language
import requests
import json
# Your API credentials
FIDDLER_URL = "https://your-instance.fiddler.ai" # Replace with your Fiddler instance URL
API_KEY = "your-api-key-here"
# Standard headers for all guardrail requests
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}
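Hard-coding the key is fine for a quick test, but you may prefer to read it from the environment. The following is a minimal sketch, not part of the Fiddler documentation: the helper name `build_headers` and the environment variable `FIDDLER_API_KEY` are hypothetical names chosen for illustration.

```python
import os


def build_headers(api_key=None):
    """Build the request headers, falling back to an environment variable."""
    # FIDDLER_API_KEY is a hypothetical variable name used only in this sketch
    api_key = api_key or os.environ.get("FIDDLER_API_KEY")
    if not api_key:
        raise RuntimeError("Set FIDDLER_API_KEY or pass api_key explicitly")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

With this in place, `HEADERS = build_headers()` could replace the literal dictionary above, which keeps the key out of source control.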
Guardrail Types and Usage
Each guardrail type has its own endpoint and request/response format. Choose the appropriate guardrail based on your protection needs.
🛡️ Safety Guardrails
Detect harmful, toxic, or jailbreaking content across 10 safety dimensions.
Endpoint: /v3/guardrails/ftl-safety
Use cases:
- Content moderation
- Jailbreak prevention
- Toxic content detection
Example: Check for Harmful Content
def check_safety(text):
    """Check text for safety violations across 10 dimensions."""
    response = requests.post(
        f"{FIDDLER_URL}/v3/guardrails/ftl-safety",
        headers=HEADERS,
        json={"data": {"input": text}},
        timeout=10,  # illustrative value; tune for your latency budget
    )
    # Raise requests.exceptions.HTTPError on 4xx/5xx instead of parsing an error body
    response.raise_for_status()
    return response.json()

# Test with potentially harmful content
test_input = "I want to cause harm to others"
safety_scores = check_safety(test_input)
print(json.dumps(safety_scores, indent=2))
Response Format:
{
  "fdl_harmful": 0.92,
  "fdl_violent": 0.78,
  "fdl_unethical": 0.65,
  "fdl_illegal": 0.45,
  "fdl_sexual": 0.12,
  "fdl_racist": 0.08,
  "fdl_jailbreaking": 0.23,
  "fdl_harassing": 0.54,
  "fdl_hateful": 0.71,
  "fdl_sexist": 0.15
}
Safety Dimensions:
fdl_harmful - General harmful content
fdl_violent - Violence and threats
fdl_unethical - Unethical behavior
fdl_illegal - Illegal activities
fdl_sexual - Sexual content
fdl_racist - Racist content
fdl_jailbreaking - Prompt manipulation attempts
fdl_harassing - Harassment
fdl_hateful - Hateful content
fdl_sexist - Sexist content
fdl_roleplaying - Prompting persona change (listed here in addition to the 10 dimensions in the sample response above)
Interpreting Safety Scores
Each dimension returns a score between 0 and 1:
- 0.0 - 0.3: Low risk (safe to proceed)
- 0.3 - 0.7: Medium risk (review recommended)
- 0.7 - 1.0: High risk (block or flag for review)
def is_safe(safety_scores, threshold=0.7):
    """
    Check if content passes safety check.
    Returns True if ALL dimensions are below threshold.
    """
    dangerous_dimensions = [
        dim for dim, score in safety_scores.items()
        if score >= threshold
    ]
    if dangerous_dimensions:
        print(f"⚠️ Safety violations: {dangerous_dimensions}")
        return False
    print("✅ Content passed safety check")
    return True

# Use in your application
if is_safe(safety_scores):
    # Proceed with LLM processing
    pass
else:
    # Block or flag content
    print("Content blocked due to safety concerns")
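The three score ranges above can also be turned into labels, which is handy for logging or for routing medium-risk content to review. This is a sketch under one assumption: because the documented ranges share their endpoints, a score of exactly 0.3 or 0.7 is assigned to the higher band here, matching the `>=` comparison in `is_safe`. The names `risk_band` and `summarize_safety` are hypothetical.

```python
def risk_band(score):
    """Map a 0-1 risk score to 'low', 'medium', or 'high'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 0.7:
        return "high"    # block or flag for review
    if score >= 0.3:
        return "medium"  # review recommended
    return "low"         # safe to proceed


def summarize_safety(safety_scores):
    """Return a {dimension: band} mapping for a safety response."""
    return {dim: risk_band(score) for dim, score in safety_scores.items()}
```

For example, `summarize_safety({"fdl_harmful": 0.92, "fdl_sexual": 0.12})` labels the first dimension as high risk and the second as low risk.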
→ Safety Guardrails Tutorial
🔒 PII Detection
Detect personally identifiable information (PII), protected health information (PHI), and custom sensitive data.
Endpoint: /v3/guardrails/sensitive-information
Use cases:
- Data privacy compliance
- GDPR/CCPA protection
- Sensitive data redaction
Example 1: Detect PII
def detect_pii(text, entity_categories="PII"):
    """
    Detect sensitive information in text.
    Args:
        text: Input text to analyze
        entity_categories: "PII", "PHI", "Custom Entities", or list like ["PII", "PHI"]
    """
    payload = {
        "data": {
            "input": text,
            "entity_categories": entity_categories
        }
    }
    response = requests.post(
        f"{FIDDLER_URL}/v3/guardrails/sensitive-information",
        headers=HEADERS,
        json=payload,
        timeout=10,  # illustrative value; tune for your latency budget
    )
    # Raise requests.exceptions.HTTPError on 4xx/5xx instead of parsing an error body
    response.raise_for_status()
    return response.json()

# Test with PII data
test_text = """
Contact John Doe at john.doe@email.com or call (555) 123-4567.
SSN: 123-45-6789. Credit card: 4111-1111-1111-1111.
"""
pii_results = detect_pii(test_text)
print(json.dumps(pii_results, indent=2))
Response Format:
{
  "fdl_sensitive_information_scores": [
    {
      "score": 0.987,
      "label": "person",
      "text": "John Doe",
      "start": 8,
      "end": 16
    },
    {
      "score": 0.998,
      "label": "email",
      "text": "john.doe@email.com",
      "start": 20,
      "end": 38
    },
    {
      "score": 0.991,
      "label": "social_security_number",
      "text": "123-45-6789",
      "start": 72,
      "end": 83
    }
  ]
}
Response Fields:
score - Confidence score (0.0 to 1.0)
label - Entity type (e.g., “email”, “social_security_number”)
text - The detected sensitive information
start / end - Character positions in the input text
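Given these fields, a common next step is to keep only confident detections and group them by entity type. The sketch below assumes the response shape shown above; the helper name `group_entities` and the `min_score` parameter are hypothetical and not part of the Fiddler API.

```python
from collections import defaultdict


def group_entities(pii_results, min_score=0.5):
    """Group detected texts by label, skipping low-confidence detections."""
    grouped = defaultdict(list)
    for entity in pii_results.get("fdl_sensitive_information_scores", []):
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Sample data copied from the response format above
sample = {
    "fdl_sensitive_information_scores": [
        {"score": 0.987, "label": "person", "text": "John Doe", "start": 8, "end": 16},
        {"score": 0.998, "label": "email", "text": "john.doe@email.com", "start": 20, "end": 38},
    ]
}
print(group_entities(sample, min_score=0.99))  # only the email entry passes the 0.99 cutoff
```

A suitable `min_score` depends on how costly a missed detection is for your application; the default here is only a placeholder.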
Example 2: Detect PHI (Healthcare Data)
# Detect protected health information
healthcare_text = """
Patient John Smith prescribed metformin for diabetes.
Insurance number: HI-987654321.
"""
phi_results = detect_pii(healthcare_text, entity_categories="PHI")
# Display detected PHI entities
for entity in phi_results.get("fdl_sensitive_information_scores", []):
    print(f"Found {entity['label']}: '{entity['text']}' (confidence: {entity['score']:.3f})")
Example 3: Custom Entity Detection
# Detect organization-specific sensitive data
custom_text = "Employee ID: EMP-2024-001, API key: sk-abc123xyz789"
custom_results = detect_pii(
    custom_text,
    entity_categories="Custom Entities"
)
# Note: For custom entities, you can also specify the entity types:
payload = {
    "data": {
        "input": custom_text,
        "entity_categories": "Custom Entities",
        "custom_entities": ["employee id", "api key", "project code"]
    }
}
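The payload above is built but never sent. One way to avoid duplicating the request body is a small builder that adds `custom_entities` only when provided. This is a sketch: the field names come from the example above, while the helper name `build_sensitive_info_payload` is hypothetical.

```python
def build_sensitive_info_payload(text, entity_categories="PII", custom_entities=None):
    """Build the request body for the sensitive-information endpoint."""
    data = {"input": text, "entity_categories": entity_categories}
    if custom_entities:
        # Only include the field when the caller supplies entity types
        data["custom_entities"] = list(custom_entities)
    return {"data": data}


payload = build_sensitive_info_payload(
    "Employee ID: EMP-2024-001",
    entity_categories="Custom Entities",
    custom_entities=["employee id", "api key"],
)
```

The result can then be posted the same way `detect_pii` does, by passing it as `json=payload` to `requests.post` with the sensitive-information endpoint and the shared headers.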
Supported Entity Categories:
- PII: 35+ types including names, addresses, SSN, credit cards, emails, phone numbers
- PHI: 7 healthcare-specific types (medication, medical conditions, health insurance numbers)
- Custom Entities: Define your own sensitive data patterns
Processing PII Results
def redact_pii(text, pii_results):
    """Redact detected PII from text."""
    entities = pii_results.get("fdl_sensitive_information_scores", [])
    # Sort by position in reverse to maintain correct offsets
    entities_sorted = sorted(entities, key=lambda x: x['start'], reverse=True)
    redacted_text = text
    for entity in entities_sorted:
        redacted_text = (
            redacted_text[:entity['start']] +
            f"[REDACTED_{entity['label'].upper()}]" +
            redacted_text[entity['end']:]
        )
    return redacted_text

# Use in your application
if pii_results.get("fdl_sensitive_information_scores"):
    clean_text = redact_pii(test_text, pii_results)
    print(f"Redacted: {clean_text}")
→ PII Detection Tutorial
✅ FTL Faithfulness Detection
Detect hallucinations and unsupported claims by comparing LLM outputs to source context (for RAG applications) using Fiddler’s proprietary Fast Trust Model.
Endpoint: /v3/guardrails/ftl-response-faithfulness
This guardrail uses the FTL Faithfulness model for real-time content blocking. For RAG pipeline diagnostics using the LLM-as-a-Judge approach, see RAG Health Metrics.
Use cases:
- RAG application accuracy
- Fact-checking
- Hallucination prevention
Example: Check Response Faithfulness
def check_faithfulness(llm_response, source_context):
    """
    Check if LLM response is faithful to the provided context.
    Args:
        llm_response: The text generated by your LLM
        source_context: The reference text from your knowledge base/retrieval
    """
    payload = {
        "data": {
            "response": llm_response,
            "context": source_context
        }
    }
    response = requests.post(
        f"{FIDDLER_URL}/v3/guardrails/ftl-response-faithfulness",
        headers=HEADERS,
        json=payload,
        timeout=10,  # illustrative value; tune for your latency budget
    )
    # Raise requests.exceptions.HTTPError on 4xx/5xx instead of parsing an error body
    response.raise_for_status()
    return response.json()

# Test with RAG example
retrieved_context = """
The Eiffel Tower is located in Paris, France. It was completed in 1889
and stands 330 meters tall. It was designed by Gustave Eiffel.
"""
llm_response_correct = "The Eiffel Tower in Paris is 330 meters tall and was completed in 1889."
llm_response_hallucinated = "The Eiffel Tower in Paris is 450 meters tall and was completed in 1895."
# Check faithful response
faithful_score = check_faithfulness(llm_response_correct, retrieved_context)
print(f"Faithful response score: {faithful_score}")
# Check hallucinated response
hallucinated_score = check_faithfulness(llm_response_hallucinated, retrieved_context)
print(f"Hallucinated response score: {hallucinated_score}")
Response Format:
{
  "fdl_faithful_score": 0.92
}
Score Interpretation:
- 0.0 - 0.3: Low faithfulness (likely hallucination)
- 0.3 - 0.7: Medium faithfulness (review recommended)
- 0.7 - 1.0: High faithfulness (response is well-supported by context)
def is_faithful(faithfulness_result, threshold=0.7):
    """Check if response is faithful to context."""
    score = faithfulness_result.get("fdl_faithful_score", 0.0)
    if score >= threshold:
        print(f"✅ Response is faithful (score: {score:.3f})")
        return True
    else:
        print(f"⚠️ Possible hallucination detected (score: {score:.3f})")
        return False

# Use in your RAG application
if not is_faithful(faithful_score):
    print("Warning: LLM response may contain unsupported claims")
→ Faithfulness Tutorial
Common Integration Patterns
Pattern 1: Pre-Processing (Input Guardrails)
Check user input before sending it to your LLM:
def process_user_input(user_message):
    """Process and validate user input before LLM processing."""
    # Step 1: Check safety
    safety_scores = check_safety(user_message)
    # Block if any safety dimension meets or exceeds the threshold
    max_safety_score = max(safety_scores.values())
    if max_safety_score >= 0.7:
        return {
            "error": "Your message contains inappropriate content.",
            "blocked": True
        }
    # Step 2: Check for PII and redact if needed
    pii_results = detect_pii(user_message)
    if pii_results.get("fdl_sensitive_information_scores"):
        # Redact PII before sending to LLM
        user_message = redact_pii(user_message, pii_results)
        print("⚠️ PII detected and redacted")
    # Step 3: Proceed with LLM processing
    return {
        "message": user_message,
        "blocked": False
    }

# Example usage
user_input = "My SSN is 123-45-6789. Can you help me?"
result = process_user_input(user_input)
if not result.get("blocked"):
    # Safe to send to LLM; call_your_llm is a placeholder for your own LLM client
    llm_response = call_your_llm(result["message"])
Pattern 2: Post-Processing (Output Guardrails)
Check LLM output before returning it to the user:
def validate_llm_output(llm_response, retrieval_context=None):
    """Validate LLM output before returning to user."""
    # Step 1: Check for PII in output
    pii_results = detect_pii(llm_response)
    if pii_results.get("fdl_sensitive_information_scores"):
        # Redact any PII in the response
        llm_response = redact_pii(llm_response, pii_results)
        print("⚠️ PII detected in LLM output and redacted")
    # Step 2: Check faithfulness (for RAG applications)
    if retrieval_context:
        faithfulness_result = check_faithfulness(llm_response, retrieval_context)
        if not is_faithful(faithfulness_result, threshold=0.7):
            return {
                "response": llm_response,
                "warning": "This response may contain information not supported by source documents."
            }
    return {
        "response": llm_response,
        "warning": None
    }

# Example usage in RAG application
# retrieve_from_knowledge_base and generate_llm_response are placeholders for your own code
def answer_query(user_query):
    context = retrieve_from_knowledge_base(user_query)
    llm_output = generate_llm_response(user_query, context)
    validated = validate_llm_output(llm_output, context)
    if validated.get("warning"):
        print(f"⚠️ {validated['warning']}")
    return validated["response"]
Pattern 3: Complete LLM Pipeline with Multiple Guardrails
def safe_llm_pipeline(user_input, use_rag=True):
    """Complete LLM pipeline with comprehensive guardrails."""
    # === INPUT GUARDRAILS ===
    # 1. Safety check
    safety_scores = check_safety(user_input)
    if max(safety_scores.values()) >= 0.7:
        return {"error": "Inappropriate content detected", "blocked": True}
    # 2. PII detection and redaction
    pii_input = detect_pii(user_input)
    if pii_input.get("fdl_sensitive_information_scores"):
        user_input = redact_pii(user_input, pii_input)
    # === LLM PROCESSING ===
    context = None
    if use_rag:
        context = retrieve_from_knowledge_base(user_input)
    llm_response = generate_llm_response(user_input, context)
    # === OUTPUT GUARDRAILS ===
    # 3. PII detection in output
    pii_output = detect_pii(llm_response)
    if pii_output.get("fdl_sensitive_information_scores"):
        llm_response = redact_pii(llm_response, pii_output)
    # 4. Faithfulness check (for RAG)
    warning = None
    if use_rag and context:
        faithfulness = check_faithfulness(llm_response, context)
        if faithfulness.get("fdl_faithful_score", 0) < 0.7:
            warning = "Response may contain unsupported claims"
    return {
        "response": llm_response,
        "warning": warning,
        "blocked": False
    }
Best Practices
- Layer Multiple Guardrails: Use safety + PII for inputs, faithfulness + PII for outputs
- Set Appropriate Thresholds: Adjust risk score thresholds based on your use case sensitivity
- Log All Checks: Track guardrail results for monitoring and continuous improvement
- Handle Gracefully: Provide helpful user-facing messages when content is blocked
- Monitor Performance: Track false positives/negatives and adjust thresholds accordingly
- Consider Latency: Guardrail checks add ~100-300ms - use async calls when possible
- Respect Rate Limits: Free tier has limits (2 req/s, 70 req/hr, 200 req/day)
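Since the safety and PII checks on the same input do not depend on each other, they can run in parallel to reduce the added latency. The sketch below uses a thread pool from the standard library as one possible approach; `run_checks_concurrently` is a hypothetical helper, and the stand-in callables in the usage lines only demonstrate the mechanics.

```python
from concurrent.futures import ThreadPoolExecutor


def run_checks_concurrently(text, checks):
    """Run each check on the same text in parallel.

    checks maps a name to a callable that takes the text,
    for example {"safety": check_safety, "pii": detect_pii}.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in checks.items()}
        # result() re-raises any exception raised inside the check
        return {name: future.result() for name, future in futures.items()}


# Stand-in callables so the sketch runs without network access
results = run_checks_concurrently("hi", {"upper": str.upper, "length": len})
```

Parallel requests count against the same rate limit, so on the free tier (2 req/s) running more than two checks at once may trigger throttling.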
Error Handling
def safe_guardrail_check(guardrail_func, *args, **kwargs):
    """Wrapper for safe guardrail execution with error handling."""
    # Note: HTTPError is raised only if guardrail_func calls response.raise_for_status(),
    # and Timeout only if the request was made with a timeout.
    try:
        response = guardrail_func(*args, **kwargs)
        return response, None
    except requests.exceptions.HTTPError as e:
        if e.response.status_code == 401:
            error = "Authentication failed. Check your API key."
        elif e.response.status_code == 413:
            error = "Input exceeds token length limit."
        elif e.response.status_code == 429:
            error = "Rate limit exceeded. Please retry later."
        else:
            error = f"HTTP error: {e.response.status_code}"
        return None, error
    except requests.exceptions.Timeout:
        return None, "Request timed out."
    except Exception as e:
        return None, f"Unexpected error: {str(e)}"

# Usage
safety_result, error = safe_guardrail_check(check_safety, user_input)
if error:
    print(f"Guardrail check failed: {error}")
    # Fallback behavior
else:
    # Process safety_result
    pass
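For rate-limit errors (429), retrying after a short delay is often preferable to failing immediately. The following is a sketch of exponential backoff, not a documented Fiddler feature: `call_with_retry`, `is_rate_limited`, `max_attempts`, and `base_delay` are hypothetical names and values. The predicate inspects `exc.response.status_code`, which is how `requests.exceptions.HTTPError` exposes the status.

```python
import time


def is_rate_limited(exc):
    """True if the exception carries a response with HTTP status 429."""
    response = getattr(exc, "response", None)
    return getattr(response, "status_code", None) == 429


def call_with_retry(func, *args, max_attempts=3, base_delay=1.0,
                    sleep=time.sleep, **kwargs):
    """Call func, retrying rate-limited failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Re-raise anything that is not a 429, or the final failed attempt
            if not is_rate_limited(exc) or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # delays of 1s, 2s, 4s, ...
```

It can be combined with the wrapper above, for example `safe_guardrail_check(call_with_retry, check_safety, user_input)`, so that only errors that persist after the retries reach the fallback path.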
Summary
You’ve learned how to:
- ✅ Use Safety Guardrails to detect harmful content across 10 dimensions
- ✅ Detect and redact PII, PHI, and custom sensitive information
- ✅ Check response faithfulness to prevent hallucinations in RAG applications
- ✅ Integrate multiple guardrails into your LLM pipeline
- ✅ Handle errors and respect rate limits
Each guardrail type uses a different endpoint and response format optimized for its specific protection purpose. Combine multiple guardrails for comprehensive LLM application safety.