Integrate Fiddler with your existing data infrastructure to seamlessly ingest training data, production events, and ground truth labels. From cloud data warehouses to real-time streaming platforms, Fiddler connects to the data sources you already use.
Why Data Integration Matters
AI observability requires continuous data flow from your ML pipelines and applications. Fiddler’s data platform integrations enable:
- Automated Data Ingestion - Pull training datasets and production events without manual uploads
- Real-Time Monitoring - Stream prediction events for immediate drift and performance detection
- Unified Data Pipeline - Single integration point for all your ML data sources
- Ground Truth Enrichment - Automatically join production predictions with delayed labels
- Historical Analysis - Query data warehouse for model performance over time
Integration Categories
🏢 Data Warehouses
Connect Fiddler to your cloud data warehouse for batch data ingestion and historical analysis.
Supported Platforms:
- Snowflake - Cloud data warehouse with zero-copy data sharing ✓ GA
- Google BigQuery - Serverless data warehouse with SQL analytics ✓ GA
Use Cases:
- Import training datasets from data warehouse tables
- Query historical model predictions for performance analysis
- Join production events with delayed ground truth labels
- Export Fiddler metrics back to warehouse for BI tools
📊 Data Streaming
Stream real-time prediction events and feedback directly to Fiddler for immediate observability.
Supported Platforms:
- Apache Kafka - Distributed event streaming platform ✓ GA
- Amazon S3 - Object storage with event notifications ✓ GA
Use Cases:
- Stream model predictions in real-time from production services
- Monitor agentic AI interactions as they occur
- Trigger alerts on data quality issues within seconds
- Capture ground truth feedback from user interactions
🔄 Orchestration & Pipelines
Integrate Fiddler into your ML workflow orchestration for automated monitoring at every pipeline stage.
Supported Platforms:
- Apache Airflow - Workflow orchestration platform ✓ GA
- AWS SageMaker Pipelines - Managed ML pipeline service ✓ GA
Use Cases:
- Automatically upload datasets when training pipelines complete
- Trigger model evaluation as part of CI/CD workflows
- Schedule periodic drift checks and performance reports
- Orchestrate ground truth label collection and enrichment
Data Warehouse Integrations
Snowflake
Why Snowflake + Fiddler:
- Zero-Copy Data Sharing - No data duplication, direct queries to Snowflake
- Secure Data Access - OAuth 2.0 and key-pair authentication
- Scalable Analytics - Leverage Snowflake’s compute for large datasets
- Cost-Effective - Pay only for queries executed, no data transfer fees
Google BigQuery
Why BigQuery + Fiddler:
- Serverless Architecture - No infrastructure management
- SQL-Based Queries - Familiar interface for data teams
- Federated Queries - Join Fiddler data with other GCP sources
- Machine Learning - BigQuery ML model monitoring integration
Streaming Integrations
Apache Kafka
Why Kafka + Fiddler:
- Real-Time Monitoring - Sub-second latency from prediction to observability
- High Throughput - Handle millions of events per second
- Event Replay - Replay historical events for testing and validation
- Exactly-Once Semantics - Guaranteed delivery for critical predictions
Amazon S3
Why S3 + Fiddler:
- Batch Processing - Ingest large datasets efficiently
- Event Notifications - Automatic processing when new files arrive
- Data Lake Integration - Monitor models trained on S3 data lakes
- Cost-Effective Storage - Archive historical predictions in S3
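To illustrate the event-notification flow, the sketch below pulls the bucket and object key out of an S3 event notification payload, which is the first step before a newly arrived file can be handed to an ingestion job. The `new_objects` helper and the sample bucket and key are hypothetical; only the `Records[].s3` layout follows the S3 notification format, and the hand-off to Fiddler is deliberately left out.

```python
from urllib.parse import unquote_plus


def new_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for every object-created record in an S3 event."""
    found = []
    for record in event.get("Records", []):
        # Only react to newly created objects, e.g. "ObjectCreated:Put".
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (a space is sent as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        found.append((bucket, key))
    return found


# Hypothetical payload, trimmed to the fields the helper reads.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-predictions"},
                "object": {"key": "events/2024-01-01/batch+01.parquet"},
            },
        }
    ]
}
print(new_objects(sample_event))
```

In a real deployment this function would typically sit inside whatever receives the notification (for example a queue consumer or a serverless function), with the returned keys passed on to the ingestion step.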
Orchestration & Pipeline Integrations
Apache Airflow
Why Airflow + Fiddler:
- Automated Workflows - Trigger Fiddler operations as DAG tasks
- Dependency Management - Ensure data quality before model training
- Scheduling - Periodic drift checks and model evaluations
- Observability - Monitor ML pipelines and models in one platform
AWS SageMaker Pipelines
Why SageMaker Pipelines + Fiddler:
- Native AWS Integration - Seamless with SageMaker Partner AI App
- End-to-End ML Workflows - From data prep to model monitoring
- Model Registry Integration - Automatic monitoring setup for registered models
- Cost Optimization - Leverage existing SageMaker infrastructure
Integration Selector
Not sure which data integration to use? Here’s a quick decision guide:

| Your Data Source | Recommended Integration | Why |
|---|---|---|
| Snowflake data warehouse | Snowflake connector | Zero-copy sharing, direct queries |
| BigQuery tables | BigQuery connector | Serverless, SQL-based, GCP-native |
| Real-time prediction streams | Kafka integration | Sub-second latency, high throughput |
| S3 data lake | S3 integration | Batch processing, event-driven uploads |
| Airflow ML pipelines | Airflow operators | Automated workflows, task dependencies |
| SageMaker workflows | SageMaker Pipelines | Native AWS integration, model registry |
Getting Started
Prerequisites
Before setting up data integrations, ensure you have:
- Fiddler Account - Cloud or on-premises instance
- API Key - Generate from Fiddler UI Settings
- Data Source Access - Credentials with read permissions
- Network Connectivity - Firewall rules allowing Fiddler → Data Source
General Setup Pattern
All data integrations follow this pattern:
1. Configure Connection
Advanced Patterns
Pattern 1: Multi-Source Data Enrichment
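A minimal sketch of multi-source enrichment, assuming predictions, delayed ground truth labels, and extra context have already been pulled from their sources as lists of dictionaries sharing an `event_id` key. The `enrich_events` helper and every field name are hypothetical and not part of the Fiddler client; the sketch only shows the join itself.

```python
def enrich_events(predictions, labels, context):
    """Left-join predictions with labels and context rows on event_id."""
    labels_by_id = {row["event_id"]: row for row in labels}
    context_by_id = {row["event_id"]: row for row in context}
    enriched = []
    for pred in predictions:
        event_id = pred["event_id"]
        merged = dict(pred)
        # Labels may arrive late, so a missing label is kept as None.
        merged["label"] = labels_by_id.get(event_id, {}).get("label")
        # Copy every context column except the join key itself.
        for key, value in context_by_id.get(event_id, {}).items():
            if key != "event_id":
                merged[key] = value
        enriched.append(merged)
    return enriched


# Hypothetical rows, e.g. from a stream, a warehouse table, and a CRM export.
predictions = [
    {"event_id": "e1", "score": 0.91},
    {"event_id": "e2", "score": 0.12},
]
labels = [{"event_id": "e1", "label": 1}]
context = [{"event_id": "e2", "region": "emea"}]

for row in enrich_events(predictions, labels, context):
    print(row)
```

Keeping the prediction row even when no label exists yet preserves the event for drift monitoring, while performance metrics can be computed later once the label arrives.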
Combine data from multiple sources for comprehensive monitoring.
Pattern 2: Data Quality Validation
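A minimal sketch of pre-ingestion validation, assuming each event is a dictionary and that the required fields are `event_id`, `timestamp`, and `prediction`. These names and the `validate_event` and `split_valid` helpers are hypothetical; a real schema would come from the model definition.

```python
from datetime import datetime

# Hypothetical required fields; adjust to the model's actual schema.
REQUIRED_FIELDS = ("event_id", "timestamp", "prediction")


def validate_event(event: dict) -> list[str]:
    """Return a list of problems found in one event (empty means valid)."""
    errors = []
    for field in REQUIRED_FIELDS:
        if event.get(field) is None:
            errors.append(f"missing or null field: {field}")
    ts = event.get("timestamp")
    if ts is not None:
        try:
            # Older Python versions accept only a subset of ISO 8601 here.
            datetime.fromisoformat(str(ts))
        except ValueError:
            errors.append(f"timestamp is not ISO 8601: {ts!r}")
    return errors


def split_valid(events):
    """Separate events into (valid, rejected), keeping the reasons for rejects."""
    valid, rejected = [], []
    for event in events:
        errors = validate_event(event)
        if errors:
            rejected.append((event, errors))
        else:
            valid.append(event)
    return valid, rejected


events = [
    {"event_id": "e1", "timestamp": "2024-01-01T12:00:00+00:00", "prediction": 0.8},
    {"event_id": "e2", "timestamp": "01/02/2024", "prediction": None},
]
valid, rejected = split_valid(events)
print(len(valid), "valid;", len(rejected), "rejected")
```

Returning the reasons alongside each rejected event makes it possible to log or quarantine bad rows instead of silently dropping them.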
Validate data quality before ingestion.
Pattern 3: Incremental Updates
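One common way to ingest only new data is a high-watermark: remember the latest timestamp already processed and select rows newer than it on the next run. The sketch below assumes rows carry a timezone-aware `updated_at` value; the field name and the `select_new_rows` helper are hypothetical, and persisting the watermark between runs is left out.

```python
from datetime import datetime, timedelta, timezone


def select_new_rows(rows, watermark):
    """Return rows newer than the watermark plus the advanced watermark."""
    fresh = [row for row in rows if row["updated_at"] > watermark]
    # If nothing is new, the watermark stays where it was.
    new_watermark = max((row["updated_at"] for row in fresh), default=watermark)
    return fresh, new_watermark


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
rows = [
    {"id": 1, "updated_at": start},
    {"id": 2, "updated_at": start + timedelta(hours=1)},
    {"id": 3, "updated_at": start + timedelta(hours=2)},
]

fresh, watermark = select_new_rows(rows, watermark=start)
print([row["id"] for row in fresh], watermark.isoformat())
```

In a warehouse setting the same filter would normally be pushed into the query (a `WHERE updated_at > :watermark` clause) so that only new rows leave the source.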
Efficiently update datasets with only new data.
Data Format Requirements
Baseline/Training Data
Must include:
- Features - All model input features
- Predictions - Model outputs (for validation)
- Metadata (optional) - Additional context fields
Production Event Data
Must include:
- Event ID - Unique identifier
- Timestamp - Event time
- Features - Model inputs
- Predictions - Model outputs
- Model Version (optional) - For multi-model monitoring
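The production event fields listed above can be pictured as a small record type. The sketch below is illustrative only: the `ProductionEvent` class, its field names, and the flat row layout are assumptions made for the example, not a schema defined by Fiddler.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class ProductionEvent:
    features: dict[str, Any]      # model inputs
    predictions: dict[str, Any]   # model outputs
    # A unique identifier, generated if the caller does not supply one.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Event time as an ISO 8601 string in UTC.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    model_version: Optional[str] = None  # optional, for multi-model monitoring

    def to_record(self) -> dict[str, Any]:
        """Flatten the event into a single row suitable for tabular ingestion."""
        record = {"event_id": self.event_id, "timestamp": self.timestamp}
        record.update(self.features)
        record.update(self.predictions)
        if self.model_version is not None:
            record["model_version"] = self.model_version
        return record


event = ProductionEvent(
    features={"age": 42, "plan": "pro"},
    predictions={"churn_score": 0.17},
    model_version="v3",
)
print(event.to_record())
```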
Security & Compliance
Authentication Methods
Snowflake:
- Username/Password
- Key Pair Authentication (recommended for production)
- OAuth 2.0
Google BigQuery:
- Service Account JSON key
- Application Default Credentials
- Workload Identity (GKE)
Apache Kafka:
- SASL/PLAIN
- SASL/SCRAM
- mTLS
Amazon S3:
- IAM Role (recommended for AWS deployments)
- Access Key / Secret Key
- Cross-account access via IAM role assumption
Data Privacy
- Encryption in Transit - TLS 1.3 for all data transfers
- Encryption at Rest - Data encrypted in Fiddler storage
- PII Redaction - Automatically detect and redact sensitive fields
- Data Retention - Configurable retention policies per dataset
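For teams that also want to scrub sensitive values on the client side before events leave their network, a simple pre-ingestion pass might look like the sketch below. This is not Fiddler's PII redaction feature; the `redact` helper, the field names, and the single email pattern are assumptions, and a production setup would need far broader detection.

```python
import re

# Deliberately simple email pattern, used for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(record: dict, sensitive_fields=("email", "ssn")) -> dict:
    """Return a copy of the record with sensitive values masked."""
    cleaned = {}
    for key, value in record.items():
        if key in sensitive_fields:
            # Known sensitive columns are masked outright.
            cleaned[key] = "[REDACTED]"
        elif isinstance(value, str):
            # Free-text columns are scanned for embedded email addresses.
            cleaned[key] = EMAIL_RE.sub("[REDACTED_EMAIL]", value)
        else:
            cleaned[key] = value
    return cleaned


print(redact({
    "email": "a@b.com",
    "note": "contact jane.doe@example.com today",
    "age": 30,
}))
```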
Network Security
Firewall Rules:
- AWS PrivateLink - For SageMaker Partner AI App
- VPC Peering - Direct connection to data sources
- VPN Tunnels - Secure connectivity for on-premises sources
Monitoring Data Pipeline Health
Connection Health Checks
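A basic health check can start with plain network reachability. The sketch below uses only the Python standard library and assumes the data source exposes a TCP endpoint; the `is_reachable` helper is hypothetical, and a successful connection says nothing about credentials or permissions, which need a separate check against the source itself.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `is_reachable("kafka.internal.example", 9092)` (a placeholder host) would return `False` when a firewall rule blocks the broker port, which narrows a failure down to connectivity before credentials are investigated.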
Data Ingestion Metrics
Monitor data pipeline performance:
- Ingestion Latency - Time from source to Fiddler
- Throughput - Events per second processed
- Error Rate - Failed ingestion attempts
- Data Freshness - Time since last successful update
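The four metrics above can be computed from a simple ingestion log. The sketch below assumes each log entry records when the event occurred and when it was ingested (`None` for a failed attempt); the `ingestion_metrics` helper and the field names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def ingestion_metrics(records, window_start, now):
    """Summarize latency, throughput, error rate, and freshness for a window."""
    total = len(records)
    succeeded = [r for r in records if r["ingested_at"] is not None]
    latencies = [
        (r["ingested_at"] - r["event_time"]).total_seconds() for r in succeeded
    ]
    window_seconds = (now - window_start).total_seconds()
    last_success = max((r["ingested_at"] for r in succeeded), default=None)
    return {
        # Mean time from source event to successful ingestion.
        "avg_latency_s": sum(latencies) / len(latencies) if latencies else None,
        # Successfully ingested events per second over the window.
        "throughput_eps": len(succeeded) / window_seconds if window_seconds > 0 else 0.0,
        # Share of attempts that failed.
        "error_rate": (total - len(succeeded)) / total if total else 0.0,
        # Seconds since the last successful ingestion.
        "freshness_s": (now - last_success).total_seconds() if last_success else None,
    }


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
log = [
    {"event_time": t0, "ingested_at": t0 + timedelta(seconds=2)},
    {"event_time": t0 + timedelta(seconds=10), "ingested_at": t0 + timedelta(seconds=14)},
    {"event_time": t0 + timedelta(seconds=20), "ingested_at": None},
    {"event_time": t0 + timedelta(seconds=30), "ingested_at": t0 + timedelta(seconds=33)},
]
metrics = ingestion_metrics(log, window_start=t0, now=t0 + timedelta(seconds=60))
print(metrics)
```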
Alerts on Pipeline Failures
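A pipeline alert can be as simple as comparing health metrics against thresholds. The sketch below is a generic illustration rather than Fiddler's alerting API; the `evaluate_alerts` helper, the metric names, and the threshold values are all assumptions.

```python
def evaluate_alerts(metrics: dict, thresholds: dict) -> list[str]:
    """Return one message per metric that is missing or above its threshold."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A metric with no data usually means the pipeline itself is down.
            alerts.append(f"{name}: no data")
        elif value > limit:
            alerts.append(f"{name}: {value} exceeds {limit}")
    return alerts


# Hypothetical snapshot and limits.
current = {"error_rate": 0.25, "freshness_s": 27.0, "avg_latency_s": None}
limits = {"error_rate": 0.05, "freshness_s": 300, "avg_latency_s": 10}

for message in evaluate_alerts(current, limits):
    print("ALERT", message)
```

Treating a missing metric as an alert condition is a deliberate choice here: a stalled pipeline produces no measurements, so silence should not be read as health.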
Troubleshooting
Common Issues
Connection Timeouts:
- Check network connectivity and firewall rules
- Verify credentials are current and have proper permissions
- Ensure data source is reachable from Fiddler’s network
Schema Validation Errors:
- Validate data types match Fiddler’s expected schema
- Check for null values in required fields
- Ensure timestamp fields use supported formats (ISO 8601)
Slow Ingestion:
- For Kafka: Check consumer lag and partition count
- For Data Warehouses: Optimize queries, add indexes
- For S3: Use Parquet or ORC instead of CSV
Data Quality Issues:
- Enable data validation rules before ingestion
- Set up alerts for out-of-range values
- Configure automatic PII redaction
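For the Kafka consumer-lag check mentioned above, lag per partition is the latest offset in the log minus the consumer group's last committed offset. The sketch below computes it from two plain dictionaries; how those offsets are fetched depends on the Kafka client in use and is not shown, and the `consumer_lag` helper is hypothetical.

```python
def consumer_lag(end_offsets: dict, committed_offsets: dict) -> dict:
    """Return per-partition lag: latest log offset minus last committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition with no commit yet is read from offset 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end - committed, 0)
    return lag


# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 1200, 2: 900}
committed_offsets = {0: 1500, 1: 700}

lag = consumer_lag(end_offsets, committed_offsets)
print(lag, "total:", sum(lag.values()))
```

Lag that keeps growing on every partition points at insufficient consumer capacity, while lag concentrated on one partition more often indicates a skewed partition key.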
Related Integrations
- Cloud Platforms - Deploy Fiddler on AWS, Azure, GCP
- ML Platforms - Integrate with Databricks, MLflow
- Agentic AI - Monitor LangGraph and Strands Agents
- Monitoring & Alerting - Send alerts to incident management tools