Machine learning task types supported by Fiddler. This enum defines the different types of ML tasks that Fiddler can monitor. The task type determines which metrics are calculated, how performance is measured, and what monitoring capabilities are available.

Task-Specific Features:
- Classification: Accuracy, precision, recall, F1, AUC, confusion matrix
- Regression: MAE, MSE, RMSE, R², residual analysis
- Ranking: NDCG, MAP, precision@k, ranking-specific metrics
- LLM: Token-based metrics, response quality, safety metrics
Examples
Each model is configured for a single task type. The task type cannot be changed after model creation, so choose carefully based on your model’s primary objective and output format.
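As a minimal sketch of how a task type selects a metric family, the snippet below defines a local stand-in enum with the five values documented on this page. It is not imported from the Fiddler client. `TASK_METRICS` and `metrics_for` are hypothetical helpers, built only from the Task-Specific Features list above.

```python
from enum import Enum


class ModelTask(str, Enum):
    """Local stand-in mirroring the documented values (not the Fiddler client class)."""

    BINARY_CLASSIFICATION = "binary_classification"
    MULTICLASS_CLASSIFICATION = "multiclass_classification"
    REGRESSION = "regression"
    RANKING = "ranking"
    LLM = "llm"


# Hypothetical lookup table; entries are copied from the feature list on this page.
_CLASSIFICATION = ["Accuracy", "Precision", "Recall", "F1", "AUC", "Confusion matrix"]
TASK_METRICS = {
    ModelTask.BINARY_CLASSIFICATION: _CLASSIFICATION,
    ModelTask.MULTICLASS_CLASSIFICATION: _CLASSIFICATION,
    ModelTask.REGRESSION: ["MAE", "MSE", "RMSE", "R2", "Residual analysis"],
    ModelTask.RANKING: ["NDCG", "MAP", "Precision@k"],
    ModelTask.LLM: ["Token-based metrics", "Response quality", "Safety metrics"],
}


def metrics_for(task):
    """Return the metric family for a task given as an enum member or its string value."""
    return TASK_METRICS[ModelTask(task)]


print(metrics_for("regression"))
print(metrics_for(ModelTask.RANKING))
```

Passing an unknown string such as `"clustering"` raises `ValueError` from the enum lookup, which is one way to catch a mistyped task type early.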
BINARY_CLASSIFICATION = 'binary_classification'
Two-class classification tasks. Used for models that predict one of two possible outcomes or classes. Enables binary classification metrics and threshold-based analysis.
Available metrics:
- Accuracy, Precision, Recall, F1-score
- AUC-ROC, AUC-PR curves
- Confusion matrix analysis
- Threshold optimization tools
Common use cases:
- Fraud detection (fraud/legitimate)
- Email spam filtering (spam/ham)
- Medical diagnosis (positive/negative)
- Credit approval (approve/deny)
- Churn prediction (churn/retain)
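To make the threshold-based analysis concrete, here is a minimal plain-Python sketch that computes confusion-matrix counts and the derived metrics at a given decision threshold. The function name `binary_metrics`, the sample data, and the convention that a score at or above the threshold counts as positive are all assumptions for illustration, not Fiddler's implementation.

```python
def binary_metrics(y_true, scores, threshold=0.5):
    """Confusion-matrix counts and derived metrics at one decision threshold.

    Assumption: a score >= threshold is predicted as the positive class (label 1).
    """
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1

    total = tp + fp + fn + tn
    # Fall back to 0.0 when a denominator is zero (no predicted or actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1,
    }


# Illustrative data only: 1 = positive class (e.g. fraud), 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7]

for threshold in (0.25, 0.5, 0.75):
    m = binary_metrics(y_true, scores, threshold)
    print(threshold, round(m["precision"], 3), round(m["recall"], 3), round(m["f1"], 3))
```

On this toy data, lowering the threshold from 0.75 to 0.25 raises recall from 1/3 to 1.0 while precision falls from 1.0 to 0.6, which is the trade-off that threshold optimization explores.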
MULTICLASS_CLASSIFICATION = 'multiclass_classification'
Multi-class classification tasks. Used for models that predict one of multiple possible classes or categories. Supports comprehensive multiclass performance analysis and class-specific metrics.
Available metrics:
- Per-class precision, recall, F1-score
- Macro and micro-averaged metrics
- Confusion matrix with multiple classes
- Class distribution analysis
Common use cases:
- Document categorization (multiple topics)
- Image classification (multiple objects)
- Sentiment analysis (positive/neutral/negative)
- Product categorization
- Intent classification in chatbots
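The difference between macro and micro averaging is easiest to see on a small example. The sketch below uses plain Python with hypothetical helper names and made-up labels. It assumes single-label predictions and scores a class with no true positives as F1 = 0.

```python
from collections import Counter


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Return (per-class F1 dict, macro-averaged F1, micro-averaged F1)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    per_class = {c: f1_from_counts(tp[c], fp[c], fn[c]) for c in labels}
    # Macro: unweighted mean of per-class scores, so rare classes count equally.
    macro = sum(per_class.values()) / len(labels)
    # Micro: pool the counts across classes first, so frequent classes dominate.
    micro = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_class, macro, micro


# Illustrative sentiment labels only.
y_true = ["pos", "pos", "pos", "pos", "neu", "neg"]
y_pred = ["pos", "pos", "pos", "neu", "neu", "pos"]

per_class, macro, micro = macro_micro_f1(y_true, y_pred)
print(per_class)
print(round(macro, 4), round(micro, 4))
```

Here the single "neg" example is misclassified, which pulls the macro average well below the micro average. That gap is the reason to inspect per-class metrics alongside the aggregates.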
REGRESSION = 'regression'
Continuous value prediction tasks. Used for models that predict numerical values on a continuous scale. Enables regression-specific metrics and residual analysis.
Available metrics:
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- R-squared (coefficient of determination)
- Residual distribution analysis
Common use cases:
- Price prediction
- Sales forecasting
- Risk scoring (continuous scores)
- Demand forecasting
- Performance rating prediction
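The regression metrics listed above follow directly from the residuals. The sketch below uses the standard textbook definitions in plain Python. The function name and sample values are illustrative, and the residual sign convention (actual minus predicted) is an assumption.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R-squared from paired actual and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]  # actual minus predicted

    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)

    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R-squared is undefined when all actual values are identical.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2, "residuals": residuals}


# Illustrative values only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

metrics = regression_metrics(y_true, y_pred)
print(metrics["mae"], metrics["mse"], metrics["rmse"], round(metrics["r2"], 4))
```

RMSE is in the same units as the target, which makes it easier to interpret than MSE. MAE is less sensitive to a few large residuals than either.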
RANKING = 'ranking'
Ranking and recommendation tasks. Used for models that rank items or provide ordered recommendations. Supports ranking-specific metrics and list-wise evaluation.
Available metrics:
- Normalized Discounted Cumulative Gain (NDCG)
- Mean Average Precision (MAP)
- Precision@K, Recall@K
- Mean Reciprocal Rank (MRR)
- Hit Rate analysis
Common use cases:
- Search result ranking
- Product recommendations
- Content recommendation systems
- Information retrieval
- Personalized ranking
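As an illustration of list-wise evaluation, the sketch below computes NDCG@K and MRR in plain Python. It assumes the linear-gain form of DCG (relevance divided by log2 of rank + 1), whereas some implementations use an exponential gain of 2^rel - 1. It also treats any relevance above zero as relevant for MRR. Function names and data are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items, using linear gain."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending relevance) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def mean_reciprocal_rank(result_lists):
    """Average of 1/rank of the first relevant item; a list with none contributes 0."""
    total = 0.0
    for relevances in result_lists:
        for rank, rel in enumerate(relevances, start=1):
            if rel > 0:
                total += 1.0 / rank
                break
    return total / len(result_lists)


# Relevance grades in the order the model ranked the items (illustrative only).
ranked_relevances = [3, 2, 0, 1]
print(round(ndcg_at_k(ranked_relevances, k=4), 4))

# Three queries: first relevant hit at rank 2, at rank 1, and no hit at all.
print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0], [0, 0, 0]]))
```

NDCG rewards placing highly relevant items near the top of the list, while MRR only looks at the position of the first relevant item.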
LLM = 'llm'
Large language model and generative AI tasks. Used for language models, chatbots, and generative AI applications. Enables LLM-specific monitoring including safety, quality, and performance metrics.
Available metrics:
- Response quality metrics
- Safety and toxicity detection
- Hallucination detection
- Token-based analysis
- Latency and throughput metrics
Common use cases:
- Chatbots and conversational AI
- Text generation models
- Question-answering systems
- Code generation models
- Content creation assistants
Monitoring capabilities:
- Guardrails integration
- Safety monitoring
- Prompt and response analysis
- Token usage tracking