0.3
- New Evaluators
- Context Relevance (New): Measures whether retrieved documents are relevant to the user query. Ordinal scoring — High (1.0), Medium (0.5), Low (0.0) with detailed reasoning.
- RAG Faithfulness (New): LLM-as-a-Judge evaluator that assesses whether the response is grounded in the retrieved documents. Binary scoring — Yes (1.0) / No (0.0) with detailed reasoning.
- CustomJudge (New): Build custom LLM-as-a-Judge evaluators using `prompt_template` with Jinja `{{ placeholder }}` syntax and `output_fields` for structured evaluation results.
- Enhancements
- Answer Relevance 2.0: Upgraded from binary to ordinal scoring — High (1.0), Medium (0.5), Low (0.0) with detailed reasoning.
- Ordinal Score Bounding: Ordinal scores from the scoring API are now bounded to [0, 1].
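The label-to-score scales listed above, and the bounding of scores to [0, 1], can be illustrated with a small standalone sketch. This is not the SDK's implementation: the names `ORDINAL_SCORES`, `BINARY_SCORES`, `label_to_score`, and `bound_score` are hypothetical, and clamping is only one plausible reading of "bounded".

```python
# Hypothetical illustration of the scoring scales named in these notes.
# None of these names come from the fiddler-evals SDK.
ORDINAL_SCORES = {"High": 1.0, "Medium": 0.5, "Low": 0.0}
BINARY_SCORES = {"Yes": 1.0, "No": 0.0}


def label_to_score(label, scale):
    """Map a judge label to its numeric score; unknown labels raise KeyError."""
    return scale[label]


def bound_score(value, lower=0.0, upper=1.0):
    """Clamp a raw score into [lower, upper] (assumed bounding behavior)."""
    return max(lower, min(upper, value))


medium = label_to_score("Medium", ORDINAL_SCORES)  # ordinal scale
grounded = label_to_score("Yes", BINARY_SCORES)    # binary scale
clamped_high = bound_score(1.3)                    # above the range
clamped_low = bound_score(-0.2)                    # below the range
```

Under these assumptions, a raw value outside the range is pulled to the nearest bound instead of being rejected.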
- Enhancements
- Model and Credential Parameters: `model` and `credential` are now parameters on LLM-as-a-Judge evaluators, enabling configuration of the LLM used for evaluation.
- Evaluator-Level Score Function Mapping: Evaluators now support `score_fn_kwargs_mapping` at the evaluator level for more flexible parameter binding.
- Score Name Prefix: Added support for custom score name prefixes on evaluators.
- Evals API Error Handling: Improved error handling and messaging for Evals API responses.
- Coherence Prompt Input Required: The `prompt` input for the Coherence evaluator is now required.
- Removed Pandas Core Dependency: Pandas is now an optional dependency rather than a core one, reducing the install footprint.
- Docstring Standardization: Fixed docstring errors and standardized documentation format across all evaluators.
- Removals
- Toxicity Evaluator Removed: The Toxicity evaluator has been removed from the SDK.
- Initial Release
- Core SDK with HTTP client, entity management (Project, Application, Dataset, Experiment), and the `evaluate()` function for running experiments.
- Evaluators: AnswerRelevance, Coherence, Conciseness, Sentiment, TopicClassification, FTLPromptSafety, FTLResponseFaithfulness, RegexSearch, and support for user-defined function evaluators.
- Data Input: Load test cases from pandas DataFrames, CSV files, or JSONL files.
- Concurrent Processing: Parallel evaluation with ThreadPoolExecutor and tqdm progress tracking.
- PyPI Publishing: Available via `pip install fiddler-evals`.
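As a rough illustration of the data-input and concurrent-processing items above, the following standard-library sketch loads test cases from JSONL and scores them in parallel with a plain function evaluator. It does not use the SDK: `load_jsonl`, `contains_order_id`, `run_parallel`, and the sample data are all hypothetical, and the tqdm progress bar is left out to keep the example dependency-free.

```python
import io
import json
import re
from concurrent.futures import ThreadPoolExecutor


def load_jsonl(stream):
    """Parse one JSON object per non-empty line of a text stream."""
    return [json.loads(line) for line in stream if line.strip()]


def contains_order_id(test_case):
    """Hypothetical function evaluator: 1.0 if the response cites an order ID."""
    return 1.0 if re.search(r"\bORD-\d+\b", test_case["response"]) else 0.0


def run_parallel(test_cases, evaluator, max_workers=4):
    """Score every test case on a thread pool; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluator, test_cases))


# Made-up sample data standing in for a JSONL file on disk.
SAMPLE_JSONL = '''{"prompt": "Where is my order?", "response": "ORD-1042 ships tomorrow."}
{"prompt": "Cancel it.", "response": "Sure, which order?"}
'''

test_cases = load_jsonl(io.StringIO(SAMPLE_JSONL))
scores = run_parallel(test_cases, contains_order_id)
print(scores)
```

Threads fit this pattern when each evaluation mostly waits on I/O, such as an HTTP call to a scoring service. CPU-bound evaluators would likely need a process pool instead.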