Abstract base class for creating custom evaluators in Fiddler Evals.

The Evaluator class provides a flexible framework for creating built-in and custom evaluators that assess LLM outputs against various criteria. Each evaluator is responsible for a single, specific evaluation task (e.g., hallucination detection, answer relevance, or exact match).

Parameter Mapping: Evaluators can define their own parameter mappings by passing score_fn_kwargs_mapping to the constructor. These mappings specify how data from the evaluation context (inputs, outputs, expected_outputs) is passed to the evaluator’s score method.

Mapping Priority (highest to lowest):
- Evaluator-level score_fn_kwargs_mapping (set in constructor)
- Evaluation-level kwargs_mapping (passed to evaluate function)
- Default parameter resolution
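
The priority order above can be sketched as a small resolver. This is an illustration only, not the library's implementation: `resolve_kwargs` is a hypothetical helper, the evaluation context is simplified to a flat dictionary, and the default step (looking up a context key with the same name as the parameter) is an assumption, since this page does not say how default resolution works.

```python
from typing import Any, Callable, Dict, List, Optional, Union

# A mapping value is either a context key or a function of the whole context.
Mapping = Dict[str, Union[str, Callable[[Dict[str, Any]], Any]]]


def resolve_kwargs(
    param_names: List[str],
    context: Dict[str, Any],
    evaluator_mapping: Optional[Mapping] = None,
    evaluation_mapping: Optional[Mapping] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build score() kwargs in priority order."""
    evaluator_mapping = evaluator_mapping or {}
    evaluation_mapping = evaluation_mapping or {}
    resolved = {}
    for name in param_names:
        if name in evaluator_mapping:      # 1. evaluator-level mapping wins
            source = evaluator_mapping[name]
        elif name in evaluation_mapping:   # 2. then the evaluation-level mapping
            source = evaluation_mapping[name]
        else:                              # 3. default (assumed: same-named key)
            source = name
        resolved[name] = source(context) if callable(source) else context[source]
    return resolved


context = {"question": "2+2?", "answer": "4", "response": "four", "output": "IV"}
kwargs = resolve_kwargs(
    ["input", "output"],
    context,
    evaluator_mapping={"output": "answer"},
    evaluation_mapping={"output": "response", "input": lambda ctx: ctx["question"]},
)
print(kwargs)
```

Here `output` is taken from the evaluator-level mapping even though the evaluation-level mapping also names it, while `input` falls through to the evaluation-level mapping.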
To create a custom evaluator, subclass Evaluator and implement the score method with parameters specific to your evaluation needs.
Example - custom evaluator with parameter mapping:
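
The original example is not present in this section, so the following is a minimal, self-contained sketch rather than the library's own code. `Evaluator` and `Score` are local stand-ins that mimic the behavior described on this page, `ContainsKeyword` is a hypothetical evaluator, and the mapping is applied by hand at the end; in the real framework the import paths and the mapping step may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Score:  # stand-in with a subset of the documented fields
    name: str
    evaluator_name: str
    value: float
    reasoning: Optional[str] = None


class Evaluator(ABC):  # stand-in for the base class described here
    def __init__(self, score_name_prefix=None, score_fn_kwargs_mapping=None):
        self.score_name_prefix = score_name_prefix
        self.score_fn_kwargs_mapping = score_fn_kwargs_mapping or {}

    @abstractmethod
    def score(self, *args, **kwargs): ...


class ContainsKeyword(Evaluator):
    """Hypothetical evaluator: checks that a keyword appears in the output."""

    def score(self, output: str, keyword: str) -> Score:
        found = keyword.lower() in output.lower()
        return Score(
            name="contains_keyword",
            evaluator_name="ContainsKeyword",
            value=1.0 if found else 0.0,
            reasoning=f"keyword {keyword!r} {'found' if found else 'not found'}",
        )


evaluator = ContainsKeyword(
    score_fn_kwargs_mapping={
        "output": "answer",                                 # plain context key
        "keyword": lambda ctx: ctx["expected"]["keyword"],  # callable over the context
    }
)

# Apply the mapping by hand; the framework is expected to do this step itself.
context = {"answer": "Paris is the capital.", "expected": {"keyword": "paris"}}
kwargs = {
    param: source(context) if callable(source) else context[source]
    for param, source in evaluator.score_fn_kwargs_mapping.items()
}
result = evaluator.score(**kwargs)
print(result.value, result.reasoning)
```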
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
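| score_name_prefix | `str \| None` | ✗ | None | Optional prefix for score names. |
| score_fn_kwargs_mapping | `ScoreFnKwargsMappingType \| None` | ✗ | None | Specifies how data from the evaluation context is passed to the score method. |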
The score method signature intentionally uses `*args` and `**kwargs` so that each evaluator can define its own parameter requirements while keeping a consistent interface across all evaluators in the framework.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
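| score_name_prefix | `str \| None` | ✗ | None | Optional prefix for score names. |
| score_fn_kwargs_mapping | `Dict[str, str \| Callable[[Dict[str, Any]], Any]] \| None` | ✗ | None | Specifies how data from the evaluation context is passed to the score method. Keys must match parameter names in the score method signature. |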
Example
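
The original example is not present in this section. As a hedged sketch of the check described under Raises, the stand-in base class below rejects mapping keys that are not parameters of the concrete score method. `ScoreFunctionInvalidArgs` is redefined locally as a plain exception, and using `inspect.signature` for the check is an assumption about how such validation could be done, not a statement about the library's internals. The score method returns a bare float here to keep the sketch short.

```python
import inspect
from abc import ABC, abstractmethod


class ScoreFunctionInvalidArgs(Exception):
    """Local stand-in for the exception named in this reference."""


class Evaluator(ABC):  # stand-in base class, not the library's code
    def __init__(self, score_name_prefix=None, score_fn_kwargs_mapping=None):
        mapping = score_fn_kwargs_mapping or {}
        # Parameter names accepted by the concrete score method (self is already bound).
        allowed = set(inspect.signature(self.score).parameters)
        unknown = sorted(set(mapping) - allowed)
        if unknown:
            raise ScoreFunctionInvalidArgs(f"Unknown score parameters: {unknown}")
        self.score_name_prefix = score_name_prefix
        self.score_fn_kwargs_mapping = mapping

    @abstractmethod
    def score(self, *args, **kwargs): ...


class ExactMatch(Evaluator):
    def score(self, output: str, expected_output: str) -> float:
        return 1.0 if output == expected_output else 0.0


# Valid: "expected_output" is a parameter of ExactMatch.score.
ok = ExactMatch(score_fn_kwargs_mapping={"expected_output": "reference"})

try:
    # Invalid: "expected" is not a parameter of ExactMatch.score.
    ExactMatch(score_fn_kwargs_mapping={"expected": "reference"})
    rejected = False
except ScoreFunctionInvalidArgs as exc:
    rejected = True
    print(exc)
```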
Raises
ScoreFunctionInvalidArgs — If the mapping contains invalid parameter names that don’t match the evaluator’s score method signature.
Return type: None
property name : str
abstractmethod score(*args, **kwargs)
Evaluate inputs and return a score or list of scores. This method must be implemented by all concrete evaluator classes. Each evaluator can define its own parameter signature based on what it needs for evaluation.
Common parameter patterns:
- Output-only: score(self, output: str) -> Score
- Input-Output: score(self, input: str, output: str) -> Score
- Comparison: score(self, output: str, expected_output: str) -> Score
- All parameters: score(self, input: str, output: str, context: list[str]) -> Score
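
Three of the patterns listed above might look like this in practice. The sketch uses local stand-ins for `Evaluator` and `Score` (with only two fields), and the evaluator names and toy metrics are hypothetical, chosen only to show how the signatures differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Score:  # stand-in with reduced fields
    name: str
    value: float


class Evaluator(ABC):  # stand-in base class
    @abstractmethod
    def score(self, *args, **kwargs): ...


class NonEmpty(Evaluator):
    # Output-only pattern
    def score(self, output: str) -> Score:
        return Score("non_empty", 1.0 if output.strip() else 0.0)


class ExactMatch(Evaluator):
    # Comparison pattern
    def score(self, output: str, expected_output: str) -> Score:
        return Score("exact_match", 1.0 if output == expected_output else 0.0)


class ContextOverlap(Evaluator):
    # All-parameters pattern; input is accepted but unused by this toy metric
    def score(self, input: str, output: str, context: List[str]) -> Score:
        words = set(output.lower().split())
        context_words = set(" ".join(context).lower().split())
        overlap = len(words & context_words) / len(words) if words else 0.0
        return Score("context_overlap", overlap)


print(NonEmpty().score(output="hello"))
print(ExactMatch().score(output="4", expected_output="4"))
print(ContextOverlap().score(input="q", output="blue sky", context=["the sky is blue"]))
```

Each subclass narrows the open `*args, **kwargs` signature to exactly the parameters it needs, which is what lets a parameter mapping target them by name.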
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `*args` | Any | ✗ | None | Positional arguments specific to the evaluator’s needs. |
| `**kwargs` | Any | ✗ | None | Keyword arguments specific to the evaluator’s needs. |
Returns
A single Score object or list of Score objects representing the evaluation results. Each Score should include:
- name: The score name (e.g., “has_zipcode”)
- evaluator_name: The evaluator name (e.g., “RegexMatch”)
- value: The score value (typically 0.0 to 1.0)
- status: SUCCESS, FAILED, or SKIPPED
- reasoning: Optional explanation of the score
- error: Optional error information if evaluation failed
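
The fields listed above can be pictured with a small dataclass. This is a stand-in, not the library's `Score` class: the enum name `ScoreStatus`, the field defaults, and the choice to leave `value` empty for skipped scores are all assumptions. The hypothetical `regex_scores` function shows the "list of Score objects" return shape.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ScoreStatus(Enum):  # hypothetical name for the documented status values
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    SKIPPED = "SKIPPED"


@dataclass
class Score:  # stand-in mirroring the fields listed above
    name: str
    evaluator_name: str
    value: Optional[float] = None
    status: ScoreStatus = ScoreStatus.SUCCESS
    reasoning: Optional[str] = None
    error: Optional[str] = None


def regex_scores(output: Optional[str]) -> List[Score]:
    """Return one score per pattern, or SKIPPED scores when there is no output."""
    patterns = {"has_zipcode": r"\b\d{5}\b", "has_digit": r"\d"}
    scores = []
    for name, pattern in patterns.items():
        if output is None:
            scores.append(Score(name, "RegexMatch", status=ScoreStatus.SKIPPED,
                                reasoning="no output to evaluate"))
            continue
        matched = re.search(pattern, output) is not None
        outcome = "matched" if matched else "did not match"
        scores.append(Score(name, "RegexMatch", value=1.0 if matched else 0.0,
                            reasoning=f"pattern {pattern!r} {outcome}"))
    return scores


for s in regex_scores("Ship to 94107"):
    print(s.name, s.value, s.status.value)
```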
Raises
- ValueError — If required parameters are missing or invalid.
- TypeError — If parameters have incorrect types.
- Exception — Any other evaluation-specific errors.
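
As a rough illustration of the first two error cases above, a score method might validate its arguments before computing anything. `LengthRatio` is a hypothetical evaluator written without the base class to keep the sketch short, and it returns a bare float rather than a Score. How the framework handles these exceptions once raised is not described in this section.

```python
class LengthRatio:
    """Hypothetical evaluator showing argument validation inside score."""

    def score(self, output, expected_output):
        # Missing values: ValueError
        if output is None or expected_output is None:
            raise ValueError("output and expected_output are required")
        # Wrong types: TypeError
        if not isinstance(output, str) or not isinstance(expected_output, str):
            raise TypeError("output and expected_output must be strings")
        # Invalid value: ValueError
        if not expected_output:
            raise ValueError("expected_output must not be empty")
        shorter = min(len(output), len(expected_output))
        longer = max(len(output), len(expected_output))
        return shorter / longer


evaluator = LengthRatio()
print(evaluator.score("ab", "abcd"))

for bad_args in [(None, "abcd"), (42, "abcd"), ("ab", "")]:
    try:
        evaluator.score(*bad_args)
    except (ValueError, TypeError) as exc:
        print(type(exc).__name__, exc)
```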