Registering and Using Custom Metrics
Registered Scorers let your team define custom metrics (programmatic or GPT-based) for your Monitor projects.
To define a registered scorer, create a Python file containing at least two functions with the signatures described below:
scorer_fn: The scorer function receives the row-wise inputs and is expected to generate an output for each response. The expected signature for this function is:

```python
def scorer_fn(*, index: Union[int, str], response: str, **kwargs: Any) -> Union[float, int, bool, str, None]:
    ...
```

We support outputs of floating-point, integer, boolean, and string values. We also recommend accepting **kwargs so that your registered scorers are forward-compatible.
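As a concrete illustration, here is a minimal scorer-file sketch that follows this signature. The word-count metric itself is just a hypothetical example, not a built-in:

```python
from typing import Any, Union


def scorer_fn(*, index: Union[int, str], response: str, **kwargs: Any) -> Union[float, int, bool, str, None]:
    # Hypothetical metric: score each response by its word count.
    # **kwargs absorbs any additional row-wise fields, keeping the
    # scorer forward-compatible with new arguments.
    return len(response.split())
```

For example, `scorer_fn(index=0, response="hello world")` returns 2.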
aggregator_fn: The aggregator function takes the array of row-wise outputs from your scorer and generates aggregate values from them. The expected signature for the aggregator function is:

```python
def aggregator_fn(*, scores: List[Union[float, int, bool, str, None]]) -> Dict[str, Union[float, int, bool, str, None]]:
    ...
```

Return the aggregated values you want to output as key-value pairs, with the key as the label and the value as the aggregate.
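To pair with the scorer above, here is a matching aggregator sketch; the "Average" and "Max" labels are hypothetical choices, not required names:

```python
from typing import Dict, List, Union


def aggregator_fn(*, scores: List[Union[float, int, bool, str, None]]) -> Dict[str, Union[float, int, bool, str, None]]:
    # Keep only numeric scores; skip None and non-numeric outputs.
    # (bool is excluded explicitly because it subclasses int.)
    numeric = [s for s in scores if isinstance(s, (int, float)) and not isinstance(s, bool)]
    if not numeric:
        return {"Average": None, "Max": None}
    # Each key becomes the label under which the aggregate is reported.
    return {"Average": sum(numeric) / len(numeric), "Max": max(numeric)}
```

For example, `aggregator_fn(scores=[2, 4, None, 6])` returns `{"Average": 4.0, "Max": 6}`.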
Once your scorer file is ready, register it with promptquality:

```python
import promptquality as pq

registered_scorer = pq.register_scorer(scorer_name="My Scorer", scorer_file="/path/to/scorer/file.py")
```
Your scorer will be executed in a Python 3.9 environment. The Python libraries available for your use are:
If you are using an ML model to make predictions, please ensure it is <= 500MB in size and uses a supported framework such as tensorflow. If it is a larger model, we recommend optimizing it with the ONNX Runtime.
Please note that we regularly update the minor and patch versions of these packages. Major version updates are infrequent, but if a library is critical to your scorer, please let us know and we'll provide at least one week's notice before updating its major version.
The name you choose here is the name under which this scorer's values will appear in the UI.
All your Registered Scorers are shown under the Custom Metrics section of your Project Settings, where the On/Off toggle enables or disables each one.
When a metric is on, your registered scorer executes on new samples logged to Galileo Monitor (note: scorers don't run retroactively, so past samples will not be scored). Each added scorer appears as a new column in your Data view.