Choosing your Guardrail Metrics

How to choose and understand your guardrail metrics

Galileo Metrics

Galileo has built a menu of Guardrail Metrics for you to choose from. These metrics are tailored to your use case and are designed to help you evaluate your prompts and models.

Galileo's Guardrail Metrics combine industry-standard metrics (e.g. BLEU, ROUGE-1, Perplexity) with metrics developed by Galileo's in-house ML Research Team (e.g. Uncertainty, Correctness, Context Adherence).

Here's a list of the metrics supported today:

Output Quality Metrics:

  • Uncertainty - Measures the model's certainty in its generated responses. Uncertainty works at the response level as well as at the token level. It correlates strongly with hallucinations such as made-up facts, names, or citations.

  • Correctness - Measures whether the facts stated in the response are factually accurate. This metric requires additional LLM calls. Combined with Uncertainty, Correctness is a good way of uncovering hallucinations.

  • BLEU & ROUGE-1 - These metrics measure n-gram similarities between your Generated Responses and your Target output. These metrics require a {target} column in your dataset.

  • Prompt Perplexity - Measures the perplexity of a prompt. Previous research has shown that as perplexity decreases, generations tend to increase in quality.
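
To make the n-gram and perplexity definitions above concrete, here is a minimal sketch in Python. It is not Galileo's implementation: the function names are hypothetical, tokenization is a plain whitespace split, ROUGE-1 is shown as unigram recall only, and perplexity assumes you already have natural-log probabilities for each token.

```python
import math
from collections import Counter


def rouge1_recall(generated: str, target: str) -> float:
    """Fraction of target unigrams that also appear in the generated text.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    gen_counts = Counter(generated.lower().split())
    tgt_counts = Counter(target.lower().split())
    if not tgt_counts:
        return 0.0
    # Clip each target token's count by how often it occurs in the generation.
    overlap = sum(min(count, gen_counts[tok]) for tok, count in tgt_counts.items())
    return overlap / sum(tgt_counts.values())


def perplexity(token_logprobs: list[float]) -> float:
    """Exponential of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```

A lower perplexity means the model found the prompt's tokens more predictable on average. A ROUGE-1 recall closer to 1 means more of the target's words appear in the generated response.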

RAG Quality Metrics:

  • Context Adherence - Measures whether your model's response was purely based on the context provided. This metric is intended for RAG users. We have two options for this metric: Basic and Plus.

    • Context Adherence Basic is powered by small language models we've trained. It's free of cost.

    • Context Adherence Plus includes an explanation or rationale for the rating. These metrics and the explanations are powered by an LLM (e.g. OpenAI GPT3.5) and thus incur additional costs. Plus has been shown to perform better than Basic.

  • Completeness - Measures how thoroughly your model's response covered relevant information from the context provided. This metric is intended for RAG users.

    • Completeness Basic is powered by small language models we've trained. It's free of cost.

    • Completeness Plus includes an explanation or rationale for the rating. These metrics and the explanations are powered by an LLM (e.g. OpenAI GPT3.5) and thus incur additional costs. Plus has been shown to perform better than Basic.

  • Chunk Attribution - Measures which individual chunks retrieved in a RAG workflow influenced your model's response. This metric is intended for RAG users. This metric is computed by prompting an LLM, and thus requires additional LLM calls to compute.

    • Chunk Attribution Basic is powered by small language models we've trained. It's free of cost.

    • Chunk Attribution Plus is powered by an LLM (e.g. OpenAI GPT3.5) and thus incurs additional costs. Plus has been shown to perform better than Basic.

  • Chunk Utilization - For each chunk retrieved in a RAG workflow, measures the fraction of the chunk text that influenced your model's response. This metric is intended for RAG users.

    • Chunk Utilization Basic is powered by small language models we've trained. It's free of cost.

    • Chunk Utilization Plus is powered by an LLM (e.g. OpenAI GPT3.5) and thus incurs additional costs. Plus has been shown to perform better than Basic.

  • Context Relevance - Measures how relevant the context provided was to the user query. This metric is intended for RAG users. This metric requires {context} and {query} slots in your data, as well as embeddings for them (i.e. {context_embedding}, {query_embedding}).
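
As a rough illustration of how Chunk Attribution and Chunk Utilization relate, the sketch below computes both from character spans. Everything here is an assumption made for illustration: the `Chunk` class, the `used_spans` field, and the rule that a chunk counts as attributed when any of its text was used. In practice the judgment of which text influenced the response comes from the Basic or Plus models described above, not from code like this.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A retrieved chunk plus hypothetical (start, end) character spans
    marking the text that influenced the response."""
    text: str
    used_spans: list[tuple[int, int]] = field(default_factory=list)


def chunk_utilization(chunk: Chunk) -> float:
    """Fraction of the chunk's characters covered by at least one used span."""
    if not chunk.text:
        return 0.0
    covered = 0
    last_end = 0
    for start, end in sorted(chunk.used_spans):
        # Skip any part already counted and clip to the chunk length.
        start = max(start, last_end)
        end = min(end, len(chunk.text))
        if end > start:
            covered += end - start
            last_end = end
    return covered / len(chunk.text)


def chunk_attributed(chunk: Chunk) -> bool:
    """Simplifying assumption: a chunk is attributed if any of it was used."""
    return chunk_utilization(chunk) > 0.0
```

Under this simplification, a chunk can be attributed while having low utilization, which suggests the chunk is longer than it needs to be.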

Safety Metrics:

  • Private Identifiable Information - This Guardrail Metric surfaces any instances of PII in your model's responses. We surface whether your text contains any credit card numbers, social security numbers, phone numbers, street addresses, and email addresses.

  • Toxicity - Measures whether the model's responses contained any abusive, toxic, or foul language.

  • Tone - Classifies the tone of the response into 9 different emotion categories: neutral, joy, love, fear, surprise, sadness, anger, annoyance, and confusion.

  • Sexism - Measures how sexist a response might be perceived to be, on a scale from 0 to 1 (1 being more sexist).

  • Prompt Injection - Detects and classifies various categories of prompt injection attacks.

  • More coming very soon.
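
To give a feel for the kind of output a PII check produces, here is a deliberately simple regex-based sketch. It is not how Galileo detects PII: the patterns, the `find_pii` name, and the US-style formats are all assumptions, and a production detector would need to handle many more formats and edge cases.

```python
import re

# Illustrative patterns only; real-world formats vary far more than this.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return the matches for each PII category found in the text."""
    found: dict[str, list[str]] = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found
```

Calling `find_pii` on a model response returns an empty dictionary when none of the patterns match, which makes it easy to flag only the responses that need review.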

A more thorough description of all Guardrail Metrics can be found here.

Find the aliases (constant names) for Galileo Guardrail Metrics here. You'll need the aliases to use them in code.

If you want to set up your custom metrics, please see instructions here.
