ChainPoll is a powerful, flexible technique for LLM-based evaluation built by Galileo’s Research team. It is used to power multiple Guardrail Metrics across the Galileo platform:

  • Context Adherence Plus

  • Chunk Attribution & Utilization

  • Completeness Plus

  • Correctness

ChainPoll combines a chain-of-thought prompting technique with prompting an LLM multiple times to calculate metric values. There are two levers you can customize for a ChainPoll metric:

  • The model that gets queried

  • The number of times we prompt that model

Generally, stronger models provide more accurate metric values, and a higher number of judges increases both the accuracy and the stability of those values. We’ve configured our ChainPoll-powered metrics to balance this cost–accuracy trade-off.
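To build intuition for why more judges stabilize the metric, here is a minimal sketch of the aggregation idea, assuming the metric value is simply the fraction of judges that answer "yes" (the function name and the exact aggregation are illustrative, not Galileo's implementation):

```python
def chainpoll_score(judge_verdicts: list[bool]) -> float:
    """Aggregate multiple chain-of-thought judge verdicts into one metric value.

    Each verdict is one LLM call's yes/no answer to the metric question
    (e.g. "does the response adhere to the context?"). The score is the
    fraction of affirmative votes, so more judges means less variance.
    """
    if not judge_verdicts:
        raise ValueError("need at least one judge verdict")
    return sum(judge_verdicts) / len(judge_verdicts)

# With 7 judges, 5 of whom judged the response adherent:
score = chainpoll_score([True, True, True, True, True, False, False])
```

Averaging over several independent chain-of-thought completions is what smooths out the run-to-run noise of a single LLM judgment.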

Changing the model or number of judges of a ChainPoll metric

We allow customizing the execution parameters of the AI-powered metrics from our Guardrail Store. By default, these metrics use gpt-3.5-turbo as the model and 3 judges. To customize this, pass the metrics as customized scorers when creating your run:


pq.run(
    ...,
    scorers=[
        pq.CustomizedChainPollScorer(
            scorer_name=pq.CustomizedScorerName.context_adherence_plus,
            model_alias=pq.Models.gpt_4o_mini,
            num_judges=7,
        )
    ],
)

The metrics that can be customized are:

  1. Chunk Attribution & Utilization: pq.CustomizedScorerName.chunk_attribution_utilization_plus

  2. Completeness Plus: pq.CustomizedScorerName.completeness_plus

  3. Context Adherence Plus: pq.CustomizedScorerName.context_adherence_plus

  4. Correctness: pq.CustomizedScorerName.correctness

These metrics can be used with any OpenAI or Azure model that supports the Chat Completions API, and can be set to use anywhere from 1 to 10 judges.
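Because an invalid judge count would otherwise only surface once the run is submitted, it can be worth checking the configuration up front. A minimal sketch of such a pre-flight check, mirroring the 1–10 constraint above (the helper is hypothetical, not part of the promptquality API):

```python
def validate_num_judges(num_judges: int) -> int:
    """Raise early if the requested judge count is outside the supported 1-10 range."""
    if not 1 <= num_judges <= 10:
        raise ValueError(f"num_judges must be between 1 and 10, got {num_judges}")
    return num_judges

validate_num_judges(7)  # a valid count passes through unchanged
```

Failing fast here is cheaper than discovering a misconfigured scorer after the run has already queried the model.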