Metrics

Modules

Want to try Galileo? Get in touch with us here!

Evaluate, Observe, and Protect your GenAI applications

What is Galileo?

ℹ️ These docs are for current Galileo customers. Docs for the free version of Galileo can be found [here](https://v2docs.galileo.ai/).

Galileo

Stop experimenting in spreadsheets and notebooks. Use Evaluate’s powerful insights to build GenAI systems that just work.

Overview of Galileo Evaluate

Monitor and analyze generative AI models with Galileo Observe, using real-time data insights to maintain performance and ensure quality outputs.

Overview of Galileo Observe

How to monitor your apps with Galileo Observe

Getting Started | Galileo Observe

Explore Galileo Protect to safeguard AI applications with customizable rulesets, error detection, and robust metrics for enhanced AI governance.

Overview of Galileo Protect

Get started with Galileo Protect using this quickstart guide, covering setup, ruleset creation, and integration into AI workflows for secure operations.

Quickstart Guide | Galileo Protect

Utilize Galileo's Guardrail Metrics to monitor generative AI models, ensuring adherence to quality, correctness, and alignment with project goals.

Overview of Galileo Guardrail Metrics

Fine-tune large language models with Galileo's LLM Fine-Tune tools, enabling precise adjustments for optimized AI performance and output quality.

Overview of Galileo LLM Fine-Tune

Galileo NLP Studio supports Natural Language Processing tasks across the lifecycle of your model development.

Training High-Quality Supervised NLP Models | Galileo

You have questions, we have (some) answers!

FAQs

Explore Galileo's client references, including Python and TypeScript integrations, to streamline Evaluate, Observe, and Protect module implementations.

Client References

API Reference

Healthcheck

Get Token

Create a new Evaluate run with workflows.

Use this endpoint to create a new Evaluate run with workflows. The request body should contain the `workflows` to be ingested and evaluated.

Additionally, specify the `project_id` or `project_name` to which the workflows should be ingested. If the project does not exist, it will be created. If the project exists, the workflows will be logged to it. If both `project_id` and `project_name` are provided, `project_id` will take precedence. The `run_name` is optional and will be auto-generated (timestamp-based) if not provided.

The body is also expected to include the configuration for the scorers to be used in the evaluation. This configuration will be used to evaluate the workflows and generate the results.
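The request body described above can be assembled on the client side. The following is a minimal Python sketch. It assumes a JSON body with the top-level `workflows`, `project_id`/`project_name`, and `run_name` keys named in this section. The `scorers` key and the shape of each workflow and scorer entry are hypothetical placeholders, so check the API reference for the actual schema. The sketch mirrors the documented rules: `project_id` takes precedence over `project_name`, and `run_name` is left out when not given so the server can auto-generate one.

```python
import json
from typing import Any, Optional


def build_evaluate_run_body(
    workflows: list[dict[str, Any]],
    scorers: list[dict[str, Any]],
    project_id: Optional[str] = None,
    project_name: Optional[str] = None,
    run_name: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a request body for "Create a new Evaluate Run".

    The `scorers` key and the workflow/scorer entry shapes are
    hypothetical; verify them against the API reference.
    """
    if project_id is None and project_name is None:
        raise ValueError("Specify project_id or project_name")

    body: dict[str, Any] = {"workflows": workflows, "scorers": scorers}
    # The docs state project_id takes precedence when both are given,
    # so this sketch sends only project_id in that case.
    if project_id is not None:
        body["project_id"] = project_id
    else:
        # The project is created server-side if it does not exist yet.
        body["project_name"] = project_name
    # run_name is optional; when omitted, a timestamp-based name is generated.
    if run_name is not None:
        body["run_name"] = run_name
    return body


# Usage with hypothetical workflow and scorer entries.
example = build_evaluate_run_body(
    workflows=[{"input": "What is 2 + 2?", "output": "4"}],  # hypothetical shape
    scorers=[{"name": "context_adherence"}],  # hypothetical scorer config
    project_name="my-eval-project",
)
print(json.dumps(example, indent=2))
```

Dropping `project_name` when `project_id` is present is a choice made by this sketch. Because the server already gives `project_id` precedence, sending both should have the same effect.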

Create a new Evaluate Run

Fetch evaluation results for a specific run including rows and aggregate information.

Get Evaluate Run Results

Log workflows to an Observe project.

Use this endpoint to log workflows to an Observe project. The request body should contain the `workflows` to be ingested.

Additionally, specify the `project_id` or `project_name` to which the workflows should be ingested. If the project does not exist, it will be created. If the project exists, the workflows will be logged to it. If both `project_id` and `project_name` are provided, `project_id` will take precedence.

Log Workflows to an Observe Project

Get workflows for a specific run in an Observe project.

Get Workflows

Learn how to use the 'Invoke Protect' API endpoint in Galileo's Protect module to process payloads with specified rulesets effectively.

Invoke Protect

Log Traces

Traces Available Columns

Spans Available Columns

Query Traces

Query Spans

Get Trace

Get Span

List Log Streams

Create Log Stream

Get Log Stream

Update Log Stream

Delete Log Stream

List Experiments

Create Experiment

Get Experiment

Update Experiment

Delete Experiment

Retrieves the column information available for experiments.

Experiments Available Columns

List Feedback Templates

Create Feedback Template V2

Get Feedback Template

Delete Feedback Template

Update Feedback Template

Reorder Feedback Templates

Get Feedback Rating V2

Create Feedback Rating V2

Delete Feedback Rating V2

Apply Bulk Feedback V2

When a Protect execution completes with the status specified in the configuration, the specified webhook is triggered with this payload.

Protect notification

Explore Galileo's practical examples covering real-world use cases and workflows for Evaluate, Observe, and Protect modules across AI projects.

Examples

Galileo AI Research

This page provides a brief overview of the research behind Galileo's RAG Quality Metrics.

RAG Quality Metrics Using Luna

ChainPoll is a powerful, flexible technique for LLM-based evaluation that is unique to Galileo. It is used to power multiple metrics across the Galileo platform.

ChainPoll

Learn how ChainPoll metrics assess retrieval-augmented generation (RAG) system quality, improving accuracy and performance of generative AI models.

RAG Quality Metrics Using ChainPoll

Learn about Galileo's Data Error Potential (DEP) score, a metric to identify and categorize machine learning data errors, enhancing data quality and model performance.

Galileo Data Error Potential (DEP)

Discover Galileo's data drift detection methods to monitor AI model performance, identify data changes, and maintain model reliability in production.

Data Drift Detection

Likely Mislabeled

Detecting samples on the decision boundary

Class Boundary Detection

This page describes the rich error types offered by Galileo for Object Detection.

Errors In Object Detection

Gain an overview of Galileo deployment options, covering supported platforms like Amazon EKS and Google GKE, setup requirements, and best practices.

Enterprise Deployment

Before deploying Galileo, ensure the following prerequisites are met.

Prerequisites

Understand Galileo deployment prerequisites and dependencies to ensure a smooth installation and integration across supported platforms.

Dependencies

Learn how to onboard new users in Galileo deployments with detailed instructions on user roles, access control, and permissions management.

Setting Up New Users

The following guide will walk you through steps you can take to make sure your Galileo cluster is properly deployed and running well.

Post Deployment Checklist

This page covers our SSO integration support, including the information we need to set up SSO for your Galileo cluster.

SSO Integration

This page covers the networking, security, and access control provisions that Galileo deployments enable.

Security & Access Control

This page covers data residency concerns and the compliance standards supported by Galileo.

Data Privacy And Compliance

Start using Galileo Evaluate with this quickstart guide, covering prompt engineering, AI evaluation, and integrating tools into existing workflows.

Quickstart Guide | Galileo Evaluate

Learn how to integrate Galileo Evaluate into your Python applications, featuring step-by-step guidance and code samples for streamlined integration.

Integrate Evaluate Into My Existing Application With Python

Explore UI-driven prompt engineering in Galileo Evaluate to create, test, and refine prompts with intuitive interfaces and robust evaluation tools.

Prompt Engineering From A UI

Follow step-by-step instructions in Galileo Evaluate to assess generative AI models, configure metrics, and analyze performance effectively.

How-To Guide | Galileo Evaluate

Before starting your experiments, we recommend creating an evaluation set.

Create an Evaluation Set

How to use Galileo Evaluate for prompt engineering

Evaluate and Optimize Prompts

How to use Galileo Evaluate with RAG applications

Evaluate and Optimize RAG Applications

Select and understand guardrail metrics in Galileo Evaluate to effectively assess your prompts and models, utilizing both industry-standard and proprietary metrics.

Choose your Guardrail Metrics

Learn how to turn on metrics when creating runs in your Python environment.

Enabling Scorers in Runs

How to use Galileo Evaluate to find Hallucinations

Identify Hallucinations

Galileo GenAI Studio supports Custom Metrics (programmatic or GPT-based) for all your Evaluate and Observe projects. Depending on where, when, and how you want these metrics to be executed, you have the option to choose between **Custom Scorers** and **Registered Scorers**.

Register Custom Metrics

Manage and store your AI prompts efficiently in Galileo Evaluate, with tools for organizing, versioning, and analyzing prompt performance at scale.

Prompt Management-Storage

Easily compare multiple LLM runs on a single screen for better decision-making.

A/B Compare Prompts

Experiment with multiple prompts in Galileo Evaluate to optimize generative AI performance using iterative testing and comprehensive analysis tools.

Experiment with Multiple Prompts

If you're building a multi-step workflow or chain (e.g. a RAG system or an Agent) and want to experiment with multiple combinations of parameters or versions at once, Chain Sweeps are your friend.

Experiment with Multiple Workflows

Galileo allows you to do qualitative human evaluations of your prompts and responses.

Evaluate with Human Feedback

While you are experimenting with your prompts, you will probably be tuning many parameters.

Add Tags and Metadata to Prompt Runs

If you already have a dataset of requests and application responses, and you want to log and evaluate these on Galileo without re-generating the responses, you can do so via our workflows.

Log Pre-generated Responses in Python

Evaluate and Optimize Agents

If you want to fetch your logged data and metrics programmatically, you can do so via our Python clients.

Programmatically fetch logged data

Galileo Evaluate is geared for cross-functional collaboration. Most of the teams using Galileo consist of a mix of the following personas

Collaborate with other personas

All projects on Galileo can be shared with others to enable collaboration.

Share a project

To download the results of your evaluation, you can use the Export function. To export your runs, simply click on _Export Prompt Data_.

Export your Evaluation Runs

Gain insights into your metric values in Galileo Evaluate with explainability features, including token-level highlighting and generated explanations for better analysis.

Understanding Metric Values | Galileo Evaluate How-To

Expected outputs are a key element for evaluating LLM applications. They provide benchmarks to measure model accuracy, identify errors, and ensure consistent assessments.

Logging and Comparing against your Expected Answers

Improve metric accuracy by customizing your ChainPoll-powered metrics.

Customize ChainPoll-powered Metrics

Manage user permissions and securely share projects in Galileo Evaluate using detailed access control features, including system roles and group management.

Access Control Guide | Galileo Evaluate

Learn how to use Automatic Run Ranking to find the best run

Finding the best run

Using Datasets

Learn how to use Galileo's Autogen feature to generate LLM-as-a-judge metrics.

Auto-generating an LLM-as-a-judge

Learn how to customize your LLM-powered metrics with Continuous Learning via Human Feedback.

Customizing your LLM-powered metrics via CLHF

Discover Galileo Evaluate's integrations with AI tools and platforms, enabling seamless connectivity and enhanced generative AI evaluation workflows.

Integrations | Galileo Evaluate

Understand project concepts in Galileo Evaluate, including organization of datasets, metrics, and workflows for AI evaluation.

Project Concepts | Galileo Evaluate

Runs in Galileo are experiments or iterations done within a [project](/galileo/gen-ai-studio-products/galileo-evaluate/concepts/project).

Leverage templates in Galileo Evaluate to standardize metrics, model assessments, and workflows for efficient generative AI evaluation.

Template

Metrics are quantitative or qualitative ways to express insights about the [run](/galileo/gen-ai-studio-products/galileo-evaluate/concepts/run).

Learn how human ratings in Galileo Evaluate enable accurate model evaluations and improve performance through qualitative feedback.

Human Ratings

Find solutions to common errors in computing metrics within Galileo Evaluate, including missing integrations and rate limit issues, to streamline your AI evaluations.

Error Computing Metrics | Galileo Evaluate FAQ

Understand the distinctions between Context Adherence and Instruction Adherence metrics in Galileo Evaluate to assess generative AI outputs accurately.

Context vs. Instruction Adherence | Galileo Evaluate FAQ

Learn how to use Galileo Observe to monitor and analyze generative AI models, including setup instructions, data logging, and workflow integrations.

How-To Guide | Galileo Observe

Select and understand guardrail metrics in Galileo Observe to effectively evaluate your LLM applications, utilizing both industry-standard and proprietary metrics.

Choosing Your Guardrail Metrics

Registered Metrics let your team define custom metrics (programmatic or GPT-based) for your Observe projects.

Registering And Using Custom Metrics

How to set up Alerts and automatically be alerted when things go wrong

Setting Up Alerts

Once your monitored LLM app is up and running and you've selected your Guardrail Metrics, you can start monitoring your LLM app using Galileo.

Identifying And Debugging Issues

Galileo Observe allows you to monitor your Retrieval-Augmented Generation (RAG) application with out-of-the-box Tracing and Analytics.

Monitoring Your RAG Application

Learn how to manually log your data via our Python Logger

Logging Data Via Python

Fetch logged data programmatically in Galileo Observe with step-by-step instructions for seamless integration into automated workflows and analysis tools.

Programmatically Fetching Logged Data

To download your Observe Data you can use the Export function.

Exporting Your Data

Gain insights into your metric values in Galileo Observe with explainability features, including token-level highlighting and generated explanations for better analysis.
