Integrate Evaluate into my existing application with Python

If you already have a prototype or an application you're looking to run experiments and evaluations over, Galileo Evaluate allows you to hook into it and log the inputs, outputs, and any intermediate steps to Galileo for further analysis.

In this QuickStart, we'll show you how to:

  • Integrate with your Langchain apps

  • Integrate with your custom chain apps

Let's dive in!

Langchain

Galileo supports logging chains from Langchain. Logging these chains requires the callback from our Python client, promptquality.

Before creating a run, you'll want to make sure you have an evaluation set (a set of questions or sample inputs you want to run through your prototype for evaluation). Keep your evaluation set consistent across runs so results are comparable.
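
An evaluation set can be as simple as a fixed list of questions; the list below is the one used throughout this QuickStart:

# A fixed evaluation set, reused unchanged across runs so results stay comparable
eval_set = [
    "What are hallucinations?",
    "What are intrinsic hallucinations?",
    "What are extrinsic hallucinations?"
]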

First, we'll construct a simple RAG chain over a Galileo blog post stored in a vector DB, using Langchain:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from typing import List
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.document import Document

# Load text from webpage
loader = WebBaseLoader("https://www.rungalileo.io/blog/deep-dive-into-llm-hallucinations-across-generative-tasks")
data = loader.load()

# Split text into documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)

# Add text to vector db
embedding = OpenAIEmbeddings()
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)

# Create a retriever
retriever = vectordb.as_retriever()

def format_docs(docs: List[Document]) -> str:
    return "\n\n".join([d.page_content for d in docs])
    
template = """Answer the question based only on the following context:

    {context}

    Question: {question}
    """
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()

chain = {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | model | StrOutputParser()

Next, you can log in to Galileo:

import promptquality as pq
pq.login({YOUR_GALILEO_URL})
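
pq.login takes your Galileo console URL as a string. A minimal sketch with a hypothetical URL (replace it with your own console address):

import promptquality as pq

# Hypothetical console URL (replace with your own Galileo console address).
# The login flow typically prompts you for an API key if you aren't already authenticated.
pq.login("https://console.your-galileo-cluster.com")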

After that, you can set up the GalileoPromptCallback:

from promptquality import Scorers

# This is the list of metrics you want to evaluate your run over.
scorers = [
    Scorers.context_adherence_basic,
    Scorers.completeness_basic,
    Scorers.pii,
    ...
]

# Each "run" will appear under this project. Choose a name that'll help you identify what you're evaluating.
galileo_handler = pq.GalileoPromptCallback(
    project_name="quickstart_project",
    scorers=scorers,
)

Finally, you can run the chain across multiple inputs with the Galileo callback:

inputs = [
    "What are hallucinations?",
    "What are intrinsic hallucinations?",
    "What are extrinsic hallucinations?"
]
chain.batch(inputs, config=dict(callbacks=[galileo_handler]))

# publish the results of your run
galileo_handler.finish()
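
If you'd rather process the evaluation set one input at a time (for example, to add your own error handling), the same callback works with invoke; a minimal sketch equivalent to the batch call above:

# One input at a time, logging to the same Galileo run
for question in inputs:
    chain.invoke(question, config=dict(callbacks=[galileo_handler]))

# Publish the run once all inputs have been logged
galileo_handler.finish()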

For more detailed information on Galileo's Langchain integration, check out the instructions here.

Custom Chains

If you're not using an orchestration library, or you're using one other than Langchain, we also provide a similar interface for uploading your executions without a callback mechanism.

import promptquality as pq
from promptquality import NodeType, NodeRow
import uuid


# Your existing application logic: takes an input, returns the generated response
def my_llm_app(input):
    return <generated_response>

eval_set = [
    "What are hallucinations?",
    "What are intrinsic hallucinations?",
    "What are extrinsic hallucinations?"
]
   
rows = []
for data_point in eval_set:
    chain_id = uuid.uuid4()
    rows.append(
        NodeRow.for_llm(
            id=chain_id,
            root_id=chain_id,
            step=0,
            prompt=data_point,
            response=my_llm_app(data_point)
        )
    )

pq.login({YOUR_GALILEO_URL})
pq.chain_run(rows, project_name="my_first_project")
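
As with the Langchain callback, you'll usually want metrics computed over these rows. Assuming chain_run accepts the same scorers list as GalileoPromptCallback (our assumption here; see the custom-chains docs linked below), the call would look like:

from promptquality import Scorers

scorers = [Scorers.context_adherence_basic, Scorers.completeness_basic]

# Assumption: chain_run takes a scorers list, like GalileoPromptCallback does
pq.chain_run(rows, project_name="my_first_project", scorers=scorers)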

Please check out this page for more information on logging experiments with the custom Python logger.

Running multiple experiments in one go

If you want to run multiple experiments in one go (e.g. use different templates, experiment with different retriever params, etc.), check out Chain Sweeps.
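
Even without Chain Sweeps, you can run several experiments back to back by rebuilding the chain for each configuration and giving each run its own callback. A minimal sketch that reuses the Langchain pieces from above (the second template string is purely illustrative):

templates = [
    template,  # the template defined earlier
    "Answer concisely using only this context:\n\n{context}\n\nQuestion: {question}",  # illustrative variant
]

for t in templates:
    prompt = ChatPromptTemplate.from_template(t)
    chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | model
        | StrOutputParser()
    )
    # One callback (and therefore one Galileo run) per experiment
    handler = pq.GalileoPromptCallback(project_name="quickstart_project", scorers=scorers)
    chain.batch(inputs, config=dict(callbacks=[handler]))
    handler.finish()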

Whichever approach you take, run your chain over your evaluation set and log the results to Galileo.
