If you already have a prototype or application you want to run experiments and evaluations over, Galileo Evaluate lets you hook into it and log the inputs, outputs, and any intermediate steps to Galileo for further analysis.

In this QuickStart, we’ll show you how to:

  • Integrate with your workflows

  • Integrate with your LangChain apps

Let’s dive in!

Logging Workflows

If you’re looking to log your workflows, we provide an interface for uploading your executions.

import promptquality as pq
from promptquality import EvaluateRun

pq.login()

# Define the metrics you want to evaluate your run over.
metrics = [pq.Scorers.context_adherence_plus, pq.Scorers.prompt_injection]
# Define your inputs.
eval_set = [
    "What are hallucinations?",
    "What are intrinsic hallucinations?",
    "What are extrinsic hallucinations?"
]
# Define your run.
evaluate_run = EvaluateRun(run_name="my_run", project_name="my_project", scorers=metrics)
# Run the evaluation set on your app and log the results.
for query in eval_set:
    output = llm.call(query)  # Pseudo-code, replace with your LLM call.
    evaluate_run.add_single_step_workflow(input=query, output=output, model="<my_model_name>")
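The llm.call above is pseudo-code. As a concrete sketch, here is the same loop using the OpenAI chat completions client; the client setup and model name are assumptions for illustration, not requirements of Galileo:

from openai import OpenAI

client = OpenAI()  # Assumes OPENAI_API_KEY is set in your environment.

for query in eval_set:
    # Hypothetical LLM call; swap in whatever your application does.
    response = client.chat.completions.create(
        model="gpt-4o",  # Assumed model name, for illustration only.
        messages=[{"role": "user", "content": query}],
    )
    output = response.choices[0].message.content
    evaluate_run.add_single_step_workflow(input=query, output=output, model="gpt-4o")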

Finally, log your Evaluate run to Galileo:

evaluate_run.finish()

Check out our Python logger documentation for more information on logging experiments.

LangChain

Galileo supports logging chains from LangChain. To log these chains, use the callback from our Python client, promptquality.

Before creating a run, you’ll want to make sure you have an evaluation set (a set of questions or sample inputs to run through your prototype). Your evaluation set should be consistent across runs.
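One simple way to keep the set consistent is to version it in a file alongside your code and load it at run time. A minimal sketch, assuming a hypothetical eval_set.json containing a JSON list of questions:

import json

# Load the evaluation set from a versioned file so every run uses the same inputs.
with open("eval_set.json") as f:  # Hypothetical file name.
    eval_set = json.load(f)  # e.g. ["What are hallucinations?", ...]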

First, we are going to construct a simple RAG chain with Galileo’s documentation stored in a vector DB, using LangChain:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from typing import List
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.document import Document

# Load text from webpage
loader = WebBaseLoader("https://www.rungalileo.io/blog/deep-dive-into-llm-hallucinations-across-generative-tasks")
data = loader.load()

# Split text into documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)

# Add text to vector db
embedding = OpenAIEmbeddings()
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)

# Create a retriever
retriever = vectordb.as_retriever()

def format_docs(docs: List[Document]) -> str:
    return "\n\n".join([d.page_content for d in docs])

template = """Answer the question based only on the following context:

    {context}

    Question: {question}
    """
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()

chain = {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | model | StrOutputParser()

Next, log in to Galileo:

import promptquality as pq

pq.login("YOUR_GALILEO_URL")  # Replace with the URL of your Galileo console.

After that, you can set up the GalileoPromptCallback:

from promptquality import Scorers

# The list of metrics you want to evaluate your run over.
scorers = [
    Scorers.context_adherence_basic,
    Scorers.completeness_basic,
    Scorers.pii,
    # Add any other scorers you want to run here.
]

galileo_handler = pq.GalileoPromptCallback(
    project_name="quickstart_project", scorers=scorers,
)
# Each "run" will appear under this project. Choose a name that'll help you
# identify what you're evaluating.

Finally, you can run the chain over multiple inputs with the Galileo callback:

inputs = [
    "What are hallucinations?",
    "What are intrinsic hallucinations?",
    "What are extrinsic hallucinations?"
]
chain.batch(inputs, config=dict(callbacks=[galileo_handler]))

# Publish the results of your run.
galileo_handler.finish()
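If you prefer to process inputs one at a time rather than batching (for example, to interleave your own logic between calls), each invoke routes through the same handler; finish() is still called once at the end:

for question in inputs:
    # Each invocation is logged by the same callback handler.
    chain.invoke(question, config=dict(callbacks=[galileo_handler]))

# Publish once, after all invocations.
galileo_handler.finish()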

For more detailed information on Galileo’s LangChain integration, check out our integration documentation.