Integrate Evaluate into my existing application with Python
If you already have a prototype or an application you're looking to run experiments and evaluations over, Galileo Evaluate allows you to hook into it and log the inputs, outputs, and any intermediate steps to Galileo for further analysis.
In this QuickStart, we'll show you how to:
Integrate with your Langchain apps
Integrate with your custom chain apps
Let's dive in!
Langchain
Galileo supports logging chains from Langchain. To log these chains, use the callback from our Python client, promptquality.
Before creating a run, you'll want to make sure you have an evaluation set (a set of questions / sample inputs you want to run through your prototype for evaluation). Your evaluation set should be consistent across runs.
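For example, an evaluation set can be as simple as a fixed list of questions kept in one place (a plain list, or a CSV/JSON file under version control) so that every run sees the same inputs. The names below are illustrative, not part of the promptquality API:

```python
# A minimal evaluation set: a fixed list of sample inputs reused across runs
# so that results are comparable between experiments.
EVAL_QUESTIONS = [
    "What are hallucinations?",
    "What are intrinsic hallucinations?",
    "What are extrinsic hallucinations?",
]

def load_eval_set():
    """Return a copy of the evaluation set, always in the same order."""
    return list(EVAL_QUESTIONS)
```

Keeping the set in one function (or file) makes it easy to verify that two runs were evaluated over identical inputs.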
First, we are going to construct a simple RAG chain with Galileo's documentation stored in a vector DB using Langchain:
```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from typing import List
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.document import Document

# Load text from webpage
loader = WebBaseLoader("https://www.rungalileo.io/blog/deep-dive-into-llm-hallucinations-across-generative-tasks")
data = loader.load()

# Split text into documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)

# Add text to vector db
embedding = OpenAIEmbeddings()
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)

# Create a retriever
retriever = vectordb.as_retriever()

def format_docs(docs: List[Document]) -> str:
    return "\n\n".join([d.page_content for d in docs])

template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
```
Next, you can log in with Galileo:
```python
import promptquality as pq

pq.login({YOUR_GALILEO_URL})
```
After that, you can set up the GalileoPromptCallback:
```python
from promptquality import Scorers

# This is the list of metrics you want to evaluate your run over.
scorers = [Scorers.context_adherence_basic, Scorers.completeness_basic, Scorers.pii, ...]

# Each "run" will appear under this project. Choose a name that'll help you
# identify what you're evaluating.
galileo_handler = pq.GalileoPromptCallback(
    project_name="quickstart_project",
    scorers=scorers,
)
```
Finally, you can run the chain experiments across multiple inputs with the Galileo callback:
```python
inputs = [
    "What are hallucinations?",
    "What are intrinsic hallucinations?",
    "What are extrinsic hallucinations?",
]
chain.batch(inputs, config=dict(callbacks=[galileo_handler]))

# Publish the results of your run
galileo_handler.finish()
```
For more detailed information on Galileo's Langchain integration, check out instructions here.
Custom Chains
If you're not using an orchestration library, or you're using one other than Langchain, we also provide a similar interface for uploading executions that don't use a callback mechanism.
```python
import uuid

import promptquality as pq
from promptquality import NodeType, NodeRow

def my_llm_app(input):
    return <generated_response>

eval_set = [
    "What are hallucinations?",
    "What are intrinsic hallucinations?",
    "What are extrinsic hallucinations?",
]

rows = []
for data_point in eval_set:
    chain_id = uuid.uuid4()
    rows.append(
        NodeRow.for_llm(
            id=chain_id,
            root_id=chain_id,
            step=0,
            prompt=data_point,
            response=my_llm_app(data_point),
        )
    )

pq.login({YOUR_GALILEO_URL})
pq.chain_run(rows, project_name="my_first_project")
```
Please check out this page for more information on logging experiments with the custom Python logger.
Running multiple experiments in one go
If you want to run multiple experiments in one go (e.g. use different templates, experiment with different retriever params, etc.), check out Chain Sweeps.