
Using Prompt with RAG or Vector Databases

If your application uses Retrieval-Augmented Generation (RAG) or a vector database, Galileo offers a suite of metrics and experimentation tools to help you optimize your prompt, model, and vector DB configuration.

Metrics

Context Adherence
Context Adherence (formerly known as Groundedness) measures whether your model's response was based purely on the context provided, i.e. the response didn't state any facts not contained in that context. For RAG users, Context Adherence is a measurement of hallucinations.
If a response is grounded in the context (i.e. it has a value of 1 or close to 1), it only contains information given in the context. If a response is not grounded (i.e. it has a value of 0 or close to 0), it's likely to contain facts not included in the context provided to the model.
To fix low Context Adherence values, we recommend (1) ensuring your context DB has all the necessary info to answer the question, and (2) adjusting the prompt to tell the model to stick to the information it's given in the context.
Note: This metric is computed by prompting an LLM multiple times, so it incurs additional LLM calls.
Context Relevance
Context Relevance measures how relevant (or similar) the context provided was to the user query. This metric requires {context} and {query} slots in your data, as well as embeddings for them (i.e. {context_embedding} and {query_embedding}).
Context Relevance is a relative metric. High Context Relevance values indicate significant similarity or relevance. Low Context Relevance values can be a sign that you need to augment your knowledge base or vector DB with additional documents, modify your retrieval strategy or use better embeddings.
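To build intuition for what an embedding-based relevance score can look like, here is a minimal sketch that computes the cosine similarity between a query embedding and a context embedding. This is an assumption made for illustration: the text above does not state the exact formula Galileo uses, and cosine_similarity is a hypothetical helper, not part of promptquality.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 3-dimensional embeddings; real ones come from your embedding model.
query_embedding = [1.0, 0.0, 1.0]
close_context = [2.0, 0.0, 2.0]  # points in the same direction as the query
far_context = [0.0, 1.0, 0.0]    # orthogonal to the query

close_score = cosine_similarity(query_embedding, close_context)
far_score = cosine_similarity(query_embedding, far_context)
print(close_score, far_score)
```

In this toy setup the context pointing in the same direction as the query scores near 1 and the orthogonal one scores 0, which mirrors the "high means similar, low means dissimilar" reading described above.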
Non-RAG specific Metrics
Other metrics such as Uncertainty and Correctness might be useful as well. If these don't cover all your needs, you can always write custom metrics.

Code Walkthrough

To integrate Galileo Prompt into your setup, you'll first need to install our Python client, with the PyArrow extensions: pip install promptquality[arrow].
Galileo works with all major vector database providers. We'll show you how this can work with Pinecone for illustrative purposes.
Before we begin, here's how you could set up a simple Pinecone vector DB and write a retriever function that fetches context from it.
import pinecone
from sentence_transformers import SentenceTransformer

## --- Setup ---
# Init your Pinecone Index
pinecone.init(api_key="YOUR_KEY", environment="YOUR_ENV")
index = pinecone.Index("YOUR_INDEX_NAME")

# Query code
retriever = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')

def get_context(question):
    # Embed the question; encode() takes a list, so xq holds one vector
    xq = retriever.encode([question]).tolist()
    # Fetch the single closest match, with its metadata and stored vector
    xc = index.query(xq, top_k=1, include_metadata=True, include_values=True)
    # Extract the context passage from the pinecone search result
    contexts = [x["metadata"]['context'] for x in xc["matches"]]
    embeddings = [x["values"] for x in xc["matches"]]
    # Return the passage, its embedding, and the query embedding
    return contexts[0], embeddings[0], xq[0]
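If you want to dry-run the rest of the walkthrough without a Pinecone index, a stand-in retriever with the same return shape can help. The sketch below is purely illustrative: get_context_offline, embed, VOCAB, and PASSAGES are hypothetical names, and the bag-of-words "embedding" is a toy substitute for a real embedding model.

```python
import math

# Hypothetical in-memory "index" over a toy vocabulary (illustration only).
VOCAB = ["paris", "france", "capital", "python", "language"]
PASSAGES = [
    "Paris is the capital of France.",
    "Python is a programming language.",
]


def embed(text):
    """Toy bag-of-words embedding: the count of each VOCAB word in the text."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def get_context_offline(question):
    """Return (context, context_embedding, query_embedding), like get_context."""
    xq = embed(question)
    best = max(PASSAGES, key=lambda p: cosine(xq, embed(p)))
    return best, embed(best), xq


context, context_embedding, query_embedding = get_context_offline(
    "What is the capital of France?"
)
print(context)
```

Because it returns the same three values in the same order, this stand-in can replace get_context while you test the dataset-building and run steps locally.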
First, import promptquality and log into your Galileo instance:
import promptquality as pq

pq.login("your.galileo.console.url.com")
Next, define your dataset and augment it with the context and vector embeddings fetched from your Pinecone DB.
import pandas as pd

# my_queries.csv contains a list of 'query' items.
# Each 'query' represents a user query we'd like the model
# to answer using context from a vector DB.
# It could also contain a 'target' column for BLEU and ROUGE computation.
df = pd.read_csv('my_queries.csv')

# Retrieve the context and both embeddings for every query
contexts, context_embeddings, query_embeddings = [], [], []
for query in df["query"].to_list():
    context, context_embedding, query_embedding = get_context(query)
    contexts.append(context)
    context_embeddings.append(context_embedding)
    query_embeddings.append(query_embedding)

# Attach the retrieved values as new dataset columns
df["context"] = contexts
df["context_embedding"] = context_embeddings
df["query_embedding"] = query_embeddings
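Since Context Relevance needs the query, the context, and both embeddings on every row, it can be worth checking the records before launching a run. The following is a minimal sketch under that assumption; find_incomplete_records and REQUIRED_FIELDS are hypothetical names, and the toy records stand in for the output of df.to_dict(orient="records").

```python
REQUIRED_FIELDS = ("query", "context", "context_embedding", "query_embedding")


def is_empty(value):
    """True for None and for zero-length strings or sequences."""
    return value is None or len(value) == 0


def find_incomplete_records(records):
    """Return (row_index, missing_fields) for each record lacking a required field."""
    problems = []
    for i, record in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if is_empty(record.get(f))]
        if missing:
            problems.append((i, missing))
    return problems


# Toy records shaped like df.to_dict(orient="records").
records = [
    {"query": "q1", "context": "c1",
     "context_embedding": [0.1], "query_embedding": [0.2]},
    {"query": "q2", "context": "",
     "context_embedding": [], "query_embedding": [0.3]},
]

problems = find_incomplete_records(records)
print(problems)
```

Rows reported here typically point to queries for which the retriever returned no usable match.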
Next, choose the metrics you'd like to run and create any of your own.
from promptquality import Scorers

metrics = [
    Scorers.groundedness,
    Scorers.context_relevance,
    # Uncertainty, BLEU, and ROUGE are automatically included
]
Then, define your template:
template = """You are a helpful assistant. Given the following context, please answer the question. Provide an accurate and factual answer.
Context: {context}
Question: {query}
Your answer: """
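To preview what a filled-in prompt looks like for a single row, you can substitute the slots yourself with Python's str.format. This is only a local sanity check under the assumption that slots map to same-named dataset columns; Galileo performs its own substitution during a run, and how it does so internally is not stated here. The row values are hypothetical.

```python
template = """You are a helpful assistant. Given the following context, please answer the question. Provide an accurate and factual answer.
Context: {context}
Question: {query}
Your answer: """

# Hypothetical dataset row; in practice this comes from your DataFrame.
row = {
    "query": "What is the capital of France?",
    "context": "Paris is the capital of France.",
}

# str.format fills each {slot} with the matching keyword argument.
preview = template.format(context=row["context"], query=row["query"])
print(preview)
```

If a slot name in the template has no matching key, str.format raises a KeyError, which makes this a quick way to catch a misspelled slot before running anything.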
Finally, use pq.run to run your prompts and evaluation metrics.
pq.run(project_name='rag_demo_9_12',
       template=template,
       dataset=df.to_dict(orient="records"),
       scorers=metrics,
       settings=pq.Settings(model_alias=pq.SupportedModels.chat_gpt_16k))
Once your runs complete, you can view your results on the Galileo console and/or using pq.get_metrics() and pq.get_rows().

Experimenting with your retrieval logic

If you're experimenting with your retrieval logic and would like to use Galileo to evaluate the results, you can pass any parameters or notes that would help you identify your experiment run as additional metadata fields in your dataset.
This metadata will appear as additional fields in your Prompt Runs table and allow you to compare your model's performance across experiments.
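As a rough sketch of the idea above, the snippet below attaches the same experiment metadata to every record before a run. The field names (retriever, top_k) and the toy records are hypothetical; use whatever fields identify your own experiment.

```python
# Hypothetical retrieval settings for this experiment; names are illustrative.
experiment_metadata = {"retriever": "multi-qa-MiniLM-L6-cos-v1", "top_k": 1}

# Toy records shaped like df.to_dict(orient="records").
records = [
    {"query": "q1", "context": "c1"},
    {"query": "q2", "context": "c2"},
]

# Copy each record and attach the same metadata fields to every row.
tagged_records = [{**record, **experiment_metadata} for record in records]
print(tagged_records[0])
```

With the pandas DataFrame from the walkthrough, assigning a constant column such as df["top_k"] = 1 before calling df.to_dict(orient="records") has the same effect.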