Context Adherence Plus

Understand Galileo's Context Adherence Plus Metric

This metric is intended mainly for retrieval-augmented generation (RAG) workflows.

Definition: Measures whether your model's response was purely based on the context provided. A high Context Adherence score means your response is supported by the context provided.

Context Adherence is a measurement of closed-domain hallucinations: cases where your model said things that were not provided in the context.

If a response is adherent to the context (i.e. it has a value of 1 or close to 1), it only contains information given in the context. If a response is not adherent (i.e. it has a value of 0 or close to 0), it's likely to contain facts not included in the context provided to the model.

Calculation: Context Adherence Plus is computed by sending additional requests to your LLM, using a carefully engineered chain-of-thought prompt that asks the model to judge whether or not the response was grounded in the context. The metric requests multiple distinct responses to this prompt, each of which produces an explanation along with a final judgment: yes or no. The Context Adherence Plus score is the number of "yes" responses divided by the total number of responses.
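As a rough illustration of the scoring step only, the sketch below computes the score from a set of judgments that have already been collected. The `Judgment` class, the `adherence_score` function, and the sample explanations are hypothetical names and data made up for this example; they are not part of Galileo's API, and the actual prompt and number of responses are not shown here.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One judge response: a free-text explanation plus a yes/no verdict.

    Hypothetical container for illustration; not a Galileo type.
    """
    explanation: str
    grounded: bool  # True for "yes", False for "no"


def adherence_score(judgments: list[Judgment]) -> float:
    """Return the number of "yes" judgments divided by the total."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    yes_count = sum(1 for j in judgments if j.grounded)
    return yes_count / len(judgments)


# Made-up sample data: three "yes" verdicts and one "no".
judgments = [
    Judgment("Every claim appears in the context.", True),
    Judgment("The dates match the retrieved passage.", True),
    Judgment("The response adds a figure not in the context.", False),
    Judgment("All statements are supported.", True),
]
score = adherence_score(judgments)  # 3 of 4 say "yes", so 0.75
```

With three "yes" verdicts out of four, the score is 0.75, which falls on the "adherent" side of the scale.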

We also surface one of the generated explanations. The surfaced explanation is always chosen to align with the majority judgment among the responses. In other words, if the score is greater than 0.5, the explanation will provide an argument that the response is grounded; if the score is less than 0.5, the explanation will provide an argument that it is not grounded.
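The selection rule described above could be sketched as follows. This is a minimal illustration, not Galileo's implementation: `pick_explanation` is a hypothetical helper, each judgment is represented as an `(explanation, grounded)` pair, and two details are assumptions because the text does not specify them: a score of exactly 0.5 is treated as "not grounded", and the first explanation on the majority side is the one returned.

```python
def pick_explanation(judgments: list[tuple[str, bool]]) -> str:
    """Return an explanation that agrees with the majority verdict.

    Each item is (explanation, grounded), where grounded is True for "yes".
    """
    if not judgments:
        raise ValueError("at least one judgment is required")

    yes_explanations = [text for text, grounded in judgments if grounded]
    no_explanations = [text for text, grounded in judgments if not grounded]
    score = len(yes_explanations) / len(judgments)

    # Assumption: a tie (score == 0.5) falls on the "not grounded" side.
    majority = yes_explanations if score > 0.5 else no_explanations

    # Assumption: surface the first explanation from the majority side.
    return majority[0]


# Made-up sample: two of three judges say the response is grounded.
sample = [
    ("All claims are supported by the context.", True),
    ("The response cites a number not in the context.", False),
    ("Each statement maps to a retrieved passage.", True),
]
surfaced = pick_explanation(sample)
```

Here the score is 2/3, so the surfaced explanation argues that the response is grounded.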

Usefulness: To fix low Context Adherence Plus values, we recommend (1) ensuring your context DB contains all the information needed to answer the question, and (2) adjusting the prompt to instruct the model to rely only on the information given in the context.

Deep dive: To read more about the research behind this metric, see RAG Quality Metrics using ChainPoll.

Note: This metric is computed by prompting an LLM multiple times, so it requires additional LLM calls.
