ChainPoll is a powerful, flexible technique for LLM-based evaluation built by Galileo’s Research team. It is used to power multiple Guardrail Metrics across the Galileo platform:

  • Context Adherence Plus

  • Chunk Attribution & Utilization

  • Completeness Plus

  • Correctness

ChainPoll combines a chain-of-thought prompting technique with prompting an LLM multiple times to calculate metric values. There are two levers you can customize for a ChainPoll metric:

  • The model that gets queried

  • The number of times we prompt that model

Generally, stronger models provide more accurate metric values, and a higher number of judges increases both the accuracy and the stability of those values. We’ve configured our ChainPoll-powered metrics to balance this cost–accuracy trade-off.
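To build intuition for why more judges stabilize the metric, here is a minimal sketch of the aggregation idea, assuming the metric value is simply the fraction of judges that answer "yes" (the function name and the exact aggregation are illustrative, not Galileo's implementation):

```python
def chainpoll_score(judge_verdicts: list[bool]) -> float:
    """Aggregate multiple chain-of-thought judge verdicts into one metric value.

    Each verdict is one LLM call's yes/no answer to the metric question
    (e.g. "does the response adhere to the context?"). The score is the
    fraction of affirmative votes, so more judges means less variance.
    """
    if not judge_verdicts:
        raise ValueError("need at least one judge verdict")
    return sum(judge_verdicts) / len(judge_verdicts)

# With 7 judges, 5 of whom judged the response adherent:
score = chainpoll_score([True, True, True, True, True, False, False])
```

Averaging over several independent chain-of-thought completions is what smooths out the run-to-run noise of a single LLM judgment.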

Changing the model or number of judges of a ChainPoll metric

We allow customizing the execution parameters of the AI-powered metrics from our Guardrail Store. By default, these metrics use gpt-3.5-turbo as the model and 3 judges. To customize this, pass the metrics as customized scorers when creating your run:


pq.run(
    ...,
    scorers=[
        pq.CustomizedChainPollScorer(
            scorer_name=pq.CustomizedScorerName.context_adherence_plus,
            model_alias=pq.Models.gpt_4o_mini,
            num_judges=7,
        )
    ],
)

The metrics that can be customized are:

  1. Chunk Attribution & Utilization: pq.CustomizedScorerName.chunk_attribution_utilization_plus

  2. Completeness Plus: pq.CustomizedScorerName.completeness_plus

  3. Context Adherence Plus: pq.CustomizedScorerName.context_adherence_plus

  4. Correctness: pq.CustomizedScorerName.correctness

These metrics can be used with any OpenAI or Azure model that supports the Chat Completions API, and can be set to use anywhere from 1 to 10 judges.
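Because an invalid judge count would otherwise only surface once the run is submitted, it can be worth checking the configuration up front. A minimal sketch of such a pre-flight check, mirroring the 1–10 constraint above (the helper is hypothetical, not part of the promptquality API):

```python
def validate_num_judges(num_judges: int) -> int:
    """Raise early if the requested judge count is outside the supported 1-10 range."""
    if not 1 <= num_judges <= 10:
        raise ValueError(f"num_judges must be between 1 and 10, got {num_judges}")
    return num_judges

validate_num_judges(7)  # a valid count passes through unchanged
```

Failing fast here is cheaper than discovering a misconfigured scorer after the run has already queried the model.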