Definition: Measures how much the model is choosing at random among multiple possible ways of continuing the output. Uncertainty is measured at both the token level and the response level. A higher score means the model is less confident in what it generated.
Calculation: Token-level uncertainty reflects how confident the model is in the next token given the preceding tokens. Response-level uncertainty is the maximum token-level uncertainty over all tokens in the model's response, so a single highly uncertain token is enough to give the whole response a high score.
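The max-over-tokens aggregation can be illustrated with a minimal sketch. The token-level formula is not given here, so the sketch assumes it is the negative log-probability of the generated token, which is a common choice but not necessarily the one this metric uses. The function names and the sample tokens are hypothetical.

```python
import math
from typing import Sequence, Tuple


def token_uncertainty(logprob: float) -> float:
    """ASSUMPTION: token-level uncertainty is the negative log-probability
    of the generated token. The real metric may use a different formula."""
    return -logprob


def response_uncertainty(token_logprobs: Sequence[float]) -> float:
    """Response-level uncertainty: the maximum token-level uncertainty."""
    if not token_logprobs:
        raise ValueError("response has no tokens")
    return max(token_uncertainty(lp) for lp in token_logprobs)


def most_uncertain_token(
    tokens: Sequence[str], token_logprobs: Sequence[float]
) -> Tuple[str, float]:
    """Return the token that drives the response-level score, with its score."""
    if len(tokens) != len(token_logprobs):
        raise ValueError("tokens and token_logprobs must have the same length")
    scores = [token_uncertainty(lp) for lp in token_logprobs]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return tokens[idx], scores[idx]


# Hypothetical response: the model was confident in the first two tokens
# (p = 0.9 and p = 0.5) and unsure about the last one (p = 0.1).
tokens = ["Paris", " in", " 1889"]
logprobs = [math.log(0.9), math.log(0.5), math.log(0.1)]

print(response_uncertainty(logprobs))
print(most_uncertain_token(tokens, logprobs))
```

Because the response-level score is a maximum and not an average, it does not shrink as the response gets longer. Finding the token that produced the maximum is often the quickest way to see which part of the response to inspect.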
Usefulness: Our research has found that high uncertainty scores correlate with hallucinations, including made-up facts and citations. Reviewing highly uncertain responses can flag areas where your model is struggling.