What are Galileo Alerts?

After you complete a run, Galileo surfaces a summary of the issues it has found in your dataset in the Alerts section. Each Alert represents a problematic pocket of data that Galileo has identified. Clicking on an Alert filters the dataset down to that problematic subset so you can inspect and fix it.

Alerts also explain why a subset of your data might be causing issues and how you can fix it. You can think of Alerts as a partner Data Scientist working with you to find and fix issues in your data.
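If you want to reproduce this kind of filtering outside the console, the sketch below shows the idea with pandas. The file name and column names (data_error_potential, gold, pred) are illustrative placeholders for however you export your run’s data, not a fixed Galileo schema.

```python
import pandas as pd

# Hypothetical export of a run's samples; the file and column names
# ("gold", "pred", "data_error_potential") are placeholders, not a
# fixed Galileo schema.
df = pd.read_csv("exported_run.csv")

# Mimic what clicking an Alert does: narrow the view to one problematic
# pocket of data, e.g. the highest-DEP samples ("Hard For The Model").
dep_cutoff = df["data_error_potential"].quantile(0.9)
hard_for_model = df[df["data_error_potential"] >= dep_cutoff]

# "Misclassified" is simply the subset where label and prediction disagree.
misclassified = df[df["gold"] != df["pred"]]

print(f"{len(hard_for_model)} high-DEP samples, {len(misclassified)} misclassified")
```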

Alerts that we support today

We support a growing list of Alerts and are open to feature requests! Highlights include:

Likely Mislabeled: Leverages our Likely Mislabeled algorithm to surface the samples we believe were incorrectly labeled by your annotators.
Misclassified: Surfaces samples where the Ground Truth label and the model’s prediction disagree.
Hard For The Model: Exposes the samples we believe were hard for your model to learn. These are samples with high Data Error Potential (DEP) scores.
Low Performing Classes: Classes that performed significantly worse than average, e.g. an F1 score more than 1 standard deviation below the mean F1 across classes (see the first sketch after this list).
Low Performing Metadata: Slices the data by different metadata values and surfaces any subsets that perform significantly worse than average.
High Class Imbalance is Impacting Performance: Exposes classes that are underrepresented in the training set and perform poorly in the validation/test set.
High Class Overlap: Surfaces classes our Class Overlap algorithm detected as being confused with one another by the model.
Out Of Coverage: Surfaces samples in your validation/test split that are fundamentally different from the samples in your training set.
PII: Identifies any Personally Identifiable Information in your data.
Non-Primary Language: Exposes samples that are not in the primary language of your dataset.
Semantic Cluster with High DEP: Surfaces semantic clusters of data, found through our Clustering algorithm, that have high Data Error Potential.
High Uncertainty Samples: Surfaces samples that sit on the model’s decision boundary (the margin sketch after this list shows one way to measure this).
[Inference Only] Data Drift: The data your model sees in this inference run has drifted from the data it was trained on.
[Named Entity Recognition Only] High Frequency Problematic Word: Shows you words the model struggles with (i.e. that have high Data Error Potential) more than 50% of the time.
[Named Entity Recognition or Semantic Segmentation Only] False Positives: Spans or Segments predicted by the model for which the Ground Truth has no annotation.
[Named Entity Recognition Only] False Negatives: Surfaces spans for which the Ground Truth has an annotation but the model made no prediction.
[Named Entity Recognition Only] Shifted Spans: Surfaces spans whose start and end positions are not aligned between the Ground Truth and the Prediction.
[Object Detection Only] Background Confusion Errors: Surfaces predictions that don’t overlap significantly with any Ground Truth (see the IoU sketch after this list).
[Object Detection Only] Localization Mistakes: Surfaces detected objects that overlap poorly with their corresponding Ground Truth.
[Object Detection Only] Missed Predictions: Surfaces annotations the model failed to make predictions for.
[Object Detection Only] Misclassified Predictions: Surfaces objects that were assigned a different label than their associated Ground Truth.
[Object Detection Only] Duplicate Predictions: Surfaces instances where multiple duplicate predictions were made for the same object.
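To make the Low Performing Classes rule concrete, here is a minimal sketch of the “1 standard deviation below the mean F1” check. The class names and scores are made up, and the exact statistic Galileo computes may differ.

```python
import numpy as np

# Made-up per-class F1 scores; in practice these come from your own
# evaluation results.
f1_by_class = {"shipping": 0.91, "billing": 0.88, "refunds": 0.86, "legal": 0.52}

scores = np.array(list(f1_by_class.values()))
threshold = scores.mean() - scores.std()  # 1 std below the mean F1

low_performing = [cls for cls, f1 in f1_by_class.items() if f1 < threshold]
print(low_performing)  # -> ['legal']
```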
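High Uncertainty Samples can be approximated with a margin check on softmax outputs: a small gap between the top two class probabilities is a common proxy for sitting near the decision boundary. The cutoff below is hypothetical, not necessarily the signal Galileo uses.

```python
import numpy as np

# Softmax outputs for two samples over three classes (illustrative).
probs = np.array([
    [0.97, 0.02, 0.01],  # confident prediction
    [0.51, 0.47, 0.02],  # near the decision boundary
])

# Margin = p(top-1 class) - p(top-2 class); a small margin means high uncertainty.
top2 = np.sort(probs, axis=1)[:, -2:]
margin = top2[:, 1] - top2[:, 0]
high_uncertainty = margin < 0.1  # hypothetical cutoff

print(high_uncertainty)  # -> [False  True]
```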
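Similarly, the Object Detection alerts hinge on how much a prediction overlaps its best-matching Ground Truth box, usually measured as Intersection-over-Union (IoU). The sketch below shows one way to draw those distinctions; the thresholds are illustrative, not Galileo’s actual cutoffs.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Illustrative thresholds, not Galileo's exact cutoffs.
BACKGROUND_IOU, GOOD_IOU = 0.1, 0.5

def categorize(pred_box, best_gt_box):
    overlap = iou(pred_box, best_gt_box)
    if overlap < BACKGROUND_IOU:
        return "background confusion"  # barely touches any Ground Truth
    if overlap < GOOD_IOU:
        return "localization mistake"  # right object, poorly placed box
    return "good localization"

print(categorize([0, 0, 10, 10], [8, 8, 20, 20]))  # -> background confusion
```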

How do I request a new alert?

Have a great idea for a new alert? We’d love to hear about it! File any requests under your Profile Avatar Menu > “Bug Report or Feature Request”, and we’ll get on it right away.