What are Galileo Alerts?
Galileo Alerts are your starting point in your data inspection journey. After you complete a run, Galileo surfaces a summary of issues it has found in your dataset in the Alerts section. Each Alert represents a problematic pocket of data that Galileo has identified. Clicking on an alert will filter the dataset to this problematic subset of data and allow you to fix them.
Alerts will also educate you on why this subset of your data might be causing issues and tell you how you can fix this. You can think of Alerts as a partner Data Scientist working with you to find and fix your data.

Alerts that we support today

We support a growing list of alerts, and are open to feature requests! Some of the highlights include:
Likely Mislabeled
Leverages our Likely Mislabeled algorithm to surface the samples we believe were incorrectly labeled by your annotators
Surfaces mismatches between your data and the model's prediction
Hard For The Model
Exposes the samples we believe we hard for your model to learn. These are samples with high Data Error Potential scores
Low Performing Classes
Classes that performed significantly worse than average (e.g. their F1 score was 1 std below the mean F1 score)
Low Performing Metadata
Slices the data by different metadata values and shows any subsets of data that perform significantly worse than average
High Class Imbalance is Impacting Performance
Exposes classes that have a low relative class distribution in the training set and perform poorly in the validation/test set
High Class Overlap
Surfaces classes our Class Overlap algorithm detected as being confused by one another by the model
Out Of Coverage
Surfaces samples in your validation/test split that are fundamentally different from samples contained in your training set
Identifies any Personal Identifiable Information in your data
Non-Primary Language
Exposes samples that are not in the primary language of your dataset
Semantic Cluster with High DEP
Surfaces semantic clusters of data found through our Clustering algorithm that have high Data Error Potential​
High Uncertainty Samples
Surfaces samples that exist on the model's decision boundary
[Inference Only] Data Drift
The data your model sees in this inference run has drifted from what it was trained on
[Named Entity Recognition Only] High Frequency Problematic Word
Shows you words that the models struggles with (i.e. have high Data Error Potential) more than 50% of the time
[Named Entity Recognition or Semantic Segmentation Only] False Positives
Spans or Segments predicted by the model for which the Ground Truth has no annotation
[Named Entity Recognition Only] False Negatives
Surfaces spans for which the Ground Truth had an annotation but the model didn't predict any
[Named Entity Recognition Only] Shifted Spans
Surfaces spans where the beginning and end locations are not aligned in the Ground Truth and Prediction
[Object Detection Only] Background Confusion Errors
Surfaces predictions that don’t overlap significantly with any Ground Truth
[Object Detection Only] Localization Mistakes
Surfaces detected objects that overlap poorly with their corresponding Ground Truth
[Object Detection Only] Missed Predictions
Surfaces annotations the model failed to make predictions for
[Object Detection Only] Misclassified Predictions
Surfaces objects that were assigned a different label than their associated Ground Truths
[Object Detection Only]
Surfaces instances where multiple duplicate predictions were being made for the same object

How to request a new alert?

Have a great idea for a new alert? We'd love to hear about it! File any requests under your Profile Avatar Menu > "Bug Report or Feature Request", and we'll immediately get your request