With Galileo for Image Classification, you can improve your classification models by improving the quality of your training data.
Once errors are identified, Galileo lets you take action in-tool, or helps you export the erroneous samples to your labeling tool or Python environment. Fixing erroneous training data consistently leads to significant improvements in production model quality.
What is image classification?
Image classification is the task of assigning a label or class to an entire image; each image is expected to have exactly one class. Here's an example dataset with the model's predictions alongside the Ground Truth label for each image. This also gives us a glimpse into how Galileo surfaces annotation errors in your labeled dataset!
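Conceptually, a single-label classifier outputs one score per class and the prediction is the class with the highest score. A minimal sketch, with hypothetical class names and scores:

```python
# Minimal sketch: single-label classification picks the one class with the
# highest model score (argmax). Class names and scores are hypothetical.
CLASSES = ["cat", "dog", "bird"]

def predict(scores):
    """Return the class whose score is highest."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return CLASSES[best_index]

sample_scores = [0.1, 0.7, 0.2]  # model output for one image
prediction = predict(sample_scores)
print(prediction)  # -> dog
```

Comparing this prediction against the Ground Truth label is what lets a sample be flagged as a potential annotation error.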
How to use Galileo for Image Classification?
Console Walkthrough for Image Classification
Upon completing a run, you'll be taken to the Galileo Console. The first thing you'll notice is your image dataset on the right. On each image, we show you the Ground Truth and Prediction labels, and the Data Error Potential of the image. By default, your images are sorted by Data Error Potential.
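The default ordering can be thought of as a simple descending sort on each sample's DEP score. A sketch with hypothetical samples and scores:

```python
# Hypothetical samples, each with a Data Error Potential (DEP) score.
# The console's default view corresponds to sorting DEP descending, so the
# most likely errors surface first.
samples = [
    {"image": "img_001.jpg", "gt": "cat", "pred": "dog", "dep": 0.91},
    {"image": "img_002.jpg", "gt": "dog", "pred": "dog", "dep": 0.12},
    {"image": "img_003.jpg", "gt": "cat", "pred": "cat", "dep": 0.47},
]

by_dep = sorted(samples, key=lambda s: s["dep"], reverse=True)
for s in by_dep:
    print(s["image"], s["dep"])  # highest-DEP image first
```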
You can also view your samples in the embeddings space of the model. This can help you get a semantic understanding of your dataset. Using features like Color-By DEP, you might discover pockets of problematic data (e.g. decision boundaries that might benefit from more samples or a cluster of garbage images).
Your left pane is called the Insights Menu. At the top, you can see your dataset size and choose the metric to guide your exploration (Accuracy by default). Both the size and the metric update as you add filters to your dataset.
Clicking on an Alert will filter the dataset to the subset of data that corresponds to the Alert.
Under Metrics, you'll find different charts, such as:
- Accuracy by Class
- Sample Count by Class
- Overlapping Classes
- Top Misclassified Pairs
- DEP Distribution
These charts are dynamic and update as you add different filters. They're also interactive - clicking on a class or group of classes will filter the dataset accordingly, allowing you to inspect and fix the samples.
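Two of these charts, Accuracy by Class and Top Misclassified Pairs, can be sketched as simple aggregations over (ground truth, prediction) pairs. The records below are hypothetical:

```python
from collections import Counter, defaultdict

# Hypothetical (ground_truth, prediction) pairs for a labeled dataset.
records = [
    ("cat", "cat"), ("cat", "dog"), ("dog", "dog"),
    ("dog", "dog"), ("bird", "cat"), ("bird", "bird"),
]

correct = defaultdict(int)
total = defaultdict(int)
confusions = Counter()

for gt, pred in records:
    total[gt] += 1
    if gt == pred:
        correct[gt] += 1
    else:
        confusions[(gt, pred)] += 1  # a misclassified (gt, pred) pair

accuracy_by_class = {c: correct[c] / total[c] for c in total}
top_misclassified = confusions.most_common(2)

print(accuracy_by_class)   # per-class accuracy
print(top_misclassified)   # most frequent confusion pairs
```

Applying a filter in the console amounts to recomputing these aggregations over the filtered subset of records.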
Once you've identified a problematic subset of data, Galileo lets you fix your samples with the goal of improving Accuracy, or your performance metric of choice. In Image Classification runs, we allow you to:
- Change Label - Re-assign the label of your image right in-tool
- Remove - Remove problematic images you want to discard from your dataset
- Export - Download your samples so you can fix them elsewhere
Your changes are tracked in your Edits Cart. There you can view a summary of the changes you've made, undo them, or download a clean, fixed dataset to retrain your model.
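The effect of applying an Edits Cart can be sketched as a pass over the dataset: "change label" edits re-assign the ground truth, while "remove" edits drop the sample. The data structures and action names below are hypothetical, not Galileo's export format:

```python
# Sketch of applying an Edits Cart to a dataset (hypothetical structures):
# "change_label" re-assigns the ground truth, "remove" drops the sample.
dataset = {
    "img_001.jpg": "cat",
    "img_002.jpg": "dog",
    "img_003.jpg": "cat",
}

edits = [
    {"image": "img_001.jpg", "action": "change_label", "new_label": "dog"},
    {"image": "img_003.jpg", "action": "remove"},
]

for edit in edits:
    if edit["action"] == "change_label":
        dataset[edit["image"]] = edit["new_label"]
    elif edit["action"] == "remove":
        dataset.pop(edit["image"], None)

print(dataset)  # cleaned dataset, ready for retraining
```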
Your dataset splits are maintained in Galileo. Your data is logged as Training, Test, and/or Validation splits, and Galileo lets you explore each split independently. Some alerts, such as Underfitting Classes or Overfitting Classes, look at cross-split performance; for the most part, though, each split is treated independently.
To switch splits, find the Splits dropdown next to your project and run name near the top of the screen. By default, the Training split is shown first.
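When preparing data to log, you typically assign each sample to a split yourself. A minimal sketch of a reproducible seeded split (ratios and filenames are illustrative):

```python
import random

# Sketch: partition image paths into Training/Validation/Test splits with a
# fixed seed so the assignment is reproducible across runs.
paths = [f"img_{i:03d}.jpg" for i in range(10)]
rng = random.Random(42)
rng.shuffle(paths)

n_train = int(len(paths) * 0.8)
n_val = int(len(paths) * 0.1)
splits = {
    "training": paths[:n_train],
    "validation": paths[n_train:n_train + n_val],
    "test": paths[n_train + n_val:],
}
print({name: len(items) for name, items in splits.items()})  # 8 / 1 / 1
```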
The above notebooks show a variety of combinations to train an image classification model. In particular, we show various ways of creating a dataset:
- from data on Hugging Face
- from local/remote images using a DataFrame that keeps track of paths + labels
- from local/remote images (GCS or S3) using ImageFolder
We also show how to use various models, such as standard CNNs like ResNet, or Vision Transformers.
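The ImageFolder convention mentioned above maps each subdirectory name to a class label, with the files inside it as samples of that class. A stdlib-only sketch of that mapping (the folder layout below is built on the fly for illustration):

```python
from pathlib import Path
import tempfile

def folder_to_records(root):
    """Map root/<label>/<image>.jpg files to (path, label) records,
    following the ImageFolder convention: subdirectory name = class label."""
    records = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        for image_path in sorted(class_dir.glob("*.jpg")):
            records.append({"path": str(image_path), "label": class_dir.name})
    return records

# Build a tiny hypothetical folder layout to demonstrate the convention.
root = Path(tempfile.mkdtemp())
for label, name in [("cat", "a.jpg"), ("cat", "b.jpg"), ("dog", "c.jpg")]:
    (root / label).mkdir(exist_ok=True)
    (root / label / name).touch()

records = folder_to_records(root)
print([r["label"] for r in records])  # -> ['cat', 'cat', 'dog']
```

The DataFrame approach from the list above is the same idea with the path and label columns built explicitly instead of inferred from directory names.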
Looking to monitor your Image Classification model in production?
Once your model is in production, you can use Galileo's Model Monitoring platform to catch regressions in model quality, detect data or model drift, and find high-value real-world samples to add to your training data.
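One simple drift signal is comparing the predicted-class distribution in production against the training distribution. The sketch below uses the Population Stability Index (PSI), a standard drift statistic; it is an illustration of the idea, not Galileo's internal method, and the class frequencies are hypothetical:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two class distributions.
    A common rule of thumb: PSI > 0.2 suggests meaningful drift."""
    score = 0.0
    for cls in expected:
        e = max(expected[cls], eps)
        a = max(actual.get(cls, 0.0), eps)
        score += (a - e) * math.log(a / e)
    return score

train_dist = {"cat": 0.5, "dog": 0.5}   # class frequencies at training time
prod_dist = {"cat": 0.2, "dog": 0.8}    # class frequencies in production
print(round(psi(train_dist, prod_dist), 3))  # -> 0.416, well above 0.2
```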