
Fast.ai

Fast.ai dataquality callback for finding mislabeled data
The dataquality package provides a convenient way to log your dataset and model outputs to the Galileo platform for data quality monitoring. This page explains how to use the FastAiDQCallback to log your PyTorch model's outputs when training with fastai.
Follow the steps below in order.

Step 1: Logging Data Inputs

To log your computer vision (CV) dataset to Galileo, use the log_image_dataset function provided by dataquality. The images must first be uploaded to cloud storage (S3 or GCP), and the dataset must be logged before it is loaded into fastai's dataloader. Additionally, pass drop_last=False to the dataloader so that training covers all of your data and model outputs are logged for every sample.
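To see why drop_last matters, the plain-Python sketch below (no fastai required) counts how many samples one pass over a dataloader visits with and without dropping the last incomplete batch. With drop_last=True, the remainder samples never pass through the model in that epoch, so no outputs could be logged for them. The function name samples_seen_per_epoch is hypothetical and only illustrates the arithmetic.

```python
def samples_seen_per_epoch(n_samples: int, batch_size: int, drop_last: bool) -> int:
    """Return how many samples one pass over a batched dataloader visits."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if not drop_last:
        # Every sample is visited; the last batch is simply smaller.
        return n_samples
    # With drop_last=True the final, incomplete batch is skipped entirely.
    full_batches = n_samples // batch_size
    return full_batches * batch_size


n, bs = 1000, 64
print(samples_seen_per_epoch(n, bs, drop_last=True))   # 960: 40 samples skipped
print(samples_seen_per_epoch(n, bs, drop_last=False))  # 1000
```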
Here is an example of how to log your CV datasets to Galileo:
PyTorch - CV datasets
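import pandas as pd
import dataquality as dq
from fastai.vision.all import ImageDataLoaders

dq.init(task_type="image_classification",
        project_name="sample_torch_image_classification",
        run_name="sample_run_0")
# Log the class labels in the order they are output by the model.
# Make sure they are sorted alphabetically.
labels_list = ["airplane", "cat", "dog", "horse", "zebra"]
dq.set_labels_for_run(labels_list)
# 🔭🌕 Log your pandas/huggingface/torch datasets to Galileo
# In this example, train_dataset and test_dataset are pandas DataFrames with an
# "s3path" column (image location) and a "label" column.
dq.log_image_dataset(train_dataset, imgs_location_colname="s3path",
                     split="train", label="label")
dq.log_image_dataset(test_dataset, imgs_location_colname="s3path",
                     split="validation", label="label")
# Flag which rows belong to the validation split, then combine both splits.
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead.
train_dataset["val"] = False
test_dataset["val"] = True
df = pd.concat([train_dataset, test_dataset], ignore_index=True)
dls = ImageDataLoaders.from_df(
    df, path=".",
    valid_col="val",
    drop_last=False  # 🔭🌕 make sure no data is dropped
)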
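The requirement that the labels be sorted alphabetically lines up with fastai's CategoryBlock, which sorts the unique labels by default (its sort=True argument) when building its vocabulary, so output index i corresponds to the i-th label in sorted order. A minimal sketch of deriving a matching list, using plain Python lists as hypothetical stand-ins for the "label" columns of the two dataframes:

```python
# Hypothetical label values, standing in for train_dataset["label"] and test_dataset["label"]
train_labels = ["dog", "cat", "zebra", "airplane", "dog", "horse"]
test_labels = ["cat", "horse", "airplane"]

# Sorted unique labels across both splits; position i names the model's i-th logit.
labels_list = sorted(set(train_labels) | set(test_labels))

# Map each label to the integer class index the model predicts for it.
label_to_idx = {label: i for i, label in enumerate(labels_list)}

print(labels_list)          # ['airplane', 'cat', 'dog', 'horse', 'zebra']
print(label_to_idx["dog"])  # 2
```

The resulting labels_list is what you would pass to dq.set_labels_for_run. Deriving it from the data rather than typing it by hand avoids a silent mismatch if a class is added or renamed.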

Step 2: Logging the Model Outputs

Log model outputs from your PyTorch model's forward function.
Your model must be defined as a subclass of torch.nn.Module and must execute eagerly.
To log your model outputs to Galileo, import the FastAiDQCallback from dataquality.integrations.fastai and pass the classifier layer to its layer parameter. The classifier layer is usually the model's final layer: it takes the last hidden state (the embeddings) as input and produces the classification logits as output. Here is an example:
fastai logging via callback
# Import the Galileo integration for fastai
from dataquality.integrations.fastai import FastAiDQCallback
from fastai.vision.all import *
# 'beit_base_patch16_224' is a timm architecture, so timm must be installed
learn = vision_learner(dls,
                       'beit_base_patch16_224',
                       metrics=error_rate)
dqc = FastAiDQCallback(
    # Optionally pass in the classification layer, for example:
    # layer=learn.model[0].model.fc_norm
)  # 🔭🌕 Galileo logging
learn.add_cb(dqc)  # 🔭🌕 Galileo logging
learn.fine_tune(2)
Note: If no layer is passed to the callback, the last layer is used.
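To make the role of the layer parameter concrete, the plain-Python sketch below runs an input through a list of layers and records the chosen layer's input (the embeddings) and output (the logits), defaulting to the last layer when none is given. This is not how FastAiDQCallback is implemented; in PyTorch this kind of capture is commonly done with nn.Module.register_forward_hook, and whether the callback works that way is an assumption. The function forward_and_capture and the toy layers are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def forward_and_capture(
    layers: Sequence[Layer], x: Vector, layer_idx: Optional[int] = None
) -> Tuple[Vector, Dict[str, Vector]]:
    """Run x through layers in order, recording the chosen layer's input and output.

    If layer_idx is None, the last layer is used, mirroring the default noted above.
    """
    target = len(layers) - 1 if layer_idx is None else layer_idx
    captured: Dict[str, Vector] = {}
    for i, layer in enumerate(layers):
        out = layer(x)
        if i == target:
            captured["embeddings"] = list(x)  # input to the selected layer
            captured["logits"] = list(out)    # output of the selected layer
        x = out
    return x, captured


# Hypothetical toy model: a ReLU feature layer followed by a 3-class linear classifier.
def features(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def classifier(v: Vector) -> Vector:
    weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


output, captured = forward_and_capture([features, classifier], [2.0, -1.0])
print(captured["embeddings"])  # [2.0, 0.0]
print(captured["logits"])      # [2.0, 0.0, 2.0]
```

Selecting a different index (for example layer_idx=0) records that layer's input and output instead, which is the same trade-off you make when passing a specific layer to the callback.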

Example Notebooks