Model Monitoring and Data Drift

Once your model is in production, it is essential to monitor its health:

Is there drift between training and production data? Which unlabeled data should I select for my next training run? Is model confidence dropping on an existing class in production? ...

To answer the above questions and more with Galileo, you will need:

  1. Your unlabeled production data

  2. Your model

⚡️ Simply run an inference job on production data to view, inspect and select samples directly in the Galileo UI.

Here is what to expect:

• Get the list of drifted data samples out of the box

• Get the list of on-the-class-boundary samples out of the box

• Quickly compare model confidence and class distributions between production and training runs

• Find samples similar to low-confidence production data in under a second

... and a lot more
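
As a rough illustration of the comparison mentioned in the third bullet, the sketch below contrasts predicted-class distributions and mean confidence between two runs. It is not how Galileo computes drift; the helper names (`class_distribution`, `total_variation_distance`, `mean_confidence`) and the sample predictions are made up for this example.

```python
from collections import Counter


def class_distribution(predictions):
    """Return the fraction of samples predicted for each class."""
    counts = Counter(label for label, _ in predictions)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def total_variation_distance(dist_a, dist_b):
    """Half the L1 distance between two distributions (0 = identical, 1 = disjoint)."""
    labels = set(dist_a) | set(dist_b)
    return 0.5 * sum(abs(dist_a.get(l, 0.0) - dist_b.get(l, 0.0)) for l in labels)


def mean_confidence(predictions):
    """Average confidence of the predicted class across samples."""
    return sum(conf for _, conf in predictions) / len(predictions)


# Hypothetical (predicted_label, confidence) pairs for each run
training_preds = [("positive", 0.95), ("negative", 0.90), ("positive", 0.92), ("negative", 0.88)]
production_preds = [("positive", 0.70), ("positive", 0.65), ("positive", 0.80), ("negative", 0.60)]

shift = total_variation_distance(
    class_distribution(training_preds), class_distribution(production_preds)
)
confidence_drop = mean_confidence(training_preds) - mean_confidence(production_preds)
print(f"class distribution shift: {shift:.2f}, confidence drop: {confidence_drop:.2f}")
```

A large shift or a noticeable confidence drop is a hint to inspect the production samples more closely, not proof of drift on its own.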

Full Walkthrough Tutorial

Follow our example notebook with PyTorch or read the full tutorial below.

After building and training a model, inference lets us run it on unseen data, for example once the model is deployed in production. In text classification, given a set of unseen documents, the task is to predict (as accurately as possible) the class of each document based on the data seen during training.

text = "Perfectly works fine after 10 years, would highly recommend. Great buy!!"
# The true label is unknown at inference time
model.predict(text)  # --> "positive review"
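
To make the example above more concrete: a classifier typically produces one logit per class, and the predicted label is the class with the highest softmax probability. The sketch below shows that step in plain Python. The `predict_label` helper, the label names, and the logit values are illustrative assumptions, not part of the Galileo API.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels):
    """Return (label, confidence) for the highest-probability class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


labels = ["negative review", "positive review"]  # hypothetical label order
logits = [-1.2, 2.3]  # made-up model output for one document

label, confidence = predict_label(logits, labels)
print(label, round(confidence, 3))
```

The confidence returned here is the quantity that monitoring tracks over time: a drop on a given class in production suggests the model is seeing inputs unlike its training data.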

Logging the Data Inputs

Log your inference dataset. Galileo will join these samples with the model's outputs and present them in the Console. Note that unlike training, where ground truth labels are present for validation, during inference we assume that no ground truth labels exist.
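
The inference data only needs a unique id and the raw text for each sample. A minimal sketch of assembling such records follows. `build_inference_rows` is a hypothetical helper, and the `id` and `text` column names are taken from the dataset class used in this tutorial.

```python
def build_inference_rows(texts, start_id=0):
    """Pair each non-empty document with a unique integer id; no label column is created."""
    rows = []
    next_id = start_id
    for text in texts:
        cleaned = text.strip()
        if not cleaned:
            continue  # skip blank documents rather than logging empty samples
        rows.append({"id": next_id, "text": cleaned})
        next_id += 1
    return rows


production_texts = [
    "Perfectly works fine after 10 years, would highly recommend. Great buy!!",
    "   ",
    "Stopped charging after a week.",
]
rows = build_inference_rows(production_texts)
print(rows)
```

Wrapping the result with `pd.DataFrame(rows)` gives a frame with `id` and `text` columns that can serve as the `inference_df` used later in this tutorial.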

import torch
import dataquality as dq
import pandas as pd
from transformers import AutoTokenizer

class InferenceTextDataset(torch.utils.data.Dataset):
    def __init__(self, dataset: pd.DataFrame, inference_name: str):
        self.dataset = dataset

        # 🔭🌕 Galileo logging
        # Note 1: this works seamlessly because self.dataset has text and
        # id columns (no label column is needed for inference).
        # See `help(dq.log_dataset)` for more info
        # Note 2: We can set the inference_name for our run
        dq.log_dataset(self.dataset, split="inference", inference_name=inference_name)

        tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
        self.encodings = tokenizer(
            self.dataset["text"].tolist(), truncation=True, padding=True
        )

    def __getitem__(self, idx):
        x = torch.tensor(self.encodings["input_ids"][idx])
        attention_mask = torch.tensor(self.encodings["attention_mask"][idx])

        return self.dataset["id"][idx], x, attention_mask

    def __len__(self):
        return len(self.dataset)

Logging the Inference Model Outputs

Log model outputs from within your model's forward function.

import torch
import torch.nn.functional as F
from torch.nn import Linear
from transformers import AutoModel

class TextClassificationModel(torch.nn.Module):
    """Defines a PyTorch text classification BERT-based model."""

    def __init__(self, num_labels: int):
        super().__init__()  # required before registering submodules
        self.feature_extractor = AutoModel.from_pretrained("distilbert-base-uncased")
        self.classifier = Linear(self.feature_extractor.config.hidden_size, num_labels)

    def forward(self, x, attention_mask, ids):
        """Model forward function."""
        encoded_layers = self.feature_extractor(
            input_ids=x, attention_mask=attention_mask
        ).last_hidden_state
        # Use the hidden state of the first ([CLS]) token as the document embedding
        classification_embedding = encoded_layers[:, 0]
        logits = self.classifier(classification_embedding)

        # 🔭🌕 Galileo logging
        dq.log_model_outputs(
            embs=classification_embedding, logits=logits, ids=ids
        )

        return logits

Putting it all together

Log in and initialize a project and run. Use a new run name, or reuse the name of an existing training run to add the inference results to that run in the Console. Then load and log your inference dataset, load your pre-trained model, set the split to inference, run the model over the data, and finally call dq.finish().

Note: If you're extending an existing training run, the list_of_labels logged for your dataset must exactly match the labels used during training, in the same order.
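
One way to guard against a label mismatch before logging is a small check like the one below. It is only a sketch: `check_labels_match` and the example label lists are hypothetical, and how you retrieve the labels used in training depends on your own setup.

```python
def check_labels_match(training_labels, inference_labels):
    """Raise ValueError unless both lists hold the same labels in the same order."""
    if list(training_labels) == list(inference_labels):
        return
    if sorted(training_labels) == sorted(inference_labels):
        raise ValueError("Labels match but are ordered differently from training")
    missing = set(training_labels) - set(inference_labels)
    extra = set(inference_labels) - set(training_labels)
    raise ValueError(f"Label sets differ: missing={sorted(missing)}, extra={sorted(extra)}")


training_labels = ["negative", "neutral", "positive"]  # hypothetical training order
check_labels_match(training_labels, ["negative", "neutral", "positive"])  # passes silently

try:
    check_labels_match(training_labels, ["positive", "neutral", "negative"])
except ValueError as err:
    print(err)
```

Order matters because each logit position is interpreted by index, so a reordered list would silently relabel every prediction.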

import numpy as np
import io
import random
from smart_open import open as smart_open
import s3fs
import torch
import torch.nn.functional as F
import torchmetrics
from tqdm.notebook import tqdm


# 🔭🌕 Galileo logging - log in, then initialize project/run name
dq.login()
dq.init(task_type="text_classification", project_name=project_name, run_name=run_name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

inference_dataset = InferenceTextDataset(inference_df, inference_name="inference_run_1")

# 🔭🌕 Galileo logging
# Note: if you are adding the inference run to a previous
# training run, the labels and their order must match those used
# in training. If you're logging inference in isolation then
# this order does not matter.
list_of_labels = ["labels", "ordered", "from", "training"]
dq.set_labels_for_run(list_of_labels)

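# batch_size is an illustrative value; shuffling is unnecessary for inference
inference_dataloader = torch.utils.data.DataLoader(
    inference_dataset, batch_size=32, shuffle=False
)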

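# Load your pre-trained model
model_path = "path/to/your/"
model = TextClassificationModel(num_labels=len(list_of_labels))
# Assumes model_path points to a saved state_dict for this architecture
model.load_state_dict(torch.load(model_path, map_location=device))
model.to(device)
model.eval()  # disable dropout for inference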


# 🔭🌕 Galileo logging - naming your inference run
inference_name = "inference_run_1"
dq.set_split("inference", inference_name)

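with torch.no_grad():  # gradients are not needed for inference
    for data in tqdm(inference_dataloader):
        x_idxs, x, attention_mask = data
        x = x.to(device)
        attention_mask = attention_mask.to(device)

        # The forward pass logs embeddings and logits to Galileo
        model(x, attention_mask, x_idxs)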

print("Finished Inference")

# 🔭🌕 Galileo logging
dq.finish()
print("Finished uploading")

To learn more about Data Drift, Class Boundary Detection or other Model Monitoring features, check out the Galileo Product Features Guide.
