🏃 Dataquality Integrations
Galileo offers a number of custom framework integrations to help you hook Galileo into your training flow.
If you have a well-trained model and want to understand its performance on your data, or you are looking to deploy an existing model and monitor it with Galileo, our custom framework integrations allow you to quickly find and fix data errors across your model's lifecycle.

The easiest way to get started is by simply providing a pandas DataFrame with your data. Our autoML solution does the rest! See dq.auto for more details.
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
import dataquality as dq

# Load the newsgroups dataset from sklearn
newsgroups_train = fetch_20newsgroups(subset='train')
newsgroups_test = fetch_20newsgroups(subset='test')

# Convert to pandas dataframes
df_train = pd.DataFrame({"text": newsgroups_train.data, "label": newsgroups_train.target})
df_test = pd.DataFrame({"text": newsgroups_test.data, "label": newsgroups_test.target})

dq.auto(
    train_data=df_train,
    test_data=df_test,
    labels=newsgroups_train.target_names,
    project_name="newsgroups_work",
    run_name="run_1_raw_data",
)
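Note: the dataquality client needs to be able to reach and authenticate against your Galileo console before dq.auto (or any of the integrations below) can log a run. A minimal sketch, assuming the standard GALILEO_* environment variables; the URL and credentials here are placeholders for your own deployment and account:

import os
import dataquality as dq

# Placeholders -- point these at your own Galileo console and account
os.environ["GALILEO_CONSOLE_URL"] = "https://console.your-galileo-deployment.com"
os.environ["GALILEO_USERNAME"] = "you@example.com"
os.environ["GALILEO_PASSWORD"] = "your-password"

# Authenticate the client against the console configured above
dq.login()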
The dq.auto call automatically trains a model on your data for text classification to provide your insights. dataquality also supports integrations with many different frameworks, such as Keras, TensorFlow, PyTorch, or Transformers; see below for a more exhaustive list. Here is a very short example of how to hook dataquality into your model.

import dataquality as dq
with dq(
    model,
    labels=labels,
    train_data=train_dataset,
    test_data=test_dataset,
    task="text_classification",
):
    for epoch in range(epochs):
        dq.set_split("train")
        dq.set_epoch(epoch)
        train()

        dq.set_split("test")
        test()
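In this loop, train() and test() stand in for your own training and evaluation functions. If your setup logs model outputs manually rather than through an automatic framework hook, each step would record per-sample embeddings, logits, and ids with dq.log_model_outputs. A minimal sketch of such a PyTorch training step, assuming a train_loader that yields batches with input_ids, attention_mask, label, and id fields, and a model that returns logits plus an embedding per sample:

import torch
import dataquality as dq

def train():
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        # Assumed model interface: returns (logits, per-sample embeddings)
        logits, embeddings = model(batch["input_ids"], batch["attention_mask"])
        loss = torch.nn.functional.cross_entropy(logits, batch["label"])
        loss.backward()
        optimizer.step()

        # Log per-sample outputs so Galileo can compute data quality metrics;
        # ids must line up with the ids of the samples logged for this split
        dq.log_model_outputs(
            embs=embeddings.detach().cpu().numpy(),
            logits=logits.detach().cpu().numpy(),
            ids=batch["id"].tolist(),
        )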
Here is another example of how to hook dataquality into your HuggingFace/Transformers model using transformers.Trainer.

from transformers import AutoModelForSequenceClassification
from transformers import Trainer
import dataquality

model_checkpoint = "distilbert-base-uncased"
num_labels = 2

# encoded_train_dataset, encoded_test_dataset, tokenizer, compute_metrics,
# args_default, and labels are assumed to be prepared beforehand
# (see the sketch after this example)
train_dataset = encoded_train_dataset
test_dataset = encoded_test_dataset

model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

trainer = Trainer(
    model,
    args_default,
    train_dataset=encoded_train_dataset,
    eval_dataset=encoded_test_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

with dataquality(
    trainer,
    labels=labels,
    train_data=train_dataset,
    test_data=test_dataset,
    task="text_classification",
):
    trainer.train()
    trainer.evaluate()
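The Trainer example above assumes the datasets are already tokenized and the training arguments are defined. Here is a sketch of how those placeholders (encoded_train_dataset, encoded_test_dataset, tokenizer, compute_metrics, args_default, labels) could be prepared; the imdb dataset, the TrainingArguments values, and the accuracy metric are illustrative choices, not part of the original example:

import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score
from transformers import AutoTokenizer, TrainingArguments

model_checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

# Illustrative binary text classification dataset (matches num_labels = 2 above)
raw_datasets = load_dataset("imdb")
labels = ["neg", "pos"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

encoded_train_dataset = raw_datasets["train"].map(tokenize, batched=True)
encoded_test_dataset = raw_datasets["test"].map(tokenize, batched=True)
# Depending on your dataquality version, each sample may also need a unique "id" column.

def compute_metrics(eval_pred):
    logits, label_ids = eval_pred
    preds = np.argmax(logits, axis=1)
    return {"accuracy": accuracy_score(label_ids, preds)}

args_default = TrainingArguments(
    output_dir="finetuned-model",
    evaluation_strategy="epoch",
    num_train_epochs=2,
)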