Logging Data to Galileo
The fastest way to find data errors in Galileo
When focusing on data-centric techniques for modeling, we believe it is important to focus on the data while keeping the model static. To enable this rapid workflow, we suggest you use the dq.auto workflow.
After installing dataquality:
pip install dataquality
You simply add your data and wait for the model to train under the hood and for Galileo to process the data. This processing can take between 5 and 15 minutes, depending on how much data you have. dq.auto will wait until Galileo is completely done processing your data. At that point, you can go to the Galileo Console and begin inspecting.
import dataquality as dq
# train_df, val_df and test_df are your own dataframes (pandas or huggingface)
dq.auto(train_data=train_df, val_data=val_df, test_data=test_df)
There are 3 general ways to use auto:
- Pass dataframes (pandas or huggingface) to train_data, val_data and test_data
- Pass paths to local files to train_data, val_data and test_data
- Pass a path to a huggingface Dataset to the hf_data parameter
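As a hedged illustration of the local-files option above, the sketch below writes three small CSV files with Python's standard library and collects their paths into keyword arguments for dq.auto. The text and label column names, the choice of CSV as the file format, and the helper names write_split and build_auto_kwargs are assumptions made for this example, not taken from this page; check help(dq.auto) for the schema and file types it actually expects. The dq.auto call itself is left as a comment because it needs a Galileo login and blocks until processing finishes.

```python
import csv
import tempfile
from pathlib import Path


def write_split(path, rows):
    """Write (text, label) rows to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])  # assumed column names
        writer.writerows(rows)
    return str(path)


def build_auto_kwargs(data_dir, splits):
    """Write each split to <data_dir>/<name>.csv and map it to its dq.auto parameter."""
    data_dir = Path(data_dir)
    kwargs = {}
    for name, rows in splits.items():
        # train -> train_data, val -> val_data, test -> test_data
        kwargs[f"{name}_data"] = write_split(data_dir / f"{name}.csv", rows)
    return kwargs


# Toy data, purely illustrative
splits = {
    "train": [("great product", "positive"), ("arrived broken", "negative")],
    "val": [("works fine", "positive")],
    "test": [("never again", "negative")],
}

data_dir = tempfile.mkdtemp()
auto_kwargs = build_auto_kwargs(data_dir, splits)

# With dataquality installed and a Galileo login configured:
# import dataquality as dq
# dq.auto(**auto_kwargs)
```

Building the keyword arguments separately from the call keeps the file preparation easy to check before starting a run that may take several minutes.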
dq.auto supports both Text Classification and Named Entity Recognition tasks, with Multi-Label support coming soon. dq.auto automatically determines the task type based on the provided data schema.
To see the other available parameters as well as more usage examples, see
help(dq.auto)
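To make the idea of schema-based task detection concrete, here is a purely illustrative sketch. It is not Galileo's implementation: the function infer_task_type and the column names it checks (text and label for Text Classification, tokens plus tags or ner_tags for Named Entity Recognition) are assumptions chosen for the example.

```python
def infer_task_type(columns):
    """Guess a task type from column names (hypothetical rules, not Galileo's)."""
    cols = set(columns)
    # Token-level columns suggest Named Entity Recognition
    if "tokens" in cols and cols & {"tags", "ner_tags"}:
        return "named_entity_recognition"
    # One text and one label per row suggests Text Classification
    if {"text", "label"} <= cols:
        return "text_classification"
    raise ValueError(f"Cannot infer a task type from columns: {sorted(cols)}")


tc_task = infer_task_type(["text", "label"])
ner_task = infer_task_type(["tokens", "ner_tags"])
```

Raising an error on an unrecognized schema, rather than silently picking a default, makes a mislabeled column surface before any training starts.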
Use auto if:
- You are looking to apply the most data-centric techniques to improve your data
- You don’t yet have a model to train
- You want to agnostically understand and fix your available training data
If you have a well-trained model and want to understand its performance on your data, or you are looking to deploy an existing model and monitor it with Galileo, please use our custom framework integrations.