Logging Data to Galileo
The fastest way to find data errors in Galileo
When applying data-centric techniques to modeling, we believe it is important to focus on the data while keeping the model static. To enable this rapid workflow, we suggest you use dq.auto.
After installing dataquality:
pip install dataquality
Simply add your data, then wait for the model to train under the hood and for Galileo to process the data. Processing can take 5 to 15 minutes, depending on how much data you have.
auto will wait until Galileo has finished processing your data. At that point, you can go to the Galileo Console and begin inspecting.
import dataquality as dq
dq.auto(train_data=train_df, val_data=val_df, test_data=test_df)
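The call above assumes that three data splits already exist. As a minimal sketch of one way to produce them, the helper below shuffles a list of labeled records and cuts it into train, validation, and test lists using only the standard library. The helper name split_records, the split fractions, and the "text" and "label" keys are assumptions made for illustration, not part of the dataquality API.

```python
import random


def split_records(records, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle records and split them into train, validation, and test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to less than 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


# Toy records; the "text" and "label" keys are an assumed schema.
records = [
    {"text": f"example sentence {i}", "label": "pos" if i % 2 else "neg"}
    for i in range(20)
]
train, val, test = split_records(records)
```

Each resulting list of dicts can be converted with pd.DataFrame(train) (and likewise for val and test) before being passed as train_data, val_data, and test_data.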
There are 3 general ways to use dq.auto:
- Pass dataframes to train_data, val_data, and test_data (pandas or huggingface)
- Pass paths to local files to
- Pass a path to a huggingface Dataset to the
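For the local-file option, the sketch below writes two small splits to CSV files with the standard library. The write_split helper, the file names, the CSV format, and the "text" and "label" columns are all assumptions for illustration. The commented dq.auto call at the end assumes the same keyword arguments as the dataframe example accept file paths.

```python
import csv
import tempfile
from pathlib import Path


def write_split(path, rows, fieldnames=("text", "label")):
    """Write one data split to a CSV file with a header row and return its path."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(rows)
    return str(path)


# Hypothetical toy data written to a temporary directory.
out_dir = Path(tempfile.mkdtemp())
train_path = write_split(
    out_dir / "train.csv",
    [
        {"text": "great product", "label": "positive"},
        {"text": "arrived broken", "label": "negative"},
    ],
)
val_path = write_split(
    out_dir / "val.csv",
    [{"text": "works as described", "label": "positive"}],
)

# Assumed usage, mirroring the dataframe form (not executed here):
# import dataquality as dq
# dq.auto(train_data=train_path, val_data=val_path)
```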
dq.auto supports both Text Classification and Named Entity Recognition tasks, with Multi-Label support coming soon.
dq.auto automatically determines the task type based on the provided data schema.
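To make the idea of schema-based task detection concrete, here is a purely illustrative sketch. It is not Galileo's implementation: the function name guess_task_type, the returned strings, and the column names ("text"/"label" for Text Classification, "tokens"/"tags" for Named Entity Recognition) are all assumptions.

```python
def guess_task_type(columns):
    """Guess a task type from column names (hypothetical rule, for illustration only)."""
    cols = set(columns)
    # Token-level data with per-token tags suggests Named Entity Recognition.
    if {"tokens", "tags"} <= cols:
        return "named_entity_recognition"
    # One text field with one label per row suggests Text Classification.
    if {"text", "label"} <= cols:
        return "text_classification"
    raise ValueError(f"Cannot infer a task type from columns: {sorted(cols)}")


tc_task = guess_task_type(["id", "text", "label"])
ner_task = guess_task_type(["tokens", "tags"])
```

A rule like this only inspects column names, so a dataset whose columns match neither pattern raises an error instead of being assigned a task silently.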
To see the other available parameters as well as more usage examples, see
dq.auto is a good fit if:
- You are looking to apply the most data-centric techniques to improve your data
- You don’t yet have a model to train
- You want to agnostically understand and fix your available training data
If you have a well-trained model and want to understand its performance on your data, or you are looking to deploy an existing model and monitor it with Galileo, please use our custom framework integrations.