spaCy
The Galileo spaCy integration supports only Named Entity Recognition (NER) tasks.
```python
import dataquality as dq

# 🔭🌕 Galileo logging - initialize project/run name
dq.login()
dq.init(task_type="text_ner", project_name="example_project", run_name="example_run")
```
Log a human-readable version of your dataset. Galileo will join these samples with the model's outputs and present them in the Console.
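```python
from dataquality.integrations.spacy import log_input_examples

# Create spaCy examples from your data
# type(train_examples[0]) == spacy.training.example.Example
# 🔭🌕 Galileo logging
log_input_examples(train_examples, "training")
# Log the evaluation split too, since the training loop below switches to "test"
log_input_examples(test_examples, "test")
```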
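spaCy's `Example.from_dict(doc, {"entities": [...]})` accepts entity annotations as `(start_char, end_char, label)` tuples, and misaligned offsets are a common cause of broken `train_examples`. The sketch below is a stdlib-only pre-check, assuming your raw data is a list of `(text, entities)` pairs in that tuple format; `raw_data` and `check_entity_offsets` are hypothetical names, not part of spaCy or Galileo. It only checks bounds, overlaps, and stray whitespace; spaCy additionally requires spans to align with token boundaries.

```python
from typing import List, Tuple

# Hypothetical raw NER data: (text, [(start_char, end_char, label), ...])
Entity = Tuple[int, int, str]
raw_data: List[Tuple[str, List[Entity]]] = [
    ("Apple is opening an office in Berlin.", [(0, 5, "ORG"), (30, 36, "LOC")]),
    ("Ada Lovelace wrote the first program.", [(0, 12, "PERSON")]),
]


def check_entity_offsets(text: str, entities: List[Entity]) -> List[str]:
    """Return a list of problems found in one sample's entity spans."""
    problems = []
    last_end = -1
    for start, end, label in sorted(entities):
        # Span must be non-empty and lie inside the text
        if not (0 <= start < end <= len(text)):
            problems.append(f"{label} span ({start}, {end}) is out of bounds")
            continue
        # spaCy NER entities on a Doc may not overlap
        if start < last_end:
            problems.append(f"{label} span ({start}, {end}) overlaps a previous entity")
        # Leading/trailing whitespace usually means the offsets are shifted
        if text[start:end] != text[start:end].strip():
            problems.append(f"{label} span {text[start:end]!r} has leading/trailing whitespace")
        last_end = max(last_end, end)
    return problems


for text, entities in raw_data:
    for start, end, label in entities:
        print(f"{label}: {text[start:end]!r}")
    assert not check_entity_offsets(text, entities)
```

Once a sample passes, it can be converted with something like `Example.from_dict(nlp.make_doc(text), {"entities": entities})` before being passed to `log_input_examples`.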
Initialize your model, then call our `watch` function to wrap it and auto-log. `watch(nlp)` wraps the `spacy.Language` object with Galileo logging code that instruments the metrics Galileo needs from the model.
```python
from dataquality.integrations.spacy import watch

# nlp is your spacy.Language pipeline containing an "ner" component
optimizer = nlp.initialize(lambda: train_examples + test_examples)
# 🔭🌕 Galileo wrapper
watch(nlp)
```
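This page does not describe how `watch` instruments the model internally, so the following is only a generic illustration of the wrapping idea: a hypothetical `watch_sketch` replaces a method on a hypothetical `ToyPipeline` object with a wrapper that records each call's inputs and outputs. It is not Galileo's implementation and does not use spaCy.

```python
import functools
from typing import Any, Callable, Dict, List


class ToyPipeline:
    """Hypothetical stand-in for a model object such as spacy.Language."""

    def predict(self, texts: List[str]) -> List[int]:
        # Pretend "model output": one number per text (its whitespace token count)
        return [len(t.split()) for t in texts]


def watch_sketch(model: Any, log: List[Dict[str, Any]]) -> None:
    """Wrap model.predict in place so every call's inputs and outputs are recorded."""
    original: Callable[..., Any] = model.predict

    @functools.wraps(original)
    def wrapped(*args: Any, **kwargs: Any) -> Any:
        outputs = original(*args, **kwargs)
        log.append({"args": args, "outputs": outputs})
        return outputs

    # The instance attribute shadows the class method, so only this object is patched
    model.predict = wrapped


records: List[Dict[str, Any]] = []
pipe = ToyPipeline()
watch_sketch(pipe, records)
print(pipe.predict(["hello world", "one"]))  # -> [2, 1]
```

The calling code keeps using `pipe.predict` unchanged, which is the property that makes this style of wrapping convenient for auto-logging.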
Now you are ready to train your model! Log where you are in the training pipeline (current epoch and split); behind the scenes, Galileo tracks each stage of training and combines your model outputs with your logged input data.
```python
...
for epoch in range(num_epochs):
    # 🔭🌕 Galileo logging
    dq.set_epoch(epoch)

    # Training epoch
    dq.set_split("training")  # 🔭🌕 Galileo logging
    nlp.update(...)

    # Evaluation
    dq.set_split("test")  # 🔭🌕 Galileo logging
    nlp.evaluate(...)
...
```
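To make the epoch/split bookkeeping concrete, here is a toy model of it: outputs are stored under whichever epoch and split are currently set, then joined back to the logged inputs by sample id. `RunLoggerSketch` and all of its methods are hypothetical and only mirror the shape of `dq.set_epoch`/`dq.set_split`; they are not Galileo's internals.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class RunLoggerSketch:
    """Hypothetical logger: tags outputs with the current epoch/split, then joins on sample id."""

    def __init__(self) -> None:
        self.inputs: Dict[Tuple[str, int], str] = {}  # (split, sample id) -> text
        self.outputs: Dict[Tuple[int, str], Dict[int, List[str]]] = defaultdict(dict)
        self.epoch = 0
        self.split = "training"

    def log_inputs(self, texts: List[str], split: str) -> None:
        for i, text in enumerate(texts):
            self.inputs[(split, i)] = text

    def set_epoch(self, epoch: int) -> None:
        self.epoch = epoch

    def set_split(self, split: str) -> None:
        self.split = split

    def log_outputs(self, ids: List[int], preds: List[List[str]]) -> None:
        # Outputs are filed under whatever epoch/split is currently set
        stage = self.outputs[(self.epoch, self.split)]
        for i, pred in zip(ids, preds):
            stage[i] = pred

    def joined(self, epoch: int, split: str) -> List[Tuple[str, List[str]]]:
        # .get avoids creating empty stages in the defaultdict
        stage = self.outputs.get((epoch, split), {})
        return [(self.inputs[(split, i)], pred) for i, pred in sorted(stage.items())]


logger = RunLoggerSketch()
logger.log_inputs(["Ada met Bob", "Paris"], "training")
logger.log_inputs(["Rome"], "test")
for epoch in range(2):
    logger.set_epoch(epoch)
    logger.set_split("training")
    logger.log_outputs([0, 1], [["PERSON", "O", "PERSON"], ["LOC"]])
    logger.set_split("test")
    logger.log_outputs([0], [["LOC"]])

print(logger.joined(1, "test"))  # -> [('Rome', ['LOC'])]
```

Because the stage is implicit state rather than an argument to every logging call, the training loop only needs one `set_epoch`/`set_split` call per stage.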
To finish, call `dq.finish()`. Your data will be uploaded to and processed by the Galileo API server; this may take a few minutes, depending on the size of your dataset.
```python
dq.finish()  # 🔭🌕 This will wait until the run is processed by Galileo
```