
John Snow Labs

Effortless PySpark support
The dataquality package offers a low-code integration for logging NER data from John Snow Labs (JSL) pipelines.
Follow these steps to log your data to the Galileo console:
Step 1: Create a project and watch your NER pipeline
You can find an example pipeline among the JSL models.
from dataquality.integrations.jsl import JSLProject

# Galileo logging for John Snow Labs
project = JSLProject(
    project_name="test_project",
    run_name="ner_test_run_01",
    url="https://optional-cluster-url.com",
)
# Watch a JSL pipeline (`pipeline` is the NER pipeline you built beforehand)
project.watch(pipeline)
Step 2: Evaluate
Now provide the data: call project.evaluate once per split, passing in the DataFrame and the name of the split.
from sparknlp.training import CoNLL

# `spark` is an active Spark session with Spark NLP loaded
train_df = CoNLL().readDataset(spark, './eng.train')
test_df = CoNLL().readDataset(spark, './eng.test')
project.evaluate(train_df, split="training")
project.evaluate(test_df, split="test")
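Before handing a file such as ./eng.train to the reader, it can help to check what it contains. The following is a minimal sketch using only the Python standard library. It assumes the usual CoNLL-2003 layout (one token per line, the NER tag in the last column, blank lines between sentences, -DOCSTART- lines between documents). The helper names read_conll_sentences and tag_counts and the sample text are hypothetical and not part of dataquality or Spark NLP.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, NER tag) pairs


def read_conll_sentences(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-2003 style lines into sentences of (token, tag) pairs."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Blank lines end a sentence; -DOCSTART- lines mark document boundaries.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        # Assumption: first column is the token, last column is the NER tag.
        current.append((fields[0], fields[-1]))
    if current:
        sentences.append(current)
    return sentences


def tag_counts(sentences: Iterable[Sentence]) -> Counter:
    """Count how often each NER tag occurs across all sentences."""
    return Counter(tag for sentence in sentences for _, tag in sentence)


# Hypothetical sample in the assumed layout; replace with open("./eng.train").
sample = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
""".splitlines()

sentences = read_conll_sentences(sample)
print(len(sentences), tag_counts(sentences))
```

A heavily skewed tag distribution or an unexpected sentence count at this stage usually points to a formatting problem in the file rather than in the pipeline.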
Step 3: Finish
That's it. Afterwards, call project.finish() to finalize the run.
project.finish()
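The three steps always run in the same order, so they can be wrapped in a small helper. This is only a sketch: log_ner_run and RecordingProject are hypothetical names, and the helper assumes nothing beyond the watch, evaluate, and finish calls shown in the steps above. The stand-in class exists so the example runs without a Spark session or a Galileo login.

```python
from typing import Any, Mapping


def log_ner_run(project: Any, pipeline: Any, splits: Mapping[str, Any]) -> None:
    """Watch the pipeline, evaluate each split in order, then finish the run.

    `project` is assumed to expose watch(), evaluate(data, split=...) and
    finish(), matching the calls used in Steps 1 to 3.
    """
    project.watch(pipeline)
    for split_name, dataframe in splits.items():
        project.evaluate(dataframe, split=split_name)
    project.finish()


class RecordingProject:
    """Hypothetical stand-in for a real project; it only records calls."""

    def __init__(self) -> None:
        self.calls = []

    def watch(self, pipeline):
        self.calls.append(("watch", pipeline))

    def evaluate(self, data, split):
        self.calls.append(("evaluate", split))

    def finish(self):
        self.calls.append(("finish",))


demo = RecordingProject()
log_ner_run(
    demo,
    pipeline="my_pipeline",
    splits={"training": "train_df", "test": "test_df"},
)
print(demo.calls)
```

With real objects, the call would look like log_ner_run(project, pipeline, {"training": train_df, "test": test_df}).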
Bonus:
You can also create a project and start the evaluation in a single call. Passing finish=True calls finish automatically.
# 🔭🌕 Galileo logging
project = JSLProject(
    project_name="jsl_testrun",
    run_name="ner_test_01",
    url="console.dev.rungalileo.io",
    pipeline=pipeline,
    dataset=df,
    finish=True,
)
Notebook resources:
