Upon completing a run, you'll be taken to the Galileo Console. By default, your Training split will be shown first. You can use the dropdown on the top-right to change it. The first thing you'll notice is your dataset on the right.
By default, each row shows the Input, its Target (Expected Output), the Generated Output if available, and the Data Error Potential (DEP) of the sample. The samples are sorted by DEP, with the hardest samples first. Each Token in the Target also has its own DEP value, which is shown through highlighting.
Grid view with one row per sample.
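To make the default ordering concrete, here is a minimal sketch that sorts samples from highest to lowest DEP, as the grid does. It does not use any Galileo API: the `samples` list, its field names, and the DEP values are hypothetical and exist only for illustration.

```python
# Hypothetical rows; field names and DEP values are illustrative only,
# not an export format or API of the Galileo Console.
samples = [
    {"id": "a", "input": "first input", "target": "first target", "dep": 0.12},
    {"id": "b", "input": "second input", "target": "second target", "dep": 0.91},
    {"id": "c", "input": "third input", "target": "third target", "dep": 0.47},
]


def sort_by_dep(rows):
    """Return rows ordered from highest to lowest DEP (hardest first)."""
    return sorted(rows, key=lambda row: row["dep"], reverse=True)


hardest_first = sort_by_dep(samples)
for row in hardest_first:
    print(row["id"], row["dep"])
```

Reading such a list from the top mirrors how you would scan the grid: the samples most likely to be problematic come first.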
You can also view your samples in the embeddings space of the model. This can help you get a semantic understanding of your dataset. Using features like Color-By DEP, you might discover pockets of problematic data (e.g. decision boundaries that might benefit from more samples or a cluster of garbage samples).
Embedding view colored by DEP. The hard/problematic points will be colored in red.
The left pane is called the Insights Menu. At the top, you can see your dataset size and choose the metric that guides your exploration (F1 by default). Both the size and the metric value update as you add filters to your dataset.
Clicking on an Alert will filter the dataset to the subset of data that corresponds to the Alert.
These charts are dynamic and update as you add different filters. They are also interactive - clicking on a class or group of classes will filter the dataset accordingly, allowing you to inspect and fix the samples.
The third tab is for your Clusters. We automatically cluster your dataset, taking into account frequent words and semantic distance. For each Cluster, we show its average DEP score and its size, two factors you can use to decide which clusters are worth looking into.
Description of various clusters.
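The two per-cluster numbers described above, average DEP and size, can be illustrated with a short sketch. It assumes each sample already carries a cluster label; the `rows` data, the field names, and the ranking by average DEP are hypothetical choices for illustration and do not describe how the Console computes or orders its clusters.

```python
from collections import defaultdict

# Hypothetical samples with an assigned cluster label and a DEP value.
rows = [
    {"cluster": 0, "dep": 0.75},
    {"cluster": 0, "dep": 0.25},
    {"cluster": 1, "dep": 0.125},
    {"cluster": 1, "dep": 0.125},
    {"cluster": 1, "dep": 0.125},
]


def summarize_clusters(samples):
    """Compute size and average DEP per cluster, highest average DEP first."""
    deps_by_cluster = defaultdict(list)
    for sample in samples:
        deps_by_cluster[sample["cluster"]].append(sample["dep"])
    summary = [
        {"cluster": label, "size": len(deps), "avg_dep": sum(deps) / len(deps)}
        for label, deps in deps_by_cluster.items()
    ]
    return sorted(summary, key=lambda entry: entry["avg_dep"], reverse=True)


for entry in summarize_clusters(rows):
    print(entry)
```

A cluster that is both large and high in average DEP is usually the more rewarding one to inspect first, since fixing it affects more samples.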
Analyzing the various Clusters side-by-side with the embeddings view is often a helpful way to discover interesting pockets of data.
Clusters and Embeddings side-by-side.