The data explorer enables you to explore your pipelines and inspect the inputs and outputs of the pipeline's components. It is a helpful tool for debugging your pipeline and for getting a better understanding of the data being processed. It can also be used to compare different pipeline runs, which is useful for understanding the impact of changes to your pipeline.
The explorer consists of four main tabs: a general overview, a data explorer, an image explorer, and a numerical analysis tab.
In the general overview, you can select the pipeline and pipeline run you want to explore. You will be able to see the different components that were run in the pipeline run and get an overview of your latest runs.
The data explorer shows an interactive table of the loaded fields from a given component. In this table you can:
- Browse through different parts of the data
- Visualize images
- Search for specific rows using a search query
- Visualize long documents using a document viewer
- Compare different pipeline runs (coming soon!)
The image explorer tab lets you choose one of the image columns and analyse the images it contains.
The numerical analysis tab shows global statistics of the numerical columns of the loaded subset (mean, std, percentiles, ...).
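As an illustration of the kind of per-column statistics this tab reports (the column name and values below are made up, and the explorer does not necessarily use pandas internally):

```python
import pandas as pd

# A toy numerical column, standing in for a numerical field of a subset.
df = pd.DataFrame({"image_width": [640, 800, 1024, 1280]})

# describe() yields the same family of global statistics the tab shows:
# count, mean, std, min, 25%/50%/75% percentiles, and max.
stats = df["image_width"].describe()
print(stats["mean"])  # 936.0
```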
## How to use?
You can set up the data explorer container with the `fondant explore` CLI command, which is installed together with the Fondant Python package.
The base path can be either a local or a remote path. Make sure to pass the proper mount credentials arguments when using a remote base path, or a local base path that references remote datasets. You can do that either by mounting your default local cloud credentials to the pipeline, or by using the `--extra-volumes` flag to specify credentials or local files you need to mount.
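As a sketch of the invocations described above (the `--base_path` flag spelling, the bucket name, and the credentials paths are assumptions for illustration, not values from this guide):

```shell
# Explore a pipeline stored under a local base path.
fondant explore --base_path ./my-pipeline-data

# Explore a remote base path, mounting local GCP credentials into the
# container via --extra-volumes (host path : container path).
fondant explore \
  --base_path gs://my-bucket/my-pipeline \
  --extra-volumes ~/.config/gcloud/application_default_credentials.json:/root/.config/gcloud/application_default_credentials.json
```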
To stop the data explorer service, you can use the following commands: