Kubeflow Runner#
The Kubeflow runner leverages Kubeflow Pipelines on any Kubernetes cluster. All Fondant needs is a URL pointing to the Kubeflow Pipelines host and an object storage provider (S3, GCS, etc.) to store the data produced between pipeline steps. We have compiled some references and created some scripts to get you started with setting up the required infrastructure.
Installing the Kubeflow runner#
Make sure to install Fondant with the Kubeflow runner extra.
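A quick way to confirm that the installation worked is to check that the required packages can be found. The sketch below assumes the Kubeflow runner extra pulls in the kfp package (the Kubeflow Pipelines SDK); that package name and the helper is_installed are assumptions made for illustration and are not part of Fondant.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "kfp" is assumed to be the package installed by the Kubeflow runner extra.
    for name in ("fondant", "kfp"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If either package is reported as missing, reinstall Fondant with the Kubeflow runner extra in the same Python environment you use to launch the workflow.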
Materialize a dataset with Kubeflow#
You need a Kubeflow cluster to run your workflow on, and you must specify the host of that cluster. More information on setting up a Kubeflow Pipelines deployment and finding the host path can be found in the Kubeflow infrastructure documentation.
The dataset ref is a reference to a Fondant dataset: a module (e.g. pipeline.py) in which a dataset instance exists.
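To make the idea of a dataset ref more concrete, the sketch below shows one general way a tool could turn a file such as pipeline.py into an object: load the file as a module and return the first module-level instance of a given class. This is an illustration of the concept only, not Fondant's actual resolution logic; the function name find_instance and its behaviour are hypothetical.

```python
import importlib.util
from pathlib import Path


def find_instance(module_path: str, cls: type):
    """Load a Python file and return its first public module-level instance of cls."""
    path = Path(module_path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {module_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    for name, value in vars(module).items():
        if name.startswith("_"):
            continue  # skip module metadata such as __name__ and __file__
        if isinstance(value, cls):
            return value
    raise LookupError(f"No {cls.__name__} instance found in {module_path}")
```

The practical consequence is that the referenced file should create its dataset at module level, so that the instance exists as soon as the file is loaded.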
Once your workflow is running, you can monitor it using the Kubeflow UI.
Assigning custom resources to your run#
Each component can optionally be constrained to run on particular node(s) using node_pool_label and node_pool_name. You can find these under the Kubernetes labels of your cluster. You can use the default node label provided by Kubernetes or attach your own. Note that the value of these labels is cloud provider specific. Make sure to assign a GPU if one is required; the specified node needs to have an available GPU.
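In Kubernetes terms, a node pool label and a node pool name form a label key and value pair, which is how a pod is usually pinned to a set of nodes through a nodeSelector. The sketch below builds such a mapping. It shows the general Kubernetes mechanism rather than Fondant's internals: the helper build_node_selector and the pool name gpu-pool are hypothetical, and the label keys listed are the defaults set by the respective managed services.

```python
def build_node_selector(node_pool_label: str, node_pool_name: str) -> dict:
    """Return a Kubernetes nodeSelector mapping: label key -> required value."""
    if not node_pool_label or not node_pool_name:
        raise ValueError("Both a label key and a pool name are required")
    return {node_pool_label: node_pool_name}


# Default node pool label keys set by some managed Kubernetes services.
DEFAULT_POOL_LABELS = {
    "gke": "cloud.google.com/gke-nodepool",
    "eks": "eks.amazonaws.com/nodegroup",
    "aks": "kubernetes.azure.com/agentpool",
}

if __name__ == "__main__":
    # "gpu-pool" is a hypothetical node pool name.
    selector = build_node_selector(DEFAULT_POOL_LABELS["gke"], "gpu-pool")
    print(selector)
```

You can list the labels actually present on your nodes with kubectl get nodes --show-labels and pick the key and value that identify the pool you want.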