
Setting up Kubeflow

Introduction

To run Fondant on Kubeflow Pipelines, you will need:

  • A Kubernetes cluster
  • Kubeflow Pipelines installed on the cluster
  • A container registry to store custom component images (e.g. Docker Hub, GitHub Container Registry)

This can be any Kubernetes cluster. If you don't have access to one, or you don't feel comfortable setting one up yourself, we provide some basic scripts to get you started on GCP or, at a small scale, locally.
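Once the cluster and Kubeflow Pipelines are in place, a quick sanity check can save debugging time later. The shell function below is a hypothetical helper, not part of Fondant or Kubeflow. It assumes the default standalone install, where the Kubeflow Pipelines API server runs as the `ml-pipeline` deployment in the `kubeflow` namespace (override with `KFP_NAMESPACE`), and it does not check the container registry.

```shell
# Hypothetical helper (not part of Fondant): check that kubectl can reach a
# cluster and that a Kubeflow Pipelines API server is deployed on it.
# Assumes the standalone install layout: deployment "ml-pipeline" in the
# "kubeflow" namespace. Set KFP_NAMESPACE to use another namespace.
check_kfp_prerequisites() {
  local namespace="${KFP_NAMESPACE:-kubeflow}"

  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl is not installed" >&2
    return 1
  fi
  # cluster-info fails if the current kubeconfig context is unreachable.
  if ! kubectl cluster-info >/dev/null 2>&1; then
    echo "kubectl cannot reach a cluster; check your kubeconfig context" >&2
    return 1
  fi
  if ! kubectl get deployment ml-pipeline -n "$namespace" >/dev/null 2>&1; then
    echo "no ml-pipeline deployment found in namespace '$namespace'" >&2
    return 1
  fi
  echo "cluster reachable and Kubeflow Pipelines found in '$namespace'"
}
```

Run `check_kfp_prerequisites` after installing Kubeflow Pipelines; it prints a short confirmation, or reports which check failed and returns a non-zero status.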

IMPORTANT

  • These scripts only serve as a kickstart to help you set up Kubeflow for running Fondant; they are not production-ready environments.
  • Spinning up a cluster on a cloud vendor will incur costs.
  • Never run a script without inspecting it: familiarize yourself with the commands defined in the Makefiles and adapt them to your own needs.
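In line with the advice above, you can preview what a Makefile target would do before running it: GNU make's `-n` (dry-run) flag prints the recipe commands without executing them. The wrapper name `preview_make_target` is hypothetical.

```shell
# Print the commands a Makefile target would run, without executing them.
# MAKEFILE is required; TARGET is optional (defaults to the first target).
preview_make_target() {
  local makefile="${1:?usage: preview_make_target MAKEFILE [TARGET]}"
  local target="${2:-}"
  # ${target:+"$target"} expands to nothing when no target was given.
  make -n -f "$makefile" ${target:+"$target"}
}
```

For example, `preview_make_target gcp.mk` (run from the folder containing the Makefile) previews the default target. A dry run is not a full sandbox: make still evaluates `$(shell ...)` while reading the Makefile, and recipe lines that use `$(MAKE)` or start with `+` are executed even with `-n`.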

If you already have a Kubernetes cluster

If you already have a Kubernetes cluster set up and kubectl configured to access it, you can install Kubeflow Pipelines by following this guide.
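If you prefer the terminal, the standalone install in the Kubeflow Pipelines documentation comes down to a few `kubectl apply -k` calls against the kustomize manifests in the kubeflow/pipelines repository. The sketch below wraps them in a function. The default version `2.0.5` and the `env/dev` overlay are assumptions based on the 2.x guide; check the guide for the current release and the overlay that fits your cluster.

```shell
# Sketch of the kustomize-based standalone install of Kubeflow Pipelines.
# The default version is an assumed example; pass the release you want.
install_kfp_standalone() {
  local version="${1:-2.0.5}"
  local manifests="github.com/kubeflow/pipelines/manifests/kustomize"

  # Cluster-scoped resources (CRDs, namespace) have to exist first.
  kubectl apply -k "${manifests}/cluster-scoped-resources?ref=${version}" || return 1
  # Wait until the Application CRD is registered before applying the rest.
  kubectl wait --for condition=established --timeout=60s \
    crd/applications.app.k8s.io || return 1
  # Namespaced Kubeflow Pipelines components (assumed "env/dev" overlay).
  kubectl apply -k "${manifests}/env/dev?ref=${version}"
}
```

For example, `install_kfp_standalone 2.0.5`. The pods can take a few minutes to become ready; `kubectl get pods -n kubeflow` shows their status.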

Kubeflow on AWS

There are multiple guides on how to set up Kubeflow Pipelines on AWS:

Fondant needs the host URL of Kubeflow Pipelines, which you can fetch in different ways depending on your setup.
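For a standalone Kubeflow Pipelines install, one common option is to port-forward the pipelines UI service to your machine and use the local address as the host. The sketch assumes the service is named `ml-pipeline-ui` in the `kubeflow` namespace and listens on port 80, as in the standalone manifests. A full Kubeflow deployment on AWS usually exposes pipelines through its own ingress instead, so follow your guide there.

```shell
# Hypothetical helper: forward the (assumed) ml-pipeline-ui service in the
# kubeflow namespace to a local port and print the resulting host URL.
forward_kfp_ui() {
  local port="${1:-8080}"
  # Runs in the background and stays up until the process is killed.
  kubectl port-forward -n kubeflow svc/ml-pipeline-ui "${port}:80" >/dev/null 2>&1 &
  echo "http://localhost:${port}"
}
```

Run `forward_kfp_ui` (or `forward_kfp_ui 9000` for another local port) and use the printed URL as the host. The port-forward keeps running in the background until you stop it, for example with `kill %1` if it is your only background job.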

The BASE_PATH, where Fondant stores data artifacts, can be an S3 bucket.
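As a sketch, assuming you use the AWS CLI, the hypothetical helper below creates a bucket and prints an `s3://` URI under it to use as BASE_PATH. The bucket name, the default region `eu-west-1`, and the `fondant-artifacts` prefix are placeholders.

```shell
# Hypothetical helper: create an S3 bucket and print a BASE_PATH under it.
# Region and prefix defaults are placeholders; adjust them to your setup.
create_s3_base_path() {
  local bucket="${1:?usage: create_s3_base_path BUCKET [REGION] [PREFIX]}"
  local region="${2:-eu-west-1}"
  local prefix="${3:-fondant-artifacts}"
  # Send the CLI's own output to stderr so stdout only carries the URI.
  aws s3 mb "s3://${bucket}" --region "$region" >&2 || return 1
  echo "s3://${bucket}/${prefix}"
}
```

For example, `BASE_PATH=$(create_s3_base_path my-fondant-bucket)`. Bucket names must be globally unique, and the pipeline pods also need permission to read and write the bucket; how you grant that depends on your setup.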

Kubeflow on Google Cloud

There are several ways to get up and running with Kubeflow Pipelines on Google Cloud.

Alternatively, use the scripts we provide to get a simple setup going

  1. If you don't already have a Google Cloud project, you can follow this guide to set one up. You will need to have billing set up.

  2. Make sure you have the latest version of the gcloud CLI installed and configured to use your project, by running gcloud init.

  3. Set up a default compute region and zone.

  4. Install kubectl.

  5. Run the gcp.mk Makefile (located in the scripts/ folder), which will do the following:

      • Set up all the GCP services needed
      • Start a GKE cluster
      • Create a Google Cloud Storage bucket for data artifact storage
      • Authenticate the local machine
      • Install Kubeflow Pipelines on the cluster
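The gcloud preparation steps above (configuring the CLI for your project, setting a default region and zone, and installing kubectl) can be scripted roughly as follows. `PROJECT_ID`, the region `europe-west4`, and the derived zone are placeholders, and `gcloud init` remains the interactive way to configure the project. The `gcloud components` commands typically only work when the CLI was installed with Google's own installer rather than a system package manager.

```shell
# Sketch of the gcloud preparation steps. Region/zone defaults are
# placeholders; pick the ones closest to you.
prepare_gcloud() {
  local project="${1:?usage: prepare_gcloud PROJECT_ID [REGION] [ZONE]}"
  local region="${2:-europe-west4}"
  local zone="${3:-${region}-a}"

  # Update the CLI and point it at your project (non-interactive gcloud init).
  gcloud components update --quiet || return 1
  gcloud config set project "$project" || return 1
  # Default compute region and zone.
  gcloud config set compute/region "$region" || return 1
  gcloud config set compute/zone "$zone" || return 1
  # Install kubectl as a gcloud component.
  gcloud components install kubectl --quiet
}
```

For example, `prepare_gcloud my-project europe-west1 europe-west1-b`.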

To run the complete Makefile (note that this might take a while to complete), use:

make -f gcp.mk

Or run a specific step, for example:

make -f gcp.mk authenticate-gcp-cluster

Getting the variables for your pipeline

Run the following command to print the BASE_PATH and HOST, which you can use to configure your pipeline:

make -f gcp.mk kubeflow-ui

Opening the HOST URL in a browser also gives you access to the Kubeflow UI.
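Before wiring the values into a pipeline, you can sanity-check them. The hypothetical helper below expects `HOST` and `BASE_PATH` to be set as shell variables (copied from the output of `make -f gcp.mk kubeflow-ui`). It checks that `BASE_PATH` uses one of the two storage schemes mentioned on this page (`gs://` or `s3://`) and that `HOST` answers an HTTP request; extend the scheme check if you store artifacts elsewhere.

```shell
# Hypothetical sanity check for the pipeline variables HOST and BASE_PATH.
check_pipeline_vars() {
  case "${HOST:-}" in
    http://*|https://*) ;;
    *) echo "HOST is not set to an http(s) URL" >&2; return 1 ;;
  esac
  case "${BASE_PATH:-}" in
    gs://*|s3://*) ;;
    *) echo "BASE_PATH should start with gs:// or s3://" >&2; return 1 ;;
  esac
  # -f: fail on HTTP errors, -s: silent, -S: still report errors.
  curl -fsS -o /dev/null --max-time 10 "$HOST" || return 1
  echo "HOST and BASE_PATH look usable"
}
```

Run `check_pipeline_vars` after setting both variables; it returns a non-zero status and prints the reason if a check fails.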

Deleting the setup

make -f gcp.mk delete
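Because cloud resources keep incurring costs while they exist, it can be worth confirming that nothing is left behind after deleting the setup. The hypothetical helper below lists the GKE clusters and Cloud Storage buckets that remain in the active gcloud project. Whether `make -f gcp.mk delete` also removes the storage bucket depends on the Makefile, so check gcp.mk.

```shell
# Hypothetical post-cleanup check: list GKE clusters and Cloud Storage
# buckets still present in the active gcloud project.
list_remaining_gcp_resources() {
  echo "GKE clusters:"
  gcloud container clusters list --format="value(name,location)" || return 1
  echo "Cloud Storage buckets:"
  # Without a URL argument, this lists the buckets in the active project.
  gcloud storage ls || return 1
}
```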

More Information