# Setting up Kubeflow
## Introduction
To run Fondant on Kubeflow Pipelines, we'll need:

- A Kubernetes cluster
- Kubeflow Pipelines installed on the cluster
- A registry to store custom component images (e.g. Docker Hub, GitHub Container Registry, etc.)
This can be any Kubernetes cluster. If you don't have access to a setup like this, or you don't feel comfortable setting up your own, we provide some basic scripts to get you started on GCP or on a small scale locally.
**IMPORTANT**

- These scripts serve only as a kickstart to help you set up Kubeflow for running Fondant; they are not production-ready environments.
- Spinning up a cluster on a cloud vendor will incur costs.
- You should never run a script without inspecting it, so please familiarize yourself with the commands defined in the Makefiles and adapt them to your own needs.
## If you already have a Kubernetes cluster
If you already have a Kubernetes cluster set up and have configured kubectl, you can install Kubeflow Pipelines by following this guide.
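For reference, a minimal sketch of a standalone Kubeflow Pipelines install using its kustomize manifests; the pinned version and manifest path below are examples, so follow the linked guide for the currently recommended commands:

```bash
# Example standalone Kubeflow Pipelines install via kustomize manifests.
# The version and manifest path are examples; check the official guide first.
export PIPELINE_VERSION=2.0.5
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/dev?ref=$PIPELINE_VERSION"
```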
## Kubeflow on AWS
There are multiple guides on how to set up Kubeflow Pipelines on AWS:
Fondant needs the host URL of Kubeflow Pipelines, which you can fetch depending on your setup.
The BASE_PATH can be an S3 bucket.
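How you obtain the host URL depends on how Kubeflow Pipelines is exposed in your cluster; one common approach (an assumption, not AWS-specific guidance) is to port-forward the pipelines UI service and use the local address as the host:

```bash
# Port-forward the Kubeflow Pipelines UI service from the kubeflow namespace;
# the pipelines host is then http://localhost:8080.
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
```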
## Kubeflow on Google Cloud
There are several ways to get up and running with Kubeflow Pipelines on Google Cloud:

- On the GCP Marketplace
- How to do a standalone deployment of Kubeflow Pipelines on GKE
- Customizable deployments through overlays
### Or you can use the scripts we provide to get a simple setup going
1. If you don't already have a Google Cloud project ready, you can follow this guide to set one up; you will need to have billing enabled.
2. Make sure you have the gcloud CLI installed (and that it is the latest version), and that you have configured it to use your project by running `gcloud init`.
3. Set up a default compute region and zone (see the sketch after this list).
4. Install kubectl (see the sketch after this list).
5. Run the gcp.mk Makefile (located in the `scripts/` folder), which will do the following:
    - Set up all required GCP services
    - Start a GKE cluster
    - Create a Google Cloud Storage bucket for data artifact storage
    - Authenticate the local machine
    - Install Kubeflow Pipelines on the cluster
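For steps 3 and 4, a minimal sketch using standard gcloud commands; the region and zone below are placeholders, so pick values that suit your project:

```bash
# Step 3: set a default compute region and zone (placeholder values shown).
gcloud config set compute/region europe-west1
gcloud config set compute/zone europe-west1-b

# Step 4: install kubectl via the gcloud components manager and verify it.
gcloud components install kubectl
kubectl version --client
```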
To run the complete Makefile, use the following (note that this might take some time to complete):
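A hypothetical invocation is sketched below; check `scripts/gcp.mk` for the actual targets and variables before running anything:

```bash
# Hypothetical: run all setup steps defined in scripts/gcp.mk in one go.
# Inspect the Makefile first and adjust variables (project, region, zone, bucket).
make -f scripts/gcp.mk
```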
Or run specific steps:
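For example (the target names below are hypothetical, for illustration only; use the targets actually defined in the Makefile):

```bash
# Hypothetical target names; substitute the real targets from scripts/gcp.mk.
make -f scripts/gcp.mk gcp-services   # enable the required GCP services
make -f scripts/gcp.mk gke-cluster    # start the GKE cluster
make -f scripts/gcp.mk install-kfp    # install Kubeflow Pipelines on the cluster
```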
### Getting the variables for your pipeline
Running the following command:
This will print out the BASE_PATH and HOST, which you can use to configure your pipeline. The HOST URL will also give you access to the Kubeflow UI when opened in a browser.
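As a hypothetical illustration of how these values are used (the values below are placeholders, not output from the script):

```bash
# Placeholder values: substitute the HOST and BASE_PATH printed by the command above.
export KUBEFLOW_HOST="https://<your-kfp-endpoint>"
export KUBEFLOW_BASE_PATH="gs://<your-artifact-bucket>"

# Quick sanity check that the Kubeflow Pipelines API is reachable at the host.
curl -s "$KUBEFLOW_HOST/apis/v1beta1/healthz"
```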