SageMaker Runner#
Leverage AWS SageMaker to run your Fondant workflow.
This makes it easy to scale up your workflows in a serverless manner without worrying about infrastructure deployment.
The Fondant SageMaker runner will compile your Fondant workflow to a SageMaker pipeline spec and submit it to SageMaker.
IMPORTANT
Using the SageMaker runner will create a pull-through cache rule on the private ECR registry of your account. This is required so that SageMaker can access the public reusable images used by Fondant components.
Installing the SageMaker runner#
Make sure to install Fondant with the SageMaker runner extra.
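A minimal sketch of the install step, assuming the extra is named sagemaker after the runner (check the Fondant installation docs for your version). The helper name install_fondant_sagemaker is hypothetical, and the command is wrapped in a function so that nothing is installed until you call it:

```shell
# Assumption: the SageMaker runner extra is called "sagemaker".
FONDANT_REQUIREMENT="fondant[sagemaker]"

install_fondant_sagemaker() {
  # Quote the requirement so shells such as zsh do not treat the brackets as a glob.
  python -m pip install "${FONDANT_REQUIREMENT}"
}
```

Call install_fondant_sagemaker inside the virtual environment you use for Fondant.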
Prerequisites#
- You will need a SageMaker domain and user with the correct permissions. You can follow the instructions here to set this up. Make sure to note down the role ARN (arn:aws:iam::<account_id>:role/service-role/AmazonSageMaker-ExecutionRole-<creation_timestamp>) of the user you are using, since you will need it.
- You will need an AWS account, with the AWS CLI installed and configured.
- Fondant on SageMaker uses an S3 bucket to store the pipeline artifacts (manifests and data). You will need to create a bucket that SageMaker can use for this. You can create a bucket using the AWS CLI:
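As a minimal sketch, the bucket can be created with aws s3 mb. The bucket name, region, path and helper name below are hypothetical placeholders, and the command is wrapped in a function so that nothing is created until you call it:

```shell
# Hypothetical values; replace them with your own.
BUCKET_NAME="fondant-sagemaker-artifacts"   # bucket names must be globally unique
AWS_REGION="eu-west-1"

create_artifact_bucket() {
  # "aws s3 mb" makes a new, empty bucket in the given region.
  aws s3 mb "s3://${BUCKET_NAME}" --region "${AWS_REGION}"
}

# Location inside the bucket where the artifacts will be written (hypothetical path).
BASE_PATH="s3://${BUCKET_NAME}/fondant"
```

Call create_artifact_bucket once the AWS CLI is configured with credentials that are allowed to create buckets.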
IMPORTANT
Regarding the bucket and SageMaker permissions:
- If you use the term 'sagemaker' in the name of the bucket, SageMaker will automatically have the correct permissions to access the bucket.
- If you use any other name, or an existing bucket, you will need to add a policy on the role that SageMaker uses so that it can access the bucket.
You can then set this bucket as the base_path of your pipeline with the syntax: s3://<bucket_name>/<path>.
Running a pipeline with SageMaker#
Since compiling a SageMaker pipeline spec requires access to the AWS SageMaker API, you will need to be logged in to AWS with a role that has all the required permissions to launch a SageMaker pipeline.
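A rough sketch of checking the login and launching a run from the command line. It assumes the Fondant CLI exposes a sagemaker runner under fondant run with a --role-arn option; confirm the exact arguments with fondant run sagemaker --help for your version. The pipeline reference, role ARN and helper names are hypothetical placeholders:

```shell
# Hypothetical values; replace them with your own.
PIPELINE_REF="pipeline.py"
SAGEMAKER_ROLE_ARN="arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole-20240101T000000"

check_aws_login() {
  # Prints the ARN of the identity the AWS CLI is currently using.
  aws sts get-caller-identity --query Arn --output text
}

run_on_sagemaker() {
  # Assumption: "fondant run sagemaker" accepts a pipeline reference and --role-arn.
  fondant run sagemaker "${PIPELINE_REF}" --role-arn "${SAGEMAKER_ROLE_ARN}"
}
```

Running check_aws_login first is a quick way to confirm that your credentials are valid before run_on_sagemaker compiles and submits the pipeline.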
Once your workflow is running, you can monitor it in SageMaker Studio.
Using custom Fondant components on SageMaker#
SageMaker only supports images hosted on a private ECR registry. If you want to use custom Fondant components on SageMaker, you will need to build and push them to your private ECR registry first. You can do this using the fondant build command.
But first you need to log in to Docker with valid ECR credentials (more info here):
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.<region>.amazonaws.com
You will need to create a repository for your component first (one-time operation):
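A minimal sketch of the repository creation using aws ecr create-repository. The component name, region and helper name are hypothetical placeholders, and the command is wrapped in a function so that nothing is created until you call it:

```shell
# Hypothetical values; replace them with your own.
COMPONENT_NAME="my-component"
AWS_REGION="eu-west-1"

create_component_repository() {
  # Creates a private ECR repository named after the component.
  aws ecr create-repository --repository-name "${COMPONENT_NAME}" --region "${AWS_REGION}"
}
```

Call create_component_repository once per component. The repository name should match the <component_name> part of the image tag you push.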
Now you can use the fondant build command (which uses Docker under the hood) to build and push your custom components to your private ECR registry:
fondant build <component dir> -t <aws_account_id>.dkr.ecr.<region>.amazonaws.com/<component_name>:<tag>
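To avoid typing the image reference by hand, it can be composed from its parts. This is a small sketch with hypothetical helper names (ecr_image_ref, current_account_id); the account ID and component name in the example call are placeholders:

```shell
ecr_image_ref() {
  # Usage: ecr_image_ref <aws_account_id> <region> <component_name> <tag>
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}

current_account_id() {
  # Looks up the account ID of the identity the AWS CLI is currently using.
  aws sts get-caller-identity --query Account --output text
}

ecr_image_ref 123456789012 eu-west-1 my-component latest
# prints: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-component:latest
```

The result can be passed to the -t option shown above, for example fondant build <component dir> -t "$(ecr_image_ref "$(current_account_id)" <region> <component_name> <tag>)".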
Assigning custom resources to the pipeline#
The SageMaker runner supports assigning a specific instance_type to each component. This can be done by using the resources block when defining a component.
If not specified, the default instance_type is ml.t3.medium. The instance_type needs to be a valid SageMaker instance type; you can find more info here.