

Production-ready data processing made easy and shareable
Explore the docs »


🚀 Production-ready 👶 Easy 👫 Shareable
Benefit from built-in features such as autoscaling, data lineage, and pipeline caching, and deploy to (managed) platforms such as Vertex AI, SageMaker, and Kubeflow Pipelines. Implement your custom data processing code using data structures you already know, such as Pandas dataframes. Move from local development to remote deployment without any code changes. Fondant components are defined by a clear interface, which makes them reusable and shareable.
Compose your own pipeline using components available on our hub.

🪤 Why Fondant?

With the advent of transfer learning and now foundation models, everyone has started sharing and reusing machine learning models. Most of the work now goes into building data processing pipelines, which everyone still builds from scratch. This doesn't need to be the case if processing components were shareable and pipelines composable. Realizing this is the main vision behind Fondant.

Towards that end, Fondant offers:

  • 🔧 Plug ‘n’ play composable data processing pipelines
  • 🧩 Library containing off-the-shelf reusable components
  • 🐼 A simple Pandas-based interface for creating custom components
  • 📊 Built-in lineage, caching, and data explorer
  • 🚀 Production-ready, scalable deployment
  • ☁️ Integration with runners across different clouds (Vertex, SageMaker, Kubeflow)


💨 Getting Started

Eager to get started? Follow our step-by-step guide to get your first pipeline up and running.


🧩 Reusable components

Fondant comes with a library of reusable components that you can leverage to compose your own pipeline:

  • Data ingestion: S3, GCS, ABS, Hugging Face, local file system, ...
  • Data filtering: Duplicates, language, visual style, topic, format, aesthetics, NSFW, license, ...
  • Data enrichment: Captions, segmentations, embeddings, ...
  • Data transformation: Image cropping, image resizing, text chunking, ...
  • Data retrieval: Common Crawl, LAION, ...
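Filtering duplicates is one of the component categories listed above. Here is a minimal sketch of exact deduplication, assuming rows are dictionaries with a text field: it normalizes case, Unicode form, and whitespace, then keeps the first row for each hash. The function names are hypothetical, and this is not necessarily how Fondant's own deduplication components work; in particular it only catches exact duplicates after normalization, not near-duplicates.

```python
import hashlib
import unicodedata
from typing import Dict, Iterable, Iterator


def normalize(text: str) -> str:
    """Apply Unicode NFKC normalization, case-fold and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


def drop_exact_duplicates(rows: Iterable[Dict[str, str]], field: str) -> Iterator[Dict[str, str]]:
    """Yield rows whose normalized `field` has not been seen before (first one wins)."""
    seen = set()
    for row in rows:
        digest = hashlib.sha1(normalize(row[field]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield row


rows = [{"text": "Hello  World"}, {"text": "hello world"}, {"text": "Goodbye"}]
print(list(drop_exact_duplicates(rows, "text")))
# [{'text': 'Hello  World'}, {'text': 'Goodbye'}]
```

Storing fixed-size digests instead of full strings keeps the memory used by `seen` bounded per row, which matters when filtering large text corpora.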

👉 Check out our Component Hub for an overview of all available components.


🪄 Example pipelines

We have created several ready-made example pipelines for you to use as a starting point for exploring Fondant. They are hosted as separate repositories, each containing a notebook tutorial, so you can easily clone them and get started:

📖 RAG tuning pipeline
End-to-end Fondant pipelines to index and evaluate RAG (Retrieval-Augmented Generation) systems.

🛋️ ControlNet Interior Design Pipeline
An end-to-end Fondant pipeline to collect and process data for the fine-tuning of a ControlNet model, focusing on images related to interior design.

🖼️ Filter Creative Commons license images
An end-to-end Fondant pipeline that starts from our Fondant-CC-25M Creative Commons image dataset, then filters and downloads the desired images.

⚒️ Installation

First, run the minimal Fondant installation:

pip install fondant

Fondant also offers optional extra dependencies for specific runners, storage integrations, and publishing components to registries. The dependencies for the local (Docker) runner are included by default.

For more detailed installation options, check the installation page in our documentation.

👨‍💻 Usage


Fondant allows you to easily define data pipelines composed of both reusable and custom components. For instance, the following pipeline uses the reusable load_from_hf_hub component to load a dataset from the Hugging Face Hub and then processes it with a custom component:

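from fondant.pipeline import Pipeline

pipeline = Pipeline(name="example pipeline", base_path="./data")

# Load a dataset from the Hugging Face Hub using the reusable load_from_hf_hub component
dataset = pipeline.read(
    "load_from_hf_hub",
    arguments={"dataset_name": "lambdalabs/pokemon-blip-captions"},
)

# Process it with a custom component ("components/custom_component" is a hypothetical path)
dataset = dataset.apply(
    "components/custom_component",
    arguments={
        "resize_width": 128,
        "resize_height": 128,
    },
)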

Custom use cases require the creation of custom components. Check out our getting started page to learn more about how to build custom pipelines and components.
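To make the custom component in the usage example less abstract, here is a library-free sketch of what its logic could look like. It assumes, for illustration only, that the `arguments` passed to `dataset.apply(...)` reach the component as keyword arguments of its constructor, and that images are grayscale pixel grids resized with nearest-neighbour sampling. `ResizeImagesComponent` is a hypothetical name and does not import Fondant; see the getting started page for the actual component interface.

```python
from typing import Dict, List

Image = List[List[int]]  # hypothetical: a grayscale image as rows of pixel values


class ResizeImagesComponent:
    """Hypothetical custom component; keyword arguments mirror the `arguments` dict."""

    def __init__(self, *, resize_width: int, resize_height: int) -> None:
        if resize_width <= 0 or resize_height <= 0:
            raise ValueError("target size must be positive")
        self.width = resize_width
        self.height = resize_height

    def _resize(self, image: Image) -> Image:
        """Nearest-neighbour resampling: map each output pixel to one source pixel."""
        src_h, src_w = len(image), len(image[0])
        return [
            [image[y * src_h // self.height][x * src_w // self.width] for x in range(self.width)]
            for y in range(self.height)
        ]

    def transform(self, rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
        """Return new rows with the "image" field replaced by its resized version."""
        return [{**row, "image": self._resize(row["image"])} for row in rows]


component = ResizeImagesComponent(resize_width=2, resize_height=2)
rows = [{"image": [[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12],
                   [13, 14, 15, 16]]}]
print(component.transform(rows)[0]["image"])  # [[1, 3], [9, 11]]
```

Keeping configuration in the constructor and the per-batch work in `transform` means the same component can be reused with different arguments without changing its code.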

Running your pipeline

Once you have a pipeline, you can run (and compile) it using the built-in CLI:

fondant run local

To see all available runners and arguments, check the Fondant CLI help pages:

fondant --help

Or for a subcommand:

fondant <subcommand> --help


👭 Contributing

We welcome contributions of different kinds:

Issues: If you encounter an issue or bug, please submit it as a GitHub issue. You can also submit a pull request directly to fix any clear bugs.
Suggestions and feedback: Our roadmap and priorities are defined based on community feedback. To provide input, you can join our Discord or submit an idea in our GitHub Discussions.
Framework code contributions: If you want to help with the development of the Fondant framework, have a look at the issues marked with the good first issue label. If you want to add new functionality, please submit an issue for it first.
Reusable components: Extending our library of reusable components is a great way to contribute. If you built a component that would be useful for other users, please submit a PR adding it to the components/ directory. You can find a list of possible contributable components here, and your own ideas are also welcome!

For a detailed view of the roadmap and day-to-day development, you can check our GitHub project board.

You can also check out our architecture page to familiarize yourself with the Fondant architecture and repository structure.

Environment setup

We use poetry and pre-commit to enable a smooth developer flow. Run the following commands to set up your development environment:

pip install poetry
poetry install --all-extras
pre-commit install
