2023/09

25 million Creative Commons image dataset released

Fondant is an open-source project that aims to simplify and speed up large-scale data processing by making containerized components reusable across pipelines and execution environments and shareable within the community.

A current challenge for generative AI is compliance with copyright law. For this reason, Fondant has developed a data-processing pipeline to create a dataset of 500 million Creative Commons images for training a latent diffusion image generation model that respects copyright. Today, as a first step, we are releasing a 25-million-image sample dataset and invite the open-source community to collaborate on further refinement steps.