DeepGuardDB: Real and Text-to-Image Synthetic Images Dataset

  • Ikram Reghioua (Creator)
  • Mouna Yasmine Namani (Creator)
  • Gueltoum Bendiab (Frères Mentouri Constantine 1 University) (Creator)
  • Mohamed Aymen Labiod (Creator)
  • Stavros Shiaeles (Creator)

Dataset

Description

"Recent advances in deep learning and generative models have significantly improved text-to-image (T2I) synthesis, enabling the creation of highly realistic images from textual inputs. While this progress has expanded the creative and practical applications of AI, it also makes it harder to distinguish authentic images from AI-generated ones, which raises serious concerns in areas such as security, privacy, and digital forensics. In response, there has been growing interest in developing AI-based detectors that can reliably differentiate synthetic images from real ones, helping to ensure data authenticity and guard against misuse. Training and evaluating such models effectively requires reliable and diverse datasets of real and fake images, and the research community has invested significant effort in building dedicated datasets for this purpose. Because T2I generation tools continue to evolve rapidly, existing datasets must be regularly updated and refined to keep pace with the latest generators. This constant evolution drives us to keep improving our resources so that they reflect the state of the art in image generation. In this context, we have constructed DeepGuardDB, a dataset for evaluating and improving models that distinguish AI-generated images from real ones. To support a comprehensive and representative evaluation, DeepGuardDB was carefully curated to address the limitations of existing datasets by incorporating a diverse range of visual content. DeepGuardDB uses Stable Diffusion 3, which produces higher-quality images, alongside Imagen and DALL-E 3. DeepGuardDB contains 12,000 images, evenly split between the two categories: 6,000 real images and 6,000 AI-generated images (50% each).
The real images in DeepGuardDB are drawn from two well-established datasets, each recognized for its richness and diversity: MS-COCO (Microsoft Common Objects in Context) and Flickr30k. For the AI-generated images, DeepGuardDB uses three of the most advanced T2I generation platforms available at the time of its creation: Stable Diffusion 3, Imagen, and DALL-E 3. The synthetic images were generated from the same textual descriptions that accompany the real images, so that the generators would produce images closely resembling the authentic ones. Because both sets of images therefore share similar themes, subjects, and visual cues, this approach makes distinguishing real from AI-generated content deliberately challenging."
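Because each synthetic image shares its textual description with a real image, a random image-level train/test split could place a real photo and its synthetic counterpart on opposite sides of the split, letting a detector benefit from shared content rather than learning real-versus-generated cues. The following is a minimal sketch of a prompt-grouped split, assuming a hypothetical record layout: the `Sample` fields, file paths, and the 0 = real / 1 = AI-generated labeling are illustrative assumptions, not DeepGuardDB's actual file structure or metadata format. Only the source names are taken from the description above.

```python
import random
from dataclasses import dataclass

# Source names from the dataset description; the grouping into label classes is ours.
REAL_SOURCES = {"MS-COCO", "Flickr30k"}
SYNTHETIC_SOURCES = {"Stable Diffusion 3", "Imagen", "DALL-E 3"}


@dataclass(frozen=True)
class Sample:
    path: str    # hypothetical file path (not DeepGuardDB's real layout)
    source: str  # dataset or generator the image came from
    prompt: str  # description shared by a real image and its synthetic counterparts

    @property
    def label(self) -> int:
        """0 = real, 1 = AI-generated (an assumed labeling convention)."""
        if self.source in REAL_SOURCES:
            return 0
        if self.source in SYNTHETIC_SOURCES:
            return 1
        raise ValueError(f"unknown source: {self.source}")


def grouped_split(samples, test_fraction=0.2, seed=0):
    """Split so that all images sharing a prompt land in the same partition."""
    prompts = sorted({s.prompt for s in samples})  # sort first for reproducibility
    rng = random.Random(seed)
    rng.shuffle(prompts)
    n_test = round(len(prompts) * test_fraction)
    test_prompts = set(prompts[:n_test])
    train = [s for s in samples if s.prompt not in test_prompts]
    test = [s for s in samples if s.prompt in test_prompts]
    return train, test


def class_balance(samples):
    """Count real (0) and AI-generated (1) samples."""
    counts = {0: 0, 1: 0}
    for s in samples:
        counts[s.label] += 1
    return counts


# Toy usage with made-up paths and prompts (not real DeepGuardDB entries):
# each prompt has one real image and one synthetic image.
toy = []
for i in range(10):
    prompt = f"caption {i}"
    toy.append(Sample(f"real/{i}.jpg", "MS-COCO", prompt))
    toy.append(Sample(f"fake/{i}.png", "DALL-E 3", prompt))

train, test = grouped_split(toy, test_fraction=0.2, seed=0)
print(len(train), len(test), class_balance(test))  # 16 4 {0: 2, 1: 2}
```

Grouping by prompt keeps each real/synthetic pair together, and because every prompt in this toy set contributes one image per class, the 50/50 balance carries over to both partitions. With several generators per prompt, as in DeepGuardDB, the per-partition balance depends on how many synthetic images share each description, so it is worth checking with `class_balance` after splitting.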
Date made available: 10 Jul 2024
Publisher: IEEE DataPort
