Enhanced dataset synthesis using CTGAN for metagenomic dataset

Volkan Ince, Mohamed Bader-El-Den, Omer Faruk Sari

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The examination of bacterial communities has increasingly relied on machine learning methods and metagenomic analysis, providing novel solutions across various domains. However, the restricted size of metagenomic datasets presents challenges for robust model training. Consequently, data augmentation techniques, such as Conditional Tabular Generative Adversarial Networks (CTGAN), have gained attention. This study uses machine learning algorithms, together with CTGAN, to assess the influence of microbial community composition on the growth patterns of Clostridium bacteria in a metagenomic dataset. Additionally, the study employs SHAP analysis to explain feature importance and compares model performance before and after data augmentation. The findings demonstrate notable improvements in classification metrics after data augmentation, particularly when the 'Day' feature is excluded. Moreover, SHAP analysis identifies pivotal features, notably the absence of the 'Day' variable post-CTGAN synthesis, emphasizing the significance of specific bacterial genera like Clostridium in bacterial growth dynamics. Overall, this study underscores the efficacy of data augmentation techniques, specifically CTGAN, in enhancing machine learning model performance for metagenomic data classification tasks, with implications for refining food safety and healthcare protocols. Further research could explore advanced data augmentation methodologies and validate outcomes on larger datasets for practical implementation.

Original language: English
Title of host publication: 2024 IEEE 12th International Conference on Intelligent Systems, IS 2024 - Proceedings
Editors: Vassil Sgurev, Vladimir Jotsov, Vincenzo Piuri, Luybka Doukovska, Radoslav Yoshinov
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9798350350982
ISBN (Print): 9798350350999
DOIs
Publication status: Published - 9 Oct 2024
Event: 12th IEEE International Conference on Intelligent Systems, IS 2024 - Varna, Bulgaria
Duration: 29 Aug 2024 → 31 Aug 2024

Publication series

Name: IEEE International Conference on Intelligent Systems
Publisher: IEEE
ISSN (Print): 2832-4145
ISSN (Electronic): 2767-9802

Conference

Conference: 12th IEEE International Conference on Intelligent Systems, IS 2024
Country/Territory: Bulgaria
City: Varna
Period: 29/08/24 → 31/08/24

Keywords

  • Explainable AI
  • Generative AI
  • Metagenomic data
  • Supervised machine learning
