Synthetic data generation for classification of dyslexics by machine learning

Antonio Carlos da Silva Junior, Emanuela Cristina Ramos Gonçalves, Paulo Schor, Martina Navarro, Felipe Mancini

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: This study aims to apply synthetic data generation, supported by data cleaning techniques, to the classification of dyslexic and non-dyslexic individuals.

Method: Outliers were identified by a specialist, and synthetic data were generated. For each of five algorithms, features were selected by exhaustive search. Each algorithm was then run with its selected features, and the resulting calibration curves were compared.

Results: Logistic regression achieved the best results, with 99% accuracy and an area under the ROC curve of 0.999, and also produced the best calibration curve.
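The feature-selection and calibration steps named in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the selection criterion, the five algorithms, or the binning used, so `score_fn`, the equal-width bins, and the toy numbers below are all assumptions made for illustration.

```python
import itertools


def exhaustive_feature_search(n_features, score_fn):
    """Score every non-empty subset of feature indices; return the best one.

    score_fn is a hypothetical callback mapping a tuple of feature indices
    to a number (higher is better), e.g. a cross-validated accuracy.
    """
    best_subset, best_score = None, float("-inf")
    for size in range(1, n_features + 1):
        for subset in itertools.combinations(range(n_features), size):
            score = score_fn(subset)
            # Strict '>' keeps the first (smallest) subset on ties.
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


def calibration_curve(y_true, y_prob, n_bins=5):
    """Return (mean predicted probability, observed positive fraction) per bin.

    Predictions are grouped into n_bins equal-width bins on [0, 1];
    empty bins are skipped. A well-calibrated model gives points near y = x.
    """
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        index = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 goes in last bin
        bins[index].append((label, prob))
    curve = []
    for members in bins:
        if members:
            mean_prob = sum(p for _, p in members) / len(members)
            frac_pos = sum(t for t, _ in members) / len(members)
            curve.append((mean_prob, frac_pos))
    return curve


# Toy usage with made-up numbers (not data from the study).
toy_gain = {0: 1.0, 1: -1.0, 2: 2.0}
best_subset, best_score = exhaustive_feature_search(
    3, lambda subset: sum(toy_gain[i] for i in subset)
)
toy_curve = calibration_curve([0, 1, 1, 1], [0.1, 0.9, 0.9, 0.9], n_bins=2)
```

Exhaustive search evaluates 2^n - 1 subsets for n features, so it is practical only when the number of candidate features is small.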

Conclusion: The use of synthetic data generation and feature selection enabled all algorithms to achieve excellent results in classifying dyslexic and non-dyslexic individuals. Logistic regression was selected as the best algorithm for dyslexia classification.
Translated title of the contribution: Synthetic data generation for classification of dyslexics by machine learning
Original language: Portuguese
Pages (from-to): 10-16
Journal: Journal of Health Informatics
Volume: 13
Issue number: 1
Publication status: Published - 1 Jan 2021
