Geração de dados sintéticos para classificação de disléxicos por meio de aprendizado de máquina

Research output: Contribution to journal, Article, peer-review

  • Antonio Carlos da Silva Junior
  • Emanuela Cristina Ramos Gonçalves
  • Paulo Schor
  • Dr Martina Navarro
  • Felipe Mancini
Objective: This study applies synthetic data generation, supported by data cleaning techniques, to the classification of dyslexic and non-dyslexic individuals.

Method: Outliers were selected by a specialist, and synthetic data were generated. For each of five algorithms, features were selected by exhaustive search. Each algorithm was then run with its selected features, and the resulting calibration curves were compared.

Results: Logistic regression gave the best results, with 99% accuracy and an area under the ROC curve of 0.999, and it also produced the best calibration curve.

Conclusion: Synthetic data generation combined with feature selection enabled all algorithms to achieve excellent results in classifying dyslexic and non-dyslexic individuals. Logistic regression was selected as the best algorithm for dyslexia classification.
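
The exhaustive feature search named in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the selection criterion, the data, or the software used. The sketch assumes validation accuracy as the criterion, uses a small pure-Python logistic regression as a stand-in classifier, and runs on an invented toy dataset in which only the first feature is informative. All function names (`fit_logistic`, `accuracy`, `take`, `exhaustive_search`) are hypothetical.

```python
from itertools import combinations
import math


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent (no regularisation)."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def accuracy(model, X, y):
    """Fraction of rows whose thresholded prediction matches the label."""
    w, b = model
    preds = [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) >= 0 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def take(X, cols):
    """Keep only the listed feature columns."""
    return [[row[c] for c in cols] for row in X]


def exhaustive_search(X_train, y_train, X_val, y_val):
    """Try every non-empty feature subset; keep the best validation accuracy.

    Ties go to the subset evaluated first, so smaller subsets are preferred.
    """
    n_features = len(X_train[0])
    best_subset, best_acc = None, -1.0
    for k in range(1, n_features + 1):
        for cols in combinations(range(n_features), k):
            model = fit_logistic(take(X_train, cols), y_train)
            acc = accuracy(model, take(X_val, cols), y_val)
            if acc > best_acc:
                best_subset, best_acc = cols, acc
    return best_subset, best_acc


# Invented toy data: feature 0 separates the classes, features 1 and 2 are noise.
X_train = [[-2.0, 0.5, 1], [-1.0, -0.5, -1], [-1.5, 0.3, -1],
           [1.0, 0.4, 1], [2.0, -0.6, -1], [1.5, -0.2, 1]]
y_train = [0, 0, 0, 1, 1, 1]
X_val = [[-1.2, 0.1, 1], [-1.8, -0.3, -1], [1.3, 0.2, -1], [1.7, -0.1, 1]]
y_val = [0, 0, 1, 1]

best_subset, best_acc = exhaustive_search(X_train, y_train, X_val, y_val)
print(best_subset, best_acc)
```

The search fits one model per non-empty subset, so its cost grows as 2^n - 1 in the number of features n. That is practical only for small feature sets.
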
Translated title of the contribution: Synthetic data generation for classification of dyslexics by machine learning
Original language: Portuguese
Pages (from-to): 10-16
Journal: Journal of Health Informatics
Issue number: 1
Publication status: Published - 1 Jan 2021

ID: 27250803