Abstract
Objective: This study aims to apply synthetic data generation, supported by data cleaning techniques, to the classification of dyslexics and non-dyslexics.
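The abstract does not name the synthetic data generation technique. As a purely illustrative sketch, the block below assumes a SMOTE-style approach, a common choice for small or imbalanced clinical datasets. Each new row is placed at a random point on the segment between an existing sample and one of its nearest neighbours from the same class. The function names (`nearest_neighbours`, `smote_like`) and the parameters `k` and `seed` are hypothetical, and this is not presented as the authors' method.

```python
import math
import random


def nearest_neighbours(point, others, k):
    """Return the k rows in `others` closest to `point` (Euclidean distance)."""
    ranked = sorted(others, key=lambda row: math.dist(point, row))
    return ranked[:k]


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows for one class (SMOTE-style, assumed technique).

    Each synthetic row interpolates between a randomly chosen sample and one of
    its k nearest neighbours drawn from the same class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Exclude the chosen sample itself from its own neighbour list.
        others = [row for row in minority if row is not base]
        neighbour = rng.choice(nearest_neighbours(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

For example, `smote_like(dyslexic_rows, n_new=50)` would add 50 interpolated rows for a hypothetical `dyslexic_rows` list of feature vectors. In practice, outlier removal (done by a specialist in this study) would come before this step, so that the generator does not interpolate towards outliers.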
Method: Outliers were identified by a specialist, and synthetic data were then generated. For each of five algorithms, features were selected by exhaustive search. Each algorithm was then run with its selected features, and the resulting calibration curves were compared.
Results: Logistic regression achieved the best results, with 99% accuracy and an area under the ROC curve of 0.999, and also produced the best calibration curve.
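The abstract does not state the scoring metric used during feature selection, which five algorithms were compared, or how the calibration curves and ROC AUC were computed. The sketch below illustrates only the generic ideas under those gaps. The exhaustive search scores every non-empty feature subset with a caller-supplied function, for example the cross-validated accuracy of one algorithm. The reliability curve bins predicted probabilities and compares them with observed positive rates. The AUC is computed as the probability that a random positive is ranked above a random negative. All function names here are hypothetical; comparable utilities exist in scikit-learn (`sklearn.calibration.calibration_curve`, `sklearn.metrics.roc_auc_score`).

```python
from itertools import combinations


def exhaustive_search(features, score):
    """Score every non-empty feature subset and return (best_subset, best_score).

    There are 2**len(features) - 1 subsets, so this is only feasible for small
    feature sets.
    """
    best_subset, best_score = None, float("-inf")
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score


def calibration_curve(y_true, y_prob, n_bins=5):
    """Reliability-diagram points (mean predicted prob., observed positive rate).

    Probabilities are split into equal-width bins; empty bins are skipped.
    """
    bins = [[] for _ in range(n_bins)]
    for label, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((label, p))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            frac_pos = sum(label for label, _ in members) / len(members)
            points.append((mean_pred, frac_pos))
    return points


def roc_auc(y_true, y_prob):
    """AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

A well-calibrated model yields reliability points close to the diagonal (mean predicted probability ≈ observed positive rate). This is the property being compared when the abstract reports that logistic regression "obtained the best calibration curve".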
Conclusion: Synthetic data generation combined with feature selection enabled all algorithms to achieve excellent results in classifying dyslexics and non-dyslexics. Logistic regression was selected as the best algorithm for classifying dyslexia.
Translated title of the contribution | Synthetic data generation for classification of dyslexics by machine learning |
---|---|
Original language | Portuguese |
Pages (from-to) | 10-16 |
Journal | Journal of Health Informatics |
Volume | 13 |
Issue number | 1 |
Publication status | Published - 1 Jan 2021 |