Abstract
Objectives - To explore the process, strategies and limitations of obtaining research-viable data from dental electronic records.
Methods - Structured Query Language (SQL) was used to extract cross-sectional data, spanning a four-year period, from the primary care patient management systems used in the University of Portsmouth Dental Academy, a state-funded training dental academy in England. A review of the user interface informed the SQL syntax needed to produce data on patients' demography, socio-economic status and behaviours. Validation and data cleaning were undertaken in line with principles from the Rahm and Hai Do (2013) and Maletic and Marcus (2000) frameworks.
Results - A two-stage process of data extraction was essential: first a pilot extraction, then a main data extraction. The pilot of 4,343 patient records informed a more robust main extract of 6,351 records. Record validation and cleaning identified limitations that included i) overriding of data, ii) missing attributes, iii) lack of homogeneity, iv) data entry errors, v) naming conflicts and vi) test records. Data cleaning involved merging naming conflicts and omitting missing variables and test records; validation involved manual record checks and analysis of samples of the data.
Conclusions - The primary users' data entry processes need to be streamlined to ensure that data are populated appropriately. Following system upgrades, software developers need to carefully align syntax and variables with previous versions of the software to ensure homogeneity of variable names across different periods. Researchers need to adopt stringent validation and cleaning strategies to ensure that the electronic data used in research are accurate.
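The cleaning steps named in the Results (merging naming conflicts, omitting test records and omitting records with missing attributes) could be sketched roughly as below. This is a minimal illustration only, not the authors' procedure: the field names (`patient_id`, `surname`, `smoking_status`), the alias map and the rule that a surname of "test" marks a test record are all hypothetical assumptions.

```python
# Hypothetical alias map: variable names assumed to differ between software versions.
COLUMN_ALIASES = {"smoker": "smoking_status", "smoking": "smoking_status"}
# Hypothetical set of attributes a record must have to be retained.
REQUIRED = ("patient_id", "smoking_status")


def clean_records(records):
    """Return cleaned copies of the extracted records (a list of dicts)."""
    cleaned = []
    for rec in records:
        # Merge naming conflicts: map each alias onto one canonical name.
        merged = {}
        for key, value in rec.items():
            canonical = COLUMN_ALIASES.get(key, key)
            if value not in (None, ""):
                merged[canonical] = value
            else:
                # Keep the key, but never overwrite a real value with an empty one.
                merged.setdefault(canonical, None)
        # Omit test records (assumed here to carry the surname "test").
        if str(merged.get("surname") or "").strip().lower() == "test":
            continue
        # Omit records with missing required attributes.
        if any(merged.get(field) in (None, "") for field in REQUIRED):
            continue
        cleaned.append(merged)
    return cleaned


records = [
    {"patient_id": 1, "surname": "Smith", "smoker": "yes"},
    {"patient_id": 2, "surname": "TEST", "smoking_status": "no"},
    {"patient_id": 3, "surname": "Jones", "smoking_status": ""},
    {"patient_id": 4, "surname": "Khan", "smoking_status": "no"},
]
cleaned = clean_records(records)
```

In this sketch, record 2 is dropped as a test record and record 3 for a missing attribute; manual checks on samples of the output, as described in the Results, would still be needed to catch overridden values and data entry errors.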
Original language | English |
---|---|
Publication status | Published - Jun 2016 |
Event | IADR 94th General Session - , Korea, Republic of; Duration: 22 Jun 2016 → 25 Jun 2016 |
Conference
Conference | IADR 94th General Session |
---|---|
Country/Territory | Korea, Republic of |
Period | 22/06/16 → 25/06/16 |