A data mining approach for early mortality prediction of patients in intensive care units

Student thesis: Doctoral Thesis

Mortality prediction for hospitalized patients is an important problem. Over the past few decades, several severity scoring systems and machine learning models have been developed for predicting mortality in hospitals in general, and in intensive care units in particular. However, early mortality prediction in intensive care remains an open challenge. Most research has focused on Severity of Illness scores or Data Mining models designed to estimate risk at least 24 or 48 hours after intensive care admission. In this study, we aim to provide a model that predicts mortality from the patient's first hours of admission and that outperforms existing methods.

This research is conducted on the Multiparameter Intelligent Monitoring in Intensive Care database. The database has been analysed in depth, and the problem assumptions and initial attribute selections have been defined. Relevant data have been extracted, preprocessed and converted for data mining analysis.

The thesis starts by presenting two initial studies that compare the performance of different approaches to mortality prediction: (1) a comparative study of Severity of Illness scores for ICU mortality prediction and (2) a time-series analysis for ICU mortality prediction using data mining classification models. These two studies led to a pioneering framework for early mortality prediction named 'EMPICU', which thoroughly investigates the predictive effectiveness of data mining classification models 6 hours after admission. The framework is tested for classification performance with different attribute selections and different classification models, while handling both missing values and class imbalance. The best-performing model is EMPICU-Random Forests, which uses the 7 physiological vital signs together with age and achieves an Area Under the Receiver Operating Characteristic curve of 0.90. The EMPICU-Random Forests model at 6 hours after admission outperformed Severity of Illness scores at 24 hours after admission, which indicates that the proposed model predicts earlier and with higher performance.
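
The Area Under the Receiver Operating Characteristic curve used to compare models can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The sketch below illustrates that definition only; it is not the thesis's implementation, and the function name `auroc`, the toy labels and the toy scores are all hypothetical.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0/1 outcomes (1 = positive class, e.g. died).
    scores: iterable of predicted risks, higher meaning more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the thesis:
# 1 = died in hospital, 0 = survived; scores are made-up predicted risks.
labels = [1, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.2, 0.8, 0.1, 0.3]
toy_auc = auroc(labels, scores)
```

A value of 1.0 means every positive case is ranked above every negative case, and 0.5 corresponds to random ranking. The quadratic pairwise loop is chosen for readability; a rank-based computation would be preferable for a database-sized cohort.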
Original language: English
Award date: 2018
ID: 12937923