Abstract
This dissertation investigates state-of-the-art machine learning methods (both shallow and deep) and their application to knowledge extraction, prediction, recognition, and classification in large-scale real-world problems across different areas (healthcare, online recommender systems, pattern recognition and security, prediction in finance, etc.).
The first part of this work focuses on the missing data problem and its impact on a variety of machine learning tasks (namely classification, regression, and learning to rank), introducing new methods to tackle this problem for medium, large, and big datasets. After an initial overview of the literature on missing data imputation, a classification task is investigated: the identification of radar signal emitters from data with a high percentage of missing feature values. Subsequently, the impact of missing data on Recommender Systems is examined, focussing on Online Travel Agencies and the ranking of their properties. To address the missing data imputation problem, two novel approaches are introduced. The first is an aggregation model that selects the most suitable imputation technique for each individual feature of the dataset, based on each technique's performance on that feature. The second aims to impute missing values at scale (large datasets) through a distributed neural network implemented in Apache Spark.
The second part of this dissertation investigates the use of Deep Learning techniques to tackle three real-world problems. In the first, both Convolutional Neural Networks and Long Short-Term Memory Networks are used for the detection of hypoxia during childbirth. Next, the profitability of Multivariate Long Short-Term Memory Networks for forecasting stock market volatility is explored. Lastly, Convolutional Neural Networks and Stacked Autoencoders are used to detect threats in X-ray images of hand luggage and courier parcels.
Date of Award: Aug 2019