Data analytics for online travelling recommendation system: a case study

Alessio Petrozziello, Ivan Jordanov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Online travel agencies (OTAs) now provide the main service for booking holidays, business trips, accommodation, etc. As in all online services that involve users, items, and decisions, a Recommender System (RS) is needed to facilitate the navigation of catalogues and websites. For a travel RS, a pure collaborative filtering approach is not feasible because the user-item matrix is too sparse. For this reason, this work investigates content-based filtering, focusing on one of its main problems: missing features. An initial exploratory analysis helps to identify a class of poorly ranked properties (e.g., Vacation Rentals (VR)). To deal with the missingness in the data, several state-of-the-art imputation methods (K-NN, Random Forests, and Gradient-Boosted Trees) are investigated, and their performance is critically analysed and tested. These techniques are applied after dataset preprocessing that includes cleaning, feature scaling, and standardization. In addition, k-fold cross-validation is used to validate the imputation results and reduce the possibility of overfitting. Three similarity measures (Jaccard, Weighted Hamming, and Fuzzy-C-Means rankings) based on engineered non-historical features (amenities and geographical position) are analysed and employed to determine the best proxy for unavailable features.
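As an illustration of two of the similarity measures named above, the following is a minimal sketch, not the authors' implementation. It assumes each property is described by a set of amenities (for Jaccard) or by an equal-length binary amenity vector (for Weighted Hamming). The amenity names, the per-feature weights, and the normalisation of the weighted Hamming distance to a similarity in [0, 1] are all hypothetical choices made for the example; the Fuzzy-C-Means ranking is not covered.

```python
# Hypothetical amenity vocabulary; the features actually used in the paper are not listed here.
AMENITIES = ["wifi", "pool", "parking", "kitchen"]


def to_vector(amenities):
    """Encode a collection of amenity names as a binary vector over AMENITIES."""
    return [1 if name in amenities else 0 for name in AMENITIES]


def jaccard_similarity(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)


def weighted_hamming_similarity(x, y, weights):
    """One minus the weighted Hamming distance, normalised by the total weight."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Sum the weights of the positions where the two vectors disagree.
    mismatch = sum(w for xi, yi, w in zip(x, y, weights) if xi != yi)
    return 1.0 - mismatch / total


if __name__ == "__main__":
    hotel = {"wifi", "pool"}
    rental = {"wifi", "kitchen"}
    weights = [1, 2, 1, 4]  # hypothetical importance of each amenity
    print(jaccard_similarity(hotel, rental))
    print(weighted_hamming_similarity(to_vector(hotel), to_vector(rental), weights))
```

Under these assumptions, Jaccard treats every amenity equally, whereas the weighted variant lets a mismatch on a heavily weighted amenity reduce the similarity more than a mismatch on a lightly weighted one.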
Original language: English
Title of host publication: Proceedings of Modelling, Identification and Control (MIC2017)
Editors: M. H. Hamza
Publisher: ACTA Press
Number of pages: 7
ISBN (Print): 978-0-88986-988-2, 978-0-88986-989-9
Publication status: Published - 1 Mar 2017
Event: The 36th IASTED International Conference on Modelling, Identification and Control: MIC 2017 - Innsbruck, Austria
Duration: 20 Feb 2017 to 21 Feb 2017

Publication series

ISSN (Print): 1025-8973


Conference: The 36th IASTED International Conference on Modelling, Identification and Control


  • Data Analytics
  • Big Data
  • Fuzzy-C-Means
  • Random Forests
  • K-NN
  • Missing Features
  • Recommender Systems

