Abstract
Online travel agencies (OTAs) are now the main service for booking holidays, business trips, accommodation, and similar products. As in all online services that involve users, items, and decisions, a Recommender System (RS) is needed to help users navigate catalogues and websites. For a travel RS, a pure collaborative filtering approach is not feasible because the user-item matrix is far too sparse. For this reason, this work investigates a content-based filtering approach, focusing on one of its main problems: missing features. An initial exploratory analysis helps to identify a class of poorly ranked properties (e.g., Vacation Rentals (VR)). To deal with the missingness in the data, several state-of-the-art imputation methods (K-NN, Random Forests, and Gradient-Boosted Trees) are investigated, and their performance is critically analysed and tested. These techniques are applied after dataset preprocessing that includes cleaning, feature scaling, and standardization. In addition, k-fold cross-validation is used to validate the imputation results and reduce the risk of overfitting. Three similarity measures (Jaccard, Weighted Hamming, and Fuzzy-C-Means rankings), based on engineered non-historical features (amenities and geographical position), are analysed and employed to determine the best proxy for unavailable features.
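As an illustration only, the idea of using a similarity measure over non-historical features to find a proxy for a missing value can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the paper: the amenity names, weights, values, and the helper `impute_from_neighbours` are all hypothetical, the proxy is a plain average over the k most similar properties, and the Fuzzy-C-Means ranking is not shown.

```python
from typing import List, Optional, Sequence, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weighted_hamming(x: Sequence[int], y: Sequence[int], w: Sequence[float]) -> float:
    """Similarity in [0, 1]: weighted share of positions where x and y agree."""
    total = sum(w)
    agree = sum(wi for xi, yi, wi in zip(x, y, w) if xi == yi)
    return agree / total if total else 0.0


def impute_from_neighbours(
    target: Set[str],
    candidates: List[Tuple[Set[str], Optional[float]]],
    k: int = 2,
) -> Optional[float]:
    """Hypothetical helper: average the known feature value of the k
    candidates whose amenity sets are most Jaccard-similar to the target."""
    known = [(jaccard(target, amenities), value)
             for amenities, value in candidates if value is not None]
    if not known:
        return None
    top = sorted(known, key=lambda pair: pair[0], reverse=True)[:k]
    return sum(value for _, value in top) / len(top)


# Hypothetical data: amenity sets paired with a feature that may be missing.
target_amenities = {"wifi", "kitchen", "parking"}
catalogue = [
    ({"wifi", "kitchen"}, 4.0),
    ({"wifi", "kitchen", "parking", "pool"}, 5.0),
    ({"spa"}, 1.0),
    ({"wifi"}, None),  # value missing, so it cannot serve as a proxy
]
proxy = impute_from_neighbours(target_amenities, catalogue, k=2)
```

Here `proxy` is the mean of the two closest properties with a known value. Swapping `jaccard` for `weighted_hamming` over binary amenity vectors would let some amenities count more than others, at the cost of having to choose the weights.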
Original language | English |
---|---|
Title of host publication | Proceedings of Modelling, Identification and Control (MIC2017) |
Editors | M. H. Hamza |
Publisher | ACTA Press |
Pages | 106-112 |
Number of pages | 7 |
ISBN (Print) | 978-0-88986-988-2, 978-0-88986-989-9 |
Publication status | Published - 1 Mar 2017 |
Event | The 36th IASTED International Conference on Modelling, Identification and Control: MIC 2017, Innsbruck, Austria. Duration: 20 Feb 2017 → 21 Feb 2017. https://www.iasted.org/conferences/pastinfo-848.html |
Publication series
Name | |
---|---|
ISSN (Print) | 1025-8973 |
Conference
Conference | The 36th IASTED International Conference on Modelling, Identification and Control |
---|---|
Country/Territory | Austria |
City | Innsbruck |
Period | 20/02/17 → 21/02/17 |
Keywords
- Data Analytics
- Big Data
- Fuzzy-C-Means
- Random Forests
- K-NN
- Missing Features
- Recommender Systems