Abstract

Nearest Neighbor Search (NN Search) seeks the data item whose distance to a query item is the smallest. Due to the heavy computational cost of exact NN Search, much effort has recently been devoted to Approximate Nearest Neighbor Search (ANN Search), which aims to efficiently find data items with sufficiently small (though not necessarily the smallest) distances to the query item. An appealing approach to ANN Search is to accelerate per-item distance computation via hashing or vector quantization.
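To make the contrast concrete, the following minimal sketch (my illustration, not code from the thesis) compares exact NN search by linear scan with an approximate search that hashes items to binary codes via random hyperplanes and compares them by Hamming distance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy database of n items and one query, all in R^d.
d, n, n_bits = 32, 1000, 64
data = rng.standard_normal((n, d))
query = rng.standard_normal(d)

# Exact NN search: one full Euclidean distance per item, O(n * d).
exact_nn = np.argmin(np.linalg.norm(data - query, axis=1))

# ANN via random-hyperplane hashing: each bit is the sign of a random
# projection, and distances are approximated by cheap Hamming distances.
proj = rng.standard_normal((d, n_bits))
data_codes = (data @ proj) > 0      # (n, n_bits) binary codes
query_code = (query @ proj) > 0     # (n_bits,) binary code
hamming = np.count_nonzero(data_codes != query_code, axis=1)
approx_nn = np.argmin(hamming)
```

The binary codes are compact to store, and each Hamming comparison costs a few bit operations instead of a full floating-point distance, which is the acceleration the abstract refers to.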
This thesis presents my contributions to hashing and vector quantization in three areas: 1) Accelerating hashing computation. I proposed Fried Binary Embedding (FBE) and Tensorized Projections (TP) for efficiently generating effective long hash codes. For example, FBE not only achieves promising search accuracy but also reduces the complexity of hashing computation from O(n²) to O(n log n). 2) Combining hashing with deep learning. I incorporated Linear Discriminant Analysis on top of deep neural networks, yielding hash codes that are compact within classes and separable between classes. 3) Improving vector quantization. I eliminated the L2-norm computation in vector quantization via an asymmetric mapping, and developed a distributed optimization algorithm for training vector quantization methods in distributed settings.
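The O(n log n) complexity comes from replacing a dense projection matrix with structured transforms. As a hedged illustration of that general idea (not the exact FBE construction), the sketch below pairs random sign flips with a fast Walsh-Hadamard transform, so projecting a length-n vector costs O(n log n) instead of the O(n²) of a dense Gaussian matrix:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform of a length-n array (n a power of two),
    computed in O(n log n) with the standard butterfly recursion."""
    x = x.astype(float).copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

rng = np.random.default_rng(0)
n = 256
v = rng.standard_normal(n)

# Structured pseudo-random projection: random sign flips (a diagonal
# matrix) followed by a Hadamard transform; binarize with sign() to get
# a hash code without ever materializing an n-by-n matrix.
signs = rng.choice([-1.0, 1.0], size=n)
code = np.sign(fwht(signs * v))
```

Structured methods like FBE stack several such cheap factors (diagonal, Hadamard, permutation) so that the composite behaves like a learned dense projection at a fraction of the cost.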
In the future, two directions in ANN Search merit further investigation: 1) developing deep vector quantization algorithms analogous to deep hashing, i.e., tackling vector quantization in an end-to-end manner with deep learning; and 2) building ANN Search systems (such as FAISS by Facebook and ScaNN by Google) that combine the best of both worlds, i.e., integrating indexing with hashing/vector quantization.
Date of Award: 31 Mar 2023
Supervisor: Honghai Liu