
Benefit and Bias of Approximate Nearest Neighbor Search for Machine Learning and Data Mining
Nearest neighbor search is an increasingly vital component of data analysis tasks, for example in vector embedding databases. The search is typically the efficiency bottleneck, so approximate nearest neighbor (ANN) search methods are often employed to speed up the application. However, different ANN methods come with different biases, which can help or harm the downstream application. This project will study the biases of different ANN methods and their impact on different applications.
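To make the notion of bias more concrete, the following is a minimal sketch and not a method prescribed by the project. It compares exact brute-force search with a toy approximate search (k-NN after a random Gaussian projection, chosen here purely for illustration) and reports two quantities: recall against the exact result, and how often each data point is returned as a neighbor. The function names, the synthetic data, and the use of return counts as a stand-in for "bias" are all assumptions.

```python
import numpy as np

def exact_knn(data, queries, k):
    # Brute-force Euclidean k-NN: indices of the k closest data points per query.
    d2 = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(d2, axis=1)[:, :k]

def projected_knn(data, queries, k, n_dims, rng):
    # Toy approximate search (illustrative assumption): exact k-NN computed
    # in a lower-dimensional random Gaussian projection of the data.
    proj = rng.standard_normal((data.shape[1], n_dims)) / np.sqrt(n_dims)
    return exact_knn(data @ proj, queries @ proj, k)

def recall(approx, exact):
    # Fraction of true neighbors that the approximate search also returned.
    hits = [len(set(a) & set(e)) for a, e in zip(approx, exact)]
    return sum(hits) / exact.size

rng = np.random.default_rng(0)
data = rng.standard_normal((500, 32))    # synthetic data set
queries = rng.standard_normal((50, 32))  # synthetic queries
k = 10

exact = exact_knn(data, queries, k)
approx = projected_knn(data, queries, k, n_dims=8, rng=rng)

# How often each data point appears in a result list. A method that favors
# certain points systematically shifts this distribution.
exact_counts = np.bincount(exact.ravel(), minlength=len(data))
approx_counts = np.bincount(approx.ravel(), minlength=len(data))

print("recall:", recall(approx, exact))
print("max return count (exact, approx):", exact_counts.max(), approx_counts.max())
```

Recall alone only says how many true neighbors were missed. Comparing the return counts hints at which points are systematically preferred or neglected, which is one possible way to look at the bias of a method.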