Boosting Selection Of Most Similar Entities In Large Scale Datasets | Sun Analytics

Submitted by michael on Mon, 07/05/2021 - 09:30

Comparing very large feature vectors and picking the best matches often comes down, in practice, to a sparse matrix multiplication followed by selecting the top-n results of that multiplication. In this blog post, we implement a customized Cython function for this purpose. Compared with doing the same using SciPy and NumPy functions, our Cython approach is about 40% faster and uses less memory. The GitHub code of our approach is available here.
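To make the two steps concrete, here is a minimal sketch of the plain SciPy and NumPy route that the Cython function is compared against: multiply two sparse matrices, then keep the n largest scores per row. This is not the Cython implementation from the post. The function name `top_n_matches` and its parameters are assumptions made for illustration only.

```python
import numpy as np
from scipy.sparse import csr_matrix


def top_n_matches(a, b, n):
    """Return, for each row of a, the n best-scoring rows of b.

    Hypothetical helper: a and b are sparse matrices whose rows are
    feature vectors; the score is the dot product of two rows.
    """
    # Step 1: sparse matrix multiplication yields all pairwise scores.
    product = (a @ b.T).tocsr()

    results = []
    for row in range(product.shape[0]):
        # Slice the stored (non-zero) entries of this row out of the CSR arrays.
        start, end = product.indptr[row], product.indptr[row + 1]
        cols = product.indices[start:end]
        vals = product.data[start:end]

        # Step 2: keep only the n largest scores when there are more than n.
        if len(vals) > n:
            keep = np.argpartition(vals, -n)[-n:]
            cols, vals = cols[keep], vals[keep]

        # Sort the survivors from best to worst score.
        order = np.argsort(-vals)
        results.append((cols[order], vals[order]))
    return results
```

Note that this baseline materializes the full product matrix before any selection happens, so intermediate memory grows with the number of non-zero scores rather than with n.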