Amazon’s Amazing Universal Graph Traversal Algorithm

How Amazon Improved K-Nearest Neighbor Search on Graphs by up to 60%

Devansh

As I’ve stated many times (especially on my primary newsletter, AI Made Simple)- I really like the Amazon Science blog. Of all the big-tech research groups, I find their work to be the most practical and direct. In this newsletter, I wanted to highlight their publication “FINGER: Fast inference for graph-based approximate nearest neighbor search” for two key reasons-

  1. Firstly, their approach is universal and can be applied to any constructed graph (since the optimization is done on the search side, not during graph construction). This is an underrated aspect of optimizations, and the universality of this algorithm makes it well worth testing out.
  2. It hits some amazing numbers: “FINGER significantly outperforms existing acceleration approaches and conventional libraries by 20% to 60% across different benchmark datasets.”
Figure 4: Experimental results of HNSW graph-based methods. Throughput versus Recall@10 is plotted for all datasets. The top row presents datasets with an L2 distance measure and the bottom row presents datasets with an angular distance measure. We can observe a significant performance gain of FINGER over all existing HNSW graph-based implementations.

I’ve been experimenting with this idea for the Knowledge Graph + GraphRAG work we’re doing at IQIDIS AI, a very, very promising Legal AI platform that I’m helping build (more details here if you want to sign up for an early preview). While I don’t have much to say right now (there are too many aspects to consider before I can make assessments of my own), I figured sharing this idea would be interesting to y’all, especially given how popular graphs are in building search/generation-based products.

To evaluate our approach, we compared FINGER’s performance to that of three prior graph-based approximation methods on three different datasets. Across a range of different recall10@10 rates — or the rate at which the model found the query’s true nearest neighbor among its 10 top candidates — FINGER searched more efficiently than all of its predecessors. Sometimes the difference was quite dramatic — 50%, on one dataset, at the high recall rate of 98%, and almost 88% on another dataset, at the recall rate of 86%.

To understand how this (or any other powerful algorithm in general) works, it can be very helpful first to understand the high-level intuition for it. This can help us develop a deeper appreciation for the algorithmic design choices- and even think about how we might want to change an industry-standard solution into something that matches our requirements.

How FINGER works

The secret to this performance lies in a fairly simple but important insight- “when calculating the distance between the query and points that are farther away than any of the candidates currently on the list, an approximate distance measure will usually suffice.” Essentially, if I only want to consider the top-5 similarity scores, then the exact distance of no. 6 and below is irrelevant to me. I can skip the heavy-duty distance computations in that case. I also don’t want to consider points that are farther than my upper-bound distance. And it turns out that, after a while, “over 80% of distance calculations are larger than the upper-bound. Using greedy graph search will inevitably waste a significant amount of computing time on non-influential operations.”
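To make that concrete, here is a minimal sketch of the bookkeeping involved (my own Python, with made-up names like `offer` and `upper_bound`, not Amazon’s code): a bounded list of the best candidates found so far, whose worst member defines the upper bound the paper talks about.

```python
import heapq

def upper_bound(top_k, k=5):
    """Distances above this value cannot change the final top-k result."""
    return -top_k[0][0] if len(top_k) >= k else float("inf")

def offer(top_k, dist, point_id, k=5):
    """Add a candidate only if it beats the current upper bound."""
    if dist < upper_bound(top_k, k):
        heapq.heappush(top_k, (-dist, point_id))  # max-heap via negated distances
        if len(top_k) > k:
            heapq.heappop(top_k)                  # evict the current worst candidate

top_k = []
for dist, pid in [(0.9, "a"), (0.4, "b"), (2.7, "c")]:
    offer(top_k, dist, pid, k=2)
print(upper_bound(top_k, k=2))  # 0.9: anything farther than this can be ignored
```

Any point whose distance is larger than that bound will fail the `offer` check anyway, so computing its exact distance is wasted work. FINGER’s contribution is a cheap way of predicting that failure before paying for the exact computation.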

FINGER defines the distance between a query vector, q, and a new graph node vector, d, by reference to the vector of a previously explored node, c. Both q and d can be represented as the sums of projections along c (q_proj and d_proj) and “residual” vectors (q_res and d_res) orthogonal to c.

This leads Amazon to explore approximations of the distance measurement for the points that are farther away. They finally settle on an approximation based on geometry plus some vector algebra. Take a query vector, q, our current node whose neighbors are being explored, c, and one of c’s neighbors, d, whose distance from q we’ll compute next. “Both q and d can be represented as the sums of projections along c and ‘residual vectors’ perpendicular to c.” This essentially treats c as a basis vector of the space.
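In numpy, that decomposition looks like this (a quick sketch with my own variable names and made-up 128-dimensional vectors):

```python
import numpy as np

def decompose(v, c):
    """Split v into its projection along c and the residual orthogonal to c."""
    c_unit = c / np.linalg.norm(c)
    v_proj = (v @ c_unit) * c_unit   # component of v along the direction of c
    v_res = v - v_proj               # leftover part, perpendicular to c
    return v_proj, v_res

rng = np.random.default_rng(0)
q, d, c = rng.normal(size=(3, 128))

q_proj, q_res = decompose(q, c)
d_proj, d_res = decompose(d, c)

# Sanity checks: the pieces add back up, and the residuals are orthogonal to c.
assert np.allclose(q_proj + q_res, q) and np.allclose(d_proj + d_res, d)
assert abs(q_res @ c) < 1e-9 and abs(d_res @ c) < 1e-9
```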

The phrase “basis vector of the space” might seem intimidating, so let’s take a teeny moment to understand what it means. It’s a super important idea, and worth a brief detour-

Vector Spaces

Put simply, a Vector Space is the universe of a bunch of objects, which we call Vectors (it’s literally a space of Vectors). However, as with all good things in life, not every universe is good enough to get that special Vector Space Label. No, sir, we have standards here. For a space to be a vector space, you need to be able to:

  1. Add vectors together: If you have two vectors, you should be able to combine them to get another vector in the same space.
  2. Multiply vectors by numbers (scalars): You should be able to stretch, shrink, or even reverse the direction of a vector by multiplying it by a number.
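For the programmers in the room, these two operations are exactly what numpy arrays (or plain lists of numbers treated as coordinates) already give you. This is just an illustration, nothing FINGER-specific:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([-4.0, 0.5, 2.0])

w = u + v           # adding two vectors gives another vector in the same space
s = 2.5 * u         # scaling stretches a vector but keeps it in the space
flipped = -1.0 * v  # a negative scalar reverses the direction
```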

If you're doing more formal math, it’s important to know the axioms around Vector Spaces, but we can skip them here (they are fairly simple, but a lot of High-Level Math is about getting very specific with definitions and concepts so that someone else can figure out how our world changes when you decide to ditch those definitions and create new ones)-

Good guide here.

Vector Spaces are a deceptively simple idea that pops up in a lot of powerful areas of computation (this is one of the reasons why so many people make you learn Linear Algebra in Computer Science courses).

One of the ideas that goes hand in hand with Vector Spaces is Basis Vectors. In fact, the two are so tied together that you already use Basis Vectors all the time without realizing it-

Basis Vectors

Let’s say we have a huge, thriving universe with lots of Vectors vectoring around. Now, within this vector space, how do we describe the location of a specific point or a specific vector? That’s where basis vectors come in. They provide the framework for navigating and describing any vector within the space.

The x, y, and z axes that you’re all no doubt familiar with are defined by basis vectors. You can change the Basis Vectors of the space to play with the Geometry. Math has a lot fewer rules than you’d think, which is what makes it so much fun (and also terrifying- given how much of our world is built on unprovable assumptions, which is exactly what I want you to think about the next time you’re driving over a bridge or riding an underground Subway).

By breaking down a vector space into its fundamental building blocks, basis vectors make it easier to analyze and understand its properties. I do a lot of Linear Algebra, and read a lot more AI Research, and most of my reading/thinking is done in the context of Basis Vectors, not the Spaces themselves.

Once again, we don’t just label any Vector that comes our way. Basis vectors are a special set of vectors that have two key properties:

  1. Spanning: Given that we use them for navigation, basis vectors can be combined (through addition and scalar multiplication) to create any other vector in the space. They wouldn’t be any good for navigation in the Vector Space if they couldn’t reach every point in it.
  2. Linear Independence: None of the basis vectors can be created by combining the others. If your basis vector is so unspecial that anything it gives you can be given to you by combining one, two, or more of the others (we don’t judge)- then it should be dropped (a quick code check of both properties follows this list). If any of you work in ML Engineering, this is one of the reasons why things like collinearity within features can be a problem. It’s also why the Complex Plane seems so interesting, but that’s for another day-
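Stacking your would-be basis vectors as rows of a matrix and checking its rank tests both properties at once. Here’s a generic numpy check (nothing FINGER-specific, just the linear algebra):

```python
import numpy as np

def is_basis(vectors, dim):
    """True if the vectors are linearly independent and span a dim-dimensional space."""
    M = np.vstack(vectors)
    return M.shape[0] == dim and np.linalg.matrix_rank(M) == dim

e_x, e_y, e_z = np.eye(3)                  # the familiar x/y/z directions
print(is_basis([e_x, e_y, e_z], 3))        # True: independent and spanning
print(is_basis([e_x, e_y, e_x + e_y], 3))  # False: the third is just a combination of the first two
```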

Back to the main topic- how did Amazon optimize their graph similarity searches? Let’s take a look at the math-

Understanding FINGER

As mentioned earlier, Amazon’s algorithm FINGER relies on Geometry and Linear Algebra. Let’s look into the idea in more detail here-

Firstly, we know that if the algorithm is exploring the neighbors of c, we have already calculated the distance between c and q. By combining that with some precomputations (a list of which can be found in Section B of the appendix), “estimating the distance between q and d is simply a matter of estimating the angle between their residual vectors.”
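To see why, expand the squared L2 distance using the projection/residual split from earlier: everything except the angle between the residuals is either precomputable per edge (d’s pieces) or already available from visiting c (q’s pieces). Here’s a quick numpy check of that identity (my own illustration, with made-up vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
q, d, c = rng.normal(size=(3, 128))
c_unit = c / np.linalg.norm(c)

# Same decomposition as before: projection along c plus a residual orthogonal to c.
q_proj, d_proj = (q @ c_unit) * c_unit, (d @ c_unit) * c_unit
q_res, d_res = q - q_proj, d - d_proj

exact_sq = np.sum((q - d) ** 2)

# These terms are cheap: d_proj and ||d_res|| can be stored per edge offline,
# while q_proj and ||q_res|| fall out of the already-computed distance to c.
known = np.sum((q_proj - d_proj) ** 2) + q_res @ q_res + d_res @ d_res

# The one remaining unknown: the angle between the two residual vectors.
cos_theta = (q_res @ d_res) / (np.linalg.norm(q_res) * np.linalg.norm(d_res))
assert np.isclose(exact_sq, known - 2 * np.linalg.norm(q_res) * np.linalg.norm(d_res) * cos_theta)
```

Estimate cos_theta cheaply and you get an approximate distance almost for free.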

So how do we get that angle? For that, we use the angles between the residual vectors of c’s immediate neighbors. “The idea is that, if q is close enough to c that c is worth exploring, then if q were part of the graph, it would probably be one of c’s nearest neighbors. Consequently, the relationships between the residual vectors of c’s other neighbors tell us something about the relationships between the residual vector of one of those neighbors — d — and q’s residual vector.”
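The paper builds a small basis out of those neighboring residual vectors and estimates the angle from projections onto it (Section 3.2 has the exact construction). As a rough illustration of the general trick, here is the classic sign-projection (LSH-style) angle estimate; using random Gaussian vectors for the basis is essentially the HNSW-FINGER-RP baseline mentioned below, and FINGER’s improvement is a smarter, neighbor-derived basis:

```python
import numpy as np

def estimate_cos(u, v, basis):
    """Estimate cos(angle(u, v)) from how often their projections onto `basis` agree in sign."""
    agree = np.mean(np.sign(basis @ u) == np.sign(basis @ v))
    return np.cos(np.pi * (1.0 - agree))  # standard random-hyperplane / SimHash estimate

rng = np.random.default_rng(1)
q_res = rng.normal(size=128)
d_res = 0.7 * q_res + 0.7 * rng.normal(size=128)  # correlated residuals, cos(angle) ≈ 0.7
basis = rng.normal(size=(64, 128))                # stand-in for FINGER's neighbor-derived basis

true_cos = (q_res @ d_res) / (np.linalg.norm(q_res) * np.linalg.norm(d_res))
print(round(true_cos, 2), round(estimate_cos(q_res, d_res, basis), 2))  # estimate lands in the same ballpark
```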

Figure 2: Empirical observation that distances between query and most points in a database are larger than the upper-bound. (a) shows results on FashionMNIST-60K-784 dataset and (b) shows results on Glove-1.2M-100 dataset. We observed that starting from the 5th step of greedy graph search (i.e., running line 2 in Algorithm 1 five times), both experiments show more than 80% of data points will be larger than the current upper-bound. These distance computations won’t affect search updates.

There are some derivations, but I’m not going to touch upon them here because I really have nothing to add to them. Going over them here would be the newsletter equivalent of reading off the slides. If you want to check the math, check out Section 3.2 of the paper. For the rest of us, let’s look at the FINGER algorithm.

Nothing too crazy, assuming you accept the math for the approximation. Aside from that, the big thing to note is line 14 of the algorithm, where we switch to approximate distance computations to account for the observation that “starting from the 5th step of greedy graph search both experiments show more than 80% of data points will be larger than the current upper-bound.”
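The algorithm listing itself isn’t reproduced here, but the gist of that switch looks something like the sketch below. To be clear, this is my own simplification, not the paper’s pseudocode: the names are made up, `approx_dist` stands in for the residual-angle estimate above, and the `warmup` counter is a stand-in for the “5th step” observation.

```python
import heapq
import numpy as np

def greedy_search(q, entry, neighbors, vectors, approx_dist, k=10, warmup=5):
    """Greedy graph search with approximate-distance gating (simplified sketch).

    neighbors[v] lists the graph neighbors of node v; approx_dist(q, d_id, c_id)
    returns a cheap distance estimate (e.g. the residual-angle approximation).
    """
    d0 = float(np.linalg.norm(q - vectors[entry]))
    top = [(-d0, entry)]       # max-heap of (-distance, node): best k results so far
    frontier = [(d0, entry)]   # min-heap of (distance, node): nodes left to expand
    visited, step = {entry}, 0
    while frontier:
        dist_c, c = heapq.heappop(frontier)
        if len(top) >= k and dist_c > -top[0][0]:
            break                                   # nothing on the frontier can improve the result
        step += 1
        for d in neighbors[c]:
            if d in visited:
                continue
            visited.add(d)
            upper = -top[0][0] if len(top) >= k else np.inf
            if step > warmup and approx_dist(q, d, c) > upper:
                continue                            # the switch: skip the exact distance entirely
            dist = float(np.linalg.norm(q - vectors[d]))
            if dist < upper or len(top) < k:
                heapq.heappush(top, (-dist, d))
                if len(top) > k:
                    heapq.heappop(top)
                heapq.heappush(frontier, (dist, d))
    return sorted((-neg, node) for neg, node in top)
```

The exact computation only happens for neighbors whose estimate beats the current upper bound, which is exactly where the 80% of wasted work was going.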

As mentioned earlier, this relatively simple approach leads to some very solid performance gains: “We can see that on the three selected datasets, HNSW-FINGER all performs 10%-15% better than HNSW-FINGER-RP. This results directly validates the proposed basis generation scheme is better than random Gaussian vectors. Also notice that HNSW-FINGER-RP actually performs better than HNSW(pecos) on all datasets. This further validates that the proposed approximation scheme is useful, and we could use different angle estimations methods to achieve good acceleration.”

Pretty cool, huh? There’s a lot of nonsense going around about the end of AI, or about LLMs hitting a wall. This is completely wrong. Research always runs into dead-ends; that doesn’t mean the research itself wasn’t meaningful. Sometimes the challenges/dead-ends of one branch of research catalyze a new direction. The costs/difficulty of Graph Search led to some very interesting work in graph optimizations, which will influence a lot of other areas. Ignore the fear/rage baiters; humanity and science are not slowing down anytime soon- and new(ish) developments like FINGER are great examples of this.

Thank you for reading and hope you have a wonderful day.

Dev ❤

If you liked this article and wish to share it, please refer to the following guidelines.

I put a lot of work into writing this newsletter. To do so, I rely on you for support. If a few more people choose to become paid subscribers, the Chocolate Milk Cult can continue to provide high-quality and accessible education and opportunities to anyone who needs it. If you think this mission is worth contributing to, please consider a premium subscription. You can do so for less than the cost of a Netflix Subscription.


Many companies have a learning budget, and you can expense your subscription through that budget. You can use the following for an email template.

That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow. You can share your testimonials over here.

I regularly share mini-updates on what I read on the microblogging sites X (https://twitter.com/Machine01776819), Threads (https://www.threads.net/@iseethings404), and TikTok (https://www.tiktok.com/@devansh_ai_made_simple)- so follow me there if you’re interested in keeping up with my learnings.

Reach out to me

Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.

Small Snippets about Tech, AI and Machine Learning over here

AI Newsletter- https://artificialintelligencemadesimple.substack.com/

My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/

Check out my other articles on Medium: https://rb.gy/zn1aiu

My YouTube: https://rb.gy/88iwdd

Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y

My Instagram: https://rb.gy/gmvuy9

My Twitter: https://twitter.com/Machine01776819

