Paper shows why you will struggle at Machine Learning
This is easily the most technically complex paper I’ve ever read.
I’ve been going through Multiplying Matrices Without Multiplying (link: https://arxiv.org/abs/2106.10860). And it’s a paper I have spent a lot of time on. How could I not? The abstract claims, “Experiments using hundreds of matrices from diverse domains show that it often runs 100× faster than exact matrix products and 10× faster than current approximate methods. In the common case that one matrix is known ahead of time, our method also has the interesting property that it requires zero multiply-adds.” If you understand machine learning, this has huge implications for the learning process.
At the same time, I came across the above tweet on my timeline. And I can definitely see where this is coming from. Meaningful ML is by its nature multi-disciplinary. While the code for an LSTM or a Random Forest stays the same, the context around the problem changes. Depending on what you’re working on, the way you get, prepare, clean, and evaluate your data changes. Thus you end up needing to become proficient in multiple areas. This process involves a lot of Googling and can be very frustrating/disheartening.
The paper is a rather extreme example of that. I double majored in Math and Computer Science, and selected my courses to get good at coding and AI/ML in particular. So I’m well suited to understanding the details. But even after a month, much of this paper remains very challenging.
In this article, I will use the paper as an example of why good Machine Learning is difficult. I will explain why that’s a good thing for you, and what you can do to benefit from this. If nothing else, I hope that by the end of this article you understand what it takes to get to a high level at ML.
Understanding the Implications of this paper
A quick word on why this paper is such a big deal. In machine learning, data points are represented as multi-dimensional matrices. Multiplying matrices is central to a huge number of operations, and it is also notoriously expensive. For those interested, this article by Quanta is a good primer.
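To make the cost concrete, here is the textbook algorithm: multiplying two n×n matrices this way takes n³ multiply-adds, which is exactly the work that approximate methods like the paper’s try to avoid. (A minimal illustration, not the paper’s method.)

```python
import numpy as np

def naive_matmul(A, B):
    """Textbook matrix product: n * m * p multiply-adds for an
    (n x m) times (m x p) product."""
    n, m = A.shape
    m2, p = B.shape
    assert m == m2, "inner dimensions must match"
    C = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i, j] += A[i, k] * B[k, j]  # one multiply-add per inner step
    return C

A = np.arange(4, dtype=float).reshape(2, 2)
B = np.ones((2, 2))
print(np.allclose(naive_matmul(A, B), A @ B))  # True
```

Even heavily optimized libraries are ultimately paying for those multiply-adds, which is why an approximation that skips them entirely is so striking.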
This is where the paper gets insane. “In the common case that one matrix is known ahead of time, our method also has the interesting property that it requires zero multiply-adds.” When might we see such cases? Imagine our model has the weights and just needs to compute the predictions based on input. The weights are a matrix we know which will be multiplied with the input matrix. Given how much this process happens, your savings will really add up.
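The “known ahead of time” trick can be sketched with a toy version: if the weight matrix W is fixed, you can precompute the products of a small set of prototype vectors with W into a lookup table offline; at query time you only find the nearest prototype and read the table, with zero multiply-adds involving W. This is a deliberately simplified stand-in, not the paper’s actual MADDNESS algorithm, and the random “prototypes” here stand in for learned ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Known ahead of time: the weight matrix W (d x out).
d, out, k = 8, 3, 16
W = rng.normal(size=(d, out))

# Offline: pick k prototype vectors (random stand-ins for learned ones)
# and precompute their products with W once.
prototypes = rng.normal(size=(k, d))
table = prototypes @ W  # k x out lookup table, built offline

def encode(x):
    """Map an input to the index of its nearest prototype."""
    return int(np.argmin(((prototypes - x) ** 2).sum(axis=1)))

def approx_product(x):
    """Query time: approximate x @ W with a single table lookup --
    no multiply-adds against W."""
    return table[encode(x)]

x = rng.normal(size=d)
print(approx_product(x).shape)  # (3,)
```

The approximation error depends entirely on how well the prototypes cover the input distribution, which is why learning them well (and encoding quickly, as the paper does with hash trees) is the hard part.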
Why this paper is a nightmare to understand
So now that we have some idea of why this concept is important let’s talk about why this paper is challenging. Simply put, it traverses a lot of technical fields. Here’s a depiction of the Product Quantization they use:
Not only does it use vector quantization, it also relies on prototype learning, hashing, and aggregation. This requires very good coding and mathematical skills. Even their hashing is far from basic: the authors rely on hashing trees, which can be terrifying. Check out section 4.1 for more details. The complexity and wide-ranging nature of the paper was best articulated by the authors themselves: “our work draws on a number of different fields but does not fit cleanly into any of them.” Developing your understanding of the basics will at least help you follow the assumptions and experimental setups.
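Product Quantization itself can be sketched in a few lines: split each vector into subspaces, and encode each subspace chunk as the index of its nearest codebook entry. This is a generic PQ sketch with random stand-in codebooks, not the paper’s learned version (which also replaces the nearest-neighbor search below with fast hash trees).

```python
import numpy as np

rng = np.random.default_rng(1)

d, n_subspaces, k = 8, 4, 4   # 8-dim vectors, four 2-dim subspaces, 4 codes each
sub_d = d // n_subspaces

# Stand-in codebooks; in practice these are learned from data.
codebooks = rng.normal(size=(n_subspaces, k, sub_d))

def pq_encode(x):
    """Encode a vector as one small integer code per subspace."""
    codes = []
    for s in range(n_subspaces):
        chunk = x[s * sub_d:(s + 1) * sub_d]
        dists = ((codebooks[s] - chunk) ** 2).sum(axis=1)
        codes.append(int(np.argmin(dists)))
    return codes

def pq_decode(codes):
    """Reconstruct the (approximate) vector from its codes."""
    return np.concatenate([codebooks[s][c] for s, c in enumerate(codes)])

x = rng.normal(size=d)
codes = pq_encode(x)
print(codes, pq_decode(codes).shape)  # four codes, shape (8,)
```

The compression win is that each 8-dimensional float vector is now just four tiny integers, and dot products against it can be answered by summing per-subspace table lookups.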
For a detailed look at some of the assumptions in the paper, check out this video. I go over the assumptions and walk through a concrete example of the matrix multiplication approximation. Make sure to pause the video and read the snippets I’ve taken from the paper. I found them particularly insightful.
Why this complexity is a Good Thing for you
Obviously not every Machine Learning/AI venture is as complex as this paper. However, real-life ML will be complex. Following is an exchange I had with someone who read and enjoyed my article, 5 Unsexy Truths About Working in Machine Learning.
The complexity of Machine Learning opens a lot of doors. It means that there are always new ways to try things, new knowledge to discover, new protocols/ensembles to invent. It will allow you to specialize in the fields you’re most interested in. If you’re willing to put in the work and struggle, you will soon be able to develop your own value-adds. And that’s when it gets fun. How to Become a Machine Learning Expert is an article to help you speed up the process. As long as you’re willing to find areas you’re interested in and dive into them, you will be able to get great results in your Machine Learning journey.
If you liked this article, check out my other content. I post regularly on Medium, YouTube, Twitter, and Substack (all linked below). I focus on Artificial Intelligence, Machine Learning, Technology, and Software Development. If you’re preparing for coding interviews check out: Coding Interviews Made Simple.
For one-time support of my work following are my Venmo and Paypal. Any amount is appreciated and helps a lot:
Venmo: https://account.venmo.com/u/FNU-Devansh
Paypal: paypal.me/ISeeThings
Reach out to me
If this article got you interested in reaching out to me, then this section is for you. You can reach out to me on any of the platforms, or check out any of my other content. If you’d like to discuss tutoring, text me on LinkedIn, IG, or Twitter. If you’d like to support my work, use my free Robinhood referral link. We both get a free stock, and there is no risk to you. So not using it is just losing free money.
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819
If you’re preparing for coding interviews: https://codinginterviewsmadesimple.substack.com/
Get a free stock on Robinhood: https://join.robinhood.com/fnud75