Using Surrogate Gradients and STE in Machine Learning

Solving one of Neural Network’s Biggest Challenges

Devansh
3 min read · Aug 17, 2024

Neural Networks are very powerful, but they are held back by one huge weakness- their reliance on gradients. When building solutions for real-life scenarios, you won't always have a differentiable search space to work with, which makes gradient computation difficult or outright impossible. Let's talk about a way to tackle this-

Straight Through Estimators (STEs)

STEs address this by allowing backpropagation through functions that are not inherently differentiable. Imagine a step function, which is essential in many scenarios but whose gradient is zero almost everywhere. STEs bypass this by using the hard function on the forward pass and an approximate gradient (often just the identity) on the backward pass. It's like replacing a rigid wall with a slightly permeable membrane, allowing information to flow even where it shouldn't, mathematically speaking.
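To make this concrete, here's a minimal sketch in plain Python (no framework; all function and variable names are illustrative, not from any library). The forward pass uses a hard sign function, and the backward pass pretends it was the identity- the classic STE trick:

```python
# Minimal Straight-Through Estimator sketch (illustrative, pure Python).

def sign(x):
    """Forward pass: a hard sign/step function. Its true gradient is 0
    almost everywhere, so naive backprop would learn nothing."""
    return 1.0 if x >= 0.0 else -1.0

def ste_backward(upstream_grad):
    """Backward pass: the STE pretends sign() was the identity,
    passing the upstream gradient straight through unchanged."""
    return upstream_grad

# Toy training step: push a real-valued weight w so that sign(w) * x
# moves toward a target, even though sign() blocks true gradients.
w, x, target, lr = -0.3, 2.0, 2.0, 0.1
for _ in range(10):
    y = sign(w) * x                      # forward uses the hard function
    grad_y = 2.0 * (y - target)          # d/dy of squared error
    grad_w = ste_backward(grad_y) * x    # STE: treat sign() as identity
    w -= lr * grad_w                     # update the real-valued weight
```

Note that the weight stays real-valued during training; only the forward pass sees the binarized version. That separation is exactly what makes STE training work in practice.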

Surrogate Gradients

Similar to STEs, surrogate gradients offer a way to train neural networks with non-differentiable components. They replace the true gradient of a function with an approximation that is differentiable. This allows backpropagation to proceed through layers that would otherwise block the flow of gradient information.
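As a sketch of the idea (plain Python, names illustrative): the forward pass below keeps a hard step function, while the backward pass substitutes the derivative of a steep sigmoid- a common surrogate choice, e.g. in spiking neural network training:

```python
# Surrogate gradient sketch (illustrative, pure Python).
import math

def step(x):
    """Forward: hard threshold, true gradient is 0 almost everywhere."""
    return 1.0 if x >= 0.0 else 0.0

def surrogate_grad(x, k=5.0):
    """Backward: derivative of a steep sigmoid sigma(k*x), used as a smooth
    stand-in for the step function's true (useless) gradient."""
    s = 1.0 / (1.0 + math.exp(-k * x))
    return k * s * (1.0 - s)

# Near the threshold the surrogate gradient is large; far away it fades.
print(surrogate_grad(0.0))   # peak of the surrogate: k/4 = 1.25
print(surrogate_grad(3.0))   # near zero: input far from the threshold
```

Unlike the pure pass-through STE above, the surrogate here is input-dependent: it concentrates learning signal near the threshold, where a small change in input would actually flip the output.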

Why They Matter

These techniques are invaluable for:

  • Binarized Neural Networks: where weights and activations are constrained to be either -1 or 1, greatly improving efficiency on resource-limited devices
  • Quantized Neural Networks: where weights and activations are represented with lower precision, reducing memory footprint and computational cost
  • Reinforcement Learning: where actions might be discrete or environments might have non-differentiable dynamics
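The quantized-network case can be sketched in a few lines (plain Python, illustrative names): weights are kept in full precision, quantized on the forward pass, and the non-differentiable rounding step is treated as the identity on the backward pass:

```python
# Quantized training with an STE (illustrative, pure Python).

def quantize(x, step=0.25):
    """Forward: round to the nearest multiple of `step` (non-differentiable)."""
    return round(x / step) * step

# One gradient step on w for the loss (quantize(w) - target)**2.
w, target, lr = 0.9, 0.3, 0.5
q = quantize(w)                # forward uses the quantized weight
grad_q = 2.0 * (q - target)    # gradient w.r.t. the quantized value
grad_w = grad_q                # STE: d quantize / dw is approximated as 1
w -= lr * grad_w               # update the full-precision weight
```

After the update, quantizing w lands on a grid point closer to the target- the full-precision weight absorbs gradient information that the quantized copy alone could never carry.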

Fundamentally, straight-through estimators (STEs) and surrogate gradients serve as powerful tools that bridge the gap between the abstract world of gradients and the practical constraints of real-world problems. They unleash the full potential of neural networks in scenarios where traditional backpropagation falls short, allowing for the creation of more efficient and flexible solutions.

One powerful use-case we've recently seen has been the implementation of Matrix-Multiplication-Free LLMs, which use the straight-through estimator to handle their ternary weights and quantization. By doing so, they are able to drop their memory requirements by 61% with unoptimized kernels and by 10x in optimized settings.

Read more about MatMul Free LLMs and how they use STE over here-

If you like this article, please consider becoming a premium subscriber to AI Made Simple so I can spend more time researching and sharing information on truly important topics. We have a pay-what-you-can model, which lets you support my efforts to bring high-quality AI Education to everyone for less than the price of a cup of coffee (click here to learn more).

I provide various consulting and advisory services. If you'd like to explore how we can work together, reach out to me through any of my socials over here or reply to this email.

I regularly share mini-updates on what I read on the Microblogging sites X(https://twitter.com/Machine01776819), Threads(https://www.threads.net/@iseethings404), and TikTok(https://www.tiktok.com/@devansh_ai_made_simple)- so follow me there if you’re interested in keeping up with my learnings.

Reach out to me

Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.

Small Snippets about Tech, AI and Machine Learning over here

AI Newsletter- https://artificialintelligencemadesimple.substack.com/

My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/

Check out my other articles on Medium: https://rb.gy/zn1aiu

My YouTube: https://rb.gy/88iwdd

Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y

My Instagram: https://rb.gy/gmvuy9

My Twitter: https://twitter.com/Machine01776819


