Using Randomness Effectively in Deep Learning

Why it is good for Data Augmentation but not for selecting features

Devansh
8 min read · Mar 30, 2022

I’m a huge believer in introducing noise and randomness into your training data. The performance benefits, generalization, and robustness are all too good to ignore, which is why I’ve written and talked about it a lot throughout my content. Recently, a reader of mine reached out with an interesting question. He wanted to know why randomness helps in aspects such as Data Augmentation, but not in selecting features (Garbage In, Garbage Out). I figured this would make for a good topic, since I stress integrating noise and randomness into machine learning pipelines but haven’t covered why it works so well. In this article, I will cover the principle of randomness and why it works so well in Machine Learning. To ask me questions or request/recommend topic ideas, scroll to the bottom of this article and connect with me on the different platforms.

To understand the answer, let’s first cover some crucial background information.

How Machine Learning Works, for 5th graders

Traditional Machine Learning, Data Science, most Artificial Intelligence, and even the more complex Deep Learning networks all operate on a few fundamental assumptions:

  1. All the data has an underlying distribution that is consistent across multiple samples. Furthermore, the presence of one sample doesn’t impact the other samples. Statistics nerds know this as the IID (independently and identically distributed) principle.
  2. The distribution of the data and features can be learned and used to infer meaningful information about the data. This could be with respect to predicting targets, grouping data together, forecasting future performance, etc.

Provided our data meets these assumptions, we can start implementing Machine Learning models. Our models will simply take the data and attempt to learn its underlying distribution. In other words, a model will attempt to fit a function to the data it is given.
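To make that concrete, here is a tiny toy sketch (my own illustration, not taken from any paper discussed here) of a model learning a function from data that satisfies the IID assumption:

    # Toy sketch: a model "fitting a function" to IID data.
    # The dataset and model choice here are illustrative assumptions.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(42)
    X = rng.uniform(0, 10, size=(200, 1))          # features drawn IID from one distribution
    y = 3.0 * X[:, 0] + rng.normal(0, 1.0, 200)    # targets = true function + a little noise

    model = LinearRegression().fit(X, y)           # learn an approximation of that function
    print(model.coef_, model.intercept_)           # should land close to 3.0 and 0.0

The model never sees the "true" function; it only sees samples and recovers an approximation of the underlying distribution from them.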

Different models will attempt things differently. Image taken from this U-Washington page

However, it is important to remember that machine learning models make their own assumptions about the data. This is why different models perform differently when given different kinds of data distributions, and why it is crucial to consider the context of the problem when drawing any conclusions from your Data Science/Machine Learning pipeline.

One of my favorite things about Random Forests is that they don’t make any additional assumptions about the data. This makes them ideal for handling outliers etc. The image is taken from here

This is also why a solid mathematical foundation is key for Machine Learning. It will allow you to understand your data/reports and make the best decisions.

Why the right features/data are so important

From this perspective, it should be obvious why clean data is so important to model training. And why everyone and their grandma is always screaming about the importance of Data. If you don’t pick good features, or if your data is not representative of the overall (real-world) data distribution, then you end up with useless models.

This is why I stress the importance of dealing with Data Drift in my recommended ML Projects. Image Source

The famous, almost clichéd, principle “Garbage In, Garbage Out” is based on this very idea. If you feed in garbage data, you will get garbage output.

But wait a minute. So far, I’ve gotten on a soapbox and preached random inputs and training with noise to anyone who would listen. By definition, random noise does not conform to the underlying data distribution. Could it be that random noise is Garbage (cue the shock)? Then why do we get such fantastic results from it?

Why Data Augmentation contradicts GIGO

Take TrivialAugment, an amazingly performant Data Augmentation policy for Computer Vision that requires no tuning (for my breakdown on it, watch this video). The augmentation policy can produce samples that look very strange. Google also had fantastic results integrating a lot of randomness into their Image Classification pipeline.
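If you want to try something similar yourself, here is a rough sketch of what dropping TrivialAugment into a vision training pipeline can look like. It assumes a recent torchvision build that ships TrivialAugmentWide; the dataset and batch size are placeholders, not the setup from either paper.

    # Rough sketch: TrivialAugment-style augmentation in a torchvision pipeline.
    # Assumes a recent torchvision with TrivialAugmentWide; dataset/batch size are placeholders.
    import torch
    from torchvision import datasets, transforms

    train_tf = transforms.Compose([
        transforms.TrivialAugmentWide(),   # one random op at a random strength per image, no tuning
        transforms.ToTensor(),
    ])

    train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=train_tf)
    loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

    images, labels = next(iter(loader))
    print(images.shape)                    # e.g. torch.Size([128, 3, 32, 32])

The appeal is exactly what the name promises: no search over augmentation policies, just one random operation per image.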

For my analysis of this publication, check out this link

This is not limited to Computer Vision Classification tasks. The above snip is taken from Microsoft’s “Efficiently and effectively scaling up language model pretraining for best language representation model on GLUE and SuperGLUE”. The model tackles Natural Language Processing tasks and benchmarks. As highlighted, it utilizes noisy inputs during training to improve robustness.

Taken from Facebook’s Post: Deep learning to translate between programming languages

Even Facebook’s Coding Language Translator integrates faulty code to improve performance. So why does random data augmentation perform so well, despite being garbage? Is GIGO incomplete? Is random noise secretly helpful?

Why Data Augmentation and Noise Work

To answer this question, I want you to look at the tasks we are referring to. When it comes to Data Augmentation, the practice is most common in either Natural Language Processing or Computer Vision. Both of these types of data are inherently diverse.

Think of this article and content creation in general. You’re reading this because the algorithms have somehow decided (either now or in the past) that my style of writing/communication is a match for you. Other people have also covered some of the topics I have talked about, but they often have different audiences. There are 100 ways to write/say the same 3 things.

All the articles here are informative and not clickbait. That’s why I’m the most woke. Follow to join the cult

Computer Vision might not seem that diverse, but that’s where you’d be wrong. Not only do we have a lot of classes and interactions, but images of the same class can also be very different. And if adversarial learning has taught us anything, changes that are imperceptible to humans can still throw off image classifiers.

When it comes to Machine Learning, we like Chaos. Photo by Markus Spiske on Unsplash

See the trend? Random Data Augmentation works exceptionally well when we are dealing with tasks that have a lot of natural variance. Coding styles, pictures, and texts can all be very different while conveying the same basic information. By adding an element of chaos with random Data Augmentation/noise, you better replicate the real-world diversity you will see in your inputs.
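As a concrete illustration (my own toy sketch, not code from any of the papers above), injecting that element of chaos into a batch of images can be as simple as:

    # Toy sketch of "adding an element of chaos": perturb each training batch with small noise.
    # The noise level (0.05) is an illustrative assumption, not a recommended value.
    import torch

    def add_input_noise(batch: torch.Tensor, noise_std: float = 0.05) -> torch.Tensor:
        """Return a copy of the batch with Gaussian noise added, clamped to a valid pixel range."""
        noisy = batch + noise_std * torch.randn_like(batch)
        return noisy.clamp(0.0, 1.0)

    clean = torch.rand(8, 3, 32, 32)       # stand-in for a batch of normalized images
    noisy = add_input_noise(clean)
    print((noisy - clean).abs().mean())    # the perturbation is small but non-zero

Each epoch, the model sees a slightly different version of every image, which mimics the natural variation it will face in the wild.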

However, there is a consideration you have to make before you start implementing this method.

Context Matters

You know how I’m always highlighting that Machine Learning papers should be interpreted in context? This is a prime example of that. Don’t just blindly rush in and try to use random noise because all the recent papers seem to be doing it. Why? Look at the table from the Google paper mentioned earlier.

This extra data is on top of an already massive dataset.

The amount of data they already had in the baseline was massive, much more than most people will have when training. Thus, when they introduced the random noise/input, they were able to add a new dimension without messing up the foundational fitting function. If your proportion of noisy to clean data is too high, you run the risk of messing up your model. To borrow an analogy from Taleb’s writing on antifragility, adding noise to your pipeline is like ingesting small amounts of poison to build up your immunity (an actual practice called mithridatization). The correct amount will actually work. Too much will mess you up.
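One practical way to respect that balance is to make the “dose” of randomness an explicit knob you can tune. This is my own sketch; the probability and noise level are illustrative assumptions, not values from the Google paper.

    # Sketch: keep the "dose" of randomness an explicit, small hyperparameter.
    # noise_prob and noise_std are illustrative assumptions.
    import torch

    def maybe_corrupt(batch: torch.Tensor, noise_prob: float = 0.1, noise_std: float = 0.1) -> torch.Tensor:
        """With probability noise_prob, perturb the batch; otherwise leave it clean."""
        if torch.rand(1).item() < noise_prob:
            return (batch + noise_std * torch.randn_like(batch)).clamp(0.0, 1.0)
        return batch

    batch = torch.rand(16, 3, 32, 32)
    out = maybe_corrupt(batch)             # most of the time this is the untouched batch

Start with a small dose, watch your validation metrics, and only increase it if the model keeps improving.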

Using Synthetic Data: The Future?

We have also seen a new trend of using purely synthetic data in ML pipelines. Synthetic data can serve various purposes, such as being inexpensive and avoiding certain privacy regulations. For synthetic data, we see the opposite trend: the data tries to resemble real-world data as much as possible. The authors of SinGAN-Seg combined high-level generators and Neural Style Transfer to create extremely realistic and useful medical images.

Read more about “Fake It Till You Make It: Face analysis in the wild using synthetic data alone” here

Microsoft’s “Fake It Till You Make It: Face analysis in the wild using synthetic data alone” does something similar. It uses exceptional tech to train a facial feature detector using only synthetic faces. They tested it on real faces, and the results were SOTA. You can read my analysis here.

Using very fine-grained synthetic data in the pipeline might be the future of countering threats such as Deepfakes. This would add another layer to data augmentation. As the pipelines get more complex, these very distinct fields will start to converge and we will see projects and infrastructures using multiple techniques. While this would allow for powerful solutions, increasingly complex systems can also become problematic. Research has shown that ML pipelines (and our evaluation of them) are often susceptible to arbitrary factors. To learn more about this, and how to counter this, watch this video.

That’s it for this article. If you want to get good at Machine Learning, foundational skills are a must. This article is a step-by-step guide for those of you looking to get into Machine Learning. It links to tons of free resources that you can use to improve your skills.

It’s promo time. But real talk, my newsletter (Substack) has helped a lot of people out.

To truly get good at Machine Learning, a base in Software Engineering is crucial. It will help you conceptualize, build, and optimize your ML systems. My daily newsletter, Coding Interviews Made Simple, covers topics in Algorithm Design, Math, Recent Events in Tech, Software Engineering, and much more to make you a better developer. I am currently running a 20% discount for a WHOLE YEAR, so make sure to check it out.

Think of the ROI that this student made by subscribing.

I created Coding Interviews Made Simple using new techniques discovered through tutoring multiple people into top tech firms. The newsletter is designed to help you succeed, saving you from hours wasted on the Leetcode grind.

To help me write better articles and understand you, fill out this survey (anonymous). It will take 3 minutes at most and will allow me to improve the quality of my work.

Feel free to reach out if you have any interesting jobs/projects/ideas for me as well. Always happy to hear you out.

For monetary support of my work, my Venmo and PayPal are below. Any amount is appreciated and helps a lot. Donations unlock exclusive content such as paper analysis, special code, consultations, and specific coaching:

Venmo: https://account.venmo.com/u/FNU-Devansh

Paypal: paypal.me/ISeeThings

Reach out to me

Use the links below to check out my other content, learn more about tutoring, or just to say hi. Also, check out the free Robinhood referral link. We both get a free stock (you don’t have to put in any money), and there is no risk to you, so not using it is just losing free money.

Check out my other articles on Medium: https://rb.gy/zn1aiu

My YouTube: https://rb.gy/88iwdd

Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y

My Instagram: https://rb.gy/gmvuy9

My Twitter: https://twitter.com/Machine01776819

If you’re preparing for coding/technical interviews: https://codinginterviewsmadesimple.substack.com/

Get a free stock on Robinhood: https://join.robinhood.com/fnud75
