Evaluating Label Dispersion: Is It the Best Metric for Model Uncertainty?

An interesting idea. But how does it hold up against the alternatives?

Devansh
6 min read · Aug 12, 2021

If you’ve been following my work for some time, you’ll know that I cover a lot of Computer Vision research. Computer Vision has seen some really innovative uses of traditional Machine Learning techniques, alongside heavy investment. And for good reason. Images are an expensive input to process, and given how useful Computer Vision is in various lucrative applications, there are huge incentives to create low-cost, high-performance learning agents. Among these innovations is a very interesting Machine Learning protocol called Active Learning. Done right, Active Learning leads to lower training costs without sacrificing performance. To maximize performance in Active Learning, we need a reliable way to evaluate how confident a model is about a particular sample. The authors of “When Deep Learners Change Their Mind: Learning Dynamics for Active Learning” present a new metric for doing just that.

The table shows that Label Dispersion outperforms other Active Learning protocols.

In this article, I will introduce the basic idea of Active Learning and why you should keep your eye on it. Once we have a good understanding of it, we will go into Label Dispersion, the authors’ proposed metric for evaluating uncertainty in a trained network. We will look at the concept and evaluate its performance. As always, an annotated version of the paper is linked at the end. Go over it to see my detailed thoughts on the paper and its methods (beyond what I can go into here). If this article is useful to you, be sure to clap and share it with other people. Feedback is always appreciated.

Understanding Active Learning

This is the process used by the authors.

A good definition of Active Learning is: “Active learning is a special case of machine learning in which a learning algorithm can interactively query a user (or some other information source) to label new data points with the desired outputs.” In simple terms, after training on an initial labeled dataset, we traverse an unlabeled dataset to select the “best” samples. These samples are sent to a teacher (which could even be a human), who labels them and adds them to the labeled pool. The model is then retrained. You might be wondering what benefit this has over something like Semi-Supervised Learning. Having a human annotate the data samples is expensive, so why not just implement SSL and use pseudo-labels?


The answer is simple. While the cost of annotation per data sample is higher, Active Learning uses only a fraction of the total samples. That is the secret: Active Learning offsets the need for the immense data pools typically seen in Computer Vision research (literally petabytes of data). To get a more comprehensive understanding of this topic, check out this video. It’s not necessary for understanding the rest of this article, but Active Learning is definitely something you should know about for the future.
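To make the loop concrete, here is a minimal sketch of the active-learning cycle described above. This is not the authors’ code: the toy dataset, the scikit-learn logistic regression, the budget, and the least-confidence score are all stand-ins I chose for illustration, and the expensive human annotation step is simulated by simply revealing labels we held back.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data standing in for a small labeled seed set and a large "unlabeled" pool.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10, random_state=0)
labeled_idx = list(range(20))            # small initial labeled set
unlabeled_idx = list(range(20, len(X)))  # everything else is treated as unlabeled

model = LogisticRegression(max_iter=1000)

for round_ in range(5):                  # a few acquisition rounds
    model.fit(X[labeled_idx], y[labeled_idx])

    # Score the unlabeled pool by uncertainty (here: least confidence).
    probs = model.predict_proba(X[unlabeled_idx])
    uncertainty = 1.0 - probs.max(axis=1)

    # Query the "oracle" for the most uncertain samples and move them into the
    # labeled pool. In practice, this is the expensive human annotation step.
    budget = 50
    query = np.argsort(uncertainty)[-budget:]
    newly_labeled = [unlabeled_idx[i] for i in query]
    labeled_idx.extend(newly_labeled)
    unlabeled_idx = [i for i in unlabeled_idx if i not in set(newly_labeled)]

    print(f"round {round_}: labeled pool = {len(labeled_idx)} samples")
```

The whole point is that only the queried samples ever need human labels; the rest of the pool stays unlabeled.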

So how is Model Uncertainty relevant in Active Learning?

Quick answer: most Active Learning agents select the samples they are most uncertain about. The logic is as follows. If the model is already confident about a sample, annotating it and adding it to the training pool is mostly wasted effort. Samples the model isn’t confident about are the ones that help the most once annotated and added to the labeled pool. Therefore, an accurate way to evaluate a model’s uncertainty is our best bet for creating a high-performing Active Learning agent.
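For reference, the most common uncertainty scores used for this selection step are simple functions of the model’s softmax output. These are standard acquisition heuristics rather than anything specific to this paper, and the function names below are my own.

```python
import numpy as np

def least_confidence(probs: np.ndarray) -> np.ndarray:
    # One minus the top class probability; higher means more uncertain.
    return 1.0 - probs.max(axis=1)

def margin(probs: np.ndarray) -> np.ndarray:
    # Gap between the top two class probabilities; a smaller gap means more
    # uncertainty, so negate it to keep "higher = more uncertain".
    sorted_p = np.sort(probs, axis=1)
    return -(sorted_p[:, -1] - sorted_p[:, -2])

def entropy(probs: np.ndarray) -> np.ndarray:
    # Shannon entropy of the predictive distribution.
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

# Example: three samples, three classes.
p = np.array([[0.98, 0.01, 0.01],   # confident
              [0.40, 0.35, 0.25],   # uncertain
              [0.60, 0.30, 0.10]])
print(least_confidence(p), margin(p), entropy(p))
```

Whichever score you pick, the agent queries the samples with the highest values.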

Why Network Confidence is not Enough

The above image illustrates the problem clearly. Neural networks can often be very confident about their predictions, even when those predictions are wrong. This causes two problems:

  1. Out of overconfidence, the model will pass over samples it should be learning from, because it thinks it already understands them. Take a look at image d: the network has 0.99 confidence on a wrong prediction.
  2. Conversely, the model will pick images where its low confidence is equally misplaced. This is not as bad as the first case, but it makes learning less efficient. For larger-scale projects, this can add up.

Label Dispersion: A new metric

Now that we see why Model Confidence by itself is not the best metric, let’s look at Label Dispersion. For me, there are a few considerations that make a metric great. One is performance (obviously). It should also be relatively inexpensive to compute. And lastly, the metric should make sense (on a logical level).

The formula for calculating dispersion.

The figure above shows the math the authors use to calculate label dispersion. The notation might seem intimidating, but the idea is quite simple. We have the model predict the label of an image several times. If the prediction is always the same, label dispersion is low. Look at image a below: since the model always predicts a car, it has a low dispersion. If different labels keep being predicted, the model is “confused” about the sample, which means a higher value. While the other images (b, c, d) have similar prediction confidence, they have significantly higher label dispersion values.

Look at it again
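As a rough sketch of what the formula computes: collect the model’s predicted label for a sample T times (the paper gathers these predictions across training epochs), find the most frequently predicted class, and take one minus the fraction of predictions that agree with it. The code below is my own illustration, not the authors’ implementation.

```python
import numpy as np

def label_dispersion(predicted_labels: np.ndarray) -> float:
    """Dispersion = 1 - (count of the most common predicted label) / T.

    predicted_labels: array of T predicted class indices for ONE sample,
    e.g. collected from the model at different training epochs.
    """
    T = len(predicted_labels)
    _, counts = np.unique(predicted_labels, return_counts=True)
    return 1.0 - counts.max() / T

# Image (a): the model predicts "car" every time -> dispersion of 0.
print(label_dispersion(np.array([2, 2, 2, 2, 2])))   # 0.0
# A sample the model keeps changing its mind about -> high dispersion.
print(label_dispersion(np.array([2, 5, 2, 7, 5])))   # 0.6
```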

This approach checks off two of our three criteria. It makes sense on an intuitive level and is relatively inexpensive to compute (running a trained model on a sample is cheap). The table at the beginning showed promising results. Let’s look at the other results to see how well Label Dispersion holds up.

On the CIFAR10 and CIFAR100 benchmarks, we see dispersion performing very well in terms of accuracy. This is a very promising sign and shows the potential of Label Dispersion as a metric for estimating model uncertainty. That checks off the final criterion on our checklist.

Conclusion

Based on the paper’s results, Label Dispersion is certainly a promising metric for Active Learning, and I will be looking into it more. Being a new metric, it needs more testing across more contexts and benchmarks before anything conclusive can be said. Still, it is very promising, so keep your eyes open for it. People will definitely be using it more in the future.

Paper

As promised, here is the annotated paper that you can read to get more insight.

Reach Out to Me

If the article got you interested in reaching out to me, then this section is for you. You can reach out to me on any of the platforms below, or check out any of my other content. If you’d like to discuss tutoring, text me on LinkedIn, IG, or Twitter. I help people with Machine Learning, AI, Math, Computer Science, and Coding Interviews.

If you’d like to support my work, use my free Robinhood referral link. We both get a free stock, and there is no risk to you. So not using it is just losing free money.

Check out my other articles on Medium: https://rb.gy/zn1aiu

My YouTube: https://rb.gy/88iwdd

Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y

My Instagram: https://rb.gy/gmvuy9

My Twitter: https://twitter.com/Machine01776819

My Substack: https://devanshacc.substack.com/

Live conversations at twitch here: https://rb.gy/zlhk9y

Get a free stock on Robinhood: https://join.robinhood.com/fnud75
