Devansh
1 min readJan 12, 2022

--

You bring up a valid point. Wrt to the point you mention about the optimizers being similar in terms of direction, it's something I allude to in on my respnses to a question I got about this paper. So I would be inclined to agree with you. It's a possiblity that the difference in magnitude of the protocols is much higher than their directions. If that is the case, it would be interesting to learn more about it.

However, there is a point of distinction that I would like to bring up. While you're correct in saying that the update rules look similar in terms of formulations, the idea is that with higher dimensions, similar looking formulae will start to diverge. So the update directions should in theory start to diverge. This paper is more off a validation to a belief than a completely new idea. The dominance of magnitude was certainly unexpected (atleast to me).

I'm going to share an article written on Medium that does a fantastic job showing how different protocols end up doing in some scenarios. If you're still not convinced, feel free to DM me and we can have a longer discussion. It will be linked as a response to this article.

--

--

Devansh
Devansh

Written by Devansh

Writing about AI, Math, the Tech Industry and whatever else interests me. Join my cult to gain inner peace and to support my crippling chocolate milk addiction

Responses (1)