LSTMs are not transformers, so I guess that’s why they didn’t mention it. But yes, I would be interested in the results of comparisons there.
About the cherry-picking: I’m not sure it would favor the smaller models, but we can always test that out. A lot of research teams do show that. As for it being similar to a regression, the model breakdown is very similar, so it would not surprise me if they had similar results.
Thank you for the paper. I’ll look into it.