
ELLIS Young Innovators Talk: Frederik Kunstner

📅 Date: Friday, March 27, 2026
🕚 Time: 13:30 – 14:30
Location: Heinzel Seminar Room, Office Building West



Title: Why language models are difficult to train without Adam


Abstract:

Adam is the default optimizer for training language models, because gradient descent is too slow. In this talk we will try to understand why. We revisit common interpretations of Adam, explain why they are insufficient to account for the observed performance gap, and instead show that Adam fixes a problem arising from text data. In text, a few words are very frequent, but there is also a long tail of infrequent words. We show experimentally that the performance gap is related to this frequency imbalance, and study a simplified language model in which this phenomenon can be formalized.
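The frequency imbalance the abstract refers to can be illustrated with a toy word count over a made-up sentence (this is an illustrative sketch, not data or code from the talk):

```python
from collections import Counter

# Hypothetical toy corpus: a few words dominate, many appear only once.
corpus = (
    "the cat sat on the mat and the dog sat by the door "
    "while a sparrow perched quietly near an open window"
).split()

counts = Counter(corpus)

# Head of the distribution: the single most frequent word.
head_word, head_count = counts.most_common(1)[0]

# Long tail: words that occur exactly once.
tail = [w for w, c in counts.items() if c == 1]

print(head_word, head_count, len(tail))  # "the" appears 4 times; 16 words appear once
```

Even in this tiny example, one word accounts for a disproportionate share of tokens while most words sit in the tail; at corpus scale this imbalance (often described by Zipf's law) is far more extreme.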