We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization, aiming at good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rate schedule – the hypergradient. Based on this, we introduce MARTHE, a novel online algorithm guided by cheap approximations of the hypergradient that uses past information from the optimization trajectory to simulate future behaviour. It interpolates between two recent techniques, RTHO [Franceschi et al., 2017] and HD [Baydin et al., 2018], and is able to produce learning rate schedules that are more stable, leading to models that generalize better.
MARTHE: Scheduling the Learning Rate Via Online Hypergradients / Michele Donini, Luca Franceschi, Orchid Majumder, Massimiliano Pontil, Paolo Frasconi. - ELECTRONIC. - (2020), pp. 2119-2125. (Paper presented at the International Joint Conference on Artificial Intelligence).
MARTHE: Scheduling the Learning Rate Via Online Hypergradients
Paolo Frasconi
2020
File | Size | Format |
---|---|---|
0293.pdf (open access). Description: Main article. Type: Publisher's PDF (Version of record). License: All rights reserved | 500.43 kB | Adobe PDF |
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.