GD : Num pType => Neg pType => Random pType => {auto mon : ComMonoid pType} -> FromDouble pType => {default 0.001 _ : pType} -> Optimiser (Const pType) ()

Gradient descent optimiser. Has trivial state.
@lr is the learning rate
Visibility: public export

GA : Num pType => Neg pType => Random pType => {auto mon : ComMonoid pType} -> FromDouble pType => {default 0.001 _ : pType} -> Optimiser (Const pType) ()

Gradient ascent optimiser. Has trivial state.
@lr is the learning rate

Visibility: public export
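GD and GA differ only in the sign of the step. A minimal sketch of the underlying update, as hypothetical standalone functions on Double (not part of this module):

```idris
-- Hypothetical sketch of the updates GD and GA perform, on Double.
-- lr corresponds to the {default 0.001 _ : pType} learning rate above.
gdStep : (lr, grad, param : Double) -> Double
gdStep lr grad param = param - lr * grad  -- descend: step against the gradient

gaStep : (lr, grad, param : Double) -> Double
gaStep lr grad param = param + lr * grad  -- ascend: step along the gradient
```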
momentumUpdate : Num pType => Neg pType => pType -> pType -> pType -> pType -> (pType, pType)

Visibility: public export
lookAhead : Num pType => pType -> pType -> pType -> pType

Visibility: public export
GDMomentum : Num pType => Neg pType => Random pType => {auto mon : ComMonoid pType} -> FromDouble pType => {default False _ : Bool} -> {default 0.001 _ : pType} -> {default 0.9 _ : pType} -> Optimiser (Const pType) pType

Gradient descent with momentum, optionally with Nesterov acceleration.

Visibility: public export
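The argument orders of momentumUpdate and lookAhead are not documented above, so the following is only a hedged sketch of the standard recurrences they presumably implement, with hypothetical names and Double for concreteness:

```idris
-- Hypothetical sketch of classical momentum and the Nesterov look-ahead.
momentumStep : (lr, mu, v, grad, param : Double) -> (Double, Double)
momentumStep lr mu v grad param =
  let v' = mu * v + lr * grad  -- accumulate velocity
  in (v', param - v')          -- (new velocity, new parameter)

-- Nesterov acceleration evaluates the gradient at the anticipated next
-- position (one common convention; signs vary between formulations).
lookAheadPoint : (mu, v, param : Double) -> Double
lookAheadPoint mu v param = param - mu * v
```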
adamUpdate : Num pType => Neg pType => Fractional pType => Sqrt pType => pType -> pType -> pType -> pType -> pType -> pType -> pType -> pType -> pType -> (pType, (pType, (pType, (pType, pType))))

Adam update step.
State is (m, v, beta1^t, beta2^t), where m and v are the first and second moment estimates, and beta1^t, beta2^t are running powers of the decay rates, kept for bias correction.

Visibility: public export
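Written out, with g_t the gradient at step t, these are the standard Adam recurrences from the paper cited below:

```latex
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2,

\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad
\theta_t = \theta_{t-1} - \mathrm{lr}\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```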
Adam : Num pType => Neg pType => Random pType => {auto mon : ComMonoid pType} -> FromDouble pType => Fractional pType => Sqrt pType => {default 0.001 _ : pType} -> {default 0.9 _ : pType} -> {default 0.999 _ : pType} -> {default 1e-8 _ : pType} -> Optimiser (Const pType) (pType, (pType, (pType, pType)))

Adam optimiser (Kingma & Ba, 2014).
The state is kept as four separate components for efficiency.
@lr is the learning rate
@beta1 is the exponential decay rate for the first moment estimate
@beta2 is the exponential decay rate for the second moment estimate
@epsilon is a small constant for numerical stability
Visibility: public export
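A self-contained sketch of one Adam step on Double, threading the four state components described above. adamStep and its argument order are hypothetical illustrations, not the library's adamUpdate:

```idris
-- Hypothetical sketch of a single Adam step on Double.
-- State: (m, v, b1t, b2t) = (first moment, second moment, beta1^t, beta2^t).
adamStep : (lr, beta1, beta2, eps : Double)
        -> (m, v, b1t, b2t : Double)
        -> (grad, param : Double)
        -> (Double, (Double, (Double, (Double, Double))))
adamStep lr beta1 beta2 eps m v b1t b2t grad param =
  let m'   = beta1 * m + (1 - beta1) * grad         -- first moment estimate
      v'   = beta2 * v + (1 - beta2) * grad * grad  -- second moment estimate
      b1t' = b1t * beta1                            -- running power beta1^t
      b2t' = b2t * beta2                            -- running power beta2^t
      mHat = m' / (1 - b1t')                        -- bias-corrected m
      vHat = v' / (1 - b2t')                        -- bias-corrected v
  in (param - lr * mHat / (sqrt vHat + eps), (m', (v', (b1t', b2t'))))

-- e.g. with the defaults above: adamStep 0.001 0.9 0.999 1.0e-8 ...
-- starting from state (0, 0, 1, 1), so that b1t' = beta1 after step one.
```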