8.1.0.6. gb-param
(packages/gblearn2/gb-params.lsh)

Author(s): Yann LeCun

A hierarchy of classes that implements gradient-based learning algorithms. Various subclasses of gb-param support different learning algorithms (plain stochastic gradient, stochastic gradient with separate epsilons, with second derivatives, etc.). The standard gb-param type contains idx1 slots.

8.1.0.6.0. (new-index-offset s dlist o)
(packages/gblearn2/gb-params.lsh)


Like new-index, but places the index at a specific offset in the storage. s is a storage, dlist is a list of dimensions, and o is an offset. Compilable macro.
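
A minimal sketch of how this might be used (the storage here comes from an ordinary float matrix via idx-storage, and all sizes are arbitrary illustrations):

  (setq s (idx-storage (float-matrix 12)))  ;; a float storage of 12 elements
  (setq a (new-index-offset s '(4) 0))      ;; 4-element idx1 starting at offset 0
  (setq b (new-index-offset s '(8) 4))      ;; 8-element idx1 starting at offset 4

a and b are views into the same storage, so writing into them modifies s.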

8.1.0.6.1. idx1-param
(packages/gblearn2/gb-params.lsh)


A gb-param whose slots are idx1. This is an abstract class (useful classes are subclasses thereof); no learning algorithm is defined, and only the x slot is present. This class is useful for fixed (non-adaptive) parameters.

8.1.0.6.1.0. (==> idx1-param resize s)
[MSG] (packages/gblearn2/gb-params.lsh)


Resize the idx1-param to s elements.

8.1.0.6.1.1. (new idx1-param s sts)
[CLASS] (packages/gblearn2/gb-params.lsh)


Create a new idx1-param. s is the size, and sts is the initially allocated storage size.

8.1.0.6.1.2. (==> idx1-param load s)
[MSG] (packages/gblearn2/gb-params.lsh)


Load the values of the idx1-param from the matrix file s, which must contain an idx1 of floats. The number of elements in the file must match the size of the idx1-param.

8.1.0.6.1.3. (==> idx1-param save s)
[MSG] (packages/gblearn2/gb-params.lsh)


Save the x slot of the idx1-param into file s. It can subsequently be recovered with the load method.
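
A minimal save/load round trip; the file name params.mat is just an example:

  (setq p (new idx1-param 10 10))   ;; 10 elements, 10 preallocated
  (==> p save "params.mat")         ;; writes the x slot as an idx1 of floats
  (==> p load "params.mat")         ;; reads it back; sizes must match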

8.1.0.6.1.4. (==> idx1-param size)
[MSG] (packages/gblearn2/gb-params.lsh)


Return the number of elements of the idx1-param.

8.1.0.6.2. idx1-dparam
(packages/gblearn2/gb-params.lsh)


A gb-param class for the plain stochastic gradient descent algorithm.

8.1.0.6.2.0. (new idx1-dparam s sts)
[CLASS] (packages/gblearn2/gb-params.lsh)


s is a size (possibly zero), and sts is the initial storage size, which must be larger than zero and can be larger than s to avoid unnecessary reallocations when the size of the param is later increased.

8.1.0.6.2.1. (==> idx1-dparam update-deltas knew kold)
[MSG] (packages/gblearn2/gb-params.lsh)


Update the average derivatives deltas as follows: deltas = knew*dx + kold*deltas. The update method calls this whenever it is called with a non-zero inertia parameter.

8.1.0.6.2.2. (==> idx1-dparam update eta inertia)
[MSG] (packages/gblearn2/gb-params.lsh)


Simple gradient descent update. This uses momentum if the inertia parameter is non-zero. CAUTION: the deltas slot is not updated when inertia is zero; in that case the parameters are updated directly from the gradient: x = x - eta*dx. When inertia is non-zero, the deltas are updated as follows: deltas = (1-inertia)*dx + inertia*deltas (where dx is the gradient), and the parameters are then updated as follows: x = x - eta*deltas.
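
A minimal sketch of the calling sequence (the dx slot is assumed to have been filled by the backward pass of some module, which is not shown; the constants are arbitrary illustrations):

  (setq p (new idx1-dparam 100 100))
  ;; ... backward pass fills the dx slot ...
  (==> p update 0.01 0)     ;; plain step: x = x - 0.01*dx, deltas untouched
  (==> p update 0.01 0.9)   ;; momentum: deltas = 0.1*dx + 0.9*deltas,
                            ;; then x = x - 0.01*deltas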

8.1.0.6.3. idx1-dparam-eps
(packages/gblearn2/gb-params.lsh)


A gb-param class for gradient descent with a separate epsilon (learning rate) for each parameter.

8.1.0.6.3.0. (new idx1-dparam-eps s sts)
[CLASS] (packages/gblearn2/gb-params.lsh)


s is a size (possibly zero), and sts is the initial storage size, which must be larger than zero and can be larger than s to avoid unnecessary reallocations when the size of the param is later increased.

8.1.0.6.3.1. (==> idx1-dparam-eps update eta inertia)
[MSG] (packages/gblearn2/gb-params.lsh)


Simple gradient descent update with one individual learning rate per parameter. eta is a global learning rate by which each individual per-parameter learning rate is multiplied. This performs an update "with momentum" if the inertia parameter is non-zero. CAUTION: the deltas slot is not updated when inertia is zero. When inertia is non-zero, the deltas are updated as follows: deltas = (1-inertia)*dx + inertia*deltas (where dx is the gradient), and the parameters are then updated using the componentwise product of the epsilons and the deltas: x = x - eta*epsilons*deltas.

8.1.0.6.3.2. (==> idx1-dparam-eps set-epsilons m)
[MSG] (packages/gblearn2/gb-params.lsh)


Copy the values in vector m into the epsilons slot.
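
A minimal sketch (the epsilon values are arbitrary illustrations):

  (setq p (new idx1-dparam-eps 3 3))
  (setq m (float-matrix 3))
  (m 0 0.1) (m 1 0.5) (m 2 1.0)   ;; one learning rate per parameter
  (==> p set-epsilons m)
  (==> p update 0.01 0)           ;; each parameter moves at 0.01 times its epsilon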

8.1.0.6.4. idx1-ddparam
(packages/gblearn2/gb-params.lsh)


A gb-param class for the stochastic diagonal Levenberg-Marquardt algorithm. In addition to the usual update method, it has an update-bbprop method for computing the second derivatives, and a compute-epsilons method to set the epsilons using the second derivatives.

8.1.0.6.4.0. (new idx1-ddparam s alloc)
[CLASS] (packages/gblearn2/gb-params.lsh)


s is the size (can be 0); alloc is the size of the storages to be preallocated. This prevents memory fragmentation when the size of the gb-param is subsequently increased.

8.1.0.6.4.1. (==> idx1-ddparam clear-ddx)
[MSG] (packages/gblearn2/gb-params.lsh)


Set all elements of the ddx vector slot to zero.

8.1.0.6.4.2. (==> idx1-ddparam clear-ddeltas)
[MSG] (packages/gblearn2/gb-params.lsh)


Set all elements of the ddeltas vector slot to zero.

8.1.0.6.4.3. (==> idx1-ddparam update-ddeltas knew kold)
[MSG] (packages/gblearn2/gb-params.lsh)


Update the average second derivatives ddeltas as follows: ddeltas = knew*ddx + kold*ddeltas, where ddx is the instantaneous second derivative.

8.1.0.6.4.4. (==> idx1-ddparam update-xaverage kappa)
[MSG] (packages/gblearn2/gb-params.lsh)


Update the running average of x: xaverage += kappa*(x - xaverage).

8.1.0.6.4.5. (==> idx1-ddparam copy-xaverage)
[MSG] (packages/gblearn2/gb-params.lsh)


Copy the contents of xaverage into x.

8.1.0.6.4.6. (==> idx1-ddparam swap-xaverage)
[MSG] (packages/gblearn2/gb-params.lsh)


Swap the contents of x and xaverage.
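
A minimal sketch of how the xaverage methods might be interleaved with training (the averaging rate 0.01 is an arbitrary illustration):

  (==> p update 0.001 0)         ;; regular training step
  (==> p update-xaverage 0.01)   ;; xaverage += 0.01*(x - xaverage)
  (==> p swap-xaverage)          ;; evaluate with the averaged weights ...
  (==> p swap-xaverage)          ;; ... then swap back and resume training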

8.1.0.6.4.7. (==> idx1-ddparam saliencies)
[MSG] (packages/gblearn2/gb-params.lsh)


Compute the parameter saliencies as defined by the Optimal Brain Damage algorithm (LeCun, Denker, and Solla, NIPS 1989). This uses the average first and second derivatives of the energy with respect to each parameter to compute a saliency for each weight. The saliency is an estimate of how much the energy would increase if the parameter were set to zero. It is computed as Si = -Gi*Wi + 1/2 Hii*Wi^2. A vector of saliencies is returned. The deltas and ddeltas fields must contain relevant values before this method is called.

8.1.0.6.4.8. (==> idx1-ddparam compute-epsilons mu)
[MSG] (packages/gblearn2/gb-params.lsh)


Compute and set the epsilons using the second derivatives. This method should be called after a few iterations of update-bbprop.
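
A sketch of the usual calling sequence for stochastic diagonal Levenberg-Marquardt (the constants are arbitrary illustrations, and the forward/backward passes that fill dx and ddx are not shown):

  (setq p (new idx1-ddparam 0 1000))
  ;; ... for a few hundred samples, compute the instantaneous second
  ;; derivatives (update-bbprop) and average them after each pass:
  (==> p update-ddeltas 0.03 0.97)
  ;; then derive the per-parameter rates from the averaged ddeltas and train:
  (==> p compute-epsilons 0.02)   ;; mu = 0.02
  (==> p update 0.001 0)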

8.1.0.6.4.9. Allocating an idx-state within an idx1-ddparam
(packages/gblearn2/gb-params.lsh)


It is often useful to have access to the parameters of a trainable module in two different ways. The first access method is through a slot in the module object (e.g. the slot kernel in a convolutional layer [a.k.a. c-layer]). This slot is generally an idxN-ddstate with an x slot (value), dx slot (gradient), and ddx slot (second derivatives). The second access method is through an idx1-ddparam that collects all the trainable parameters of a learning machine. The functions described here provide a way of allocating multiple idxN-ddstate instances within a single idx1-ddparam. As the modules of a learning machine are created, their parameter states are allocated within a single idx1-ddparam which collects all the parameters (a short sketch follows the method descriptions below).

8.1.0.6.4.9.0. (==> idx1-ddparam alloc-idx0-ddstate)
[MSG] (packages/gblearn2/gb-params.lsh)


Allocate an idx0-ddstate in the idx1-ddparam.

8.1.0.6.4.9.1. (==> idx1-ddparam alloc-idx1-ddstate d0)
[MSG] (packages/gblearn2/gb-params.lsh)


Allocate an idx1-ddstate of size d0 in the idx1-ddparam.

8.1.0.6.4.9.2. (==> idx1-ddparam alloc-idx2-ddstate d0 d1)
[MSG] (packages/gblearn2/gb-params.lsh)


Allocate an idx2-ddstate of size d0, d1 in the idx1-ddparam.

8.1.0.6.4.9.3. (==> idx1-ddparam alloc-idx3-ddstate d0 d1 d2)
[MSG] (packages/gblearn2/gb-params.lsh)


Allocate an idx3-ddstate of size d0, d1, d2 in the idx1-ddparam.

8.1.0.6.4.9.4. (==> idx1-ddparam alloc-idx4-ddstate d0 d1 d2 d3)
[MSG] (packages/gblearn2/gb-params.lsh)


Allocate an idx4-ddstate of size d0, d1, d2, d3 in the idx1-ddparam.
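
A minimal sketch of collecting the parameters of a hypothetical module inside a single idx1-ddparam (the sizes are arbitrary illustrations):

  (setq p (new idx1-ddparam 0 10000))
  (setq w (==> p alloc-idx2-ddstate 84 120))   ;; e.g. a weight matrix
  (setq b (==> p alloc-idx1-ddstate 84))       ;; and its bias vector

Since w and b are views into p's storage, a single (==> p update ...) trains the parameters of every module allocated this way.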