8.1.0.2. Learning Algorithms
(packages/gblearn2/gb-trainers.lsh)

Author(s): Yann LeCun

Various learning algorithm classes are defined to train learning machines, which are generally subclasses of gb-module . These classes include gradient descent for supervised learning, among others.

8.1.0.2.0. eb-trainer
(packages/gblearn2/gb-trainers.lsh)


Abstract class for energy-based learning algorithms. The class contains an input, a (trainable) parameter, and an energy. Actual trainers are derived from this abstract class.

8.1.0.2.1. supervised
(packages/gblearn2/gb-trainers.lsh)


An abstract trainer class for supervised training of a feed-forward classifier with discrete class labels. Actual supervised trainers can be derived from this class. The machine's fprop method must take four arguments: input, output, desired output, and energy. A call to the machine's fprop must look like this:
   (==> machine fprop input output desired energy)
By default, output must be a class-state , desired an idx0 of int (integer scalar), and energy an idx0-ddstate (or subclasses thereof). The meter passed to the training and testing methods should be a classifier-meter , or any meter whose update method looks like this:
   (==> meter update output desired energy)
where output must be a class-state , desired an idx0 of int, and energy an idx0-ddstate .
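
As an illustration, here is a minimal sketch of a custom meter conforming to this convention. The class name simple-error-meter , its slots, and the predicted-class slot read from the output object are assumptions made for this sketch only; the standard classifier-meter already provides this functionality:
   ;; minimal sketch of a meter usable with the supervised trainers.
   ;; simple-error-meter and :output:predicted-class are hypothetical
   ;; names; the real class-state interface may differ.
   (defclass simple-error-meter object
     total                               ; number of samples seen
     errors)                             ; number of misclassified samples

   (defmethod simple-error-meter simple-error-meter ()
     (setq total 0)
     (setq errors 0) ())

   (defmethod simple-error-meter update (output desired energy)
     ;; output: class-state, desired: idx0 of int, energy: idx0-ddstate (ignored here).
     (incr total)
     (when (<> :output:predicted-class (desired)) (incr errors))
     ())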

8.1.0.2.1.0. (new supervised m p [e in out des])
[CLASS] (packages/gblearn2/gb-trainers.lsh)


create a new supervised trainer for machine m with trainable parameter p . The optional arguments e , in , out , and des provide the energy, input, output, and desired-output states, respectively.

8.1.0.2.1.1. (==> supervised train dsource mtr)
[MSG] (packages/gblearn2/gb-trainers.lsh)


train the machine on the data source dsource and measure the performance with mtr . This is a dummy method that should be defined by subclasses.

8.1.0.2.1.2. (==> supervised test dsource mtr)
[MSG] (packages/gblearn2/gb-trainers.lsh)


measures the performance over all the samples of data source dsource . mtr must be an appropriate meter.

8.1.0.2.1.3. (==> supervised test-sample dsource mtr i)
[MSG] (packages/gblearn2/gb-trainers.lsh)


measures the performance on sample i of data source dsource . This leaves the internal state of the meter unchanged, and can be used for a quick test of whether a particular pattern is correctly recognized.
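
For example, assuming a concrete trainer trainer , a test data source testing-set , and a classifier-meter constructed without arguments (all of which are assumptions of this sketch, not definitions made in this file):
   (setq mtr (new classifier-meter))             ; meter described above
   (==> trainer test testing-set mtr)            ; measure over the whole data source
   (==> trainer test-sample testing-set mtr 3)   ; quick check of sample 3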

8.1.0.2.2. supervised-gradient
(packages/gblearn2/gb-trainers.lsh)


A basic trainer object for supervised stochastic gradient training of a classifier with discrete class labels. This is a subclass of supervised . The machine's fprop method must take four arguments: input, output, desired output, and energy. A call to the machine's fprop must look like this:
   (==> machine fprop input output desired energy)
where output must be a class-state , desired an idx0 of int (integer scalar), and energy an idx0-ddstate . The meter passed to the training and testing methods should be a classifier-meter , or any meter whose update method looks like this:
   (==> meter update output desired energy)
where output must be a class-state , desired an idx0 of int, and energy an idx0-ddstate . The trainable parameter object must understand the gradient-update methods used by this trainer; if diagonal Hessian estimation is to be used, it must also understand the corresponding second-derivative methods.

8.1.0.2.2.0. (new supervised-gradient m p [e in out des])
[CLASS] (packages/gblearn2/gb-trainers.lsh)


create a new supervised-gradient trainer for machine m with trainable parameter p . The optional arguments e , in , out , and des provide the energy, input, output, and desired-output states, respectively.
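
A typical construction, sketched with assumed variable names ( net , a gb-module-style classifier, and theparam , its trainable parameter object, are created elsewhere and not defined in this file):
   ;; net and theparam are placeholders for a machine and its parameter.
   (setq trainer (new supervised-gradient net theparam))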

8.1.0.2.2.1. (==> supervised-gradient train-online dsource mtr n eta [inertia] [kappa])
[MSG] (packages/gblearn2/gb-trainers.lsh)


train with stochastic (online) gradient descent on the next n samples of data source dsource , with global learning rate eta and "momentum term" inertia . Optionally maintain a running average of the weights with positive rate kappa . A negative value for kappa sets a rate equal to -kappa/age . No running average is maintained if kappa is 0.

Record performance in mtr . mtr must understand the following methods:

  (==> mtr update age output desired energy)
  (==> mtr info)
where age is the number of parameter updates performed so far, output is the machine's output (most likely a class-state ), desired is the desired output (most likely an idx0 of int), and energy is an idx0-state . The info method should return a list of relevant measurements.
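
A usage sketch, with assumed variable names and purely illustrative hyper-parameter values:
   ;; train on the next 10000 samples with learning rate 0.0001,
   ;; no momentum, and no running average of the weights.
   (==> trainer train-online training-set trainmeter 10000 0.0001 0 0)
   (==> trainmeter info)                   ; list of relevant measurements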

8.1.0.2.2.2. (==> supervised-gradient train dsource mtr eta [inertia] [kappa])
[MSG] (packages/gblearn2/gb-trainers.lsh)


train the machine on all the samples in data source dsource with global learning rate eta (and optional inertia and kappa , as in train-online ), and measure the performance with mtr .
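
For instance, a simple multi-epoch schedule alternating a training pass and a test pass might look like the following sketch (variable names and the learning rate are assumptions):
   (repeat 5                               ; five passes over the training set
     (==> trainer train training-set trainmeter 0.0001)
     (==> trainer test testing-set testmeter))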

8.1.0.2.2.3. (==> supervised-gradient compute-diaghessian dsource n mu)
[MSG] (packages/gblearn2/gb-trainers.lsh)


Compute per-parameter learning rates (epsilons) using the stochastic diagonal Levenberg-Marquardt method (as described in LeCun et al., "Efficient BackProp", available at http://yann.lecun.com ). This method computes positive estimates of the second derivative of the objective function with respect to each parameter using the Gauss-Newton approximation. dsource is a data source; n is the number of patterns (starting at the current point in the data source) over which the estimate is computed. Each parameter-specific learning rate epsilon_i is computed as 1/(H_ii + mu), where H_ii is the diagonal Gauss-Newton estimate for parameter i and mu is a blowup-prevention fudge factor.
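
This is typically called before (or periodically during) training. A sketch with assumed names and illustrative values, re-estimating the epsilons on 500 patterns with a fudge factor of 0.02:
   ;; estimate the diagonal second derivatives on 500 patterns;
   ;; each epsilon_i then becomes 1/(H_ii + 0.02).
   (==> trainer compute-diaghessian training-set 500 0.02)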

8.1.0.2.2.4. (==> supervised-gradient saliencies ds n)
[MSG] (packages/gblearn2/gb-trainers.lsh)


Compute the parameter saliencies as defined in the Optimal Brain Damage algorithm (LeCun, Denker, Solla, NIPS 1989; available at http://yann.lecun.com ). This computes the first and second derivatives of the energy with respect to each parameter, averaged over the next n patterns of data source ds . A vector of saliencies is returned. Component i of the vector contains Si = -Gi*Wi + 1/2 Hii*Wi^2 , an estimate of how much the energy would increase if parameter i were eliminated (set to zero). Parameters with small saliencies can be eliminated by setting their value and epsilon to zero.
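
A sketch of computing and inspecting the saliencies (variable names assumed; how pruned parameters are actually zeroed depends on the parameter object's interface, which is not documented here):
   ;; average the derivatives over the next 1000 patterns and return
   ;; the vector of saliencies Si = -Gi*Wi + 1/2 Hii*Wi^2.
   (setq sal (==> trainer saliencies training-set 1000))
   (idx-inf sal)                           ; smallest saliency: best candidate for pruning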