8.1.0.2. Learning Algorithms
|
(packages/gblearn2/gb-trainers.lsh) |
Author(s): Yann LeCun
Various learning algorithm classes are defined to train learning
machines. Learning machines are generally subclasses of
gb-module . Learning algorithm classes include gradient
descent for supervised learning, and others.
8.1.0.2.0. eb-trainer
|
(packages/gblearn2/gb-trainers.lsh) |
Abstract class for energy-based learning algorithms. The class contains
an input, a (trainable) parameter, and an energy. This is an abstract
class from which actual trainers can be derived.
8.1.0.2.1. supervised
|
(packages/gblearn2/gb-trainers.lsh) |
An abstract trainer class for supervised training of a feed-forward
classifier with discrete class labels. Actual supervised trainers can
be derived from this. The machine's fprop method must have four
arguments: input, output, energy, and desired output. A call to the
machine's fprop must look like this:
(==> machine fprop input output desired energy)
By default, output must be a class-state, desired an idx0 of int
(integer scalar), and energy an idx0-ddstate (or subclasses thereof).
The meter passed to the training and testing methods should be a
classifier-meter, or any meter whose update method looks like this:
(==> meter update output desired energy)
where output must be a class-state, desired an idx0 of int, and energy
an idx0-ddstate.
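For illustration, here is a minimal sketch of a meter that obeys the
update protocol shown above for this class (a hypothetical toy class,
not the real classifier-meter; it merely counts the samples it is
shown):

(defclass toy-meter object
  total)                        ;; number of samples seen so far

(defmethod toy-meter toy-meter ()
  (setq total 0))

(defmethod toy-meter update (output desired energy)
  ;; a real meter would compare output (a class-state) with desired
  ;; (an idx0 of int) and accumulate error statistics; this toy
  ;; version only counts the calls.
  (incr total))

(defmethod toy-meter info ()
  (list total))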
8.1.0.2.1.0. (new supervised m p [e in out des])
|
[CLASS] (packages/gblearn2/gb-trainers.lsh) |
create a new supervised trainer.
Arguments are as follows:
- m : machine to be trained.
- p : trainable parameter object of the machine.
- e : energy object (by default an idx0-ddstate).
- in : input object (by default an idx3-ddstate).
- out : output object (by default a class-state).
- des : desired output (by default an idx0 of int).
8.1.0.2.1.1. (==> supervised train dsource mtr)
|
[MSG] (packages/gblearn2/gb-trainers.lsh) |
train the machine on the data source dsource and measure the
performance with mtr.
This is a dummy method that should be defined by subclasses.
8.1.0.2.1.2. (==> supervised test dsource mtr)
|
[MSG] (packages/gblearn2/gb-trainers.lsh) |
measures the performance over all the samples of data source
dsource . mtr must be an
appropriate meter.
8.1.0.2.1.3. (==> supervised test-sample dsource mtr i)
|
[MSG] (packages/gblearn2/gb-trainers.lsh) |
measures the performance over a single sample of data source
dsource . This leaves the internal state of the meter
unchanged, and can be used for a quick test of whether a particular
pattern is correctly recognized or not.
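As a sketch of how the two testing methods are typically used
(trainer, testset, and mtr are hypothetical names for a trainer, a
data source, and a classifier-meter):

(==> trainer test testset mtr)            ;; full pass over the data source
(==> trainer test-sample testset mtr 42)  ;; quick check of sample 42 only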
8.1.0.2.2. supervised-gradient
|
(packages/gblearn2/gb-trainers.lsh) |
A basic trainer object for supervised stochastic gradient training of a
classifier with discrete class labels. This is a subclass of
supervised . The machine's fprop method must have four
arguments: input, output, energy, and desired output. A call to the
machine's fprop must look like this:
(==> machine fprop input output desired energy)
where output must be a class-state, desired an idx0 of int (integer
scalar), and energy an idx0-ddstate. The meter passed to the training
and testing methods should be a classifier-meter, or any meter whose
update method looks like this:
(==> meter update output desired energy)
where output must be a class-state, desired an idx0 of int, and energy
an idx0-ddstate. The trainable parameter object must understand the
following methods (a toy illustration follows the list):
- (==> param clear-dx) : clear the gradients.
- (==> param update eta inertia) : update the parameters with learning
  rate eta and momentum term inertia.
If the diagonal Hessian estimation is to be used, the param object
must also understand:
- (==> param clear-ddx) : clear the second derivatives.
- (==> param update-ddeltas knew kold) : update the running average of
  the second derivatives.
- (==> param compute-epsilons mu) : set the per-parameter learning
  rates to the inverse of the sum of the second derivative estimates
  and mu.
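The following toy parameter class is a rough sketch of this protocol
(it is not one of the real gblearn2 parameter classes; the slot layout
and the assumption that second derivatives are accumulated into ddx
are illustrative only):

(defclass toy-param object
  x            ;; parameter values
  dx           ;; gradients
  ddx          ;; instantaneous second derivatives
  ddeltas      ;; running average of second derivatives
  epsilons)    ;; per-parameter learning rates

(defmethod toy-param toy-param (n)
  (setq x (double-matrix n))
  (setq dx (double-matrix n))
  (setq ddx (double-matrix n))
  (setq ddeltas (double-matrix n))
  (setq epsilons (double-matrix n))
  (idx-bloop ((e epsilons)) (e 1)))

(defmethod toy-param clear-dx () (idx-clear dx))

(defmethod toy-param update (eta inertia)
  ;; per-parameter gradient step: x_i <- x_i - eta*epsilon_i*dx_i
  ;; (the momentum term inertia is ignored in this toy version)
  (idx-bloop ((xi x) (dxi dx) (ei epsilons))
    (xi (- (xi) (* eta (ei) (dxi))))))

(defmethod toy-param clear-ddx () (idx-clear ddx))

(defmethod toy-param update-ddeltas (knew kold)
  ;; ddeltas_i <- knew*ddx_i + kold*ddeltas_i
  (idx-bloop ((di ddeltas) (hi ddx))
    (di (+ (* knew (hi)) (* kold (di))))))

(defmethod toy-param compute-epsilons (mu)
  ;; epsilon_i <- 1 / (ddeltas_i + mu)
  (idx-bloop ((ei epsilons) (di ddeltas))
    (ei (/ 1 (+ (di) mu)))))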
8.1.0.2.2.0. (new supervised-gradient m p [e in out des])
|
[CLASS] (packages/gblearn2/gb-trainers.lsh) |
create a new supervised-gradient trainer (a usage example follows the
argument list). Arguments are as follows:
- m : machine to be trained.
- p : trainable parameter object of the machine.
- e : energy object (by default an idx0-ddstate).
- in : input object (by default an idx3-ddstate).
- out : output object (by default a class-state).
- des : desired output (by default an idx0 of int).
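For example (a sketch; net and prm are hypothetical names for a
machine whose fprop follows the calling convention above and for its
trainable parameter object):

;; e, in, out and des are omitted, so the default state objects
;; (idx0-ddstate, idx3-ddstate, class-state, idx0 of int) are created.
(setq trainer (new supervised-gradient net prm))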
8.1.0.2.2.1. (==> supervised-gradient train-online dsource mtr n eta [inertia] [kappa])
|
[MSG] (packages/gblearn2/gb-trainers.lsh) |
train with stochastic (online) gradient on the next n samples of data
source dsource, with global learning rate eta and "momentum term"
inertia. Optionally maintain a running average of the weights with
positive rate kappa. A negative value for kappa sets a rate equal to
-kappa/age. No such update is performed if kappa is 0. Record
performance in mtr. mtr must understand the following methods:
(==> mtr update age output desired energy)
(==> mtr info)
where age is the number of calls to parameter updates so far, output
is the machine's output (most likely a class-state), desired is the
desired output (most likely an idx0 of int), and energy is an
idx0-state. The info method should return a list of relevant
measurements.
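A typical call might look like this (a sketch; trainset and mtr are
hypothetical data-source and meter objects, and the numeric values are
purely illustrative):

;; train on the next 1000 samples with learning rate 0.0001,
;; no momentum and no running average of the weights.
(==> trainer train-online trainset mtr 1000 0.0001 0 0)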
8.1.0.2.2.2. (==> supervised-gradient train dsource mtr eta [inertia] [kappa])
|
[MSG] (packages/gblearn2/gb-trainers.lsh) |
train the machine on all the samples in data source
dsource and measure the performance with
mtr .
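A sketch of a typical epoch, one training pass followed by a test pass
(trainset, testset, trainmtr and testmtr are hypothetical data-source
and meter objects; the learning rate is illustrative):

(==> trainer train trainset trainmtr 0.0001)  ;; one pass over the training set
(==> trainer test testset testmtr)            ;; measure performance on the test set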
8.1.0.2.2.3. (==> supervised-gradient compute-diaghessian dsource n mu)
|
[MSG] (packages/gblearn2/gb-trainers.lsh) |
Compute per-parameter learning rates (epsilons) using the stochastic
diagonal Levenberg-Marquardt method (as described in LeCun et al.,
"Efficient BackProp", available at http://yann.lecun.com). This method
computes positive estimates of the second derivative of the objective
function with respect to each parameter using the Gauss-Newton
approximation. dsource is a data source, n is the number of patterns
(starting at the current point in the data source) on which the
estimate is to be performed. Each parameter-specific learning rate
epsilon_i is computed as 1/(H_ii + mu), where H_ii are the diagonal
Gauss-Newton estimates and mu is the blowup-prevention fudge factor.
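A typical use is to re-estimate the per-parameter learning rates
before (or periodically during) training, for example (a sketch;
trainset is a hypothetical data source, and 500 samples with mu = 0.02
are illustrative values):

(==> trainer compute-diaghessian trainset 500 0.02)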
8.1.0.2.2.4. (==> supervised-gradient saliencies ds n)
|
[MSG] (packages/gblearn2/gb-trainers.lsh) |
Compute the parameter saliencies as defined in the Optimal Brain
Damage algorithm (LeCun, Denker, and Solla, NIPS 1989, available at
http://yann.lecun.com). This computes the first and second derivatives
of the energy with respect to each parameter, averaged over the next
n patterns of data source ds. A vector of saliencies is returned.
Component i of the vector contains Si = -Gi*Wi + 1/2*Hii*Wi^2, which
is an estimate of how much the energy would increase if the parameter
were eliminated (set to zero). Parameters with small saliencies can be
eliminated by setting their value and epsilon to zero.
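For example (a sketch; trainset is a hypothetical data source and 1000
patterns is an illustrative value):

;; estimate saliencies over the next 1000 patterns; sal is a vector
;; with one entry per parameter.
(setq sal (==> trainer saliencies trainset 1000))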