8.1.0.7.0. modules
|
(packages/gblearn2/gb-modules.lsh) |
Author(s): Yann LeCun
In Lush, building and training a complex system is done by assembling
basic blocks, called modules. Modules are subclasses of the class
gb-module. Though there are several predefined module classes, you can
define your own quite easily. Modules must understand two basic methods,
fprop and bprop, whose arguments are the inputs and outputs of the
module. Optionally, most modules should also understand a bbprop method
for computing diagonal second derivatives. Modules can have as many
input/output "ports" as desired. These "ports" are passed as arguments
to the methods that require them, such as fprop and bprop. In most
cases, these arguments belong to the class idx-state or one of its
subclasses. Some modules have internal trainable parameters. When
this is the case, an idx3-ddparam object must be passed to the
constructor; internal parameters are then allocated in that param.
The bprop and bbprop methods ACCUMULATE gradients in these parameters, so
multiple modules can share a single parameter and automatically compute
the correct gradient. Gradients on input ports are NOT accumulated.
A special class called trainer provides a convenient way to train and
test a module combined with pre- and post-processors. Once a module has
been created, inserting it in an instance of the trainer class is the
easiest and fastest way to train it on a database and to measure its
performance. The trainer class understands methods such as train, test,
etc. Most of these methods take instances of database as arguments.
They also take another argument called a meter. A meter is an object
whose role is to keep track of the performance of the machine during a
training or test session. Trainers, meters, and databases can be put in
an instance of workbench, which handles standard learning sequences
(estimate second derivatives, train, test, ...).
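The workflow above can be sketched as follows. This is an illustrative
sketch only: the trainer and meter constructor calls shown here are
hypothetical, and the actual signatures are documented with those classes.

```
;; hypothetical wiring of a module into a trainer (constructor args illustrative)
(setq machine (new idx3-squasher))           ; any gb-module subclass
(setq trainer (new trainer machine))         ; hypothetical constructor call
(setq meter   (new classifier-meter))        ; hypothetical meter class
;; train on one database, then measure performance on another
(==> trainer train training-db meter)
(==> trainer test  test-db meter)
```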
A certain number of predefined basic modules are provided in
gb-modules.lsh. This includes idx3-module, a "root" class of modules
with one input and one output, both of type idx3-state. Many predefined
modules are subclasses of idx3-module. Also included are idx3-squasher
(a sigmoid layer), logadd-layer (transforms an idx3-state into an
idx1-state by log-adding over the two spatial dimensions), and mle-cost
(a cost module for minimizing the cost of the desired output).
8.1.0.7.0.0. gb-module
|
(packages/gblearn2/gb-modules.lsh) |
The class gb-module is the basic class for objects that can be used with
the library of training routines. Specific trainable modules, cost
functions, etc., are subclasses of gb-module
and can be combined to build complex adaptive machines.
A gb-module is expected to accept at least the methods
fprop and bprop, and
optionally the methods bbprop,
load, and
save. The external "plugs" of a
gb-module are passed as arguments to the methods. For example,
the fprop method of a module with one
input vector, one output vector, and one parameter vector can be
called with
(==> <gb-module> fprop <input> <parameter> <output>)
where input,
parameter, and output are
instances of gb-state or one of
its subclasses. As a convention, the methods
fprop, bprop, and
bbprop take the same arguments in the same order. Results of
these methods are accumulated in the appropriate slots of the objects
passed as parameters. This allows modules to share inputs and outputs
while preserving the correctness of the forward and backward propagations.
A few convenient subclasses of gb-module
are predefined in the gblearn2
library. This includes cost functions, classifiers, and others.
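Following this convention, a full forward/backward pass through such a
module might look like this (net, input, parameter, and output are
assumed to have been created beforehand):

```
;; fprop, bprop, and bbprop take the same arguments in the same order
(==> net fprop  input parameter output)   ; forward pass
(==> net bprop  input parameter output)   ; accumulate first derivatives
(==> net bbprop input parameter output)   ; accumulate diagonal second derivatives
```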
8.1.0.7.0.0.0. (==> gb-module fprop [args])
|
[MSG] (packages/gblearn2/gb-modules.lsh) |
performs a forward propagation on the gb-module.
args are optional arguments which
represent the external "plugs" of the module. When possible, modules
with variable-size outputs resize their output ports automatically.
See: (==> gb-module
bprop [ args ])
See: (==> gb-module
bbprop [ args ])
8.1.0.7.0.0.1. (==> gb-module bprop [args])
|
[MSG] (packages/gblearn2/gb-modules.lsh) |
performs a backward propagation on the gb-module
(propagates derivatives). args are
optional arguments which represent the external "plugs" of the module.
By convention, the list of args is the
same as for the fprop method. bprop assumes fprop has been called
beforehand. If the module has internal parameters, the bprop method will
ACCUMULATE the gradients in them, so that multiple modules can share the
same parameters.
See: (==> gb-module
fprop [ args ])
See: (==> gb-module
bbprop [ args ])
8.1.0.7.0.0.2. (==> gb-module bbprop [args])
|
[MSG] (packages/gblearn2/gb-modules.lsh) |
performs a backward propagation of second derivatives on the
gb-module. args are optional
arguments which represent the external "plugs" of the module. By
convention, the list of args is the
same as for the fprop method. bbprop assumes fprop and bprop have been
run beforehand. If the module has internal parameters, the bbprop method
will ACCUMULATE second derivatives in them, so that multiple modules can
share the same parameters.
See: (==> gb-module
fprop [ args ])
See: (==> gb-module
bprop [ args ])
8.1.0.7.0.1. noop-module
|
(packages/gblearn2/gb-modules.lsh) |
a module that does not do anything (a place-holder). This is NOT an
identity-function module. Not compilable.
See: id-module
See: gb-module
8.1.0.7.0.2. id-module
|
(packages/gblearn2/gb-modules.lsh) |
identity-function module. It is a straight pass-through, forward and
backward. Arguments must be idx-ddstates. Not compilable.
8.1.0.7.0.3. idx4-module
|
(packages/gblearn2/gb-modules.lsh) |
a basic "root" class for modules that have one single idx-state input
and one single idx4-state output. The fprop, bprop, and bbprop methods of
this root class merely act as identity functions.
8.1.0.7.0.4. idx3-module
|
(packages/gblearn2/gb-modules.lsh) |
a basic "root" class for modules that have one single idx-state input
and one single idx3-state output. The fprop, bprop, and bbprop methods of
this root class merely act as identity functions.
8.1.0.7.0.5. idx2-module
|
(packages/gblearn2/gb-modules.lsh) |
a basic "root" class for modules that have one single idx-state input
and one single idx2-state output. The fprop, bprop, and bbprop methods of
this root class merely act as identity functions.
8.1.0.7.0.6. idx1-module
|
(packages/gblearn2/gb-modules.lsh) |
a basic "root" class for modules that have one single idx-state input
and one single idx1-state output. The fprop, bprop, and bbprop methods of
this root class merely act as identity functions.
8.1.0.7.0.7. idx4-squasher
|
(packages/gblearn2/gb-modules.lsh) |
a basic squashing-function layer for idx4-state. You can define
subclasses of this to change the squashing function.
8.1.0.7.0.8. idx3-squasher
|
(packages/gblearn2/gb-modules.lsh) |
a basic squashing-function layer for idx3-state. You can define
subclasses of this to change the squashing function.
8.1.0.7.0.9. idx4-sqsquasher
|
(packages/gblearn2/gb-modules.lsh) |
square of hyperbolic tangent (or a rational approximation to it).
8.1.0.7.0.10. idx3-sqsquasher
|
(packages/gblearn2/gb-modules.lsh) |
square of hyperbolic tangent (or a rational approximation to it).
8.1.0.7.0.11. idx4-halfsquare
|
(packages/gblearn2/gb-modules.lsh) |
takes half square of each component.
8.1.0.7.0.12. idx3-halfsquare
|
(packages/gblearn2/gb-modules.lsh) |
takes half square of each component.
8.1.0.7.0.13. logadd-layer
|
(packages/gblearn2/gb-modules.lsh) |
performs a log-add over the spatial dimensions of an idx3-state; the
output is an idx1-state.
8.1.0.7.0.13.0. cost
|
(packages/gblearn2/gb-modules.lsh) |
costs are a special type of module (although there is no dedicated
subclass for them) with two inputs and one output. The output is an
idx0-ddstate which stores a cost or energy. One of the inputs is meant
to be the output of another module (e.g. a network), and the other input
a desired output (or any kind of supervisory signal, such as a
reinforcement). The gradient slot (dx) of the output state is generally
filled with +1. That way, the bprop method of the cost module
automatically computes the gradient.
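This convention can be sketched as follows; the slot-accessor syntax is
illustrative, and net-output, desired, and energy are assumed to be
states of the appropriate classes:

```
;; energy is an idx0-ddstate holding the cost
(==> cost fprop net-output desired energy)
;; seed the backward pass: fill the gradient slot (dx) of the output with +1
(:energy:dx 1)
;; bprop now computes gradients of the energy with respect to the inputs
(==> cost bprop net-output desired energy)
```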
8.1.0.7.0.14. idx3-cost
|
(packages/gblearn2/gb-modules.lsh) |
abstract class for a cost function that takes an idx3-state as input, an
int as desired output, and an idx0-state as energy.
8.1.0.7.0.15. mle-cost
|
(packages/gblearn2/gb-modules.lsh) |
a cost module that propagates the output corresponding to the desired
label. If the output is interpreted as a negative log likelihood,
minimizing this output is equivalent to maximizing the likelihood.
Outputs are log-added over the spatial dimensions in case of spatial
replication.
8.1.0.7.0.15.0. (new mle-cost classes si sj)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
make a new mle-cost. classes is an
integer vector which contains the labels associated with each output.
From that vector, the reverse table is constructed to map labels to
class indices. Elements in classes
must be positive or 0, and not be too large, as a table as large as the
maximum value is allocated. si and
sj are the expected maximum sizes in the spatial dimensions
(used for preallocation to prevent memory fragmentation).
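For instance, an mle-cost for a ten-class digit problem with no spatial
replication might be built as follows (the sizes and label values are
illustrative; int-matrix and the element-setting form are standard Lush
but not shown in this section):

```
;; classes: integer vector mapping each output index to its label (0..9 here)
(setq classes (int-matrix 10))
(for (i 0 9) (classes i i))
;; si = sj = 1: outputs preallocated for a 1x1 spatial extent
(setq cost (new mle-cost classes 1 1))
```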
8.1.0.7.0.16. mmi-cost
|
(packages/gblearn2/gb-modules.lsh) |
a cost function that maximizes the mutual information between the actual
output and the desired output. This assumes that the outputs are costs,
or negative log likelihoods. This module accepts spatially replicated
inputs.
8.1.0.7.0.16.0. (new mmi-cost classes priors si sj prm)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
make a new mmi-cost. classes is an
integer vector which contains the labels associated with each output.
From that vector, the reverse table is constructed to map labels to
class indices. Elements in classes
must be positive or 0, and not be too large, as a table as large as the
maximum value is allocated. priors :
an idx1 of gbtypes, whose size must be the size of
classes +1. It specifies the prior probability for each
class, and for the junk class. The prior for the junk class must be in
the last element. In absence of a better guess, the prior vector should
be filled with 1/n, where n is its size. si
and sj are the expected maximum sizes
in the spatial dimensions (used for preallocation to prevent memory
fragmentation). prm is an idx1-ddparam
in which the value that determines the constant cost of the junk class
will be stored. If the system is to be trained without junk examples,
this parameter can be set to a very large value and not be trained. The
effect of setting this parameter to a fixed value is to softly saturate
the costs of all the classes at the half-square of that value (the overall
energy will never be significantly larger than the half-square of the
set value), and to softly clip the gradients, i.e. the units whose cost
is higher than the half-square of the set value receive negligible
gradients. The parameter can be learned ONLY IF junk examples (with
label -1) are present in the training set. There is a method called
set-junk-cost that allows you to set the junk cost directly without
having to figure out the half-square business.
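A hedged construction sketch with ten classes plus the junk class; the
idx1-ddparam constructor call is hypothetical, and the priors follow the
uniform 1/n guess suggested above:

```
(setq classes (int-matrix 10))
(for (i 0 9) (classes i i))
;; priors: (size classes)+1 elements; the last one is the junk-class prior
(setq priors (double-matrix 11))
(for (i 0 10) (priors i (/ 1 11)))
;; prm holds the junk-cost value; the constructor arguments are hypothetical
(setq prm (new idx1-ddparam 1))
(setq cost (new mmi-cost classes priors 1 1 prm))
```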
8.1.0.7.0.16.1. (==> mmi-cost set-junk-cost c)
|
[MSG] (packages/gblearn2/gb-modules.lsh) |
set the constant cost of the junk class to c.
The underlying parameter is given the value (sqrt (* 2
c)), so c must be
positive.
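For example, giving the junk class a constant cost of 200 stores
(sqrt (* 2 200)) = 20 in the underlying parameter:

```
;; cost is assumed to be an mmi-cost built earlier
(==> cost set-junk-cost 200)
```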
8.1.0.7.0.17. fed-cost
|
(packages/gblearn2/gb-modules.lsh) |
a replicable cost module that computes the difference between the desired
output (interpreted as a cost, log-summed over space) and the free
energy of the set of outputs (i.e. the log-sum of all the outputs over
all locations). A label of -1 indicates that the sample is "junk" (none
of the above). This cost module makes sense if it follows an e-layer.
FED stands for "free energy difference".
8.1.0.7.0.18. crossentropy-cost
|
(packages/gblearn2/gb-modules.lsh) |
a replicable cross-entropy cost function. Computes the log-sum over the
2D spatial output of the log cross-entropy between the desired
distribution over output classes and the actual distribution over output
classes produced by the network.
8.1.0.7.0.19. edist-cost
|
(packages/gblearn2/gb-modules.lsh) |
a replicable Euclidean distance cost function. Computes the log-sum over
the 2D spatial output of the half squared error between the output and
the prototype with the desired label. This does not generate gradients
on the prototypes.
8.1.0.7.0.19.0. (new edist-cost classes si sj p)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
make a new edist-cost. classes is an
integer vector which contains the labels associated with each output.
From that vector, the reverse table is constructed to map labels to
class indices. Elements in classes
must be positive or 0, and not be too large, as a table as large as the
maximum value is allocated. si and
sj are the expected maximum sizes in the spatial dimensions
(used for preallocation to prevent memory fragmentation).
p is an idx2 containing the prototype for each class label.
The first dimension of p should
equal the dimension of classes.
The second dimension of p should
equal the number of outputs of the previous module. The costs are
"log-summed" over the spatial dimensions.
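A construction sketch; the prototype matrix dimensions follow the rules
above (ten classes and 84 outputs from the previous module, both
illustrative):

```
(setq classes (int-matrix 10))
(for (i 0 9) (classes i i))
;; protos: one row per class label, one column per output of the previous module
(setq protos (double-matrix 10 84))   ; starts zero-filled; fill with target codes
(setq cost (new edist-cost classes 1 1 protos))
```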
8.1.0.7.0.20. wedist-cost
|
(packages/gblearn2/gb-modules.lsh) |
a replicable weighted Euclidean distance cost function. Computes the
log-sum over the 2D spatial output of the weighted half squared error
between the output and the prototype with the desired label. This does
not generate gradients on the prototypes.
8.1.0.7.0.20.0. (new wedist-cost classes si sj p w)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
make a new wedist-cost. classes is an
integer vector which contains the labels associated with each output.
From that vector, the reverse table is constructed to map labels to
class indices. Elements in classes
must be positive or 0, and not be too large, as a table as large as the
maximum value is allocated. si and
sj are the expected maximum sizes in the spatial dimensions
(used for preallocation to prevent memory fragmentation).
p is an idx2 containing the prototype for each class label,
and w is an idx2 with a single weight
for each of these prototypes and each of their elements. The first
dimension of p (and
w) should equal the dimension of
classes. The second dimension of p
(and w) should equal the number
of outputs of the previous module. The costs are "log-summed" over the
spatial dimensions.
8.1.0.7.0.21. weighted-mse-cost
|
(packages/gblearn2/gb-modules.lsh) |
This is similar to wedist-cost, but the weights matrix may run over
patterns. The desired output vector has size two: the first element
gives the class label, and the second element gives the position (row)
in the weights matrix to use for the weighted Euclidean distance. It is
a replicable weighted Euclidean distance cost function. Computes the
log-sum over the 2D spatial output of the weighted half squared error
between the output and the prototype with the desired label. This does
not generate gradients on the prototypes.
8.1.0.7.0.21.0. (new weighted-mse-cost classes si sj p w)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
make a new weighted-mse-cost. classes
is an integer vector which contains the labels associated with each
output. From that vector, the reverse table is constructed to map labels
to class indices. Elements in classes
must be positive or 0, and not be too large, as a table as large as the
maximum value is allocated. si and
sj are the expected maximum sizes in the spatial dimensions
(used for preallocation to prevent memory fragmentation).
p is an idx2 containing the prototype for each class label,
and w is an idx2 with a single weight
for each of these prototypes and each of their elements. The first
dimension of p (and
w) should equal the dimension of
classes. The second dimension of p
(and w) should equal the number
of outputs of the previous module. The costs are "log-summed" over the
spatial dimensions.
8.1.0.7.0.22. ledist-cost
|
(packages/gblearn2/gb-modules.lsh) |
a replicable Euclidean distance cost function with LOCAL TARGETS at each
position. Target prototypes are associated with classes. The cost is the
sum over the 2D output of the half squared error between the local
output and the prototype with the desired label at that position. This
does not generate gradients on the prototypes.
8.1.0.7.0.22.0. (new ledist-cost classes p)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
make a new ledist-cost. classes is an
integer vector which contains the labels associated with each output.
From that vector, the reverse table is constructed to map labels to
class indices. Elements in classes
must be positive or 0, and not be too large, as a table as large as the
maximum value is allocated. p is an
idx2 containing the prototype for each class label. The first dimension
of p should equal the dimension
of classes. The second dimension of
p should equal the number of outputs of the previous
module. The costs are summed over the spatial dimensions.
8.1.0.7.1. Classifiers
|
(packages/gblearn2/gb-modules.lsh) |
8.1.0.7.1.0. idx3-classifier
|
(packages/gblearn2/gb-modules.lsh) |
The idx3-classifier module takes an
idx3-state as input and produces a
class-state as output. A class-state
is used to represent the output of classifiers with a discrete set of
class labels.
8.1.0.7.1.1. min-classer
|
(packages/gblearn2/gb-modules.lsh) |
a module that takes an idx3-state, finds the lowest value, and outputs
the label associated with the index (in the first dimension of the
state) of this lowest value. It actually sorts the labels according to
their score (or cost) and outputs the sorted list.
8.1.0.7.1.1.0. (new min-classer classes)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
makes a new min-classer. classes is an
integer vector which contains the labels associated with each output.
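A usage sketch; net-out and class-out are assumed to be an idx3-state
and a class-state created beforehand:

```
;; classes maps output indices to labels, as for the cost modules
(setq classer (new min-classer classes))
;; forward pass: the lowest output wins; the sorted labels go into class-out
(==> classer fprop net-out class-out)
```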
8.1.0.7.1.2. max-classer
|
(packages/gblearn2/gb-modules.lsh) |
a module that takes an idx3-state, finds the highest value, and outputs
the label associated with the index (in the first dimension of the
state) of this highest value. It actually sorts the labels according to
their score and outputs the sorted list.
8.1.0.7.1.2.0. (new max-classer classes)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
makes a new max-classer. classes is an
integer vector which contains the labels associated with each output.
8.1.0.7.1.3. edist-classer
|
(packages/gblearn2/gb-modules.lsh) |
a replicable Euclidean distance pattern matcher, which finds the class
prototype "closest" to the output, where "close" is based on the
log-added Euclidean distances between the prototype and the output at
various positions. This corresponds to finding the class whose
a-posteriori probability is largest, when P(c|data) = sum_x
P(c at x | data at x) / n_positions, the priors over classes are
uniform, and the local class likelihoods P(data at x | c at x) are
Gaussian with unit variance and mean prototype(c).
8.1.0.7.1.4. ledist-classer
|
(packages/gblearn2/gb-modules.lsh) |
a replicable Euclidean distance pattern matcher, which finds the class
prototype closest to the output for the vectors at each position in the
output image.
8.1.0.7.1.4.0. (new ledist-classer classes p)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
make a new ledist-classer. classes is
an integer vector which contains the labels associated with each
prototype. p is an idx2 containing the
prototype for each class label. The first dimension of
p should equal the dimension of
classes. The second dimension of p
should equal the number of outputs of the previous module.
8.1.0.7.1.5. mmi-classer
|
(packages/gblearn2/gb-modules.lsh) |
a classifier that computes class scores based on an MMI-type criterion
(a kind of softmax in the log domain). It gives scores (costs) for all
classes, including junk. It should be used in conjunction with mmi-cost.
This assumes that the outputs of the previous module are costs, or
negative log likelihoods. This module accepts spatially replicated
inputs.
8.1.0.7.1.5.0. (new mmi-classer classes priors si sj prm)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
makes a new mmi-classer. The arguments are identical to those of
mmi-cost. In fact, if an mmi-classer is to be used in conjunction with
an mmi-cost, they should share the prior vector and the parameter.
Sharing the parameter can be done by first building the classer, then
reducing the size of the parameter by one, then creating the cost.
8.1.0.7.1.5.1. (==> mmi-classer set-junk-cost c)
|
[MSG] (packages/gblearn2/gb-modules.lsh) |
set the constant cost of the junk class to c.
The underlying parameter is given the value (sqrt (* 2
c)), so c must be
positive. BE CAREFUL: the junk parameter of an mmi-classer is
usually shared with an mmi-cost; changing one will change the other.
8.1.0.7.1.5.2. (build-ascii-proto targets charset)
|
(packages/gblearn2/gb-modules.lsh) |
8.1.0.7.2. idx3-supervised-module
|
(packages/gblearn2/gb-modules.lsh) |
a module that takes an idx3 as input, runs it through a machine, and
runs the output of the machine through a cost function whose second
input is the desired label, stored in an idx0 of int.