8.1.0.7.0. modules
  | 
(packages/gblearn2/gb-modules.lsh) | 
Author(s):  Yann LeCun 
In Lush, building and training a complex system is done by assembling 
basic blocks, called modules. Modules are subclasses of the class 
gb-module. Though there are several predefined module classes, you can 
define your own fairly easily. Modules must understand two basic methods, 
fprop and bprop, whose arguments are the inputs and outputs of the 
module. Optionally, most modules should also understand a bbprop method 
for computing diagonal second derivatives. Modules can have as many 
input/output "ports" as desired. These "ports" are passed as arguments 
to the methods that require them, such as fprop and bprop. In most 
cases, these arguments belong to the class idx-state, or one of its 
subclasses. Some modules may have internal trainable parameters. When 
this is the case, an idx3-ddparam object must be passed to the 
constructor; internal parameters will then be allocated in that param. 
The bprop and bbprop methods ACCUMULATE gradients in these parameters, so 
multiple modules can share a single parameter and automatically compute 
the correct gradient. Gradients on input ports are NOT accumulated. 
A special class called trainer provides a convenient way to train and 
test a module combined with pre- and post-processors. Once a module has 
been created, inserting it into an instance of the trainer class is the 
easiest and fastest way to train it on a database and to measure its 
performance. The trainer class understands methods such as train, test, 
etc. Most of these methods take instances of database as arguments. 
They also take another argument called a meter. A meter is an object 
whose role is to keep track of the performance of the machine during a 
training or test session. Trainers, meters, and databases can be put in 
an instance of workbench, which handles standard learning sequences 
(estimate second derivatives, train, test, ...). 
A number of predefined basic modules are provided in gb-modules.lsh. 
These include idx3-module, a "root" class of modules with one input and 
one output, both of type idx3-state. Many predefined modules are 
subclasses of idx3-module. Also included are idx3-squasher (a sigmoid 
layer), logadd-layer (transforms an idx3-state into an idx1-state by 
log-adding over the two spatial dimensions), and mle-cost (a cost module 
for minimizing the cost of the desired output). 
8.1.0.7.0.0. gb-module
  | 
(packages/gblearn2/gb-modules.lsh) | 
The class gb-module is the base class for objects that can be used with 
the library of training routines. Specific trainable modules, cost 
functions, etc. are subclasses of gb-module 
and can be combined to build complex adaptive machines. 
A gb-module is expected to accept at least the methods 
fprop and bprop, and 
optionally the following methods: bbprop, 
load, and save. 
The external "plugs" of a 
gb-module are passed as arguments to the methods. For example, 
the fprop method of a module with one 
input vector, one output vector, and one parameter vector can be 
called with 
 (==> <gb-module> fprop <input> <parameter> <output>)
where input, parameter, and output are 
instances of gb-state or one of 
its subclasses. As a convention, the methods 
fprop, bprop, and 
bbprop take the same arguments in the same order. Results of 
these methods are accumulated in the appropriate slots of the objects 
passed as parameters. This allows modules to share inputs and outputs 
while preserving the correctness of forward and backward propagations. 
A few convenient subclasses of gb-module 
are predefined in the gblearn2 
library, including cost functions, classifiers, and others. 
8.1.0.7.0.0.0. (==> gb-module fprop [args])
  | 
[MSG] (packages/gblearn2/gb-modules.lsh) | 
Performs a forward propagation through the gb-module. 
args are optional arguments which 
represent the external "plugs" of the module. When possible, modules 
with variable-size outputs resize their output ports automatically. 
See: (==> gb-module bprop [args]) 
See: (==> gb-module bbprop [args]) 
8.1.0.7.0.0.1. (==> gb-module bprop [args])
  | 
[MSG] (packages/gblearn2/gb-modules.lsh) | 
Performs a backward propagation through the gb-module 
(propagates derivatives). args are 
optional arguments which represent the external "plugs" of the module. 
By convention, the list of args is the 
same as for the fprop method. bprop assumes fprop has been called 
beforehand. If the module has internal parameters, the bprop method will 
ACCUMULATE the gradients in them, so that multiple modules can share the 
same parameters. 
See: (==> gb-module fprop [args]) 
See: (==> gb-module bbprop [args]) 
8.1.0.7.0.0.2. (==> gb-module bbprop [args])
  | 
[MSG] (packages/gblearn2/gb-modules.lsh) | 
Performs a backward propagation of second derivatives through the 
gb-module. args are optional 
arguments which represent the external "plugs" of the module. By 
convention, the list of args is the 
same as for the fprop method. bbprop assumes fprop and bprop have been 
run beforehand. If the module has internal parameters, the bbprop method 
will ACCUMULATE second derivatives in them, so that multiple modules can 
share the same parameters. 
See: (==> gb-module fprop [args]) 
See: (==> gb-module bprop [args]) 
8.1.0.7.0.1. noop-module
  | 
(packages/gblearn2/gb-modules.lsh) | 
A module that does not do anything (a place-holder). This is NOT an 
identity-function module. Not compilable. 
See: id-module 
See: gb-module 
8.1.0.7.0.2. id-module
  | 
(packages/gblearn2/gb-modules.lsh) | 
Identity-function module: a straight pass-through, forward and 
backward. Arguments must be idx-ddstates. Not compilable. 
8.1.0.7.0.3. idx4-module
  | 
(packages/gblearn2/gb-modules.lsh) | 
A basic "root" class for modules that have one single idx-state input 
and one single idx4-state output. The fprop, bprop, and bbprop methods of 
this root class merely act as identity functions. 
8.1.0.7.0.4. idx3-module
  | 
(packages/gblearn2/gb-modules.lsh) | 
A basic "root" class for modules that have one single idx-state input 
and one single idx3-state output. The fprop, bprop, and bbprop methods of 
this root class merely act as identity functions. 
8.1.0.7.0.5. idx2-module
  | 
(packages/gblearn2/gb-modules.lsh) | 
A basic "root" class for modules that have one single idx-state input 
and one single idx2-state output. The fprop, bprop, and bbprop methods of 
this root class merely act as identity functions. 
8.1.0.7.0.6. idx1-module
  | 
(packages/gblearn2/gb-modules.lsh) | 
A basic "root" class for modules that have one single idx-state input 
and one single idx1-state output. The fprop, bprop, and bbprop methods of 
this root class merely act as identity functions. 
8.1.0.7.0.7. idx4-squasher
  | 
(packages/gblearn2/gb-modules.lsh) | 
A basic squashing-function layer for idx4-state. You can define 
subclasses of this to change the squashing function. 
8.1.0.7.0.8. idx3-squasher
  | 
(packages/gblearn2/gb-modules.lsh) | 
A basic squashing-function layer for idx3-state. You can define 
subclasses of this to change the squashing function. 
8.1.0.7.0.9. idx4-sqsquasher
  | 
(packages/gblearn2/gb-modules.lsh) | 
Square of the hyperbolic tangent (or a rational approximation to it). 
8.1.0.7.0.10. idx3-sqsquasher
  | 
(packages/gblearn2/gb-modules.lsh) | 
Square of the hyperbolic tangent (or a rational approximation to it). 
8.1.0.7.0.11. idx4-halfsquare
  | 
(packages/gblearn2/gb-modules.lsh) | 
Takes the half-square of each component. 
8.1.0.7.0.12. idx3-halfsquare
  | 
(packages/gblearn2/gb-modules.lsh) | 
Takes the half-square of each component. 
8.1.0.7.0.13. logadd-layer
  | 
(packages/gblearn2/gb-modules.lsh) | 
Performs a log-add over the spatial dimensions of an idx3-state; the 
output is an idx1-state. 
8.1.0.7.0.13.0. cost
  | 
(packages/gblearn2/gb-modules.lsh) | 
Costs are a special type of module (although there is no definite 
subclass for them) with two inputs and one output. The output is an 
idx0-ddstate which stores a cost or energy. One of the inputs is meant 
to be the output of another module (e.g. a network), and the other input 
a desired output (or any kind of supervisory signal, such as a 
reinforcement). The gradient slot (dx) of the output state is generally 
filled with +1. That way, the bprop method of the cost module 
automatically computes the gradient. 
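This convention can be sketched as follows. All names below are 
hypothetical place-holders, and the exact accessor for setting the 
scalar gradient slot may differ in your Lush version: 
 ;; hypothetical sketch of the cost convention: net is some gb-module,
 ;; cost a cost module, energy an idx0-ddstate
 (==> net fprop in out)                ; forward pass through the machine
 (==> cost fprop out desired energy)   ; energy now holds the cost
 (:energy:dx 1)                        ; fill the gradient slot (dx) with +1
 (==> cost bprop out desired energy)   ; gradients w.r.t. out
 (==> net bprop in out)                ; backpropagate into the machine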
8.1.0.7.0.14. idx3-cost
  | 
(packages/gblearn2/gb-modules.lsh) | 
Abstract class for a cost function that takes an idx3-state as input, an 
int as desired output, and an idx0-state as energy. 
8.1.0.7.0.15. mle-cost
  | 
(packages/gblearn2/gb-modules.lsh) | 
A cost module that propagates the output corresponding to the desired 
label. If the output is interpreted as a negative log-likelihood, 
minimizing this output is equivalent to maximizing the likelihood. 
Outputs are log-added over spatial dimensions in case of spatial 
replication. 
8.1.0.7.0.15.0. (new mle-cost classes si sj)
  | 
[CLASS] (packages/gblearn2/gb-modules.lsh) | 
Make a new mle-cost. classes is an 
integer vector which contains the labels associated with each output. 
From that vector, the reverse table is constructed to map labels to 
class indices. Elements in classes 
must be positive or 0, and not too large, since a table as large as the 
maximum value is allocated. si and 
sj are the expected maximum sizes in the spatial dimensions 
(used for preallocation to prevent memory fragmentation). 
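For illustration, a hedged sketch (int-matrix as the integer-vector 
constructor is an assumption; check the array constructors in your Lush 
version): 
 ;; hypothetical sketch: a 10-class mle-cost with no spatial replication
 (let ((classes (int-matrix 10)))      ; labels for each of the 10 outputs
   (for (i 0 9) (classes i i))         ; output i carries label i
   (setq mle (new mle-cost classes 1 1)))  ; si = sj = 1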
8.1.0.7.0.16. mmi-cost
  | 
(packages/gblearn2/gb-modules.lsh) | 
A cost function that maximizes the mutual information between the actual 
output and the desired output. This assumes that the outputs are costs, 
or negative log-likelihoods. This module accepts spatially replicated 
inputs. 
8.1.0.7.0.16.0. (new mmi-cost classes priors si sj prm)
  | 
[CLASS] (packages/gblearn2/gb-modules.lsh) | 
Make a new mmi-cost. classes is an 
integer vector which contains the labels associated with each output. 
From that vector, the reverse table is constructed to map labels to 
class indices. Elements in classes 
must be positive or 0, and not too large, since a table as large as the 
maximum value is allocated. priors is 
an idx1 of gbtypes, whose size must be the size of 
classes +1. It specifies the prior probability for each 
class, and for the junk class. The prior for the junk class must be in 
the last element. In the absence of a better guess, the prior vector 
should be filled with 1/n, where n is its size. si 
and sj are the expected maximum sizes 
in the spatial dimensions (used for preallocation to prevent memory 
fragmentation). prm is an idx1-ddparam 
in which the value that determines the constant cost of the junk class 
will be stored. If the system is to be trained without junk examples, 
this parameter can be set to a very large value and not be trained. The 
effect of setting this parameter to a fixed value is to softly saturate 
the costs of all the classes at the half-square of that value (the 
overall energy will never be significantly larger than the half-square 
of the set value), and to softly clip the gradients, i.e. the units 
whose cost is higher than the half-square of the set value will receive 
negligible gradients. The parameter can be learned ONLY IF junk examples 
(with label -1) are present in the training set. A method called 
set-junk-cost allows you to set the junk cost directly without 
having to figure out the half-square business. 
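For illustration, a hedged sketch (prm is an idx1-ddparam created 
elsewhere; int-matrix and matrix are assumed array constructors): 
 ;; hypothetical sketch: 10 classes plus junk
 (let ((classes (int-matrix 10))
       (priors  (matrix 11)))          ; one prior per class, junk last
   (for (i 0 9)  (classes i i))        ; output i carries label i
   (for (i 0 10) (priors i (/ 1 11))) ; fill with 1/n, lacking a better guess
   (setq mc (new mmi-cost classes priors 1 1 prm)))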
8.1.0.7.0.16.1. (==> mmi-cost set-junk-cost c)
  | 
[MSG] (packages/gblearn2/gb-modules.lsh) | 
Set the constant cost of the junk class to c. 
The underlying parameter is given the value (sqrt (* 2 
c)), so c must be 
positive. 
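The relation is just the inverse of the half-square: if the stored 
parameter is p = (sqrt (* 2 c)), then c = p^2/2. For example (mmi is a 
hypothetical mmi-cost instance): 
 ;; with c = 8, the stored parameter is (sqrt (* 2 8)) = 4,
 ;; and the half-square (* 0.5 4 4) = 8 recovers the junk cost
 (==> mmi set-junk-cost 8)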
8.1.0.7.0.17. fed-cost
  | 
(packages/gblearn2/gb-modules.lsh) | 
A replicable cost module that computes the difference between the 
desired output (interpreted as a cost, log-summed over space) and the 
free energy of the set of outputs (i.e. the log-sum of all the outputs 
over all locations). A label of -1 indicates that the sample is "junk" 
(none of the above). This cost module makes sense if it follows an 
e-layer. FED stands for "free energy difference". 
8.1.0.7.0.18. crossentropy-cost
  | 
(packages/gblearn2/gb-modules.lsh) | 
A replicable cross-entropy cost function. Computes the log-sum over the 
2D spatial output of the log cross-entropy between the desired 
distribution over output classes and the actual distribution over output 
classes produced by the network. This is designed to 
8.1.0.7.0.19. edist-cost
  | 
(packages/gblearn2/gb-modules.lsh) | 
A replicable Euclidean distance cost function. Computes the log-sum over 
the 2D spatial output of the half squared error between the output and 
the prototype with the desired label. This does not generate gradients 
on the prototypes. 
8.1.0.7.0.19.0. (new edist-cost classes si sj p)
  | 
[CLASS] (packages/gblearn2/gb-modules.lsh) | 
Make a new edist-cost. classes is an 
integer vector which contains the labels associated with each output. 
From that vector, the reverse table is constructed to map labels to 
class indices. Elements in classes 
must be positive or 0, and not too large, since a table as large as the 
maximum value is allocated. si and 
sj are the expected maximum sizes in the spatial dimensions 
(used for preallocation to prevent memory fragmentation). 
p is an idx2 containing the prototype for each class label. 
The first dimension of p should be 
equal to the dimension of classes. 
The second dimension of p should be 
equal to the number of outputs of the previous module. The costs are 
"log-summed" over spatial dimensions. 
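For illustration, a hedged sketch (int-matrix and matrix are assumed 
array constructors; the prototype values are left as an exercise): 
 ;; hypothetical sketch: 3 classes with 8-dimensional prototypes
 (let ((classes (int-matrix 3))
       (protos  (matrix 3 8)))         ; one 8-d prototype per class
   (for (i 0 2) (classes i i))
   ;; ... fill each row of protos with the target code for its class ...
   (setq ec (new edist-cost classes 1 1 protos)))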
8.1.0.7.0.20. wedist-cost
  | 
(packages/gblearn2/gb-modules.lsh) | 
A replicable weighted Euclidean distance cost function. Computes the 
log-sum over the 2D spatial output of the weighted half squared error 
between the output and the prototype with the desired label. This does 
not generate gradients on the prototypes. 
8.1.0.7.0.20.0. (new wedist-cost classes si sj p w)
  | 
[CLASS] (packages/gblearn2/gb-modules.lsh) | 
Make a new wedist-cost. classes is an 
integer vector which contains the labels associated with each output. 
From that vector, the reverse table is constructed to map labels to 
class indices. Elements in classes 
must be positive or 0, and not too large, since a table as large as the 
maximum value is allocated. si and 
sj are the expected maximum sizes in the spatial dimensions 
(used for preallocation to prevent memory fragmentation). 
p is an idx2 containing the prototype for each class label, 
and w is an idx2 with a single weight 
for each of these prototypes and each of their elements. The first 
dimension of p (and 
w) should be equal to the dimension of 
classes. The second dimension of p 
(and w) should be equal to the number 
of outputs of the previous module. The costs are "log-summed" over 
spatial dimensions. 
8.1.0.7.0.21. weighted-mse-cost
  | 
(packages/gblearn2/gb-modules.lsh) | 
This is similar to wedist-cost, but the weights matrix may run over 
patterns. The desired output vector has size two: the first element 
gives the class label, and the second element gives the position (row) 
in the weights matrix to use for the weighted Euclidean distance. It is 
a replicable weighted Euclidean distance cost function: it computes the 
log-sum over the 2D spatial output of the weighted half squared error 
between the output and the prototype with the desired label. This does 
not generate gradients on the prototypes. 
8.1.0.7.0.21.0. (new weighted-mse-cost classes si sj p w)
  | 
[CLASS] (packages/gblearn2/gb-modules.lsh) | 
Make a new weighted-mse-cost. classes 
is an integer vector which contains the labels associated with each 
output. From that vector, the reverse table is constructed to map labels 
to class indices. Elements in classes 
must be positive or 0, and not too large, since a table as large as the 
maximum value is allocated. si and 
sj are the expected maximum sizes in the spatial dimensions 
(used for preallocation to prevent memory fragmentation). 
p is an idx2 containing the prototype for each class label, 
and w is an idx2 with a single weight 
for each of these prototypes and each of their elements. The first 
dimension of p (and 
w) should be equal to the dimension of 
classes. The second dimension of p 
(and w) should be equal to the number 
of outputs of the previous module. The costs are "log-summed" over 
spatial dimensions. 
8.1.0.7.0.22. ledist-cost
  | 
(packages/gblearn2/gb-modules.lsh) | 
A replicable Euclidean distance cost function with LOCAL TARGETS at each 
position. Target prototypes are associated with classes. The cost is the 
sum over the 2D output of the half squared error between the local 
output and the prototype with the desired label at that position. This 
does not generate gradients on the prototypes. 
8.1.0.7.0.22.0. (new ledist-cost classes p)
  | 
[CLASS] (packages/gblearn2/gb-modules.lsh) | 
Make a new ledist-cost. classes is an 
integer vector which contains the labels associated with each output. 
From that vector, the reverse table is constructed to map labels to 
class indices. Elements in classes 
must be positive or 0, and not too large, since a table as large as the 
maximum value is allocated. p is an 
idx2 containing the prototype for each class label. The first dimension 
of p should be equal to the dimension 
of classes. The second dimension of 
p should be equal to the number of outputs of the previous 
module. The costs are summed over spatial dimensions. 
8.1.0.7.1. Classifiers
  | 
(packages/gblearn2/gb-modules.lsh) | 
8.1.0.7.1.0. idx3-classifier
  | 
(packages/gblearn2/gb-modules.lsh) | 
The idx3-classifier module takes an 
idx3-state as input and produces a 
class-state as output. A class-state 
is used to represent the output of classifiers with a discrete set of 
class labels. 
8.1.0.7.1.1. min-classer
  | 
(packages/gblearn2/gb-modules.lsh) | 
A module that takes an idx3-state, finds the lowest value, and outputs 
the label associated with the index (in the first dimension of the 
state) of this lowest value. It actually sorts the labels according to 
their scores (or costs) and outputs the sorted list. 
8.1.0.7.1.1.0. (new min-classer classes)
  | 
[CLASS] (packages/gblearn2/gb-modules.lsh) | 
Makes a new min-classer. classes is an 
integer vector which contains the labels associated with each output. 
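For illustration, a hedged sketch (int-matrix as the integer-vector 
constructor is an assumption): 
 ;; hypothetical sketch: a min-classer over 10 cost outputs,
 ;; where output i is labeled with digit i
 (let ((classes (int-matrix 10)))
   (for (i 0 9) (classes i i))
   (setq classer (new min-classer classes)))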
8.1.0.7.1.2. max-classer
  | 
(packages/gblearn2/gb-modules.lsh) | 
A module that takes an idx3-state, finds the highest value, and outputs 
the label associated with the index (in the first dimension of the 
state) of this highest value. It actually sorts the labels according to 
their scores and outputs the sorted list. 
8.1.0.7.1.2.0. (new max-classer classes)
  | 
[CLASS] (packages/gblearn2/gb-modules.lsh) | 
Makes a new max-classer. classes is an 
integer vector which contains the labels associated with each output. 
8.1.0.7.1.3. edist-classer
  | 
(packages/gblearn2/gb-modules.lsh) | 
A replicable Euclidean distance pattern matcher, which finds the class 
prototype "closest" to the output, where "close" is based on the 
log-added Euclidean distances between the prototype and the output at 
various positions. This corresponds to finding the class whose 
a-posteriori probability is largest, when P(c|data) = sum_[position=x] 
P(c at x | data at x) / n_positions, the priors over classes are 
uniform, and the local class likelihoods P(data at x | c at x) are 
Gaussian with unit variance and mean = prototype(c). 
8.1.0.7.1.4. ledist-classer
  | 
(packages/gblearn2/gb-modules.lsh) | 
a replicable Euclidean distance pattern matcher, which finds the class 
prototype closest to the output for the vectors at each position in the 
output image. 
8.1.0.7.1.4.0. (new ledist-classer classes p)
  | 
[CLASS] (packages/gblearn2/gb-modules.lsh) | 
Make a new ledist-classer. classes is 
an integer vector which contains the labels associated with each 
prototype. p is an idx2 containing the 
prototype for each class label. The first dimension of 
p should be equal to the dimension of 
classes. The second dimension of p 
should be equal to the number of outputs of the previous module. 
8.1.0.7.1.5. mmi-classer
  | 
(packages/gblearn2/gb-modules.lsh) | 
A classifier that computes class scores based on an MMI-type criterion 
(a kind of softmax in log). It gives scores (costs) for all classes, 
including junk. It should be used in conjunction with mmi-cost. This 
assumes that the outputs of the previous module are costs, or negative 
log-likelihoods. This module accepts spatially replicated inputs. 
8.1.0.7.1.5.0. (new mmi-classer classes priors si sj prm)
  | 
[CLASS] (packages/gblearn2/gb-modules.lsh) | 
Makes a new mmi-classer. The arguments are identical to those of 
mmi-cost. In fact, if an mmi-classer is to be used in conjunction with 
an mmi-cost, they should share the prior vector and the parameter. 
Sharing the parameter can be done by first building the classer, then 
reducing the size of the parameter by one, then creating the cost. 
8.1.0.7.1.5.1. (==> mmi-classer set-junk-cost c)
  | 
[MSG] (packages/gblearn2/gb-modules.lsh) | 
Set the constant cost of the junk class to c. 
The underlying parameter is given the value (sqrt (* 2 
c)), so c must be 
positive. BE CAREFUL: the junk parameter of an mmi-classer is 
usually shared with an mmi-cost, so changing one will change the other. 
8.1.0.7.1.5.2. (build-ascii-proto targets charset)
  | 
(packages/gblearn2/gb-modules.lsh) | 
8.1.0.7.2. idx3-supervised-module
  | 
(packages/gblearn2/gb-modules.lsh) | 
A module that takes an idx3 as input, runs it through a machine, and 
runs the output of the machine through a cost function whose second 
input is the desired label, stored in an idx0 of int.