8.1.0.7.0. modules
|
(packages/gblearn2/gb-modules.lsh) |
Author(s): Yann LeCun
In Lush, building and training a complex system is done by assembling
basic blocks, called modules. Modules are subclasses of the class
gb-module. Though there are several predefined module classes, you can
define your own quite easily. Modules must understand two basic methods,
fprop and bprop, whose arguments are the inputs and outputs of the
module. Optionally, most modules should also understand a bbprop method
for computing diagonal second derivatives. Modules can have as many
input/output "ports" as desired. These "ports" are passed as arguments
to the methods that require them, such as fprop and bprop. In most
cases, these arguments belong to the class idx-state or one of its
subclasses. Some modules have internal trainable parameters. When
this is the case, an idx3-ddparam object must be passed to the
constructor; internal parameters are then allocated in that param.
The bprop and bbprop methods ACCUMULATE gradients in these parameters, so
multiple modules can share a single parameter and automatically compute
the correct gradient. Gradients on input ports are NOT accumulated.
A special class called trainer provides a convenient way to train and
test a module combined with pre- and post-processors. Once a module has
been created, inserting it in an instance of the trainer class is the
easiest and fastest way to train it on a database and to measure its
performance. The trainer class understands methods such as train, test,
etc. Most of these methods take instances of database as arguments.
They also take another argument called a meter. A meter is an object
whose role is to keep track of the performance of the machine during a
training or test session. Trainers, meters, and databases can be put in
an instance of workbench, which handles standard learning sequences
(estimate second derivatives, train, test, ...).
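The workflow above can be sketched as follows. This is an illustrative
sketch only: the trainer and meter constructor calls shown here are
hypothetical, and the actual signatures are documented with those classes.

```
;; hypothetical wiring of a module into a trainer (constructor args illustrative)
(setq machine (new idx3-squasher))           ; any gb-module subclass
(setq trainer (new trainer machine))         ; hypothetical constructor call
(setq meter   (new classifier-meter))        ; hypothetical meter class
;; train on one database, then measure performance on another
(==> trainer train training-db meter)
(==> trainer test  test-db meter)
```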
A certain number of predefined basic modules are provided in
gb-modules.lsh. This includes idx3-module, a "root" class of modules
with one input and one output, both of type idx3-state. Many predefined
modules are subclasses of idx3-module. Also included are idx3-squasher
(a sigmoid layer), logadd-layer (transforms an idx3-state into an
idx1-state by log-adding over the two spatial dimensions), and mle-cost
(a cost module for minimizing the cost of the desired output).
8.1.0.7.0.0. gb-module
|
(packages/gblearn2/gb-modules.lsh) |
The class gb-module is the basic class for objects that can be used with
the library of training routines. Specific trainable modules, cost
functions, etc., are subclasses of gb-module
and can be combined to build complex adaptive machines.
A gb-module is expected to accept at least the methods
fprop and bprop, and
optionally the methods bbprop,
load, and
save. The external "plugs" of a
gb-module are passed as arguments to the methods. For example,
the fprop method of a module with one
input vector, one output vector, and one parameter vector can be
called with
(==> <gb-module> fprop <input> <parameter> <output>)
where input,
parameter, and output are
instances of gb-state or one of
its subclasses. As a convention, the methods
fprop, bprop, and
bbprop take the same arguments in the same order. Results of
these methods are accumulated in the appropriate slots of the objects
passed as parameters. This allows modules to share inputs and outputs
while preserving the correctness of the forward and backward propagations.
A few convenient subclasses of gb-module
are predefined in the gblearn2
library. This includes cost functions, classifiers, and others.
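Following this convention, a full forward/backward pass through such a
module might look like this (net, input, parameter, and output are
assumed to have been created beforehand):

```
;; fprop, bprop, and bbprop take the same arguments in the same order
(==> net fprop  input parameter output)   ; forward pass
(==> net bprop  input parameter output)   ; accumulate first derivatives
(==> net bbprop input parameter output)   ; accumulate diagonal second derivatives
```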
8.1.0.7.0.0.0. (==> gb-module fprop [args])
|
[MSG] (packages/gblearn2/gb-modules.lsh) |
performs a forward propagation on the gb-module.
args are optional arguments which
represent the external "plugs" of the module. When possible, modules
with variable-size outputs resize their output ports automatically.
See: (==> gb-module
bprop [ args ])
See: (==> gb-module
bbprop [ args ])
8.1.0.7.0.0.1. (==> gb-module bprop [args])
|
[MSG] (packages/gblearn2/gb-modules.lsh) |
performs a backward propagation on the gb-module
(propagates derivatives). args are
optional arguments which represent the external "plugs" of the module.
By convention, the list of args is the
same as for the fprop method. bprop assumes fprop has been called
beforehand. If the module has internal parameters, the bprop method will
ACCUMULATE the gradients in them, so that multiple modules can share the
same parameters.
See: (==> gb-module
fprop [ args ])
See: (==> gb-module
bbprop [ args ])
8.1.0.7.0.0.2. (==> gb-module bbprop [args])
|
[MSG] (packages/gblearn2/gb-modules.lsh) |
performs a backward propagation of second derivatives on the
gb-module. args are optional
arguments which represent the external "plugs" of the module. By
convention, the list of args is the
same as for the fprop method. bbprop assumes fprop and bprop have been
run beforehand. If the module has internal parameters, the bbprop method
will ACCUMULATE second derivatives in them, so that multiple modules can
share the same parameters.
See: (==> gb-module
fprop [ args ])
See: (==> gb-module
bprop [ args ])
8.1.0.7.0.1. noop-module
|
(packages/gblearn2/gb-modules.lsh) |
a module that does not do anything (a place-holder). This is NOT an
identity-function module. Not compilable.
See: id-module
See: gb-module
8.1.0.7.0.2. id-module
|
(packages/gblearn2/gb-modules.lsh) |
identity-function module. It is a straight pass-through, forward and
backward. Arguments must be idx-ddstates. Not compilable.
8.1.0.7.0.3. idx4-module
|
(packages/gblearn2/gb-modules.lsh) |
a basic "root" class for modules that have one single idx-state input
and one single idx4-state output. The fprop, bprop, and bbprop methods of
this root class merely act as identity functions.
8.1.0.7.0.4. idx3-module
|
(packages/gblearn2/gb-modules.lsh) |
a basic "root" class for modules that have one single idx-state input
and one single idx3-state output. The fprop, bprop, and bbprop methods of
this root class merely act as identity functions.
8.1.0.7.0.5. idx2-module
|
(packages/gblearn2/gb-modules.lsh) |
a basic "root" class for modules that have one single idx-state input
and one single idx2-state output. The fprop, bprop, and bbprop methods of
this root class merely act as identity functions.
8.1.0.7.0.6. idx1-module
|
(packages/gblearn2/gb-modules.lsh) |
a basic "root" class for modules that have one single idx-state input
and one single idx1-state output. The fprop, bprop, and bbprop methods of
this root class merely act as identity functions.
8.1.0.7.0.7. idx4-squasher
|
(packages/gblearn2/gb-modules.lsh) |
a basic squashing-function layer for idx4-state. You can define
subclasses of this to change the squashing function.
8.1.0.7.0.8. idx3-squasher
|
(packages/gblearn2/gb-modules.lsh) |
a basic squashing-function layer for idx3-state. You can define
subclasses of this to change the squashing function.
8.1.0.7.0.9. idx4-sqsquasher
|
(packages/gblearn2/gb-modules.lsh) |
square of hyperbolic tangent (or a rational approximation to it).
8.1.0.7.0.10. idx3-sqsquasher
|
(packages/gblearn2/gb-modules.lsh) |
square of hyperbolic tangent (or a rational approximation to it).
8.1.0.7.0.11. idx4-halfsquare
|
(packages/gblearn2/gb-modules.lsh) |
takes half square of each component.
8.1.0.7.0.12. idx3-halfsquare
|
(packages/gblearn2/gb-modules.lsh) |
takes half square of each component.
8.1.0.7.0.13. logadd-layer
|
(packages/gblearn2/gb-modules.lsh) |
performs a log-add over the spatial dimensions of an idx3-state; the
output is an idx1-state.
8.1.0.7.0.13.0. cost
|
(packages/gblearn2/gb-modules.lsh) |
costs are a special type of module (although there is no dedicated
subclass for them) with two inputs and one output. The output is an
idx0-ddstate which stores a cost or energy. One of the inputs is meant
to be the output of another module (e.g. a network), and the other input
a desired output (or any kind of supervisory signal, such as a
reinforcement). The gradient slot (dx) of the output state is generally
filled with +1. That way, the bprop method of the cost module
automatically computes the gradient.
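This convention can be sketched as follows; the slot-accessor syntax is
illustrative, and net-output, desired, and energy are assumed to be
states of the appropriate classes:

```
;; energy is an idx0-ddstate holding the cost
(==> cost fprop net-output desired energy)
;; seed the backward pass: fill the gradient slot (dx) of the output with +1
(:energy:dx 1)
;; bprop now computes gradients of the energy with respect to the inputs
(==> cost bprop net-output desired energy)
```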
8.1.0.7.0.14. idx3-cost
|
(packages/gblearn2/gb-modules.lsh) |
abstract class for a cost function that takes an idx3-state as input, an
int as desired output, and an idx0-state as energy.
8.1.0.7.0.15. mle-cost
|
(packages/gblearn2/gb-modules.lsh) |
a cost module that propagates the output corresponding to the desired
label. If the output is interpreted as a negative log likelihood,
minimizing this output is equivalent to maximizing the likelihood.
Outputs are log-added over the spatial dimensions in case of spatial
replication.
8.1.0.7.0.15.0. (new mle-cost classes si sj)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
make a new mle-cost. classes is an
integer vector which contains the labels associated with each output.
From that vector, the reverse table is constructed to map labels to
class indices. Elements in classes
must be positive or 0, and not be too large, as a table as large as the
maximum value is allocated. si and
sj are the expected maximum sizes in the spatial dimensions
(used for preallocation to prevent memory fragmentation).
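For instance, an mle-cost for a ten-class digit problem with no spatial
replication might be built as follows (the sizes and label values are
illustrative; int-matrix and the element-setting form are standard Lush
but not shown in this section):

```
;; classes: integer vector mapping each output index to its label (0..9 here)
(setq classes (int-matrix 10))
(for (i 0 9) (classes i i))
;; si = sj = 1: outputs preallocated for a 1x1 spatial extent
(setq cost (new mle-cost classes 1 1))
```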
8.1.0.7.0.16. mmi-cost
|
(packages/gblearn2/gb-modules.lsh) |
a cost function that maximizes the mutual information between the actual
output and the desired output. This assumes that the outputs are costs,
or negative log likelihoods. This module accepts spatially replicated
inputs.
8.1.0.7.0.16.0. (new mmi-cost classes priors si sj prm)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
make a new mmi-cost. classes is an
integer vector which contains the labels associated with each output.
From that vector, the reverse table is constructed to map labels to
class indices. Elements in classes
must be positive or 0, and not be too large, as a table as large as the
maximum value is allocated. priors :
an idx1 of gbtypes, whose size must be the size of
classes +1. It specifies the prior probability for each
class, and for the junk class. The prior for the junk class must be in
the last element. In absence of a better guess, the prior vector should
be filled with 1/n, where n is its size. si
and sj are the expected maximum sizes
in the spatial dimensions (used for preallocation to prevent memory
fragmentation). prm is an idx1-ddparam
in which the value that determines the constant cost of the junk class
will be stored. If the system is to be trained without junk examples,
this parameter can be set to a very large value and not be trained. The
effect of setting this parameter to a fixed value is to softly saturate
the costs of all the classes at the half-square of that value (the overall
energy will never be significantly larger than the half-square of the
set value), and to softly clip the gradients, i.e. the units whose cost
is higher than the half-square of the set value receive negligible
gradients. The parameter can be learned ONLY IF junk examples (with
label -1) are present in the training set. There is a method called
set-junk-cost that allows you to set the junk cost directly without
having to figure out the half-square business.
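A hedged construction sketch with ten classes plus the junk class; the
idx1-ddparam constructor call is hypothetical, and the priors follow the
uniform 1/n guess suggested above:

```
(setq classes (int-matrix 10))
(for (i 0 9) (classes i i))
;; priors: (size classes)+1 elements; the last one is the junk-class prior
(setq priors (double-matrix 11))
(for (i 0 10) (priors i (/ 1 11)))
;; prm holds the junk-cost value; the constructor arguments are hypothetical
(setq prm (new idx1-ddparam 1))
(setq cost (new mmi-cost classes priors 1 1 prm))
```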
8.1.0.7.0.16.1. (==> mmi-cost set-junk-cost c)
|
[MSG] (packages/gblearn2/gb-modules.lsh) |
set the constant cost of the junk class to c.
The underlying parameter is given the value (sqrt (* 2
c)), so c must be
positive.
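For example, giving the junk class a constant cost of 200 stores
(sqrt (* 2 200)) = 20 in the underlying parameter:

```
;; cost is assumed to be an mmi-cost built earlier
(==> cost set-junk-cost 200)
```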
8.1.0.7.0.17. fed-cost
|
(packages/gblearn2/gb-modules.lsh) |
a replicable cost module that computes the difference between the desired
output (interpreted as a cost, log-summed over space) and the free
energy of the set of outputs (i.e. the log-sum of all the outputs over
all locations). A label of -1 indicates that the sample is "junk" (none
of the above). This cost module makes sense if it follows an e-layer.
FED stands for "free energy difference".
8.1.0.7.0.18. crossentropy-cost
|
(packages/gblearn2/gb-modules.lsh) |
a replicable cross-entropy cost function. Computes the log-sum over the
2D spatial output of the log cross-entropy between the desired
distribution over output classes and the actual distribution over output
classes produced by the network.
8.1.0.7.0.19. edist-cost
|
(packages/gblearn2/gb-modules.lsh) |
a replicable Euclidean distance cost function. Computes the log-sum over
the 2D spatial output of the half squared error between the output and
the prototype with the desired label. This does not generate gradients
on the prototypes.
8.1.0.7.0.19.0. (new edist-cost classes si sj p)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
make a new edist-cost. classes is an
integer vector which contains the labels associated with each output.
From that vector, the reverse table is constructed to map labels to
class indices. Elements in classes
must be positive or 0, and not be too large, as a table as large as the
maximum value is allocated. si and
sj are the expected maximum sizes in the spatial dimensions
(used for preallocation to prevent memory fragmentation).
p is an idx2 containing the prototype for each class label.
The first dimension of p should
equal the dimension of classes.
The second dimension of p should
equal the number of outputs of the previous module. The costs are
"log-summed" over the spatial dimensions.
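A construction sketch; the prototype matrix dimensions follow the rules
above (ten classes and 84 outputs from the previous module, both
illustrative):

```
(setq classes (int-matrix 10))
(for (i 0 9) (classes i i))
;; protos: one row per class label, one column per output of the previous module
(setq protos (double-matrix 10 84))   ; starts zero-filled; fill with target codes
(setq cost (new edist-cost classes 1 1 protos))
```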
8.1.0.7.0.20. wedist-cost
|
(packages/gblearn2/gb-modules.lsh) |
a replicable weighted Euclidean distance cost function. Computes the
log-sum over the 2D spatial output of the weighted half squared error
between the output and the prototype with the desired label. This does
not generate gradients on the prototypes.
8.1.0.7.0.20.0. (new wedist-cost classes si sj p w)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
make a new wedist-cost. classes is an
integer vector which contains the labels associated with each output.
From that vector, the reverse table is constructed to map labels to
class indices. Elements in classes
must be positive or 0, and not be too large, as a table as large as the
maximum value is allocated. si and
sj are the expected maximum sizes in the spatial dimensions
(used for preallocation to prevent memory fragmentation).
p is an idx2 containing the prototype for each class label,
and w is an idx2 with a single weight
for each of these prototypes and each of their elements. The first
dimension of p (and
w) should equal the dimension of
classes. The second dimension of p
(and w) should equal the number
of outputs of the previous module. The costs are "log-summed" over the
spatial dimensions.
8.1.0.7.0.21. weighted-mse-cost
|
(packages/gblearn2/gb-modules.lsh) |
This is similar to wedist-cost, but the weights matrix may run over
patterns. The desired output vector has size two: the first element
gives the class label, and the second element gives the position (row)
in the weights matrix to use for the weighted Euclidean distance. It is
a replicable weighted Euclidean distance cost function. Computes the
log-sum over the 2D spatial output of the weighted half squared error
between the output and the prototype with the desired label. This does
not generate gradients on the prototypes.
8.1.0.7.0.21.0. (new weighted-mse-cost classes si sj p w)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
make a new weighted-mse-cost. classes
is an integer vector which contains the labels associated with each
output. From that vector, the reverse table is constructed to map labels
to class indices. Elements in classes
must be positive or 0, and not be too large, as a table as large as the
maximum value is allocated. si and
sj are the expected maximum sizes in the spatial dimensions
(used for preallocation to prevent memory fragmentation).
p is an idx2 containing the prototype for each class label,
and w is an idx2 with a single weight
for each of these prototypes and each of their elements. The first
dimension of p (and
w) should equal the dimension of
classes. The second dimension of p
(and w) should equal the number
of outputs of the previous module. The costs are "log-summed" over the
spatial dimensions.
8.1.0.7.0.22. ledist-cost
|
(packages/gblearn2/gb-modules.lsh) |
a replicable Euclidean distance cost function with LOCAL TARGETS at each
position. Target prototypes are associated with classes. The cost is the
sum over the 2D output of the half squared error between the local
output and the prototype with the desired label at that position. This
does not generate gradients on the prototypes.
8.1.0.7.0.22.0. (new ledist-cost classes p)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
make a new ledist-cost. classes is an
integer vector which contains the labels associated with each output.
From that vector, the reverse table is constructed to map labels to
class indices. Elements in classes
must be positive or 0, and not be too large, as a table as large as the
maximum value is allocated. p is an
idx2 containing the prototype for each class label. The first dimension
of p should equal the dimension
of classes. The second dimension of
p should equal the number of outputs of the previous
module. The costs are summed over the spatial dimensions.
8.1.0.7.1. Classifiers
|
(packages/gblearn2/gb-modules.lsh) |
8.1.0.7.1.0. idx3-classifier
|
(packages/gblearn2/gb-modules.lsh) |
The idx3-classifier module takes an
idx3-state as input and produces a
class-state as output. A class-state
is used to represent the output of classifiers with a discrete set of
class labels.
8.1.0.7.1.1. min-classer
|
(packages/gblearn2/gb-modules.lsh) |
a module that takes an idx3-state, finds the lowest value, and outputs
the label associated with the index (in the first dimension of the
state) of this lowest value. It actually sorts the labels according to
their score (or cost) and outputs the sorted list.
8.1.0.7.1.1.0. (new min-classer classes)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
makes a new min-classer. classes is an
integer vector which contains the labels associated with each output.
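A usage sketch; net-out and class-out are assumed to be an idx3-state
and a class-state created beforehand:

```
;; classes maps output indices to labels, as for the cost modules
(setq classer (new min-classer classes))
;; forward pass: the lowest output wins; the sorted labels go into class-out
(==> classer fprop net-out class-out)
```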
8.1.0.7.1.2. max-classer
|
(packages/gblearn2/gb-modules.lsh) |
a module that takes an idx3-state, finds the highest value, and outputs
the label associated with the index (in the first dimension of the
state) of this highest value. It actually sorts the labels according to
their score and outputs the sorted list.
8.1.0.7.1.2.0. (new max-classer classes)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
makes a new max-classer. classes is an
integer vector which contains the labels associated with each output.
8.1.0.7.1.3. edist-classer
|
(packages/gblearn2/gb-modules.lsh) |
a replicable Euclidean distance pattern matcher, which finds the class
prototype "closest" to the output, where "close" is based on the
log-added Euclidean distances between the prototype and the output at
various positions. This corresponds to finding the class whose
a-posteriori probability is largest, when P(c|data) = sum_x
P(c at x | data at x) / n_positions, the priors over classes are
uniform, and the local class likelihoods P(data at x | c at x) are
Gaussian with unit variance and mean prototype(c).
8.1.0.7.1.4. ledist-classer
|
(packages/gblearn2/gb-modules.lsh) |
a replicable Euclidean distance pattern matcher, which finds the class
prototype closest to the output for the vectors at each position in the
output image.
8.1.0.7.1.4.0. (new ledist-classer classes p)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
make a new ledist-classer. classes is
an integer vector which contains the labels associated with each
prototype. p is an idx2 containing the
prototype for each class label. The first dimension of
p should equal the dimension of
classes. The second dimension of p
should equal the number of outputs of the previous module.
8.1.0.7.1.5. mmi-classer
|
(packages/gblearn2/gb-modules.lsh) |
a classifier that computes class scores based on an MMI-type criterion
(a kind of softmax in the log domain). It gives scores (costs) for all
classes, including junk. It should be used in conjunction with mmi-cost.
This assumes that the outputs of the previous module are costs, or
negative log likelihoods. This module accepts spatially replicated
inputs.
8.1.0.7.1.5.0. (new mmi-classer classes priors si sj prm)
|
[CLASS] (packages/gblearn2/gb-modules.lsh) |
makes a new mmi-classer. The arguments are identical to those of
mmi-cost. In fact, if an mmi-classer is to be used in conjunction with
an mmi-cost, they should share the prior vector and the parameter.
Sharing the parameter can be done by first building the classer, then
reducing the size of the parameter by one, then creating the cost.
8.1.0.7.1.5.1. (==> mmi-classer set-junk-cost c)
|
[MSG] (packages/gblearn2/gb-modules.lsh) |
set the constant cost of the junk class to c.
The underlying parameter is given the value (sqrt (* 2
c)), so c must be
positive. BE CAREFUL: the junk parameter of an mmi-classer is
usually shared with an mmi-cost; changing one will change the other.
8.1.0.7.1.5.2. (build-ascii-proto targets charset)
|
(packages/gblearn2/gb-modules.lsh) |
8.1.0.7.2. idx3-supervised-module
|
(packages/gblearn2/gb-modules.lsh) |
a module that takes an idx3 as input, runs it through a machine, and
runs the output of the machine through a cost function whose second
input is the desired label, stored in an idx0 of int.