This package provides the functions of Neuristique's SN28 neural network
simulator. Paper documentation is available from Orlogia
mailto:info@orlogia.com or
http://www.orlogia.com .
8.1.2.0.0. Introduction to Netenv.
|
|
8.1.2.0.0.0. Overview of Netenv.
|
|
"Netenv" is the main component of the first generation libraries:
-
Primitive functions implement the elementary concepts of a unified
framework. There are primitive functions for allocating units, defining
connections and performing elementary operations on the units or the
connections.
- Connection functions are implemented in the file
"sn28/lib/connect.sn" . This library implement the most
common connection schemes: full connections, local connections, shared
weight connections, time delay connections,...
- Simulation functions are implemented in the file
"sn28/lib/netenv.sn" . This library calls the sequence of
primitive functions needed for training or testing a multilayer network
using various optimisation strategies. The user can program training and
test epochs, plot curves and adjust the various parameters of the
training algorithms.
- Example libraries take advantage of the Netenv mechanism for
implementing several neural network models. Since the sequencement is
performed in lisp, it is possible to build different algorithms by
writing new sequencement functions using the same primitives.
Curious readers might find Netenv based code for the following
algorithms:
- Kmeans and LVQ in directory
"sn28/examples/netlvq/" ,
- RBF networks in directory
"sn28/examples/netrbf/" ,
- Hopfield networks, Kanerva memories, Adalines and Rosenblatt's
perceptrons in directory "sn28/examples/netold/"
.
8.1.2.0.0.1. About this Netenv Manual.
|
|
This manual is a reference manual. It lists all the functions of Netenv
and describes their purpose. It also gives equivalent mathematical
formulas when such a description is adequate.
- Section 2
describes primitive functions directly implemented in C language. These
functions implement the elementary concepts of neural networks. There
are primitives for allocating units, defining connections and performing
elementary computations on units and connections.
- Section 3 introduces high level functions for training and testing
neural networks. These functions are implemented in file
"sn28/lib/netenv.sn" .
- Section 4 describes high level functions for creating standard
connection patterns between layers. These functions, located in file
"sn28/lib/connect.sn" , can handle full connections, local
connections, shared weight connections and time delay connections.
8.1.2.0.1. Low Level Functions in Netenv.
|
|
8.1.2.0.1.0. Introduction to Low Level Functions in Netenv.
|
|
In this section, we describe the primitive C functions of SN2.8 for the
simulation of neural networks. These functions usually are very simple
and thoroughly optimized.
Most of these functions, however, are seldom used directly: users call
instead high level functions which actually use the low level functions.
Primitive functions are the basic elements of such higher level
functions. You may use them if you want to define a new algorithm and
thus write a new library.
Understanding these basic computation functions of the simulator,
however, is a very efficient way to use the full capacity of SN2.8.
8.1.2.0.1.1. Creating and Scanning Network Architectures in Netenv.
|
|
The network is the main data structure of the simulator. It is an
arbitrary oriented graph: a number of units or neurons are linked by
oriented connections or synapses. The biological analogy stops here...
- A unit is identified by its number.
- A layer is represented as a list of unit numbers.
The special unit 0 is often referred
to as the bias unit. It is always allocated and its initial state is set
to 1 . Establishing connections from
this unit is a simple way to introduce an adaptable bias parameter.
8.1.2.0.1.1.0. Allocation of Neurons and Connections in Netenv.
|
|
8.1.2.0.1.1.0.0. (alloc-net units links [weights])
|
|
Allocates memory space for networks with at most units units and at most
links connections. The third argument weights is only allowed by kernels
sn28ite and sn28itenew. It defines the maximal number of independent
weights. If this third argument is not given, the number of weights
allocated is equal to the number of connections.
This function returns the number of bytes allocated.
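For instance, a hedged sketch (the sizes below are illustrative; the
returned byte count depends on the kernel):
; reserve room for up to 200 units and 2000 connections
(alloc-net 200 2000)
; with kernels sn28ite or sn28itenew, a third argument may bound the number
; of independent weights, e.g. (alloc-net 200 2000 500)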
8.1.2.0.1.1.0.1. (clear-net)
|
|
Destroys all units and connections created by previous calls to functions
newneurons or connect .
There is only one unit remaining after a call to
clear-net : the bias unit (unit number
0 ) whose state is set to 1
.
8.1.2.0.1.1.0.2. nmax
|
[VAR] |
Contains the maximum number of units, as set by function
alloc-net . This number includes the bias unit. The largest
available unit number thus is nmax-1 .
8.1.2.0.1.1.0.3. smax
|
[VAR] |
Contains the maximum number of connections, as set by function
alloc-net .
8.1.2.0.1.1.0.4. wmax
|
[VAR] |
With kernels sn28ite and sn28itenew only, this variable contains the
maximum number of independent parameters in a network, as set by
function alloc-net .
8.1.2.0.1.1.0.5. nnum
|
[VAR] |
This variable contains the number of units currently used.
8.1.2.0.1.1.0.6. snum
|
[VAR] |
This variable contains the number of connections currently used.
8.1.2.0.1.1.0.7. wnum
|
[VAR] |
With kernels sn28ite and sn28itenew only, this variable contains the
number of shared weights currently used.
8.1.2.0.1.1.0.8. (newneurons n)
|
|
Creates n new units in the network
memory and returns the list of their numbers. Sufficient storage space should
have been previously allocated using alloc-net
. This function never connects the bias unit to the new units. This must
be done explicitly when required.
; create a set of 10 units and store the list of their indices in the variable my-layer
? (setq my-layer (newneurons 10))
= (5 6 7 8 9 10 11 12 13 14)
See: (alloc-net units
links [ weights ])
8.1.2.0.1.1.1. Connections in Netenv.
|
|
8.1.2.0.1.1.1.0. (connect presyn postsyn [val [eps]])
|
|
Creates a new connection from unit presyn
to unit postsyn . Arguments
presyn or postsyn may also
be lists of units. In this case, no optional parameters are allowed and
all units of presyn are connected to
all units of postsyn .
The optional argument val specifies an
(initial) weight value for this connection. With kernels sn28new and
sn28itenew, the optional argument eps
specifies the learning rate of the connection.
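A hedged sketch of typical usage; the layers are assumed to have just been
created, and the actual unit numbers depend on prior allocations:
(setq in (newneurons 3))          ; a small input layer
(setq out (newneurons 2))         ; a small output layer
(connect in out)                  ; full connection from in to out
(connect 0 out)                   ; connect the bias unit: adaptable biases
(connect (car in) (car out) 0.5)  ; a single connection with initial weight 0.5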
8.1.2.0.1.1.1.1. (dup-connection m1 m2 n1 n2 [val [eps]])
|
|
With kernels sn28ite and sn28itenew only, this function creates a new
connection from unit n1 to unit
n2 which shares the same weight as the connection from
m1 to m2 . This function is
used for creating networks with equality constraints among the weights.
Like function connect , arguments
m1 , m2 ,
n1 and n2 may be lists of
cell indices.
Like function connect , function
dup-connection handles the optional arguments
val for setting the weight value and
eps for setting the learning rate.
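A minimal sketch (kernels sn28ite and sn28itenew only; units 1 to 4 are
assumed to exist already):
(connect 1 3 0.2)          ; master connection with weight 0.2
(dup-connection 1 3 2 4)   ; the connection from 2 to 4 shares the weight of 1 -> 3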
8.1.2.0.1.1.1.2. (amont n)
|
(netenv.sn) |
This function returns the list of all units that have a connection to
unit n (a list of the pre-synaptic
cells of cell n ). The order of the
units in the returned list is the order of creation of the corresponding
connections.
8.1.2.0.1.1.1.3. (pre-syn n)
|
|
This function is a synonym for amont .
8.1.2.0.1.1.1.4. (aval n)
|
(netenv.sn) |
This function returns the list of all units that receive a connection from
unit n (a list of the post-synaptic
cells of cell n ). The order of the
units in the returned list is the order of creation of the corresponding
connections.
8.1.2.0.1.1.1.5. (post-syn n)
|
|
This function is a synonym for aval .
8.1.2.0.1.1.1.6. (nfanin n)
|
|
Returns the number of units connected to unit n
. The fan-in of a unit is not stored in a predefined field; function
nfanin recomputes it each time.
8.1.2.0.1.1.1.7. (cut-connection from to)
|
|
Function cut-connection removes the
connection between cells from and
to from the internal lists used for performing the network
computations.
Cutting a connection does not free the corresponding weight slot. The
weight value however is cleared when you cut the last connection sharing
this weight. This property ensures that the weight vector generated by
BPtool after pruning a network can be used by the regular C code
generated by Nettool.
8.1.2.0.1.2. Accessing Internal Variables in Netenv.
|
|
When a network is created, space is allocated for recording numerical
values on the units and connections of the network graph.
Internal variables associated with the units are called ``nfields'', an
acronym for ``neuron fields''. Internal variables associated with the
connections are called ``sfields'', an acronym for ``synapse fields''.
Fields are highly specialized: in order to increase speed, many
computational functions work on implicit fields. In addition, not all
possible fields are available on all versions of SN2.8. There is indeed no
reason to slow down a regular network by updating the fields
required by a shared weights network.
8.1.2.0.1.2.0. Accessing the Unit Fields ``nfields''.
|
|
Here is a list of the fields allocated for each unit, as well as their
availability on the major kernels of SN2.8. Each nfield is identified by
a number. Symbolic names are defined by the
"netenv.sn" library.
Fields marked with a star (*) do not exist in sn28new and
sn28itenew. Accessing these fields however is simulated by writing to or
reading from the equivalent sfields s-epsilon
and s-sigma of the incoming
connections.
8.1.2.0.1.2.0.0. Conversion between Fields and Numbers or Lists of Numbers in Netenv.
|
|
8.1.2.0.1.2.0.0.0. (nfield n f [v])
|
|
Set/get field number f of unit
n . If the third optional argument is given, this function
sets the field f of unit
n to value v . This
function always returns the current value of field
f .
A group of functions has been defined for directly accessing most
fields.
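A hedged sketch, assuming unit 3 exists and using the symbolic field name
n-val defined by netenv.sn:
? (nfield 3 n-val 0.7)   ; set the state of unit 3
= 0.7
? (nfield 3 n-val)       ; read it back
= 0.7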
8.1.2.0.1.2.0.0.1. (nval n [v])
|
|
Set/get the state of unit n (in nfield
n-val ).
8.1.2.0.1.2.0.0.2. (ngrad n [v])
|
|
Set/get the activation gradient attached to unit
n (in nfield n-grad ).
8.1.2.0.1.2.0.0.3. (nsum n [v])
|
|
Set/get the activation of unit n (in
nfield n-sum ).
8.1.2.0.1.2.0.0.4. (nbacksum n [v])
|
|
Set/get the output gradient of unit n
(in nfield n-backsum ).
8.1.2.0.1.2.0.0.5. (nepsilon n [v])
|
|
Set/get the learning rate of unit n
(in nfield n-epsilon ).
With kernels sn28ite and sn28itenew, this function sets the learning
rate of all the connections to cell n
and returns the mean of those learning rates.
8.1.2.0.1.2.0.0.6. (nggrad n [v])
|
|
Sets/gets the instantaneous second derivative of the cost function with
respect to the activation of unit n
(in nfield n-ggrad ). This is not the
real second derivative, but the contribution of the current pattern to
the overall second derivative.
This function is defined by kernels sn28new and sn28itenew only.
8.1.2.0.1.2.0.0.7. (nsqbacksum n [v])
|
|
Sets/gets the instantaneous second derivative of the cost function with
respect to the output of unit n (in
nfield n-sqbacksum ). This is not the
real second derivative, but the contribution of the current pattern to
the overall second derivative.
This function is defined by kernels sn28new and sn28itenew only.
8.1.2.0.1.2.0.0.8. (nsigma n)
|
|
Returns the moving average of the second derivative of the cost function
with respect to the connections arriving on unit
n . Since field n-sigma
does not exist, this function computes the average of the sfields
s-sigma of the incoming connections of unit
n .
This function is defined by kernels sn28new and sn28itenew only.
8.1.2.0.1.2.0.0.9. (nspare1 n [v]) (nspare2 n [v]) (nspare3 n [v])
|
|
Set/get one of the three spare fields attached to unit
n . These fields are not used by the simulator routines and
can be used freely by the user.
8.1.2.0.1.2.1. Accessing the Connection Fields: ``sfields''.
|
|
Similarly, fields associated to each connection are named
sfields . Here is a list of the fields in the connections.
Like nfields , the various
sfields are identified by a number.
Kernels sn28ite and sn28itenew implement shared weight networks by
setting up connections which physically share certain sfields: they
point to the same piece of memory. A star (*) in the above table
indicates those sfields which are shared when you set up a shared weight
connection with function dup-connection
.
8.1.2.0.1.2.1.0. (sval n1 n2 [v])
|
|
Set/get the value of the connection going from unit
n1 to unit n2 (in sfield
s-val ). This sfield usually contains the weight of the
connection. This function returns the empty list if the connection does
not exist. It produces an error if you attempt to set the value of a
non-existing connection.
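A hedged sketch, assuming a connection from unit 1 to unit 5 was created
earlier with an initial weight of 0.5:
? (sval 1 5)        ; read the weight of the connection from unit 1 to unit 5
= 0.5
? (sval 1 5 0.25)   ; overwrite it
= 0.25
? (sval 5 1)        ; this connection does not exist
= ()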
8.1.2.0.1.2.1.1. (sdelta n1 n2 [v])
|
|
Set/get the weight change of the connection going from unit
n1 to unit n2 (in sfield
s-delta ). This function returns the empty list if the
connection does not exist. It produces an error if you attempt to set
the value of a non-existing connection.
8.1.2.0.1.2.1.2. (sepsilon n1 n2 [v])
|
|
Set/get the learning rate of the connection going from unit
n1 to unit n2 (in sfield
s-epsilon ). This function returns the empty list if the
connection does not exist. It produces an error if you attempt to set
the value of a non-existing connection.
This function is defined by kernels sn28new and sn28itenew only.
8.1.2.0.1.2.1.3. (sacc n1 n2 [v])
|
|
Set/get the accumulated gradient for the weight associated to the
connection from unit n1 to unit
n2 . This function returns the empty list if the connection
does not exist. It produces an error if you attempt to set the value of
a non-existing connection.
This function is defined by kernels sn28ite and sn28itenew only.
8.1.2.0.1.2.1.4. (ssigma n1 n2 [v])
|
|
Sets/gets the partial time average of the second derivative of the cost
function for the connection between unit n1
and n2 (in sfield
s-sigma ). This function returns the empty list if the
connection does not exist. It produces an error if you attempt to set
the value of a non-existing connection.
This function is defined by kernels sn28new and sn28itenew only.
8.1.2.0.1.2.1.5. (shess n1 n2 [v])
|
|
Set/get the estimate of the Hessian's diagonal term associated to the
weight of the connection between units n1
and n2 (in sfield
s-hess ). This function returns the empty list if the
connection does not exist. It produces an error if you attempt to set
the value of a non-existing connection.
This function is defined by kernels sn28new and sn28itenew only.
8.1.2.0.1.2.1.6. (mean-sqr-weight n)
|
|
Returns the mean squared value of the weights (in sfields
s-val ) of the connections arriving to cell
n .
8.1.2.0.1.2.1.7. (mean-sqr-delta n)
|
|
Returns the mean square value of the last weight change (in sfields
s-delta ) of the connections arriving to cell
n .
8.1.2.0.1.2.1.8. (scounter n1 n2)
|
|
Get the number of connections which share the same sfields as the
connection between units n1 and
n2 . This function returns 0
if the connection does not exist.
This function is defined by kernels sn28ite and sn28itenew only.
8.1.2.0.1.2.1.9. (sindex n1 n2 [v])
|
|
This function sets/gets the identification index of the weight
associated to the connection from unit n1
to unit n2 . Connections with the same
index actually share their sfields. Function sindex may be used either
to check or to modify the way connections are shared. It returns
() if the connection does not exist.
This function is defined by kernels sn28ite and sn28itenew only.
8.1.2.0.1.3. Fields Manipulation in Netenv.
|
|
Two kinds of functions operate on nfields and sfields. The first ones
are highly optimized algorithm-dependent functions. The second ones
implement several general purpose computations. We describe here these
latter functions.
8.1.2.0.1.3.0. Unit Fields Manipulation in Netenv.
|
|
8.1.2.0.1.3.0.0. (set-nfield l f v)
|
|
Sets the fields f of all units in
layer l to the value
v .
(set-nfield input n-val 0)
; clears all the states of layer input.
8.1.2.0.1.3.0.1. (gauss-state l sdev)
|
|
Adds a zero mean gaussian noise of standard deviation
sdev to the states of the cells specified by list
l .
8.1.2.0.1.3.0.2. (flip-state l p)
|
|
Changes the state sign with probability p
for all units in list l .
8.1.2.0.1.3.0.3. (find-winner l) (find-loser l)
|
|
These functions return the unit of list l
whose field n-val has the largest
(respectively smallest) value.
8.1.2.0.1.3.0.4. (copy-nfield l1 f1 [l2] f2)
|
|
Copies field f2 of units in list
l2 into field f1 of units
in list l1 . The fields are identified
by integers or by their symbolic names as described with function
nfield .
If argument l2 is not defined, list
l1 is assumed. Execution is slightly faster in this case.
See: (nfield n
f [ v ])
8.1.2.0.1.3.0.5. (op-nfield l f1 f2 g2 f3 g3)
|
|
Performs a linear combination of fields f2
and f3 of units in
l and puts the result in field f1
of the same unit. The operation performed is defined by the real numbers
g2 and g3 in the following
way:
f1 = g2*f2 + g3*f3
8.1.2.0.1.3.0.6. (mul-nfield l f1 f2 f3)
|
|
Computes the product of fields f2 and
f3 of units in l and puts
the result in field f1 of the same
unit. The operation performed is defined in the following way:
f1 = f2*f3
Function mul-nfield returns the sum of
all the computed terms i.e. the dot product of fields
f2 and f3 .
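For instance, a hedged sketch that uses the returned dot product to obtain
the squared norm of the states of a layer (my-layer and the choice of the
spare field are illustrative):
; store state*state in the spare field and return the squared norm of the states
(mul-nfield my-layer n-spare1 n-val n-val)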
8.1.2.0.1.3.0.7. (invert-nfield l f1 f2 g)
|
|
Divides the scalar g by fields
f2 of units in l and puts
the result in field f1 of the same
unit. The operation performed is defined in the following way :
f1 = g / f2
Function invert-nfield returns the sum
of all the computed terms.
8.1.2.0.1.3.1. Connection Fields Manipulation in Netenv.
|
|
8.1.2.0.1.3.1.0. (set-sfield l sf v)
|
|
Sets field sf of all incoming connections of layer
l to value v .
8.1.2.0.1.3.1.1. (copy-sfield l1 sf1 [l2] sf2)
|
|
Copies field sf2 of incoming
connections of layer l2 into field
sf1 of incoming connections of layer
l1 . If argument l2 is not
provided, l1 is assumed.
8.1.2.0.1.3.1.2. (op-sfield l sf1 sf2 g2 sf3 g3)
|
|
Performs a linear combination of fields sf2
and sf3 of all incoming connections of
layer l and puts the result into
sfield sf1 . The operation is defined
by the reals g2 ,
g3 :
sf1 = g2 * sf2 + g3 * sf3
8.1.2.0.1.3.1.3. (mul-sfield l sf1 sf2 sf3)
|
|
Computes the product of fields sf2 and
sf3 of all incoming connection of layer
l . The result is stored into field
sf1 .
8.1.2.0.1.3.1.4. (invert-sfield l sf1 sf2 g)
|
|
Divides g by the value of field
sf2 of all incoming connection of layer
l . The quotients are stored into field
sf1 .
8.1.2.0.1.3.1.5. (set-swn n sf f)
|
|
Takes field f from all units
connected to unit n and transfers them
into field sf of the corresponding
connections.
8.1.2.0.1.3.1.6. (acc-swn n sf f)
|
|
Takes field f from all units connected
to unit n and adds it to field
sf of the corresponding connections.
8.1.2.0.1.4. Non Linear Functions.
|
|
Non Linear Functions (NLF) are special lisp objects which describe a
numerical function f(x) of a real
variable. NLFs are used for specifying the activation functions of the
units and their derivatives.
8.1.2.0.1.4.0. (NLF x)
|
[NLF] |
A NLF object can be used like a Lisp function. This call returns
f(x) , where f is the
function associated to the NLF NLF .
Returns the null NLF:
f(x) = 0
8.1.2.0.1.4.2. (nlf-f-tanh A B C D)
|
|
Returns a sigmoid NLF:
f(x) = A tanh (B x) + C x + D
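A minimal sketch, with illustrative constants:
(setq my-nlf (nlf-f-tanh 1.7 0.66 0 0))   ; f(x) = 1.7 tanh(0.66 x)
(my-nlf 0)                                ; returns 0 since tanh(0) = 0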
8.1.2.0.1.4.3. (nlf-f-lin A B C D)
|
|
Returns a linear NLF:
f(x) = A B x + C x + D
8.1.2.0.1.4.4. (nlf-f-piecewise A B C D)
|
|
Returns a piecewise linear NLF:
f(x) = A g(Bx) + Cx + D
where
g(x) = -1 if x < -1
g(x) = 1 if x > +1
g(x) = x otherwise
8.1.2.0.1.4.5. (nlf-f-threshold A B C D)
|
|
Returns a threshold NLF:
f(x) = A g(Bx) + Cx + D
where
g(x) = 1 if x > 0
g(x) = -1 otherwise
8.1.2.0.1.4.6. (nlf-f-bell)
|
|
Returns a bell shaped NLF:
f(x) = 0 if x < -1
f(x) = 0 if x > +1
f(x) = (1 - x**2)**3 otherwise
8.1.2.0.1.4.7. (nlf-f-lisp f)
|
|
Returns a NLF associated to the lisp function f
. Applying this NLF to a number x
calls the lisp function f and returns
the result. This is especially fast if the lisp function is a DZ.
See: DZ
8.1.2.0.1.4.8. (nlf-f-spline xl yl)
|
|
Returns a tabulated NLF. The graph of this NLF is a smooth interpolation
of the points whose abscissas are specified by list
xl and whose ordinates are specified by list
yl .
8.1.2.0.1.4.9. (nlf-df-all nlf)
|
|
Returns a NLF describing the derivative of NLF
nlf . This derivative is computed with a finite difference
and thus is not very accurate.
8.1.2.0.1.4.10. (nlf-ddf-all nlf)
|
|
Returns a NLF describing the second derivative of NLF
nlf . Again, this derivative is computed with finite
differences and is inaccurate.
8.1.2.0.1.4.11. (nlf-df-tanh A B C D)
|
|
Returns a NLF describing the derivative of a sigmoid NLF. This NLF
computes the derivative faster and more accurately than (nlf-df-all
(nlf-f-tanh A B C D)). It actually computes:
df(x) = A B (1 - tanh(B x)**2) + C
8.1.2.0.1.4.12. (nlf-df-bell)
|
|
Returns the derivative of the bell shaped NLF:
df(x) = 0 if x < -1
df(x) = 0 if x > +1
df(x) = -6 x (1 - x**2)**2 otherwise
8.1.2.0.1.4.13. (nlf-df-spline X Y)
|
|
This function returns a NLF describing the first derivative of a
tabulated NLF. As usual, computation is faster and more accurate than
(nlf-df-all (nlf-f-spline X Y)).
8.1.2.0.1.4.14. (nlf-ddf-spline X Y)
|
|
This function returns a NLF describing the second derivative of a
tabulated NLF. As usual, computation is faster and more accurate than
(nlf-ddf-all (nlf-f-spline X Y)).
8.1.2.0.1.5. Functions for Quasi-Linear Units.
|
|
8.1.2.0.1.5.0. Propagation Implementation for Quasi-Linear Units.
|
|
8.1.2.0.1.5.0.0. (update-weighted-sum [theta]...layers...)
|
|
Computes the weighted sum (stored in nfield n-sum
) of each cell in layers layers, according to the following formula:
nsum(i) = Sum on j ( sval(j,i) nval(j) )
The optional argument theta specifies
the standard deviation of a gaussian noise added to the weighted sum.
8.1.2.0.1.5.0.1. (update-state-only nlf...layers...)
|
|
Applies NLF nlf to the units in layers
layers.
nval(i) = NLF [nsum(i)]
8.1.2.0.1.5.0.2. (update-state [theta] nlf...layers...)
|
|
This function combines the two previous functions. It first computes the
weighted sum for each unit and then applies the NLF
nlf . This is the basic function for updating the state of
one or several layers.
See: (update-weighted-sum [ theta
]... layers ...)
See: (update-state-only nlf
... layers ...)
8.1.2.0.1.5.1. Online Gradient Implementation for Quasi-Linear Units.
|
|
The following functions are used for implementing ``online'' or
``stochastic'' gradient algorithms. In these algorithms, the weights are
updated after each pattern presentation.
8.1.2.0.1.5.1.0. (init-grad-lms dnlf output desired)
|
|
Computes the gradient of the following cost function (Mean Squared
Error):
Cost = 0.5 * Sum on i (nval(output i) - nval(desired i))**2
with respect to the weighted sum and state of units in layer output:
- Gradient of Cost on nval(output i)
= nbacksum(output i) = nval(desired i) - nval(output i)
- Gradient of Cost on nsum(output i))
= ngrad(output i) = nbacksum(output i) DNLF[nsum(output i)]
In addition, kernels sn28new and sn28itenew set the second derivative to
1:
nsqbacksum(output i) = 1
The desired values for a network are usually stored in a pseudo layer
desired. The function init-grad-lms
compares these desired values with the states of the output layer
output and initializes the computation of the gradients.
Function init-grad-lms returns the
value of the cost function.
8.1.2.0.1.5.1.1. (init-grad-thlms dnlf output desired threshold)
|
|
Computes the gradient of a thresholded cost function defined as follows:
nbacksum(output i)
= 0 if nval(output i) > threshold and nval(desired i) > threshold,
= 0 if nval(output i) < threshold and nval(desired i) < threshold,
= nval(desired i) - nval(output i) otherwise.
- Gradient of Cost on nsum(output i)
= ngrad(output i) = nbacksum(output i) * DNLF[nsum(output i)]
This cost function is useful for implementing Rosenblatt's perceptron
like algorithms. These algorithms however are often unstable when the
minimal cost is not zero.
8.1.2.0.1.5.1.2. (update-back-sum...l...)
|
|
Computes the derivatives of the cost with respect to the states of units
in layers l , given the derivatives
with respect to the weighted sum of the downstream units.
- Gradient of Cost on nval(i)
= nbacksum(i) = Sum on j (sval(i, j) * ngrad(j))
8.1.2.0.1.5.1.3. (update-gradient-only dnlf...l...)
|
|
Computes the derivatives of the cost with respect to the activations
(nfields n-sum ) of units in layers
l , given the derivatives with respect to their states.
- Gradient of Cost on nsum(i)
= ngrad(i) = nbacksum(i) * DNLF[nsum(i)]
The NLF object dnlf must be the
derivative of the NLF object given as argument to the function
update-state-only called on that layer.
8.1.2.0.1.5.1.4. (update-gradient dnlf...l...)
|
|
Calls both update-back-sum and
update-gradient-only . This is the basic step of the well
known back-propagation algorithm.
8.1.2.0.1.5.1.5. (update-weight alpha decay [...l...])
|
|
Updates the weights according to the gradient of the cost, the momentum
parameter alpha and the exponential
decay parameter decay .
sdelta(i, j) = alpha * sdelta(i, j) + nval(i) * ngrad(j) * nepsilon(j)
sval(i, j) = ( 1 - decay * nepsilon(j) ) * sval(i, j) + sdelta(i, j)
Kernels sn28new and sn28itenew use sepsilon(i,j)
instead of nepsilon(j) .
When layers l are specified, the
weight update is only performed on the incoming connections to layers
l . Otherwise, the weight update happens in all the network.
When arguments alpha and/or
decay are zero, a faster optimized code is used.
With kernels sn28ite and sn28itenew, the possible presence of shared
weights makes the computation more complex and slower. Function
update-weight then calls the functions
clear-acc , update-acc ,
update-delta and update-wght-only
described in the next section. Three differences with the non iterative
version are then important:
- The execution is 40% slower,
- Argument decay is no longer scaled by the value of field n-epsilon:
sval(i, j) = (1 - decay) * sval(i, j) + sdelta(i, j)
- Restricting the update to specific layers with the optional
argument l has an ambiguous meaning.
Selected connections may share weights with unselected connections.
Since the right computation is very slow, it is considerably more
efficient to use directly clear-acc
and update-acc as discussed in the
sequencement section.
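Putting these functions together, here is a hedged sketch of one online
training step for a one-hidden-layer network. The layers hidden, output and
desired-layer, the NLF object nlf and its derivative dnlf are assumed to
have been defined beforehand, and both layers share the same activation
function:
; forward pass (the states of the input layer are assumed to be loaded already)
(update-state nlf hidden output)
; gradient of the squared error at the output layer
(init-grad-lms dnlf output desired-layer)
; back-propagate to the hidden layer
(update-gradient dnlf hidden)
; plain stochastic gradient step: no momentum, no weight decay
(update-weight 0 0)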
8.1.2.0.1.5.2. Batch Gradient Implementation for Quasi-Linear Units.
|
|
Batch gradient (also called true gradient) consists in updating the
weights after the presentation of several patterns. A new sfield,
s-acc accumulates the contribution of each pattern. This
field is only present with kernels sn28ite and sn28itenew.
The sfield s-acc is also used for
accumulating the contributions of several connections to the gradient of
shared weights. In fact, shared weights and batch gradient use the same
underlying mechanisms. Four functions then replace the usual call to
update-weight :
8.1.2.0.1.5.2.0. (clear-acc)
|
|
Sets the s-acc field of all weights to
zero.
sacc = 0
8.1.2.0.1.5.2.1. (update-acc [...l...])
|
|
Adds the contribution to the gradient of all incoming connections to
layer l . When argument
l is omitted, all existing connections are considered.
sacc(i, j) = sacc(i, j) + nval(i) * ngrad(j) * nepsilon(j)
Kernels sn28new and sn28itenew use sepsilon(i,j)
instead of nepsilon(j) .
8.1.2.0.1.5.2.2. (update-delta alpha)
|
|
Updates the sfields s-delta of all
weights, according to the value of the sfields
sacc field and to the momentum parameter
alpha :
sdelta = alpha * sdelta + sacc
8.1.2.0.1.5.2.3. (update-wghtonly decay)
|
|
Updates the weights, according to sfields s-delta
and to the decay parameter decay .
sval = (1 - decay) * sval + sdelta
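With kernels sn28ite and sn28itenew, a batch epoch therefore looks like the
following hedged sketch; the per-pattern forward and backward passes are the
same as in the online case:
(clear-acc)
; for every pattern in the training set:
;   load the pattern and the desired output, run the forward pass,
;   call init-grad-lms and the backward pass, then accumulate the gradient:
(update-acc)
; once the whole training set has been seen, apply the accumulated gradient
(update-delta 0.9)    ; momentum alpha = 0.9
(update-wghtonly 0)   ; no weight decay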
8.1.2.0.1.5.3. Second Order Algorithms Implementation for Quasi-Linear Units.
|
|
Adjusting the learning rates in a neural network is sometimes difficult
and often tricky. A simple and efficient idea is used by the Newton
algorithm: relying on the curvature of the cost function. The stronger
the curvature, the smaller the learning rate in that direction.
The curvature is described by the matrix of the second derivatives of
the cost, the Hessian matrix. Since this matrix has a huge number of
coefficients, we only compute the diagonal coefficients.
The weight update equation then becomes
W = W - Gradient of Cost on W * Inverse( Hessian of Cost on W )
This simple approach, however, is plagued by several problems. The weight update actually used is the safer:
W = W - sepsilon * Gradient of Cost on W / (mu + |ssigma|)
We have replaced our learning rate problem by three new parameters.
These new parameters however are much more robust: we always use the
same values although more refined values may prove more efficient:
sepsilon = 0.0005
gamma = mu = 0.05
This scheme is further complicated by two important additional factors:
-
The contributions to the second derivatives of shared connections are
accumulated in a new sfield s-hess .
- It is empirically more efficient to use the Levenberg-Marquardt
approximation of the second derivatives. This approximation ensures that
the value of field s-sigma is greater
than or equal to zero!
These algorithms are implemented by kernels sn28new and sn28itenew. Two
sets of new functions are required for computing the derivatives and for
updating the weights.
8.1.2.0.1.5.3.0. (update-ggradient dnlf ddnlf gamma...layers...)
|
|
This function first computes the nfields n-ggrad
of units in layers layers . It then
computes the sfields s-sigma of all
incoming connections to layers layers. Argument
ddnlf is the second derivative of the NLF used for that
specific layer. Argument dnlf is the
first derivative of the NLF.
For each unit i in layers:
nsqbacksum(i) = Sum on k ([sval(i, k)]**2 * nggrad(k))
nggrad(i) = nbacksum(i) * DDNLF(nsum(i)) + nsqbacksum(i) * [ DNLF(nsum(i)) ]**2
ssigma(j, i) = gamma * nggrad(i) * [nval(j)]**2 + (1 - gamma) * ssigma (j, i)
This computation requires that the fields n-ggrad
of all downstream units are correct. All the fields
n-ggrad and s-sigma thus
are computed during a single backward recurrence similar to the usual
back-propagation.
Remark:
- If a unit has no downstream connection, field
n-sqbacksum is left unchanged. Applying
update-ggradient to the output layer thus leaves the value
1 stored by init-grad in
that field and still computes fields n-ggrad
and s-sigma .
8.1.2.0.1.5.3.1. (update-lmggradient dnlf gamma...layers...)
|
|
This function computes the Levenberg-Marquardt approximation of
s-sigma . This function takes no argument
ddnlf for the second derivative of the NLF because the
Levenberg-Marquardt approximation consists in ignoring this second
derivative!
The computation is similar to that of
update-ggradient except that the nfield
n-ggrad is computed according to the following formula:
nggrad(i) = nsqbacksum(i) * [DNLF(nsum(i))]**2
8.1.2.0.1.5.3.2. (clear-hess)
|
|
This function is similar to clear-acc
. It clears the sfield s-hess in each
weight.
shess = 0
8.1.2.0.1.5.3.3. (update-hess [...l...])
|
|
Accumulates the sfield s-sigma of each
connection into the sfield s-hess of
the corresponding weight. When arguments l
are present, only the incoming connections to layers
l are processed.
For each connection:
shess = shess + ssigma
8.1.2.0.1.5.3.4. (hessian-scale mu)
|
|
This function is usually inserted before
update-delta and update-wghtonly
. It scales field s-acc according to
the ``second derivative'' stored in field s-hess
and to the parameter mu . The usual
functions update-delta and
update-wghtonly then perform the desired weight
update .
For all weights:
sacc = sacc / (mu + |shess|)
8.1.2.0.1.5.3.5. (update-w-newton alpha decay mu [...layers...])
|
|
This function is similar to update-weight, but updates the weights
according to the quasi-Newton equation. Actually, this function calls in
sequence:
(clear-acc)
(clear-hess)
(update-acc...layers...)
(update-hess...layers...)
(hessian-scale mu)
(update-delta alpha)
(update-wghtonly decay)
The performance remarks discussed with function
update-weight are even more relevant here. With kernel
sn28itenew, function update-w-newton
becomes very slow if you specify argument layers. It is much better to
use the basic functions themselves.
See: (update-weight alpha
decay [... l ...])
8.1.2.0.1.5.4. Conjugate Gradient Implementation for Quasi-Linear Units.
|
|
Conjugate Gradient is a powerful batch optimization algorithm which
outperforms all other algorithms implemented in SN when looking for a
high optimization accuracy. This algorithm is especially suitable to
function approximation tasks.
This state-of-the-art implementation of the conjugate gradient method
relies on partial line searches based on the analytical computation of the
cost function curvature. Such partial line searches are allowed by a
self-restarting Hestenes-Stiefel formula for computing the conjugate
directions.
8.1.2.0.1.5.4.0. (update-deltastate dnlf...layers...)
|
|
Function update-deltastate is the
basic component of the computation of the Gauss Newton approximation to
the curvature information. This computation is performed during a single
forward propagation. If the connection field
s-delta contains the search direction, function
update-deltastate stores in fields
n-grad the derivative of the cell state with respect to a
weight change along the search direction.
For each cell i in layers layers,
function update-deltastate updates the
gradient field n-grad according to the
following formula:
ngrad(i) = DNLF(nsum(i)) * Sum on j ( sdelta(j,i) * nval(j) + sval(j,i) * ngrad(j) )
8.1.2.0.1.5.4.1. (update-wghtonly-conjgrad curvature)
|
|
Assuming that connection field s-acc
contains the gradient, that connection field
s-delta contains the search direction, and that argument
curvature is the second derivative along the search direction, function
update-wghtonly-conjgrad performs a Newton step along the
search direction. This is the main component of the partial line search.
This function updates the weight vector according to the following
formula:
sval = sval + [ (sdelta . sacc) / curvature ] * sdelta
8.1.2.0.1.6. Functions for Implementing Euclidian Units.
|
|
Instead of computing:
f( Sum on i (wi xi) )
euclidian units compute a function of the distance between the vector
xi and the weights wi :
f( Sum on i (wi - xi)**2 )
Euclidian units are useful for implementing several algorithms, like
Kohonen's map, Learning Vector Quantization, Radial Basis Functions, K
means and Nearest Neighbour. This implementation in Netenv provides an
alternative to KNNenv.
A few additional functions whose name contains
"nn" implement these units in SN2.8. These functions just
replace their standard counterparts for quasi-linear units. An euclidian
unit in SN is the same object as a quasi-linear unit. The only
difference is the use of a different set of functions for computing
their states, gradients and weights.
8.1.2.0.1.6.0. Propagation Implementation for Euclidean Units.
|
|
8.1.2.0.1.6.0.0. (update-nn-sum [theta]...layers...)
|
|
This is the euclidian counterpart of
update-weighted-sum . It computes the nfield
n-sum of each unit in layers
according to the following formula:
nsum (i) = Sum on j ( sval(j, i) - nval(j) )**2
Argument theta is the standard
deviation of an optional gaussian noise added during the computation.
8.1.2.0.1.6.0.1. (update-nn-state [theta] nlf...layers...)
|
|
This is the counterpart of update-state
. It first calls update-nn-sum then
calls update-state-only , thus computing
field n-val of each euclidian unit in
layer layers .
8.1.2.0.1.6.1. Gradient Descent Algorithms Implementation for Euclidean Units.
|
|
Back propagating through euclidian units is essentially identical to
back-propagating through quasi-linear units. The equations however are
slightly different and are implemented by new SN2.8 functions.
8.1.2.0.1.6.1.0. (update-nn-back-sum...layers...)
|
|
This is the counterpart of update-back-sum
. It computes the field n-backsum of
units in layers according to the value of field
n-grad of their downstream units. These downstream units are
assumed to be Euclidian.
nbacksum(i) = 2 * Sum on k ( ngrad(k) * [nval(i) - sval(i,k)] )
Units in layers do not need to be actually euclidian units. Their
downstream units however have to be euclidian units.
8.1.2.0.1.6.1.1. (update-nn-gradient dnlf...layers...)
|
|
This is the counterpart of update-gradient. It computes the field n-grad
by calling update-nn-back-sum and
update-gradient-only .
- Units in layers do not need
to be euclidian units. Their downstream units however have to be
euclidian units.
- Argument dnlf is the derivative
of the NLF used by update-state-only
for layers layers . It is not the
derivative of the NLF associated to the euclidian units which actually
do not belong to layers layers .
8.1.2.0.1.6.1.2. (update-nn-weight alpha decay...layers...)
|
|
This is the counterpart of update-weight
. It updates the weights for all incoming connections of units in
layers . These units are assumed to be euclidian.
For each unit i in layers,
For each unit j connected to i:
sdelta(j, i) = alpha * sdelta(j, i) + (1 - alpha) * ngrad(i) * (sval(j, i) - nval(j)) * nepsilon(i)
sval(j, i) = (1 - nepsilon(i) * decay) * sval(j, i) + sdelta(j, i)
With kernels sn28ite or sn28itenew, this function has the same
limitations as function update-weight
. It is then better to explicitly use clear-acc
, update-nn-acc ,
update-delta and update-wghtonly
.
See: (update-weight alpha
decay [... l ...])
8.1.2.0.1.6.1.3. (update-nn-acc...layers...)
|
|
Adds the contribution of the incoming connections of units in
layers to the gradient according to the following formula.
This function is only available with kernels sn28ite and sn28itenew.
sacc(j,i) = sacc(j, i) + ngrad(i) * ( sval(j, i) - nval(j) ) * nepsilon(i)
8.1.2.0.1.6.2. Second Order Algorithms Implementation for Euclidean Units.
|
|
Similarly, a few functions allow the computation of the second
derivative in order to use the quasi-Newton or Levenberg-Marquardt
algorithms.
8.1.2.0.1.6.2.0. (update-nn-ggradient dnlf ddnlf gamma...layers...)
|
|
This is the counterpart of update-ggradient
. It computes the field n-ggrad
according to the following formula:
nsqbacksum(i) = 2 * Sum on k ( 2 * (sval(i, k) - nval(i))**2 * nggrad(k) - ngrad(k) )
nggrad(i) = nsqbacksum(i) * [DNLF(nsum(i))]**2 - nbacksum(i) * DDNLF(nsum(i))
The sfields s-sigma of each connection
are then updated:
ssigma(i,j) = (1 - gamma) * ssigma(i, j)
+ gamma[ 2 * nggrad(j) * (sval(i, j) - nval(i))**2 - ngrad(j) ]
8.1.2.0.1.6.2.1. (update-nn-lmggradient dnlf...layers...)
|
|
This is the counterpart of function
update-lmggradient . It behaves like function
update-nn-ggradient but performs the Levenberg-Marquardt
approximation:
nsqbacksum(i) = 2 * Sum on k ( 2 * (sval(i, k) - nval(i))**2 * nggrad(k) - ngrad(k) )
nggrad(i) = nsqbacksum(i) [DNLF(nsum(i))]**2
ssigma(i,j) = (1 - gamma) * ssigma(i, j) + gamma[ 2 * nggrad(j) * (sval(i, j) - nval(i))**2 - ngrad(j) ]
8.1.2.0.1.6.2.2. (update-nn-w-newton alpha decay mu...layers...)
|
|
This is the Newton weight update for euclidian cells. It calls in
sequence:
(clear-acc)
(clear-hess)
(update-nn-acc...layers...)
(update-hess...layers...)
(hessian-scale mu)
(update-delta alpha)
(update-wghtonly decay)
Like update-weight , this function is
much slower than calling the seven functions above.
See: (update-w-newton alpha
decay mu [...
layers ...])
See: (update-weight alpha
decay [... l ...])
8.1.2.0.1.7. Sequencement in Netenv.
|
|
In order to make clear how to use these functions, let us consider a
simple example. A network has 4 layers, layer1
to layer4 ; each layer uses its own
nlf2 to nlf4 activation
functions. In addition, layer3 is composed of euclidian units:
layer1 Input
layer2 Quasi Linear (nlf2, dnlf2, ddnlf2)
layer3 Euclidian (nlf3, dnlf3, ddnlf3)
layer4 Quasi Linear (nlf4, dnlf4, ddnlf4)
Let us consider an additional layer des which contains the desired states for
layer layer4 .
The standard backpropagation algorithm thus is
;present a pattern into layer layer1
;forward-prop
(update-state nlf2 layer2)
(update-nn-state nlf3 layer3)
(update-state nlf4 layer4)
; present a desired output into layer des
; backward-prop
(init-grad-lms dnlf4 layer4 des)
(update-gradient dnlf3 layer3)
(update-nn-gradient dnlf2 layer2)
; update-weights
(update-weight 0 0 layer4 layer2)
(update-nn-weight 0 0 layer3)
With kernels sn28ite and sn28itenew, functions
update-weight and update-nn-weight
are very inefficient when they do not operate on the whole network. This
is the case here. The last two lines thus are better replaced by:
; update-weights (sn28ite or sn28itenew)
(clear-acc)
(update-acc layer4 layer2)
(update-nn-acc layer3)
(update-delta 0)
(update-wghtonly 0)
And finally, if we use the Levenberg-Marquardt algorithm, these lines
become:
; second backward pass (newton)
(update-lmggradient dnlf4 0.05 layer4)
(update-lmggradient dnlf3 0.05 layer3)
(update-nn-lmggradient dnlf2 0.05 layer2)
; weight update (newton)
(clear-acc)
(clear-hess)
(update-acc layer4 layer2)
(update-nn-acc layer3)
(update-hess)
(hessian-scale 0.05)
(update-delta 0)
(update-wghtonly 0)
8.1.2.0.2. High Level Functions in Netenv.
|
|
This library of high level functions is automatically loaded on startup.
It provides an easier way to create and run networks in SN2.8.
Convenient functions are defined for quickly creating a network, running
the back-propagation algorithm and displaying the results. Hooks are
provided for calling user-defined algorithms, data processing and
display routines.
8.1.2.0.2.0. Overview of High Level Functions in Netenv.
|
|
We suggest that the reader browse the file
"sn28/lib/netenv.sn" which defines most functions in the
library. This file is rather small and has a simple structure.
Understanding it is a good way to use SN2.8 more efficiently. There are
four families of functions in this file:
- Functions for
creating a network ( build-net ,
build-net-nobias ) and creating the internal data structures
used by the library ( define-net ).
- Functions for initializing the weights (
forget-XXX ) and setting the parameters (
epsi-XXX , nlf-XXX ).
- Functions for managing a data base of examples, loading these
examples into the network ( present-pattern
, present-desired ) and accessing
nfields layer per layer ( state ,
weighted-sum , etc...).
- Functions for running the simulation ( run
, trun , learn
) and testing the performance ( perf
).
These simulation functions are organised around a few key functions:
- Function test-pattern computes
the output of the network for a specific pattern and sets a couple of
global variables ( local-error ,
good-answer ) according to the result.
- Four functions implement the various training algorithms:
- Function basic-iteration
implements the stochastic gradient back-propagation algorithm.
- Function batch-iteration
implements the batch gradient back-propagation algorithm.
- Function basic-newton-iteration
implements both Newton's and Levenberg-Marquardt's stochastic
back-propagation algorithms.
- Function batch-newton-iteration
implements both Newton's and Levenberg-Marquardt's batch
back-propagation algorithms.
These four functions in turn make heavy use of eight simpler functions:
forward-prop do-compute-err
backward-prop backward-2-prop
do-update-weight do-update-weight-newton
do-update-acc acc-update-hess
These functions are automatically defined when function
define-net is called, either manually by the user or
automatically during the network definition (the usual network
definition function, build-net , calls
this function).
By changing the definition of these functions, you can create new
connectionist models. For instance, both the kMeans and LVQ algorithms
have been implemented by redefining these functions (see file
"sn28/examples/netlvq/lvq.sn" ).
8.1.2.0.2.1. Creating a Network.
|
|
8.1.2.0.2.1.0. (build-net lunits llinks)
|
(netenv.sn) |
Function build-net is a high level
function for building networks. It provides an easy way for creating
multilayer perceptrons. It also sets up all the data structures needed
by the library.
- The first argument
lunits defines the units. Argument
lunits must be a list with the following structure:
( (name-1 num-1) (name-2 num-2).....(name-N
num-N) )
Each pair (name num) in list
lunits defines a layer named name with
num units. More precisely, build-net
stores in symbol name the list returned by
(newneurons num) .
The order of the layers in argument lunits
defines the order of the computation during the forward pass. The first
pair thus is always the input layer and the last pair is always the
output layer.
Function build-net also creates an
additional layer desired-layer which
has the same size as the last layer in lunits
. This additional layer is used to store the desired outputs of the
network.
- The second argument llinks is a
possibly empty list of the form:
( (from1 to1) (from2 to2)....(fromn ton) )
For each pair (from to) , function
build-net creates a full connection from layer
from to layer to . This
capability is handy for creating simple networks. Complex connection
patterns however are usually created by calling directly the functions
of library "connect.sn" once
build-net has returned.
In addition, function build-net
always connects the bias unit (unit 0
) to all units in all layers but the input layer.
Function build-net generates several
global variables. The global variables are assigned with functions
make-desired , make-output
and make-input . Finally, function
build-net calls define-net
for defining the eight forward and backward propagation functions.
Example: Creating a ``4-2-4 decoder''.
? (alloc-net 50 100)
= 4400
? (build-net
; create the cells
'((input 4) ; input layer
(hidden 2) ; a hidden layer
(output 4) ) ; output layer
; make connections
'((input hidden) ; connect input layer to the hidden layer
(hidden output))) ; connect hidden layer to the output layer
= ( ( ) ( ) )
? input
= ( 1 2 3 4 )
? input-layer
= ( 1 2 3 4 )
? hidden
= ( 5 6 )
? nnum ; number of units
= 15
? snum ; number of connections
= 22
A variable net-struc is set for
backward compatibility purposes. It contains some additional information
about the structure of the network.
8.1.2.0.2.1.1. (build-net-nobias lunits llinks)
|
(netenv.sn) |
This function is quite similar to build-net
. It does not, however, connect the allocated cells to the bias unit. This
function is especially useful for creating shared weight nets with
constraints on the biases.
8.1.2.0.2.1.2. (make-input l)
|
(netenv.sn) |
This function stores layer l into
variable input-layer . It should be
called whenever this layer is redefined or modified. This function is
automatically called by build-net and
build-net-nobias .
8.1.2.0.2.1.3. (make-output l)
|
(netenv.sn) |
This function stores layer l into
variable output-layer . It should be
called whenever this layer is redefined or modified. This function is
automatically called by build-net and
build-net-nobias .
8.1.2.0.2.1.4. (make-desired l)
|
(netenv.sn) |
This function stores layer l into
variable desired-layer . It should be
called whenever this layer is redefined or modified. This function is
automatically called by build-net and
build-net-nobias .
8.1.2.0.2.1.5. (define-net arg)
|
(netenv.sn) |
This function defines the eight forward and backward propagation
functions. This function is automatically called by
build-net and build-net-nobias
. Argument arg usually is a list of
single element lists. Each single element list contains the name of a
layer.
Remark: Limited support is offered by build-net
, build-net-nobias and
define-net for defining specific activation functions per
layer. This feature is described in section 3.5.3.
8.1.2.0.2.2. Defining the Examples in Netenv.
|
|
The library handles the examples using functions
present-pattern and present-desired
. These functions store example number n
in the input layer and the desired layer. The user can redefine these
functions for implementing pre-processing or for calling a more refined
database management.
The default functions present-pattern and present-desired just store the
n -th row of matrices pattern-matrix
or desired-matrix into the input layer
or into the desired layer.
8.1.2.0.2.2.0. (ensemble pmin pmax)
|
(netenv.sn) |
Defines the training set as being composed of examples number
pmin to pmax inclusive.
These bounds are stored in the global variables
patt-min and patt-max .
See: patt-min
patt-max
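For instance, a hedged sketch of setting up a small training set with the
default mechanism; the matrix sizes are illustrative and the matrices are
assumed to be filled elsewhere:
(dim pattern-matrix 4 2)    ; 4 examples, 2 input values each
(dim desired-matrix 4 1)    ; 4 examples, 1 desired output each
; ... fill pattern-matrix and desired-matrix with the training data ...
(ensemble 0 3)              ; train on examples 0 to 3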
8.1.2.0.2.2.1. patt-min patt-max
|
[VAR] (netenv.sn) |
These two variables store the lower and upper bounds of the training set.
You should call function ensemble
instead of modifying these variables directly.
8.1.2.0.2.2.2. (test-set pmin pmax)
|
(netenv.sn) |
Defines the test set as being composed of examples
pmin to pmax inclusive.
Examples in the training set are used for learning; examples in the test
set are only used for evaluating the performance of the neural network
on fresh data. This measure gives an idea of the generalization ability
of the network.
Function test-set also prints a
pessimistic evaluation of a 90% confidence interval on the accuracy of
the performance measure. This evaluation is computed with function
hoeffding according to the value of variable confidence.
The bounds of the test set are stored in the global variables
tpatt-min and tpatt-max .
See: tpatt-min
tpatt-max
8.1.2.0.2.2.3. tpatt-min tpatt-max
|
[VAR] (netenv.sn) |
These two variables store the bounds of the test set. You should call
function test-set instead of modifying these variables directly.
8.1.2.0.2.2.4. (hoeffding tau eta n)
|
(netenv.sn) |
Computes a bound epsilon of the
difference between the empirical average of n
independent identically distributed positive random variables
Xi and their mean m with
confidence eta . Argument
tau is the maximal value of variables
Xi .
The bound is computed according to Hoeffding's formula:
P ( | m - 1/n * Sum on i (Xi) | > epsilon )
< 2 * exp( -2 * n * epsilon**2 / tau**2 ) = 1 - eta
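For instance, with tau = 1, n = 1000 test patterns and eta = 0.9 (a 90%
confidence), solving the bound above for epsilon gives:
epsilon = sqrt( ln(2 / (1 - eta)) / (2 * n) ) = sqrt( ln(20) / 2000 )
which is approximately 0.039.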
8.1.2.0.2.2.5. (present-pattern l n)
|
(netenv.sn) |
This function is called whenever the library needs a new example. It
must load the fields n-val of layer
l with the input part of example number
n .
This function is often redefined by the user. The default function just
loads the n -th row of the
2-dimensional matrix pattern-matrix
into layer l using function
get-pattern and adds a gaussian noise of standard deviation
input-noise .
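For instance, a hedged sketch of a user redefinition that loads the raw
example and then adds a fixed amount of input noise (the noise level is
illustrative):
(de present-pattern (l n)
  (get-pattern pattern-matrix n l)   ; load example number n into layer l
  (gauss-state l 0.05) )             ; then perturb the input states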
8.1.2.0.2.2.6. input-noise
|
[VAR] (netenv.sn) |
Noise level added to the input patterns before any network calculations.
The computation is faster when input-noise
is set to zero. This is the default value. A non zero value (0.1 for
example) is sometimes used during training in order to get a better
generalization.
8.1.2.0.2.2.7. (present-desired l n)
|
(netenv.sn) |
This function is called by the library whenever a new example is needed.
It must load the fields n-val of layer
l with the desired output part of example number
n .
This function is often redefined by the user. The default function just
loads the n -th vector of the
2-dimensional matrix desired-matrix
into layer l using function
get-pattern .
8.1.2.0.2.3. Accessing Internal Variables in Netenv.
|
|
When a network is created, space is allocated for recording numerical
values on the units and connections of the network graph.
Internal variables associated with the units are called ``nfields'', an
acronym for ``neuron fields''. Internal variables associated with the
connections are called ``sfields'', an acronym for ``synapse fields''.
Fields are highly specialized: in order to increase speed, many
computational functions work on implicit fields. In addition, not all
possible fields are available on all versions of SN2.8. There is indeed no
reason to slow down a regular network by updating the fields
required by a shared weights network.
8.1.2.0.2.3.0. Accessing Neural Fields of Groups of Units.
|
|
8.1.2.0.2.3.0.0. Conversion between Neural Fields and Lists.
|
|
8.1.2.0.2.3.0.0.0. (mapncar f arg)
|
(netenv.sn) |
This is a general function used by the field access functions below. For
instance, after defining:
? (de state l (mapncar 'nval l))
= state
we can easily set or get the state of a list of units:
? (setq foo (newneurons 4)) ; allocate 4 units
= ( 1 2 3 4 )
? (state) ; get their state (with the bias unit state)
= ( 1 0 0 0 0 )
? (state '(2 3) 0.5) ; set the state of units 2 and 3 to 0.5
= 0.5
? (state foo) ; display the states of layer foo
= ( 1 0 0.5 0.5 0 )
? (state 0.33) ; set the state of all the units to 0.33
= 0.33
? (state) ; display the network state
= ( 1 0.33 0.33 0.33 0.33 )
And in fact, the following functions are already defined:
8.1.2.0.2.3.0.0.1. (state args)
|
(netenv.sn) |
Set or get the states of a group of units (field
n-val ).
8.1.2.0.2.3.0.0.2. (gradient args)
|
(netenv.sn) |
Set or get the gradients of a group of units (field
n-grad ).
8.1.2.0.2.3.0.0.3. (weighted-sum args)
|
(netenv.sn) |
Set or get the total input of a group of units (field
n-sum ).
8.1.2.0.2.3.0.0.4. (back-sum args)
|
(netenv.sn) |
Set or get the backward sum of a group of units (field
n-backsum ).
8.1.2.0.2.3.0.0.5. (epsilon args)
|
(netenv.sn) |
Set or get the learning rate of a group of units (field
n-epsilon ). With kernels sn28new and sn28itenew, this
function returns the average of the learning rates attached to the
weights of each unit.
8.1.2.0.2.3.0.0.6. (real-epsilon args)
|
(netenv.sn) |
With kernels sn28new and sn28itenew only, this function gets the
effective learning rate of a group of units. The effective learning rate
is equal to:
effective_epsilon = epsilon / (mu + | sigma |)
where epsilon is the learning rate
attached to the unit (or the weight) and sigma
is the mean second derivative of the cost function with respect to the
weights of the unit extracted from the sfield
s-hess .
8.1.2.0.2.3.0.0.7. (sqback-sum args)
|
(netenv.sn) |
With kernels sn28new and sn28itenew only, sets or gets the instantaneous
second derivatives of the cost function with respect to the state of a
group of units (field n-sqbacksum ).
8.1.2.0.2.3.0.0.8. (ggradient args)
|
(netenv.sn) |
With kernels sn28new and sn28itenew only, sets or gets the instantaneous
second derivatives of the cost function with respect to the total input
of a group of units (field n-ggrad ).
8.1.2.0.2.3.0.0.9. (sigma args)
|
(netenv.sn) |
With kernels sn28new and sn28itenew only, sets or gets the average of
the second derivative of a group of units (field
n-sigma ).
8.1.2.0.2.3.1. Conversion between Neural Fields and Matrices.
|
|
Several functions transfer data from a matrix to the
n-val or sval field of
units.
- Functions get-pattern
and get-pattern-2 are used for loading
the network input with new data.
- Functions matrix-to-weight and
weight-to-matrix are used for saving or loading weights.
8.1.2.0.2.3.1.0. (get-pattern arr n l1...ln)
|
|
This function is used to transfer a pattern vector into the states of a
list of units. The units to be loaded are defined by the lists
l1 ... ln . Matrix
arr must have 2 dimensions; n
is the index of the vector that should be transfered. The sum of the
length of the lists l1 ...
ln should be equal to the length of the vector (i.e. the size
of the matrix in the second dimension)
(dim my-patterns 10 5) ; define a matrix with 10 vectors of dimension 5
(initialize-my-patterns) ; put interesting data into the matrix
; transfer elements (7 x) to units 1, 2, 3, 11, 12
(get-pattern my-patterns 7 '(1 2 3) '(11 12))
8.1.2.0.2.3.1.1. (get-pattern-2 arr l1...ln)
|
|
This function is also used to transfer a pattern vector into the states
of a list of units. However it works slightly differently than
get-pattern .
Matrix arr is transfered entirely into the units defined by
l1 ... ln . It may have any
number of dimensions. The sum of the length of the lists
l1 ... ln should be equal
to the number of elements in the matrix. This function is mainly
designed for dealing with submatrices.
Function get-pattern-2 is slightly
slower than get-pattern. However, it implements a more general way to
transfer data from a matrix to a layer in the network.
; get-pattern is now defined as
(de get-pattern (arr n . l)
(apply get-pattern-2 (cons (submatrix arr n ()) l)) )
8.1.2.0.2.3.2. Accessing Synaptic Fields of Groups of Units.
|
|
8.1.2.0.2.3.2.0. Conversion between Synaptic Fields and Lists.
|
|
The following functions are not meant for saving or restoring weights on
disk but for temporarily saving a weight configuration in a lisp
variable. The content of this variable can be used to restore the weight
configuration afterwards. These functions are convenient for
experimenting with several sets of parameters or for comparing different
weight configurations.
8.1.2.0.2.3.2.0.0. (dump-weights)
|
(netenv.sn) |
Returns a list of lists. Each list contains the weights of a particular
unit. An empty list corresponds to a unit with no incoming weights. The
units are ordered according to their number.
; save the weight configuration into variable a-good-solution
? (setq a-good-solution (dump-weights))
=...
8.1.2.0.2.3.2.0.1. (rest-weights l)
|
(netenv.sn) |
Loads the network weights with the content of the variable passed as
argument.
Caution: This function does not create connections but just sets the
weight values. The network structure must be exactly identical to the
structure used when dump-weights has
been called.
; restore the weights stored in variable a-good-solution
? (rest-weights a-good-solution)
= ()
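For example, a configuration saved with dump-weights
can serve as a common starting point for comparing two settings. This is
only a sketch: the learning rate values and iteration counts are
arbitrary, and functions epsi , learn
and perf are described later in this manual.
; save the current weights as a common starting point
(setq a-good-start (dump-weights))
(epsi 0.1)                     ; first setting (arbitrary)
(learn 1000)
(perf)
; restore the starting point and try another setting
(rest-weights a-good-start)
(epsi 0.05)
(learn 1000)
(perf)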
8.1.2.0.2.3.2.0.2. (weight n)
|
(netenv.sn) |
Returns the list of the incoming weights to cell
n (field s-val ).
8.1.2.0.2.3.2.0.3. (set-weight n l)
|
(netenv.sn) |
Sets the weights of unit n to the
values contained in list l .
8.1.2.0.2.3.2.0.4. (deltaw n)
|
(netenv.sn) |
Returns the list of the weight change of the incoming weights to cell
n (field s-delta ).
8.1.2.0.2.3.2.0.5. (set-deltaw n l)
|
(netenv.sn) |
Sets the deltas of the weights coming into unit
n to the values contained in list l
.
8.1.2.0.2.3.2.1. Conversion between Synaptic Fields and Matrices.
|
|
8.1.2.0.2.3.2.1.0. (matrix-to-weight mat)
|
|
Transfers the elements of a mono-dimensional matrix
mat into the network weights. This function is usually called
by load-net .
8.1.2.0.2.3.2.1.1. (weight-to-matrix [ mat ])
|
|
Transfers the weights of the network into a mono-dimensional matrix
mat . When no suitable argument is supplied, a new matrix is
created. This function is usually called by save-net
.
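As a sketch, this pair of functions can also be used like
dump-weights and rest-weights
for temporarily keeping a weight configuration in a lisp variable. The
variable name below is arbitrary, and the network structure must not
change between the two calls.
(setq saved-weights (weight-to-matrix))   ; copy the weights into a matrix
; ... train or otherwise modify the network ...
(matrix-to-weight saved-weights)          ; restore the saved weights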
8.1.2.0.2.3.2.1.2. (matrix-to-acc mat)
|
|
Transfers the elements of a mono-dimensional matrix mat into the
gradient accumulator s-acc for each
weight in the network. This function is available in kernels sn28ite
and sn28itenew.
8.1.2.0.2.3.2.1.3. (acc-to-matrix [ mat ])
|
|
Returns a mono-dimensional matrix containing the gradient accumulator
s-acc for each weight in the network. When provided, the
optional argument mat specifies a
destination matrix and saves the memory allocation time. This function
is available in kernels sn28ite and sn28itenew.
8.1.2.0.2.3.2.1.4. (matrix-to-delta mat)
|
|
Transfers the elements of a mono-dimensional matrix
mat into the weight change field
s-delta for each weight in the network. This function is
available in kernels sn28ite and sn28itenew.
8.1.2.0.2.3.2.1.5. (delta-to-matrix [ mat ])
|
|
Returns a mono-dimensional matrix containing the weight change field
s-delta for each weight in the network. When provided, the
optional argument mat specifies a
destination matrix and saves the memory allocation time. This functions
is available in kernels sn28ite and sn28itenew.
8.1.2.0.2.3.2.1.6. (matrix-to-hess mat)
|
|
Transfers the elements of a mono-dimensional matrix mat into the hessian
curvature field s-hess for each
weight in the network. This function is available in kernel sn28itenew only.
8.1.2.0.2.3.2.1.7. (hess-to-matrix [ mat ])
|
|
Returns a mono-dimensional matrix containing the hessian curvature field
s-hess for each weight in the network. When provided, the
optional argument mat specifies a destination matrix and saves the
memory allocation time. This function is available in kernel sn28itenew only.
8.1.2.0.2.4. Weight Files in Netenv.
|
|
The weights (sfield s-val ) may be
saved in files under several formats.
First, it is possible to store them in ascii or binary format.
Second, it is possible to store them with (resp. without) information
related to the neural net architecture, leading to an explanative (resp.
blind) format.
Functions load-net and
merge-net recognize all these formats.
8.1.2.0.2.4.0. (save-net/merge str [ l ])
|
|
Saves the weight values of connections involving any two units in list
l and connections from the bias unit (unit
0 ) to any unit in list l .
When argument l is omitted, all units
are considered. All the weights then are saved.
Function save-net/merge stores the weights
into file str with a binary
explanative format. The first 12 bytes of the file are:
- A ``magic'' number on 4 bytes for identifying the file type.
- The number of units in list l , as a 4-byte integer.
- The number of training iterations achieved so far, as a 4-byte integer.
There is then one record per connection. Each record is composed of 12
bytes:
- A 4-byte integer giving the rank in list l of the upstream unit.
- A 4-byte integer giving the rank in list l of the downstream unit.
- A 4-byte floating point number giving the value of sfield
s-val for this connection.
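For example, the following sketch saves the weights involving the units
of two layers into an explanative format file. Here
hidden-layer is a hypothetical name for the list of hidden
units, output-layer is the usual Netenv list of output units,
and the file name is arbitrary.
; save the weights between units of hidden-layer and output-layer,
; together with the biases of these units, into "subnet.wei"
(save-net/merge "subnet.wei" (append hidden-layer output-layer))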
8.1.2.0.2.4.1. (save-ascii-net/merge str [ l ])
|
|
This function is similar to save-net/merge
, but creates a human readable text file.
The first line of the file is a file header. It is composed of the
letters ".WEI" , of the number of
cells involved in the file and of the age of the network. Each
connection is described on one line, consisting of the number of the
upstream cell, the number of the downstream cell and the weight value.
Note that cell 0 is always the
threshold cell. After 160 iterations, a three-layer XOR network might
be saved as:
.WEI 6 160
0 3 -1.7688258
0 4 2.2176821
0 5 1.5779345
1 3 2.3539493
1 4 2.0589452
2 3 0.8007546
2 4 1.9538255
3 5 1.5308707
4 5 -1.4315528
In this file, unit 0 is the bias,
units 1 and 2
are the input units, units 3 and
4 are the hidden units and unit 5
is the output unit.
8.1.2.0.2.4.2. (save-net str [ l ])
|
(netenv.sn) |
When argument l is provided, this
function works like save-net/merge and
creates an explanative format file (the old format). Providing argument
l indicates that the weights are stored for subsequent use by
function merge-net .
When argument l is omitted, this
function saves the weights in file str
as a regular mono-dimensional TLisp matrix using function
save-matrix . This matrix simply contains the weights in their
order of creation.
8.1.2.0.2.4.3. (save-ascii-net str [ l ])
|
(netenv.sn) |
When argument l is provided, this
function works like save-ascii-net/merge
and creates an explanative format file. Providing argument l
indicates that the weights are stored for subsequent use by
function merge-net .
When argument l is omitted, this
function saves the weights in file str
in blind format, that is, as a regular mono-dimensional TLisp matrix
written with function save-ascii-matrix .
This matrix simply contains the weights in their order of creation.
8.1.2.0.2.4.4. (load-net str)
|
(netenv.sn) |
Loads a weight configuration saved with save-net
from file str .
See: (merge-net str [
l ])
8.1.2.0.2.4.5. (merge-net str [ l ])
|
|
When a list l is given as argument,
function merge-net loads the weights saved in file str and stores them
in the fields s-val of the
corresponding connections between units in list l
.
When argument l is omitted, function
merge-net returns the number of units involved in the weight
file str . When argument
l is omitted and str
is not an explanative format weight file, function
merge-net returns () . This
feature is used for sensing the file format (explanative vs. blind).
Function merge-net is usually called
with the same list of units as the corresponding call to
save-net . Using a different list lets you load the weights
of a network into a part of another network and vice-versa.
Function merge-net never creates a
connection. Connections have to be created first.
;; The following function is defined in the file "netenv.sn".
;; It loads a weight file created by save-net without arguments.
(de load-net (str)
  (merge-net str (range 1 (1-num))) )
8.1.2.0.2.5. Setting the Parameters in Netenv.
|
|
8.1.2.0.2.5.0. Initial Weights in Netenv.
|
|
8.1.2.0.2.5.0.0. (forget x)
|
(netenv.sn) |
Sets all the weights of a network to a random value chosen between
- x and x according to a
uniform probability distribution.
8.1.2.0.2.5.0.1. (forget-inv x)
|
(netenv.sn) |
Like forget but divides the bounds of the uniform distribution by the
fanin of each unit.
8.1.2.0.2.5.0.2. (forget-sqrt x)
|
(netenv.sn) |
Like forget but divides the bounds of the uniform distribution by the
square root of the fanin of each unit.
8.1.2.0.2.5.0.3. (smartforget)
|
(netenv.sn) |
Equivalent to (forget-sqrt 1) . This
weight initialisation usually is a good choice with the default
non-linear function.
8.1.2.0.2.5.0.4. (forget-layer x l)
|
(netenv.sn) |
This function is essentially equivalent to forget
. It reinitializes only the weights of incoming connections to layer
l . The remaining weights are left unchanged.
8.1.2.0.2.5.0.5. (forget-inv-layer x l)
|
(netenv.sn) |
This function is essentially equivalent to
forget-inv . It reinitializes only the weights of incoming
connections to layer l . The remaining
weights are left unchanged.
8.1.2.0.2.5.0.6. (forget-sqrt-layer x l)
|
(netenv.sn) |
This function is essentially equivalent to
forget-sqrt . It reinitializes only the weights of incoming
connections to layer l . The remaining
weights are left unchanged.
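For example, one might initialize the whole network and then
re-initialize only the incoming weights of the output layer; the bound
0.5 is arbitrary and output-layer is the usual list of output
units.
(smartforget)                          ; initialize all the weights
(forget-sqrt-layer 0.5 output-layer)   ; re-initialize the output layer only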
8.1.2.0.2.5.1. Learning Parameters in Netenv.
|
|
It is rarely advisable to use the same ``learning rate'' in the whole
network. This is of course possible with function
epsilon . The following functions usually are a better
choice:
8.1.2.0.2.5.1.0. (epsi x)
|
(netenv.sn) |
Sets the learning rate of each unit (each connection with kernels
sn28new and sn28itenew) to x divided
by the number of inputs to the unit.
8.1.2.0.2.5.1.1. (epsi-sqrt x)
|
(netenv.sn) |
Sets the learning rate of each unit (each connection with kernels
sn28new and sn28itenew) to x divided
by the square root of the number of inputs to the unit.
8.1.2.0.2.5.1.2. (mask-epsi s)
|
(connect.sn) |
It is often useful to set small learning-rates
on shared connections. Using mask-epsi
rather than epsi-sqrt or
epsi is often a good choice for setting the learning rates in
a shared weights network.
Function mask-epsi sets the learning
rate for each unit to s divided by the
square root of the number of incoming connections and by the square root
of the sharing count of the incoming connections.
8.1.2.0.2.5.1.3. (epsi-layer x l)
|
(netenv.sn) |
This function is essentially similar to function
epsi . However it sets only the learning rates for the units
in layer l . The other learning rates
are kept unchanged.
8.1.2.0.2.5.1.4. (epsi-sqrt-layer x l)
|
(netenv.sn) |
This function is essentially similar to function
epsi-sqrt . However it sets only the learning rates for the
units in layer l . The other learning
rates are kept unchanged.
Most other learning parameters are constant over the whole network. They
are stored in a few global variables.
8.1.2.0.2.5.1.5. decay
|
[VAR] (netenv.sn) |
Decay factor. At each learning iteration, the weights are multiplied by
(1 - epsilon*decay) . This computation is disabled when
decay is set to 0 . The simulation is
slightly faster in this case. The default value is
0 .
8.1.2.0.2.5.1.6. theta
|
[VAR] (netenv.sn) |
Noise level q . A zero-mean gaussian
random variable with standard deviation theta
is added to the total input of each unit after each state update. The
state of a unit is thus equal to:
X(i) = f(A(i) + N(q))
Where A(i) is the total input to unit
i and N(q) is a zero mean
gaussian random variable with standard deviation
q . This computation is disabled when theta is set to
0 . The simulation is slightly faster in this case. The
default value is 0 . A non zero value
can be used for:
- limiting the bandwidth of the units
- getting a more robust solution
8.1.2.0.2.5.1.7. alpha
|
[VAR] (netenv.sn) |
Momentum factor alpha for back-propagation. The weight increment is computed
with the following formula:
DeltaW(i,j) := alpha * DeltaW(i,j) - epsilon * (d Cost / d W(i,j))
This computation is disabled when alpha is set to
0 . The simulation is slightly faster in this case. The
default value is 0 .
8.1.2.0.2.5.1.8. mu
|
[VAR] (netenv.sn) |
With kernels sn28new and sn28itenew, variable mu
is used for computing the effective learning rate in the approximated
newton method. The effective learning rate of a unit is defined by:
Effective epsilon(i,j) = epsilon(i,j) / (mu + | sigma(i,j) |)
8.1.2.0.2.5.1.9. gamma
|
[VAR] (netenv.sn) |
With kernels sn28new and sn28itenew, variable
gamma controls the time constant used for computing the
average second derivative sigma which
is updated according to the rule:
sigma(i) := (1 - gamma) * sigma(i) + gamma * (d2 E / d W 2) * X(i)**2
where X(i)**2 is the square of the state of unit i.
See: (update-state-only nlf
... layers ...)
See: (update-weight alpha
decay [... l ...])
See: (update-ggradient dnlf
ddnlf gamma ...
layers ...)
See: (update-w-newton alpha
decay mu [...
layers ...])
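A minimal sketch of a typical learning parameter setup; all the values
below are arbitrary and should be tuned to the problem at hand.
(epsi-sqrt 0.1)       ; learning rates scaled by the square root of the fan-in
(setq alpha 0.5)      ; momentum factor
(setq decay 0.0001)   ; small weight decay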
8.1.2.0.2.5.2. Choosing the Activation Functions in Netenv.
|
|
8.1.2.0.2.5.2.0. nlf, dnlf, ddnlf
|
[VAR] (netenv.sn) |
These three variables contain the default NLF used by the library for
all layers. They are set by the functions described below.
Since NLF objects have a functional behavior, writing
(nlf x) then returns the value of the default NLF in
x , writing (dnlf x)
returns the value of its derivative in x
and writing (ddnlf x) returns the
value of its second derivative in x .
8.1.2.0.2.5.2.1. (nlf-tanh mx dmin [scale [offset]])
|
(netenv.sn) |
Sets the non-linear function to a sigmoid of the following form:
f(x) = scale * tanh( mx * x ) + offset + dmin * x
Since the hyperbolic tangent is an odd function taking values in range
]-1,1[, argument scale determines the amplitude, argument
dmin the minimum value of the derivative (if
dmin is not 0 the
asymptotes are slanted), argument mx
defines the gain or the inverse of the ``temperature'' and argument
offset shifts the curve up or down.
Argument offset defaults to
0 . If argument scale is omitted, a scale is computed to
ensure that f(1) = 1 and
f(-1) = -1 .
The most common settings are the following:
-
mx=0.5 , dmin=0 ,
scale=0.5 , offset=0.5
corresponds to the standard logistic function going from
0 to 1 , with
f(0)=0.5 .
- mx=2/3 ,
dmin=0 gives a function which goes from
-1.715905 to +1.715905 .
This function fulfils the conditions f(1)=1
and f(-1)=-1 . Moreover, the maximum
of the second derivative of this function is at 1
. This is the default activation function of SN2.8.
The default non-linear function of SN2.8 is selected with:
(nlf-tanh 0.666 0)
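Because NLF objects have a functional behavior (see
nlf , dnlf , ddnlf above), a quick sketch
for checking a setting is to evaluate the function at a few points;
for the standard logistic setting, f(0) should be 0.5:
? (nlf-tanh 0.5 0 0.5 0.5)    ; standard logistic function
= ...
? (nlf 0)                     ; 0.5*tanh(0) + 0.5
= 0.5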
8.1.2.0.2.5.2.2. (nlf-lin dmin dmax [th])
|
(netenv.sn) |
Sets the non-linear function to a piecewise linear function. This odd
function is made of three parts. The first part, between
-th and th , is a line
segment with a slope equal to dmax .
The remaining parts are straight lines of slope
dmin outside the range [-th,th]
. The three parts form a continuous curve.
Although this function is not differentiable, the computation of its
derivative causes no problem except at two points. SN2.8 assumes that
the derivatives at these two boundaries are those of the central part. A
typical piecewise linear function is obtained with:
(nlf-lin 0.1 1.1)
8.1.2.0.2.5.2.3. (nlf-bell)
|
(netenv.sn) |
Set the non linear function to a bell shaped function whose equation is:
f(x) = (1-x**2)**3 if -1<x<1
f(x) = 0 otherwise
8.1.2.0.2.5.2.4. (nlf-bin mx dmin [scale [offset]])
|
(netenv.sn) |
Sets the non-linear function to a binary threshold with a smooth
pseudo-derivative similar to the derivative obtained with the
nlf-tanh function. The forward pass uses a binary step,
here for instance (nlf-bin 0.666 0 1) ,
while a smooth derivative is used during the backward pass.
8.1.2.0.2.5.2.5. (nlf-spline x y)
|
(netenv.sn) |
Sets the default non-linear function (defined by
nlf , dnlf ,
ddnlf ) to a cubic spline interpolated between points whose
coordinates are in x and
y . Computing a spline interpolation is often faster than
computing a transcendental function. Here is code for tabulating the
standard NLF as a spline:
? (nlf-tanh 0.6666 0) ; sets a standard function
= 1.14393
? (let* ((x (range -4 4 0.05))
(y (all ((i x)) (nlf i))))
(nlf-spline x y) ) ; copy it with a cubic spline
= 400
8.1.2.0.2.5.2.6. NLF support in build-net.
|
|
The library allows the use of different NLFs per layer. In the
build-net instruction, the layer specification may be
completed by a call to function of the type
nlf-XXX .
(build-net
  '((input 4)
    (hidden 2)
    (output 4 (nlf-lin 1 1)))
  '((input hidden)
    (hidden output)) )
In that example, layer hidden uses the default NLF, defined by variables
nlf , dnlf ,
ddnlf . This default NLF can be changed by subsequent calls
to a function nlf-XXX . Layer output
however uses a fixed linear function, defined by
(nlf-lin 1 1) . Subsequent use of functions
nlf-XXX will never affect the activation function of layer
output.
8.1.2.0.2.6. Performance Evaluation in Netenv.
|
|
8.1.2.0.2.6.0. Single Example Performance in Netenv.
|
|
8.1.2.0.2.6.0.0. (test-pattern n)
|
(netenv.sn) |
Tests the response to pattern n . This
function presents the pattern number n
using present-pattern , propagates the
states forward using forward-prop ,
computes the error on that pattern, stores its average scalar value into
local-error using do-compute-err
, decides if the answer is correct using classify and finally calls a
display function.
Function test-pattern is defined as follows:
(de test-pattern (n)
  ; transfers pattern n into the input-layer
  (present-pattern input-layer n)
  ; propagates the states forward: computes the outputs
  (forward-prop net-struc)
  ; gets the desired outputs
  (present-desired desired-layer n)
  ; computes the error for this pattern,
  ; and stores it into local-error
  (do-compute-err output-layer desired-layer)
  ; decides if the answer is correct
  ; and stores the result in good-answer
  (setq good-answer (classify n))
  ; calls the display functions
  (process-pending-events)
  (disp-perf-iteration n)
  ; returns the error
  local-error )
Functions forward-prop and
do-compute-err are defined at network creation time by the
function define-net . These functions
are described in the next section.
Function classify is defined by the set-class-XXX
functions. It defines how to measure the quality of the system. The
default classify function returns t if
the outputs and the desired outputs have the same sign. This is too
restrictive for most applications: you should always call one of the
set-class-XXX functions in order to define your
classification criterion.
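For example, one might select the class-max criterion for a 1-out-of-N
classification task, or a three-region criterion with arbitrary
thresholds (both functions are described below):
(set-class-max)            ; correct when the most active output matches the desired one
(set-class-sgn -0.5 0.5)   ; three regions delimited by the arbitrary thresholds -0.5 and 0.5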
8.1.2.0.2.6.0.1. local-error
|
[VAR] (netenv.sn) |
This global variable contains the average scalar training error for the
current pattern. It is set by functions
test-pattern , basic-iteration
and basic-newton-iteration .
8.1.2.0.2.6.0.2. good-answer
|
[VAR] (netenv.sn) |
This global variable contains a flag indicating the success of the
network on the current pattern. It is set by functions
test-pattern , basic-iteration
and basic-newton-iteration .
8.1.2.0.2.6.1. Multiple Example Performance in Netenv.
|
|
8.1.2.0.2.6.1.0. (perf [n1 n2])
|
(netenv.sn) |
Without arguments, this function computes the performance of the network
on the training set. When integer arguments n1
and n2 are given, the performance is
evaluated on the patterns whose indices are between
n1 and n2 inclusive.
Function perf temporarily cancels the
input-noise settings.
Function perf prints the average error
over the examples and over the outputs, as well as the percentage of good
recognition (as defined by the function classify ). This function also
calls the function save-perf which
saves the result of the measure in the file whose name is contained in
the global variable perf-file .
8.1.2.0.2.6.1.1. (performance n1 n2)
|
(netenv.sn) |
Lower level performance evaluation function called by
perf . Function performance
does not affect the input-noise
settings. If input-noise is not
0 , the measured performance may thus be different if you
call performance twice.
This function computes the average error over patterns between
n1 and n2 and stores the
value into global variable global-error
. It also computes the percentage of good answers and stores it into
global variable good-an-percent . The
value of global-error is returned.
8.1.2.0.2.6.1.2. global-error, good-an-percent
|
[VAR] (netenv.sn) |
These variables contain the results of the last call to performance.
8.1.2.0.2.6.2. Performance File in Netenv.
|
|
8.1.2.0.2.6.2.0. perf-file
|
[VAR] (netenv.sn) |
This variable defines an optional performance file.
This file is used for storing information about the calls and results
of the main neural functions:
perf, epsi-layer, epsi-sqrt-layer, forget-layer, forget-sqrt-layer,
epsi, epsi-sqrt, mu, forget, forget-sqrt, ensemble and test-set.
Example:
;;; ============ NEW PERFORMANCE
;;; ============ NEW NETWORK
;;; (forget-sqrt 1)
;;; (epsi-sqrt 0.1)
;;; (ensemble 0 319)
;;; (test-set 320 479)
;;; trun-action using function 'learn'...
0 0.53568070 9.69 ;{0-319}
0 0.53332978 10.00 ;{320-479}
320 0.07470918 93.75 ;{0-319}
320 0.08033946 91.25 ;{320-479}
640 0.05402946 97.50 ;{0-319}
640 0.06082892 95.62 ;{320-479}
960 0.04978141 98.75 ;{0-319}
960 0.05853230 96.88 ;{320-479}
1280 0.04746896 99.06 ;{0-319}
1280 0.05897417 97.50 ;{320-479}
;;; (epsi-sqrt 0.05)
1600 0.03919556 100.00 ;{0-319}
1600 0.05168496 97.50 ;{320-479}
1920 0.03774762 100.00 ;{0-319}
1920 0.05145141 97.50 ;{320-479}
;;; (epsi-sqrt 0.025)
2240 0.03479438 100.00 ;{0-319}
2240 0.04916938 98.12 ;{320-479}
2560 0.03418482 100.00 ;{0-319}
2560 0.04903031 98.12 ;{320-479}
2880 0.03369421 100.00 ;{0-319}
2880 0.04896870 98.12 ;{320-479}
3200 0.03326655 100.00 ;{0-319}
3200 0.04892774 98.12 ;{320-479}
;;; break
8.1.2.0.2.6.2.1. (save-perf [n1 [n2...]])
|
(netenv.sn) |
This function takes numeric arguments which will be written on a single
line in a file whose name is in global variable
perf-file .
- If variable
perf-file contains the empty list or an empty string,
function save-perf does nothing.
- If variable perf-file contains
the name of an existing file, the values of n1
, n2 , etc... are appended on a single
line at the end of the file.
Function save-perf is called by
function perf .
8.1.2.0.2.6.2.2. (plot-perf-file fname)
|
(netenv.sn) |
Function plot-perf-file creates a
performance plot and an error plot of the information stored in
performance file fname .
8.1.2.0.2.6.3. Classification Modes.
|
|
8.1.2.0.2.6.3.0. (class-lms pn margin)
|
(netenv.sn) |
Tests if the mean square distance between the state of
desired-layer and the state of
output-layer is smaller than margin. Argument
pn is not used.
8.1.2.0.2.6.3.1. (set-class-lms margin)
|
(netenv.sn) |
Defines the current classification function to be
class-lms with margin margin
.
8.1.2.0.2.6.3.2. (class-sgn pn tmin tmax)
|
(netenv.sn) |
Decides the correctness of the output based on the region in which the
output state lies, compared to the desired output. Arguments
tmin and tmax are two
thresholds which define three regions: below
tmin , between tmin and
tmax , and above tmax .
The output is considered correct if all the states of the units in
output-layer are in the same region as the states of units in
desired-layer . This lisp function is slower than
class-quadrant . Argument pn
is not used.
8.1.2.0.2.6.3.3. (set-class-sgn tmin tmax)
|
(netenv.sn) |
Set the classification function to class-sgn
with thresholds tmin and
tmax . If both tmin and
tmax are 0 , the faster
function class-quadrant is used.
8.1.2.0.2.6.3.4. (class-max pn)
|
(netenv.sn) |
Tests if the most active unit has the same rank in
output-layer and desired-layer
. Argument pn is not used.
8.1.2.0.2.6.3.5. (set-class-max)
|
(netenv.sn) |
Set the classification function to class-max
.
8.1.2.0.2.6.3.6. (class-hamming pn margin)
|
(netenv.sn) |
Tests if all the differences between the states of the output layer and
the corresponding desired states are less than margin. Argument
pn is not used.
8.1.2.0.2.6.3.7. (set-class-hamming margin)
|
(netenv.sn) |
Set the classification function to class-hamming
.
8.1.2.0.2.6.3.8. (class-quadrant pn)
|
(netenv.sn) |
Tests if the output vector and the desired vector are in the same
quadrant. Argument pn is not used.
8.1.2.0.2.6.3.9. (set-class-quadrant)
|
(netenv.sn) |
Set the classification function to class-quadrant
.
8.1.2.0.2.6.3.10. (set-class-nil)
|
(netenv.sn) |
Sets the classification function to an empty function. This should be
used whenever possible to speed up the simulation.
8.1.2.0.2.6.3.11. (classify pn)
|
(netenv.sn) |
Main classification function. Function classify
calls one of the previous classification functions. It can be user
redefined.
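For example, here is a hypothetical sketch of a redefined
classify ; it considers the answer correct when the average
error stored in local-error by
do-compute-err is below an arbitrary threshold:
; hypothetical sketch of a user defined classification criterion
(de classify (pn)
  (< local-error 0.1) )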
8.1.2.0.2.7. Training the Network with Online Algorithms in Netenv.
|
|
8.1.2.0.2.7.0. Choice of the Patterns in Netenv.
|
|
On-line algorithms update the weights after each pattern presentation.
The order of presentation thus becomes important. The following
functions define how the library presents the patterns.
The learning functions learn and run use the function next-pattern to
choose the next pattern to present to the network. The user can redefine
next-pattern to fit specific needs or select a predefined method.
8.1.2.0.2.7.0.0. (next-pattern n)
|
(netenv.sn) |
Chooses a new pattern in the training set and returns its index. Argument
n is the index of the current pattern.
8.1.2.0.2.7.0.1. current-pattern
|
[VAR] (netenv.sn) |
Contains the index of the current pattern.
8.1.2.0.2.7.0.2. (set-nextpat-seq)
|
(netenv.sn) |
Redefines next-pattern to be
equivalent to nextpat-seq . The
following pattern is chosen regardless of the result of the previous
classification.
8.1.2.0.2.7.0.3. (set-nextpat-stay)
|
(netenv.sn) |
Redefines next-pattern to be
equivalent to nextpat-stay .
The same pattern is kept until the network returns the right answer
(until function classify returns a positive result). The next pattern is
then chosen using function next-choice
.
8.1.2.0.2.7.0.4. (next-choice n)
|
(netenv.sn) |
This function chooses the next pattern when a new pattern is needed.
8.1.2.0.2.7.0.5. (set-nextcho-seq)
|
(netenv.sn) |
This function redefines next-choice so
that the patterns are chosen sequentially in the training set.
8.1.2.0.2.7.0.6. (set-nextcho-rnd)
|
(netenv.sn) |
This function redefines next-choice so that the next pattern is chosen
at random in the training set with uniform probability.
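As a sketch, next-choice can also be
redefined by hand. The following hypothetical definition sweeps the
training set backwards, using the bounds patt-min
and patt-max set by function ensemble :
; hypothetical sketch: present the training patterns in reverse order
(de next-choice (n)
  (if (<= n patt-min) patt-max (- n 1)) )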
8.1.2.0.2.7.1. Forward and Backward Propagation Functions in Netenv.
|
|
Like test-pattern , the learning
functions make heavy use of eight elementary functions defined at
network creation time by define-net .
Tampering with the definition of these functions is the preferred way to
explore new algorithms.
8.1.2.0.2.7.1.0. (forward-prop ns)
|
[DE] |
This function propagates the states forward in the network. It is
usually defined as a sequence of calls to
update-state . Argument ns
is an inheritance of primitive versions of SN2.8 and is never used.
8.1.2.0.2.7.1.1. (forward-2-prop ns)
|
(netenv.sn) |
Function forward-2-prop is defined at
network creation time by define-net
and plays a role similar to that of functions
forward-prop , backward-prop
, backward-2-prop , etc... Function
forward-2-prop just calls function
update-deltastate on each layers of the network.
This function is used by the conjugate gradient algorithms.
8.1.2.0.2.7.1.2. (do-compute-err out des)
|
[DE] |
This function computes the training error between the output layer
out and the desired layer des
, stores the average scalar error in the global variable
local-error and initializes the gradients by using
init-grad .
8.1.2.0.2.7.1.3. (backward-prop ns)
|
[DE] |
This function propagates the gradients backward in the network. It is
usually defined as a sequence of calls to
update-gradient . Argument ns
is unused.
8.1.2.0.2.7.1.4. (do-update-weight ns)
|
[DE] |
Given the gradients and the states, this function performs the weight
update. It is usually defined as a single call to
update-weight . With kernels sn28ite and sn28itenew, this
function may be defined as:
(de do-update-weight (ns)
  (clear-acc)
  (do-update-acc ns)
  (update-delta alpha)
  (update-wghtonly decay) )
(do-update-acc ns)
With kernels sn28new and sn28itenew only, this function loops over the
connections and adds their contribution to the gradient in the field
s-acc of the weights. Usually defined as a single call to
update-acc , this function may be redefined as a sequence of
calls to update-acc .
8.1.2.0.2.7.1.5. (backward-2-prop ns)
|
[DE] |
With kernels sn28new and sn28itenew only, this function propagates
backward the second derivatives of the cost (fields
n-sqbacksum and n-ggrad
). This function is usually defined as a sequence of calls to
update-ggradient or update-lmggradient
according to the status of the flag lev-mar
.
8.1.2.0.2.7.1.6. (do-update-weight-newton m)
|
[DE] |
With kernels sn28new and sn28itenew only, this function updates the
weights given the states, gradients and second derivatives of the cost
function. It is usually defined as a single call to
update-w-newton .
With kernel sn28itenew however, this function might be redefined as:
(de do-update-weight-newton (ns)
  (clear-acc)
  (clear-hess)
  (do-update-acc ns)
  (do-update-hess ns)
  (hessian-scale mu)
  (update-delta alpha)
  (update-wghtonly decay) )
(do-update-hess ns)
Accumulates the contributions of each connection to the second
derivatives (in fields s-sigma ) into
the field s-hess of the weights. This
is usually done by a single call to function
update-hess .
8.1.2.0.2.7.2. Online Gradient.
|
|
8.1.2.0.2.7.2.0. (init-grad out des)
|
(netenv.sn) |
Sets the gradients (fields n-backsum
and n-grad ) of units in out using the
desired state stored in units of list des
. This function uses either init-grad-lms
or init-grad-thlms depending on the
active mode. It returns the total output error.
8.1.2.0.2.7.2.1. (set-cost-lms)
|
(netenv.sn) |
Defines the current error measure to be the classical mean-square:
C = 0.5 * | DesiredOutput - ActualOutput |**2
Function init-grad then calls
init-grad-lms .
8.1.2.0.2.7.2.2. (set-cost-thlms threshold)
|
(netenv.sn) |
Defines the current error measure to be the thresholded mean-square.
If the actual output is larger than the desired output and the desired
output is larger than threshold or if
the actual output is smaller than the desired output and the desired
output is smaller than threshold then
the error is set to 0 and no gradient
is propagated. Otherwise the error is the classical mean-square error.
Function init-grad then calls init-grad-thlms.
8.1.2.0.2.7.2.3. (basic-iteration n)
|
(netenv.sn) |
This function performs an elementary learning iteration. It first gets
an input pattern using the redefinable function
present-pattern . Then it propagates the states using
forward-prop , gets the desired values using
present-desired and sets the variables
local-error and good-answer
using functions do-compute-err and
classify .
The gradients then are propagated backward using function
backward-prop and the weights are updated with function
do-update-weight .
Function basic-iteration also is in charge of incrementing the variable
age and calling a few display functions. It is defined as follows:
(de basic-iteration (n)
  ; get a pattern
  (present-pattern input-layer n)
  ; propagates the states
  (forward-prop net-struc)
  ; get the desired output
  (present-desired desired-layer n)
  ; computes local-error and initializes the gradients
  (do-compute-err output-layer desired-layer)
  ; propagates the gradients
  (backward-prop net-struc)
  ; update the weights
  (do-update-weight net-struc)
  ; increments "age"
  (incr age)
  ; computes "good-answer"
  (setq good-answer (classify n))
  ; display functions
  (process-pending-events)
  (disp-basic-iteration)
  local-error )
The following learning functions provide convenient interfaces for
calling function basic-iteration .
8.1.2.0.2.7.2.4. (learn n)
|
(netenv.sn) |
Performs n learning iterations. This
function modifies the variable current-pattern
after each learning iteration and uses the pattern-choosing function
next-pattern .
8.1.2.0.2.7.2.5. (learn-cycle n)
|
(netenv.sn) |
Performs n sweeps through the training
set as defined by ensemble. The patterns are taken in sequence,
regardless of the pattern-choosing function. This function does not
modify the variable current-pattern .
8.1.2.0.2.7.2.6. (run cyc-length cyc-num)
|
(netenv.sn) |
This is the main learning function. Function run performs
cyc-length learning iterations, starting at pattern
current-pattern and choosing the next pattern with the
function next-pattern . Then it
measures the performance on the training set (i.e. the patterns whose
indices are between patt-min and
patt-max , as defined by the function
ensemble ). This whole sequence is repeated
cyc-num times.
8.1.2.0.2.7.2.7. (trun cyc-length cyc-num)
|
(netenv.sn) |
This function behaves like function run
. But it also measures the global performance on the test set specified
by the function test-set .
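Putting these functions together, a minimal training session might look
like the following sketch. The pattern ranges, learning rate and cycle
counts are arbitrary, and the network and its pattern matrix are assumed
to have been created already.
(ensemble 0 319)      ; training patterns
(test-set 320 479)    ; test patterns
(smartforget)         ; initialize the weights
(epsi-sqrt 0.1)       ; set the learning rates
(set-class-max)       ; classification criterion
(trun 320 10)         ; 10 cycles of 320 iterations, measuring both sets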
8.1.2.0.2.7.3. Online Second Order Algorithms in Netenv.
|
|
A set of functions are provided for implementing second order stochastic
back-propagation algorithms. The functions described in this section are
only available with kernels sn28new and sn28itenew.
8.1.2.0.2.7.3.0. lev-mar
|
[VAR] (netenv.sn) |
This flag controls the behavior of
backward-2-prop and thus the behavior of
basic-newton-iteration .
- When flag
lev-mar is the empty list, the Quasi Newton algorithm is
applied.
- When flag lev-mar is
t , the Quasi Levenberg-Marquardt algorithm is applied.
8.1.2.0.2.7.3.1. (basic-newton-iteration n)
|
(netenv.sn) |
This function closely resembles basic-iteration
. However, it calls functions backward-2-prop
and do-update-weight-newton . It
thus applies the Quasi Newton or the Quasi Levenberg-Marquardt algorithm
according to the status of the flag lev-mar
.
8.1.2.0.2.7.3.2. (learn-newton n)
|
(netenv.sn) |
This function closely resembles learn
. However, it clears the flag lev-mar
and calls basic-newton-iteration ,
thus applying the Quasi Newton algorithm.
8.1.2.0.2.7.3.3. (learn-cycle-newton n)
|
(netenv.sn) |
This function closely resembles learn-cycle
. However, it clears the flag lev-mar
and calls basic-newton-iteration ,
thus applying the Quasi Newton algorithm.
8.1.2.0.2.7.3.4. (run-newton cyc-length cyc-num)
|
(netenv.sn) |
This function closely resembles run .
However, it clears the flag lev-mar and
calls basic-newton-iteration ,
thus applying the Quasi Newton algorithm.
8.1.2.0.2.7.3.5. (trun-newton cyc-length cyc-num)
|
(netenv.sn) |
This function closely resembles trun
. However, it clears the flag lev-mar
and calls basic-newton-iteration ,
thus applying the Quasi Newton algorithm.
8.1.2.0.2.7.3.6. (learn-lm n)
|
(netenv.sn) |
This function closely resembles learn
. However, it temporarily sets the flag lev-mar
and calls basic-newton-iteration ,
thus applying the Quasi Levenberg-Marquardt algorithm.
8.1.2.0.2.7.3.7. (learn-cycle-lm n)
|
(netenv.sn) |
This function closely resembles learn-cycle
. However, it temporarily sets the flag lev-mar
and calls basic-newton-iteration ,
thus applying the Quasi Levenberg-Marquardt algorithm.
8.1.2.0.2.7.3.8. (run-lm cyc-length cyc-num)
|
(netenv.sn) |
This function closely resembles run .
However, it temporarily sets the flag lev-mar
and calls basic-newton-iteration ,
thus applying the Quasi Levenberg-Marquardt algorithm.
8.1.2.0.2.7.3.9. (trun-lm cyc-length cyc-num)
|
(netenv.sn) |
This function closely resembles trun
. However, it temporarily sets the flag lev-mar and
calls basic-newton-iteration ,
thus applying the Quasi Levenberg-Marquardt algorithm.
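A sketch of a second order training run, with arbitrary parameter
values (kernels sn28new or sn28itenew only):
(epsi-sqrt 0.01)      ; base learning rates
(setq mu 0.05)        ; regularizes the effective learning rates
(setq gamma 0.05)     ; time constant of the average second derivatives
(run-newton 320 10)   ; 10 cycles of 320 Quasi Newton iterations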
8.1.2.0.2.8. Training the Network with Batch Algorithms in Netenv.
|
|
These ``batch'' algorithms update the weights after the presentation of
several patterns.
8.1.2.0.2.8.0. Batch Gradient in Netenv.
|
|
A set of functions are provided for implementing the batch version of
back-propagation (among other learning algorithms). The functions
described in this section are only available with kernels sn28ite and
sn28itenew.
The batch version accumulates the gradients over a set of patterns
before performing a weight update. The batch version is much slower than
the on-line version (it usually takes many more forward and backward
passes to learn a given problem). It is sometimes useful for analyzing
the error surface and the behavior of an algorithm.
8.1.2.0.2.8.0.0. (run-batch cyc-length cyc-num)
|
(netenv.sn) |
This is the main simulation function for the batch version.
Function run-batch performs
cyc-length batch learning iterations. Each iteration consists
of a sweep through the entire training set (i.e. patterns between
patt-min and patt-max ).
Then it measures the performance on this training set. This entire
sequence is repeated cyc-num times.
8.1.2.0.2.8.0.1. (trun-batch cyc-length cyc-num)
|
(netenv.sn) |
This is the main simulation function for the batch version.
Function trun-batch performs
cyc-length batch learning iterations. Each iteration consists
of a sweep through the entire training set (i.e. patterns between
patt-min and patt-max ).
Then it measures the performance on this training set and on the test
set (i.e. patterns between tpatt-min
and tpatt-max ). This entire sequence
is repeated cyc-num times.
8.1.2.0.2.8.0.2. (learn-batch n)
|
(netenv.sn) |
Performs n batch learning iterations.
Each iteration consists of a sweep through the entire training set (i.e.
patterns between patt-min and
patt-max ).
8.1.2.0.2.8.0.3. (batch-iteration pmin pmax)
|
(netenv.sn) |
This function implements the batch version of back-propagation.
Function batch-iteration accumulates
the gradients and the errors of the training set patterns and then
performs a weight update. The global variable
global-error is set to the average output error over the set
of patterns and its value is returned. The global variable age is
incremented after each pattern presentation in order to remain
consistent with the stochastic version.
8.1.2.0.2.8.1. Batch Conjugate Gradient in Netenv.
|
|
Function learn-batch-cg performs the
conjugate gradient optimisation using two auxiliary functions,
batch-cg1-iteration and
batch-cg2-iteration . An elementary propagation function,
forward-2-prop , is also defined at network creation time by
function define-net . You can of
course find more information about our implementation of the conjugate
gradient by looking at these functions in file
".../sn28/lib/netenv.sn" .
8.1.2.0.2.8.1.0. (learn-batch-cg ns [grad1] [grad2] [gradc])
|
(netenv.sn) |
Function learn-batch-cg performs
ns epochs of the batch conjugate gradient algorithm. Each
epoch consists in a complete pass over the training set. There is no
need to set the learning rates or any other parameters before applying
the batch conjugate gradient algorithm. As a side effect, function
learn-batch-cg sets all learning rates to
1 . You must therefore reset the learning rates before
applying another algorithm.
The optional arguments grad1 ,
grad2 and gradc are three
matrices used for continuing the conjugate gradient algorithm without
losing the information accumulated during the previous epochs. Function
learn-batch-cg indeed returns a list
(it grad1 grad2 gradc) that you can use as argument list for
the next call to function learn-batch-cg
.
;; WRONG: Restarts the conjugate gradient every other epoch!
(repeat n
(learn-batch-cg 2)
(perf) )
;; RIGHT: Runs the conjugate gradient during 2*n epochs!
(let ((cgs (list 2)))
(repeat n
(setq cgs (apply learn-batch-cg cgs))
(perf) ) )
8.1.2.0.2.8.1.1. (run-batch-cg cyc-length cyc-num)
|
(netenv.sn) |
Function run-batch-cg repeats
cyc-num times the following operations:
- Performing
cyc-length epochs of batch conjugate gradient. Each epoch
consists of a sweep through the entire training set,
- Measuring and displaying the performance on the training set.
This function is similar to function run usually used for the online
gradient algorithm.
8.1.2.0.2.8.1.2. (trun-batch-cg cyc-length cyc-num)
|
(netenv.sn) |
Function trun-batch-cg repeats
cyc-num times the following operations:
- Performing
cyc-length epochs of batch conjugate gradient. Each epoch
consists of a sweep through the entire training set,
- Measuring and displaying the performance on the training set,
- Measuring and displaying the performance on the test set.
This function is similar to function trun used for the online gradient
algorithm.
8.1.2.0.2.8.1.3. (batch-cg1-iteration from to)
|
(netenv.sn) |
Function batch-cg1-iteration computes
the gradient of the cost function. It performs a complete pass of the
training set, accumulates the gradient of the cost function over all
examples, and stores this gradient in connection field
s-acc . This function is close to function
batch-iteration but does not modify the weights.
8.1.2.0.2.8.1.4. (batch-cg2-iteration from to)
|
(netenv.sn) |
Function batch-cg2-iteration returns
the Gauss-Newton approximation to the curvature of the cost function
along a search direction stored in connection field
s-delta . It performs a second pass of the training set and
computes the curvature using function
forward-2-prop which is defined at network creation time by
define-net .
8.1.2.0.2.8.2. Batch Second Order Algorithm in Netenv.
|
|
Finally, a set of functions are provided for implementing the batch
version of back-propagation (among other learning algorithms). The
functions described in this section are only available with kernel
sn28itenew.
8.1.2.0.2.8.2.0. (batch-newton-iteration pmin pmax)
|
(netenv.sn) |
This function is similar to batch-iteration
. However, it applies the Quasi Newton or the Quasi Levenberg-Marquardt
algorithm according to flag lev-mar .
8.1.2.0.2.8.2.1. (learn-batch-newton n)
|
(netenv.sn) |
This function is similar to learn-batch
. However, it clears the flag lev-mar
and calls batch-newton-iteration ,
thus applying the Quasi Newton algorithm.
8.1.2.0.2.8.2.2. (run-batch-newton cyc-length cyc-num)
|
(netenv.sn) |
This function is similar to run-batch
. However, it clears the flag lev-mar
and calls batch-newton-iteration ,
thus applying the Quasi Newton algorithm.
8.1.2.0.2.8.2.3. (trun-batch-newton cyc-length cyc-num)
|
(netenv.sn) |
This function is similar to trun-batch
. However, it clears the flag lev-mar
and calls batch-newton-iteration ,
thus applying the Quasi Newton algorithm.
8.1.2.0.2.8.2.4. (learn-batch-lm n)
|
(netenv.sn) |
This function is similar to learn-batch
. However, it sets the flag lev-mar and
calls batch-newton-iteration ,
thus applying the Quasi Levenberg-Marquardt algorithm.
8.1.2.0.2.8.2.5. (run-batch-lm cyc-length cyc-num)
|
(netenv.sn) |
This function is similar to run-batch
. However, it sets the flag lev-mar and
calls batch-newton-iteration ,
thus applying the Quasi Levenberg-Marquardt algorithm.
8.1.2.0.2.8.2.6. (trun-batch-lm cyc-length cyc-num)
|
(netenv.sn) |
This function is similar to trun-batch
. However, it sets the flag lev-mar and
calls batch-newton-iteration ,
thus applying the Quasi Levenberg-Marquardt algorithm.
8.1.2.0.2.9. Pruning Connections in Netenv: Optimal Brain Damage (OBD).
|
|
Optimal Brain Damage (OBD) is a weight pruning method. It consists of
computing a saliency criterion based on the curvature of the cost
function.
This method sometimes improves the generalization performance by
adjusting the effective number of parameters. The simulation speed is
also improved. The optimized runtime speed, however, may remain unchanged
because modern computers spend more time performing a test than
performing a multiplication by zero.
Using pruning methods is quite a long process because the mathematics of
these methods usually requires that the weight saliencies be computed at
an optimum of the cost function. The best results are obtained by
training the network for a long time before applying OBD.
Two library functions of "netenv.sn"
implement OBD. These functions require kernels sn28new or sn28itenew
because the computation of the saliency criterion requires the
evaluation of the second derivatives of the cost function.
A typical use of OBD consists in the following steps:
- 1
Thoroughly training a network until it reaches an optimum.
- 2 Computing the curvature information using
obd-compute-curvature .
- 3 Pruning a small proportion of weights (such as 10%) using
obd-prune .
- 4 Retraining the network until it reaches a new minimum.
- 5 Repeating steps 2, 3 and 4 until enough weights have been
removed.
An example file is provided in
"sn28/examples/obd.sn" .
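The steps above can also be sketched as the following loop; the pattern
range, the pruning fraction and the amount of retraining are arbitrary.
; prune 10% of the remaining connections, five times
(repeat 5
  (obd-compute-curvature 0 319)    ; curvature measured on the training set
  (obd-prune 0.1)                  ; remove the least salient weights
  (learn-cycle 20)                 ; retrain toward a new minimum
  (perf) )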
8.1.2.0.2.9.0. (obd-compute-curvature pmin pmax )
|
(netenv.sn) |
Computes the diagonal of the hessian matrix of the cost function
measured on patterns pmin to
pmax . These diagonal values are left in field
s-hess of the weights.
8.1.2.0.2.9.1. (obd-prune fraction)
|
(netenv.sn) |
Removes a proportion fraction of the
connections with the lowest saliency using function
cut-connection and returns the exact number of connections
removed. Argument fraction must be a number between
0 (prune nothing) and 1
(prune all).
When pruning a shared weight network (using sn28itenew), remember
that the saliency criterion is a characteristic of the weight
and not of the connection. All the connections sharing the same weight
are either kept or removed as a whole.
8.1.2.0.2.10. Displaying and Plotting in Netenv.
|
|
8.1.2.0.2.10.0. Netenv Displaying Modes.
|
|
8.1.2.0.2.10.0.0. (disp-basic-iteration)
|
(netenv.sn) |
This is the general display function which is called after each learning
iteration. This function can be dynamically redefined to one of the
built-in display functions or to a user defined function.
See: (basic-iteration n )
8.1.2.0.2.10.0.1. (disp-perf-iteration)
|
(netenv.sn) |
This is the general display function which is called after each pattern
testing. This function can be dynamically redefined to one of the built
in display functions or to a user defined function.
See: (test-pattern n )
8.1.2.0.2.10.0.2. (set-disp-everything)
|
(netenv.sn) |
Makes disp-basic-iteration equivalent
to disp-everything and
disp-perf-iteration equivalent to
disp-nil .
8.1.2.0.2.10.0.3. (set-disp-error)
|
(netenv.sn) |
Makes disp-basic-iteration equivalent
to disp-error and
disp-perf-iteration equivalent to
disp-nil . This function also creates a window
disp-error-window and a plot-port
disp-error-port for plotting the error.
8.1.2.0.2.10.0.4. (set-disp-net)
|
(netenv.sn) |
Makes disp-basic-iteration and
disp-perf-iteration equivalent to
disp-net .
8.1.2.0.2.10.0.5. (set-disp-net-and-error)
|
(netenv.sn) |
Makes disp-basic-iteration equivalent
to disp-error and
disp-perf-iteration equivalent to
disp-net . SN2.8 will thus plot the instantaneous error
during the training phase and draw the network states during the
performance evaluation.
8.1.2.0.2.10.0.6. (set-disp-text)
|
(netenv.sn) |
Makes both disp-basic-iteration and
disp-perf-iteration equivalent to
disp-text .
8.1.2.0.2.10.0.7. (set-disp-nil)
|
(netenv.sn) |
Makes both disp-basic-iteration and
disp-perf-iteration equivalent to
disp-nil .
8.1.2.0.2.10.0.8. (disp-everything n)
|
(netenv.sn) |
Function disp-everything can be
redefined by the user. However, its default definition is:
(de disp-everything (patt-num)
  (print-layer output-layer)
  (printf "age=%d pattern=%d error=%9.5f %s\n"
          age patt-num local-error
          (if good-answer " ok " "**arrgh**"))
  (plot-error age local-error)
  (draw-net net-struc patt-num) )
8.1.2.0.2.10.0.9. (disp-error n)
|
(netenv.sn) |
Function disp-error can be redefined
by the user. However, its default definition is:
(de disp-error (patt-num)
(plot-error age local-error) )
8.1.2.0.2.10.0.10. (disp-net n)
|
(netenv.sn) |
Function disp-net can be redefined by
the user. However, its default definition is:
(de disp-net (patt-num)
(draw-net net-struc patt-num) )
8.1.2.0.2.10.0.11. (disp-text n)
|
(netenv.sn) |
Function disp-text can be redefined by
the user. However, its default definition is:
(de disp-text (patt-num)
  (printf "age=%d pattern=%d error=%9.5f %s\n"
          age patt-num local-error
          (if good-answer " ok " "**arrgh**")) )
8.1.2.0.2.10.0.12. (draw-net struc n)
|
(netenv.sn) |
The user defined function draw-net
must produce a graphic display of the network state on the current
graphic window. This function is initially defined as an empty function.
Function draw-net is called by certain
functions disp-XXX . The first
argument is now obsolete. The second argument is the index of the
current pattern.
It is easy to write a display function using function
draw-list . Here is an example of how to define a
draw-net function for a small network:
; Define the network as in the "getting started" 4-2-4 encoder example
; then define a display function for the 4-2-4 encoder
(de draw-net (a b) ; a and b are dummy arguments, we won't use them
  (draw-list 100 100 (state output) 4 30 50 48)
  (draw-list 150 170 (state hid) 2 30 50 48)
  (draw-list 100 240 (state input) 4 30 50 48) )
(new-window) ; now open a graphic window
(set-disp-net) ; enable display
(learn 16) ; do a couple iterations, look at the graphic screen.
The network editor ``Nettool'' automatically generates
draw-net functions. These functions create an interactive
display window. Clicking outside the network refreshes the window
and displays the states of the network. Clicking on a neuron displays
the weights of the connections leading to that neuron; the other neurons
are turned gray (gray corresponds to weight 0).
8.1.2.0.2.10.0.13. (plot-error n err)
|
(netenv.sn) |
Function plot-error plots the output
error for the current pattern on a graph. It is called by certain
functions disp-XXX . The default
function plots the instantaneous error in the plot-port
disp-error-port .
8.1.2.0.2.10.0.14. (print-layer l)
|
(netenv.sn) |
Very simple textual description of the states of a layer.
8.1.2.0.2.10.0.15. (prsyn)
|
(netenv.sn) |
Very simple textual description of the connections of the network.
8.1.2.0.2.10.0.16. (prneur)
|
(netenv.sn) |
Very simple textual description of the states of the network.
8.1.2.0.2.10.0.17. (prnet)
|
(netenv.sn) |
Very simple textual description of a network.
8.1.2.0.2.10.1. Monitoring Error and Performance in Netenv.
|
|
8.1.2.0.2.10.1.0. (init-error-plotting nsweep maxerr)
|
(netenv.sn) |
This function creates window
error-plotting-window if it does not already exist. It
then creates two plot ports for plotting the mean squared error. These
plot ports are named training-error-port
and test-error-port .
As soon as these plot ports exist, functions run
and trun plot an error curve in this
window.
8.1.2.0.2.10.1.1. (init-perf-plotting nsweep)
|
(netenv.sn) |
This function creates window
perf-plotting-window if it does not already exist. It then
creates two plot ports for plotting the performance. These plot ports are
named training-perf-port and
test-perf-port .
As soon as these plot ports are defined, functions
run and trun plot a
performance curve in this window.
Remark: You can destroy these error and performance windows, either by
using your window manager or by typing:
(delete perf-plotting-window)
(delete error-plotting-window)
Plotting is then cancelled.
8.1.2.0.2.10.2. Netenv Miscellaneous Functions.
|
|
8.1.2.0.2.10.2.0. (all-neuron (i) body)
|
(netenv.sn) |
Evaluates body with i taking all
possible unit indices.
8.1.2.0.2.10.2.1. (all-synapse (i j) body)
|
(netenv.sn) |
Evaluates body with i and
j taking all possible values for the upstream and downstream
units of a connection.
8.1.2.0.3. Building Connections in Netenv.
|
|
8.1.2.0.3.0. Introduction to Building Connections in Netenv.
|
|
The functions introduced in this section are located in the file
"connect.sn" which is automatically loaded on startup. They
make the creation of complex networks easier by providing lisp functions
which connect layers according to various connectivity patterns.
These functions have been designed for both efficiency and simplicity.
They only use the two low-level functions connect
and dup-connection .
See: Connections in Netenv.
8.1.2.0.3.1. Functions for Locally Connecting Layers in Netenv.
|
|
A first class of functions creates local connections. The first
arguments always describe the upstream and downstream layers and their
sizes.
8.1.2.0.3.1.0. (local-1d-connect layer1 n1 layer2 n2 step size)
|
(connect.sn) |
This function creates local connections between
layer1 (a list of n1 cells)
and layer2 (a list of
n2 cells). The cells of layer2
are connected to a window of size cells from
layer1 , stepped by step cells.
Function local-1d-connect complains if
n1 != step*n2+size-step .
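For example, with step=2 and size=4, a downstream layer of 8 cells
requires an upstream layer of 2*8+4-2=18 cells; layer1 and layer2 are
assumed to be lists of cell numbers created beforehand.
; each of the 8 downstream cells sees a window of 4 upstream cells,
; shifted by 2 cells from one downstream cell to the next
(local-1d-connect layer1 18 layer2 8 2 4)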
8.1.2.0.3.1.1. (local-2d-connect layer1 row1 col1 layer2 row2 col2 xstep ystep xsize ysize)
|
(connect.sn) |
This function creates local connections between
layer1 (a list of row1*col1
cells) and layer2 (a list of row2*col2
cells). Cells in layer1 and
layer2 are listed by row order:
(r1c1 r1c2 r1c3...r1cN r2c1 r2c2...rMcN).
Cells of layer layer2 are connected to
a window of xsize*ysize cells from
layer1 , stepped by xstep
and ystep cells. Function
local-2d-connect complains if row1 !=
ystep*row2+ysize-ystep or col1 !=
xstep*col2+xsize-xstep .
8.1.2.0.3.1.2. (local-toric-connect layer1 row1 col1 layer2 row2 col2 xstep ystep xsize ysize)
|
(connect.sn) |
This function is almost identical to function
local-2d-connect : it creates local connections between
layer1 (a list of row1*col1
cells) and layer2 (a list of
row2*col2 cells). Cells of layer
layer2 are connected to a window of
xsize*ysize cells from layer1
, stepped by xstep and
ystep cells.
However, side effects are handled in a different way: when using the
function local-2d-connect , cells in
the periphery of layer1 have fewer
downstream connections than the central cells. Input data are thus best
located in the center of the input layer.
In certain cases, it is easier to use the function
local-toric-connect whose sliding window wraps around
layer1 .
Layer layer1 always has
col2*xstep columns and row2*ystep
rows. The input window of the rightmost cells of
layer2 wraps thus on the left of
layer1 . The input window of the bottommost cells similarly
wraps on the top.
8.1.2.0.3.2. Functions for Creating Shared Weights Masks in Netenv.
|
|
All the local connection functions have an equivalent function for
performing shared weights connections. Of course, these functions
require kernel sn28ite or sn28itenew.
8.1.2.0.3.2.0. (mask-1d-connect layer1 n1 layer2 n2 step size)
|
(connect.sn) |
This function creates shared weight connections between
layer1 (a list of n1 cells)
and layer2 (a list of
n2 cells). Cells of layer2
are connected to a window of size cells from
layer1 , stepped by step cells, always using the same set of
shared weights.
8.1.2.0.3.2.1. (mask-2d-connect layer1 row1 col1 layer2 row2 col2 xstep ystep xsize ysize)
|
(connect.sn) |
This function creates shared weights connections between
layer1 (a list of row1*col1 cells) and
layer2 (a list of row2*col2
cells). Cells of layer layer2 are
connected to a window of xsize*ysize
cells from layer1 , stepped by
xstep and ystep cells,
always using the same set of shared weights.
8.1.2.0.3.2.2. (mask-toric-connect layer1 row1 col1 layer2 row2 col2 xstep ystep xsize ysize)
|
(connect.sn) |
This function is almost identical to mask-2d-connect: it creates shared
weights connections between layer1 (a
list of row1*col1 cells) and
layer2 (a list of row2*col2
cells). Cells of layer layer2 are
connected to a window of xsize*ysize
cells from layer1 , stepped by
xstep and ystep cells.
However, side effects are handled like function
local-toric-connect .
8.1.2.0.3.2.3. (equal-mask-1d-connect layer1 n1 layer2 n2 step size)
|
(connect.sn) |
This function is basically similar to the
mask-1d-connect function, except that all the connections
share a same single weight. Function
equal-mask-1d-connect might be used for implementing
averaging/smoothing operations on signal data.
8.1.2.0.3.2.4. (equal-mask-2d-connect layer1 row1 col1 layer2 row2 col2 xstep ystep xsize ysize)
|
(connect.sn) |
This function is basically similar to the
mask-2d-connect function, except that all the connections
share a same single weight. Function
equal-mask-2d-connect might be used for implementing
averaging/smoothing operations on image data.
Example:
; divide the resolution of a 16x16 layer 'layer1' by 2 into a 8x8 layer 'layer2'
(equal-mask-2d-connect layer1 16 16
                       layer2 8 8
                       2 2 2 2 )
8.1.2.0.3.2.5. (tdnn-connect layer1 nframes1 ntraits1 layer2 nframes2 ntraits2 step size)
|
(connect.sn) |
This function helps implement Time Delay Neural Networks. Cells in
lists layer1 and
layer2 are ordered by trait order:
(t1f1 t2f1 t3f1...t1f2 t2f2...).
Each frame of layer2 is connected to a
sliding window in layer1 of
size frames, stepped by step
frames, always using the same set of weights.
8.1.2.0.3.2.6. (mask-2d-reverse layer1 row1 col1 layer2 row2 col2 xstep ystep xsize ysize)
|
(connect.sn) |
This function creates shared weights connections between
layer1 (a list of row1*col1
cells) and layer2 (a list of
row2*col2 cells). Cells of layer
layer1 are connected to a window of
xsize*ysize cells from layer2
, stepped by xstep and
ystep cells, always using the same set of shared weights.
Function mask-2d-reverse complains if
row2 != ystep*row1+ysize-ystep or col2
!= xstep*col1+xsize-xstep .
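For instance, the constraint is satisfied by this hypothetical
upsampling connection (layer1 is 8x8 and layer2 is 17x17, since
2*8+3-2 = 17):
; each cell of the 8x8 layer1 drives a 3x3 window of the 17x17 layer2,
; stepped by 2 cells; all windows share the same 3x3 weights
(mask-2d-reverse layer1 8 8 layer2 17 17 2 2 3 3)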
8.1.2.0.3.3. Functions for Creating Shared Biases in Netenv.
|
|
Shared weights are typically used for designing translation invariant
networks. Such networks also require the biases to be invariant.
Function build-net , however, always creates a separate bias per unit.
The solution consists in using function build-net-nobias instead of
function build-net . It takes the same arguments, but creates no biases.
The following functions let the user create biases by hand.
8.1.2.0.3.3.0. (bias-connect layer)
|
(connect.sn) |
Creates a separate bias for each cell in list layer.
8.1.2.0.3.3.1. (shared-bias-connect layer)
|
(connect.sn) |
Creates a single shared bias for each cell in list layer.
8.1.2.0.3.3.2. (tdnn-bias-connect layer nframes ntraits)
|
(connect.sn) |
Creates a shared bias for each cell in each trait of layer layer which
is composed of nframes*ntraits units.
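For example, here is a minimal sketch; the names hidden and output are
hypothetical lists of cells obtained while building the network with
build-net-nobias :
; a 5x6 time-delay layer gets one bias per trait, shared across the 5 frames
(tdnn-bias-connect hidden 5 6)
; each output cell gets its own separate bias
(bias-connect output)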
8.1.2.1.0. Introduction to SN Tools.
|
|
The Netenv library is mainly used for simulating multilayer networks.
Indeed, SN2.8 contains efficient heuristics for training large
multilayer networks using several variants of the back-propagation
algorithm. This task is made even easier by two dedicated graphical
interfaces, ``NetTool`` and ``BPTool``.
- NetTool is a graphical
network editor. Using NetTool you can build a network using all the
connection patterns defined in the library
"connect.sn" . The program then generates lisp functions for
creating and drawing the network.
- BPTool is a graphical interface for controlling the training
process. BPTool allows you to perform the most common actions provided
by the library "netenv.sn" . It provides
sensible default values for all the convergence parameters.
- A few other tools ``StatTool``, ``NormTool`` and ``ImageTool`` are
useful for browsing and preprocessing the data. These tools are briefly
discussed in the last chapter.
This manual describes all the functionalities of these tools.
8.1.2.1.1.0. Overview of NetTool.
|
|
NetTool is a graphic network editor dedicated to multilayer networks.
The NetTool window appears when function NetTool
is run. If a function NetTool-net
is defined, the corresponding network is loaded into the editor.
The NetTool window includes a menu bar and a workspace controlled by two
scrollbars.
Using NetTool you can build a network using all the connection patterns
defined in the library "connect.sn" .
NetTool deals with two kinds of objects: layers and layer-to-layer
connections.
- Layers are just a collection of units or neurons.
Both 1 dimensional and 2 dimensional layers are supported. As a
convention, the leftmost layer is always the network input layer.
Similarly, the rightmost layer is always the output layer.
- Layer-to-layer connections define the connection pattern between
the units of two layers. These connection patterns are defined using the
library "connect.sn" . Connections
always link layers from left to right. It is thus not possible to edit
recurrent networks with NetTool.
The NetTool editor generates three lisp functions:
- Function
create-net creates a network with the specified architecture.
This function calls functions defined in the libraries
"netenv.sn" and "connect.sn"
.
- Function draw-net opens a graphic
window and displays the activation state of each layer. This function is
used in conjunction with the set-disp-net mode of library Netenv.
- Function NetTool-net encodes the
structure of the network for the next runs of NetTool. You should not
call this function yourself.
These functions can be used by the interpreter or saved into a file.
8.1.2.1.1.0.0. (NetTool)
|
(NetTool.sn) |
This autoload function loads the libraries
"ogre.sn" and "NetTool.sn" once
and creates the NetTool interface.
8.1.2.1.1.1. Loading and Saving a Network Architecture in NetTool.
|
|
There are four items in menu Network .
Item "New Network" clears the
workspace and initializes the network editor. If the current network has
not been saved into a file, a confirmation dialog is displayed.
Item "Load Network" pops up a dialog
for loading a network description file. A mouse click on button
"Load Network" of the requester reads the specified network
description file. A network description file is just a file containing
three SN functions: NetTool-net ,
create-net and draw-net .
NetTool extracts the network architecture from the first function,
NetTool-net .
Item "Save or Create Network" pops up
a dialog for saving a network description file or creating a network.
-
The network description may be either saved into a file or used for
creating a network.
When item "Create network in Memory"
is selected, the network editor creates the description functions in the
interpreter memory. Then it calls function
create-net in order to actually create the network.
When item "Save network in File" is
selected, the network editor saves the description functions in the
specified file.
- Network descriptions are composed of three functions.
Function NetTool-net encodes the
structure of the network for subsequent runs of the network editor. It
is always generated.
Function create-net builds the
network using the Netenv functions. It is generated only when item
Generate create-net is checked.
Function draw-net draws the network
in connection with the Netenv library. It is generated only when item
Generate draw-net is checked.
- Function create-net usually
allocates memory for the exact number of units or connections required
by the network. The user can however ask for additional space using the
fields "Allocate more cells" and
"Allocate more connections" .
- A mouse click on button "Compile & Save"
creates a network description file.
Item "Quit" closes the network editor
window. If unsaved changes have been performed on the current network, a
confirmation dialog is displayed.
8.1.2.1.1.2. Layers in NetTool.
|
|
Layers are managed from menu Layer .
There are also four items in this menu for creating, editing,
duplicating and deleting layers:
- Item
"New" . Selecting this item or typing
n in the workspace pops up a dialog for creating a new layer
with various characteristics.
- Item "Edit" . Selecting this item
or typing e in the workspace pops up a
dialog for modifying the characteristics of the selected layer.
- Item "Duplicate" Selecting this
item or typing d in the workspace make
copies of the selected layers and of their connections.
- Item "Delete" Deletes the
selected layers.
8.1.2.1.1.2.0. Layers Creation in NetTool.
|
|
Selecting New in menu
Layer pops up the layer editing dialog.
Several parameters can be specified.
The name of the layer is controlled by an editstring. A default unique
name is provided.
The size of the layer is controlled by an editstring.
- A single
number defines a 1-dimensional layer. For instance a size of
"40" indicates a one dimension layer with 40 units.
- Two numbers define a 2-dimensional layer. For instance a size of
"16x10" defines a matrix of 160 units with 16 rows and 10
columns.
The biases of the units in the layer are controlled by four exclusive
buttons.
- Item "None" specifies that no units in this layer are connected to
the bias unit.
- Item "Full" specifies that every unit is connected to the bias unit
with a different weight.
- Item "Shared" specifies that every unit is connected to the bias
unit with the same weight. This option generates a network with shared
weights. Only the kernels sn28ite and sn28itenew can simulate such
networks.
- Item "Column shared" specifies that every unit in a column of the
layer is connected to the bias unit with the same weight. There is a
different weight per column. This option is useful for defining
time-delay networks. It generates a network with shared weights. Only
the kernels sn28ite and sn28itenew can simulate such networks.
The display mode of the layer is controlled by three exclusive buttons.
This mode is only useful for generating function draw-net.
-
Item "None" specifies that function
draw-net will not display this layer.
- Item "Hinton" specifies that
function draw-net will display this
layer as small squares using function draw-list
. This display style has been popularized by several famous papers
written by Geoffrey Hinton.
- Item "Gray Levels" specifies that
function draw-net will display this
layer as gray levels using function
gray-draw-list .
The new layer appears in the workspace when you click button
"Ok" .
8.1.2.1.1.2.1. Layers Manipulation in NetTool.
|
|
Layers are symbolized by a decorated rectangle and a label. The size and
the location of the rectangle denote the size and location of the layer
drawn by function draw-net .
Four zones can be activated:
- The layer label is a selection
zone. A first click in the layer label selects the layer. A second click
on the layer label pops up the layer editing dialog. A click
outside a layer deselects all layers. The label of selected layers is
rendered in white on a black background.
- The gray triangle on the right side of the drag area is a
connection handle. It is used for creating a new connection.
- You can resize a layer using the resize box. The form factor of the
layer is constrained by the size and the display mode of the layer.
- You can press the mouse on the gray rectangle and move the layer
around. You will notice that an invisible grid helps you to align the
layers nicely.
8.1.2.1.1.2.2. Layers Edition in NetTool.
|
|
Layers are edited with the layer editing dialog, which is also used for
creating layers.
This dialog pops up for a given layer once that layer has been selected
by clicking on its label, and either item Edit of menu Layer is
selected or the label is clicked again.
See: Creating Layers in NetTool.
8.1.2.1.1.2.3. Layers Deletion in NetTool.
|
|
Layers are deleted after they have been selected by choosing item
Delete of menu Layer . A
confirmation dialog is popped up before the deletion takes effect.
8.1.2.1.1.2.4. Layers Duplication in NetTool.
|
|
Layers are duplicated after they have been selected by choosing item
Duplicate of menu Layer or
typing key d .
The names of the new layers are derived from the names of the original
layers. The parameters of the new layers are the same as those of the
original layers. New connections are created, mirroring the connections
of the original layers.
8.1.2.1.1.3. Connections in NetTool.
|
|
Layer-to-layer connections are created, deleted and edited with the
connection edition dialog.
8.1.2.1.1.3.0. Creating a Connection.
|
|
A connection is created by clicking in the connection handle of the
upstream layer (the gray triangle on the right side of the layer layout)
and dragging the mouse to the downstream layer.
After the initial click in the connection handle, while the mouse is
moved but its button is not released yet, a dashed line is drawn between
the connection handle and the mouse position.
When the mouse button is released on the downstream layer, the
connection edition dialog appears and lets you specify a pattern of
connections among those offered by the functions of library
"connect.sn" .
Menu Full / Local proposes full and
local connection patterns.
- Item
"connect" specifies a full connection between the layers:
every unit in the upstream layer is connected to every unit in the
downstream layer.
- Items "local-1d-connect" ,
local-2d-connect and
local-toric-connect specify local connections identical to
the local connections established by the corresponding functions of
library "connect.sn" . See the
documentation this library in manual "Netenv" for more details.
Menu Shared proposes shared weight
connection patterns. Networks using these connection patterns require
kernels sn28ite or sn28itenew.
- Items
"mask-1d-connect" , "mask-2d-connect"
and "mask-toric-connect" specify
convolutional connection patterns identical to the patterns established
by the corresponding functions of library
"connect.sn" . See this library documentation in manual
"Netenv" for more details.
- Items "equal-mask-1d-connect" and
"equal-mask-2d-connect" specify subsampling connection
patterns identical to the patterns established by the corresponding
functions of library "connect.sn" .
See the documentation of this library in manual "Netenv" for more
details.
- Item "tdnn-connect" specifies a
time-delay connection pattern identical to the pattern established by
function tdnn-connect of library
"connect.sn" . See the documentation of this function in
manual "Netenv" for more details.
When you select certain patterns of connection, you must enter
additional parameters in the fields named "Size"
and "Step" . Each parameter can be a
single digit, like "4" or a pair of
digits, like "3x3" . You can find more
details about the standard connection patterns in the chapter describing
the library "connect.sn" of the manual
"Netenv".
Parameters "From" and
"To" enable the user to connect a part of the upstream layer
to the downstream layer. They default so that the whole upstream layer
is connected.
The connection is created when you click on button
"Ok" .
See: Building Connections in Netenv.
8.1.2.1.1.3.1. Editing and Deleting Connections.
|
|
Connections are symbolized by a line linking both layers. In the first
half of the line, there is a small black square which is in fact a small
button.
This button becomes white if the parameters of this connection do not
match the sizes of the layers. Some connection patterns indeed require
arithmetic relations between the sizes of the layers and the parameters
of the connection pattern.
Clicking on this button pops up the connection edition dialog.
-
You can modify the parameters or the type of the connection and validate
the modification by clicking on button "Ok"
.
- You can press button "Delete" to
remove the connection.
8.1.2.1.1.4. Conclusion to NetTool.
|
|
Both beginners and advanced programmers can take advantage of this
network editor.
- Using NetTool, a beginner to SN2.8 can easily
define complex networks.
- Advanced programmers can use the files produced by NetTool as a
template for defining even more complex network architectures, like
recurrent networks.
8.1.2.1.2.0. Overview of BPTool.
|
|
BPTool is a graphical interface for controlling the training of
multilayer networks.
The BPTool window appears when function BPTool
is run.
It allows you to perform the most common actions provided by the library
"netenv.sn" . It provides sensible default values for all the
convergence parameters.
More precisely, BPTool provides a mouse-driven interface to:
-
define a network using NetTool,
- load the patterns,
- define a training set and a test set,
- select a training algorithm,
- select the initial weights,
- select the parameters of the training algorithm,
- select the performance criterion,
- train the network,
- measure the performance on both the training set and the test set,
- plot the evolution of these performances during training,
- display the network during training or measurement.
BPTool relies on the underlying mechanisms of library Netenv. Moreover,
the various menus and requesters of BPTool are equivalent to the most
common functions of this library. You can even mix BPTool actions and
TLisp commands.
When called, the BPTool interface first proposes a file requester for
setting the performance file.
BPTool contains a menu bar with three menus.
- Menu
File lets you define a network, load the databases, load or
save the weights.
- Menu Parameters lets you select
the parameters of the training algorithm.
- Menu Settings lets you choose a
performance criterion and select display options.
BPTool contains several logical zones.
- The ``data selection
zone'' lets you select the boundaries of the training and test sets.
- The ``algorithm selection zone'' lets you choose a proper training
algorithm.
- The ``learning rates control zone'' lets you change the learning
rates used for the weight updates.
- The ``training and control zone'' lets you start and stop the
training algorithms.
- The ``Message zone'' displays messages such as the age of the
current weights and the number of allocated neurons.
8.1.2.1.2.0.0. (BPTool)
|
(BPTool.sn) |
This autoload function loads the libraries
"ogre.sn" and "BPTool.sn" once
and creates the BPTool interface.
8.1.2.1.2.1. BPTool Menus.
|
|
Menus are somewhat arranged in logical order. You can use the menu items
in order: start with menu File to load
a network and a database, then specify the parameters with menu
Parameters and request some information display using menu
Settings .
8.1.2.1.2.1.0. BPTool File Menu.
|
|
Menu File lets you define a network,
load the databases, load or save the weights.
Item "Define Network" starts the
network editor NetTool. Using NetTool you can define or load a network
and build it by checking item Create to Memory
when you save the network description. Once the network has been
created, the Network Status String displays the number of units,
connections and weights in this network.
Item "Automatic Network" computes an
optimized simple network with one hidden layer and full connections.
This computation is made as follows:
- 1 Until an overfitting
criteria is reached, a new architecture is chosen.
- 2 For this architecture, a training is carried out and is stopped
using an overfitting criteria based on the test-set. The best weight set
is kept.
- 3 Goto step 1 or save the network architecture and weights.
Item "Load Patterns" pops up a
requester for loading the example files. You must specify the names of
two files. When you press button "Load"
, both files are loaded into the variables
pattern-matrix and desired-matrix
using function load-patterns of the
Netenv library.
- The "pattern-matrix"
file must contain a 2-dimensional matrix in TLisp format. Each row of
this matrix contains the input vector for one example.
- The "desired-matrix" file must
contain a 2-dimensional matrix in TLisp format. Each row of this matrix
contains the output vector for one example.
Item "Load Weights" pops up a
requester for loading the weights from a file using function
load-net of the Netenv library.
Item "Save Weights" pops up a
requester for saving the weights into a file using function
save-net of the Netenv library.
Item "Quit" closes the BPTool window.
8.1.2.1.2.1.1. BPTool Parameters Menu.
|
|
Menu "Parameters" lets you select the
parameters of the training algorithm.
- Item
"Initial weights" pops up a requester for setting initial
weights.
- Item "Learning rates" pops up a
requester for setting initial learning rates.
- Item "Alpha & Decay" pops up a
requester for setting momentum and decay.
- Item "Mu & Gamma" pops up a
requester for setting initial second-order parameters.
- Item "Transfert Function" pops up
a requester for setting the global transfert function.
8.1.2.1.2.1.1.0. Setting Initial Weights with BPTool.
|
|
Item "Initial weights" of menu
"Parameters" pops up a requester for initializing the weights
in the network.
The weights are initialized using uniform random values in a given
interval which depends on a parameter x
and the fanin of units (i.e. the number of incoming connections to a
downstream cell).
- Exclusive button
"random[-x,x]" refers to function
forget of the Netenv library.
- Exclusive button "random[-x/fanin,+x/fanin]"
refers to function forget-inv of the
Netenv library.
- Exclusive button
"random[-x/sqrt(fanin),+x/sqrt(fanin)]" refers to function
forget-sqrt of the Netenv library.
Button "Default" sets the default
initialization which uses interval
random[-1/sqrt(fanin),+1/sqrt(fanin)].
Button "Ok" actually initializes the
weights and pops the requester down.
8.1.2.1.2.1.1.1. Setting Initial Learning Rates with BPTool.
|
|
Item "Learning rate" of menu
"Parameters" pops up a requester for setting the learning
rates in the network.
Four methods are provided which depend on a parameter
x , on the fanin (i.e. the number of incoming connections to
a downstream cell) and on the share (i.e. the number of connections
sharing the same weight).
- Exclusive button "x"
refers to function epsilon of the
Netenv library.
- Exclusive button "x/fanin" refers
to function epsi of the Netenv
library.
- Exclusive button "x/sqrt(fanin)"
refers to function epsi-sqrt of the
Netenv library.
- Exclusive button "x/sqrt(fanin*share)"
refers to function mask-epsi of the
Netenv library. This is dedicated to shared weights networks and is
available only in versions sn28ite and sn28itenew.
Button "Default" sets the default
value 0.1 / sqrt(fanin)) .
Button "Ok" actually sets the learning
rates.
8.1.2.1.2.1.1.2. Setting Momentums and Decays with BPTool.
|
|
Item "Momentum & Decay" of menu
"Parameters" lets the user define two parameters used by the
first order weight update rule.
This menu item is disabled when a second order algorithm is selected,
like Newton or Levenberg-Marquardt.
Using this menu merely sets the variables alpha
and decay which are used by the Netenv
library. Both values default to 0.
See function update-weight of the
Netenv library for more details.
8.1.2.1.2.1.1.3. Setting Second Order Parameters with BPTool.
|
|
Item "Mu & Gamma" of menu
Parameters lets the user define two parameters used by the
second order weight update rules.
This menu item is only enabled if a second order algorithm is selected,
like Newton or Levenberg-Marquardt.
Using this menu merely sets the variables gamma
and mu which are used by the Netenv
library. Both values default to 0.05.
See function update-ggradient and
hessian-scale of the Netenv library for more details.
8.1.2.1.2.1.1.4. Setting the Global Transfer Function with BPTool.
|
|
Item "Transfer function" lets the user
choose a global transfer function. It calls a dialog providing
sigmoidal, piecewise linear and binary activation functions. BPTool
handles only the (single) global activation function.
The default function is a sigmoid.
- Checking item
"Sigmoid" sets a sigmoidal activation function using the
function nlf-tanh of the Netenv
library.
- Checking item "Piecewise linear"
sets a piecewise linear activation function using function
nlf-lin of the Netenv library.
- Checking item "Binary threshold"
sets a binary activation function using function
nlf-bin of the Netenv library.
Button "Default" sets the default
sigmoid.
Button "Draw" draws the current
activation function.
8.1.2.1.2.1.2. BPTool Settings Menu.
|
|
Menu Settings lets you choose a
performance criterion and select display options.
8.1.2.1.2.1.2.0. Plotting Global Performances with BPTool.
|
|
Item "Plotting" pops up a requester
for setting up windows for plotting the mean squared error and the
network performance.
- Checking Error
Plotting sets up a window for plotting the mean square error
on the training set and on the test set (white and black circles
respectively) versus the number of training iterations. This is achieved
using Netenv library function init-error-plotting
.
- Checking Perf Plotting sets up a
window for plotting the performance on the training set (white circles)
and on the test set (black circles) versus the number of training
iterations. This is achieved using function init-perf-plotting of the
Netenv library. Performance is measured according to a criterion
selected with menu item "Classify" .
Field Number of sweeps specifies the
length of the plot as a number of passes over the training set.
Pressing button "Initialize Plotting"
actually creates the windows and recomputes the axes using the current
number of iterations and the current size of the training set.
8.1.2.1.2.1.2.1. Classification Criterion in BPTool.
|
|
Item "Classify" lets the user define
how the performance is computed.
- Checking
None disables the performance computation. See function
set-class-nil of the Netenv library.
- Checking Max states that a
pattern is well classified if the most active unit is the right unit.
See function set-class-max of the Netenv library.
- Checking Quadrant states that a
pattern is well classified if the state of each output unit has the
right sign. See Netenv library function set-class-quadrant.
- Checking Sign states that a
pattern is well classified if the state of each output unit has the
right sign with a given margin. See function set-class-sgn of the Netenv
library.
- Checking Lms states that a
pattern is well classified if the mean square error is less than a given
margin. See function set-class-lms of the Netenv library.
- Checking Hamming states that a
pattern is well classified if the states of the output units are within
a given margin of the desired values. See function set-class-hamming of
the Netenv library.
8.1.2.1.2.1.2.2. Display Options in BPTool.
|
|
Item "Display" of menu
Settings lets the user set up various displays updated during
every training iterations or every measurement iterations. This is
similar to the functions set-disp of
the Netenv library.
- Checking Text
outputs extensive information to the interpreter window whenever an
example is processed.
- Checking Error plots the
instantaneous error versus the number of training iterations whenever an
example is processed. This plot is created in a special window.
- Checking Network draws the
network whenever an example is processed. This picture is created in a
new window using the function draw-net defined by NetTool.
8.1.2.1.2.2. Training with BPTool.
|
|
8.1.2.1.2.2.0. Defining the test set and the training set with BPTool.
|
|
Four fields in the ``Data Selection Zone'' let you enter the boundaries
of the training set and the test set. These values are passed to
functions ensemble and test-set of the
Netenv library.
8.1.2.1.2.2.1. Choosing the back-propagation version with BPTool.
|
|
You must also select a training algorithm using the ``Algorithm
Selection Zone''. The algorithm selection zone offers four choices of
optimisation algorithm: Online Gradient, Online Levenberg-Marquardt,
Batch Gradient and Batch Conjugate Gradient.
Certain items are disabled when the current kernel does not support the
corresponding training algorithm.
The default training algorithm is the so-called
"Online Gradient Algorithm" (i.e. stochastic gradient) which
is faster when you train your network on extremely redundant data, like
handwritten digits.
If you are using sn28ite or sn28itenew, you can also select the
"Batch Gradient Algorithm" . This algorithm is sometimes
better for finishing the training and tuning the weights of a network.
You must progressively decrease the learning rate as learning
improves.
If you are using sn28new or sn28itenew, you can apply second order
optimization of the cost function using the stochastic
Levenberg-Marquardt algorithm. These methods can be used with both the
stochastic and batch gradient methods.
You can just specify a constant learning rate of about 0.0005 and let
the algorithm adjust the step sizes. Although slower than carefully
tuned learning rates, these algorithms put the strain on your computer
rather than on you.
The "Conjugate Gradient Method" is
useful for deep optimization of small networks on medium size databases.
8.1.2.1.2.2.2. Controlling the training with BPTool.
|
|
Finally, the ``Training control Zone'' and ``Learning rate control
zone'' let you control the training.
The editable field labelled "Test performance
every --- iterations" controls the number of pattern
presentation between two performance measurement. When a batch algorithm
is selected, this number is rounded to the next multiple of the training
set size. By default, performance measurements are done after every pass
of the training set.
Three buttons let the user start the training algorithm.
-
Button "Learn" starts the training
algorithm until you press button "Break"
. No measurements are performed during training.
- Button "Run" starts the training
algorithm until you press button "Break"
. Measurements are performed on the training set every n iterations. The
results are plotted according to the current settings and displayed in
the message string.
- Button "Trun" starts the training
algorithm until you press button "Break"
. Measurements are performed on both the training set and the test set
every n iterations. The results are plotted according to the current
settings and displayed in the message string.
On the first click button "Break"
tells BPTool to stop training at the end of the current pass of the
training set. This condition is displayed in the message string. On the
second click it stops training immediatly.
The BPTool window remains active while the training algorithm is
running. In particular, you can use the File
, Parameters and
Settings menus.
Two buttons control the learning rates.
- Button
"Divide epsilon by 2" divides all learning rates by 2.
- Button "Multiply epsilon by 2"
multiplies all learning rates by 2.
At any time you can start a measurement pass on the training set or on
the test set using the following buttons.
- Button
"Training Set Perf" performs a measurement pass on the
training set. Results are plotted according to the current settings and
displayed in the message string.
- Button "Test Set Perf" Performs a
measurement pass on the test set. Results are plotted according to the
current settings and displayed in the message string.
8.1.2.1.3. Other SN Tools.
|
|
Three tools are available for data preprocessing or postprocessing.
-
StatTool computes simple statistics on the values of a 2-dimensional
matrix.
- NormTool performs a linear normalization on a 2-dimensional matrix.
- ImageTool allows you to browse and preprocess a matrix containing
several images.
8.1.2.1.3.0. StatTool.
|
|
This tool computes simple statistics on the values of a 2-dimensional
matrix.
Item "Load" in menu
File lets you load a new matrix from a file or from a lisp
variable.
Button "Statistics" computes simple
statistics.
Button "Histogram" draws an histogram.
The histogram routine however is written in Lisp and may be
disappointingly slow.
8.1.2.1.3.0.0. (StatTool [mat])
|
(matrixtool.sn) |
This autoload function invokes the StatTool interface.
8.1.2.1.3.1. NormTool.
|
|
This tool performs a linear normalization on a 2-dimensional matrix.
There are several choices for entering the coefficients of the linear
normalization.
- Selecting Min/max
lets you enter the desired maximal and minimal values of the elements of
the matrix.
- Selecting Mean/Sdev lets you
enter the desired mean and standard deviation of the elements of the
matrix.
- Selecting AX+B lets you enter
directly the coefficients.
If Column by column is checked, these
normalization coefficients are computed independently for each column of
the matrix.
8.1.2.1.3.1.0. (NormTool)
|
(matrixtool.sn) |
This autoload function invokes the NormTool interface.
8.1.2.2.0. Introduction to Knnenv.
|
|
8.1.2.2.0.0. Overview of Knnenv.
|
|
This manual discusses a family of algorithms known as ``codebook based
algorithms'' for pattern recognition. SN2.8 offers turnkey routines for:
-
nearest neighbor algorithms (kNN),
- codebook editing algorithms (Multiedit/Condense),
- clustering algorithms (kMeans and LVQ),
- network of locally tuned units (RBF),
- topological maps (TMaps).
It also contains a set of lower level routines for manipulating
codebooks and performing the essential computations of the above
algorithms. In fact, all these algorithms are partly implemented by
primitives hardcoded in the kernel, partly implemented as lisp programs
located in file ".../sn2.8/lib/knnenv.sn"
.
There is a major prerequisite for using codebook based algorithms:
codebook based algorithms rely on a plain Euclidian topology which is
often too simple for handling raw data. They usually perform much better
on adequately preprocessed data.
Preprocessing methods include a lot of techniques:
- statistical techniques, like scaling along the principal axes as in
the Mahalanobis distance,
- spectral analysis using Fourier transforms, wavelet methods or LPC,
- image analysis techniques, like histogram equalization or Hough
transform,
- or even multilayer neural networks.
The algorithms themselves have been implemented with numerically robust
routines whose convergence is based on well understood properties.
8.1.2.2.0.1. About this Knnenv Manual.
|
|
This manual is divided into five major parts:
- Knnenv relies
mainly on the codebook data structure. Knnenv provides several
fundamental functions for creating, accessing, modifying, merging and
splitting codebooks.
- Codebooks are basically used for quantization and classification
tasks. Knnenv provides several basic operators for this purpose.
- Codebooks may be resource-consuming when the amount of data
increases. Knnenv provides three main families of functions aimed at
obtaining manageable codebooks: editing algorithms like multiedit and
condense, clustering algorithms like kMeans and discriminant algorithms
like LVQ.
- Radial Basis Functions (RBF) are implemented as a hybrid algorithm
combining a Knnenv codebook and a Netenv network.
- Topological Maps are implemented as codebooks extended with
topological information.
8.1.2.2.1.0. The Codebook Data Structure.
|
|
A codebook is a set of pairs composed of a vector and of a numeric label
which encodes the class of the corresponding vector. Codebooks serve two
purposes:
- A codebook can store examples of a pattern
recognition problem. Each element of the codebook describes a pattern
(the vector) and its class (the label).
- A codebook can store prototypes extracted by a pattern recognition
algorithm. Each element of the codebook describes a typical pattern (the
vector) of a given class (the label). Such prototypes are then used for
recognizing unknown patterns.
SN2.8 provides a number of fast primitives for (a) creating and
accessing codebooks, (b) merging and splitting codebooks, (c) returning
the closest elements of a codebook to a given pattern and (d) creating
or improving a codebook of prototypes using a codebook of examples.
8.1.2.2.1.1. Codebook Creation and Access.
|
|
Codebooks are lisp objects of class |CODEBOOK|.
The patterns of a codebook are in fact stored in the rows of a matrix
(the ``wordmatrix'') associated to the codebook. You can obtain this
matrix either with function codebook-word
or by evaluating the codebook with function eval
.
The labels are stored in the codebook itself. Labels are small positive
integer numbers. You can access these labels with function
codebook-label .
8.1.2.2.1.1.0. (load-iris)
|
(knnenv.sn) |
This function creates two codebooks containing Fisher's iris database.
Codebook iris-a contains a training set of 100 examples. Codebook iris-t
contains the total set of 150 examples (including the 100 training
examples, for historical reasons only!).
This database contains examples of three species of iris.
- Each flower is described by four numerical measurements (the pattern)
concerning the size of the flower's petals and sepals. These four
numbers are stored in each row of the wordmatrix of the codebooks.
- The species of the flower, i.e. 0 , 1 or 2 , is stored in the label
part of the codebook.
8.1.2.2.1.1.1. (new-codebook <ncode | wordmatrix> <ndim | labelvector>)
|
|
When function new-codebook is called
with numerical arguments, it returns a new codebook object able to
contain ncode patterns of dimension ndim. Both the patterns and the
labels are initialized with zeros.
? (setq a (new-codebook 30 4))
= ::CODEBOOK:30x4
? (eval a)
= ::MAT:30x4
? (bound (eval a))
= (30 4)
When function new-codebook is called
with matricial arguments, it returns an initialized codebook object.
Argument wordmatrix is a 2-dimensional fmatrix whose rows contain the
patterns. Argument labelvector is a
1-dimensional matrix containing the labels of the patterns stored in
every row of wordmatrix .
The resulting codebook just points to the patterns stored in matrix
wordmatrix . Modifying this matrix thus modifies the codebook
patterns.
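As an illustration of the matricial form, here is a hypothetical
session that rebuilds a codebook from the wordmatrix and label vector
of iris-a (both accessors are documented below); the printed results
are what one would expect, not a recorded transcript:
? (load-iris)
= ::CODEBOOK:150x4
; the new codebook shares its patterns with iris-a (100 examples of dimension 4)
? (new-codebook (codebook-word iris-a) (codebook-label iris-a))
= ::CODEBOOK:100x4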
8.1.2.2.1.1.2. (codebook-word cb [ n [ word ]] )
|
|
With one argument, this function returns the wordmatrix of codebook
cb . This matrix addresses the same elements as the
codebook. Modifying this matrix thus modifies the codebook.
With two arguments, this function returns the n
-th pattern of codebook cb as a
vector. This vector addresses the same elements as the codebook.
Modifying this vector thus modifies the n
-th pattern of the codebook.
With three arguments, this function copies vector word into the
n -th pattern of codebook cb
. It then returns the new pattern.
? (load-iris)
= ::CODEBOOK:150x4
? (codebook-word iris-a 2)
= ::MAT:4
8.1.2.2.1.1.3. (codebook-label cb [ n [ label ]] )
|
|
With one argument, this function returns a new vector containing the
labels of each example of codebook cb
. Unlike the matrices returned by function
codebook-word , modifying this vector does not modify the
codebook.
With two arguments, this function returns the label of the
n -th example of cb .
With three arguments, this function sets the label of the
n -th example of codebook cb
to the integer label . It then returns the new label.
? (load-iris)
= ::CODEBOOK:150x4
? (for (i 0 4) (print (codebook-label iris-a i)))
0
2
2
0
2
= 2
8.1.2.2.1.1.4. (codebook-labels cb)
|
|
This function returns a list of all the labels present in codebook
cb .
? (load-iris)
= ::CODEBOOK:150x4
? (codebook-labels iris-t)
= (0 1 2)
8.1.2.2.1.2. Codebook Operations.
|
|
A few operations are extremely useful for manipulating codebooks as a
whole.
8.1.2.2.1.2.0. (codebook-split cb min max )
|
|
This function returns a codebook composed of examples
min to max of codebook
cb . Note that this new codebook addresses the same patterns
as cb . Modifying a pattern in the
new codebook thus modifies a pattern in codebook
cb .
? (load-iris)
= ::CODEBOOK:150x4
? (codebook-split iris-t 25 74)
= ::CODEBOOK:50x4
8.1.2.2.1.2.1. (codebook-merge cb-1...cb-n)
|
|
This function combines the examples of codebooks
cb-1 to cb-n and returns a
new codebook containing copies of these examples.
? (load-iris)
= ::CODEBOOK:150x4
? (codebook-merge iris-a iris-t)
= ::CODEBOOK:250x4
8.1.2.2.1.2.2. (codebook-select condition cb-1 ... cb-n)
|
|
This function extracts the examples of codebooks
cb-1 to cb-n whose labels
fulfil condition condition. It returns a new codebook containing copies
of these examples.
Argument condition is a function of one argument (the label) which
returns a boolean value. An example is selected if a call to this
function with the example label as argument returns a non-nil result.
? (load-iris)
= ::CODEBOOK:150x4
;; Select all irises with label 1 in codebooks iris-a and iris-t
? (codebook-select (lambda(c) (= c 1)) iris-a iris-t)
= ::CODEBOOK:81x4
8.1.2.2.1.3. Codebook and Netenv.
|
|
If you develop an application which merges a classical network and a
codebook-based algorithm, you may wish to transfer data from a codebook
to a network or vice versa.
8.1.2.2.1.3.0. Conversions with the Netenv Format.
|
|
The codebook data structure is handy for storing or retrieving pattern
recognition data. A single data structure contains both the patterns and
the class.
The natural format used by the Netenv library is slightly different.
Patterns are stored in a first matrix (the pattern matrix) identical to
the wordmatrix of a codebook. Labels however are stored in a second
matrix (the desired matrix) whose columns correspond to each class. The
largest value in a row indicates the class of the corresponding example.
8.1.2.2.1.3.0.0. (codebook-to-patdes cb)
|
(knnenv.sn) |
This function takes a codebook cb and
returns a list of the two matrices containing the same examples under
the Netenv natural format.
8.1.2.2.1.3.0.1. (patdes-to-codebook pattern-matrix desired-matrix)
|
(knnenv.sn) |
This function takes a pattern matrix
pattern-matrix and a desired matrix
desired-matrix and returns a codebook containing the same
examples.
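For example, a hypothetical round trip between the two formats,
assuming the pattern matrix comes first in the list returned by
codebook-to-patdes , as the name suggests (results are abridged):
? (load-iris)
= ::CODEBOOK:150x4
; patdes is a list (pattern-matrix desired-matrix)
? (setq patdes (codebook-to-patdes iris-a))
? (patdes-to-codebook (car patdes) (cadr patdes))
= ::CODEBOOK:100x4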
8.1.2.2.1.3.1. Using Codebooks with Netenv Algorithms.
|
|
Besides these conversion functions, you might redefine functions
present-pattern and present-desired
of the Netenv library for handling the codebook format. These functions
indeed are always called for accessing the examples.
For instance, the following functions extract the example data of a
back-propagation network from a codebook pattern-codebook rather than
using the classical matrices pattern-matrix
and desired-matrix .
(de present-pattern(layer pnum)
  ;; copy pattern number pnum of the codebook into the input layer;
  ;; pattern-codebook and input-noise are globals of this setup
  (get_pattern (eval pattern-codebook) pnum layer)
  ;; optionally add gaussian noise to the input states
  (if (0<> input-noise)
      (gauss-state layer input-noise) ) )

(de present-desired(layer pnum)
  ;; set the cells of the desired layer to +1 for the cell matching the
  ;; codebook label and to -1 for all the other cells
  (let ((cl (codebook-label pattern-codebook pnum)))
    (each ((c desired-layer))
      (if (0= cl)
          (nval c 1)
        (nval c -1) )
      (incr cl -1) ) ) )
8.1.2.2.1.4. Codebook Files.
|
|
8.1.2.2.1.4.0. Codebook File Format.
|
|
There are two formats for codebook files.
The ascii format records the codebook data in text files. The first two
lines contain the number of examples ncode
and the size of the patterns ndim .
The remaining lines alternate between pattern lines, which contain the
pattern y as a collection of numbers separated by blanks, and label
lines, which contain the corresponding label as an integer number.
<ncode>
<ndim>
<y00> <y01> <y02> ...
<label-0>
<y10> <y11> <y12> ...
<label-1>
...
The binary format is almost identical. The number of examples <ncode>, the
size of the patterns <ndim> and the labels are stored as four byte integers
(int). The patterns are stored as vectors of single precision numbers (float).
CODEBOOK FILE:
int ncode;
int ndim;
REPEAT ncode TIMES:
float pattern[NDIM];
int labels;
8.1.2.2.1.4.1. Codebook I/O Functions.
|
|
8.1.2.2.1.4.1.0. (save-codebook cb filename)
|
|
This function saves a codebook cb into
a file named filename under the binary codebook format.
8.1.2.2.1.4.1.1. (save-ascii-codebook cb filename)
|
|
This function saves a codebook cb into
a file named filename under the ascii codebook format.
8.1.2.2.1.4.1.2. (load-codebook filename)
|
|
This function loads and returns a codebook from file named filename
using the binary codebook format. Unlike function
load-matrix , these functions never check the type of the
data files.
8.1.2.2.1.4.1.3. (load-ascii-codebook filename)
|
|
This function loads and returns a codebook from file named filename
using the ascii codebook format. Unlike function load-matrix, these
functions never check the type of the data files.
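For instance, here is a hypothetical save and reload cycle (the file
name "iris-a.cb" is arbitrary and the printed results are indicative):
? (load-iris)
= ::CODEBOOK:150x4
? (save-ascii-codebook iris-a "iris-a.cb")
? (load-ascii-codebook "iris-a.cb")
= ::CODEBOOK:100x4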
8.1.2.2.2. k-Nearest Neighbors.
|
|
In this chapter, we describe elementary functions for using codebooks
for quantization and classification tasks. The basic algorithm consists
in extracting the k closest examples of a codebook to a given pattern.
8.1.2.2.2.0. Distances in Knnenv.
|
|
The very notion of closest examples depends on the selected distance.
For efficiency reasons, we have implemented functions using the
Euclidian distance. In many cases a non Euclidian topology can be mapped
to a Euclidian topology by an adequate preprocessing of the patterns.
Consider for instance the Mahalanobis distance:
Mahalanobis(x,y) = (x-y)^T A (x-y)
where matrix A is the inverse of the
positive definite covariance matrix of the examples. By diagonalizing
matrix A , we obtain
A = P^T L P where L is a
diagonal matrix with positive terms. If we denote as
R(L) the diagonal matrix whose terms are the square roots of
the terms of matrix L , we have:
Mahalanobis(x,y) = Euclidian( R(L) P x, R(L) P y )
Actually A is often the inverse of the
covariance matrix plus a small regularizing term.
8.1.2.2.2.1. KNN for Vector Quantization.
|
|
Vector quantization consists in replacing a pattern by its closest
pattern in a codebook. This technique changes a continuous problem
(working on a continuous pattern space) into a discrete problem (working
on a finite set of codes). It has been used extensively in several
areas, such as speech processing.
8.1.2.2.2.1.0. (codebook-distances vector cbref)
|
|
This function returns a 1-dimensional matrix containing the Euclidian
distances between the vector vector
and all patterns in codebook cbref .
Argument vector must be a
1-dimensional matrix whose size matches the size of the examples in
codebook cbref .
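For instance (a hypothetical session; iris-a contains 100 examples, so
the result is a vector of 100 distances):
? (load-iris)
= ::CODEBOOK:150x4
? (codebook-distances (codebook-word iris-t 0) iris-a)
= ::MAT:100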
8.1.2.2.2.1.1. (knn vector cbref k)
|
|
This function returns a list containing the order numbers of the
k closest patterns of codebook cbref
to vector vector . Argument
vector must be a 1-dimensional matrix whose size matches the
size of the examples in codebook cbref
.
? (load-iris)
= ::CODEBOOK:150x4
? (knn (codebook-word iris-t 0) iris-a 5)
= (0 73 59 56 87)
? (all ((n result)) (codebook-label iris-a n))
= (0 0 0 0 0)
8.1.2.2.2.2. KNN for Pattern Recognition.
|
|
As suggested by the previous example, we can assign a class to a new
pattern by selecting the most frequent label among the labels of the
k closest examples of a codebook. This is easily achieved by
the following functions:
8.1.2.2.2.2.0. (codebook-dist-and-labels vector cbref)
|
(knnenv.sn) |
This function returns a 2-dimensional matrix with 2 columns. The first
column contains the Euclidian distances between the vector
vector and all patterns in codebook
cbref . The second column contains the labels of the
corresponding patterns in codebook cbref
. Argument vector must be a 1-dimensional matrix whose size matches the
size of the example in codebook cbref
.
8.1.2.2.2.2.1. (knn-class vector cbref k)
|
|
This function retrieves in codebook cbref
the k closest examples to vector
vector and returns the most frequent label. If two labels are
equally represented, this function returns -1.
? (load-iris)
= ::CODEBOOK:150x4
? (knn-class (codebook-word iris-t 0) iris-a 5)
= 0
This pattern recognition method is referred to as ``the kNN algorithm''.
8.1.2.2.2.2.2. (test-knn cbdata n cbref k)
|
(knnenv.sn) |
This function extracts the n -th
pattern of codebook cbdata , retrieves
in codebook cbref the
k closest examples to this pattern
and tests whether the most frequent label is equal to the
n -th label of codebook cbdata
. In this latter case, it returns t ;
otherwise () is returned.
? (load-iris)
= ::CODEBOOK:150x4
? (test-knn iris-t 0 iris-a 5)
= t
8.1.2.2.2.2.3. (perf-knn cbdata cbref k [flag])
|
|
This function applies the kNN algorithm to every pattern of codebook
cbdata using the prototype codebook
cbref . It then compares the result with the label stored in
codebook cbdata and returns the
percentage of correct classification.
? (load-iris)
= ::CODEBOOK:150x4
? (perf-knn iris-t iris-a 2)
= 94
When the optional argument flag is t ,
this function returns a list (mis rej)
which indicates the misclassification rate mis
and the rejection rate rej . A
rejection occurs when two labels are equally represented among the
k closest examples.
? (perf-knn iris-t iris-a 2 t)
= (1.33333 4.66667)
The actual performance of the kNN algorithm improves when the number of
references increases. Given a reference codebook size, the optimal value
for parameter k is probably related to
the amount of noise present in the data.
8.1.2.2.2.2.4. (confusion-knn cbdata cbref k)
|
(knnenv.sn) |
This function applies the kNN algorithm to every pattern of codebook
cbdata using the prototype codebook
cbref . It then compares the result with the label stored in
codebook cbdata and returns a matrix
containing the number of examples of a given class recognized as members
of a second class.
? (load-iris)
= ::CODEBOOK:150x4
? (confusion-knn iris-t iris-a 1)
= [[50.00 0.00 0.00]
[ 0.00 48.00 2.00]
[ 0.00 0.00 50.00]]
In this matrix, each row denotes the actual class and each column
denotes the class returned by the kNN algorithm.
8.1.2.2.3. Codebook Algorithms.
|
|
Several issues must be addressed for using the nearest neighbour
methods efficiently. On the one hand, the performance of these
algorithms depends crucially on the size of the codebook. On the other
hand, large codebooks make the search for the nearest neighbours slower.
This chapter presents several algorithms for computing small codebooks
with good properties using large codebooks. There are several families
of algorithms:
- Editing algorithms, like multiedit and
condense, select a limited number of examples from the large codebook.
- Clustering algorithms, like k-Means, compute a fixed number of
prototypical examples.
- Iterative algorithms, like LVQ, improve the patterns of a small
codebook for increasing the pattern recognition performance.
8.1.2.2.3.0. Codebook Editing Algorithms.
|
|
Editing algorithms select a limited number of examples from a large
codebook in order to maintain or improve the rate of correct
classifications. We must then select patterns which implement a good
piecewise linear approximation of the Bayes boundary.
Certain patterns indeed seem less informative:
- Ambiguous
patterns just introduce noise effects on the decision boundary. In
practice, this noise is cancelled by selecting a larger number of
neighbours. The multiedit algorithm builds a smaller codebook with fewer
ambiguous patterns. This codebook achieves comparable performance using
a smaller number of nearest neighbours.
- Patterns deeply embedded within clusters bring little contribution
to the classification boundaries. The condense algorithm builds a
smaller codebook by removing these patterns. This codebook achieves a
comparable performance with less computation.
Here are a few remarks for using these algorithms:
- Each
algorithm discards the patterns that the other algorithm would keep. The
multiedit algorithm keeps only the typical patterns of a class and
removes the ambiguous and noisy patterns which are closer to the decision
boundary. The condense algorithm would discard typical patterns and keep
the few limiting patterns which actually define the boundary.
- When the data are noisy, the multiedit algorithm is useful for
reducing the number k of nearest neighbours with a limited impact on
the performance.
- The condense algorithm usually reduces the size of a multiedited
codebook considerably. It however performs poorly on a noisy codebook
because it keeps a large number of ambiguous patterns.
8.1.2.2.3.0.0. (multiedit cb n [verbose])
|
(knnenv.sn) |
This function applies algorithm multiedit to codebook
cb and returns the multiedited codebook. Here is the
algorithm multiedit :
- 1 Divide the initial codebook in
n subsets.
- 2 Classify the examples of subset i
using the nearest neighbour in the next subset.
- 3 Discard all the misclassified examples.
- 4 Iterate steps 1, 2 and 3 until no editing is done.
The parameter n controls the grain of
the multiedit algorithm. It must lie between 3 and the square
root of the codebook size. Using a larger n
eliminates more examples. We suggest starting with the smallest legal
value: 3.
When the flag verbose is t , a message
is displayed before each iteration. This message indicates the current
size of the multiedited codebook.
? (load-iris)
= ::CODEBOOK:150x4
? (perf-knn iris-t iris-a 1)
= 98.6667
? (setq miris-a (multiedit iris-a 3))
= ::CODEBOOK:92x4
? (perf-knn iris-t miris-a 1)
= 96.6667
8.1.2.2.3.0.1. (condense cb [verbose])
|
(knnenv.sn) |
This function applies the condense algorithm to codebook
cb and returns the condensed codebook. Here is the condense
algorithm:
- 1 Create an empty codebook.
- 2 Classify every example of the initial codebook using the nearest
neighbour rule in the new codebook. Insert every misclassified example
into the new codebook.
- 3 Iterate step 2 until no editing is done.
When the flag verbose is t , a message
is displayed before each iteration. This message indicates the current
size of the condensed codebook.
? (setq ciris-a (condense iris-a))
= ::CODEBOOK:17x4
? (perf-knn iris-t ciris-a 1)
= 98.6667
When the data sets are noisy, you should use multiedit before condense.
Algorithm condense indeed would keep a lot of noisy patterns. The
situation is less obvious on clean data sets. For example, 17 examples
selected by the condense algorithm achieve the full performance of the
iris training set.
8.1.2.2.3.1. Codebook Clustering Algorithms.
|
|
Clustering algorithms compute a codebook whose prototypes
xi fulfil an optimality criterion. The main clustering algorithm
in SN2.8 is the k-Means algorithm, which minimizes the quantization
error:
Q = Expectation( Min (x-xi)^2 )
This quantity measures the average distance between a pattern and the
closest prototype in the reference codebook.
8.1.2.2.3.1.0. (codebook-distorsion cbdata cbref)
|
|
This function returns the average distance between a pattern of codebook
cbdata and the closest pattern in codebook
cbref . This is exactly the quantity Q
described above.
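A hypothetical use (the returned value is not reproduced here since it
depends on the random draws of the k-Means run below):
? (load-iris)
= ::CODEBOOK:150x4
? (setq kiris-a (k-means iris-a 12))
= ::CODEBOOK:12x4
? (codebook-distorsion iris-t kiris-a)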
8.1.2.2.3.1.1. (k-means cbdata k)
|
|
This function returns a codebook of k
prototypes which minimizes the quantization distortion
Q computed over codebook cbdata
. This codebook is computed using the stochastic k-Means algorithm:
- 1 Fill the prototype codebook with the k first patterns.
- 2 Take a random pattern in the example codebook and pull the
closest pattern in the prototype codebook towards the example pattern.
- 3 Repeat step 2 until the algorithm converges.
The k-Means algorithm does not depend on the labels of the example
codebook. The default labels of the k
references of the resulting codebook are those of the
k first references of cbdata
. It is thus useful to compute optimal labels using the function
codebook-assign-labels .
8.1.2.2.3.1.2. (codebook-assign-labels cbdata cbref [k])
|
(knnenv.sn) |
This function assigns labels to the examples of codebook
cbref in order to minimize the misclassification rate of the
kNN algorithm applied to the examples of codebook cbdata. This is
achieved by a voting procedure. The default value for
k is 1.
? (load-iris)
= ::CODEBOOK:150x4
? (setq kiris-a (k-means iris-a 12))
= ::CODEBOOK:12x4
? (codebook-assign-labels iris-a kiris-a 1)
= ::CODEBOOK:12x4
? (perf-knn iris-t kiris-a 1)
= 96
8.1.2.2.3.1.3. (k-means-class cbdata kpc [classes])
|
(knnenv.sn) |
There is however a much better alternative for optimizing the pattern
recognition performance. Function k-means-class
computes a codebook containing kpc
prototypes per class. These prototypes are computed by running algorithm
k-Means on codebook cbdata within each
class.
? (load-iris)
= ::CODEBOOK:150x4
? (setq kiris-a (k-means-class iris-a 2))
= ::CODEBOOK:6x4
? (perf-knn iris-t kiris-a 1)
= 96.6667
The optional argument classes restricts this algorithm to the examples
whose labels belong to a given list. If this argument is omitted, all
the examples are considered. Important note: you should always test this
astounding ``supervised k-Means'' algorithm.
8.1.2.2.3.2. Codebook Iterative Algorithms.
|
|
Iterative algorithms try to improve the patterns of a small codebook in
order to decrease the misclassification rate. The main iterative algorithms are
the Learning Vector Quantization (LVQ) algorithms introduced by Kohonen
in the middle of the eighties.
The LVQ algorithms always require a good initial codebook of prototypes
returned by the k-Means or supervised k-Means algorithms. When a large
number of clean examples are available, a few iterations of the LVQ
algorithms improve the pattern recognition performance.
8.1.2.2.3.2.0. Initializing a Codebook.
|
|
8.1.2.2.3.2.0.0. (init-lvq cbdata k)
|
(knnenv.sn) |
This function returns a good initial codebook with
k examples for the LVQ algorithms. This codebook is computed
by running the k-Means algorithm (see function
k-means ) on the example codebook
cbdata . The labels are assigned using a voting procedure
(see function codebook-assign-labels
).
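A hypothetical initialization with 12 prototypes (the printed results
are indicative):
? (load-iris)
= ::CODEBOOK:150x4
? (setq liris (init-lvq iris-a 12))
= ::CODEBOOK:12x4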
8.1.2.2.3.2.0.1. (init-lvq-class cbdata kpc)
|
(knnenv.sn) |
This function returns an even better initial codebook with
kpc examples per class by running the supervised k-Means
algorithm (see function k-means-class
above) on the example codebook cbdata
. This initial codebook is so good that LVQ algorithms seldom improve
the misclassification rate...
8.1.2.2.3.2.1. Codebook Learning Vector Quantization Algorithms.
|
|
8.1.2.2.3.2.1.0. (learn-lvq version cbdata cbref niter [ a0 [ win ]])
|
|
This single function implements the three major LVQ algorithms for
improving the prototype codebook cbref
using the example codebook cbdata .
Argument niter specifies the number of
iterations of the LVQ algorithm to perform. Argument version must be 1,
2 or 3; it selects the LVQ1, LVQ2 or LVQ3 algorithm.
Algorithm LVQ1 iterates niter times
over the examples of codebook cbdata .
For each example x , it retrieves the
closest pattern x* in codebook
cbref .
- If the prototype and example labels are
equal, the prototype pattern x* is
moved towards the example pattern x by
quantity epsi * (x-x*) .
- If the labels are different, the prototype pattern
x* is pushed away from the example pattern
x by the same amount.
For each example x , algorithm LVQ2
retrieves the two closest patterns x*
and x** in the prototype codebook.
-
If the label of prototype x* is
different from the label of example x
,
- if the label of prototype x** is
equal to the label of example x ,
- and if the projection of the example x
on the line [x*,x**] falls in a window
of size win * ||x**-x*|| located in
the middle of segment [x*,x**] ,
then the closest prototype pattern x*
is pushed away from the example pattern x
by quantity epsi * (x-x*) and the
second closest prototype pattern x**
is pulled towards the example pattern by quantity
epsi * (x-x**) .
For each example x , algorithm LVQ3
retrieves the closest pattern x* and
the closest pattern x+ with the right
label in the prototype codebook.
- If the label of the closest
prototype x* is different from the
label of example x ,
- and if the projection of the example x
on the line [x*,x+] falls in a window
of size win * ||x+-x*|| located in the
middle of segment [x*,x+] ,
- then prototype pattern x* is
pushed away from the example pattern x
by quantity epsi * (x-x*) and
prototype pattern x+ is pulled towards
the example pattern by quantity epsi * (x-x+)
.
It has been proven that algorithm LVQ3 minimizes the misclassification
error on the example codebook when the window size win decreases slowly
towards 0 .
Argument a0 indicates the initial
value of the step size. The step size epsi decreases linearly from value
a0 to 0 . The initial step
size defaults to 0.1 , which is usually a
good value for all problems.
Argument win specifies the ``window'' size for the LVQ2 and LVQ3
algorithms. The default window size 0.01
however is too small to allow significant modification of the prototype
patterns. You should either select a larger window size or use the LVQ3
algorithm for fine tuning the patterns.
? (load-iris)
= ::CODEBOOK:150x4
? (setq liris-a (init-lvq-class iris-a 2))
= ::CODEBOOK:6x4
? (perf-knn iris-t liris-a)
= 96.6667
? (learn-lvq 3 iris-a liris-a 20 0.1 0.1)
= ()
? (perf-knn iris-t liris-a)
= 97.3333
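The example above enlarges the window to 0.1 and uses LVQ3. A comparable
LVQ2 run with an explicitly larger window might look as follows (the
step size 0.1 and the window size 0.2 are illustrative values, not
recommendations):
? (learn-lvq 2 iris-a liris-a 20 0.1 0.2)
= ()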
8.1.2.2.4. Radial Basis Functions in Knnenv.
|
|
8.1.2.2.4.0. Overview of Radial Basis Functions in Knnenv.
|
|
Radial Basis Function networks have proven quite efficient for
multivariate regression and classification.
They were introduced by Broomhead and Lowe in 1988 and popularized
later by Moody.
An RBF net is made of three layers:
- an input layer,
- a hidden layer made of k
euclidean units. These units compute a non-linear function (usually a
bell-shaped function) of the distance between the input
x and the weights wk :
fk(x) = g( (x-wk)2 / s2 )
- an output layer which computes linear combinations of the outputs of
the hidden units.
8.1.2.2.4.1. Implementation of Radial Basis Functions as a Hybrid Algorithm.
|
|
This chapter presents an example of a neural network algorithm using
both codebooks and standard networks.
In order to take advantage of the capabilities of both the Netenv and
Knnenv libraries, we have designed a hybrid system. The computation of
the distances ((x-wi)**2) is
implemented by a codebook. The bell shaped function and the output units
are implemented as a multilayer network.
The glue between both libraries is implemented by redefining locally the
functions present-pattern and
present-desired .
These functions extract an example from a codebook
cbdata , compute the distances to the centers of the
euclidean units (see function codebook-distances
) and enter these distances as input to the multilayer network.
The first layer of the multilayer network multiplies the distances by
the inverse squared widths 1/s2 and
applies the bell shaped function. The second layer of the multilayer
network computes the linear outputs.
The corresponding lisp code is located in file
"sn2.8/lib/knnenv.sn" .
8.1.2.2.4.1.0. (init-rbf cbdata kpc)
|
(knnenv.sn) |
This function computes an initial codebook containing kpc prototypes per
class by running the supervised k-Means algorithm on the example set
cbdata . Function init-rbf then builds
the multilayer network using the Netenv functions and initializes the
weights and the learning rates.
In order to implement the connection between the functions of Knnenv and
the functions of Netenv, this function generates the functions
rbf-present-pattern and
rbf-present-desired and the variables
rbf-codebook and rbf-classes
.
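As an illustration, after initializing the RBF on the iris training set
the generated variables can be inspected. The displayed values below
assume 2 prototypes per class and the three iris labels 0, 1 and 2; they
are indicative, not actual output:
? (init-rbf iris-a 2)
= t
? rbf-codebook
= ::CODEBOOK:6x4
? rbf-classes
= (0 1 2)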
8.1.2.2.4.1.1. rbf-codebook
|
[VAR] (knnenv.sn) |
The global variable rbf-codebook
contains the codebook holding the weights of the first layer of the
RBF network.
It is created by the function init-rbf
for implementing the glue between the functions of Knnenv and the
functions of NetEnv. It is automatically handled by
learn-rbf and perf-rbf .
8.1.2.2.4.1.2. rbf-classes
|
[VAR] (knnenv.sn) |
The global variable rbf-classes
contains the list of labels present in the example codebook. The
multilayer network contains one output per class in this list.
It is created by the function init-rbf
for implementing the glue between the functions of Knnenv and the
functions of Netenv. It is automatically handled by
learn-rbf and perf-rbf .
8.1.2.2.4.1.3. (rbf-present-pattern layer pnum)
|
(knnenv.sn) |
Function rbf-present-pattern extracts
the pnum-th pattern of codebook cbdata
, computes the distances to the centers of the euclidean units and
enters these distances into the states of the input layer
layer .
It is created by the function init-rbf
for implementing the glue between the functions of Knnenv and the
functions of Netenv. It is automatically handled by
learn-rbf and perf-rbf .
8.1.2.2.4.1.4. (rbf-present-desired layer pnum)
|
(knnenv.sn) |
Function rbf-present-desired extracts
the pnum-th label of codebook
cbdata and stores the corresponding desired output vector
into the states of the desired layer layer
.
It is created by the function init-rbf
for implementing the glue between the functions of Knnenv and the
functions of Netenv. It is automatically handled by
learn-rbf and perf-rbf .
8.1.2.2.4.1.5. (learn-rbf cbdata niter [epssigma] [epsclass])
|
(knnenv.sn) |
This function performs niter
back-propagation epochs on the multilayer network using the examples
stored in codebook cbdata . The
optional arguments epssigma and
epsclass let the user specify the learning rates for the two
layers of the multilayer network.
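For instance, after init-rbf has been called, a run specifying both
learning rates explicitly might look as follows (the values 0.01 and
0.05 are purely illustrative):
? (learn-rbf iris-a 20 0.01 0.05)
= t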
8.1.2.2.4.1.6. (perf-rbf cbtest)
|
(knnenv.sn) |
This function tests the current RBF network against the examples of
codebook cbtest . It prints the mean
squared error and the misclassification rate.
? (load-iris)
= ::CODEBOOK:150x4
? (init-rbf iris-a 4)
= t
? (learn-rbf iris-a 20)
= t
? (perf-rbf iris-t)
------------------------------------------------------
patterns{0-149}, age=2000, error=0.0895082, performance=92
= 92
8.1.2.2.5. Topological Maps in Knnenv.
|
|
A generic Topological Maps package has been added to the Knnenv library.
The following sections provide
- the basic principles of this
implementation of TMaps,
- the functions for running Topological Map algorithms,
- a simple example, discussed in Section 4.3.
More details on the Topological Map algorithms are given by the
following texts:
- Kohonen T., Self-Organization and Associative Memory, Springer Series
in Information Sciences, vol. 8, Springer-Verlag, 1984.
- Kohonen T., The neural phonetic typewriter, IEEE Computer, March 1988.
8.1.2.2.5.0. Overview of Topological Maps in Knnenv.
|
|
Like the other algorithms of the Knnenv library, both the databases and
the Topological Maps reference points are stored as codebooks. There are
mostly three differences between the k-Means and the Topological Map
algorithms:
- The Topological Map algorithm introduces a
topology on the reference nodes. Reference nodes might be organized on a
line, on a grid, on a ring or on whatever structure pleases us.
- When a reference point is updated, all neighbouring reference
points in the structure are also updated with a similar term.
- Both the learning rate and the neighborhood size decrease during
the training procedure.
A single training function controls all these parameters. In fact this
training function takes various hook functions as arguments. These hook
functions are called for finding the neighboring references, for
scheduling the decrease of the learning rate and the neighborhood size
and for producing various displays. Several predefined hook functions
are available.
Topological Maps thus interact with all features of Knnenv. For
instance, function codebook-assign-labels
assigns classes to the references of a Topological Map. You can then use
functions knn and
perf-knn for a pattern recognition task. Although using a
Topological Map for pattern recognition is a common practice, you should
keep in mind that the Topological Map is an unsupervised algorithm. It is
usually less efficient for pattern recognition than a supervised
algorithm like k-means-class or
learn-lvq .
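A sketch of this labelling workflow, assuming a reference codebook cbref
already trained on iris-a with learn-tmap (the variable names are
illustrative and the outputs are omitted):
(codebook-assign-labels iris-a cbref 1)   ; label each reference by voting
(perf-knn iris-t cbref 1)                 ; then evaluate as a 1-NN classifier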
8.1.2.2.5.1. Programming Topological Maps in Knnenv.
|
|
Programming Topological Maps requires four steps:
- Obtain a
codebook for the data points,
- Create and initialize a codebook for the reference points,
- Define the characteristics of the map using predefined or custom
hook functions,
- Train the map using the Topological Map learning function.
8.1.2.2.5.1.0. Creating and Initializing the Codebooks for 2D Topological Maps.
|
|
The Topological Map databases are stored in a codebook. This codebook
might be created with new-codebook or
loaded from a file with load-codebook
.
8.1.2.2.5.1.0.0. (bidim-query string [xmin ymin xmax ymax])
|
(knnenv.sn) |
This function displays string string
on top of the current window. It builds a codebook by gathering the
coordinates of the mouse clicks. These coordinates are scaled to fit in
range [xmin,xmax]x[ymin,ymax] . If the
range is not specified, the coordinates are scaled between
-1 and +1 . This function
returns the newly created codebook when the user clicks in the title
string.
The Topological Maps reference points are also stored in a codebook.
Such a codebook is easily created with the function
new-codebook and might be initialized randomly. It is however
more efficient to initialize the reference points using predefined
patterns. The following functions let you initialize a codebook using
standard patterns defined in a bi-dimensional space.
These functions only set the first two coordinates of the reference
points:
8.1.2.2.5.1.0.1. (bidim-init-ring cbd nrefs)
|
(knnenv.sn) |
Returns a codebook of nrefs references located on a circle whose radius
and center are the standard deviation and mean of the points of the
training codebook cbd .
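A short sketch of acquiring points with the mouse and initializing a
ring of references (the variable names and the 10 references are
illustrative):
(new-window)
(setq cbdata (bidim-query "Enter points."))  ; click points, then click the title to finish
(setq cbring (bidim-init-ring cbdata 10))    ; 10 references on a circle fitted to the data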
8.1.2.2.5.1.0.2. (bidim-init-2d cbd n m)
|
(knnenv.sn) |
Returns a codebook of n times
m references located on a grid of n lines and
m columns whose extent is determined by the statistical
properties of the points of codebook cbd
.
8.1.2.2.5.1.1. The Topological Maps Learning Function in Knnenv.
|
|
This single function trains all kinds of topological maps. The
characteristics of the map are specified by three hook functions
described below.
8.1.2.2.5.1.1.0. (learn-tmap cbdata cbref niter topo-func epsi-func dist-func [display-func])
|
(knnenv.sn) |
This function performs niter
iterations of the Topological Map training algorithm. Data points are
accessed in codebook cbdata . The
Topological Map reference points are stored in codebook
cbref . Arguments topo-func
, epsi-func and
dist-func are hook functions which specify the
characteristics of the map.
- (topo-func n mat)
The topology is encoded by function topo-func
.
When this function is called, integer n
is the subscript of the closest reference point to the current pattern
(in the input space). Vector mat
contains one component per reference node. This function must store into
matrix mat the distances (as defined
by the map topology) between the n -th
reference node and every node of the map.
- (epsi-func coeff)
The learning rate is scheduled by function
epsi-func .
This function is called once before each iteration. Argument
coeff is a number which decreases linearly from
1 to 0 with the iteration
number. This function must return the current learning rate.
- (dist-func d coeff)
The neighborhood size is scheduled by function
dist-func .
This function is called before updating a reference node located at
distance d (as defined by the map
topology) of the closest node to the current pattern (in the input
space). Argument coeff is a number
which decreases linearly from 1 to
0 with the iteration number.
If this function returns () , the
reference point is not changed. If this function returns a number, the
learning rate is multiplied by this number before updating the reference
point.
The optional argument display-func
specifies a function called after each iteration for performing various
display tasks. This function takes three arguments: the data codebook
cbdata , the reference codebook cbref
and the iteration number.
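As a sketch, custom scheduling hooks matching the signatures above could
be written as follows (the names my-epsi-func and my-dist-func, and the
constants 0.1 and 3, are illustrative choices):
(de my-epsi-func (coeff)
  ;; learning rate decreasing linearly from 0.1 to 0
  (* 0.1 coeff) )
(de my-dist-func (d coeff)
  ;; update only the nodes within a shrinking radius of initial size 3
  (if (<= d (* 3 coeff)) 1 ()) )
These hooks can be passed directly to learn-tmap together with one of
the predefined topology hooks described below.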
8.1.2.2.5.1.2. Topological Maps Predefined Hook Functions in Knnenv.
|
|
Standard hook functions are provided. The following calls generate a
standard hook function for various topologies, various scheduling
strategies and various simple displays.
8.1.2.2.5.1.2.0. Topology Hooks in Knnenv.
|
|
8.1.2.2.5.1.2.0.0. (tmap-topo-1d n)
|
(knnenv.sn) |
Returns a topology hook function topo-func
which defines a mono-dimensional topological map. Argument
n indicates the number of nodes in the map. Such topological
maps are better initialized using function
bidim-init-ring .
8.1.2.2.5.1.2.0.1. (tmap-topo-ring n)
|
(knnenv.sn) |
Returns a topology hook function topo-func
which defines a mono-dimensional ring-shaped topological map. Argument
n indicates the number of nodes in the map. Such topological
maps are better initialized using function
bidim-init-ring .
8.1.2.2.5.1.2.0.2. (tmap-topo-2d n m)
|
(knnenv.sn) |
Returns a topology hook function topo-func
which defines a bi-dimensional topological map organized as a grid with
n lines and m columns. Such
topological maps are better initialized using function
bidim-init-2d .
8.1.2.2.5.1.2.0.3. (tmap-topo-3d n m p)
|
(knnenv.sn) |
Returns a topology hook function topo-func
which defines a tri-dimensional topological map organised as a grid with
n planes, m lines and
p columns.
8.1.2.2.5.1.2.0.4. (tmap-topo-kd n1...nk)
|
(knnenv.sn) |
Returns a topology hook function topo-func
which defines a k-dimensional topological map organised as a grid whose
sizes are given by n1 to
nk .
8.1.2.2.5.1.2.1. Topological Maps Learning Rate in Knnenv.
|
|
8.1.2.2.5.1.2.1.0. (tmap-epsi-const ieps)
|
(knnenv.sn) |
Returns a learning rate scheduling function
epsi-func which sets up a constant learning rate whose value
is ieps . When the learning rate is
equal to 1 , the algorithm exactly
moves the closest reference point over the current data point. A value
of 0.1 is often a good choice for
ieps .
8.1.2.2.5.1.2.1.1. (tmap-epsi-lin ieps)
|
(knnenv.sn) |
Returns a learning rate scheduling function
epsi-func which sets up a linearly decreasing learning rate
whose initial value is ieps .
When the learning rate is equal to 1 ,
the algorithm exactly moves the closest reference point over the current
data point. A value of 0.3 is often a
good choice for ieps .
8.1.2.2.5.1.2.2. Neighborhood Size.
|
|
8.1.2.2.5.1.2.2.0. (tmap-dst-const idist)
|
|
Returns a neighborhood size scheduling function
dist-func which selects a constant neighborhood size. This
scheduling function returns 1 for all
reference nodes whose distance to the closest node is smaller than
idist .
8.1.2.2.5.1.2.2.1. (tmap-dst-lin idist)
|
|
Returns a neighborhood size scheduling function
dist-func which selects a linearly decreasing neighborhood
size. This scheduling function returns 1
for all reference nodes whose distance to the closest node is smaller
than idist * coeff .
8.1.2.2.5.1.2.3. Topological Maps Display Hook in Knnenv.
|
|
Several simple display functions are provided. These functions display a
projection of both the data points and the reference points onto the
first two axes:
8.1.2.2.5.1.2.3.0. (tmap-display-bidim-1d [xmin ymin xmax ymax])
|
(knnenv.sn) |
Returns a display hook function display-func
for displaying in the current window a mono-dimensional Topological Map:
Reference points are linked by a line. Scaling is computed to
accommodate points whose first two coordinates fit in range
[xmin,xmax]x[ymin,ymax] . The default range is
[-1,+1]x[-1,+1] .
8.1.2.2.5.1.2.3.1. (tmap-display-bidim-ring [xmin ymin xmax ymax])
|
(knnenv.sn) |
Returns a display hook function display-func
for displaying in the current window a mono-dimensional ring shaped
Topological Map. Reference points are linked by a closed line. Scaling
is computed to accommodate points whose first two coordinates fit in
range [xmin,xmax]x[ymin,ymax] . The
default range is [-1,+1]x[-1,+1] .
8.1.2.2.5.1.2.3.2. (tmap-display-bidim-2d n m [xmin ymin xmax ymax])
|
|
Returns a display hook function display-func
for displaying in the current window a bi-dimensional Topological Map.
The reference points are linked by a grid of n
lines and m columns. Scaling is computed to accommodate points whose
first two coordinates fit in range
[xmin,xmax]x[ymin,ymax] . The default range is
[-1,+1]x[-1,+1] .
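Putting the predefined hooks together, a complete bi-dimensional map
training run could look as follows (the 5x5 grid, the 30 iterations and
the hook parameters are illustrative; the data could equally come from
new-codebook or load-codebook):
(new-window)
(setq cbdata (bidim-query "Enter data points."))
(setq cbref  (bidim-init-2d cbdata 5 5))
(learn-tmap cbdata cbref 30
            (tmap-topo-2d 5 5)
            (tmap-epsi-lin 0.3)
            (tmap-dst-lin 3)
            (tmap-display-bidim-2d 5 5) )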
8.1.2.2.5.1.3. How to Write Efficient Topology Hook Functions in Knnenv?
|
|
You must define a new topology hook function whenever you want to run a
Topological Map with a new topology. There is clearly a right way and a
wrong way to program such hook functions.
- The wrong way
consists in computing the distances whenever the hook function is
called. Such a hook function might dominate the running time.
- The right way takes advantage of the static nature of the Topological
Map structure. It consists in computing all the distances at once. The
hook function just copies the distances of interest into the destination
matrix.
Assume that we use a Topological Map organized as a grid of three lines
and four columns. The distance between two nodes of the grid will be the
number of vertical or horizontal segments between these nodes.
N0 N1 N2 N3
N4 N5 N6 N7
N8 N9 N10 N11
The topology hook function topo-func
must compute a matrix whose elements are the distances from a given node
to every other node. For instance, the distance matrices for nodes N5
and N8 are:
N5:          N8:
2 1 2 3      2 3 4 5
1 0 1 2      1 2 3 4
2 1 2 3      0 1 2 3
The function tmap-topo-2d first
computes a matrix with six lines and eight columns which contains the
distances to the central element of the matrix, located at position (3,4):
5 4 3 2 3 4 5 6
4 3 2 1 2 3 4 5
3 2 1 0 1 2 3 4
4 3 2 1 2 3 4 5
5 4 3 2 3 4 5 6
6 5 4 3 4 5 6 7
It then returns a function which copies a 3x4 submatrix into the
destination matrix. This submatrix always includes element (3,4) of the
big matrix at the same location as the closest node (left: closest node
N5, right: closest node N8):
5 4 3 2 3 4 5 6 5 4 3 2:3:4:5 6
4 3 2:1:2:3 4 5 4 3 2 1:2:3:4 5
3 2 1:0:1:2 3 4 3 2 1 0:1:2:3 4
4 3 2:1:2:3 4 5 4 3 2 1 2 3 4 5
5 4 3 2 3 4 5 6 5 4 3 2 3 4 5 6
6 5 4 3 4 5 6 7 6 5 4 3 4 5 6 7
The hook function thus never computes a distance but copies the right
section of the big matrix into the destination matrix. This copy is
achieved by an efficient C routine. The user is encouraged to design
custom topologies using the same efficient and flexible method.
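The following sketch applies this method to a ring of n nodes. It
assumes the usual TLisp matrix conventions, namely that (matrix n n)
creates a numeric matrix and that (m i j) and (m i j val) read and write
its elements; the globals ring-n and ring-dists and the hook name
my-topo-ring are illustrative.
;; precompute once the ring distance between every pair of nodes
(setq ring-n 12)
(setq ring-dists (matrix ring-n ring-n))
(for (i 0 (- ring-n 1))
  (for (j 0 (- ring-n 1))
    (let ((d (abs (- i j))))
      (ring-dists i j (min d (- ring-n d))) ) ) )
;; the hook itself never computes a distance:
;; it merely copies the precomputed row of the closest node into mat
(de my-topo-ring (closest mat)
  (for (j 0 (- ring-n 1))
    (mat j (ring-dists closest j)) ) )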
8.1.2.2.5.2. A Complete Example of Topological Map in Knnenv
|
|
The following example is provided on-line:
8.1.2.2.5.2.0. (tsp-demo)
|
(knnenv.sn) |
This function implements a ring shaped Topological Map for solving the
Travelling Salesman Problem. It creates a window in which you can
acquire cities with mouse clicks. The learning process begins and stops
automatically. Here is the text of this program:
(de tsp-demo(&optional niter (epsi 0.33))
;; Create window,
;; Acquire cities,
;; Initialize the topological map
(let ((window ()))
(new-window)
(let* ((cbdata (bidim-query "Enter cities."))
(ncities (car (bound (eval cbdata))))
(nrefs (2* ncities))
(cbref (bidim-init-ring cbdata nrefs)) )
;; Compute the solution
(learn-tmap cbdata cbref (or niter ncities)
(tmap-topo-ring nrefs)
(tmap-epsi-const epsi)
(tmap-dst-lin (2/ ncities))
(tmap-display-bidim-ring) )
;; Move the nodes on the closest city
;; Note that epsilon=1!
(learn-tmap cbdata cbref 1
(tmap-topo-ring nrefs)
(tmap-epsi-const 1)
(tmap-dst-const 0)
(tmap-display-bidim-ring) ) )
window ) )
If you look at the call to function learn-tmap, you see that:
-
Data is acquired using bidim-query ,
- The map is initialized using bidim-init-ring
,
- The topology hook is returned by
tmap-topo-ring ,
- The learning rate hook is returned by
tmap-epsi-const ,
- The neighborhood hook is returned by
tmap-dst-lin ,
- The display hook is returned by
tmap-display-bidim-ring .
This call thus provides a complete example of how to program topological maps.
8.1.2.2.6. Programming with Knnenv: Examples.
|
|
Examples of Knnenv programs are located in
"sn2.8/examples/Knnenv/examples.sn" . Please copy this file
into your directory. You can then call the function go described below.
(de go ()
(printf "*********LOADING IRIS DATA SETS*********\n")
;load learning set and test set
(load-iris)
;or
;(setq iris-a (load-codebook "../examples/Knnenv/iris_train"))
;(setq iris-t (load-codebook "../examples/Knnenv/iris_test"))
(printf "\n\n*********TESTING 1NN METHOD*********\n")
;nearest-neighbor performance
(printf "performance=%f\n" (perf-knn iris-t iris-a 1))
(printf "Printing confusion matrix\n")
(matrix-print (confusion-knn iris-t iris-a 1))
(printf "\n\n*********TESTING SOME PREPROCESSING*********\n")
(printf "Multi-editing learning set\n")
(setq multi-edited (multiedit iris-a 3))
(printf "1NN performance on multi-edited set =%f\n"
(perf-knn iris-t multi-edited 1) )
(printf "Condensing the multi-edited set\n")
(setq condensed (condense multi-edited))
(printf "1NN performance on condensed set = %f\n"
(perf-knn iris-t condensed 1) )
(printf "Condensing without multi-edit:\n")
(setq condensed (condense iris-a))
(printf "1NN performance after condensing learning set = %f"
(perf-knn iris-t condensed 1) )
(printf " \n\n********* UNSUPERVISED KMEANS*********\n")
(printf "Learning process with 9 references\n")
(setq kiris-a (k-means iris-a 9))
(printf "Assign class to references\n")
(codebook-assign-labels iris-a kiris-a)
(printf "1NN performance = %f\n" (perf-knn iris-t kiris-a 1))
(printf "Performances of a few test examples at random\n")
(for (i 0 9)
(let*((random (int (rand 0 (bound (codebook-word iris-t) 1))) )
(answer (test-knn iris-t random kiris-a 1)) )
(printf "testing example %3d of test set = " random)
(if (= answer t) (printf "well-classified\n")
(printf "misclassified\n") ) ) )
(printf "Have a look to the labels of the 9 references\n")
(matrix-print (codebook-label kiris-a))
(printf "\n Note that there are:\n")
(printf "\t2 references dedicated to class 0\n")
(printf "\t3 references dedicated to class 1\n")
(printf "\t4 references dedicated to class 2\n")
(printf " \n\n********* SUPERVISED KMEANS*********\n")
(printf "Learning with 3 references per class\n")
(setq kiris-a (k-means-class iris-a 3))
(printf "1NN performance = %f" (perf-knn iris-t kiris-a 1))
(printf "\n\n********* LVQ1*********\n")
(printf "Learning with 3 references per class\n")
(printf "on 20 iteratiobns\n")
(setq kiris-a (init-lvq-class iris-a 3))
(learn-lvq 1 iris-a kiris-a 20)
(printf "LVQ1 performance = %f" (perf-knn iris-t kiris-a 1))
(printf "\n\n********* LVQ2*********\n")
(printf "Learning with 3 references per class\n")
(setq kiris-a (init-lvq-class iris-a 3))
(learn-lvq 2 iris-a kiris-a 20)
(printf "LVQ2 performance = %f"(perf-knn iris-t kiris-a 1))
(printf "\n\n********* LVQ3*********\n")
(printf "Learning with 3 references per class\n")
(setq kiris-a (init-lvq-class iris-a 3))
(learn-lvq 3 iris-a kiris-a 20)
(printf "LVQ3 performance = %f" (perf-knn iris-t kiris-a 1))
;saving any of the above codebooks
;for example kiris-a is saved in file results.cb
(save-codebook kiris-a "results")
(printf "\n\n*********RADIAL BASIS FUNCTIONS*********\n")
(printf "Learning with 3 nodes per class\n")
(init-rbf iris-a 2)
(learn-rbf iris-a 20)
(printf "Performances of RBF algorithm\n")
(perf-rbf iris-t)
;saving the RBF weights
;saving the codebook containing the nodes
(save-codebook rbf-codebook "nodes")
;saving the nodes widths and linear weights
(save-net "sigmaw")
;for loading previously saved rbf weights
(init-rbf iris-a 2)
(setq rbf-codebook (load-codebook "nodes"))
(load-net "sigmaw")
(printf "\n Remove all files built by this program\n")
(sys "rm sigmaw.wei")
(sys "rm nodes.cb")
(sys "rm results.cb")
)