8.1.2. SN2.8.


This package provides the functions of Neuristique's SN28 neural network simulator. Paper documentation is available from Orlogia mailto:info@orlogia.com or http://www.orlogia.com .



8.1.2.0. Netenv.




8.1.2.0.0. Introduction to Netenv.




8.1.2.0.0.0. Overview of Netenv.


"Netenv" is the main component of the first generation libraries:

Curious readers might find Netenv based code for the following algorithms:



8.1.2.0.0.1. About this Netenv Manual.


This manual is a reference manual. It lists all the functions of Netenv and describes their purpose. It also gives equivalent mathematical formulas when such a description is adequate.



8.1.2.0.1. Low Level Functions in Netenv.




8.1.2.0.1.0. Introduction to Low Level Functions in Netenv.


In this section, we describe the primitive C functions of SN2.8 for the simulation of neural networks. These functions are usually very simple and thoroughly optimized.

Most of these functions, however, are seldom used directly: users instead call high level functions which themselves rely on the low level functions. Primitive functions are the basic elements of such higher level functions. You may use them if you want to define a new algorithm and thus write a new library.

Understanding these basic computation functions of the simulator, however, is a very efficient way to use the full capacity of SN2.8.



8.1.2.0.1.1. Creating and Scanning Network Architectures in Netenv.


The network is the main data structure of the simulator. It is an arbitrary oriented graph: a number of units or neurons are linked by oriented connections or synapses. The biological analogy stops here. A unit is identified by its number. A layer is represented as a list of unit numbers.

The special unit 0 is often referred to as the bias unit. It is always allocated and its initial state is set to 1 . Establishing connections from this unit is a simple way to introduce an adaptable bias parameter.



8.1.2.0.1.1.0. Allocation of Neurons and Connections in Netenv.




8.1.2.0.1.1.0.0. (alloc-net units links [weights])


Allocates memory space for networks with at most units units and at most links connections. The third argument weights is only allowed by kernels sn28ite and sn28itenew. It defines the maximal number of independent weights. If this third argument is not given, the number of weights allocated is equal to the number of connections.

This function returns the number of bytes allocated.
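For instance, a typical allocation might look as follows (the sizes are merely illustrative; the returned byte count depends on the kernel):

; room for up to 200 units and 2000 connections
(alloc-net 200 2000)
; or, with sn28ite or sn28itenew, a third argument limits
; the number of independent (possibly shared) weights
(alloc-net 200 2000 500)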



8.1.2.0.1.1.0.1. (clear-net)


Destroys all units and connections created by previous calls to functions newneurons or connect . There is only one unit remaining after a call to clear-net : the bias unit (unit number 0 ), whose state is set to 1 .



8.1.2.0.1.1.0.2. nmax
[VAR]


Contains the maximum number of units, as set by function alloc-net . This number includes the bias unit. The largest available unit number thus is nmax-1 .



8.1.2.0.1.1.0.3. smax
[VAR]


Contains the maximum number of connections, as set by function alloc-net .



8.1.2.0.1.1.0.4. wmax
[VAR]


With kernels sn28ite and sn28itenew only, this variable contains the maximum number of independent parameters in a network, as set by function alloc-net .



8.1.2.0.1.1.0.5. nnum
[VAR]


This variable contains the number of units currently used.



8.1.2.0.1.1.0.6. snum
[VAR]


This variable contains the number of connections currently used.



8.1.2.0.1.1.0.7. wnum
[VAR]


With kernels sn28ite and sn28itenew only, this variable contains the number of shared weights currently used.



8.1.2.0.1.1.0.8. (newneurons n)


Creates n new units in the network memory and returns a list of numbers. A sufficient storage space should have been previously allocated using alloc-net . This function never connects the bias unit to the new units. This must be done explicitly when required.
; create a set of 10 units and store the list of their indices in the variable my-layer
? (setq my-layer (newneurons 10))
= (5 6 7 8 9 10 11 12 13 14)

See: (alloc-net units links [ weights ])




8.1.2.0.1.1.1. Connections in Netenv.




8.1.2.0.1.1.1.0. (connect presyn postsyn [val [eps]])


Creates a new connection from unit presyn to unit postsyn . Arguments presyn or postsyn may also be lists of units. In this case, no optional parameters are allowed and all units of presyn are connected to all units of postsyn .

The optional argument val specifies an (initial) weight value for this connection. With kernels sn28new and sn28itenew, the optional argument eps specifies the learning rate of the connection.
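For example (a sketch: my-layer is assumed to be a list of units returned by newneurons, and units 3 and 7 to be already allocated):

; give every unit of my-layer an adaptable bias by connecting unit 0 to it
(connect 0 my-layer)
; a single connection from unit 3 to unit 7 with an initial weight of 0.5
(connect 3 7 0.5)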



8.1.2.0.1.1.1.1. (dup-connection m1 m2 n1 n2 [val [eps]])


With kernels sn28ite and sn28itenew only, this function creates a new connection from unit n1 to unit n2 which shares the same weight as the connection from m1 to m2 . This function is used for creating networks with equality constraints among the weights. Like function connect , arguments m1 , m2 , n1 and n2 may be lists of cell indices.

Like function connect , function dup-connection handles the optional arguments val for setting the weight value and eps for setting the learning rate.
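For example, with sn28ite or sn28itenew (units 1 to 4 are assumed to be allocated):

; create a reference connection from unit 1 to unit 3 ...
(connect 1 3 0.5)
; ... then a connection from unit 2 to unit 4 sharing the same weight
(dup-connection 1 3 2 4)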



8.1.2.0.1.1.1.2. (amont n)
(netenv.sn)


This function returns the list of all units that have a connection to unit n (a list of the pre-synaptic cells of cell n ). The order of the units in the returned list is the order of creation of the corresponding connections.



8.1.2.0.1.1.1.3. (pre-syn n)


This function is a synonym for amont .



8.1.2.0.1.1.1.4. (aval n)
(netenv.sn)


This function returns the list of all units that receive a connection from unit n (a list of the post-synaptic cells of cell n ). The order of the units in the returned list is the order of creation of the corresponding connections.



8.1.2.0.1.1.1.5. (post-syn n)


This function is a synonym for aval .



8.1.2.0.1.1.1.6. (nfanin n)


Returns the number of units connected to unit n . The fan-in of a unit is not stored in a predefined field; function nfanin recomputes it each time.



8.1.2.0.1.1.1.7. (cut-connection from to)


Function cut-connection removes the connection between cells from and to from the internal lists used for performing the network computations.

Cutting a connection does not free the corresponding weight slot. The weight value however is cleared when you cut the last connection sharing this weight. This property ensures that the weight vector generated by BPtool after pruning a network can be used by the regular C code generated by Nettool.



8.1.2.0.1.2. Accessing Internal Variables in Netenv.


When a network is created, space is allocated for recording numerical values on the units and connections of the network graph.

Internal variables associated with the units are called ``nfields'', an acronym for ``neuron fields''. Internal variables associated with the connections are called ``sfields'', an acronym for ``synapse fields''.

Fields are highly specialized: in order to increase speed, many computational functions work on implicit fields. In addition, not all fields are available on all versions of SN2.8. There is indeed no reason to slow down a regular network by updating the fields required by a shared weights network.



8.1.2.0.1.2.0. Accessing the Unit Fields ``nfields''.


Here is a list of the fields allocated for each unit, as well as their availability on the major kernels of SN2.8. Each nfield is identified by a number. Symbolic names are defined by the "netenv.sn" library.

Fields marked with a star (*) do not exist in sn28new and sn28itenew. Accessing these fields however is simulated by writing to or reading from the equivalent sfields s-epsilon and s-sigma of the incoming connections.



8.1.2.0.1.2.0.0. Conversion between Fields and Numbers or Lists of Numbers in Netenv.




8.1.2.0.1.2.0.0.0. (nfield n f [v])


Set/get field number f of unit n . If the third optional argument is given, this function sets the field f of unit n to value v . This function always returns the current value of field f .

A group of functions have been defined for directly accessing most fields.
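For example (a sketch; unit 3 is assumed to exist):

; read field n-val of unit 3, then set it to 0.5
; (n-val is one of the symbolic field names defined in "netenv.sn")
(nfield 3 n-val)
(nfield 3 n-val 0.5)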



8.1.2.0.1.2.0.0.1. (nval n [v])


Set/get the state of unit n (in nfield n-val ).



8.1.2.0.1.2.0.0.2. (ngrad n [v])


Set/get the activation gradient attached to unit n (in nfield n-grad ).



8.1.2.0.1.2.0.0.3. (nsum n [v])


Set/get the activation of unit n (in nfield n-sum ).



8.1.2.0.1.2.0.0.4. (nbacksum n [v])


Set/get the output gradient of unit n (in nfield n-backsum ).



8.1.2.0.1.2.0.0.5. (nepsilon n [v])


Set/get the learning rate of unit n (in nfield n-epsilon ).

With kernels sn28new and sn28itenew, this function sets the learning rate of all the connections to cell n and returns the mean of those learning rates.



8.1.2.0.1.2.0.0.6. (nggrad n [v])


Sets/gets the instantaneous second derivative of the cost function with respect to the activation of unit n (in nfield n-ggrad ). This is not the real second derivative, but the contribution of the current pattern to the overall second derivative.

This function is defined by kernels sn28new and sn28itenew only.



8.1.2.0.1.2.0.0.7. (nsqbacksum n [v])


Sets/gets the instantaneous second derivative of the cost function with respect to the output of unit n (in nfield n-sqbacksum ). This is not the real second derivative, but the contribution of the current pattern to the overall second derivative.

This function is defined by kernels sn28new and sn28itenew only.



8.1.2.0.1.2.0.0.8. (nsigma n)


Returns the moving average of the second derivative of the cost function with respect to the connections arriving on unit n . Since field n-sigma does not exist, this function computes the average of the sfields s-sigma of the incoming connections of unit n .

This function is defined by kernels sn28new and sn28itenew only.



8.1.2.0.1.2.0.0.9. (nspare1 n [v]) (nspare2 n [v]) (nspare3 n [v])


Set/get one of the three spare fields attached to unit n . These fields are not used by the simulator routines and can be used freely by the user.



8.1.2.0.1.2.1. Accessing the Connection Fields: ``sfields''.


Similarly, fields associated to each connection are named sfields . Here is a list of the fields in the connections. Like nfields , the various sfields are identified by a number.

Kernels sn28ite and sn28itenew implement shared weight networks by setting up connections which physically share certain sfields: they point to the same piece of memory. A star (*) in the above table indicates those sfields which are shared when you set up a shared weight connection with function dup-connection .



8.1.2.0.1.2.1.0. (sval n1 n2 [v])


Set/get the value of the connection going from unit n1 to unit n2 (in sfield s-val ). This sfield usually contains the weight of the connection. This function returns the empty list if the connection does not exist. It produces an error if you attempt to set the value of a non-existing connection.
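For example (a sketch; the connection from unit 2 to unit 5 is assumed to have been created with connect):

; read the weight of the connection from unit 2 to unit 5
(sval 2 5)
; overwrite it with 0.1
(sval 2 5 0.1)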



8.1.2.0.1.2.1.1. (sdelta n1 n2 [v])


Set/get the weight change of the connection going from unit n1 to unit n2 (in sfield s-delta ). This function returns the empty list if the connection does not exist. It produces an error if you attempt to set the value of a non-existing connection.



8.1.2.0.1.2.1.2. (sepsilon n1 n2 [v])


Set/get the learning rate of the connection going from unit n1 to unit n2 (in sfield s-epsilon ). This function returns the empty list if the connection does not exist. It produces an error if you attempt to set the value of a non-existing connection.

This function is defined by kernels sn28new and sn28itenew only.



8.1.2.0.1.2.1.3. (sacc n1 n2 [v])


Set/get the accumulated gradient for the weight associated to the connection from unit n1 to unit n2 . This function returns the empty list if the connection does not exist. It produces an error if you attempt to set the value of a non-existing connection.

This function is defined by kernels sn28ite and sn28itenew only.



8.1.2.0.1.2.1.4. (ssigma n1 n2 [v])


Sets/gets the partial time average of the second derivative of the cost function for the connection between unit n1 and n2 (in sfield s-sigma ). This function returns the empty list if the connection does not exist. It produces an error if you attempt to set the value of a non-existing connection.

This function is defined by kernels sn28new and sn28itenew only.



8.1.2.0.1.2.1.5. (shess n1 n2 [v])


Set/get the estimate of the Hessian's diagonal term associated to the weight of the connection between units n1 and n2 (in sfield s-hess ). This function returns the empty list if the connection does not exist. It produces an error if you attempt to set the value of a non-existing connection.

This function is defined by kernels sn28new and sn28itenew only.



8.1.2.0.1.2.1.6. (mean-sqr-weight n)


Returns the mean squared value of the weights (in sfields s-val ) of the connections arriving to cell n .



8.1.2.0.1.2.1.7. (mean-sqr-delta n)


Returns the mean square value of the last weight change (in sfields s-delta ) of the connections arriving to cell n .



8.1.2.0.1.2.1.8. (scounter n1 n2)


Get the number of connections which share the same sfields as the connection between units n1 and n2 . This function returns 0 if the connection does not exist.

This function is defined by kernels sn28ite and sn28itenew only.



8.1.2.0.1.2.1.9. (sindex n1 n2 [v])


This function sets/gets the identification index of the weight associated to the connection from unit n1 to unit n2 . Connections with the same index actually share their sfields. Function sindex may be used either to check or to modify the way connections are shared. It returns () if the connection does not exist.

This function is defined by kernels sn28ite and sn28itenew only.



8.1.2.0.1.3. Fields Manipulation in Netenv.


Two kinds of functions operate on nfields and sfields. The first kind are highly optimized, algorithm-dependent functions. The second kind implement several general purpose computations. We describe the latter functions here.



8.1.2.0.1.3.0. Unit Fields Manipulation in Netenv.




8.1.2.0.1.3.0.0. (set-nfield l f v)


Sets the fields f of all units in layer l to the value v .
(set-nfield input n-val 0)
; clears all the states of layer input.



8.1.2.0.1.3.0.1. (gauss-state l sdev)


Adds a zero mean gaussian noise of standard deviation sdev to the states of the cells specified by list l .



8.1.2.0.1.3.0.2. (flip-state l p)


Changes the state sign with probability p for all units in list l .



8.1.2.0.1.3.0.3. (find-winner l) (find-loser l)


These functions return the unit of list l whose field n-val has the largest (respectively smallest) value.



8.1.2.0.1.3.0.4. (copy-nfield l1 f1 [l2] f2)


Copies field f2 of units in list l2 into field f1 of units in list l1 . The fields are identified by integers or by their symbolic names as described with function nfield .

If argument l2 is not defined, list l1 is assumed. Execution is slightly faster in this case.

See: (nfield n f [ v ])




8.1.2.0.1.3.0.5. (op-nfield l f1 f2 g2 f3 g3)


Performs a linear combination of fields f2 and f3 of units in l and puts the result in field f1 of the same unit. The operation performed is defined by the real numbers g2 and g3 in the following way:
f1 = g2*f2 + g3*f3



8.1.2.0.1.3.0.6. (mul-nfield l f1 f2 f3)


Computes the product of fields f2 and f3 of units in l and puts the result in field f1 of the same unit. The operation performed is defined in the following way:
f1 = f2*f3

Function mul-nfield returns the sum of all the computed terms i.e. the dot product of fields f2 and f3 .
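For instance, the squared norm of the states of a layer can be obtained as the dot product of field n-val with itself (a sketch; the symbolic name n-spare1 for the first spare field is assumed here):

; store nval(i)*nval(i) into the spare field of each unit of my-layer
; and return the sum of these products, i.e. the squared norm of the states
(mul-nfield my-layer n-spare1 n-val n-val)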



8.1.2.0.1.3.0.7. (invert-nfield l f1 f2 g)


Divides the scalar g by fields f2 of units in l and puts the result in field f1 of the same unit. The operation performed is defined in the following way :
f1 = g / f2

Function invert-nfield returns the sum of all the computed terms.



8.1.2.0.1.3.1. Connection Fields Manipulation in Netenv.




8.1.2.0.1.3.1.0. (set-sfield l sf v)


Sets field sf of all incoming connections of layer l to value v .



8.1.2.0.1.3.1.1. (copy-sfield l1 sf1 [l2] sf2)


Copies field sf2 of incoming connections of layer l2 into field sf1 of incoming connections of layer l1 . If argument l2 is not provided, l1 is assumed.



8.1.2.0.1.3.1.2. (op-sfield l sf1 sf2 g2 sf3 g3)


Performs a linear combination of fields sf2 and sf3 of all incoming connections of layer l and puts the result into sfield sf1 . The operation is defined by the reals g2 , g3 :
sf1 = g2 * sf2 + g3 * sf3



8.1.2.0.1.3.1.3. (mul-sfield l sf1 sf2 sf3)


Computes the product of fields sf2 and sf3 of all incoming connection of layer l . The result is stored into field sf1 .



8.1.2.0.1.3.1.4. (invert-sfield l sf1 sf2 g)


Divides g by the value of field sf2 of all incoming connection of layer l . The quotients are stored into field sf1 .



8.1.2.0.1.3.1.5. (set-swn n sf f)


Takes field f from all units connected to unit n and transfers it into field sf of the corresponding connections.



8.1.2.0.1.3.1.6. (acc-swn n sf f)


Takes field f from all units connected to unit n and adds it to field sf of the corresponding connections.



8.1.2.0.1.4. Non Linear Functions.


Non Linear Functions (NLF) are special lisp objects which describe a numerical function f(x) of a real variable. NLFs are used for specifying the activation functions of the units and their derivatives.



8.1.2.0.1.4.0. (NLF x)
[NLF]


A NLF object can be used like a Lisp function. This call returns f(x) , where f is the function associated to the NLF NLF .



8.1.2.0.1.4.1. (nlf-f-0)


Returns the null NLF:
f(x) = 0



8.1.2.0.1.4.2. (nlf-f-tanh A B C D)


Returns a sigmoid NLF:
f(x) = A tanh (B x) + C x + D
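
For example, a plain hyperbolic tangent is obtained with A=1, B=1, C=0, D=0 (the variable name my-tanh is only illustrative):

(setq my-tanh (nlf-f-tanh 1 1 0 0))
(my-tanh 0)       ; returns 0
(my-tanh 1000)    ; saturates near 1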



8.1.2.0.1.4.3. (nlf-f-lin A B C D)


Returns a linear NLF:
f(x) = A B x + C x + D



8.1.2.0.1.4.4. (nlf-f-piecewise A B C D)


Returns a piecewise linear NLF:
f(x) = A g(Bx) + Cx + D

where

g(x) = -1  if x < -1
g(x) =  1  if x > +1
g(x) =  x  otherwise



8.1.2.0.1.4.5. (nlf-f-threshold A B C D)


Returns a threshold NLF:
f(x) = A g(Bx) + Cx + D

where

g(x) =  1  if x > 0
g(x) = -1  otherwise



8.1.2.0.1.4.6. (nlf-f-bell)


Returns a bell shaped NLF:
f(x) = 0  if x < -1
f(x) = 0  if x > +1
f(x) = (1 - x**2)**3  otherwise



8.1.2.0.1.4.7. (nlf-f-lisp f)


Returns a NLF associated to the lisp function f . Applying this NLF to a number x calls the lisp function f and returns the result. This is especially fast if the lisp function is a DZ.

See: DZ
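
A minimal sketch (the names square and my-square-nlf are illustrative):

; an NLF wrapping an ordinary lisp function
(de square (x) (* x x))
(setq my-square-nlf (nlf-f-lisp square))
(my-square-nlf 3)      ; returns 9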




8.1.2.0.1.4.8. (nlf-f-spline xl yl)


Returns a tabulated NLF. The graph of this NLF is a smooth interpolation of the points whose abscissas are specified by list xl and whose ordinates are specified by list yl .



8.1.2.0.1.4.9. (nlf-df-all nlf)


Returns a NLF describing the derivative of NLF nlf . This derivative is computed with a finite difference and thus is not very accurate.



8.1.2.0.1.4.10. (nlf-ddf-all nlf)


Returns a NLF describing the second derivative of NLF nlf . Again, this derivative is computed with finite differences and is inaccurate.



8.1.2.0.1.4.11. (nlf-df-tanh A B C D)


Returns a NLF describing the derivative of a sigmoid NLF. This NLF computes the derivative faster and more accurately than (nlf-df-all (nlf-f-tanh A B C D)). It actually computes:
df(x) = AB (1 - tanh(Bx)**2) + C



8.1.2.0.1.4.12. (nlf-df-bell)


Returns the derivative of the bell shaped NLF:
df(x) = 0  if x < -1
df(x) = 0  if x > +1
df(x) = -6 x (1 - x**2)**2 otherwise



8.1.2.0.1.4.13. (nlf-df-spline X Y)


This function returns NLFs describing the first derivative of a tabulated NLF. As usual, computation is faster and more accurate than
(nlf-df-all (nlf-f-spline X Y))



8.1.2.0.1.4.14. (nlf-ddf-spline X Y)


This function returns NLFs describing the second derivative of a tabulated NLF. As usual, computation is faster and more accurate than
(nlf-ddf-all (nlf-f-spline X Y))



8.1.2.0.1.5. Functions for Quasi-Linear Units.




8.1.2.0.1.5.0. Propagation Implementation for Quasi-Linear Units.




8.1.2.0.1.5.0.0. (update-weighted-sum [theta]...layers...)


Computes the weighted sum (stored in nfield n-sum ) of each cell in layers layers, according to the following formula:
nsum(i) = Sum on j ( sval(j,i) nval(j) )

The optional argument theta specifies the standard deviation of a gaussian noise added to the weighted sum.



8.1.2.0.1.5.0.1. (update-state-only nlf...layers...)


Applies NLF nlf to the units in layers layers.
nval(i) = NLF [nsum(i)]



8.1.2.0.1.5.0.2. (update-state [theta] nlf...layers...)


This function combines the two previous functions. It first computes the weighted sum for each unit and then applies the NLF nlf . This is the basic function for updating the state of one or several layers.

See: (update-weighted-sum [ theta ]... layers ...)
See: (update-state-only nlf ... layers ...)
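
As an illustration, here is a minimal forward pass (a sketch: hidden and output are assumed to be lists of unit numbers created with newneurons or build-net, and my-nlf an NLF such as one returned by nlf-f-tanh):

; update the hidden layer, then the output layer
(setq my-nlf (nlf-f-tanh 1 1 0 0))
(update-state my-nlf hidden)
(update-state my-nlf output)

; the same computation for the hidden layer, written with the
; two lower level calls
(update-weighted-sum hidden)
(update-state-only my-nlf hidden)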




8.1.2.0.1.5.1. Online Gradient Implementation for Quasi-Linear Units.


The following functions are used for implementing ``online'' or ``stochastic'' gradient algorithms. In these algorithms, the weights are updated after each pattern presentation.



8.1.2.0.1.5.1.0. (init-grad-lms dnlf output desired)


Computes the gradient of the following cost function (Mean Squared Error):
Cost = 0.5 *  Sum on i (nval(output i) - nval(desired i))**2

with respect to the weighted sum and state of units in layer output:

- Gradient of Cost on nval(output i)
  =  nbacksum(output i)  =  nval(desired i) - nval(output i)
- Gradient of Cost on nsum(output i)
  =  ngrad(output i)  =  nbacksum(output i) * DNLF[nsum(output i)]

In addition, kernels sn28new and sn28itenew set the second derivative to 1:

nsqbacksum(output i) = 1

The desired values for a network are usually stored in a pseudo layer desired. The function init-grad-lms compares these desired values with the states of the output layer output and initializes the computation of the gradients. Function init-grad-lms returns the value of the cost function.



8.1.2.0.1.5.1.1. (init-grad-thlms dnlf output desired threshold)


Computes the gradient of a thresholded cost function defined as follows:
nbacksum(output i)
  = 0 if nval(output i) > threshold and nval(desired i) > threshold,
  = 0 if nval(output i) < threshold and nval(desired i) < threshold,
  = nval(desired i) - nval(output i)  otherwise.
- Gradient of Cost on nsum(output i)
  = ngrad(output i) = nbacksum(output i) * DNLF[nsum(output i)]

This cost function is useful for implementing Rosenblatt's perceptron like algorithms. These algorithms however are often unstable when the minimal cost is not zero.



8.1.2.0.1.5.1.2. (update-back-sum...l...)


Computes the derivatives of the cost with respect to the states of units in layers l , given the derivatives with respect to the weighted sums of the downstream units.
- Gradient of Cost on nval(i)
  = nbacksum(i) = Sum on j ( sval(i, j) * ngrad(j) )



8.1.2.0.1.5.1.3. (update-gradient-only dnlf...l...)


Computes the derivatives of the cost with respect to the activations (nfields n-sum ) of units in layers l , given the derivatives with respect to their states.
- Gradient of Cost on nsum(i)
= ngrad(i) = nbacksum(i) * DNLF[nsum(i)]

The NLF object dnlf must be the derivative of the NLF object given as argument to the function update-state-only called on that layer.



8.1.2.0.1.5.1.4. (update-gradient dnlf...l...)


Calls both update-back-sum and update-gradient-only . This is the basic step of the well known back-propagation algorithm.



8.1.2.0.1.5.1.5. (update-weight alpha decay [...l...])


Updates the weights according to the gradient of the cost, the momentum parameter alpha and the exponential decay parameter decay .
sdelta(i, j) = alpha * sdelta(i, j) + nval(i) * ngrad(j) * nepsilon(j)
sval(i, j) = ( 1 - decay * nepsilon(j) ) * sval(i, j) + sdelta(i, j)

Kernels sn28new and sn28itenew use sepsilon(i,j) instead of nepsilon(j) .

When layers l are specified, the weight update is only performed on the incoming connections to layers l . Otherwise, the weight update happens in all the network.

When arguments alpha and/or decay are zero, a faster optimized code is used.

With kernels sn28ite and sn28itenew, the possible presence of shared weights makes the computation more complex and slower. Function update-weight then calls the functions clear-acc , update-acc , update-delta and update-wghtonly described in the next section. Three differences with the non iterative version are then important:

sval(i, j) = (1 - decay) * sval(i, j) + sdelta(i, j)



8.1.2.0.1.5.2. Batch Gradient Implementation for Quasi-Linear Units.


Batch gradient (also called true gradient) consists in updating the weights after the presentation of several patterns. A new sfield, s-acc , accumulates the contribution of each pattern. This field is only present with kernels sn28ite and sn28itenew.

The sfield s-acc is also used for accumulating the contributions of several connections to the gradient of shared weights. In fact, shared weights and batch gradient use the same underlying mechanisms. Four functions then replace the usual call to update-weight :



8.1.2.0.1.5.2.0. (clear-acc)


Sets the s-acc field of all weights to zero.
sacc = 0



8.1.2.0.1.5.2.1. (update-acc [...l...])


Adds the contribution to the gradient of all incoming connections to layer l . When argument l is omitted, all existing connections are considered.
sacc(i, j) = sacc(i, j) + nval(i) * ngrad(j) * nepsilon(j)

Kernels sn28new and sn28itenew use sepsilon(i,j) instead of nepsilon(j) .



8.1.2.0.1.5.2.2. (update-delta alpha)


Updates the sfields s-delta of all weights, according to the value of the sfields sacc field and to the momentum parameter alpha :
sdelta = alpha * sdelta + sacc



8.1.2.0.1.5.2.3. (update-wghtonly decay)


Updates the weights, according to sfields s-delta and to the decay parameter decay .
sval = (1 - decay) * sval + sdelta
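
As an illustration, one batch gradient step over a training set might be written as follows. This is only a sketch: the layers input, output and des, the NLF my-nlf and its derivative my-dnlf are assumed to exist, the functions present-pattern and present-desired belong to the high level library described later in this manual, and the usual TLisp (for (var start end) ...) loop is assumed.

; one batch update over patterns 0 to 99 (sn28ite or sn28itenew)
(clear-acc)
(for (p 0 99)
  (present-pattern input p)        ; load pattern p into the input layer
  (update-state my-nlf output)     ; forward pass
  (present-desired des p)          ; load the desired output
  (init-grad-lms my-dnlf output des)
  (update-acc) )                   ; accumulate the gradient contribution
(update-delta 0)                   ; alpha = 0 : no momentum
(update-wghtonly 0)                ; decay = 0 : no weight decay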



8.1.2.0.1.5.3. Second Order Algorithms Implementation for Quasi-Linear Units.


Adjusting the learning rates in a neural network is sometimes difficult and often tricky. A simple and efficient idea is used by the Newton algorithm: relying on the curvature of the cost function. The stronger the curvature, the smaller the learning rate in that direction.

The curvature is described by the matrix of the second derivatives of the cost, the Hessian matrix. Since this matrix has a huge number of coefficients, we only compute the diagonal coefficients.

The weight update equation then becomes

W =  W  -  Gradient of Cost on W * Inverse( Hessian of Cost on W )

This simple approach, however, is plagued by several problems:

W =  W - sepsilon * Gradient of Cost on W / (mu + |ssigma|)

We have replaced our learning rate problem by three new parameters. These new parameters however are much more robust: we always use the same values although more refined values may prove more efficient:

sepsilon = 0.0005
gamma = mu = 0.05

This scheme is further complicated by two important additional factors:

These algorithms are implemented by kernels sn28new and sn28itenew. Two sets of new functions are required for computing the derivatives and for updating the weights.



8.1.2.0.1.5.3.0. (update-ggradient dnlf ddnlf gamma...layers...)


This function first computes the nfields n-ggrad of units in layers layers . It then computes the sfields s-sigma of all incoming connections to layers layers. Argument ddnlf is the second derivative of the NLF used for that specific layer. Argument dnlf is the first derivative of the NLF.
For each unit i in layers:
  nsqbacksum(i) =  Sum on k ([sval(i, k)]**2 * nggrad(k))
  nggrad(i) = nbacksum(i) * DDNLF(nsum(i)) + nsqbacksum(i) * [ DNLF(nsum(i)) ]**2
  ssigma(j, i) = gamma * nggrad(i) * [nval(j)]**2 + (1 - gamma) * ssigma (j, i)

This computation requires that the fields n-ggrad of all downstream units are correct. All the fields n-ggrad and s-sigma are thus computed during a single backward recurrence similar to the usual back-propagation.

Remark:



8.1.2.0.1.5.3.1. (update-lmggradient dnlf gamma...layers...)


This function computes the Levenberg-Marquardt approximation of s-sigma . This function takes no argument ddnlf for the second derivative of the NLF because the Levenberg-Marquardt approximation consists in ignoring this second derivative!

The computation is similar to that of update-ggradient except that the nfield n-ggrad is computed according to the following formula:

nggrad(i) = nsqbacksum(i) * [DNLF(nsum(i))]**2



8.1.2.0.1.5.3.2. (clear-hess)


This function is similar to clear-acc . It clears the sfield s-hess in each weight.
shess = 0



8.1.2.0.1.5.3.3. (update-hess [...l...])


Accumulates the sfield s-sigma of each connection into the sfield s-hess of the corresponding weight. When arguments l are present, only the incoming connections to layers l are processed.
For each connection:
shess = shess + ssigma



8.1.2.0.1.5.3.4. (hessian-scale mu)


This function is usually inserted before update-delta and update-wghtonly . It scales field s-acc according to the ``second derivative'' stored in field s-hess and to the parameter mu . The usual functions update-delta and update-wghtonly then perform the desired weight update.
For all weights:
sacc = sacc / (mu + |shess|)



8.1.2.0.1.5.3.5. (update-w-newton alpha decay mu [...layers...])


This function is similar to update-weight, but updates the weights according to the quasi-Newton equation. Actually, this function calls in sequence:
 (clear-acc)
 (clear-hess)
 (update-acc...layers...)
 (update-hess...layers...)
 (hessian-scale mu)
 (update-delta alpha)
 (update-wghtonly decay)

The performance remarks discussed with function update-weight are even more relevant here. With kernel sn28itenew, function update-w-newton becomes very slow if you specify argument layers . It is much better to use the basic functions themselves.

See: (update-weight alpha decay [... l ...])




8.1.2.0.1.5.4. Conjugate Gradient Implementation for Quasi-Linear Units.


Conjugate Gradient is a powerful batch optimization algorithm which outperforms all other algorithms implemented in SN when looking for a high optimization accuracy. This algorithm is especially suitable to function approximation tasks.

This state-of-the-art implementation of the conjugate gradient method relies on partial line searches based on the analytical computation of the cost function curvature. Such partial line searches are allowed by a self-restarting Hestenes-Stiefel formula for computing the conjugate directions.



8.1.2.0.1.5.4.0. (update-deltastate dnlf...layers...)


Function update-deltastate is the basic component of the computation of the Gauss Newton approximation to the curvature information. This computation is performed during a single forward propagation. If the connection field s-delta contains the search direction, function update-deltastate stores in fields n-grad the derivative of the cell state with respect to a weight change along the search direction.

For each cell i in layers layers, function update-deltastate updates the gradient field n-grad according to the following formula:

ngrad(i) = DNLF(nsum(i)) * Sum on j ( sdelta(j,i) * nval(j) + sval(j,i) * ngrad(j) )



8.1.2.0.1.5.4.1. (update-wghtonly-conjgrad curvature)


Assuming that connection field s-acc contains the gradient, that connection field s-delta contains the search direction, and that argument curvature is the second derivative along the search direction, function update-wghtonly-conjgrad performs a Newton step along the search direction. This is the main component of the partial line search.

This function updates the weight vector according to the following formula:

sval = sval + ( (sdelta . sacc) / curvature ) * sdelta



8.1.2.0.1.6. Functions for Implementing Euclidian Units.


Instead of computing:
f( Sum on i (wi xi) )

euclidian units compute a function of the distance between the input vector xi and the weights wi :

f( Sum on i (wi - xi)**2 )

Euclidian units are useful for implementing several algorithms, like Kohonen's map, Learning Vector Quantization, Radial Basis Functions, K means and Nearest Neighbour. This implementation in Netenv provides an alternative to KNNenv.

A few additional functions whose names contain "nn" implement these units in SN2.8. These functions simply replace their standard counterparts for quasi-linear units. An euclidian unit in SN is the same object as a quasi-linear unit. The only difference is the use of a different set of functions for computing the states, gradients and weights.



8.1.2.0.1.6.0. Propagation Implementation for Euclidean Units.




8.1.2.0.1.6.0.0. (update-nn-sum [theta]...layers...)


This is the euclidian counterpart of update-weighted-sum . It computes the nfield n-sum of each unit in layers according to the following formula:
nsum (i) =  Sum on j ( sval(j, i) - nval(j) )**2

Argument theta is the standard deviation of an optional gaussian noise added during the computation.



8.1.2.0.1.6.0.1. (update-nn-state [theta] nlf...layers...)


This is the counterpart of update-state . It first calls update-nn-sum then calls update-state-only , thus computing field n-val of each euclidian unit in layers layers .



8.1.2.0.1.6.1. Gradient Descent Algorithms Implementation for Euclidean Units.


Back propagating through euclidian units is essentially identical to back-propagating through quasi-linear units. The equations however are slightly different and are implemented by new SN2.8 functions.



8.1.2.0.1.6.1.0. (update-nn-back-sum...layers...)


This is the counterpart of update-back-sum . It computes the field n-backsum of units in layers according to the value of field n-grad of their downstream units. These downstream units are assumed to be Euclidian.
nbacksum(i) = 2 * Sum on k ( ngrad(k) * [nval(i) - sval(i,k)] )

Units in layers do not need to be actually euclidian units. Their downstream units however have to be euclidian units.



8.1.2.0.1.6.1.1. (update-nn-gradient dnlf...layers...)


This is the counterpart of update-gradient. It computes the field n-grad by calling update-nn-back-sum and update-gradient-only .

8.1.2.0.1.6.1.2. (update-nn-weight alpha decay...layers...)


This is the counterpart of update-weight . It updates the weights of all incoming connections of units in layers . These units are assumed to be euclidian.
For each unit i in layers,
 For each unit  j connected to i:
   sdelta(j,i) = alpha * sdelta(j,i) + (1 - alpha) * ngrad(i) * (sval(j,i) - nval(j)) * nepsilon(i)
   sval(j,i) = (1 - nepsilon(i) * decay) * sval(j,i) + sdelta(j,i)

With kernels sn28ite or sn28itenew, this function has the same limitations as function update-weight . It is then better to explicitly use clear-acc , update-nn-acc , update-delta and update-wghtonly .

See: (update-weight alpha decay [... l ...])




8.1.2.0.1.6.1.3. (update-nn-acc...layers...)


Adds the contribution of the incoming connections of units in layers to the gradient according to the following formula. This function is only available with kernels sn28ite and sn28itenew.
sacc(j,i) = sacc(j, i) + ngrad(i) * ( sval(j, i) - nval(j) ) * nepsilon(i)



8.1.2.0.1.6.2. Second Order Algorithms Implementation for Euclidean Units.


Similarly, a few functions allow the computation of the second derivative in order to use the Quasi Newton or Levenberg Marquardt algorithms.



8.1.2.0.1.6.2.0. (update-nn-ggradient dnlf ddnlf gamma...layers...)


This is the counterpart of update-ggradient . It computes the field n-ggrad according to the following formula:
nsqbacksum(i) = 2 * Sum on k ( 2 * (sval(i, k) - nval(i))**2 * nggrad(k) - ngrad(k) )
nggrad(i) = nsqbacksum(i) * [DNLF(nsum(i))]**2 - nbacksum(i) * DDNLF(nsum(i))

The sfields s-sigma of each connection are then updated:

ssigma(i,j) = (1 - gamma) * ssigma(i, j) 
    + gamma[ 2 * nggrad(j) * (sval(i, j) - nval(i))**2 - ngrad(j) ]



8.1.2.0.1.6.2.1. (update-nn-lmggradient dnlf gamma...layers...)


This is the counterpart of function update-lmggradient . It behaves like function update-nn-ggradient but performs the Levenberg-Marquardt approximation:
nsqbacksum(i) = 2 * Sum on k ( 2 * (sval(i, k) - nval(i))**2 * nggrad(k) - ngrad(k) )
nggrad(i) = nsqbacksum(i) * [DNLF(nsum(i))]**2
ssigma(i,j) = (1 - gamma) * ssigma(i, j) + gamma[ 2 * nggrad(j) * (sval(i, j) - nval(i))**2 - ngrad(j) ]



8.1.2.0.1.6.2.2. (update-nn-w-newton alpha decay mu...layers...)


This is the newton weight update for euclidian cells. It calls in sequence:
 (clear-acc)
 (clear-hess)
 (update-nn-acc...layers...)
 (update-hess...layers...)
 (hessian-scale mu)
 (update-delta alpha)
 (update-wghtonly decay)

Like update-weight , this function is much slower than calling the seven functions listed above.

See: (update-w-newton alpha decay mu [... layers ...])
See: (update-weight alpha decay [... l ...])




8.1.2.0.1.7. Sequencing in Netenv.


In order to make clear how to use these functions, let us consider a simple example. A network has 4 layers, layer1 to layer4 ; each layer uses its own activation function nlf2 to nlf4 . In addition, layer3 is composed of euclidian units:
layer1  Input
layer2  Quasi Linear (nlf2, dnlf2, ddnlf2)
layer3  Euclidian    (nlf3, dnlf3, ddnlf3)
layer4  Quasi Linear (nlf4, dnlf4, ddnlf4)

Let us consider an additional layer des which contains desired states for layer layer4 . The standard back-propagation algorithm is then:
;present a pattern into layer layer1
;forward-prop
 (update-state nlf2 layer2)
 (update-nn-state nlf3 layer3)
 (update-state nlf4 layer4)
; present a desired output into layer des
; backward-prop
 (init-grad-lms dnlf4 layer4 des)
 (update-gradient dnlf3 layer3)
 (update-nn-gradient dnlf2 layer2)
; update-weights
 (update-weight 0 0 layer4 layer2)
 (update-nn-weight 0 0 layer3)

With kernels sn28ite and sn28itenew, functions update-weight and update-nn-weight are very inefficient when they do not operate on the whole network. This is the case here. The last two lines thus are better replaced by:

; update-weights (sn28ite or sn28itenew)
 (clear-acc)
 (update-acc layer4 layer2)
 (update-nn-acc layer3)
 (update-delta 0)
 (update-wghtonly 0)

And finally, if we use the Levenberg-Marquardt algorithm, these lines become:

; second backward pass (newton)
 (update-lmggradient dnlf4 0.05 layer4)
 (update-lmggradient dnlf3 0.05 layer3)
 (update-nn-lmggradient dnlf2 0.05 layer2)
; weight update (newton)
 (clear-acc)
 (clear-hess)
 (update-acc layer4 layer2)
 (update-nn-acc layer3)
 (update-hess)
 (hessian-scale 0.05)
 (update-delta 0)
 (update-wghtonly 0)



8.1.2.0.2. High Level Functions in Netenv.


This library of high level functions is automatically loaded on startup. It provides an easier way to create and run networks in SN2.8. Convenient functions are defined for quickly creating a network, running the back-propagation algorithm and displaying the results. Hooks are provided for calling user-defined algorithms, data processing and display routines.



8.1.2.0.2.0. Overview of High Level Functions in Netenv.


We suggest that the reader browse the file "sn28/lib/netenv.sn" which defines most functions in the library. This file is rather small and has a simple structure. Understanding it is a good way to use SN2.8 more efficiently. There are four families of functions in this file:

These four functions in turn make heavy use of eight simpler functions:

 forward-prop  do-compute-err
 backward-prop  backward-2-prop
 do-update-weight  do-update-weight-newton
 do-update-acc  acc-update-hess

These functions are automatically defined when function define-net is called, either manually by the user or automatically during the network definition (the usual network definition function, build-net , calls this function).

By changing the definition of these functions, you can create new connectionist models. For instance, both the kMeans and LVQ algorithms have been implemented by redefining these functions (see file "sn28/examples/netlvq/lvq.sn" ).



8.1.2.0.2.1. Creating a Network.




8.1.2.0.2.1.0. (build-net lunits llinks)
(netenv.sn)


Function build-net is a high level function for building networks. It provides an easy way for creating multilayer perceptrons. It also sets up all the data structures needed by the library.
? (alloc-net 50 100)
= 4400
? (build-net
  ; create the cells
   '((input 4)    ; input layer
     (hidden 2)    ; a hidden layer
     (output 4) )  ; output layer
  ; make connections
   '((input hidden)  ; connect input  layer to the hidden layer
     (hidden output))) ; connect hidden layer to the output layer
= ( ( ) ( ) )
?  input
= ( 1 2 3 4 )
? input-layer
= ( 1 2 3 4 )
? hidden
= ( 5 6 )
? nnum  ; number of units
= 15
? snum  ; number of connections
= 22

A variable net-struc is set for backward compatibility purposes. It contains some additional information about the structure of the network.



8.1.2.0.2.1.1. (build-net-nobias lunits llinks)
(netenv.sn)


This function is quite similar to build-net . It does not connect however the allocated cells to the bias unit. This function is especially useful for creating shared weight nets with constraints on the biases.



8.1.2.0.2.1.2. (make-input l)
(netenv.sn)


This function stores layer l into variable input-layer . It should be called whenever this layer is redefined or modified. This function is automatically called by build-net and build-net-nobias .



8.1.2.0.2.1.3. (make-output l)
(netenv.sn)


This function stores layer l into variable output-layer . It should be called whenever this layer is redefined or modified. This function is automatically called by build-net and build-net-nobias .



8.1.2.0.2.1.4. (make-desired l)
(netenv.sn)


This function stores layer l into variable desired-layer . It should be called whenever this layer is redefined or modified. This function is automatically called by build-net and build-net-nobias .



8.1.2.0.2.1.5. (define-net arg)
(netenv.sn)


This function defines the eight forward and backward propagation functions. This function is automatically called by build-net and build-net-nobias . Argument arg usually is a list of single element lists. Each single element list contains the name of a layer.

Remark: Limited support is offered by build-net , build-net-nobias and define-net for defining specific activation functions per layer. This feature is described in section 3.5.3.



8.1.2.0.2.2. Defining the Examples in Netenv.


The library handles the examples using functions present-pattern and present-desired . These functions store example number n in the input layer and the desired layer. The user can redefine these functions for implementing pre-processing or for calling a more refined database management.

The default functions present-pattern and present-desired just store the n -th row of matrices pattern-matrix or desired-matrix into the input layer or into the desired layer.



8.1.2.0.2.2.0. (ensemble pmin pmax)
(netenv.sn)


Defines the training set as being composed of examples number pmin to pmax inclusive. These bounds are stored in the global variables patt-min and patt-max .
See: patt-min patt-max




8.1.2.0.2.2.1. patt-min patt-max
[VAR] (netenv.sn)


These two variables store the lower and upper bounds of the training set.

You should call function ensemble instead of directly modifying these variables.



8.1.2.0.2.2.2. (test-set pmin pmax)
(netenv.sn)


Defines the test set as being composed of examples pmin to pmax inclusive.

Examples in the training set are used for learning; examples in the test set are only used for evaluating the performance of the neural network on fresh data. This measure gives an idea of the generalization ability of the network.

Function test-set also prints a pessimistic evaluation of a 90% confidence interval on the accuracy of the performance measure. This evaluation is computed with function hoeffding according to the value of variable confidence.

The bounds of the test set are stored in the global variables tpatt-min and tpatt-max .

See: tpatt-min tpatt-max
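
For example, with a database of 1000 patterns one might write (an illustrative split):

(ensemble 0 799)      ; patterns 0 to 799 form the training set
(test-set 800 999)    ; patterns 800 to 999 form the test set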




8.1.2.0.2.2.3. tpatt-min tpatt-max
[VAR] (netenv.sn)


You should call function test-set instead of directly modifying these variables.



8.1.2.0.2.2.4. (hoeffding tau eta n)
(netenv.sn)


Computes a bound epsilon on the difference between the empirical average of n independent identically distributed positive random variables Xi and their mean m , with confidence eta . Argument tau is the maximal value of the variables Xi .

The bound is computed according to Hoeffding's formula:

P ( | m - 1/n * Sum on i (Xi) | > epsilon )
  <  2 * exp( -2 * n * (epsilon/tau)**2 )  =  1 - eta



8.1.2.0.2.2.5. (present-pattern l n)
(netenv.sn)


This function is called whenever the library needs a new example. It must load the fields n-val of layer l with the input part of example number n .

This function is often redefined by the user. The default function just loads the n -th row of the 2-dimensional matrix pattern-matrix into layer l using function get-pattern and adds a gaussian noise of standard deviation input-noise .
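
For instance, a user redefinition might replace the noise injection by a simple rescaling. This is a hypothetical sketch: pattern-matrix is assumed to contain raw values in the range [0,255].

(de present-pattern (l n)
  ; load row n of pattern-matrix into the units of layer l
  (get-pattern pattern-matrix n l)
  ; rescale the states to [0,1] : n-val = (1/255) * n-val + 0 * n-val
  (op-nfield l n-val n-val (/ 1 255) n-val 0) )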



8.1.2.0.2.2.6. input-noise
[VAR] (netenv.sn)


Noise level added to the input patterns before any network calculations. The computation is faster when input-noise is set to zero. This is the default value. A non zero value (0.1 for example ) is sometimes used during training in order to get a better generalization.



8.1.2.0.2.2.7. (present-desired l n)
(netenv.sn)


This function is called by the library whenever a new example is needed. It must load the fields n-val of layer l with the desired output part of example number n .

This function is often redefined by the user. The default function just loads the n -th row of the 2-dimensional matrix desired-matrix into layer l using function get-pattern .



8.1.2.0.2.3. Accessing Internal Variables in Netenv.


When a network is created, space is allocated for recording numerical values on the units and connections of the network graph.

Internal variables associated with the units are called ``nfields'', an acronym for ``neuron fields''. Internal variables associated with the connections are called ``sfields'', an acronym for ``synapse fields''.

Fields are highly specialized: in order to increase speed, many computational functions are working on implicit fields. In addition, all possible fields are not available on all versions of SN2.8. There is no reason indeed to slow down a regular network by updating the fields required by a shared weigths network.



8.1.2.0.2.3.0. Accessing Neural Fields of Groups of Units.




8.1.2.0.2.3.0.0. Conversion between Neural Fields and Lists.




8.1.2.0.2.3.0.0.0. (mapncar f arg)
(netenv.sn)


This is a general function used by other functions. For example, after defining:
(de state l (mapncar 'nval l))
= state

we can easily set or get the state of a list of units:

? (setq foo (newneurons 4))  ; allocate 4 units
= ( 1 2 3 4 ) 
? (state)                      ; get their state (with the bias unit state)
= ( 1 0 0 0 0 ) 
? (state '(2 3) 0.5)           ; set the state of units 2 and 3 to 0.5
= 0.5 
? (state foo)                ; display the states of layer foo
= ( 1 0 0.5 0.5 0 ) 
? (state 0.33)                 ; set the state of all the units to 0.33
= 0.33 
? (state)                      ; display the network state
= ( 1 0.33 0.33 0.33 0.33 )

And in fact, the following functions are already defined:



8.1.2.0.2.3.0.0.1. (state args)
(netenv.sn)


Set or get the states of a group of units (field n-val ).



8.1.2.0.2.3.0.0.2. (gradient args)
(netenv.sn)


Set or get the gradients of a group of units (field n-grad ).



8.1.2.0.2.3.0.0.3. (weighted-sum args)
(netenv.sn)


Set or get the total input of a group of units (field n-sum ).



8.1.2.0.2.3.0.0.4. (back-sum args)
(netenv.sn)


Set or get the backward sum of a group of units (field n-backsum ).



8.1.2.0.2.3.0.0.5. (epsilon args)
(netenv.sn)


Set or get the learning rate of a group of units (field n-epsilon ). With kernels sn28new and sn28itenew, this function returns the average of the learning rates attached to the weights of each unit.



8.1.2.0.2.3.0.0.6. (real-epsilon args)
(netenv.sn)


With kernels sn28new and sn28itenew only, this function gets the effective learning rate of a group of units. The effective learning rate is equal to:
effective_epsilon = epsilon / (mu + | sigma |)

where epsilon is the learning rate attached to the unit (or the weight) and sigma is the mean second derivative of the cost function with respect to the weights of the unit extracted from the sfield s-hess .



8.1.2.0.2.3.0.0.7. (sqback-sum args)
(netenv.sn)


With kernels sn28new and sn28itenew only, sets or gets the instantaneous second derivatives of the cost function with respect to the state of a group of units (field n-sqbacksum ).



8.1.2.0.2.3.0.0.8. (ggradient args)
(netenv.sn)


With kernels sn28new and sn28itenew only, sets or gets the instantaneous second derivatives of the cost function with respect to the total input of a group of units (field n-ggrad ).



8.1.2.0.2.3.0.0.9. (sigma args)
(netenv.sn)


With kernels sn28new and sn28itenew only, sets or gets the average of the second derivative of a group of units (field n-sigma ).



8.1.2.0.2.3.1. Conversion between Neural Fields and Matrices.


Several functions transfer data from a matrix to the n-val or sval field of units.

8.1.2.0.2.3.1.0. (get-pattern arr n l1...ln)


This function is used to transfer a pattern vector into the states of a list of units. The units to be loaded are defined by the lists l1 ... ln . Matrix arr must have 2 dimensions; n is the index of the vector that should be transferred. The sum of the lengths of the lists l1 ... ln should be equal to the length of the vector (i.e. the size of the matrix in the second dimension).
(dim my-patterns 10 5)     ; define a matrix with 10 vectors of dimension 5
(initialize-my-patterns)   ; put interesting data into the matrix
                           ; transfer elements (7 x) to units 1, 2, 3, 11, 12
(get-pattern my-patterns 7 '(1 2 3) '(11 12)) 



8.1.2.0.2.3.1.1. (get-pattern-2 arr l1...ln)


This function is also used to transfer a pattern vector into the states of a list of units. However it works slightly differently from get-pattern .

Matrix arr is transferred entirely into the units defined by l1 ... ln . It may have any number of dimensions. The sum of the lengths of the lists l1 ... ln should be equal to the number of elements in the matrix. This function is mainly designed for dealing with submatrices.

Function get-pattern-2 is slightly slower than get-pattern. However, it implements a more general way to transfer data from a matrix to a layer in the network.

; get pattern is now defined as 
(de get-pattern (arr n . l)
   (apply get-pattern-2 (cons (submatrix arr n ()) l)) )



8.1.2.0.2.3.2. Accessing Synaptic Fields of Groups of Units.




8.1.2.0.2.3.2.0. Conversion between Synaptic Fields and Lists.


The following functions are not meant for saving or restoring weights on disk but for temporarily saving a weight configuration in a lisp variable. The content of this variable can be used to restore the weight configuration afterwards. These functions are convenient for experimenting with several sets of parameters or for comparing different weight configurations.



8.1.2.0.2.3.2.0.0. (dump-weights)
(netenv.sn)


Returns a list of lists. Each list contains the weights of a particular unit. An empty list corresponds to a unit with no incoming weights. The units are ordered according to their number.
; save the weight configuration into variable a-good-solution
? (setq a-good-solution (dump-weights))
=...



8.1.2.0.2.3.2.0.1. (rest-weights l)
(netenv.sn)


Loads the network weights with the content of the variable passed as argument.

Caution: This function does not create connections but just sets the weight values. The network structure must be exactly identical to the structure used when dump-weights was called.

; restore the weights stored in variable a-good-solution
? (rest-weights a-good-solution)
= ()



8.1.2.0.2.3.2.0.2. (weight n)
(netenv.sn)


Returns the list of the incoming weights to cell n (field s-val ).



8.1.2.0.2.3.2.0.3. (set-weight n l)
(netenv.sn)


Sets the weights of unit n to the values contained in list l .



8.1.2.0.2.3.2.0.4. (deltaw n)
(netenv.sn)


Returns the list of the weight change of the incoming weights to cell n (field s-delta ).



8.1.2.0.2.3.2.0.5. (set-deltaw n l)
(netenv.sn)


Sets the delta's of the weights coming into unit n to the values contained in l .



8.1.2.0.2.3.2.1. Conversion between Synaptic Fields and Matrices.




8.1.2.0.2.3.2.1.0. (matrix-to-weight mat)


Transfers the elements of a mono-dimensional matrix mat into the network weights. This function is usually called by load-net .



8.1.2.0.2.3.2.1.1. (weight-to-matrix [ mat ])


Transfers the weights of the network into a mono-dimensional matrix mat . When no suitable argument is supplied, a new matrix is created. This function is called by save-net .
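
These two functions make it easy to snapshot and restore a weight configuration from Lisp (a sketch; the variable name saved-weights is illustrative):

; keep a copy of the current weights in a matrix
(setq saved-weights (weight-to-matrix))
; ... experiment with the network, e.g. (forget 0.5) ...
; then restore the original configuration
(matrix-to-weight saved-weights)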



8.1.2.0.2.3.2.1.2. (matrix-to-acc mat)


Transfers the elements of a mono-dimensional matrix mat into the gradient accumulator s-acc for each weight in the network. This function is available in kernels sn28ite and sn28itenew.



8.1.2.0.2.3.2.1.3. (acc-to-matrix [ mat ])


Returns a mono-dimensional matrix containing the gradient accumulator s-acc for each weight in the network. When provided, the optional argument mat specifies a destination matrix and saves the memory allocation time. This function is available in kernels sn28ite and sn28itenew.



8.1.2.0.2.3.2.1.4. (matrix-to-delta mat)


Transfers the elements of a mono-dimensional matrix mat into the weight change field s-delta for each weight in the network. This function is available in kernels sn28ite and sn28itenew.



8.1.2.0.2.3.2.1.5. (delta-to-matrix [ mat ])


Returns a mono-dimensional matrix containing the weight change field s-delta for each weight in the network. When provided, the optional argument mat specifies a destination matrix and saves the memory allocation time. This function is available in kernels sn28ite and sn28itenew.



8.1.2.0.2.3.2.1.6. (matrix-to-hess mat)


Transfers the elements of a mono-dimensional matrix mat into the hessian curvature field s-hess for each weight in the network. This function is available in sn28itenew only.



8.1.2.0.2.3.2.1.7. (hess-to-matrix [ mat ])


Returns a mono-dimensional matrix containing the hessian curvature field s-hess for each weight in the network. When provided, the optional argument mat specifies a destination matrix and saves the memory allocation time. This function is available in sn28itenew only.



8.1.2.0.2.4. Weight Files in Netenv.


The weights (sfield s-val ) may be saved in files under several formats.

First, it is possible to store them under ascii or binary formats.

Secondly, it is possible to store them with (resp. without) information related to the neural net architecture, leading to an explanative (resp. blind) format.

Functions load-net and merge-net recognize all these formats.



8.1.2.0.2.4.0. (save-net/merge str [ l ])


Saves the weight values of connections involving any two units in list l and connections from the bias unit (unit 0 ) to any unit in list l . When argument l is omitted, all units are considered. All the weights then are saved.

Function save-net/merge stores the weights into file str with a binary explanative format. The first 12 bytes of the file are:



8.1.2.0.2.4.1. (save-ascii-net/merge str [ l ])


This function is similar to save-net/merge , but creates a human readable text file.

The first line of the file is a file header. It is composed of the letters ".WEI" , of the number of cells involved in the file and of the age of the network. Each connection is described on one line, consisting of the number of the upstream cell, the number of the downstream cell and the weight value.

Note that cell 0 is always the threshold cell. After 160 iterations, a three-layer XOR network might be saved as:

        .WEI    6 160
           0    3 -1.7688258
           0    4  2.2176821
           0    5  1.5779345
           1    3  2.3539493
           1    4  2.0589452
           2    3  0.8007546
           2    4  1.9538255
           3    5  1.5308707
           4    5 -1.4315528

In this file, unit 0 is the bias, units 1 and 2 are the input units, units 3 and 4 are the hidden units and unit 5 is the output unit.



8.1.2.0.2.4.2. (save-net str [ l ])
(netenv.sn)


When argument l is provided, this function works like save-net/merge and creates an old format file. Argument l indeed indicates that the weights are stored for subsequent use by function merge-net .

When argument l is omitted, this function saves the weights in file str as a regular mono-dimensional TLisp matrix using function save-matrix . This matrix just contains the weights by their order of creation.



8.1.2.0.2.4.3. (save-ascii-net str [ l ])
(netenv.sn)


When argument l is provided, this function works like save-ascii-net/merge and creates an explanative format file. Argument l indeed indicates that the weights are stored for subsequent use by function merge-net .

When argument l is omitted, this function saves the weights in file str using the blind format, i.e. as a regular mono-dimensional TLisp matrix written with function save-ascii-matrix . This matrix just contains the weights in their order of creation.



8.1.2.0.2.4.4. (load-net str)
(netenv.sn)


Loads a weight configuration saved with save-net from file str .

See: (merge-net str [ l ])




8.1.2.0.2.4.5. (merge-net str [ l ])


When a list l is given as argument, function merge-net loads the weights saved in file str and stores them in the fields s-val of the corresponding connections between units in list l .

When argument l is omitted, function merge-net returns the number of units involved in the weight file str . When argument l is omitted and when str is not an explanative format weight file, function merge-net returns () . This feature is used for sensing the file format (explanative vs. blind).

Function merge-net is usually called with the same list of units as the corresponding call to save-net . Using a different list lets you load the weights of a network into a part of another network and vice-versa.

Function merge-net never creates a connection. Connections have to be created first.

;; The following function is defined in the file <"netenv.sn">. 
;; It loads a weight file created by save-net  without arguments 
 
(de load-net (str)
  (merge-net str (range 1 (1- nnum))) )



8.1.2.0.2.5. Setting the Parameters in Netenv.




8.1.2.0.2.5.0. Initial Weights in Netenv.




8.1.2.0.2.5.0.0. (forget x)
(netenv.sn)


Sets all the weights of a network to random values chosen between -x and x according to a uniform probability distribution.



8.1.2.0.2.5.0.1. (forget-inv x)
(netenv.sn)


Like forget but divides the bounds of the uniform distribution by the fanin of each unit.



8.1.2.0.2.5.0.2. (forget-sqrt x)
(netenv.sn)


Like forget but divides the bounds of the uniform distribution by the square root of the fanin of each unit.



8.1.2.0.2.5.0.3. (smartforget)
(netenv.sn)


Equivalent to (forget-sqrt 1) . This weight initialisation usually is a good choice with the default non-linear function.



8.1.2.0.2.5.0.4. (forget-layer x l)
(netenv.sn)


This function is essentially equivalent to forget . It reinitializes only the weights of incoming connections to layer l . The remaining weights are left unchanged.



8.1.2.0.2.5.0.5. (forget-inv-layer x l)
(netenv.sn)


This function is essentially equivalent to forget-inv . It reinitializes only the weights of incoming connections to layer l . The remaining weights are left unchanged.



8.1.2.0.2.5.0.6. (forget-sqrt-layer x l)
(netenv.sn)


This function is essentially equivalent to forget-sqrt . It reinitializes only the weights of incoming connections to layer l . The remaining weights are left unchanged.



8.1.2.0.2.5.1. Learning Parameters in Netenv.


It is rarely advisable to use the same ``learning rate'' in the whole network. This is of course possible with function epsilon . The following functions usually are a better choice:



8.1.2.0.2.5.1.0. (epsi x)
(netenv.sn)


Sets the learning rate of each unit (each connection with kernels sn28new and sn28itenew) to x divided by the number of inputs to the unit.



8.1.2.0.2.5.1.1. (epsi-sqrt x)
(netenv.sn)


Sets the learning rate of each unit (each connection with kernels sn28new and sn28itenew) to x divided by the square root of the number of inputs to the unit.



8.1.2.0.2.5.1.2. (mask-epsi s)
(connect.sn)


It is often useful to set small learning-rates on shared connections. Using mask-epsi rather than epsi-sqrt or epsi is often a good choice for setting the learning rates in a shared weights network.

Function mask-epsi sets the learning rate for each unit to s divided by the square root of the number of incoming connections and by the square root of the sharing count of the incoming connections.



8.1.2.0.2.5.1.3. (epsi-layer x l)
(netenv.sn)


This function is essentially similar to function epsi . However it sets only the learning rates for the units in layer l . The other learning rates are kept unchanged.



8.1.2.0.2.5.1.4. (epsi-sqrt-layer x l)
(netenv.sn)


This function is essentially similar to function epsi-sqrt . However it sets only the learning rates for the units in layer l . The other learning rates are kept unchanged.

Most other learning parameters are constant over the whole network. They are stored in a few global variables.



8.1.2.0.2.5.1.5. decay
[VAR] (netenv.sn)


Decay factor. In each learning iteration, the weights are multiplied by (1 - epsilon*decay) . This computation is disabled when decay is set to 0 . The simulation is slightly faster in this case. The default value is 0 .



8.1.2.0.2.5.1.6. theta
[VAR] (netenv.sn)


Noise level q . A zero-mean gaussian random variable with standard deviation theta is added to the total input of each unit after each state update. The state of a unit is thus equal to:
X(i) = f(A(i) + N(q))

where A(i) is the total input to unit i and N(q) is a zero mean gaussian random variable with standard deviation q . This computation is disabled when theta is set to 0 . The simulation is slightly faster in this case. The default value is 0 . A non-zero value can be used for:



8.1.2.0.2.5.1.7. alpha
[VAR] (netenv.sn)


Momentum factor a for back-propagation. The weight increment is computed with the following formula:
increment W(i,j)  :=  a * increment W(i,j)  -  epsilon * gradient of Cost on W(i,j)

This computation is disabled when alpha is set to 0 . The simulation is slightly faster in this case. The default value is 0 .



8.1.2.0.2.5.1.8. mu
[VAR] (netenv.sn)


With kernels sn28new and sn28itenew, variable mu is used for computing the effective learning rate in the approximate Newton method. The effective learning rate of a unit is defined by:
Effective epsilon(i,j)  =  epsilon(i,j) / (mu + | sigma(i,j) |)



8.1.2.0.2.5.1.9. gamma
[VAR] (netenv.sn)


With kernels sn28new and sn28itenew, variable gamma controls the time constant used for computing the average second derivative sigma which is updated according to the rule:
sigma(i) := (1 - gamma) * sigma(i) + gamma * (d2E / dW2) * X(i)**2
where X(i)**2 is the square of the state of unit i.

See: (update-state-only nlf ... layers ...)
See: (update-weight alpha decay [... l ...])
See: (update-ggradient dnlf ddnlf gamma ... layers ...)
See: (update-w-newton alpha decay mu [... layers ...])
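
For illustration, the functions and variables of this section are typically combined as follows before starting a training session. This is only a sketch: the numeric values are arbitrary, not recommendations from this manual.

(smartforget)          ; initialize the weights
(epsi-sqrt 0.1)        ; per-unit learning rates
(setq alpha 0.4)       ; momentum (arbitrary value)
(setq decay 0)         ; no weight decay
(setq theta 0)         ; no state noise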




8.1.2.0.2.5.2. Choosing the Activation Functions in Netenv.




8.1.2.0.2.5.2.0. nlf, dnlf, ddnlf
[VAR] (netenv.sn)


These three variables contain the default NLF used by the library for all layers. They are set by the functions described below.

Since NLF objects have a functional behavior, writing (nlf x) returns the value of the default NLF at x , writing (dnlf x) returns the value of its derivative at x and writing (ddnlf x) returns the value of its second derivative at x .



8.1.2.0.2.5.2.1. (nlf-tanh mx dmin [scale [offset]])
(netenv.sn)


Sets the non-linear function to a sigmoid of the following form:
f(x)  = scale * tanh( mx * x ) + offset + dmin * x

Since the hyperbolic tangent is an odd function taking values in range ]-1,1[, argument scale determines the amplitude, argument dmin the minimum value of the derivative (if dmin is not 0 the asymptotes are slanted), argument mx defines the gain or the inverse of the ``temperature'' and argument offset shifts the curve up or down.

Argument offset defaults to 0 . If argument scale is omitted, a scale is computed to ensure that f(1) = 1 and f(-1) = -1 .

The most common settings are the following:

The default non-linear function of SN2.8 is obtained with:

(nlf-tanh 0.666 0)



8.1.2.0.2.5.2.2. (nlf-lin dmin dmax [th])
(netenv.sn)


Sets the non-linear function to a piecewise linear function. This odd function is made of three parts. The first part, between -th and th , is a line segment with a slope equal to dmax . The remaining parts are straight lines of slope dmin outside range [-th,th] . The three parts form a continuous curve.

Although this function is not differentiable, the computation of its derivative causes no problem except for two points. SN2.8 assumes that the derivative at these two boundary points is that of the central part. A typical piecewise linear function is obtained with:

(nlf-lin  0.1  1.1)



8.1.2.0.2.5.2.3. (nlf-bell)
(netenv.sn)


Sets the non-linear function to a bell-shaped function whose equation is:
f(x) = (1-x**2)**3  if -1<x<1
f(x) = 0  otherwise



8.1.2.0.2.5.2.4. (nlf-bin mx dmin [scale [offset]])
(netenv.sn)


Sets the non-linear function to a binary threshold with a smooth pseudo-derivative similar to the derivative obtained with the nlf-tanh function. The nlf-bin function, for instance (nlf-bin 0.666 0 1) , is a plain binary step; a smooth derivative is nevertheless used during the backward pass.



8.1.2.0.2.5.2.5. (nlf-spline x y)
(netenv.sn)


Sets the default non-linear function (defined by nlf , dnlf , ddnlf ) to a cubic spline interpolated between points whose coordinates are in x and y . Computing a spline interpolation is often faster than computing a transcendental function. Here is code for tabulating the standard NLF as a spline:
? (nlf-tanh 0.6666 0)                  ; sets a standard function
= 1.14393
? (let* ((x (range -4 4 0.05))
         (y (all ((i x)) (nlf i))))
    (nlf-spline x y) )                 ; copy it with a cubic spline
= 400



8.1.2.0.2.5.2.6. NLF support in build-net.


The library allows the use of different NLFs per layer. In the build-net instruction, the layer specification may be completed by a call to a function of type nlf-XXX .
(build-net
 '((input 4)
   (hidden 2)
   (output 4 (nlf-lin 1 1)))
 '((input hidden)
   (hidden output)) )

In that example, layer hidden uses the default NLF, defined by variables nlf , dnlf , ddnlf . This default NLF can be changed by subsequent calls to a function nlf-XXX . Layer output however uses a fixed linear function, defined by (nlf-lin 1 1) . Subsequent use of functions nlf-XXX will never affect the activation function of layer output.



8.1.2.0.2.6. Performance Evaluation in Netenv.




8.1.2.0.2.6.0. Single Example Performance in Netenv.




8.1.2.0.2.6.0.0. (test-pattern n)
(netenv.sn)


Tests the response to pattern n . This function presents the pattern number n using present-pattern , propagates the states forward using forward-prop , computes the error on that pattern, stores its average scalar value into local-error using do-compute-err , decides if the answer is correct using classify and finally calls a display function.

Function test-pattern is defined as follows:

(de test-pattern (n)
 ; transfers pattern n into the input-layer
 (present-pattern input-layer n)
 ; propagates the states forward: computes the outputs
 (forward-prop net-struc)
 
 ; gets the desired outputs
 (present-desired desired-layer n)
 
 ; computes the error for this pattern,
 ; and stores it into local-error
 (do-compute-err output-layer desired-layer)
 
 ; decides if the answer is correct
 ; and stores the result in good-answer
 (setq good-answer (classify n))
 
 ; call displaying functions
 (process-pending-events)
 (disp-perf-iteration n)
 
 ; returns the error
 local-error)

Functions forward-prop and do-compute-err are defined at network creation time by the function define-net . These functions are described in the next section.

Function classify is defined by the set-class-XXX functions. It defines how to measure the quality of the system. The default classify function returns t if the outputs and the desired outputs have the same sign. This is too restrictive for most applications: you should always call one of the set-class-XXX functions in order to define your classification criterion.
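
For instance, on a 1-of-N classification task one would typically install the max-based criterion described further below before testing. This is just a usage sketch; the pattern index is arbitrary.

(set-class-max)        ; correct answer = most active output matches the desired one
(test-pattern 3)       ; test pattern number 3 with this criterion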



8.1.2.0.2.6.0.1. local-error
[VAR] (netenv.sn)


This global variable contains the average scalar training error for the current pattern. It is set by functions test-pattern , basic-iteration and basic-newton-iteration .



8.1.2.0.2.6.0.2. good-answer
[VAR] (netenv.sn)


This global variable contains a flag indicating the success of the network on the current pattern. It is set by functions test-pattern , basic-iteration and basic-newton-iteration .



8.1.2.0.2.6.1. Multiple Example Performance in Netenv.




8.1.2.0.2.6.1.0. (perf [n1 n2])
(netenv.sn)


Without arguments, this function computes the performance of the network on the training set. When integer arguments n1 and n2 are given, the performance is evaluated on the patterns whose indices are between n1 and n2 inclusive. Function perf temporarily cancels the input-noise settings.

Function perf prints the average error over the examples and over the outputs as well as the percentage of good recognition (as defined by the function classify). This function also calls the function save-perf which saves the result of the measurement in the file whose name is contained in the global variable perf-file .



8.1.2.0.2.6.1.1. (performance n1 n2)
(netenv.sn)


Lower level performance evaluation function called by perf . Function performance does not affect the input-noise settings. If input-noise is not 0 , the measured performance may thus be different if you call performance twice.

This function computes the average error over patterns between n1 and n2 and stores the value into global variable global-error . It also computes the percentage of good answers and stores it into global variable good-an-percent . The value of global-error is returned.



8.1.2.0.2.6.1.2. global-error, good-an-percent
[VAR] (netenv.sn)


These variables contain the results of the last call to performance.
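
For instance, a quick evaluation followed by a short textual report might be written as follows (a sketch; the pattern indices are those of the performance file example below):

(performance 320 479)                       ; evaluate patterns 320..479
(printf "error = %f   correct = %f\n"
        global-error good-an-percent)       ; results of the last call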



8.1.2.0.2.6.2. Performance File in Netenv.




8.1.2.0.2.6.2.0. perf-file
[VAR] (netenv.sn)


This variable defines an optional performance file.

This file is used for storing information about the main neural function calls and their results. The following functions write information to this file:

perf,
epsi-layer, epsi-sqrt-layer, forget-layer, forget-sqrt-layer,
epsi, epsi-sqrt, mu, forget, forget-sqrt, ensemble, test-set.

Example:

;;; ============ NEW PERFORMANCE
;;; ============ NEW NETWORK
;;; (forget-sqrt 1)
;;; (epsi-sqrt 0.1)
;;; (ensemble 0 319)
;;; (test-set 320 479)
;;; trun-action using function 'learn'...
     0 0.53568070   9.69 ;{0-319}
     0 0.53332978  10.00 ;{320-479}
   320 0.07470918  93.75 ;{0-319}
   320 0.08033946  91.25 ;{320-479}
   640 0.05402946  97.50 ;{0-319}
   640 0.06082892  95.62 ;{320-479}
   960 0.04978141  98.75 ;{0-319}
   960 0.05853230  96.88 ;{320-479}
  1280 0.04746896  99.06 ;{0-319}
  1280 0.05897417  97.50 ;{320-479}
;;; (epsi-sqrt 0.05)
  1600 0.03919556 100.00 ;{0-319}
  1600 0.05168496  97.50 ;{320-479}
  1920 0.03774762 100.00 ;{0-319}
  1920 0.05145141  97.50 ;{320-479}
;;; (epsi-sqrt 0.025)
  2240 0.03479438 100.00 ;{0-319}
  2240 0.04916938  98.12 ;{320-479}
  2560 0.03418482 100.00 ;{0-319}
  2560 0.04903031  98.12 ;{320-479}
  2880 0.03369421 100.00 ;{0-319}
  2880 0.04896870  98.12 ;{320-479}
  3200 0.03326655 100.00 ;{0-319}
  3200 0.04892774  98.12 ;{320-479}
;;; break



8.1.2.0.2.6.2.1. (save-perf [n1 [n2...]])
(netenv.sn)


This function takes numeric arguments which will be written on a single line in a file whose name is in global variable perf-file .

Function save-perf is called by function perf .



8.1.2.0.2.6.2.2. (plot-perf-file fname)
(netenv.sn)


Function plot-perf-file creates a performance plot and an error plot of the information stored in the performance file fname .
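
A typical session producing and then plotting such a file might look like the following sketch (the file name and the cycle sizes are arbitrary):

(setq perf-file "xor.perf")    ; performance measures are appended to this file
(trun 320 10)                  ; train and measure; perf calls save-perf
(plot-perf-file "xor.perf")    ; plot the error and performance curves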



8.1.2.0.2.6.3. Classification Modes.




8.1.2.0.2.6.3.0. (class-lms pn margin)
(netenv.sn)


Tests if the mean square distance between the state of desired-layer and the state of output-layer is smaller than margin. Argument pn is not used.



8.1.2.0.2.6.3.1. (set-class-lms margin)
(netenv.sn)


Defines the current classification function to be class-lms with margin margin .



8.1.2.0.2.6.3.2. (class-sgn pn tmin tmax)
(netenv.sn)


Decides on the correctness of the output based on the regions in which the output states and the desired outputs fall. Arguments tmin and tmax are two thresholds which define three regions: below tmin , between tmin and tmax , and above tmax .

The output is considered correct if all the states of the units in output-layer are in the same region as the states of units in desired-layer . This lisp function is slower than class-quadrant . Argument pn is not used.



8.1.2.0.2.6.3.3. (set-class-sgn tmin tmax)
(netenv.sn)


Sets the classification function to class-sgn with thresholds tmin and tmax . If both tmin and tmax are 0 , the faster function class-quadrant is used.



8.1.2.0.2.6.3.4. (class-max pn)
(netenv.sn)


Tests if the most active unit has the same rank in output-layer and desired-layer . Argument pn is not used.



8.1.2.0.2.6.3.5. (set-class-max)
(netenv.sn)


Sets the classification function to class-max .



8.1.2.0.2.6.3.6. (class-hamming pn margin)
(netenv.sn)


Tests if all the differences between the states of the output layer and the corresponding desired states are less than margin. Argument pn is not used.



8.1.2.0.2.6.3.7. (set-class-hamming margin)
(netenv.sn)


Sets the classification function to class-hamming .



8.1.2.0.2.6.3.8. (class-quadrant pn)
(netenv.sn)


Tests if the output vector and the desired vector are in the same quadrant. Argument pn is not used.



8.1.2.0.2.6.3.9. (set-class-quadrant)
(netenv.sn)


Sets the classification function to class-quadrant .



8.1.2.0.2.6.3.10. (set-class-nil)
(netenv.sn)


Sets the classification function to an empty function. This should be used whenever possible to speed up the simulation.



8.1.2.0.2.6.3.11. (classify pn)
(netenv.sn)


Main classification function. Function classify calls one of the previous classification functions. It can be redefined by the user.



8.1.2.0.2.7. Training the Network with Online Algorithms in Netenv.




8.1.2.0.2.7.0. Choice of the Patterns in Netenv.


Online algorithms update the weights after each pattern presentation. The order of presentation thus becomes important. The following functions define how the library presents the patterns.

Learning functions learn and run use the function next-pattern to choose the next pattern to present to the network. The user can redefine next-pattern to fit his needs or select a predefined method.



8.1.2.0.2.7.0.0. (next-pattern n)
(netenv.sn)


Chooses a new pattern in the training set and returns its index. Argument n is the index of the current pattern.



8.1.2.0.2.7.0.1. current-pattern
[VAR] (netenv.sn)


Contains the index of the current pattern.



8.1.2.0.2.7.0.2. (set-nextpat-seq)
(netenv.sn)


Redefines next-pattern to be equivalent to nextpat-seq . The next pattern is chosen regardless of the result of the previous classification.



8.1.2.0.2.7.0.3. (set-nextpat-stay)
(netenv.sn)


Redefines next-pattern to be equivalent to nextpat-stay .

The same pattern is kept until the network returns the right answer (until function classify returns a positive result). The next pattern is then chosen using function next-choice .



8.1.2.0.2.7.0.4. (next-choice n)
(netenv.sn)


This function chooses the next pattern when a new pattern is needed.



8.1.2.0.2.7.0.5. (set-nextcho-seq)
(netenv.sn)


This function redefines next-choice so that the patterns are chosen sequentially in the training set.



8.1.2.0.2.7.0.6. (set-nextcho-rnd)
(netenv.sn)


This function redefines next-choice so that the next pattern is chosen at random in the training set with uniform probability.
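
For instance, the two settings can be combined so that randomly chosen patterns are presented, each one being kept until it is learned. This is only a sketch; the iteration count is arbitrary.

(set-nextcho-rnd)     ; pick new patterns at random in the training set
(set-nextpat-stay)    ; keep a pattern until classify returns a positive result
(learn 1000)          ; learning iterations using this presentation order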



8.1.2.0.2.7.1. Forward and Backward Propagation Functions in Netenv.


Like test-pattern , the learning functions make heavy use of eight elementary functions defined at network creation time by define-net . Tampering with the definition of these functions is the preferred way to explore new algorithms.



8.1.2.0.2.7.1.0. (forward-prop ns)
[DE]


This function propagates the states forward in the network. It is usually defined as a sequence of calls to update-state . Argument ns is inherited from early versions of SN2.8 and is never used.



8.1.2.0.2.7.1.1. (forward-2-prop ns)
(netenv.sn)


Function forward-2-prop is defined at network creation time by define-net and plays a role similar to that of functions forward-prop , backward-prop , backward-2-prop , etc... Function forward-2-prop just calls function update-deltastate on each layer of the network.

This function is used by the conjugate gradient algorithms.



8.1.2.0.2.7.1.2. (do-compute-err out des)
[DE]


This function computes the training error between the output layer out and the desired layer des , stores the average scalar error in the global variable local-error and initializes the gradients by using init-grad .



8.1.2.0.2.7.1.3. (backward-prop ns)
[DE]


This function propagates the gradients backward in the network. It is usually defined as a sequence of calls to update-gradient . Argument ns is unused.



8.1.2.0.2.7.1.4. (do-update-weight ns)
[DE]


Given the gradients and the states, this function performs the weight update. It is usually defined as a single call to update-weight . With kernels sn28ite and sn28itenew, this function may be defined as:

(de do-update-weight (ns)
 (clear-acc)
 (do-update-acc ns)
 (update-delta alpha)
 (update-wghtonly decay) )

(do-update-acc ns)

With kernels sn28new and sn28itenew only, this function loops over the connections and adds their contribution to the gradient in the field s-acc of the weights. Usually defined as a single call to update-acc , this function may be redefined as a sequence of calls to update-acc .



8.1.2.0.2.7.1.5. (backward-2-prop ns)
[DE]


With kernels sn28new and sn28itenew only, this function propagates backward the second derivatives of the cost (fields n-sqbacksum and n-ggradient ). This function is usually defined as a sequence of calls to update-ggradient or update-lmggradient according to the status of the flag lev-mar .



8.1.2.0.2.7.1.6. (do-update-weight-newton m)
[DE]


With kernels sn28new and sn28itenew only, this function updates the weights given the states, gradients and second derivatives of the cost function. It is usually defined as a single call to update-w-newton .

With kernel sn28itenew however, this function might be redefined as:

(de do-update-weight-newton (ns)
 (clear-acc)
 (clear-hess)
 (do-update-acc ns)
 (do-update-hess ns)
 (hessian-scale mu)
 (update-delta alpha)
 (update-wghtonly decay) )

(do-update-hess ns)

Function do-update-hess accumulates the contributions of each connection to the second derivatives (in fields s-sigma ) into the field s-hess of the weights. This is usually done by a single call to function update-hess .



8.1.2.0.2.7.2. Online Gradient.




8.1.2.0.2.7.2.0. (init-grad out des)
(netenv.sn)


Sets the gradients (fields n-backsum and n-grad ) of units in out using the desired state stored in units of list des . This function uses either init-grad-lms or init-grad-thlms depending on the active mode. It returns the total output error.



8.1.2.0.2.7.2.1. (set-cost-lms)
(netenv.sn)


Defines the current error measure to be the classical mean-square:
C =  0.5 * | DesiredOutput - ActualOutput |**2

Function init-grad then calls init-grad-lms .
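
With this cost, the gradient initially stored on each output unit is simply the difference between the actual and the desired states (a standard derivation, not spelled out elsewhere in this manual):
dC / dX(i)  =  X(i) - DesiredX(i)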



8.1.2.0.2.7.2.2. (set-cost-thlms threshold)
(netenv.sn)


Defines the current error measure to be the thresholded mean-square.

If the actual output is larger than the desired output and the desired output is larger than threshold , or if the actual output is smaller than the desired output and the desired output is smaller than threshold , then the error is set to 0 and no gradient is propagated. Otherwise the error is the classical mean-square error.

Function init-grad then calls init-grad-thlms.



8.1.2.0.2.7.2.3. (basic-iteration n)
(netenv.sn)


This function performs an elementary learning iteration. It first gets an input pattern using the redefinable function present-pattern . Then it propagates the states using forward-prop , gets the desired values using present-desired and sets the variables local-error and good-answer using functions do-compute-err and classify .

The gradients then are propagated backward using function backward-prop and the weights are updated with function do-update-weight .

Function basic-iteration also is in charge of incrementing the variable age and calling a few display functions. It is defined as follows:

(de basic-iteration (n)
 ; get a pattern 
 (present-pattern input-layer n)
 ; propagates the states
 (forward-prop net-struc)
 ; get the desired output
 (present-desired desired-layer n)
 ; computes local-error and initializes the gradients
 (do-compute-err output-layer desired-layer)
 ; propagates the gradients
 (backward-prop net-struc)
 ; update the weights
 (do-update-weight net-struc)
 ; increments "age"
 (incr age)
 ; computes "good answer"
 (setq good-answer (classify n))
 ; display functions
 (process-pending-events)
 (disp-basic-iteration)
  local-error)

The following learning functions provide convenient interfaces for calling function basic-iteration .



8.1.2.0.2.7.2.4. (learn n)
(netenv.sn)


Performs n learning iterations. This function modifies the variable current-pattern after each learning iteration and uses the pattern-choosing function next-pattern .



8.1.2.0.2.7.2.5. (learn-cycle n)
(netenv.sn)


Performs n sweeps through the training set as defined by ensemble. The patterns are taken in sequence, regardless of the pattern-choosing function. This function does not modify the variable current-pattern .



8.1.2.0.2.7.2.6. (run cyc-length cyc-num)
(netenv.sn)


This is the main learning function. Function run performs cyc-length learning iterations, starting at pattern current-pattern and choosing the next pattern with the function next-pattern . Then it measures the performance on the training set (i.e. the patterns whose indices are between patt-min and patt-max , as defined by the function ensemble ). This whole sequence is repeated cyc-num times.



8.1.2.0.2.7.2.7. (trun cyc-length cyc-num)
(netenv.sn)


This function behaves like function run , but it also measures the global performance on the test set specified by the function test-set .
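
For instance, a complete training session using these functions might look like the following sketch. The pattern ranges are those of the performance file example above; the cycle sizes and learning rate are arbitrary.

(ensemble 0 319)       ; training set: patterns 0..319
(test-set 320 479)     ; test set: patterns 320..479
(smartforget)          ; initialize the weights
(epsi-sqrt 0.1)        ; set the learning rates
(trun 320 10)          ; 10 cycles of 320 iterations, measuring both performances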



8.1.2.0.2.7.3. Online Second Order Algorithms in Netenv.


A set of functions is provided for implementing second order stochastic back-propagation algorithms. The functions described in this section are only available with kernels sn28new and sn28itenew.



8.1.2.0.2.7.3.0. lev-mar
[VAR] (netenv.sn)


This flag controls the behavior of backward-2-prop and thus the behavior of basic-newton-iteration .

8.1.2.0.2.7.3.1. (basic-newton-iteration n)
(netenv.sn)


This function closely resembles basic-iteration . However it calls functions backward-2-prop and do-update-weight-newton , thus applying the Quasi Newton or the Quasi Levenberg-Marquardt algorithm according to the status of the flag lev-mar .



8.1.2.0.2.7.3.2. (learn-newton n)
(netenv.sn)


This function closely resembles learn . However it clears the flag lev-mar and calls basic-newton-iteration , thus applying the Quasi Newton algorithm.



8.1.2.0.2.7.3.3. (learn-cycle-newton n)
(netenv.sn)


This function closely resembles learn-cycle . However it clears the flag lev-mar and calls basic-newton-iteration , thus applying the Quasi Newton algorithm.



8.1.2.0.2.7.3.4. (run-newton cyc-length cyc-num)
(netenv.sn)


This function closely resembles run . However it clears the flag lev-mar and calls basic-newton-iteration , thus applying the Quasi Newton algorithm.



8.1.2.0.2.7.3.5. (trun-newton cyc-length cyc-num)
(netenv.sn)


This function closely resembles trun . However it clears the flag lev-mar and calls basic-newton-iteration , thus applying the Quasi Newton algorithm.



8.1.2.0.2.7.3.6. (learn-lm n)
(netenv.sn)


This function closely resembles learn . However it temporarily sets the flag lev-mar and calls basic-newton-iteration , thus applying the Quasi Levenberg-Marquardt algorithm.



8.1.2.0.2.7.3.7. (learn-cycle-lm n)
(netenv.sn)


This function closely resembles learn-cycle . However it temporarily sets the flag lev-mar and calls basic-newton-iteration , thus applying the Quasi Levenberg-Marquardt algorithm.



8.1.2.0.2.7.3.8. (run-lm cyc-length cyc-num)
(netenv.sn)


This function closely resembles run . However it temporarily sets the flag lev-mar and calls basic-newton-iteration , thus applying the Quasi Levenberg-Marquardt algorithm.



8.1.2.0.2.7.3.9. (trun-lm cyc-length cyc-num)
(netenv.sn)


This function closely resembles trun . However it temporarily sets the flag lev-mar and calls basic-newton-iteration , thus applying the Quasi Levenberg-Marquardt algorithm.
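
For instance, the Quasi Levenberg-Marquardt counterpart of the previous training session might look like the following sketch (the values of mu , gamma and the learning rates are arbitrary):

(setq mu 0.05)         ; regularizes the effective learning rates
(setq gamma 0.05)      ; time constant for averaging the second derivatives
(epsi-sqrt 0.1)        ; base learning rates
(trun-lm 320 10)       ; 10 cycles of 320 Quasi Levenberg-Marquardt iterations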



8.1.2.0.2.8. Training the Network with Batch Algorithms in Netenv.


These ``batch'' algorithms update the weights after the presentation of several patterns.



8.1.2.0.2.8.0. Batch Gradient in Netenv.


A set of functions is provided for implementing the batch version of back-propagation (among other learning algorithms). The functions described in this section are only available with kernels sn28ite and sn28itenew.

The batch version accumulates the gradients over a set of patterns before performing a weight update. The batch version is much slower than the on-line version (it usually takes many more forward and backward passes to learn a given problem). It is sometimes useful for analyzing the error surface and the behavior of an algorithm.



8.1.2.0.2.8.0.0. (run-batch cyc-length cyc-num)
(netenv.sn)


This is the main simulation function for the batch version.

Function run-batch performs cyc-length batch learning iterations. Each iteration consists of a sweep through the entire training set (i.e. patterns between patt-min and patt-max ). Then it measures the performance on this training set. This entire sequence is repeated cyc-num times.



8.1.2.0.2.8.0.1. (trun-batch cyc-length cyc-num)
(netenv.sn)


This is the main simulation function for the batch version.

Function trun-batch performs cyc-length batch learning iterations. Each iteration consists of a sweep through the entire training set (i.e. patterns between patt-min and patt-max ). Then it measures the performance on this training set and on the test set (i.e. patterns between tpatt-min and tpatt-max ). This entire sequence is repeated cyc-num times.



8.1.2.0.2.8.0.2. (learn-batch n)
(netenv.sn)


Performs n batch learning iterations. Each iteration consists of a sweep through the entire training set (i.e. patterns between patt-min and patt-max ).



8.1.2.0.2.8.0.3. (batch-iteration pmin pmax)
(netenv.sn)


This function implements the batch version of back-propagation.

Function batch-iteration accumulates the gradients and the errors of the training set patterns and then performs a weight update. The global variable global-error is set to the average output error over the set of patterns and its value is returned. The global variable age is incremented after each pattern presentation in order to remain consistent with the stochastic version.



8.1.2.0.2.8.1. Batch Conjugate Gradient in Netenv.


Function learn-batch-cg performs the conjugate gradient optimisation using two auxiliary functions batch-cg1-iteration and batch-cg2-iteration . An elementary propagation function, forward-2-prop , is also defined at network creation time by function define-net . You can of course find more information about our implementation of the conjugate gradient by looking at these functions in file ".../sn28/lib/netenv.sn" .



8.1.2.0.2.8.1.0. (learn-batch-cg ns [grad1] [grad2] [gradc])
(netenv.sn)


Function learn-batch-cg performs ns epochs of the batch conjugate gradient algorithm. Each epoch consists of a complete pass over the training set. There is no need to set the learning rates or any other parameters before applying the batch conjugate gradient algorithm. As a side effect, function learn-batch-cg sets all learning rates to 1 . You must therefore reset the learning rates before applying another algorithm.

The optional arguments grad1 , grad2 and gradc are three matrices used for continuing the conjugate gradient algorithm without losing the information accumulated during the previous epochs. Function learn-batch-cg indeed returns a list (it grad1 grad2 gradc) that you can use as argument list for the next call to function learn-batch-cg .

;; WRONG: Restarts the conjugate gradient every other epoch!
(repeat n 
 (learn-batch-cg 2) 
 (perf) ) 
;; RIGHT: Runs the conjugate gradient during 2*n epochs!
(let ((cgs (list 2)))
 (repeat n
 (setq cgs (apply learn-batch-cg cgs))
 (perf) ) )



8.1.2.0.2.8.1.1. (run-batch-cg cyc-length cyc-num)
(netenv.sn)


Function run-batch-cg repeats cyc-num times the following operations:

This function is similar to function run , which is usually used for the online gradient algorithm.



8.1.2.0.2.8.1.2. (trun-batch-cg cyc-length cyc-num)
(netenv.sn)


Function trun-batch-cg repeats cyc-num times the following operations:

This function is similar to function trun used for the online gradient algorithm.



8.1.2.0.2.8.1.3. (batch-cg1-iteration from to)
(netenv.sn)


Function batch-cg1-iteration computes the gradient of the cost function. It performs a complete pass of the training set, accumulates the gradient of the cost function over all examples, and stores this gradient in connection field s-acc . This function is close to function batch-iteration but does not modify the weights.



8.1.2.0.2.8.1.4. (batch-cg2-iteration from to)
(netenv.sn)


Function batch-cg2-iteration returns the Gauss-Newton approximation to the curvature of the cost function along a search direction stored in connection field s-delta . It performs a second pass of the training set and computes the curvature using function forward-2-prop which is defined at network creation time by define-net .



8.1.2.0.2.8.2. Batch Second Order Algorithm in Netenv.


Finally, a set of functions is provided for implementing batch versions of the second order algorithms. The functions described in this section are only available with kernel sn28itenew.



8.1.2.0.2.8.2.0. (batch-newton-iteration pmin pmax)
(netenv.sn)


This function is similar to batch-iteration . However it applies the Quasi Newton or the Quasi Levenberg-Marquardt algorithm according to the flag lev-mar .



8.1.2.0.2.8.2.1. (learn-batch-newton n)
(netenv.sn)


This function is similar to learn-batch . However it clears the flag lev-mar and calls batch-newton-iteration , thus applying the Quasi Newton algorithm.



8.1.2.0.2.8.2.1. (run-batch-newton cyc-length cyc-num)
(netenv.sn)


This function is similar to run-batch . However it clears the flag lev-mar and calls batch-newton-iteration , thus applying the Quasi Newton algorithm.



8.1.2.0.2.8.2.3. (trun-batch-newton cyc-length cyc-num)
(netenv.sn)


This function is similar to trun-batch . However it clears the flag lev-mar and calls batch-newton-iteration , thus applying the Quasi Newton algorithm.



8.1.2.0.2.8.2.4. (learn-batch-lm n)
(netenv.sn)


This function is similar to learn-batch . However it sets the flag lev-mar and calls batch-newton-iteration , thus applying the Quasi Levenberg-Marquardt algorithm.



8.1.2.0.2.8.2.5. (run-batch-lm cyc-length cyc-num)
(netenv.sn)


This function is similar to run-batch . However it sets the flag lev-mar and calls batch-newton-iteration , thus applying the Quasi Levenberg-Marquardt algorithm.



8.1.2.0.2.8.2.6. (trun-batch-lm cyc-length cyc-num)
(netenv.sn)


This function is similar to trun-batch . However it sets the flag lev-mar and calls batch-newton-iteration , thus applying the Quasi Levenberg-Marquardt algorithm.



8.1.2.0.2.9. Pruning Connections in Netenv: Optimal Brain Damage (OBD).


Optimal Brain Damage (OBD) is a weight pruning method. It consists of computing a saliency criterion based on the curvature of the cost function.

This method sometimes improves the generalization performance by adjusting the effective number of parameters. The simulation speed is also improved. The optimised runtime speed, however, may remain identical because modern computers spend more time performing a test than performing a multiplication by zero.

Using pruning methods is quite a long process because the mathematics of these methods usually require that the weight saliencies be computed at an optimum of the cost function. The best results are obtained by training the network for a long time before applying OBD.

Two library functions of "netenv.sn" implement OBD. These functions require kernels sn28new or sn28itenew because the computation of the saliency criterion requires the evaluation of the second derivatives of the cost function.

A typical use of OBD consists in the following steps:

An example file is provided as "sn28/examples/obd.sn" .



8.1.2.0.2.9.0. (obd-compute-curvature pmin pmax )
(netenv.sn)


Computes the diagonal of the hessian matrix of the cost function measured on patterns pmin to pmax . These diagonal values are left in field s-hess of the weights.



8.1.2.0.2.9.1. (obd-prune fraction)
(netenv.sn)


Removes a proportion fraction of the connections with the lowest saliency using function cut-connection and returns the exact number of connections removed. Argument fraction must be a number between 0 (prune nothing) and 1 (prune all).

When pruning a shared weight network (using sn28itenew) you must remember that the saliency criterion is a characteristic of the weight and not of the connection. All the connections sharing a same weight will be either kept or removed as a whole.
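
Putting these two functions together, a typical OBD session looks like the following sketch (the iteration counts and the pruning fraction are arbitrary):

(run 320 50)                     ; train for a long time first
(obd-compute-curvature 0 319)    ; second derivatives measured on the training set
(obd-prune 0.2)                  ; remove the 20% least salient connections
(run 320 10)                     ; retrain the remaining weights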



8.1.2.0.2.10. Displaying and Plotting in Netenv.




8.1.2.0.2.10.0. Netenv Displaying Modes.




8.1.2.0.2.10.0.0. (disp-basic-iteration)
(netenv.sn)


This is the general display function which is called after each learning iteration. This function can be dynamically redefined to one of the built-in display functions or to a user defined function.

See: (basic-iteration n )




8.1.2.0.2.10.0.1. (disp-perf-iteration)
(netenv.sn)


This is the general display function which is called after each pattern testing. This function can be dynamically redefined to one of the built in display functions or to a user defined function.

See: (test-pattern n )




8.1.2.0.2.10.0.2. (set-disp-everything)
(netenv.sn)


Makes disp-basic-iteration equivalent to disp-everything and disp-perf-iteration equivalent to disp-nil .



8.1.2.0.2.10.0.3. (set-disp-error)
(netenv.sn)


Makes disp-basic-iteration equivalent to disp-error and disp-perf-iteration equivalent to disp-nil . This function also creates a window disp-error-window and a plot-port disp-error-port for plotting the error.



8.1.2.0.2.10.0.4. (set-disp-net)
(netenv.sn)


Makes disp-basic-iteration and disp-perf-iteration equivalent to disp-net .



8.1.2.0.2.10.0.5. (set-disp-net-and-error)
(netenv.sn)


Makes disp-basic-iteration equivalent to disp-error and disp-perf-iteration equivalent to disp-net . SN2.8 will thus plot the instantaneous error during the training phase and draw the network states during the performance evaluation.



8.1.2.0.2.10.0.6. (set-disp-text)
(netenv.sn)


Makes both disp-basic-iteration and disp-perf-iteration equivalent to disp-text .



8.1.2.0.2.10.0.7. (set-disp-nil)
(netenv.sn)


Makes both disp-basic-iteration and disp-perf-iteration equivalent to disp-nil .



8.1.2.0.2.10.0.8. (disp-everything n)
(netenv.sn)


Function disp-everything can be redefined by the user. However, its default definition is:
(de disp-everything (patt-num)
  (print-layer output-layer)
  (printf "age=%d  pattern=%d  error=%9.5f  %s\n" 
      age patt-num local-error 
      (if good-answer "  ok  " "**arrgh**"))
  (plot-error age local-error)
  (draw-net net-struc patt-num) )



8.1.2.0.2.10.0.9. (disp-error n)
(netenv.sn)


Function disp-error can be redefined by the user. However, its default definition is:
(de disp-error (patt-num)
  (plot-error age local-error) )



8.1.2.0.2.10.0.10. (disp-net n)
(netenv.sn)


Function disp-net can be redefined by the user. However, its default definition is:
(de disp-net (patt-num)
  (draw-net net-struc patt-num) )



8.1.2.0.2.10.0.11. (disp-text n)
(netenv.sn)


Function disp-text can be redefined by the user. However, its default definition is:
(de disp-text (patt-num)
  (printf "age=%d  pattern=%d  error=%9.5f  %s\n" 
     age patt-num local-error 
     (if good-answer "  ok  " "**arrgh**")) )



8.1.2.0.2.10.0.12. (draw-net struc n)
(netenv.sn)


The user defined function draw-net must produce a graphic display of the network state on the current graphic window. This function is initially defined as an empty function.

Function draw-net is called by certain functions disp-XXX . The first argument is now obsolete. The second argument is the index of the current pattern.

It is easy to write a display function using function draw-list . Here is an example of how to define a draw-net function for a small network:

; Define the network as in the "getting started" 4-2-4 encoder example
; then define a display function for the 4-2-4 encoder
(de draw-net (a b)       ; a and b are dummy arguments, we won't use them
  (draw-list 100 100 (state output) 4 30 50 48)
  (draw-list 150 170 (state hid)    2 30 50 48)
  (draw-list 100 240 (state input)  4 30 50 48) )
(new-window)            ; now open a graphic window
(set-disp-net)          ; enable display
(learn 16)              ; do a couple iterations, look at the graphic screen.

The network editor ``NetTool'' automatically generates draw-net functions. These functions create an interactive display window. Clicking outside the network refreshes the window and displays the states of the network. Clicking on a neuron displays the weights of the connections leading to that neuron; the other neurons are turned gray (corresponding to weight 0).



8.1.2.0.2.10.0.13. (plot-error n err)
(netenv.sn)


Function plot-error plots the output error for the current pattern on a graph. It is called by certain functions disp-XXX . The default function plots the instantaneous error in the plot-port disp-error-port .



8.1.2.0.2.10.0.14. (print-layer l)
(netenv.sn)


Very simple textual description of the states of a layer.



8.1.2.0.2.10.0.15. (prsyn)
(netenv.sn)


Very simple textual description of the connections of the network.



8.1.2.0.2.10.0.16. (prneur)
(netenv.sn)


Very simple textual description of the states of the network.



8.1.2.0.2.10.0.17. (prnet)
(netenv.sn)


Very simple textual description of a network.



8.1.2.0.2.10.1. Monitoring Error and Performance in Netenv.




8.1.2.0.2.10.1.0. (init-error-plotting nsweep maxerr)
(netenv.sn)


This function creates window error-plotting-window if it does not exist already. It then creates two plot ports for plotting the mean squared error. These plot ports are named training-error-port and test-error-port .

As soon as these plot ports exist, functions run and trun plot an error curve in this window.



8.1.2.0.2.10.1.1. (init-perf-plotting nsweep)
(netenv.sn)


This function creates window perf-plotting-window if it does not exist already. It then creates two plot ports for plotting the performance. These plot ports are named training-perf-port and test-perf-port .

As soon as these plot ports are defined, functions run and trun plot a performance curve in this window.
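
For instance (the number of sweeps and the error bound are arbitrary):

(init-error-plotting 50 1)    ; plot the mean squared error over 50 sweeps
(init-perf-plotting 50)       ; plot the recognition performance as well
(trun 320 50)                 ; run and trun now update both windows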

Remark: You can destroy these error and performance windows, either by using your window manager or by typing:

(delete perf-plotting-window)
(delete error-plotting-window)

Plotting is then cancelled.



8.1.2.0.2.10.2. Netenv Miscellaneous Functions.




8.1.2.0.2.10.2.0. (all-neuron (i) body)
(netenv.sn)


Evaluates body with i taking all possible unit indices.



8.1.2.0.2.10.2.1. (all-synapse (i j) body)
(netenv.sn)


Evaluates body with i and j taking all possible values for the upstream and downstream units of a connection.
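
As an illustration, the following sketch counts the connections of the current network with this macro:

(let ((n 0))
  (all-synapse (i j) (incr n))
  n )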



8.1.2.0.3. Building Connections in Netenv.




8.1.2.0.3.0. Introduction to Building Connections in Netenv.


The functions introduced in this section are located in the file "connect.sn" which is automatically loaded on startup. They make the creation of complex networks easier by providing lisp functions which connect layers according to various connectivity patterns.

These functions have been designed for both efficiency and simplicity. They only use the two low-level functions connect and dup-connection .

See: Connections in Netenv.




8.1.2.0.3.1. Functions for Locally Connecting Layers in Netenv.


A first class of functions creates local connections. The first arguments always describe the upstream and downstream layers and their sizes.



8.1.2.0.3.1.0. (local-1d-connect layer1 n1 layer2 n2 step size)
(connect.sn)


This function creates local connections between layer1 (a list of n1 cells) and layer2 (a list of n2 cells). The cells of layer2 are connected to a window of size cells from layer1 , stepped by step cells.

Function local-1d-connect complains if n1 != step*n2+size-step .
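
For instance, a 12-unit layer can feed a 5-unit layer through windows of 4 cells stepped by 2 cells, since 2*5 + 4 - 2 = 12 . In the following sketch, layer1 and layer2 are assumed to be previously created lists of 12 and 5 units:

(local-1d-connect layer1 12 layer2 5 2 4)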



8.1.2.0.3.1.1. (local-2d-connect layer1 row1 col1 layer2 row2 col2 xstep ystep xsize ysize)
(connect.sn)


This function creates local connections between layer1 (a list of row1*col1 cells) and layer2 (a list of row2*col2 cells). Cells in layer1 and layer2 are listed by row order:
(r1c1 r1c2 r1c3...r1cN r2c1 r2c2...rMcN).

Cells of layer layer2 are connected to a window of xsize*ysize cells from layer1 , stepped by xstep and ystep cells. Function local-2d-connect complains if row1 != ystep*row2+ysize-ystep or col1 != xstep*col2+xsize-xstep .



8.1.2.0.3.1.2. (local-toric-connect layer1 row1 col1 layer2 row2 col2 xstep ystep xsize ysize)
(connect.sn)


This function is almost identical to function local-2d-connect : it creates local connections between layer1 (a list of row1*col1 cells) and layer2 (a list of row2*col2 cells). Cells of layer layer2 are connected to a window of xsize*ysize cells from layer1 , stepped by xstep and ystep cells.

However, side effects are handled in a different way: when using the function local-2d-connect , cells in the periphery of layer1 have fewer downstream connections than the central cells. Input data thus are best located in the center of the input layer.

In certain cases, it is easier to use the function local-toric-connect whose sliding window wraps around layer1 .

Layer layer1 always has col2*xstep columns and row2*ystep rows. The input window of the rightmost cells of layer2 thus wraps around to the left of layer1 . The input window of the bottommost cells similarly wraps around to the top.



8.1.2.0.3.2. Functions for Creating Shared Weights Masks in Netenv.


All the local connection functions have an equivalent function for performing shared weights connections. Of course, these functions require kernel sn28ite or sn28itenew.



8.1.2.0.3.2.0. (mask-1d-connect layer1 n1 layer2 n2 step size)
(connect.sn)


This function creates shared weight connections between layer1 (a list of n1 cells) and layer2 (a list of n2 cells). Cells of layer2 are connected to a window of size cells from layer1 , stepped by step cells, always using the same set of shared weights.



8.1.2.0.3.2.1. (mask-2d-connect layer1 row1 col1 layer2 row2 col2 xstep ystep xsize ysize)
(connect.sn)


This function creates shared weights connections between layer1 (a list of row1*col1 cells) and layer2 (a list of row2*col2 cells). Cells of layer layer2 are connected to a window of xsize*ysize cells from layer1 , stepped by xstep and ystep cells, always using the same set of shared weights.



8.1.2.0.3.2.2. (mask-toric-connect layer1 row1 col1 layer2 row2 col2 xstep ystep xsize ysize)
(connect.sn)


This function is almost identical to mask-2d-connect: it creates shared weights connections between layer1 (a list of row1*col1 cells) and layer2 (a list of row2*col2 cells). Cells of layer layer2 are connected to a window of xsize*ysize cells from layer1 , stepped by xstep and ystep cells.

However, side effects are handled as in function local-toric-connect .



8.1.2.0.3.2.3. (equal-mask-1d-connect layer1 n1 layer2 n2 step size)
(connect.sn)


This function is basically similar to the mask-1d-connect function, except that all the connections share the same single weight. Function equal-mask-1d-connect might be used for implementing averaging/smoothing operations on signal data.



8.1.2.0.3.2.4. (equal-mask-2d-connect layer1 row1 col1 layer2 row2 col2 xstep ystep xsize ysize)
(connect.sn)


This function is basically similar to the mask-2d-connect function, except that all the connections share the same single weight. Function equal-mask-2d-connect might be used for implementing averaging/smoothing operations on image data.

Example:

; divide the resolution of a 16x16 layer 'layer1' by 2 into a 8x8 layer 'layer2'
(equal-mask-2d-connect layer1 16 16
                       layer2  8  8
                       2  2  2  2 )



8.1.2.0.3.2.5. (tdnn-connect layer1 nframes1 ntraits1 layer2 nframes2 ntraits2 step size)
(connect.sn)


This function helps implement Time Delay Neural Networks. Cells in lists layer1 and layer2 are ordered by trait:
(t1f1 t2f1 t3f1...t1f2 t2f2...).

Each frame of layer2 is connected to a sliding window in layer1 of size frames, stepped by step frames, always using the same set of weights.



8.1.2.0.3.2.6. (mask-2d-reverse layer1 row1 col1 layer2 row2 col2 xstep ystep xsize ysize)
(connect.sn)


This function creates shared weights connections between layer1 (a list of row1*col1 cells) and layer2 (a list of row2*col2 cells). Cells of layer layer1 are connected to a window of xsize*ysize cells from layer2 , stepped by xstep and ystep cells, always using the same set of shared weights.

Function mask-2d-reverse complains if row2 != ystep*row1+ysize-ystep or col2 != xstep*col1+xsize-xstep .



8.1.2.0.3.3. Functions for Creating Shared Biases in Netenv.


Shared weights usually are useful for designing translation invariant networks. Such networks also require the biases to be invariant.

Function build-net , however, always creates a separate bias per unit. The solution consists of using function build-net-nobias instead of function build-net . It uses the same arguments, but no biases are created.

The following functions let the user create biases by hand.



8.1.2.0.3.3.0. (bias-connect layer)
(connect.sn)


Creates a separate bias for each cell in list layer.



8.1.2.0.3.3.1. (shared-bias-connect layer)
(connect.sn)


Creates a single bias shared by all the cells in list layer.



8.1.2.0.3.3.2. (tdnn-bias-connect layer nframes ntraits)
(connect.sn)


Creates one bias per trait, shared by all the cells of that trait, for layer layer which is composed of nframes*ntraits units.



8.1.2.1. SN Tools.




8.1.2.1.0. Introduction to SN Tools.


The Netenv library is mainly used for simulating multilayer networks. Indeed, SN2.8 contains efficient heuristics for training large multilayer networks using several variants of the back-propagation algorithm. This task is made even easier by two dedicated graphical interfaces, ``NetTool'' and ``BPTool''.

This manual describes all the functionalities of these tools.



8.1.2.1.1. NetTool.




8.1.2.1.1.0. Overview of NetTool.


NetTool is a graphic network editor dedicated to multilayer networks.

The NetTool window appears when function NetTool is run. If a function NetTool-net is defined, the corresponding network is loaded into the editor.

The NetTool window includes a menu bar and a workspace controlled by two scrollbars.

Using NetTool you can build a network using all the connection patterns defined in the library "connect.sn" . NetTool deals with two kinds of objects: layers and layer-to-layer connections.

The NetTool editor generates three lisp functions: NetTool-net , create-net and draw-net .

These functions can be used by the interpreter or saved into a file.



8.1.2.1.1.0.0. (NetTool)
(NetTool.sn)


This autoload function loads the libraries "ogre.sn" and "NetTool.sn" once and creates the NetTool interface.



8.1.2.1.1.1. Loading and Saving a Network Architecture in NetTool.


There are four items in menu Network .

Item "New Network" clears the workspace and initializes the network editor. If the current network has not been saved into a file, a confirmation dialog is displayed.

Item "Load Network" pops up a dialog for loading a network description file. A mouse click on button "Load Network" of the requester reads the specified network description file. A network description file is just a file containing three SN functions: NetTool-net , create-net and draw-net . NetTool extracts the network architecture from the first function, NetTool-net .

Item "Save or Create Network" pops up a dialog for saving a network description file or creating a network.

Item "Quit" closes the network editor window. If unsaved changes have been performed on the current network, a confirmation dialog is displayed.



8.1.2.1.1.2. Layers in NetTool.


Layers are managed from menu Layer . There are also four items in this menu for creating, editing, duplicating and deleting layers:



8.1.2.1.1.2.0. Layers Creation in NetTool.


Selecting New in menu Layers pops up the layer editing dialog.

Several parameters can be specified.

The name of the layer is controlled by an editstring. A default unique name is provided.

The size of the layer is controlled by an editstring.

The biases of the units in the layer are controlled by four exclusive buttons.

The display mode of the layer is controlled by three exclusive buttons. This mode is only useful for generating function draw-net .

The new layer appears in the workspace when you click button "Ok" .



8.1.2.1.1.2.1. Layers Manipulation in NetTool.


Layers are symbolized by a decorated rectangle and a label. The size and the location of the rectangle denote the size and location of the layer drawn by function draw-net .

Four zones can be activated:



8.1.2.1.1.2.2. Layers Edition in NetTool.


Layers are edited with the layer editing dialog, which is also used for creating layers.

This dialog can be popped up to edit a given layer once that layer has been selected by clicking on its label: then either choose item Edit of menu Layer or click on the label again.

See: Creating Layers in NetTool.




8.1.2.1.1.2.3. Layers Deletion in NetTool.


Layers are deleted after they have been selected by choosing item Delete of menu Layer . A confirmation dialog is popped up before the deletion takes effect.



8.1.2.1.1.2.4. Layers Duplication in NetTool.


Layers are duplicated after they have been selected by choosing item Duplicate of menu Layer or typing key d .

The name of the new layer is derived from the name of the original layer. The parameters of the new layer are the same as those of the original layer. New connections are created similarly to the connections of the original layer.



8.1.2.1.1.3. Connections in NetTool.


Layer-to-layer connections are created, deleted and edited with the connection edition dialog.



8.1.2.1.1.3.0. Creating a Connection.


A connection is created by clicking in the connection handle of the upstream layer (the gray triangle on the right side of the layer layout) and dragging the mouse to the downstream layer.

After the initial click in the connection handle, while the mouse is moved with its button still pressed, a dashed line is drawn between the connection handle and the mouse position.

When the mouse button is released on the downstream layer, the connection edition dialog appears and lets you specify a pattern of connections among those offered by the functions of library "connect.sn" .

Menu Full / Local proposes full and local connection patterns.

Menu Shared proposes shared weight connection patterns. Networks using these connection patterns require kernels sn28ite or sn28itenew.

When you select certain patterns of connection, you must enter additional parameters in the fields named "Size" and "Step" . Each parameter can be a single number, like "4" , or a pair of numbers, like "3x3" . You can find more details about the standard connection patterns in the chapter describing the library "connect.sn" of the manual "Netenv".

Parameters "From" and "To" enable the user to connect a part of the upstream layer to the downstream layer. They defaults so that the whole upstream layer is connected.

The connection is created when you click on button "Ok" .

See: Building Connections in Netenv.




8.1.2.1.1.3.1. Editing and Deleting Connections.


Connections are symbolized by a line linking both layers. In the first half of the line, there is a small black square which is in fact a small button.

This button becomes white if the parameters of this connection do not match the sizes of the layers. Some connection patterns indeed require arithmetic relations between the sizes of the layers and the parameters of the connection pattern.

Clicking on this button pops up the connection editing dialog.



8.1.2.1.1.4. Conclusion to NetTool.


Both beginners and advanced programmers can take advantage of this network editor.

8.1.2.1.2. BPTool.




8.1.2.1.2.0. Overview of BPTool.


BPTool is a graphical interface for controlling the training of multilayer networks.

The BPTool window appears when function BPTool is run.

It allows you to perform the most common actions allowed by the library "netenv.sn" . It provides sensible default values for all the convergence parameters.

More precisely, BPTool provides a mouse-driven interface to:

BPTool relies on the underlying mechanisms of library Netenv. Moreover, the various menus and requesters of BPTool are equivalent to the most common functions of this library. You can even mix BPTool actions and TLisp commands.

When called the BPTool interface first proposes a file requester for setting the performance file.

BPTool contains a menu bar with three menus.

BPTool contains several logical zones.



8.1.2.1.2.0.0. (BPTool)
(BPTool.sn)


This autoload function loads the libraries "ogre.sn" and "BPTool.sn" once and creates the BPTool interface.



8.1.2.1.2.1. BPTool Menus.


Menus are arranged in a roughly logical order. You can use the menu items in order: start with menu File to load a network and a database, then specify the parameters with menu Parameters and request information displays using menu Settings .



8.1.2.1.2.1.0. BPTool File Menu.


Menu File lets you define a network, load the databases, load or save the weights.

Item "Define Network" starts the network editor NetTool. Using NetTool you can define or load a network and build it by checking item Create to Memory when you save the network description. Once the network has been created, the Network Status String displays the number of units, connections and weights in this network.

Item "Automatic Network" computes an optimized simple network with one hidden layer and full connections. This computation is made as follows:

Item "Load Patterns" pops up a requester for loading the example files. You must specify the names of two files. When you press button "Load" , both files are loaded into the variables pattern-matrix and desired-matrix using function load-patterns of the Netenv library.

Item "Load Weights" pops up a requester for loading the weights from a file using function load-net of the Netenv library.

Item "Save Weights" pops up a requester for saving the weight into a file using function save-net of the Netenv library.

Item "Quit" closes the BPTool window.



8.1.2.1.2.1.1. BPTool Parameters Menu.


Menu "Parameters" lets you select the parameters of the training algorithm.



8.1.2.1.2.1.1.0. Setting Initial Weights with BPTool.


Item "Initial weights" of menu "Parameters" pops up a requester for initializing the weights in the network.

The weights are initialized using uniform random values in a given interval which depends on a parameter x and the fanin of units (i.e. the number of incoming connections to a downstream cell).

Button "Default" sets the default initialization which uses interval random[-1/sqrt(fanin),+1/sqrt(fanin)].

Button "Ok" actually initializes the weights and pops the requester down.



8.1.2.1.2.1.1.1. Setting Initial Learning Rates with BPTool.


Item "Learning rate" of menu "Parameters" pops up a requester for setting the learning rates in the network.

Four methods are provided which depend on a parameter x , on the fanin (i.e. the number of incoming connections to a downstream cell) and on the share of units (the number of connections sharing this weight).

Button "Default" sets the default value 0.1 / sqrt(fanin)) .

Button "Ok" actually sets the learning rates.



8.1.2.1.2.1.1.2. Setting Momentums and Decays with BPTool.


Item "Momentum & Decay" of menu "Parameters" lets the user define two parameters used by the first order weight update rule.

This menu item is disabled when a second order algorithm is selected, like Newton or Levenberg-Marquardt.

Using this menu merely sets the variables alpha and decay which are used by the Netenv library. Both values default to 0.

See function update-weight of the Netenv library for more details.
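
Since this menu merely sets two TLisp variables, you can obtain the same effect from the TLisp prompt; the values below are only illustrative:

(setq alpha 0.6)      ; momentum
(setq decay 0.0001)   ; weight decay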



8.1.2.1.2.1.1.3. Setting Second Order Parameters with BPTool.


Item "Mu & Gamma" of menu Parameters lets the user define two parameters used by the second order weight update rules.

This menu item is only enabled when a second order algorithm is selected, like Newton or Levenberg-Marquardt.

Using this menu merely sets the variables gamma and mu which are used by the Netenv library. Both values default to 0.05.

See function update-ggradient and hessian-scale of the Netenv library for more details.



8.1.2.1.2.1.1.4. Setting the Global Transfer Function with BPTool.


Item "Transfer function" lets the user choose a global transfer function. It calls a dialog providing sigmoidal, piecewise linear and binary activation functions. BPTool handles only the (single) global activation function.

The default function is a sigmoid.

Button "Default" sets the default sigmoid.

Button "Draw" draws the current activation function.



8.1.2.1.2.1.2. BPTool Settings Menu.


Menu Settings lets you choose a performance criterion and select display options.



8.1.2.1.2.1.2.0. Plotting Global Performances with BPTool.


Item "Plotting" pops up a requester for setting up windows for plotting the mean squared error and the network performance.

Field "Number of sweeps" specifies the length of the plot as a number of passes over the training set.

Pressing button "Initialize Plotting" actually creates the windows and recomputes the axes using the current number of iterations and the current size of the training set.



8.1.2.1.2.1.2.1. Classification criterion in BPTool.


Item "Classify" lets the user define how the performance is computed.



8.1.2.1.2.1.2.2. Display Options in BPTool.


Item "Display" of menu Settings lets the user set up various displays updated during every training iterations or every measurement iterations. This is similar to the functions set-disp of the Netenv library.



8.1.2.1.2.2. Training with BPTool.




8.1.2.1.2.2.0. Defining the test set and the training set with BPTool.


Four fields in the ``Data Selection Zone'' let you enter the boundaries of the training set and the test set. These values are passed to functions ensemble and test-set of the Netenv library.



8.1.2.1.2.2.1. Choosing the back-propagation version with BPTool.


You must also select a training algorithm using the ``Algorithm Selection Zone''. The algorithm selection zone offers four choices of optimization algorithm: Online Gradient, Online Levenberg-Marquardt, Batch Gradient and Batch Conjugate Gradient.

Certain items are disabled when the current kernel does not support the corresponding training algorithm.

The default training algorithm is the so-called "Online Gradient Algorithm" (i.e. stochastic gradient) which is faster when you train your network on extremely redundant data, like handwritten digits.

If you are using sn28ite or sn28itenew, you can also select the "Batch Gradient Algorithm" . This algorithm is sometimes better for finishing the training and tuning the weights of a network. You must progressively decrease the learning rate as learning improves.

If you are using sn28new or sn28itenew, you can use second order optimization of the cost function using the stochastic Levenberg-Marquardt algorithm. These methods can be used with both the stochastic and batch gradient methods.

You can just specify a constant learning rate of about 0.0005 and let the algorithm adjust the step sizes. Although slower than well tuned parameters, these algorithms put the strain on your computer rather than on you.

The "Conjugate Gradient Method" is useful for deep optimization of small networks on medium size databases.



8.1.2.1.2.2.2. Controlling the training with BPTool.


Finally, the ``Training control Zone'' and ``Learning rate control zone'' let you control the training.

The editable field labelled "Test performance every --- iterations" controls the number of pattern presentations between two performance measurements. When a batch algorithm is selected, this number is rounded to the next multiple of the training set size. By default, performance measurements are done after every pass over the training set.

Three buttons let the user start the training algorithm.

On the first click, button "Break" tells BPTool to stop training at the end of the current pass over the training set. This condition is displayed in the message string. On the second click, it stops training immediately.

The BPTool window remains active while the training algorithm is running. In particular, you can use the File , Parameters and Settings menus.

Two buttons control the learning rates.

At any time you can start a measurement pass on the training set or on the test set using the following buttons.



8.1.2.1.3. Other SN Tools.


Three tools are available for data preprocessing or postprocessing.



8.1.2.1.3.0. StatTool.


This tool computes simple statistics on the values of a 2-dimensional matrix.

Item "Load" in menu File lets you load a new matrix from a file or from a lisp variable.

Button "Statistics" computes simple statistics.

Button "Histogram" draws an histogram. The histogram routine however is written in Lisp and may be disappointingly slow.



8.1.2.1.3.0.0. (StatTool [mat])
(matrixtool.sn)


This autoload function invokes the StatTool interface.



8.1.2.1.3.1. NormTool.


This tool performs a linear normalization on a 2-dimensional matrix.

There are several choices for entering the coefficients of the linear normalization.

If Column by column is checked, these normalization coefficients are computed independently for each column of the matrix.



8.1.2.1.3.1.0. (NormTool)
(matrixtool.sn)


This autoload function invokes the NormTool interface.



8.1.2.2. Knnenv.




8.1.2.2.0. Introduction to Knnenv.




8.1.2.2.0.0. Overview of Knnenv.


This manual discusses a family of algorithms known as ``codebook based algorithms'' for pattern recognition. SN2.8 offers turnkey routines for:

It also contains a set of lower level routines for manipulating codebooks and performing the essential computations of the above algorithms. In fact, all these algorithms are partly implemented by primitives hardcoded in the kernel, partly implemented as lisp programs located in file ".../sn2.8/lib/knnenv.sn" .

There is a major prerequisite for using codebook based algorithms: they rely on a simple Euclidean topology which is often too simple for handling raw data. They usually perform much better on adequately preprocessed data.

Preprocessing methods include a lot of techniques:

The algorithms themselves have been implemented with numerically robust routines whose convergence is based on well understood properties.



8.1.2.2.0.1. About this Knnenv Manual.


This manual is divided into five major parts:



8.1.2.2.1. Codebooks.




8.1.2.2.1.0. The Codebook Data Structure.


A codebook is a set of pairs composed of a vector and of a numeric label which encodes the class of the corresponding vector. Codebooks serve two purposes:

SN2.8 provides a number of fast primitives for (a) creating and accessing codebooks, (b) merging and splitting codebooks, (c) returning the closest elements of a codebook to a given pattern and (d) creating or improving a codebook of prototypes using a codebook of examples.



8.1.2.2.1.1. Codebook Creation and Access.


Codebooks are lisp objects of class |CODEBOOK|.

The patterns of a codebook are in fact stored in the rows of a matrix (the ``wordmatrix'') associated with the codebook. You can obtain this matrix either with function codebook-word or by evaluating the codebook with function eval .

The labels are stored in the codebook itself. Labels are small positive integer numbers. You can access these labels with function codebook-label .



8.1.2.2.1.1.0. (load-iris)
(knnenv.sn)


This function creates two codebooks containing Fisher's iris database. Codebook iris-a contains a training set of 100 examples. Codebook iris-t contains the total set of 150 examples (including the 100 training examples, for historical reasons only!)

This database contains examples of three species of iris.
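
For instance, the sizes can be checked from the TLisp prompt (the values below follow from the description above and from the examples throughout this chapter):

? (load-iris)
= ::CODEBOOK:150x4
? (bound (eval iris-a))
= (100 4)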



8.1.2.2.1.1.1. (new-codebook <ncode | wordmatrix> <ndim | labelvector>)


When function new-codebook is called with numerical arguments, it returns a new codebook object able to contain ncode patterns of dimension ndim. Both the patterns and the labels are initialized with zeros.
? (setq a (new-codebook 30 4))
= ::CODEBOOK:30x4
? (eval a)
= ::MAT:30x4
? (bound (eval a))
= (30 4)

When function new-codebook is called with matricial arguments, it returns an initialized codebook object. Argument wordmatrix is a 2-dimensional fmatrix whose rows contain the patterns. Argument labelvector is a 1-dimensional matrix containing the labels of the patterns stored in every row of wordmatrix .

The resulting codebook just points to the patterns stored in matrix wordmatrix . Modifying this matrix thus modifies the codebook patterns.
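
For instance, a minimal hedged sketch (assuming load-iris has been called; shared-cb is an illustrative name) builds a codebook that shares the iris-a patterns but carries a fresh copy of the labels:

;; The wordmatrix is shared with iris-a, the labels are copied
;; (codebook-label with one argument returns a new vector).
(setq shared-cb (new-codebook (codebook-word iris-a)
                              (codebook-label iris-a) ))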



8.1.2.2.1.1.2. (codebook-word cb [ n [ word ]] )


With one argument, this function returns the wordmatrix of codebook cb . This matrix addresses the same elements as the codebook. Modifying this matrix thus modifies the codebook.

With two arguments, this function returns the n -th pattern of codebook cb as a vector. This vector addresses the same elements as the codebook. Modifying this vector thus modifies the n -th pattern of the codebook.

With three arguments, this function copies vector word into the n -th pattern of codebook cb . It then returns the new pattern.

? (load-iris)
= ::CODEBOOK:150x4
? (codebook-word iris-a 2)
= ::MAT:4



8.1.2.2.1.1.3. (codebook-label cb [ n [ label ]] )


With one argument, this function returns a new vector containing the labels of each example of codebook cb . Unlike the matrices returned by function codebook-word , modifying this vector does not modify the codebook.

With two arguments, this function returns the label of the n -th example of cb .

With three arguments, this function makes the label of the n -th example of codebook cb equal to integer label. It then returns the new label.

? (load-iris)
= ::CODEBOOK:150x4
? (for (i 0 4) (print (codebook-label iris-a i)))
0
2
2
0
2
= 2



8.1.2.2.1.1.4. (codebook-labels cb)


This function returns a list of all the labels present in codebook cb .
? (load-iris)
= ::CODEBOOK:150x4
? (codebook-labels iris-t)
= (0 1 2)



8.1.2.2.1.2. Codebook Operations.


A few operations are extremely useful for manipulating codebooks as a whole.



8.1.2.2.1.2.0. (codebook-split cb min max )


This function returns a codebook composed of examples min to max of codebook cb . Note that this new codebook addresses the same patterns as cb . Modifying a pattern in the new codebook thus modifies a pattern in codebook cb .
? (load-iris)
= ::CODEBOOK:150x4
? (codebook-split iris-t 25 74)
= ::CODEBOOK:50x4



8.1.2.2.1.2.1. (codebook-merge cb-1...cb-n)


This function combines the examples of codebooks cb-1 to cb-n and returns a new codebook containing copies of these examples.
? (load-iris)
= ::CODEBOOK:150x4
? (codebook-merge iris-a iris-t)
= ::CODEBOOK:250x4



8.1.2.2.1.2.2. (codebook-select condition cb-1 ... cb-n)


This function extracts the examples of codebooks cb-1 to cb-n whose labels fulfil condition condition. It returns a new codebook containing copies of these examples.

Argument condition is a function of one argument (the label) which returns a boolean value. An example is selected if a call to this function with the example label as argument returns a non-nil result.

? (load-iris)
= ::CODEBOOK:150x4
;; Select all iris examples with label 1 in codebooks iris-a and iris-t
? (codebook-select (lambda(c) (= c 1)) iris-a iris-t)
= ::CODEBOOK:81x4



8.1.2.2.1.3. Codebook and Netenv.


If you develop an application which merges a classical network and a codebook-based algorithm, you may wish to transfer data from a codebook to a network or vice versa.



8.1.2.2.1.3.0. Conversions with the Netenv Format.


The codebook data structure is handy for storing or retrieving pattern recognition data. A single data structure contains both the patterns and the class.

The natural format used by the Netenv library is slightly different. Patterns are stored in a first matrix (the pattern matrix) identical to the wordmatrix of a codebook. Labels however are stored in a second matrix (the desired matrix) whose columns correspond to each class. The largest value in a row indicates the class of the corresponding example.



8.1.2.2.1.3.0.0. (codebook-to-patdes cb)
(knnenv.sn)


This function takes a codebook cb and returns a list of the two matrices containing the same examples under the Netenv natural format.



8.1.2.2.1.3.0.1. (patdes-to-codebook pattern-matrix desired-matrix)
(knnenv.sn)


This function takes a pattern matrix pattern-matrix and a desired matrix desired-matrix and returns a codebook containing the same examples.
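
As a minimal hedged sketch (assuming load-iris has been called; patdes and cb-again are illustrative names), the two conversion functions can be chained:

;; Convert a codebook to the Netenv matrices and back.
(setq patdes (codebook-to-patdes iris-a))   ; list (pattern-matrix desired-matrix)
(setq cb-again (patdes-to-codebook (car patdes) (car (cdr patdes))))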



8.1.2.2.1.3.1. Using Codebooks with Netenv Algorithms.


Besides these conversion functions, you might redefine functions present-pattern and present-desired of the Netenv library for handling the codebook format. These functions indeed are always called for accessing the examples.

For instance, the following functions extract the example data of a back-propagation network from a codebook pattern-codebook rather than using the classical matrices pattern-matrix and desired-matrix .

(de present-pattern(layer pnum)
    ;; Copy the pnum-th pattern of the codebook into the input layer,
    ;; then add noise to the states when input-noise is non-zero.
    (get_pattern (eval pattern-codebook) pnum layer)
    (if (0<> input-noise)
        (gauss-state layer input-noise) ) )

(de present-desired(layer pnum)
    ;; Set the desired state to +1 for the unit matching the example
    ;; label and to -1 for all other units.
    (let ((cl (codebook-label pattern-codebook pnum)))
      (each ((c desired-layer))
        (if (0= cl)
            (nval c 1)
          (nval c -1) )
        (incr cl -1) ) ) )



8.1.2.2.1.4. Codebook Files.




8.1.2.2.1.4.0. Codebook File Format.


There are two formats for codebook files.

The ascii format records the codebook data in text files. The first two lines contain the number of examples ncode and the size of the patterns ndim . Each pattern y is then recorded as a line of numbers separated by blanks, and the corresponding label is recorded as an integer number on the following line.

        <ncode>
        <ndim>
        <y00> <y01> <y02> ...
        <label-0>
        <y10> <y11> <y12> ...
        <label-1>
        ...
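
For instance, a hypothetical ascii codebook file holding two 3-dimensional examples with labels 0 and 1 would read:

        2
        3
        0.5 1.0 -0.25
        0
        1.5 0.0 2.0
        1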
The binary format is almost identical. The number of examples <ncode>, the size of the patterns <ndim> and the labels are stored as four byte integers (int). The patterns are stored as vectors of single precision numbers (float).

CODEBOOK FILE:
 int ncode;
 int ndim;
 REPEAT ncode TIMES:
 float pattern[NDIM];
 int labels;



8.1.2.2.1.4.1. Codebook I/O Functions.




8.1.2.2.1.4.1.0. (save-codebook cb filename)


This function saves a codebook cb into a file named filename under the binary codebook format.



8.1.2.2.1.4.1.1. (save-ascii-codebook cb filename)


This function saves a codebook cb into a file named filename under the ascii codebook format.



8.1.2.2.1.4.1.2. (load-codebook filename)


This function loads and returns a codebook from the file named filename using the binary codebook format. Unlike function load-matrix , this function never checks the type of the data file.
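
A minimal hedged round-trip sketch (assuming load-iris has been called; "iris-train" is an illustrative file name; the ".cb" suffix is handled automatically, as in the go example program at the end of this manual):

(save-codebook iris-a "iris-train")              ; writes file "iris-train.cb"
(setq iris-a-copy (load-codebook "iris-train"))  ; reads it back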



8.1.2.2.1.4.1.3. (load-ascii-codebook filename)


This function loads and returns a codebook from the file named filename using the ascii codebook format. Unlike function load-matrix, this function never checks the type of the data file.



8.1.2.2.2. k-Nearest Neighbors.


In this chapter, we describe elementary functions for using codebooks for quantization and classification tasks. The basic algorithm consists in extracting the k closest examples of a codebook to a given pattern.



8.1.2.2.2.0. Distances in Knnenv.


The very notion of closest examples depends on the selected distance. For efficiency reasons, we have implemented functions using the Euclidean distance. In many cases a non Euclidean topology can be mapped to a Euclidean topology by an adequate preprocessing of the patterns.

Consider for instance the Mahalanobis distance:

Mahalanobis(x,y) = (x-y)^T A (x-y)

where matrix A is the inverse of the positive definite covariance matrix of the examples. By diagonalizing matrix A , we obtain A = P^T L P where L is a diagonal matrix with positive terms. If we denote as R(L) the diagonal matrix whose terms are the square roots of the terms of matrix L , then (x-y)^T A (x-y) = ( R(L) P (x-y) )^T ( R(L) P (x-y) ) , that is:

Mahalanobis(x,y) = Euclidean( R(L) P x, R(L) P y)

Actually A is often the inverse of the covariance matrix plus a small regularizing term.



8.1.2.2.2.1. KNN for Vector Quantization.


Vector quantization consists in replacing a pattern by its closest pattern in a codebook. This technique changes a continuous problem (working on a continuous pattern space) into a discrete problem (working on a finite set of codes). It has been used extensively in several areas such as speech processing.



8.1.2.2.2.1.0. (codebook-distances vector cbref)


This function returns a 1-dimensional matrix containing the Euclidean distances between the vector vector and all patterns in codebook cbref . Argument vector must be a 1-dimensional matrix whose size matches the size of the examples in codebook cbref .



8.1.2.2.2.1.1. (knn vector cbref k)


This function returns a list containing the order numbers of the k closest patterns of codebook cbref to vector vector . Argument vector must be a 1-dimensional matrix whose size matches the size of the examples in codebook cbref .
? (load-iris)
= ::CODEBOOK:150x4
? (knn (codebook-word iris-t 0) iris-a 5)
= (0 73 59 56 87)
? (all ((n result)) (codebook-label iris-a n))
= (0 0 0 0 0)



8.1.2.2.2.2. KNN for Pattern Recognition.


As suggested by the previous example, we can assign a class to a new pattern by selecting the most frequent label among the labels of the k closest examples of a codebook. This is easily achieved by the following functions:



8.1.2.2.2.2.0. (codebook-dist-and-labels vector cbref)
(knnenv.sn)


This function returns a 2-dimensional matrix with 2 columns. The first column contains the Euclidean distances between the vector vector and all patterns in codebook cbref . The second column contains the labels of the corresponding patterns in codebook cbref . Argument vector must be a 1-dimensional matrix whose size matches the size of the examples in codebook cbref .



8.1.2.2.2.2.1. (knn-class vector cbref k)


This function retrieves in codebook cbref the k closest examples to vector vector and returns the most frequent label. If two labels are equally represented, this function returns -1.
? (load-iris)
= ::CODEBOOK:150x4
? (knn-class (codebook-word iris-t 0) iris-a 5)
= 0

This pattern recognition method is referred to as ``the kNN algorithm''.



8.1.2.2.2.2.2. (test-knn cbdata n cbref k)
(knnenv.sn)


This function extracts the n -th pattern of codebook cbdata , retrieves in codebook cbref the k closest examples to this pattern and tests whether the most frequent label is equal to the n -th label of codebook cbdata . If so, it returns t ; otherwise it returns () .
? (load-iris)
= ::CODEBOOK:150x4
? (test-knn iris-t 0 iris-a 5)
= t



8.1.2.2.2.2.3. (perf-knn cbdata cbref k [flag])


This function applies the kNN algorithm to every pattern of codebook cbdata using the prototype codebook cbref . It then compares the result with the label stored in codebook cbdata and returns the percentage of correct classification.
? (load-iris)
= ::CODEBOOK:150x4
? (perf-knn iris-t iris-a 2)
= 94

When the optional argument flag is t , this function returns a list (mis rej) which indicates the misclassification rate mis and the rejection rate rej . A rejection occurs when two labels are equally represented among the k closest examples.

? (perf-knn iris-t iris-a 2 t)
= (1.33333 4.66667)

The actual performance of the kNN algorithm improves when the number of references increases. Given a reference codebook size, the optimal value for parameter k is probably related to the amount of noise present in the data.



8.1.2.2.2.2.4. (confusion-knn cbdata cbref k)
(knnenv.sn)


This function applies the kNN algorithm to every pattern of codebook cbdata using the prototype codebook cbref . It then compares the result with the label stored in codebook cbdata and returns a matrix containing the number of examples of a given class recognized as members of a second class.
? (load-iris)
= ::CODEBOOK:150x4
? (confusion-knn iris-t iris-a 1)
= [[50.00 0.00 0.00]
 [ 0.00 48.00 2.00]
 [ 0.00 0.00 50.00]]

In this matrix, each row denotes the actual class and each column denotes the class returned by the kNN algorithm.



8.1.2.2.3. Codebook Algorithms.


Several issues must be addressed for using the nearest neighbour methods efficiently. On the one hand, the performance of these algorithms depends crucially on the size of the codebook. On the other hand, large codebooks slow down the search for the nearest neighbours.

This chapter presents several algorithms for computing, from large codebooks, small codebooks with good properties. There are several families of algorithms:



8.1.2.2.3.0. Codebook Editing Algorithms.


Editing algorithms select a limited number of examples from a large codebook in order to maintain or improve the rate of correct classifications. We must then select patterns which implement a good piecewise linear approximation of the Bayes boundary.

Certain patterns indeed seem less informative:

Here are a few remarks for using these algorithms:



8.1.2.2.3.0.0. (multiedit cb n [verbose])
(knnenv.sn)


This function applies algorithm multiedit to codebook cb and returns the multiedited codebook. Here is the algorithm multiedit :

The parameter n controls the grain of the multiedit algorithm. It must lie between 3 and the square root of the codebook size. Using larger n eliminates more examples. We suggest starting with the smallest legal value: 3.

When the flag verbose is t , a message is displayed before each iteration. This message indicates the current size of the multiedited codebook.

? (load-iris)
= ::CODEBOOK:150x4
? (perf-knn iris-t iris-a 1)
= 98.6667
? (setq miris-a (multiedit iris-a 3))
= ::CODEBOOK:92x4
? (perf-knn iris-t miris-a 1)
= 96.6667



8.1.2.2.3.0.1. (condense cb [verbose])
(knnenv.sn)


This function applies the condense algorithm to codebook cb and returns the condensed codebook. Here is the condense algorithm:

When the flag verbose is t , a message is displayed before each iteration. This message indicates the current size of the condensed codebook.

? (setq ciris-a (condense iris-a))
= ::CODEBOOK:17x4
? (perf-knn iris-t ciris-a 1)
= 98.6667

When the data sets are noisy, you should use multiedit before condense. Algorithm condense indeed would keep a lot of noisy patterns. The situation is less obvious on clean data sets. For example, the 17 examples selected by the condense algorithm achieve the full performance of the iris training set.



8.1.2.2.3.1. Codebook Clustering Algorithms.


Clustering algorithms compute a codebook whose prototypes xi fulfil an optimality criterion. The main clustering algorithm in SN2.8 is the k-Means algorithm which minimizes the quantization error:
Q = Expectation( Min_i (x - xi)^2 )

This quantity measures the average distance between a pattern and the closest prototype in the reference codebook.



8.1.2.2.3.1.0. (codebook-distorsion cbdata cbref)


This function returns the average distance between a pattern of codebook cbdata and the closest pattern in codebook cbref . This is exactly the quantity Q described above.
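
A minimal hedged sketch (assuming load-iris has been called): since the first ten training examples form a subset of the whole training codebook, the second distortion is never larger than the first.

(print (codebook-distorsion iris-t (codebook-split iris-a 0 9)))  ; 10 references
(print (codebook-distorsion iris-t iris-a))                       ; 100 references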



8.1.2.2.3.1.1. (k-means cbdata k)


This function returns a codebook of k prototypes which minimizes the quantization distortion Q computed over codebook cbdata . This codebook is computed using the stochastic k-Means algorithm:

The k-Means algorithm does not depend on the labels of the example codebook. The default labels of the k references of the resulting codebook are those of the first k references of cbdata . It is thus useful to compute optimal labels using the function codebook-assign-labels .



8.1.2.2.3.1.2. (codebook-assign-labels cbdata cbref [k])
(knnenv.sn)


This function assigns labels to the examples of codebook cbref in order to minimize the misclassification rate of the kNN algorithm applied to the examples of codebook cbdata. This is achieved by a voting procedure. The default value for k is 1.
? (load-iris)
= ::CODEBOOK:150x4
? (setq kiris-a (k-means iris-a 12))
= ::CODEBOOK:12x4
? (codebook-assign-labels iris-a kiris-a 1)
= ::CODEBOOK:12x4
? (perf-knn iris-t kiris-a 1)
= 96



8.1.2.2.3.1.3. (k-means-class cbdata kpc [classes])
(knnenv.sn)


There is however a much better alternative for optimizing the pattern recognition performance. Function k-means-class computes a codebook containing kpc prototypes per class. These prototypes are computed by running algorithm k-Means on codebook cbdata within each class.
? (load-iris)
= ::CODEBOOK:150x4
? (setq kiris-a (k-means-class iris-a 2))
= ::CODEBOOK:6x4
? (perf-knn iris-t kiris-a 1)
= 96.6667

The optional argument classes restricts this algorithm to the examples whose labels belong to a given list. If this argument is omitted, all the examples are considered. Important note: you should always test this astounding ``supervised k-Means'' algorithm.



8.1.2.2.3.2. Codebook Iterative Algorithms.


Iterative algorithms try to improve the patterns of a small codebook for decreasing the misclassification rate. The main iterative algorithms are the Learning Vector Quantization (LVQ) algorithms introduced by Kohonen in the middle of the eighties.

The LVQ algorithms always require a good initial codebook of prototypes returned by the k-Means or supervised k-Means algorithms. When a large number of clean examples are available, a few iterations of the LVQ algorithms improve the pattern recognition performance.



8.1.2.2.3.2.0. Initializing a Codebook.




8.1.2.2.3.2.0.0. (init-lvq cbdata k)
(knnenv.sn)


This function returns a good initial codebook with k examples for the LVQ algorithms. This codebook is computed by running the k-Means algorithm (see function k-means ) on the example codebook cbdata . The labels are assigned using a voting procedure (see function codebook-assign-labels ).
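
A minimal hedged sketch using the learn-lvq function described below (assuming load-iris has been called; the number of references and the iteration count are only illustrative):

(setq lref (init-lvq iris-a 12))   ; 12 references: k-Means centers plus voted labels
(learn-lvq 1 iris-a lref 20)       ; 20 LVQ1 passes over the training set
(print (perf-knn iris-t lref 1))   ; 1-NN performance of the improved references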



8.1.2.2.3.2.0.1. (init-lvq-class cbdata kpc)
(knnenv.sn)


This function returns an even better initial codebook with kpc examples per class by running the supervised k-Means algorithm (see function k-means-class above) on the example codebook cbdata . This initial codebook is so good that LVQ algorithms seldom improve the misclassification rate...



8.1.2.2.3.2.1. Codebook Learning Vector Quantization Algorithms.




8.1.2.2.3.2.1.0. (learn-lvq version cbdata cbref niter [ a0 [ win ]])


This single function implements the three major LVQ algorithms for improving the prototype codebook cbref using the example codebook cbdata .

Argument niter specifies the number of iterations of the LVQ algorithm to perform. Argument version must be 1, 2 or 3; it selects the LVQ1, LVQ2 or LVQ3 algorithms.

Algorithm LVQ1 iterates niter times over the examples of codebook cbdata . For each example x , it retrieves the closest pattern x* in codebook cbref .

For each example x , algorithm LVQ2 retrieves the two closest patterns x* and x** in the prototype codebook.

then the closest prototype pattern x* is pushed away from the example pattern x by quantity epsi * (x-x*) and the second closest prototype pattern x** is pulled towards the example pattern by quantity epsi * (x-x**) .

For each example x , algorithm LVQ3 retrieves the closest pattern x* and the closest pattern x+ with the right label in the prototype codebook.

It has been proven that algorithm LVQ3 minimizes the misclassification error on the example codebook when the window size win decreases slowly towards 0 .

Argument a0 indicates the initial value of the step size. The step size decreases linearly from value a0 to 0 . The initial step size defaults to 0.1 , which is usually a good value for all problems.

Argument win specifies the ``window'' size for the LVQ2 and LVQ3 algorithms. The default window size 0.01 however is too small to allow significant modification of the prototype patterns. You should select a higher window size, and use the LVQ3 algorithm for fine tuning the patterns.

? (load-iris)
= ::CODEBOOK:150x4
? (setq liris-a (init-lvq-class iris-a 2))
= ::CODEBOOK:6x4
? (perf-knn iris-t liris-a)
= 96.6667
? (learn-lvq 3 iris-a liris-a 20 0.1 0.1)
= ()
? (perf-knn iris-t liris-a)
= 97.3333


8.1.2.2.4. Radial Basis Functions in Knnenv.




8.1.2.2.4.0. Overview of Radial Basis Functions in Knnenv.


Radial Basis Function networks have been shown to be quite efficient for multivariate regression and classification.

They were introduced by Broomhead and Lowe in 1988 and popularized later by Moody.

An RBF net is made of three layers:

fk(x) = g b( f( (x - wk)^2 ; s^2 ) )


8.1.2.2.4.1. Implementation of Radial Basis Functions as a Hybrid Algorithm.


This chapter presents an example of neural network algorithm using both codebooks and standard networks.

In order to take advantage of the capabilities of both the Netenv and Knnenv libraries, we have designed a hybrid system. The computation of the distances (x - wi)^2 is implemented by a codebook. The bell shaped function and the output units are implemented as a multilayer network.

The glue between both libraries is implemented by redefining locally the functions present-pattern and present-desired .

These functions extract an example from a codebook cbdata , compute the distances to the centers of the Euclidean units (see function codebook-distances ) and enter these distances as input to the multilayer network.

The first layer of the multilayer network multiplies the distances by the inverse squared widths 1/s^2 and applies the bell shaped function. The second layer of the multilayer network computes the linear outputs.

The corresponding lisp code is located in file "sn2.8/lib/knnenv.sn" .



8.1.2.2.4.1.0. (init-rbf cbdata kpc)
(knnenv.sn)


This function computes an initial codebook containing kpc prototypes per class by running the supervised k-Means algorithm on the example set cbdata. Function init-rbf then builds the multilayer network using the Netenv functions and initializes the weights and the learning rates.

In order to implement the connection between the functions of Knnenv and the functions of NetEnv, this function generates functions rbf-present-pattern and rbf-present-desired and variables rbf-codebook and rbf-classes .



8.1.2.2.4.1.1. rbf-codebook
[VAR] (knnenv.sn)


The global variable rbf-codebook contains the codebook containing the weights of the first layer of the RBF network.

It is created by the function init-rbf for implementing the glue between the functions of Knnenv and the functions of NetEnv. It is automatically handled by learn-rbf and perf-rbf .



8.1.2.2.4.1.2. rbf-classes
[VAR] (knnenv.sn)


The global variable rbf-classes contains the list of labels present in the example codebook. The multilayer network contains one output per class in this list.

It is created by the function init-rbf for implementing the glue between the functions of Knnenv and the functions of Netenv. It is automatically handled by learn-rbf and perf-rbf .



8.1.2.2.4.1.3. (rbf-present-pattern layer pnum)
(knnenv.sn)


Function rbf-present-pattern extracts the pnum-th pattern of codebook cbdata , computes the distances to the centers of the Euclidean units and enters these distances into the states of the input layer layer .

It is created by the function init-rbf for implementing the glue between the functions of Knnenv and the functions of Netenv. It is automatically handled by learn-rbf and perf-rbf .



8.1.2.2.4.1.4. (rbf-present-desired layer pnum)
(knnenv.sn)


Function rbf-present-desired extracts the pnum-th label of codebook cbdata and stores the corresponding desired output vector into the states of the desired layer layer .

It is created by the function init-rbf for implementing the glue between the functions of Knnenv and the functions of Netenv. It is automatically handled by learn-rbf and perf-rbf .



8.1.2.2.4.1.5. (learn-rbf cbdata niter [epssigma] [epsclass])
(knnenv.sn)


This function performs niter back-propagation epochs on the multilayer network using the examples stored in codebook cbdata . The optional arguments epssigma and epsclass let the user enter learning rates for the two layers of the multilayer network.



8.1.2.2.4.1.6. (perf-rbf cbtest)
(knnenv.sn)


This function tests the current RBF network against the examples of codebook cbtest . It prints the mean squared error and the misclassification rate.
? (load-iris)
= ::CODEBOOK:150x4
? (init-rbf iris-a 4)
= t
? (learn-rbf iris-a 20)
= t
? (perf-rbf iris-t)
------------------------------------------------------
patterns{0-149}, age=2000, error=0.0895082, performance=92
= 92



8.1.2.2.5. Topological Maps in Knnenv.


A generic Topological Maps package has been added to the Knnenv library. The following sections provide

More details on the Topological Map algorithms are given by the following texts:



8.1.2.2.5.0. Overview of Topological Maps in Knnenv.


Like the other algorithms of the Knnenv library, both the databases and the Topological Maps reference points are stored as codebooks. There are mostly three differences between the k-Means and the Topological Map algorithms:

A single training function controls all these parameters. In fact this training function takes various hook functions as arguments. These hook functions are called for finding the neighboring references, for scheduling the decrease of the learning rate and the neighborhood size, and for producing various displays. Several predefined hook functions are available.

Topological Maps thus interact with all features of Knnenv. For instance, function codebook-assign-labels assigns classes to the references of a Topological Map. You can then use functions knn and perf-knn for a pattern recognition task. Although using a Topological Map for pattern recognition is a common practice, you should bear in mind that the Topological Map is an unsupervised algorithm. It is usually less efficient for pattern recognition than a supervised algorithm like k-means-class or learn-lvq .
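
For instance, a minimal hedged sketch (cbdata denotes a labelled example codebook and cbref a trained map, both assumed to exist):

(codebook-assign-labels cbdata cbref)   ; vote a class label onto each map reference
(print (perf-knn cbdata cbref 1))       ; 1-NN performance with the labelled map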



8.1.2.2.5.1. Programming Topological Maps in Knnenv.


Programming Topological Maps requires four steps:

8.1.2.2.5.1.0. Creating and Initializing the Codebooks for 2D Topological Maps.


The Topological Map databases are stored in a codebook. This codebook might be created with new-codebook or loaded from a file with load-codebook .



8.1.2.2.5.1.0.0. (bidim-query string [xmin ymin xmax ymax])
(knnenv.sn)


This function displays string string on top of the current window. It builds a codebook by gathering the coordinates of the mouse clicks. These coordinates are scaled to fit in range [xmin,xmax]x[ymin,ymax] . If the range is not specified, the coordinates are scaled between -1 and +1 . This function returns the newly created codebook when the user clicks in the title string.

The Topological Maps reference points are also stored in a codebook. Such a codebook is easily created with the function new-codebook and might be initialized randomly. It is however more efficient to initialize the reference points using predefined patterns. The following functions let you initialize a codebook using standard patterns defined in a bi-dimensional space.

These functions only set the first two coordinates of the reference points:



8.1.2.2.5.1.0.1. (bidim-init-ring cbd nrefs)
(knnenv.sn)


Returns a codebook of nrefs references located on a circle whose radius and center are the standard deviation and mean of the points of training codebook cbd .



8.1.2.2.5.1.0.2. (bidim-init-2d cbd n m)
(knnenv.sn)


Returns a codebook of n times m references located on a grid of n lines and m columns whose extent is determined by the statistical properties of the points of codebook cbd .



8.1.2.2.5.1.1. The Topological Maps Learning Function in Knnenv.


This single function trains all kinds of topological maps. The characteristics of the map are specified by three hook functions described below.



8.1.2.2.5.1.1.0. (learn-tmap cbdata cbref niter topo-func epsi-func dist-func [display-func])
(knnenv.sn)


This function performs niter iterations of the Topological Map training algorithm. Data points are accessed in codebook cbdata . The Topological Map reference points are stored in codebook cbref . Arguments topo-func , epsi-func and dist-func are hook functions which specify the characteristics of the map.

The optional argument display-func specifies a function called after each iteration for performing various display tasks. This function takes three arguments: the data codebook cbdata , the reference codebook cbref and the iteration number.
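
A minimal hedged sketch using the predefined hook generators described below (cbdata denotes an existing data codebook, a graphic window is assumed to be current, and the grid size and iteration count are only illustrative):

(setq cbref (bidim-init-2d cbdata 6 6))   ; 36 references on a 6x6 grid
(learn-tmap cbdata cbref 500
            (tmap-topo-2d 6 6)            ; grid topology
            (tmap-epsi-lin 0.3)           ; linearly decreasing learning rate
            (tmap-dst-lin 3)              ; shrinking neighborhood
            (tmap-display-bidim-2d 6 6) ) ; optional display hook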



8.1.2.2.5.1.2. Topological Maps Predefined Hook Functions in Knnenv.


Standard hook functions are provided. The following calls generate a standard hook function for various topologies, various scheduling strategies and various simple displays.



8.1.2.2.5.1.2.0. Topology Hooks in Knnenv.




8.1.2.2.5.1.2.0.0. (tmap-topo-1d n)
(knnenv.sn)


Returns a topology hook function topo-func which defines a mono-dimensional topological map. Argument n indicates the number of nodes in the map. Such topological maps are better initialized using function bidim-init-ring .



8.1.2.2.5.1.2.0.1. (tmap-topo-ring n)
(knnenv.sn)


Returns a topology hook function topo-func which defines a mono-dimensional ring-shaped topological map. Argument n indicates the number of nodes in the map. Such topological maps are better initialized using function bidim-init-ring .



8.1.2.2.5.1.2.0.2. (tmap-topo-2d n m)
(knnenv.sn)


Returns a topology hook function topo-func which defines a bi-dimensional topological map organized as a grid with n lines and m columns. Such topological maps are better initialized using function bidim-init-2d .



8.1.2.2.5.1.2.0.3. (tmap-topo-3d n m p)
(knnenv.sn)


Returns a topology hook function topo-func which defines a tri-dimensional topological map organized as a grid with n planes, m lines and p columns.



8.1.2.2.5.1.2.0.4. (tmap-topo-kd n1...nk)
(knnenv.sn)


Returns a topology hook function topo-func which defines a k-dimensional topological map organized as a grid whose sizes are given by n1 to nk .



8.1.2.2.5.1.2.1. Topological Maps Learning Rate in Knnenv.




8.1.2.2.5.1.2.1.0. (tmap-epsi-const ieps)
(knnenv.sn)


Returns a learning rate scheduling function epsi-func which sets up a constant learning rate whose value is ieps . When the learning rate is equal to 1 , the algorithm exactly moves the closest reference point over the current data point. A value of 0.1 is often a good choice for ieps .



8.1.2.2.5.1.2.1.1. (tmap-epsi-lin ieps)
(knnenv.sn)


Returns a learning rate scheduling function epsi-func which sets up a linearly decreasing learning rate whose initial value is ieps .

When the learning rate is equal to 1 , the algorithm exactly moves the closest reference point over the current data point. A value of 0.3 is often a good choice for ieps .



8.1.2.2.5.1.2.2. Neighborhood Size.




8.1.2.2.5.1.2.2.0. (tmap-dst-const idist)


Returns a neighborhood size scheduling function dist-func which selects a constant neighborhood size. This scheduling function returns 1 for all reference nodes whose distance to the closest node is smaller than idist .



8.1.2.2.5.1.2.2.1. (tmap-dst-lin idist)


Returns a neighborhood size scheduling function dist-func which selects a linearly decreasing neighborhood size. This scheduling function returns 1 for all reference nodes whose distance to the closest node is smaller than idist times a coefficient that decreases linearly during training.



8.1.2.2.5.1.2.3. Topological Maps Display Hook in Knnenv.


Several simple display functions are provided. These functions display a projection of both the data points and the reference points onto the first two axes:



8.1.2.2.5.1.2.3.0. (tmap-display-bidim-1d [xmin ymin xmax ymax])
(knnenv.sn)


Returns a display hook function display-func for displaying in the current window a mono-dimensional Topological Map: reference points are linked by a line. Scaling is computed to accommodate points whose first two coordinates fit in range [xmin,xmax]x[ymin,ymax] . The default range is [-1,+1]x[-1,+1] .



8.1.2.2.5.1.2.3.1. (tmap-display-bidim-ring [xmin ymin xmax ymax])
(knnenv.sn)


Returns a display hook function display-func for displaying in the current window a mono-dimensional ring shaped Topological Map. Reference points are linked by a closed line. Scaling is computed to accommodate points whose first two coordinates fit in range [xmin,xmax]x[ymin,ymax] . The default range is [-1,+1]x[-1,+1] .



8.1.2.2.5.1.2.3.2. (tmap-display-bidim-2d n m [xmin ymin xmax ymax])


Returns a display hook function display-func for displaying in the current window a bi-dimensional Topological Map. The reference points are linked by a grid of n lines and m columns. Scaling is computed to accommodate points whose first two coordinates fit in range [xmin,xmax]x[ymin,ymax] . The default range is [-1,+1]x[-1,+1] .



8.1.2.2.5.1.3. How to Write Efficient Topology Hook Functions in Knnenv?


You must define a new topology hook function whenever you want to run a Topological Map with a new topology. There is clearly a right way and a wrong way to program such a hook function.

Assume that we use a Topological Map organized as a grid of three lines and four columns. The distance between two nodes of the grid will be the number of vertical or horizontal segments between these nodes.

N0 N1 N2 N3
N4 N5 N6 N7
N8 N9 N10 N11

The topology hook function topo-func must compute a matrix whose elements are the distances from a given node to any other nodes. For instance, the distance matrices for nodes N5 and N8 are:

2 1 2 3        2 3 4 5
1 0 1 2        1 2 3 4
2 1 2 3        0 1 2 3

The function tmap-topo-2d first computes a matrix with six lines and eight columns which contains the distances to the central element of the matrix, located at (3,4).

5 4 3 2 3 4 5 6
4 3 2 1 2 3 4 5
3 2 1 0 1 2 3 4
4 3 2 1 2 3 4 5
5 4 3 2 3 4 5 6
6 5 4 3 4 5 6 7

It then returns a function which copies a 3x4 submatrix into the destination matrix. This submatrix always includes element (3,4) of the big matrix at the same location as the closest node:

5 4 3 2 3 4 5 6        5 4 3 2:3:4:5 6
4 3 2:1:2:3 4 5        4 3 2 1:2:3:4 5
3 2 1:0:1:2 3 4        3 2 1 0:1:2:3 4
4 3 2:1:2:3 4 5        4 3 2 1 2 3 4 5
5 4 3 2 3 4 5 6        5 4 3 2 3 4 5 6
6 5 4 3 4 5 6 7        6 5 4 3 4 5 6 7

The hook function thus never computes a distance but copies the right section of the big matrix into the destination matrix. This copy is achieved by an efficient C routine. The user is encouraged to design custom topologies using the same efficient and flexible method.



8.1.2.2.5.2. A Complete Example of Topological Map in Knnenv.


The following example is provided on-line:



8.1.2.2.5.2.0. (tsp-demo)
(knnenv.sn)


This function implements a ring shaped Topological Map for solving the Travelling Salesman Problem. It creates a window in which you can acquire cities with mouse clicks. The learning process begins and stops automatically. Here is the text of this program:
(de tsp-demo(&optional niter (epsi 0.33))
    ;; Create window,
    ;; Acquire cities,
    ;; Initialize the topological map
    (let ((window (new-window)))
      (let* ((cbdata  (bidim-query "Enter cities."))
             (ncities (car (bound (eval cbdata))))
             (nrefs   (2* ncities))
             (cbref   (bidim-init-ring cbdata nrefs)) )

        ;; Compute the solution
        (learn-tmap cbdata cbref (or niter ncities)
                    (tmap-topo-ring nrefs)
                    (tmap-epsi-const epsi)
                    (tmap-dst-lin (2/ ncities))
                    (tmap-display-bidim-ring) )

        ;; Move the nodes on the closest city
        ;; Note that epsilon=1!
        (learn-tmap cbdata cbref 1
                    (tmap-topo-ring nrefs)
                    (tmap-epsi-const 1)
                    (tmap-dst-const 0)
                    (tmap-display-bidim-ring) ) )
      window ) )

If you look at the call to function learn-tmap, you see that:



8.1.2.2.6. Programming with Knnenv: Examples.


Examples of Knnenv programs are located in "sn2.8/examples/Knnenv/examples.sn" . Please copy this file into your directory. You can then launch the function go described below.
(de go ()
  (printf "*********LOADING IRIS DATA SETS*********\n")
  ;load learning set and test set
  (load-iris)
  ;or
  ;(setq iris_a (load_codebook "../examples/Knnenv/iris_train")) 
  ;(setq iris_t (load_codebook "../examples/Knnenv/iris_test"))
  (printf "\n\n*********TESTING 1NN METHOD*********\n")
  ;nearest-neighbor performance
  (printf "performance=%f\n" (perf-knn iris-t iris-a 1))
  (printf "Printing confusion matrix\n")
  (matrix-print (confusion-knn iris-t iris-a 1))
  (printf "\n\n*********TESTING SOME PREPROCESSING*********\n")
  (printf "Multi-editing learning set\n")
  (setq multi-edited (multiedit iris-a 3))
  (printf "1NN performance on multi-edited set =%f\n"
  (perf-knn iris-t multi-edited 1) )
  (printf "Condensing the multi-edited set\n")
  (setq condensed (condense multi-edited))
  (printf "1NN performance on condensed set = %f\n"
  (perf-knn iris-t condensed 1) )
  (printf "Condensing without multi-edit:\n")
  (setq condensed (condense iris-a))
  (printf "1NN performance after condensing learning set = %f"
  (perf-knn iris-t condensed 1) )
  (printf " \n\n********* UNSUPERVISED KMEANS*********\n")
  (printf "Learning process with 9 references\n")
  (setq kiris-a (k-means iris-a 9))
  (printf "Assign class to references\n")
  (codebook-assign-labels iris-a kiris-a)
  (printf "1NN performance = %f\n" (perf-knn iris-t kiris-a 1))
  (printf "Performances of a few test examples at random\n")
  (for (i 0 9)
    (let*((random (int (rand 0 (bound (codebook-word iris-t) 1))) )
          (answer (test-knn iris-t random kiris-a 1)) )
       (printf "testing example %3d of test set = " random)
       (if (= answer t) (printf "well-classified\n")
          (printf "misclassified\n") ) ) )
  (printf "Have a look to the labels of the 9 references\n")
  (matrix-print (codebook-label kiris-a))
  (printf "\n Note that there are:\n")
  (printf "\t2 references dedicated to class 0\n")
  (printf "\t3 references dedicated to class 1\n")
  (printf "\t4 references dedicated to class 2\n")
  (printf " \n\n********* SUPERVISED KMEANS*********\n")
  (printf "Learning with 3 references per class\n")
  (setq kiris-a (k-means-class iris-a 3))
  (printf "1NN performance = %f" (perf-knn iris-t kiris-a 1))
  (printf "\n\n********* LVQ1*********\n")
  (printf "Learning with 3 references per class\n")
  (printf "on 20 iteratiobns\n")
  (setq kiris-a (init-lvq-class iris-a 3))
  (learn-lvq 1 iris-a kiris-a 20)
  (printf "LVQ1 performance = %f" (perf-knn iris-t kiris-a 1))
  (printf "\n\n********* LVQ2*********\n")
  (printf "Learning with 3 references per class\n")
  (setq kiris-a (init-lvq-class iris-a 3))
  (learn-lvq 2 iris-a kiris-a 20)
  (printf "LVQ2 performance = %f"(perf-knn iris-t kiris-a 1))
  (printf "\n\n********* LVQ3*********\n")
  (printf "Learning with 3 references per class\n")
  (setq kiris-a (init-lvq-class iris-a 3))
  (learn-lvq 3 iris-a kiris-a 20)
  (printf "LVQ3 performance = %f" (perf-knn iris-t kiris-a 1))
  ;saving any of the above codebooks
  ;for example kiris-a is saved in file results.cb
  (save-codebook kiris-a "results")
  (printf "\n\n*********RADIAL BASIS FUNCTIONS*********\n")
  (printf "Learning with 3 nodes per class\n")
  (init-rbf iris-a 2)
  (learn-rbf iris-a 20)
  (printf "Performances of RBF algorithm\n")
  (perf-rbf iris-t)
  ;saving the RBF weights
  ;saving the codebook containing the nodes
  (save-codebook rbf-codebook "nodes")
  ;saving the nodes widths and linear weights
  (save-net "sigmaw")
  ;for loading previously saved rbf weights
  (init-rbf iris-a 2)
  (setq rbf-codebook (load-codebook "nodes"))
  (load-net "sigmaw")
  (printf "\n Remove all files built by this program\n")
  (sys "rm sigmaw.wei")
  (sys "rm nodes.cb")
  (sys "rm results.cb")
  )