Author(s): Yann LeCun

Class for a Time Delay Neural Network with 3 convolutional layers (with local connectivity). This type of network is appropriate for classifying sequences of time/frequency representations such as cepstrum, Mel-scale spectrum, and others.

(build-ccc-tdnn params freqs max-seq-len fk1 fs1 nh1 tk1 fk2 fs2 nh2 tk2 fk3 fs3 nh3 tk3)

Build a Time-Delay Neural Network for data, such as spectral sequences, in which the features have a topology. The connection tables are chosen to obtain local connections in feature space, as specified by the kernel size and stride along the frequency axis for each layer. The input to the net is assumed to be Fx1xT, where F is, for example, the number of frequency channels (spectral representation) and T is the length of the sequence. The network has 3 convolutional layers. The arguments are the following:
- <params> is an idx1-ddparam on which to allocate parameters for the layers.
- <freqs> is the number of input frequency channels.
- <max-seq-len> = maximum sequence length.
- <fk1>,<fk2>,<fk3> = sizes of frequency kernels (= width of local freq. windows)
- <fs1>,<fs2>,<fs3> = step sizes which separate the successive frequency windows
- <nh1>,<nh2>,<nh3> = number of hidden units per frequency channel, for each layer
- <tk1>,<tk2>,<tk3> = sizes of the temporal kernels, for each layer.

(tdnn-present-pattern from to mean idev temporal-window-size)

from is a Txf source matrix; to is a fx1xT' destination state (T' = T + temporal-window-size - 1); mean is an f-vector to subtract from the source; idev is an f-vector by which the source is multiplied. temporal-window-size is the length of network input that yields an output of length 1.

norm-ftdnn

Wrapper around ccc-tdnn which does input normalization and pads the input according to the network architecture.

(new norm-ftdnn n-inputs nh1 nh2 n-outputs weight-file norm-file [(-tk1 5)(-tk2 8)(-tk3 12) (-fk1 6)(-fk2 3)(-fk3 3) (-fs1 3)(-fs2 2)(-fs3 1) (max-seq-len 2000)])
[CLASS] (packages/gblearn2/ccc-tdnn.lsh)
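
The size arithmetic implied by the kernel and step arguments, and the normalize-and-pad step described for tdnn-present-pattern, can be sketched as follows. This is an illustrative Python sketch, not the Lush implementation. It rests on several assumptions that the text above does not confirm: each layer is an unpadded, stride-1 convolution in time; each layer's frequency windows slide over the windows produced by the previous layer; the mean is subtracted before multiplying by idev; and padding is done with zeros, split around the data. The function names, the pad_left parameter, and the choice of 24 input channels are hypothetical, and the singleton middle dimension of the fx1xT' state is dropped for brevity.

```python
def temporal_window_size(tks):
    """Input length that yields exactly one output frame.

    Assumption: every layer is an unpadded, stride-1 temporal convolution,
    so a layer with kernel size tk shortens the sequence by tk - 1.
    """
    return sum(tks) - len(tks) + 1


def n_windows(size, kernel, step):
    """Number of windows of width `kernel`, spaced `step` apart, fitting in `size`."""
    return (size - kernel) // step + 1


def present_pattern(src, mean, idev, window, pad_left=None):
    """Normalize a T x f source and store it as f rows of length T + window - 1.

    Assumptions of this sketch: zero padding, the placement of the data
    (pad_left), and the order of operations (subtract mean, then scale).
    """
    T, f = len(src), len(mean)
    if pad_left is None:
        pad_left = (window - 1) // 2
    out = [[0.0] * (T + window - 1) for _ in range(f)]
    for t, frame in enumerate(src):
        for j in range(f):
            out[j][pad_left + t] = (frame[j] - mean[j]) * idev[j]
    return out


if __name__ == "__main__":
    # Default temporal kernels from the norm-ftdnn signature: 5, 8, 12.
    tws = temporal_window_size([5, 8, 12])
    print("temporal window size:", tws)

    # Frequency windows per layer for a hypothetical 24-channel input,
    # using the default (fk, fs) pairs (6, 3), (3, 2), (3, 1).
    size = 24
    for fk, fs in [(6, 3), (3, 2), (3, 1)]:
        size = n_windows(size, fk, fs)
        print("frequency windows:", size)

    # A 2-frame, 2-channel source padded for a window of 3.
    state = present_pattern([[1.0, 2.0], [3.0, 4.0]], [1.0, 2.0], [0.5, 2.0], 3)
    print(state)
```

Under these assumptions, padding a length-T sequence to T + temporal-window-size - 1 frames gives one output frame per input frame, which matches the relation T' = T + temporal-window-size - 1 stated for tdnn-present-pattern.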