The goal is to cluster or categorize the input data.
Categories must be discovered by the network itself from regularities in the input data (data is unlabelled).
Can be used for data encoding and compression by replacing each data vector by the index number of its category (vector quantization).
Network architecture:
Output units compete to classify input patterns.
Only one output unit fires at a time: the one with the largest incoming activation.
Winner-take-all process can be implemented by simply picking the unit with the highest output or through lateral inhibitory connections.
Problems with "grandmother cell" representations: each category is encoded by a single output unit, so the representation is not distributed, losing one unit loses an entire category, and the output gives no measure of similarity between categories.
Simple Competitive Learning Algorithm
1. Initialize network weights to small random values.
2. Choose a pattern x from the dataset.
3. Apply the pattern to the input layer and determine the winning output unit i*, i.e. the unit whose weight vector is closest to x:
|x − wi*| ≤ |x − wi| (for all i)
where wi is the weight vector for output unit i.
4. Update the weights of the winning unit i* only:
Δwj,i* = η × ( xj − wj,i* )
wj,i*new = wj,i*old + Δwj,i*
where xj is the jth component of input vector x and wj,i* is the weight from input unit j to winning output unit i*.
5. Go to step 2 and repeat for the next input pattern until the weights stabilize.
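The steps above can be sketched in code. This is a minimal illustration, assuming NumPy; the initialization scale, learning rate, and epoch count are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def competitive_learning(X, n_units, eta=0.1, n_epochs=50):
    """Cluster the rows of X using n_units prototype (weight) vectors."""
    # Step 1: initialize weights to small random values.
    W = rng.normal(scale=0.1, size=(n_units, X.shape[1]))
    for _ in range(n_epochs):
        for x in X:  # Steps 2-3: present a pattern, find the winner.
            i_star = np.argmin(np.linalg.norm(x - W, axis=1))
            # Step 4: move only the winner's weights toward x.
            W[i_star] += eta * (x - W[i_star])
    return W
```

After training, each row of W is a prototype vector; classifying a new pattern is the same argmin over distances to the prototypes.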
A set of points in n-dimensional space divides the space into a Voronoi tessellation:
Output units correspond to prototype vectors, each of which serves as a prototype for all input patterns within its Voronoi region.
Example: categorizing a set of 2-dimensional patterns using 10 prototype vectors.
The network architecture has 10 output units. [Figures: network architecture; prototype vectors before and after learning]
Each prototype vector represents a region of the input space:
A set of prototype vectors constitutes a codebook.
Input patterns can be replaced by their codebook index in order to achieve data compression.
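To illustrate compression via codebook indices, here is a small sketch (NumPy assumed; the codebook and data values are hypothetical):

```python
import numpy as np

# A hypothetical learned codebook of prototype vectors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])

def encode(X, codebook):
    # Replace each input vector by the index of its nearest prototype.
    dists = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def decode(indices, codebook):
    # Reconstruct each vector as its prototype (lossy decompression).
    return codebook[indices]

X = np.array([[0.1, -0.1], [4.8, 5.2], [0.9, 1.1]])
idx = encode(X, codebook)  # → array([0, 2, 1])
```

Each 2-dimensional float vector is replaced by one small integer, at the cost of the quantization error between each pattern and its prototype.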
Problem of dead units: output units whose weight vectors start out far away from all input patterns and therefore never win.
Some solutions: initialize the weight vectors to randomly chosen input patterns; use "leaky learning," in which losing units are also updated but with a much smaller learning rate; give frequently winning units a "conscience" (a bias that handicaps them) so that other units get a chance to win.
Competitive learning networks in which the location of the output unit conveys information.
Output units have fixed positions within a one-, two-, or three-dimensional grid.
A topology preserving map maps points from the input space to units in the output grid in such a way as to preserve neighborhood relations.
As two input patterns get closer in input space, winning output units get closer in output grid.
Define a neighborhood function Λ(i, i*) between output units:
Λ(i, i*) = exp( −|ri − ri*|² / (2σ²) )
where ri and ri* are the positions within the output grid of units i and i*, and σ is a parameter that controls the neighborhood size.
When i = i*, Λ(i, i*) = 1, and Λ falls off with grid distance from the winner.
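The neighborhood function is typically a Gaussian in grid distance, which is consistent with Λ(i, i*) = 1 at the winner and σ controlling the neighborhood size; the Gaussian form is an assumption here. A sketch (NumPy assumed):

```python
import numpy as np

def neighborhood(r_i, r_istar, sigma):
    """Lambda(i, i*) = exp(-|r_i - r_i*|^2 / (2 sigma^2)).

    r_i, r_istar: grid positions of units i and i* (e.g. (row, col) tuples).
    """
    d2 = np.sum((np.asarray(r_i, float) - np.asarray(r_istar, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

Note that the distance is measured between unit positions in the output grid, not between weight vectors in input space.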
The parameters η (learning rate) and σ (neighborhood size) start large and are decreased during training (with a third parameter controlling the rate of decay of these two parameters).
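One common way to decrease both parameters is an exponential decay controlled by a single time constant; the decay form, the constant τ, and the initial values below are illustrative assumptions, not prescribed by the text.

```python
import math

def schedule(t, eta0=0.5, sigma0=5.0, tau=1000.0):
    """Return (eta, sigma) at training step t.

    tau is the third parameter controlling how fast both decay:
    large tau -> slow decay, small tau -> fast decay.
    """
    decay = math.exp(-t / tau)
    return eta0 * decay, sigma0 * decay
```

Early in training (large η, large σ) the map orders itself globally; late in training (small η, small σ) each unit fine-tunes to its local region.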
Self-Organizing Map Algorithm
1. Initialize network weights to small random values.
2. Choose a pattern x from the dataset.
3. Apply the pattern to the input layer and determine the winning output unit i*, i.e. the unit whose weight vector is closest to x:
|x − wi*| ≤ |x − wi| (for all i)
where wi is the weight vector for output unit i.
4. Update the weights of all output units according to
Δwj,i = η × Λ(i, i*) × ( xj − wj,i )
wj,inew = wj,iold + Δwj,i
where xj is the jth component of input vector x and wj,i is the weight from input unit j to output unit i.
5. Go to step 2 and repeat for the next input pattern until the weights stabilize.
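The SOM algorithm can be sketched end to end as follows. This is a minimal illustration assuming NumPy and a Gaussian neighborhood; the grid shape, decay schedule, and step count are illustrative choices.

```python
import numpy as np

def train_som(X, grid=(10, 10), eta0=0.5, sigma0=3.0, tau=500.0,
              n_steps=2000, seed=0):
    """Train a SOM with a grid[0] x grid[1] sheet of output units."""
    rng = np.random.default_rng(seed)
    n_units = grid[0] * grid[1]
    # Fixed positions of the output units within the 2-D grid.
    pos = np.array([(i, j) for i in range(grid[0])
                           for j in range(grid[1])], dtype=float)
    # Step 1: small random initial weights.
    W = rng.normal(scale=0.1, size=(n_units, X.shape[1]))
    for t in range(n_steps):
        x = X[rng.integers(len(X))]                        # Step 2: pick a pattern.
        i_star = np.argmin(np.linalg.norm(x - W, axis=1))  # Step 3: find winner.
        decay = np.exp(-t / tau)                           # Shrink eta and sigma.
        eta, sigma = eta0 * decay, sigma0 * decay
        # Step 4: update ALL units, weighted by the Gaussian neighborhood
        # of each unit's grid position around the winner's grid position.
        d2 = np.sum((pos - pos[i_star]) ** 2, axis=1)
        lam = np.exp(-d2 / (2.0 * sigma ** 2))
        W += eta * lam[:, None] * (x - W)
    return W, pos
```

Because every unit is pulled toward x in proportion to its grid distance from the winner, nearby grid units end up with nearby weight vectors, which is exactly the topology-preserving property described above.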