The goal is to cluster or categorize the input data.
Categories must be discovered by the network itself from regularities in the input data (data is unlabelled).
Can be used for data encoding and compression by replacing each data vector by the index number of its category (vector quantization).
Network architecture:
Output units compete to classify input patterns.
Only one output unit fires at a time: the one with the largest incoming activation.
Winner-take-all process can be implemented by simply picking the unit with the highest output or through lateral inhibitory connections.
Problems with "grandmother cell" representations: each category is encoded by a single output unit, so the representation is not distributed, losing one unit loses an entire category, and the output gives no measure of similarity between categories.
Simple Competitive Learning Algorithm
1. Initialize network weights to small random values.
2. Choose a pattern x from the dataset.
3. Apply the pattern to the input layer and determine the winning output unit i*, i.e. the unit whose weight vector is closest to x:
|x − wi*| ≤ |x − wi| (for all i)
where wi is the weight vector for output unit i.
4. Update the weights of the winning unit i* only:
Δwj,i* = η × ( xj − wj,i* )
wj,i*new = wj,i*old + Δwj,i*
where xj is the jth component of input vector x and wj,i* is the weight from input unit j to winning output unit i*.
5. Go to step 2 and repeat for the next input pattern until the weights stabilize.
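The steps above can be sketched in code. This is a minimal illustration, assuming NumPy; the initialization scale, learning rate, and epoch count are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def competitive_learning(X, n_units, eta=0.1, n_epochs=50):
    """Cluster the rows of X using n_units prototype (weight) vectors."""
    # Step 1: initialize weights to small random values.
    W = rng.normal(scale=0.1, size=(n_units, X.shape[1]))
    for _ in range(n_epochs):
        for x in X:  # Steps 2-3: present a pattern, find the winner.
            i_star = np.argmin(np.linalg.norm(x - W, axis=1))
            # Step 4: move only the winner's weights toward x.
            W[i_star] += eta * (x - W[i_star])
    return W
```

After training, each row of W is a prototype vector; classifying a new pattern is the same argmin over distances to the prototypes.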
A set of points in n-dimensional space divides the space into a Voronoi tessellation:
Output units correspond to prototype vectors, each of which serves as a prototype for all input patterns within its Voronoi region.
Example: categorizing a set of 2-dimensional patterns using 10 prototype vectors.
The network architecture has 10 output units. [Figures: network architecture; prototype vectors before and after learning]
Each prototype vector represents a region of the input space:
A set of prototype vectors constitutes a codebook.
Input patterns can be replaced by their codebook index in order to achieve data compression.
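To illustrate compression via codebook indices, here is a small sketch (NumPy assumed; the codebook and data values are hypothetical):

```python
import numpy as np

# A hypothetical learned codebook of prototype vectors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])

def encode(X, codebook):
    # Replace each input vector by the index of its nearest prototype.
    dists = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def decode(indices, codebook):
    # Reconstruct each vector as its prototype (lossy decompression).
    return codebook[indices]

X = np.array([[0.1, -0.1], [4.8, 5.2], [0.9, 1.1]])
idx = encode(X, codebook)  # → array([0, 2, 1])
```

Each 2-dimensional float vector is replaced by one small integer, at the cost of the quantization error between each pattern and its prototype.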
Problem of dead units: output units whose weight vectors start out far away from all input patterns and therefore never win.
Some solutions: initialize the weight vectors to randomly chosen input patterns; use "leaky learning," in which losing units are also updated but with a much smaller learning rate; give frequently winning units a "conscience" (a bias that handicaps them) so that other units get a chance to win.
Competitive learning networks in which the location of the output unit conveys information.
Output units have fixed positions within a one-, two-, or three-dimensional grid.
A topology preserving map maps points from the input space to units in the output grid in such a way as to preserve neighborhood relations.
As two input patterns get closer in input space, winning output units get closer in output grid.
Define a neighborhood function Λ(i, i*) between output units:
Λ(i, i*) = exp( −|ri − ri*|² / (2σ²) )
where ri and ri* are the positions within the output grid of units i and i*, and σ is a parameter that controls the neighborhood size.
When i = i*, Λ(i, i*) = 1, and Λ falls off with grid distance from the winner.
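The neighborhood function is typically a Gaussian in grid distance, which is consistent with Λ(i, i*) = 1 at the winner and σ controlling the neighborhood size; the Gaussian form is an assumption here. A sketch (NumPy assumed):

```python
import numpy as np

def neighborhood(r_i, r_istar, sigma):
    """Lambda(i, i*) = exp(-|r_i - r_i*|^2 / (2 sigma^2)).

    r_i, r_istar: grid positions of units i and i* (e.g. (row, col) tuples).
    """
    d2 = np.sum((np.asarray(r_i, float) - np.asarray(r_istar, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

Note that the distance is measured between unit positions in the output grid, not between weight vectors in input space.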
The parameters η (learning rate) and σ (neighborhood size) start large and are decreased during training (with a third parameter controlling the rate of decay of these two parameters).
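One common way to decrease both parameters is an exponential decay controlled by a single time constant; the decay form, the constant τ, and the initial values below are illustrative assumptions, not prescribed by the text.

```python
import math

def schedule(t, eta0=0.5, sigma0=5.0, tau=1000.0):
    """Return (eta, sigma) at training step t.

    tau is the third parameter controlling how fast both decay:
    large tau -> slow decay, small tau -> fast decay.
    """
    decay = math.exp(-t / tau)
    return eta0 * decay, sigma0 * decay
```

Early in training (large η, large σ) the map orders itself globally; late in training (small η, small σ) each unit fine-tunes to its local region.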
Self-Organizing Map Algorithm
1. Initialize network weights to small random values.
2. Choose a pattern x from the dataset.
3. Apply the pattern to the input layer and determine the winning output unit i*, i.e. the unit whose weight vector is closest to x:
|x − wi*| ≤ |x − wi| (for all i)
where wi is the weight vector for output unit i.
4. Update the weights of all output units according to
Δwj,i = η × Λ(i, i*) × ( xj − wj,i )
wj,inew = wj,iold + Δwj,i
where xj is the jth component of input vector x and wj,i is the weight from input unit j to output unit i.
5. Go to step 2 and repeat for the next input pattern until the weights stabilize.
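The SOM algorithm can be sketched end to end as follows. This is a minimal illustration assuming NumPy and a Gaussian neighborhood; the grid shape, decay schedule, and step count are illustrative choices.

```python
import numpy as np

def train_som(X, grid=(10, 10), eta0=0.5, sigma0=3.0, tau=500.0,
              n_steps=2000, seed=0):
    """Train a SOM with a grid[0] x grid[1] sheet of output units."""
    rng = np.random.default_rng(seed)
    n_units = grid[0] * grid[1]
    # Fixed positions of the output units within the 2-D grid.
    pos = np.array([(i, j) for i in range(grid[0])
                           for j in range(grid[1])], dtype=float)
    # Step 1: small random initial weights.
    W = rng.normal(scale=0.1, size=(n_units, X.shape[1]))
    for t in range(n_steps):
        x = X[rng.integers(len(X))]                        # Step 2: pick a pattern.
        i_star = np.argmin(np.linalg.norm(x - W, axis=1))  # Step 3: find winner.
        decay = np.exp(-t / tau)                           # Shrink eta and sigma.
        eta, sigma = eta0 * decay, sigma0 * decay
        # Step 4: update ALL units, weighted by the Gaussian neighborhood
        # of each unit's grid position around the winner's grid position.
        d2 = np.sum((pos - pos[i_star]) ** 2, axis=1)
        lam = np.exp(-d2 / (2.0 * sigma ** 2))
        W += eta * lam[:, None] * (x - W)
    return W, pos
```

Because every unit is pulled toward x in proportion to its grid distance from the winner, nearby grid units end up with nearby weight vectors, which is exactly the topology-preserving property described above.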