Lab notes for Wednesday, April 5, 2006

Using the Conx library to create simple neural networks
--------------------------------------------------------------------------

Let's train a simple neural network to solve the AND task:

   Input     Output (AND)
    0 0          0
    0 1          0
    1 0          0
    1 1          1

First, we'll write a program to set up an appropriate network...

#--------------------------------------------------------------------------
# File: net.py

# load in Conx support for neural networks
from ConxExtensions import *

# create a basic feedforward backpropagation network
n = BackpropNetwork()

# add layers in the order they will be connected
n.addLayer('input', 2)        # input layer has two units
n.addLayer('output', 1)       # output layer has one unit
n.connect('input', 'output')  # connect the layers together

# learning rate
n.setEpsilon(0.5)

# how often the network reports its total error during training
n.setReportRate(1)

# how close an output value has to be to the target to count as correct
n.setTolerance(0.1)

# specify the dataset to use for learning AND
n.setInputs([[0, 0], [0, 1], [1, 0], [1, 1]])
n.setTargets([[0], [0], [0], [1]])

print "Network is set up"
#--------------------------------------------------------------------------

Now we'll train the network on the dataset by just calling train...

% idle net.py

Network is set up
>>> n.train()
Epoch #     1 | TSS Error: 1.0132 | Correct = 0.0000 | RMS Error: 0.5033
Epoch #     2 | TSS Error: 0.8453 | Correct = 0.0000 | RMS Error: 0.4597
Epoch #     3 | TSS Error: 0.7362 | Correct = 0.0000 | RMS Error: 0.4290
. . .
----------------------------------------------------
Final #    45 | TSS Error: 0.0245 | Correct = 1.0000 | RMS Error: 0.0782
----------------------------------------------------

After training, we can use propagate to test out individual inputs:

>>> n.propagate(input=[0, 0])
[0.00092551928227623556]
>>> n.propagate(input=[0, 1])
[0.086001671392928164]
>>> n.propagate(input=[1, 0])
[0.084881557981294223]
>>> n.propagate(input=[1, 1])
[0.90404249930777647]
>>>

The trained network produces outputs that correspond closely to the
training targets.  We can also run through the entire dataset
interactively by first turning learning off and interactive mode on, and
then sweeping through all of the patterns:

>>> n.setLearning(0)
>>> n.setInteractive(1)
>>> n.sweep()

To make it easier to train the network on different datasets, we put the
training data in the files inputs.dat and and-targets.dat, and use the
methods loadInputsFromFile and loadTargetsFromFile instead of setInputs
and setTargets...

#--------------------------------------------------------------------------
# File: net.py

from ConxExtensions import *

n = BackpropNetwork()
n.addLayer('input', 2)
n.addLayer('output', 1)
n.connect('input', 'output')
n.setEpsilon(0.5)
n.setReportRate(1)
n.setTolerance(0.1)

# changed these lines
n.loadInputsFromFile("inputs.dat")
n.loadTargetsFromFile("and-targets.dat")

print "Network is set up"
#--------------------------------------------------------------------------
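These data files are plain text, presumably in the same format as the
auto-inputs.dat file shown later (one pattern per line, values separated
by spaces); for the AND task they would look something like this:

inputs.dat:

0 0
0 1
1 0
1 1

and-targets.dat:

0
0
0
1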
The OR task is similar to AND.  Our network can learn it easily.

   Input     Output (OR)
    0 0          0
    0 1          1
    1 0          1
    1 1          1

n.loadTargetsFromFile("or-targets.dat")

>>> n.train()
Epoch #     1 | TSS Error: 1.0218 | Correct = 0.0000 | RMS Error: 0.5054
Epoch #     2 | TSS Error: 0.6437 | Correct = 0.0000 | RMS Error: 0.4011
Epoch #     3 | TSS Error: 0.6014 | Correct = 0.5000 | RMS Error: 0.3877
. . .
----------------------------------------------------
Final #    35 | TSS Error: 0.0151 | Correct = 1.0000 | RMS Error: 0.0614
----------------------------------------------------

The XOR task is harder.  Our network cannot learn this task using only
two layers of units.

   Input     Output (XOR)
    0 0          0
    0 1          1
    1 0          1
    1 1          0

n.loadTargetsFromFile("xor-targets.dat")

>>> n.train()
Epoch #     1 | TSS Error: 1.1188 | Correct = 0.0000 | RMS Error: 0.5289
Epoch #     2 | TSS Error: 1.1276 | Correct = 0.0000 | RMS Error: 0.5309
Epoch #     3 | TSS Error: 1.1240 | Correct = 0.0000 | RMS Error: 0.5301
. . .
Epoch #  4818 | TSS Error: 1.0918 | Correct = 0.0000 | RMS Error: 0.5224
Epoch #  4819 | TSS Error: 1.2915 | Correct = 0.0000 | RMS Error: 0.5682
Epoch #  4820 | TSS Error: 1.2618 | Correct = 0.0000 | RMS Error: 0.5616
. . .
(interrupt by typing Control-C)

In order to learn XOR, we need to add an extra layer of units to our
network, called the hidden layer (a complete version of the modified
script is sketched after this section):

n.addLayer('input', 2)
n.addLayer('hidden', 2)
n.addLayer('output', 1)
n.connect('input', 'hidden')
n.connect('hidden', 'output')

>>> n.train()
Epoch #     1 | TSS Error: 1.1846 | Correct = 0.0000 | RMS Error: 0.5442
Epoch #     2 | TSS Error: 1.1185 | Correct = 0.0000 | RMS Error: 0.5288
Epoch #     3 | TSS Error: 1.0997 | Correct = 0.0000 | RMS Error: 0.5243
. . .
----------------------------------------------------
Final #   175 | TSS Error: 0.0308 | Correct = 1.0000 | RMS Error: 0.0877
----------------------------------------------------

We can use setInteractive and sweep as before to examine the behavior of
the network on the input patterns.  The hidden layer activation patterns
corresponding to each input pattern are shown below:

   Input     Hidden
    0 0      0.74  0.00
    0 1      0.04  0.08
    1 0      0.04  0.08
    1 1      0.00  0.86

This shows that the network has learned a new, internal hidden
representation of the input patterns in order to solve the task.
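Putting the pieces together, the complete script for the XOR task might
look something like this (a sketch assembled from the pieces above; the
file name xor-net.py is just for illustration, and it reuses inputs.dat
together with xor-targets.dat):

#--------------------------------------------------------------------------
# File: xor-net.py  (a sketch; file name chosen here for illustration)

from ConxExtensions import *

n = BackpropNetwork()

# three layers this time: the hidden layer sits between input and output
n.addLayer('input', 2)
n.addLayer('hidden', 2)
n.addLayer('output', 1)
n.connect('input', 'hidden')
n.connect('hidden', 'output')

# same training parameters as before
n.setEpsilon(0.5)
n.setReportRate(1)
n.setTolerance(0.1)

# same input patterns as AND and OR, but with the XOR targets
n.loadInputsFromFile("inputs.dat")
n.loadTargetsFromFile("xor-targets.dat")

print "Network is set up"
#--------------------------------------------------------------------------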
#--------------------------------------------------------------------------

Now let's try an auto-association task, where the network simply learns
to reproduce the input patterns on the output layer, using a smaller
hidden layer in the middle...

Here is the file auto-inputs.dat:

1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1

The dataset consists of eight unique patterns.  We will use a hidden
layer with three units, which will force the network to learn to encode
the eight input patterns using a three-dimensional space of hidden layer
patterns...

n.addLayer('input', 8)
n.addLayer('hidden', 3)
n.addLayer('output', 8)
n.connect('input', 'hidden')
n.connect('hidden', 'output')

n.loadInputsFromFile("auto-inputs.dat")
n.loadTargetsFromFile("auto-inputs.dat")

>>> n.train()
Epoch #     1 | TSS Error: 11.4415 | Correct = 0.0156 | RMS Error: 0.4228
Epoch #     2 | TSS Error:  7.4859 | Correct = 0.7969 | RMS Error: 0.3420
Epoch #     3 | TSS Error:  7.5817 | Correct = 0.8750 | RMS Error: 0.3442
. . .
----------------------------------------------------
Final #   177 | TSS Error: 0.0785 | Correct = 1.0000 | RMS Error: 0.0350
----------------------------------------------------

We can save the hidden layer patterns generated by the inputs in a file:

n.saveHiddenReps('auto')

This creates a log file called 'auto.hiddens' and performs one sweep
through the dataset, recording each hidden layer activation pattern in
the log file.  The resulting file looks like this:

0.035725 0.000000 0.309874
0.001861 0.332286 0.999953
0.965465 0.966408 0.000007
0.999960 0.002240 0.000715
0.995763 0.000063 0.995603
0.000233 0.522564 0.000000
0.995154 0.999652 0.998858
0.000038 0.999914 0.555374

These hidden representations correspond to points in a three-dimensional
space (since each pattern is a tuple of three values).  We can easily
graph these eight points using the Linux program gnuplot:

% gnuplot
gnuplot> splot "auto.hiddens" notitle

Dragging the plot with the left mouse button rotates it; dragging
left/right with the middle mouse button zooms the plot in or out;
dragging up/down with the middle mouse button stretches or squeezes the
vertical axis.  This way, we can visualize the structure of the hidden
representations learned by the network, at least in the case of
three-dimensional patterns.  The concept extends to higher-dimensional
patterns in a natural way, although visualizing them directly is more
difficult.
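For higher-dimensional hidden layers, where a direct plot like the one
above isn't possible, one simple alternative is to compare the hidden
representations numerically.  Here is a minimal sketch (not part of Conx;
it assumes only the whitespace-separated .hiddens format shown above)
that reads auto.hiddens and prints the Euclidean distance between every
pair of hidden patterns.  Inputs whose hidden representations lie close
together are being treated as similar by the network.

#--------------------------------------------------------------------------
# File: hidden-dist.py  (a sketch, not part of Conx; the file name is
# chosen here for illustration)

import math

# read one hidden activation pattern per line from the log file
patterns = []
for line in open("auto.hiddens"):
    fields = line.split()
    if fields:
        patterns.append([float(x) for x in fields])

# print the Euclidean distance between every pair of hidden patterns;
# this works for hidden layers of any size, not just three units
for i in range(len(patterns)):
    for j in range(i + 1, len(patterns)):
        d = math.sqrt(sum([(a - b) ** 2
                           for a, b in zip(patterns[i], patterns[j])]))
        print "patterns %d and %d: distance %.3f" % (i, j, d)
#--------------------------------------------------------------------------

Because the script only splits each line on whitespace, the same analysis
applies unchanged to networks with larger hidden layers.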