CS 151 Assignment 7 — Neural Network Learning

Deadline extended to Thursday, November 3
  1. Read the article "Recursive Distributed Representations" by Jordan Pollack, which is reading #10 in your reading packet. This article describes a technique, the Recursive Auto-Associative Memory (RAAM), for encoding symbolic data structures such as parse trees into distributed representations suitable for processing by neural networks. We'll talk about RAAMs in class on Tuesday.

  2. Implement the backpropagation neural network learning algorithm and test it on a few datasets, using this code as your starting point. Print out and study this code until you understand how it works. The BPN constructor creates a new network with a specified number of input, hidden, and output units. The loadData method loads in a dataset of training patterns and their associated targets from a pair of text files. For example, to set up a network with 2 input units, 2 hidden units, and 1 output unit for learning the binary XOR function, we could do this:

    net = BPN(2, 2, 1)
    net.loadData('xorInputs.dat', 'xorTargets.dat')
    

    The splitData method randomly divides the current dataset into a subset of training patterns (stored in the lists net.trainInputs and net.trainTargets), and a subset of testing patterns (stored in net.testInputs and net.testTargets). The size of the training subset is specified as a percentage (0-100) of the entire dataset.
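
    For example, assuming the percentage is passed as the method's argument, the following call would reserve 80% of the patterns for training and the remaining 20% for testing:

        net.splitData(80)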

    You will need to add additional methods in order to train your networks and evaluate their performance. The additional methods that you are required to implement are described in the comments in the starting code, and are summarized below. However, you are free to add any other methods or variables you like.

Implementation Requirements

You should not modify the names or behavior of any variables or methods given in the starting code, nor should you change the basic internal structure of a network. Changing the values of various parameters and variables is, of course, fine (e.g., net.learningRate, net.tolerance, net.trainInputs, etc.).

The additional methods you must implement are described in the comments in the starting code; the notes below cover further implementation details.

Momentum

Your code should use the net.momentum parameter when updating the weights of the network. Adding a momentum term to the weight update rule of backpropagation often helps speed up learning. Momentum (α) is a constant in the range 0 < α < 1, which contributes a portion of the weight update from the previous time step to the current weight update:

Δw_ij(t) = η δ_i O_j + α Δw_ij(t − 1)

Think of this momentum term as influencing the direction of the gradient descent search through weight space, which is analogous to a ball rolling down the error surface. The effect of α is to add momentum that tends to keep the ball rolling in the same direction from one iteration to the next. This can sometimes have the effect of keeping the ball rolling through small local minima in the error surface, or along flat regions in the surface where the ball would stop if there were no momentum.
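
As a concrete sketch of this update rule, a single weight update might be computed as follows (the function and variable names here are illustrative, not taken from the starting code):

    # Minimal sketch of one weight update with momentum.  Here eta is the
    # learning rate (net.learningRate), alpha is the momentum (net.momentum),
    # delta_i is the error term for unit i, o_j is the output of unit j,
    # and prevChange is the Δw_ij from the previous time step.
    def updateWeight(w, prevChange, eta, alpha, delta_i, o_j):
        change = eta * delta_i * o_j + alpha * prevChange
        return w + change, change   # new weight, plus the Δw to remember

Your implementation will need to store the previous weight changes somewhere (for example, in arrays parallel to the weight arrays) so they are available at the next time step.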

Testing Your Network

Several datasets are available for testing your network: the Boolean functions AND, OR, and XOR; the 8 × 3 × 8 auto-association task; and the cancer and digits data.

Start out by thoroughly testing a 2 × 2 × 1 network on the AND and OR functions to make sure your code works correctly. Your network should have no trouble learning these functions quickly. Then try XOR, which will take longer but should still be learnable as long as you use momentum. See what happens if you set the momentum to zero, and also try different learning rates.
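
Putting the pieces together, a test run on XOR might look like the sketch below, where train stands in for whatever training method you implement (adjust the name and the stopping criterion to match your code):

    net = BPN(2, 2, 1)
    net.loadData('xorInputs.dat', 'xorTargets.dat')
    net.learningRate = 0.5   # try several different values and compare
    net.momentum = 0.9       # then set this to 0.0 and see how learning changes
    net.train()              # hypothetical method: train until the error is acceptable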

Next, try out the 8 × 3 × 8 auto-association task, which we discussed in class. After training the network, examine the hidden representations created for each input pattern. You should define methods for easily printing out these representations and saving them to a file. Plot the hidden representations in 3-D using the gnuplot program, which is available on the Pomona Linux machines. The command for creating a 3-D plot from a file of hidden vectors named data.hiddens is shown below. Once the plot appears, you can click and drag it to change the viewing angle.

gnuplot> splot "data.hiddens"
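
To produce the data.hiddens file in the first place, one possibility is a method along the following lines (propagate, inputs, and hiddens are assumed names; use whatever your forward-pass code actually provides):

    def saveHiddens(self, filename):
        # Hypothetical sketch: run each input pattern through the network
        # and write its hidden-layer activations as one space-separated
        # line, the format that gnuplot's splot command expects.
        with open(filename, 'w') as f:
            for pattern in self.inputs:
                self.propagate(pattern)   # forward pass (your name may differ)
                f.write(' '.join(str(h) for h in self.hiddens) + '\n')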

Now try out the cancer and digits data. Train your networks until the total error drops below some acceptable threshold. How well are they able to classify the training data? Test their generalization ability by training on a subset of the available data and evaluating performance on the held-out test set. Do the networks still perform well on these novel patterns? Experiment with different sizes of training/testing sets, different termination thresholds, and different learning rates.
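
One way to structure such a generalization experiment is sketched below; the layer sizes, file names, and the train and evaluate methods are all placeholders for your own:

    net = BPN(64, 10, 10)             # example sizes only
    net.loadData('digitsInputs.dat', 'digitsTargets.dat')   # placeholder file names
    net.splitData(75)                 # 75% for training, 25% for testing
    net.train()                       # hypothetical: train on net.trainInputs
    print(net.evaluate(net.testInputs, net.testTargets))    # hypothetical: accuracy on novel patterns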

Finally, try creating a cluster diagram of the hidden representations using the cluster analysis program available in /common/cs/cs151/nnet-analysis on the Pomona Linux machines. To do this, first train the network on some portion of the dataset and then save the resulting hidden representations to a file. You will also need to create a parallel file of labels corresponding to each of the saved hidden vectors, which will be incorporated into the cluster diagram. These labels are just strings that correspond to the target classification associated with each hidden vector. For example, the label for a hidden representation with the associated target pattern 0 0 0 1 0 0 0 0 0 0 should be "3". See the README file for further information.
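
If your target patterns are one-hot vectors, as in the digits example above, the labels file can be generated directly from them. Here is one possible sketch (writeLabels and self.targets are assumed names):

    def writeLabels(self, filename):
        # Hypothetical sketch: write one label per line, parallel to the
        # saved hidden vectors.  For a one-hot target, the label is the
        # position of the 1, so 0 0 0 1 0 0 0 0 0 0 yields the label "3".
        with open(filename, 'w') as f:
            for target in self.targets:
                f.write(str(list(target).index(1)) + '\n')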

Turning in Your Homework

Put your code into a single Python file called backprop.py and submit it by running the following script from any Pomona Linux machine. Note: be sure to run this script from the directory containing your program.

/common/cs/submit/cs151-submit

This script will prompt you for the assignment name (hw07) and then your file to submit.

If you have questions about anything, just ask.