Read the article Recursive Distributed Representations by Jordan Pollack, which is reading #10 in your reading packet. This article describes a technique for encoding symbolic data structures such as parse trees into distributed representations suitable for processing by neural networks. We'll talk about RAAMs in class Tuesday.
Implement the backpropagation neural network learning algorithm and test it on a few datasets, using this code as your starting point. Print out and study this code until you understand how it works. The BPN constructor creates a new network with a specified number of input, hidden, and output units. The loadData method loads in a dataset of training patterns and their associated targets from a pair of text files. For example, to set up a network with 2 input units, 2 hidden units, and 1 output unit for learning the binary XOR function, we could do this:
net = BPN(2, 2, 1)
net.loadData('xorInputs.dat', 'xorTargets.dat')
The splitData method randomly divides the current dataset into a subset of training patterns (stored in the lists net.trainInputs and net.trainTargets), and a subset of testing patterns (stored in net.testInputs and net.testTargets). The size of the training subset is specified as a percentage (0-100) of the entire dataset.
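The random split described above can be sketched as a standalone function (a hypothetical helper for illustration only; the actual `splitData` method lives inside the BPN class and stores its results in the `net.train*` and `net.test*` lists):

```python
import random

def split_data(inputs, targets, percent):
    # Shuffle input/target pairs together so each input stays matched
    # with its target, then take the first `percent`% for training.
    pairs = list(zip(inputs, targets))
    random.shuffle(pairs)
    cut = int(len(pairs) * percent / 100)
    train, test = pairs[:cut], pairs[cut:]
    train_inputs = [p for p, t in train]
    train_targets = [t for p, t in train]
    test_inputs = [p for p, t in test]
    test_targets = [t for p, t in test]
    return train_inputs, train_targets, test_inputs, test_targets
```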
You will need to write additional methods to train your networks and evaluate their performance. The required methods are described in the comments in the starting code and are summarized below. You are free to add any other methods or variables you like.
You should not modify the names or behavior of any variables or methods given in the starting code, nor should you change the basic internal structure of a network. Changing the values of various parameters and variables is, of course, fine (e.g., net.learningRate, net.tolerance, net.trainInputs, etc.).
The following additional methods are required in your implementation:
net.propagate(pattern) takes an input pattern, represented as a list of floating-point values, propagates the pattern through the network, and returns the resulting output pattern as a list of floating-point values. This method should update the activation values of all input, hidden, and output units as a side effect.
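The forward pass can be sketched as follows. This is a minimal illustration assuming sigmoid activation units and weight matrices stored as lists of per-unit weight lists; the variable names here are hypothetical, so match whatever representation the starting code actually uses:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def propagate(pattern, w_hidden, b_hidden, w_output, b_output):
    # Each hidden unit computes the sigmoid of its weighted input sum
    # plus its bias; the output layer does the same using the hidden
    # activations as its inputs.
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, pattern)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = [sigmoid(sum(w * h for w, h in zip(ws, hidden)) + b)
              for ws, b in zip(w_output, b_output)]
    return hidden, output
```

In your class, the method would store these activations on the unit layers (the required side effect) and return only the output list.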
net.evalPatterns(patterns, targets, verbose=False) takes a list of input patterns and a list of their associated target patterns and evaluates the network's performance on the patterns, returning a tuple of 4 values: (correct, total, score, error). Total is the total number of individual values in the target patterns, correct is the number of these that the network got right (to within net.tolerance), score is the percentage (0-100) of correct values, and error is the total sum squared error across all values. The optional verbose argument, when True, should cause the output produced by the network for each pattern to be printed out.
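The tallying logic can be sketched independently of the network, assuming the outputs have already been computed by propagating each pattern (in your method, you would call `propagate` on each input pattern first):

```python
def eval_patterns(outputs, targets, tolerance=0.1):
    # Compare each output value against its target: count it correct
    # if it falls within `tolerance` of the target, and accumulate the
    # squared error either way.
    correct = total = 0
    error = 0.0
    for out, tgt in zip(outputs, targets):
        for o, t in zip(out, tgt):
            total += 1
            if abs(o - t) < tolerance:
                correct += 1
            error += (o - t) ** 2
    score = 100.0 * correct / total if total else 0.0
    return correct, total, score, error
```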
net.teachPattern(pattern, target) should perform one backpropagation update for the given pattern and target, causing all of the weights in the network to be changed. This method is called by teachDataset, which, on each training cycle, performs one entire sweep through the current set of training patterns in a randomized order.
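One backpropagation update for a single-hidden-layer network can be sketched like this. It is a bare-bones illustration: bias weights and momentum are omitted for brevity, the weight layout (lists of per-unit weight lists) is an assumption, and your method should of course operate on the network's own stored weights:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def teach_pattern(pattern, target, w_hidden, w_output, eta=0.5):
    # Forward pass
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, pattern)))
              for ws in w_hidden]
    output = [sigmoid(sum(w * h for w, h in zip(ws, hidden)))
              for ws in w_output]
    # Output-layer deltas: (t - o) * o * (1 - o)
    d_out = [(t - o) * o * (1 - o) for o, t in zip(output, target)]
    # Hidden-layer deltas: backpropagate the output deltas through
    # the (pre-update) output weights
    d_hid = [h * (1 - h) * sum(d * w_output[k][j] for k, d in enumerate(d_out))
             for j, h in enumerate(hidden)]
    # Weight updates: delta-w = eta * delta * (activation feeding the weight)
    for k, ws in enumerate(w_output):
        for j in range(len(ws)):
            ws[j] += eta * d_out[k] * hidden[j]
    for j, ws in enumerate(w_hidden):
        for i in range(len(ws)):
            ws[i] += eta * d_hid[j] * pattern[i]
    return output
```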
net.learn(cycles) trains the network for the given number of training cycles. This method should repeatedly call teachDataset, displaying the current cycle number and performance of the network after each call.
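The training loop structure might look like the sketch below. Here `teach_pattern` and `evaluate` are stand-ins for your `teachPattern` and `evalPatterns` methods, passed as parameters only so the sketch is self-contained:

```python
import random

def teach_dataset(train_inputs, train_targets, teach_pattern):
    # One sweep through the training set in randomized order.
    order = list(range(len(train_inputs)))
    random.shuffle(order)
    for i in order:
        teach_pattern(train_inputs[i], train_targets[i])

def learn(cycles, train_inputs, train_targets, teach_pattern, evaluate):
    # Repeatedly sweep the training set, reporting performance after
    # each cycle.
    for cycle in range(1, cycles + 1):
        teach_dataset(train_inputs, train_targets, teach_pattern)
        correct, total, score, error = evaluate()
        print("cycle %d: %d/%d correct (%.1f%%), error %.4f"
              % (cycle, correct, total, score, error))
```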
Your code should use the net.momentum parameter when updating the weights of the network. Adding a momentum term to the weight update rule of backpropagation often helps speed up learning. Momentum (α) is a constant in the range 0 < α < 1, which contributes a portion of the weight update from the previous time step to the current weight update:
Δw_ij(t) = η δ_i O_j + α Δw_ij(t - 1)
Think of this momentum term as influencing the direction of the gradient descent search through weight space, which is analogous to a ball rolling down the error surface. The effect of α is to add momentum that tends to keep the ball rolling in the same direction from one iteration to the next. This can sometimes have the effect of keeping the ball rolling through small local minima in the error surface, or along flat regions in the surface where the ball would stop if there were no momentum.
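In code, the momentum rule above amounts to remembering the previous Δw for each weight. A minimal sketch for a single weight (in the network, this happens for every weight on every update, with the per-weight Δw values stored between calls):

```python
def momentum_step(weight, prev_delta_w, eta, alpha, delta, activation):
    # delta_w(t) = eta * delta * o  +  alpha * delta_w(t-1)
    # Returns the updated weight and the new delta_w to remember
    # for the next time step.
    delta_w = eta * delta * activation + alpha * prev_delta_w
    return weight + delta_w, delta_w
```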
Several datasets are available for testing your network:
andInputs.dat, andTargets.dat, orInputs.dat, orTargets.dat, xorInputs.dat, and xorTargets.dat are simple binary functions useful for testing and debugging your code.
auto.dat is the simple auto-association example from class.
cancerInputs.dat and cancerTargets.dat contain actual clinical data from tests for the presence of breast cancer, based on 9 measured variables: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses. The single target value classifies each instance as either benign (0) or malignant (1).
digitsInputs.dat and digitsTargets.dat consist of 6 × 6 binary images of the digits 0-9 in various styles, together with their associated classifications (using a localist representation scheme). A more readable visual representation of the input patterns is also available.
Start out by thoroughly testing a 2 × 2 × 1 network on the AND and OR functions to make sure your code works correctly. Your network should have no trouble learning these functions quickly. Then try XOR, which will take longer but should still be learnable as long as you use momentum. Try seeing what happens if you set momentum to zero. Also try using different learning rates.
Next, try out the 8 × 3 × 8 auto-association task, which we discussed in class. After training the network, examine the hidden representations created for each input pattern. You should define methods for easily printing out and saving these representations in a file. Plot the hidden representations in 3-D using the gnuplot program, which is available on the Pomona Linux machines. The command for creating a 3-D plot from a file of vectors data.hiddens is shown below. You can also click and drag the plot to change the viewing angle.
gnuplot> splot "data.hiddens"
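A simple way to save the hidden representations in the whitespace-separated format that `splot` expects (the function name and output format here are just suggestions):

```python
def save_hiddens(hiddens, filename):
    # Write one hidden vector per line, values space-separated --
    # for a 3-hidden-unit network this gives gnuplot a file of
    # 3-D points it can plot directly.
    with open(filename, 'w') as f:
        for vec in hiddens:
            f.write(' '.join('%.4f' % v for v in vec) + '\n')
```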
Now try out the cancer and digits data. Train your networks until the total error drops below some acceptable threshold. How well are they able to classify the training data? Test generalization by training the networks on a subset of the available data and evaluating their performance on the test set. Do they still perform well on these novel patterns? Experiment with different sizes of training/testing sets, different termination thresholds, and different learning rates.
Finally, try creating a cluster diagram of the hidden representations using the cluster analysis program available in /common/cs/cs151/nnet-analysis on the Pomona Linux machines. To do this, first train the network on some portion of the dataset and then save the resulting hidden representations to a file. You will also need to create a parallel file of labels corresponding to each of the saved hidden vectors, which will be incorporated into the cluster diagram. These labels are just strings that correspond to the target classification associated with each hidden vector. For example, the label for a hidden representation with the associated target pattern 0 0 0 1 0 0 0 0 0 0 should be "3". See the README file for further information.
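Since the targets use a localist scheme, the label for each hidden vector is just the index of the 1 in its target pattern. A small hypothetical helper:

```python
def target_to_label(target):
    # With a localist encoding of the digits 0-9, the position of the
    # single 1 in the target pattern is the digit itself.
    return str(target.index(max(target)))
```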
Put your code into a single Python file called backprop.py and submit it by running the following script from any Pomona Linux machine. Note: be sure to run this script from the directory containing your program.
/common/cs/submit/cs151-submit
This script will prompt you for the assignment name (hw07) and then your file to submit. More information on using cs151-submit is available here.
If you have questions about anything, just ask.