CS 151 Assignment Final Project

CS 151 Final Project — Robot Learning

Due by noon Thursday, December 15 (No extensions)

Introduction

For your final assignment you will use some form of neural network learning to teach a robot to perform a task of your choice. You are strongly encouraged to work in teams of two. As a first step you must determine an appropriate task given the available tools. You will be using the Pyrobot simulator to control a Pioneer robot through Pyro. Here are some of the capabilities that you could incorporate into your task:

Color: robots, walls, and boxes can be colored
Sonar sensors: provide range information to obstacles
Light: bulbs of different colors can be placed on moving robots
Simulator device: allows you to position entities at precise locations and to query them about their current locations
Multiple robots: it is possible to separately control several robots in the same simulated world

If you plan on using one of these features, then experiment with it first to be sure you understand its capabilities. There are examples in the Pyro web pages.

Formulating a task

Create a detailed description of your task. How will you represent the input state? How will you represent the motor output? Since you will be using a neural network as the learning mechanism, you should think about how to scale the inputs and outputs to values between 0 and 1.
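As a minimal sketch of this kind of scaling, the helpers below clip and normalize a range reading and map a network output back to a motor value. The names `scale_sonar`, `output_to_motor`, and `MAX_SONAR_RANGE` are hypothetical and do not come from Pyro. The maximum range of 3.0 and the motor range of -1 to 1 are assumptions that you should check against your own robot.

```python
# Hypothetical helpers; none of these names come from Pyro.
MAX_SONAR_RANGE = 3.0  # assumed maximum sonar reading, in the units your robot reports


def scale_sonar(reading, max_range=MAX_SONAR_RANGE):
    """Clip a raw range reading to [0, max_range] and scale it to [0, 1]."""
    clipped = min(max(reading, 0.0), max_range)
    return clipped / max_range


def output_to_motor(activation):
    """Map a network output in [0, 1] to a motor value in [-1, 1] (assumed range)."""
    return 2.0 * activation - 1.0


# Example: three raw sonar readings become three network inputs in [0, 1].
inputs = [scale_sonar(r) for r in [0.5, 1.5, 10.0]]
```

With this mapping, an output of 0.5 means "stop", values above 0.5 mean forward (or one turning direction), and values below 0.5 mean reverse (or the other turning direction).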

To use a reinforcement learning method you will need to write a reinforcement procedure. Typically this procedure takes two states: the state before an action was executed and the state that resulted from executing it. It then returns a reinforcement value: negative for punishment, 0 for none, or positive for reward. If you plan on using a Genetic Algorithm to evolve the weights of a neural network, then the reinforcement (i.e., fitness) values should always be positive.
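A reinforcement procedure of this shape might look like the sketch below, written for a light-seeking task. The state representation (a dict with a scaled `"light"` reading and a `"stalled"` flag) and the threshold value are assumptions made for illustration. Your own states will depend on the task you choose.

```python
def reinforcement(prev_state, next_state, threshold=0.01):
    """Return -1.0 (punish), 0.0 (none), or +1.0 (reward) for one action.

    Each state is a hypothetical dict with:
      "light":   scaled light reading in [0, 1]
      "stalled": True if the robot is stuck against an obstacle
    """
    # Getting stuck is always punished, whatever happened to the light reading.
    if next_state["stalled"]:
        return -1.0
    change = next_state["light"] - prev_state["light"]
    if change > threshold:
        return 1.0   # moved toward the light
    if change < -threshold:
        return -1.0  # moved away from the light
    return 0.0       # no meaningful change
```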

The frequency with which you provide a non-zero reinforcement value will determine how difficult the task is to learn. Delayed tasks, where reinforcement is only given at the time of goal achievement, are the hardest. Immediate tasks, where reinforcement is given at every time step, are the easiest. Intermediate tasks, where reinforcement is sporadic, are also possible. If you are using a Genetic Algorithm, then every task is essentially delayed because feedback from the fitness function is only given at the end of a task.
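To illustrate how immediate and delayed feedback can be combined into the single end-of-task value that a Genetic Algorithm sees, here is a small sketch. The function name `trial_fitness`, the goal bonus of 10.0, and the assumption that every per-step reinforcement lies between -1 and 1 are all illustrative choices, not part of the provided code.

```python
def trial_fitness(step_rewards, goal_reached, goal_bonus=10.0):
    """Combine one trial's reinforcements into a single positive fitness value.

    step_rewards: per-step reinforcement values, each assumed to lie in [-1, 1]
    goal_reached: True if the robot achieved the goal during the trial
    """
    total = sum(step_rewards)      # immediate feedback, summed over the trial
    if goal_reached:
        total += goal_bonus        # delayed feedback, given once at the goal
    # Adding the number of steps maps the worst case (-1 at every step) to 0;
    # the extra 1.0 keeps the fitness strictly positive.
    return total + len(step_rewards) + 1.0
```

Passing an empty list of step rewards gives a purely delayed task, where the fitness depends only on whether the goal was reached.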

Possible tasks might include robots learning to play hide and seek, to clean up their environment, to follow (or avoid) other robots, or to herd other robots into a target zone. Unfortunately, the Pyrobot simulator does not currently support vision or gripper devices. However, since objects can be moved around in the simulator under program control, it would not be hard to simulate gripping or ingesting objects: simply move an object temporarily out of the way when the robot "picks it up" or "eats" it.

Learning options

Your learning task should use one of the approaches below. If you would prefer to use some other approach in your project, email me or come talk to me about it.

Use the Complementary Reinforcement Backpropagation (CRBP) idea to train a neural network controller that takes states as input and produces motor commands as output.
Use a Genetic Algorithm to evolve the weights of a neural network controller that takes states as input and produces motor commands as output.
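For the first option, the sketch below shows the core CRBP idea as it is commonly described: the network's outputs are treated as probabilities, a binary action is sampled from them, and the network is then trained toward that action if it was rewarded or toward its complement if it was punished. This is a simplified single-layer version with no hidden units, so no error is actually backpropagated. The class name `CRBPLayer`, the learning rates, and the toy task are assumptions made for illustration, and the version presented in class may differ in its details.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class CRBPLayer:
    """One layer of sigmoid units trained with complementary reinforcement."""

    def __init__(self, n_inputs, n_outputs, reward_rate=0.5, punish_rate=0.1, rng=None):
        self.rng = rng or random.Random()
        # weights[j][i] connects input i to output j; the last column is a bias weight.
        self.weights = [[0.0] * (n_inputs + 1) for _ in range(n_outputs)]
        self.reward_rate = reward_rate
        self.punish_rate = punish_rate

    def probabilities(self, inputs):
        x = list(inputs) + [1.0]  # append the constant bias input
        return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.weights]

    def act(self, inputs):
        """Sample a binary action: bit j is 1 with probability equal to output j."""
        return [1 if self.rng.random() < p else 0 for p in self.probabilities(inputs)]

    def learn(self, inputs, action, reinforcement):
        """Train toward the action if rewarded, toward its complement if punished."""
        if reinforcement == 0:
            return
        if reinforcement > 0:
            targets, rate = action, self.reward_rate
        else:
            targets, rate = [1 - a for a in action], self.punish_rate
        x = list(inputs) + [1.0]
        probs = self.probabilities(inputs)
        for row, target, p in zip(self.weights, targets, probs):
            error = target - p
            for i, xi in enumerate(x):
                row[i] += rate * error * xi


# Toy check: reward the unit whenever its output bit copies the input bit.
layer = CRBPLayer(n_inputs=1, n_outputs=1, rng=random.Random(0))
for _ in range(2000):
    x = [float(layer.rng.randint(0, 1))]
    action = layer.act(x)
    layer.learn(x, action, 1 if action[0] == int(x[0]) else -1)
```

After training on the toy task, `layer.probabilities([1.0])` should be well above 0.5 and `layer.probabilities([0.0])` well below it. In a robot controller the sampled bits would be decoded into motor commands, and the reinforcement would come from your reinforcement procedure.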

Examples

Here is a simple example that shows how to run multiple robots in a single world directly from Python, without using the Pyro GUI. To run it, type

python -i MultipleRobots.py

Here is a larger example showing how to use a GA with robots in Pyro. The learning robot's task is to seek out a moving light source, which is a light bulb attached to another robot. This example uses five files to implement a GA-based learning system:

TwoFireflies.py: a Pyrobot world with two Pioneer robots, one red and one purple.
AvoidObstacles.py: a simple obstacle-avoidance behavior for the purple wandering robot.
FindLightNN.py: a neural network brain for the red learning robot.
FindLightGA.py: a genetic algorithm to evolve weights for this neural network brain. To begin evolving, execute the following command:
```
python FindLightGA.py
```
TestNN.py: a program to test out the best weights found by the evolutionary process. For example, to observe the behavior of the robot from generation 20 using the weights saved in the file light-gen20.wts, execute the following command:
```
python TestNN.py light-gen20.wts
```

Notice in FindLightGA.py that at the end of every generation the weights of the best individual in the population are saved. Be sure to do this in your genetic algorithm as well. Then if your simulation is interrupted for some reason, you'll be able to re-seed a new population with the saved weights and restart evolution from that point rather than starting from scratch. Also notice that you can call the flush() method on a file object to force any buffered output to be written to the file immediately.
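The checkpointing idea can be sketched independently of Pyro as follows. All of the function names here are hypothetical, and the file format (weights separated by whitespace) is an assumption for illustration. It is not necessarily the format that FindLightGA.py writes, so use that program's own saving and loading code if you build on it.

```python
import random


def save_weights(path, weights):
    """Write the weights to a file, separated by spaces."""
    with open(path, "w") as f:
        f.write(" ".join(repr(w) for w in weights) + "\n")


def load_weights(path):
    """Read back a list of weights written by save_weights."""
    with open(path) as f:
        return [float(token) for token in f.read().split()]


def seed_population(saved, size, sigma=0.1, rng=random):
    """Build a new population from saved weights.

    The first individual is an exact copy of the saved weights; the rest are
    copies with Gaussian noise (standard deviation sigma) added to each weight.
    """
    population = [list(saved)]
    while len(population) < size:
        population.append([w + rng.gauss(0.0, sigma) for w in saved])
    return population


def log_generation(log_file, generation, best_fitness):
    """Append one progress line to an open log file and flush it immediately."""
    log_file.write("generation %d best fitness %.4f\n" % (generation, best_fitness))
    log_file.flush()  # make sure the line is written even if the run is interrupted
```

At the end of generation `g` you might call `save_weights("light-gen%d.wts" % g, best)`. After an interruption, `seed_population(load_weights("light-gen20.wts"), size)` would give a starting population centered on the best individual from generation 20.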

Evolving a good solution to a robot learning problem typically takes many hours of simulation time, so it is crucial that you get your simulation up and running as soon as possible, in order to allow yourself enough time to complete several runs of the genetic algorithm.

Pyro tips

Pyro is available on the following Pomona machines (note: do not use linus). Your Linux username/password should work on all of them:
- marcie, snoopy, woodstock, huey, dilbert, garfield, bucky, satchel, caulfield, patty, calvin, hobbes (.cs.pomona.edu)
Alternatively, you can make your own Live Pyro CD. This CD boots your computer directly into Knoppix Linux from the CD-ROM drive, and contains a complete installation of Pyro along with all of the simulators. To make a Pyro CD (which takes about 5 minutes), take a blank CD-R disc to the Pomona CS lab (all of the machines have a CD burner), put it in the drive, and then type:
```
cd /common/sys/iso
cdrecord pyrobot-4.3.0.iso
```

When starting the Pyro GUI, you can specify a simulator, a world, a robot connection, and a brain on the command line as follows:

pyrobot -s simulatorName -w worldFile -r robotName -b brainFile

For example:

pyrobot -s PyrobotSimulator -w Tutorial.py -r PyrobotRobot60000 -b AvoidObstacles.py

If you are creating a custom world for your task, you may want to look at or modify one of the built-in Pyrobot simulator worlds as a starting point. These worlds are available in the directory /common/cs/python/pyrobot/plugins/worlds/Pyrobot/ on the Pomona Linux machines.
Pyro uses ports and sockets to communicate between robots and the simulator. For example, the PyrobotRobot60000 and PyrobotRobot60001 robots communicate on ports 60000 and 60001, respectively. Therefore, if one student is running a simulation on a particular computer using these robots, no other student will be able to use these same robots on that computer. Attempting to do so will give the error message "socket not created: address already in use".
Remember that you should always close the Pyro window first when exiting. This will automatically shut down the simulator for you. If you forget and quit the simulator first, Pyro may hang, and you'll have to kill your Pyro and Python processes by hand from the command line. To do this, you can use the killall command:
```
killall -9 pyrobot python
```
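Regarding the port conflicts described above, one way to find out whether a robot's port is already taken before starting a long run is to try binding to it yourself. This is only a sketch: the function name `port_is_free` is hypothetical, and it assumes that Pyro's robots use TCP sockets on the local machine.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        return True
    except socket.error:
        # Typically "address already in use": some other process holds the port.
        return False
    finally:
        s.close()
```

For example, `port_is_free(60000)` returning False suggests that someone is already running PyrobotRobot60000 on that machine, so you should pick a different robot or a different computer.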

Paper guidelines

You will turn in a 4-6 page writeup describing your project. Your writeup should include the following:

A short abstract of 200 to 300 words summarizing your results.
An introduction describing the robot learning task.
A detailed description of your experiments. There should be enough information so that someone could redo your experiments.
An explanation of the results. Use figures and tables where appropriate.
A discussion of the significance of your results.

Your grade will be based on the thoroughness of your experiments and the clarity and readability of your paper, not on whether your experiments succeeded or failed. Negative results can be as important as positive results. If you work with a partner, you only need to submit one copy of your writeup, but make sure to include both of your names.

Turning in your project

Your paper must be in my office or in my inbox by noon on Thursday, December 15. No extensions will be granted; no exceptions.
Your code is also due at the same time. Use cs151-submit to turn in your files, including a sample of your most interesting results and the steps needed to demo your robot's behavior. You must include a README file that explains how to run your program with these files.

If you have questions about anything, just ask.