CS 151 Final Project — Robot Learning

Due by noon Thursday, December 15 (No extensions)

Introduction

For your final assignment you will use some form of neural network learning to teach a robot to perform a task of your choice. You are strongly encouraged to work in teams of two. As a first step you must determine an appropriate task given the available tools. You will be using the Pyrobot simulator to control a Pioneer robot through Pyro. Here are some of the capabilities that you could incorporate into your task:

If you plan on using one of these features, then experiment with it first to be sure you understand its capabilities. There are examples in the Pyro web pages.

Formulating a task

Create a detailed description of your task. How will you represent the input state? How will you represent the motor output? Since you will be using a neural network as the learning mechanism, you should think about how to scale the inputs and outputs to values between 0 and 1.
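
As a rough illustration of the scaling step, here is a minimal sketch in plain Python. It assumes range sensors that report distances between 0 and some maximum, and a motor command that expects values between -1 and 1. The names scale_to_unit, unit_to_range, and MAX_RANGE are hypothetical and are not part of Pyro; substitute the ranges of whatever sensors and motors you actually use.

```python
def scale_to_unit(value, lo, hi):
    """Linearly map value from [lo, hi] to [0, 1], clipping out-of-range readings."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clipped = min(max(value, lo), hi)
    return (clipped - lo) / (hi - lo)


def unit_to_range(activation, lo, hi):
    """Map a network output in [0, 1] back to a motor range [lo, hi]."""
    return lo + activation * (hi - lo)


MAX_RANGE = 3.0  # assumed maximum sensor reading; check your own robot

# Four example readings, the last one beyond the assumed maximum.
readings = [0.0, 1.5, 3.0, 4.2]
inputs = [scale_to_unit(r, 0.0, MAX_RANGE) for r in readings]

# A network output of 0.75 becomes a forward motor command of 0.5.
translate = unit_to_range(0.75, -1.0, 1.0)
```

Clipping keeps an unexpected reading from pushing a network input outside the 0 to 1 range.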

In order to use a reinforcement learning method you will need to create a reinforcement procedure. Typically this procedure would take two states, the state prior to executing an action and the state that resulted from executing that action. It would then return a reinforcement value: negative for punishment, 0 for none, or positive for reward. If you plan on using a Genetic Algorithm to evolve the weights of a neural network, then the reinforcement (i.e., fitness) values should always be positive.
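
One possible shape for such a procedure is sketched below. It assumes each state is a dictionary with a "light" reading and a "stalled" flag; these keys, and the reward values, are made up for illustration and should be replaced with whatever your task needs. The fitness function shows one way to keep GA fitness values positive: shift the accumulated reinforcement by the worst possible total.

```python
def reinforcement(prev_state, curr_state):
    """Compare the state before and after an action and return a reward.

    Returns -1.0 (punishment) if the robot is stalled, +1.0 (reward) if the
    light reading increased, and 0.0 otherwise.
    """
    if curr_state["stalled"]:
        return -1.0
    if curr_state["light"] > prev_state["light"]:
        return 1.0
    return 0.0


def fitness(states):
    """Sum reinforcement over a trial and shift it so it is always positive."""
    steps = len(states) - 1
    total = sum(reinforcement(a, b) for a, b in zip(states, states[1:]))
    # The worst case is -1.0 per step, so adding steps + 1 keeps fitness >= 1.
    return total + steps + 1.0


# A short hypothetical trial: brighter, then stalled, then dimmer.
trial = [
    {"light": 0.1, "stalled": False},
    {"light": 0.3, "stalled": False},
    {"light": 0.3, "stalled": True},
    {"light": 0.2, "stalled": False},
]
score = fitness(trial)
```

Shifting by a constant preserves the ranking of individuals, which clamping negative totals to zero would not.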

The frequency with which you provide a non-zero reinforcement value will determine how difficult the task is to learn. Delayed tasks, where reinforcement is only given at the time of goal achievement, are the hardest. Immediate tasks, where reinforcement is given at every time step, are the easiest. Intermediate tasks, where reinforcement is sporadic, are also possible. If you are using a Genetic Algorithm, then every task is essentially delayed because feedback from the fitness function is only given at the end of a task.
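
To make the difference concrete, the sketch below scores the same hypothetical trajectory two ways. It assumes the task is to approach a goal and that the distance to the goal is known at each step; GOAL_RADIUS and both function names are invented for this example.

```python
GOAL_RADIUS = 0.5  # assumed distance at which the goal counts as reached


def delayed_reward(prev_dist, curr_dist):
    """Delayed task: non-zero reinforcement only when the goal is reached."""
    return 1.0 if curr_dist <= GOAL_RADIUS else 0.0


def immediate_reward(prev_dist, curr_dist):
    """Immediate task: reinforcement at every step, equal to progress made."""
    return prev_dist - curr_dist


# Distance to the goal at five successive time steps.
distances = [4.0, 3.0, 3.5, 1.0, 0.4]
pairs = list(zip(distances, distances[1:]))

delayed = [delayed_reward(p, c) for p, c in pairs]
immediate = [immediate_reward(p, c) for p, c in pairs]
```

With the delayed scheme only the final step carries any signal, so the learner must work out which earlier actions deserve the credit. The immediate scheme gives feedback on every action, including the step that moved away from the goal.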

Possible tasks might include robots learning to play hide and seek, to clean up their environment, to follow (or avoid) other robots, or to herd other robots into a target zone. Unfortunately, the Pyrobot simulator does not currently support vision or gripper devices. However, objects can be moved around in the simulator under program control, so you can simulate gripping or ingestion by moving an object temporarily out of the way when the robot "picks it up" or "eats" it.
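
The bookkeeping for a simulated pickup might look something like the sketch below. It is plain Python and does not call the simulator: object positions are kept in a dictionary, and the comment marks where you would ask the simulator to move the object. OFFSTAGE, PICKUP_RADIUS, and try_pickup are hypothetical names, and the values are arbitrary.

```python
import math

OFFSTAGE = (-100.0, -100.0)  # assumed spot well outside the world
PICKUP_RADIUS = 0.3          # assumed reach of the simulated gripper


def try_pickup(robot_xy, objects, carried):
    """Pick up the first free object within reach of the robot, if any.

    objects maps a name to an (x, y) position; carried is the list of names
    the robot is already holding. Returns the picked-up name or None.
    """
    for name, (x, y) in objects.items():
        if name in carried:
            continue
        if math.hypot(robot_xy[0] - x, robot_xy[1] - y) <= PICKUP_RADIUS:
            # Here you would also move the object in the simulator itself.
            objects[name] = OFFSTAGE
            carried.append(name)
            return name
    return None


pucks = {"puck1": (1.0, 1.0), "puck2": (2.0, 2.0)}
holding = []
first = try_pickup((1.1, 1.0), pucks, holding)   # puck1 is within reach
second = try_pickup((1.1, 1.0), pucks, holding)  # nothing else is close
```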

Learning options

Your learning task should use one of the approaches below. If you would prefer to use some other approach in your project, email me or come talk to me about it.

Examples

Here is a simple example that shows how to run multiple robots in a single world directly from Python, without using the Pyro GUI. To run it, type

python -i MultipleRobots.py

Here is a larger example showing how to use a GA with robots in Pyro. The learning robot's task is to seek out a moving light source, which is a light bulb attached to another robot. This example uses five files to implement a GA-based learning system:

Notice in FindLightGA.py that at the end of every generation the weights of the best individual in the population are saved. Be sure to do this in your genetic algorithm as well. Then, if your simulation is interrupted for some reason, you can reseed a new population with the saved weights and restart evolution from that point rather than starting from scratch. Also notice that you can call the flush() method on a file object to force buffered output to be written to the file immediately.
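
For reference, a checkpointing scheme along these lines could be sketched as follows. This is not the code in FindLightGA.py. The file format (one line per generation), the function names, and the mutation scheme used for reseeding are all assumptions made for the example.

```python
import os
import random
import tempfile


def save_best(fp, generation, weights):
    """Append one line holding the generation number and the best weights."""
    fp.write("%d %s\n" % (generation, " ".join(repr(w) for w in weights)))
    fp.flush()  # push buffered output to the file right away


def load_last_best(path):
    """Return the weights from the last non-empty line of the file, or None."""
    last = None
    with open(path) as fp:
        for line in fp:
            fields = line.split()
            if fields:
                last = [float(x) for x in fields[1:]]
    return last


def reseed(best, pop_size, sigma, rng):
    """Build a population from the saved individual plus mutated copies."""
    population = [list(best)]
    while len(population) < pop_size:
        population.append([w + rng.gauss(0.0, sigma) for w in best])
    return population


# Simulate two generations, then recover as if the run had been interrupted.
path = os.path.join(tempfile.mkdtemp(), "best_weights.txt")
with open(path, "w") as fp:
    save_best(fp, 0, [0.1, -0.2])
    save_best(fp, 1, [0.3, -0.4])

restored = load_last_best(path)
population = reseed(restored, 5, 0.05, random.Random(0))
```

Keeping an unmutated copy of the saved individual in the new population means the restarted run begins no worse than where the interrupted one left off.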

Evolving a good solution to a robot learning problem typically takes many hours of simulation time, so it is crucial that you get your simulation up and running as soon as possible, in order to allow yourself enough time to complete several runs of the genetic algorithm.

Pyro tips

Paper guidelines

You will turn in a 4-6 page writeup describing your project. Your writeup should include the following:

Your grade will be based on the thoroughness of your experiments and the clarity and readability of your paper, not on whether your experiments succeeded or failed. Negative results can be as important as positive results. If you work with a partner, you only need to submit one copy of your writeup, but make sure to include both of your names.

Turning in your project

If you have questions about anything, just ask.