Problem Statement
The objective of the project is to use the aerial map image from
Google maps of the Penn Campus and to learn a model for predicting
the optimal path which will be taken by a car and a pedestrian by
using imitation learning (a type of reinforcement learning). So in
the project a parameterized cost map generated using an
exponentiated sum of some weighted features. Using expert examples
of optimal policies in the Learch Algorithm, I get the weights for
the cost map that will produce behavior similar to that of the
expert policy.
Description of approach
A large chunk of this project work was focused on feature
engineering and choosing a good seed for the feature weights to
feed into the Learch algorithm for imitation learning. Throughout
this report I make references to the Learch algorithm which is
based on the paper, "Learning to Search: Functional Gradient
Techniques for Imitation Learning" by Ratliff et al. First the
image was resized to one eight the original size and then I
extracted a bunch of features for every image pixel. By looking at
the features I did an educated guess of the weights to be used as
a seed to the Learch algorithm which works better and faster when
the initial cost map is close to the ideal one. Now, I got a
better estimate of the weights using Learch, though I found in
some cases the initial seed was a better estimate when the
learning rate is not properly set. In every iteration in the
Learch algorithm, I compute the cost map with the recently
computed weights which I augment by adding a loss field function
which increases the costs at the desired path by a fraction of the
maximum value of the current cost map (10%). Then I used dijkstra
search, to get an optimal path. Now I take all the points in the
optimal path and the ones in the desired path of all the MDP's
perturb them randomly upto 3 pixels along x and y. I run a binary
classification between the features of the points on the perturbed
optimal path and the desired path with a L2 regularized Linear
Support Vector Machine using Liblinear. I update the old weights
with the weight obtained from the classification after scaling
with an exponentially decaying learning rate. This procedure was
done for both the car and the pedestrian.