CS287 Home Page --- PS1 Q&A

PS1 Q&A

[2009/10/25] Q4: Typo in costfun.m: the provided file has dgdu = R; this should be replaced by dgdu = 2*R*u.
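If you want to convince yourself of the corrected gradient, a finite-difference check is a quick sanity test. The sketch below is in Python rather than MATLAB and is not part of the starter code. It assumes the control part of the cost is the quadratic form u'*R*u with a symmetric R (an assumption; check costfun.m for the actual cost), and all function names are hypothetical.

```python
def control_cost(R, u):
    """Quadratic control cost u' * R * u (assumed form of the cost)."""
    n = len(u)
    return sum(u[i] * R[i][j] * u[j] for i in range(n) for j in range(n))


def control_cost_gradient(R, u):
    """Analytic gradient 2 * R * u, valid when R is symmetric."""
    n = len(u)
    return [2.0 * sum(R[i][j] * u[j] for j in range(n)) for i in range(n)]


def finite_difference_gradient(f, u, eps=1e-6):
    """Central-difference approximation of the gradient of f at u."""
    grad = []
    for i in range(len(u)):
        u_plus = list(u)
        u_minus = list(u)
        u_plus[i] += eps
        u_minus[i] -= eps
        grad.append((f(u_plus) - f(u_minus)) / (2.0 * eps))
    return grad


# Made-up symmetric R and control u, purely for illustration.
R = [[2.0, 0.5], [0.5, 1.0]]
u = [0.3, -1.2]

analytic = control_cost_gradient(R, u)
numeric = finite_difference_gradient(lambda v: control_cost(R, v), u)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
print("max abs difference:", max_error)
```

The difference should be close to zero for 2*R*u. Note that a gradient equal to R alone does not depend on u at all, which is one way to see that the original line could not be right for a quadratic cost.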
[2009/10/22] Extensions: A visualizer for the acrobot (works similarly to the visualizer you already have for the cartpole): draw_acrobot.m
[2009/10/22 (updated)] Q3: Helicopter graphics released! Thanks to Jie Tang and Arjun Singh! How does it work? [Please address any further questions to jietang AT eecs ]
A: [Windows] Download and run the following installer. To avoid having to provide command line arguments, install it so that your directory structure looks as follows:
a_directory\PS1-starter-code\
a_directory\heli_visualizer-0.1\
Here "a_directory" could be any directory of your choice. If you adhere to this directory structure, then if you run the heli-visualizer from the start menu, it will look for the following files in your "a_directory\PS1-starter-code\Q3_stabilization_trajectory_following_helicopter_q" directory: hover_log.txt, trajectory_log.txt, trajectory_target_log.txt. Whichever subset of these files it finds, it will then play in the visualizer using the following ordering of colors: red, green, blue.
You can also run the visualizer from the command line with arguments: the folder heli_visualization-0.1/game contains a main.pyc file, which you can run with:
"python main.pyc ..."
You can also hit "o" to open a file browser and load the input file.
A: [Linux] Installer: installer. Tested on Ubuntu 9.04.
[2009/10/16] Q3: Is the helicopter trajectory going 5m down rather than up?
A: Yes, it is. We use the North, East, Down coordinate frame, so (5,0,5) is 5m North and 5m Down from the origin. If you prefer your helicopter to fly upwards, you can make your target (5,0,-5).
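As a small illustration of the sign convention (Python, not part of the starter code; both helper names are hypothetical):

```python
def altitude_from_ned(position_ned):
    """Height above the origin for a (north, east, down) position."""
    north, east, down = position_ned
    return -down


def ned_target_with_climb(north, east, climb):
    """Build a (north, east, down) target that is `climb` meters above the origin."""
    return (north, east, -climb)


print(altitude_from_ned((5, 0, 5)))    # the target from the question
print(ned_target_with_climb(5, 0, 5))  # a target 5m North and 5m up
```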
[2009/10/16] Q1: Why are the simulations starting from the corners of a square x in {-.5, .5}, xdot in {-.5,.5}?
A: This choice ensures the trajectories are far enough away from the boundary box so you don't get any funny effects caused by only considering a finite subset of the state space. That being said, it does let some of the discretization go to waste. You could consider moving the starting points a little further away from the origin and use these new starting points to study the performance of the algorithms.
[2009/10/16] Q1: The nearest neighbor discretization does not work very well. Am I doing something wrong? Maybe the discretization is not doing the right thing?
A: Copied from the problem set handout: "If your implementation is similar to mine, you might notice the nearest neighbor model is doing particularly poorly. Can you find an explanation? [Hint: inspect the actions it chooses and inspect its transition models.]" Ok, so the point of this problem is to think about how these algorithms work, and why the nearest neighbor is breaking down and how you might be able to fix this. Yes, you might have to look into the provided discretization code. [And for the extension you would have to actually implement a fix, test it and report on its performance.]
[2009/10/15] Q3: What are Q and R supposed to be?
A: They were missing from your starter file. They have been added to Q_heli_hover.m and Q_heli_trajectory.m.
[2009/10/15] Q1: Does roll-out mean you have to simulate the "current" policy all the way till the end for evaluating a single action? Would that take a long time to run?
A: Yes, and yes. However, the point of the extension is for you to understand/implement/investigate roll-out. If you can do this satisfactorily over a shorter time duration version of the control problem, so much the better.
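To make the idea concrete, here is a minimal roll-out sketch in Python (not MATLAB, and not the starter code). Everything in it is an assumption made for illustration: the double-integrator dynamics, the cost, the three-action set, the crude base policy, and the shortened horizon. Each candidate action is applied once, after which the "current" (base) policy is simulated for the rest of the horizon, and the action with the lowest discounted cost is selected.

```python
DT = 0.1                     # time step (assumed)
GAMMA = 0.95                 # discount factor (assumed)
ACTIONS = (-1.0, 0.0, 1.0)   # candidate controls (assumed)


def step(state, u):
    """Toy deterministic dynamics: position/velocity driven by force u."""
    x, xdot = state
    return (x + DT * xdot, xdot + DT * u)


def cost(state, u):
    """Toy stage cost g(s, u)."""
    x, xdot = state
    return x * x + xdot * xdot + 0.1 * u * u


def base_policy(state):
    """Stand-in for the 'current' policy: push against x + xdot."""
    x, xdot = state
    s = x + xdot
    if s > 0:
        return -1.0
    if s < 0:
        return 1.0
    return 0.0


def rollout_value(state, first_action, horizon):
    """Discounted cost of taking first_action, then following base_policy."""
    total = cost(state, first_action)
    discount = GAMMA
    state = step(state, first_action)
    for _ in range(horizon - 1):
        u = base_policy(state)
        total += discount * cost(state, u)
        discount *= GAMMA
        state = step(state, u)
    return total


def rollout_policy(state, horizon=50):
    """Pick the action whose roll-out has the lowest discounted cost."""
    return min(ACTIONS, key=lambda u: rollout_value(state, u, horizon))


start = (0.5, 0.5)
chosen = rollout_policy(start)
print("roll-out action at", start, "is", chosen)
```

Each decision costs one simulation per candidate action, each `horizon` steps long, which is why the running time grows quickly and why a shorter-duration version of the problem is a reasonable setting for the extension.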
[2009/10/15] Q1: In execute_greedy_policy_wrt_V(...), what does greedy w.r.t. V mean?
A: The greedy policy w.r.t. the value function V is the policy that selects \arg\min_u E [ g(s,u) + \gamma V(s') ], where s' is the state at the next time step. For the deterministic setting, this simplifies to \arg\min_u [ g(s,u) + \gamma V(s') ]. The next state s' is rarely a grid point, so you will need to perform nearest neighbor or Delaunay-based interpolation to find V(s').
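A minimal sketch of the deterministic case with a nearest neighbor lookup, written in Python rather than MATLAB. The grid, dynamics, cost, action set, and the placeholder value table are all made up for illustration and are not taken from the starter code; Delaunay-based interpolation would replace only the lookup of V(s') and is not shown.

```python
GAMMA = 0.95                                  # discount factor (assumed)
DT = 0.1                                      # time step (assumed)
ACTIONS = (-1.0, 0.0, 1.0)                    # candidate controls (assumed)
GRID = [i * 0.25 - 1.0 for i in range(9)]     # -1.0, -0.75, ..., 1.0 per dimension


def step(state, u):
    """Toy deterministic dynamics returning the next state s'."""
    x, xdot = state
    return (x + DT * xdot, xdot + DT * u)


def cost(state, u):
    """Toy stage cost g(s, u)."""
    x, xdot = state
    return x * x + xdot * xdot + 0.1 * u * u


def nearest_grid_point(state):
    """Snap each coordinate of a continuous state to the closest grid value."""
    return tuple(min(GRID, key=lambda g: abs(g - s)) for s in state)


def greedy_action(state, V):
    """argmin over u of g(s, u) + gamma * V(s'), with V(s') read off the grid."""
    def q_value(u):
        s_next = step(state, u)
        return cost(state, u) + GAMMA * V[nearest_grid_point(s_next)]
    return min(ACTIONS, key=q_value)


# Placeholder value table; in the assignment V would come from value iteration.
V = {(x, xdot): x * x + xdot * xdot for x in GRID for xdot in GRID}

print(nearest_grid_point((0.3, -0.9)))
print(greedy_action((0.5, 0.5), V))
```

Printing the value of g(s,u) + \gamma V(s') for each candidate action at a few states is a convenient way to inspect what the greedy policy is actually choosing.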
[2009/10/15] Q1: Some figures don't have titles.
A: Fixed.
[2009/10/15] Q3: I don't understand the linearized_heli_dynamics.m very well.
A: Yes, it was much fancier than you would need for solving this problem and a few of the comments were out of date. I posted a version with fixed comments. Here is a simpler version which you most likely would prefer to use for your assignment---linearized_heli_dynamics2.m.
[2009/10/14] Q1: Q1 assumes you work in discrete time. I have updated the pdf file to make this even more explicit.
[2009/10/11] Q1: build_discretized_MPD_for_brick.m was an old version which contained some spurious code and handled the discounting in a different way. Here is a fixed version: build_discretized_MDP_for_brick.m. I updated the posted starter code to reflect this bug fix.