Monte Carlo Simulation.

This problem is offered by Prof. Andrew Gelman of the Statistics Dept. (gelman@stat, 642-9959). Please see him for details.

Background. An important problem in statistical computation is Monte Carlo simulation, and a commonly used method is Markov chain, or ``random walk,'' simulation, in which a computer program simulates a random walk through a probability distribution. The idea is to run the random walk long enough that it has time to tour most of the distribution. A basic implementation idea is to simulate several parallel random walks, so that you can (1) explore several parts of the distribution at once, and (2) compare the paths of the several walks to assess convergence. When the random walks are close to converging to their ergodic distribution, their paths will have similar statistical properties.
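To fix ideas, here is a minimal sketch in C of one step of such a random walk: a Metropolis step with a symmetric uniform proposal, assuming (purely for illustration) a one-dimensional standard normal target. The actual target density and jumping rule would come from the serial program described below; log_target, unif, and walk_step are hypothetical names.

    /* One random-walk (Metropolis) step -- a sketch, assuming a
       one-dimensional standard normal target for illustration. */
    #include <math.h>
    #include <stdlib.h>

    /* log of an (unnormalized) standard normal density */
    static double log_target(double x) { return -0.5 * x * x; }

    /* uniform draw strictly inside (0,1) */
    static double unif(void) { return (rand() + 1.0) / (RAND_MAX + 2.0); }

    /* propose x + delta*(2u - 1); accept with the Metropolis rule */
    double walk_step(double x, double delta)
    {
        double prop = x + delta * (2.0 * unif() - 1.0);
        if (log(unif()) < log_target(prop) - log_target(x))
            return prop;        /* accept the jump */
        return x;               /* reject: stay put */
    }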

Our current set-up. We currently have a program written in a ``serial/parallel'' style: it starts m random walks, then cycles through them, performing one step in each walk, until every walk has taken n steps. Typical values are m=10, n=10000. In some implementations the program pauses occasionally (perhaps every 1000 steps) to (a) alter the random walk algorithm and (b) assess convergence. (Output from the random walks up to time t can be used to alter the random walk algorithm and improve the efficiency of the simulations.)
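Under the same illustrative assumptions, the ``serial/parallel'' driver looks roughly like this; walk_step is the sketch above, and adapt and converged stand in for the adaptation and convergence-checking routines in the supplied program.

    #define M 10        /* number of walks (m) */
    #define N 10000     /* steps per walk  (n) */

    /* placeholders for routines from the supplied serial program */
    static void adapt(const double x[M], double *delta) { (void)x; (void)delta; }
    static int  converged(const double x[M]) { (void)x; return 0; }

    /* advance m walks round-robin, one step each, pausing every
       1000 steps to (a) adapt the algorithm and (b) check convergence */
    void run_serial_parallel(double x[M], double delta)
    {
        for (int t = 1; t <= N; t++) {
            for (int i = 0; i < M; i++)
                x[i] = walk_step(x[i], delta);
            if (t % 1000 == 0) {
                adapt(x, &delta);           /* (a) alter the algorithm */
                if (converged(x)) return;   /* (b) assess convergence  */
            }
        }
    }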

Our goal. We want a parallel set-up in which each of m processors continually runs a random walk. The parallel processors can be in a single machine or, perhaps more usefully, on a network of workstations. The random walks should run asynchronously if possible, so that if one processor is busy, the others don't have to wait for it. There should also be a processor that acts as a controller: it gives the m processors starting points for their walks and supplies them with the parameters that govern the random walk algorithms (an example of such a parameter is delta, the length of a jump in each step of the walk). The m processors write their results into some shared memory; occasionally the controller reads that memory and uses the results to (a) decide how to alter the random walk algorithms and (b) assess convergence (i.e., decide whether to stop). The results from (a) are fed back into the random walks, and so forth.
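Here is one sketch of that set-up, using POSIX threads as the m ``processors'' and a mutex-guarded array as the shared memory (on a network of workstations, a message-passing layer such as PVM or MPI would play the same roles). It reuses walk_step, adapt, converged, and M from the sketches above; everything else is illustrative, not a prescribed design.

    #include <pthread.h>
    #include <unistd.h>

    static double shared_x[M];          /* latest state of each walk        */
    static double shared_delta = 1.0;   /* jump length, tuned by controller */
    static int    stop_flag = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)      /* one of the m ``processors'' */
    {
        int i = *(int *)arg;
        double x = 0.0, delta = 1.0;
        int stop = 0;
        while (!stop) {
            x = walk_step(x, delta);    /* the real work, outside the lock */
            pthread_mutex_lock(&lock);  /* brief critical section:         */
            shared_x[i] = x;            /*   publish the latest state      */
            delta = shared_delta;       /*   pick up the latest parameters */
            stop  = stop_flag;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    static void *controller(void *arg)
    {
        (void)arg;
        int stop = 0;
        while (!stop) {
            sleep(1);                   /* let the walks run a while */
            pthread_mutex_lock(&lock);
            adapt(shared_x, &shared_delta);          /* (a) alter algorithm */
            stop = stop_flag = converged(shared_x);  /* (b) stop yet?       */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[M], ctl;
        int id[M];
        for (int i = 0; i < M; i++) {
            id[i] = i;
            pthread_create(&tid[i], NULL, worker, &id[i]);
        }
        pthread_create(&ctl, NULL, controller, NULL);
        pthread_join(ctl, NULL);
        for (int i = 0; i < M; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }

Note that each worker touches the shared state for only a few instructions per step, so the walks run essentially asynchronously even though a single lock is shown here.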

The technical difficulties seem to be: (1) setting up the asynchronous processes with the shared memory, and (2) having the controller be able to read this information without interfering with the processors that are writing it. I'll supply a serial program with all the algorithms so that the issue for you is to parallelize it.
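On difficulty (2), one simple compromise is to give each walk its own lock, have the walker hold it only for the instant of a write, and let the controller use pthread_mutex_trylock so that it skips, rather than waits on, any slot that is mid-update; a walker is then delayed by at most one short copy. A sketch, with an illustrative state dimension:

    #include <pthread.h>
    #include <string.h>

    #define DIM 4                 /* illustrative dimension of a walk's state */

    typedef struct {
        pthread_mutex_t lock;     /* initialize with pthread_mutex_init() */
        double state[DIM];        /* latest state of this walk            */
        long   step;              /* steps taken so far                   */
    } slot_t;

    /* walker side: publish a new state; the critical section is one memcpy */
    void publish(slot_t *s, const double x[DIM], long step)
    {
        pthread_mutex_lock(&s->lock);
        memcpy(s->state, x, sizeof s->state);
        s->step = step;
        pthread_mutex_unlock(&s->lock);
    }

    /* controller side: copy a consistent snapshot, or return 0 if the
       walker happens to be mid-write -- the controller just tries this
       slot again on its next pass, so neither side ever waits long */
    int snapshot(slot_t *s, double out[DIM], long *step)
    {
        if (pthread_mutex_trylock(&s->lock) != 0)
            return 0;
        memcpy(out, s->state, sizeof s->state);
        *step = s->step;
        pthread_mutex_unlock(&s->lock);
        return 1;
    }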

I think that parallel processing for this family of algorithms will be useful to a lot of applied statisticians. Of course, I expect this will result in a publication in a statistics journal (I don't know what gets published in C.S. journals) and will lead to many other applications.





