More formally, we are given a graph G=(N,V,W) consisting of a set N of n nodes
(or cities), a set of edges V = {(i,j)} connecting cities, and a set of
nonnegative weights W = {w(i,j)} giving the length of edge (i,j) (distance from
city i to city j). The graph is *directed*, so that an edge (i,j)
may only be traversed in the direction from i to j, and edge (j,i) may or
may not exist. Similarly, w(i,j) does not necessarily equal w(j,i), if both
edges exist.

There are a great many algorithms for this important problem, some of which
take advantage of special properties like *symmetry* (edges (i,j) and
(j,i) always exist or do not exist simultaneously, and w(i,j) = w(j,i))
and the *triangle inequality* (w(i,j) <= w(i,k) + w(k,j) for all i,j,k).
In this assignment we assume none of these properties hold. For simplicity,
though, we assume all edges (i,j) exist, and all w(i,j) are positive
integers (note that setting some w(i,j) to be very large effectively
excludes it from appearance in a solution).

For an interesting graphical demonstration of a TSP problem with US cities (where symmetry and the triangle inequality both hold), start up matlab, and then

We will describe a sequence of solutions, in pseudo-code, starting from a
naive *exhaustive search algorithm*, to a naive
*branch-and-bound algorithm*, to a more sophisticated branch-and-bound
algorithm.

For your parallel implementation you should use the *task queue*
which is part of the *Multipol* library of distributed data
structured produced by Prof. Yelick and her students. Multipol stands for
Multiported Object Library, about which you will hear more in lecture.
The task queue is used for *dynamic load balancing*, or distributing
work among processors to keep them all gainfully employed, in the situation
where the workload is generated unpredictably at run-time, precluding any
static preassignment of work to processors.

A sample application which uses the task queue and Split-C to solve the ever-popular N-Queens problem may be found in /usr/castle/share/proj/cpwen/queen. You may copy this to your own directory. To make the program simply type "gmake". To run the program. type "queen" followed by the number of processors to use. Task queue documentation will be forthcoming soon.

Your assignment is

Exhaustive Search Solution of TSP w = w(1,2) + w(2,3) + w(3,4) + ... + w(n-1,n) + w(n,1) Best_S_so_far = ( n, [ 1, 2, 3, ... , n-1, n ], w ) S = ( 1, [ 1 ], 0 ) Search( S, Best_S_so_far ) print Best_S_so_far procedure Search( S, Best_S_so_far ) let ( k, [ i1, i2, ... , ik ], w ) = S let ( n, [ i1B, i2B, ... , inB ], wB ) = Best_S_so_far if k = n then new_w = w + w(ik,i1) if new_w < wB then Best_S_so_far = ( k, [ i1, i2, ... , ik ], new_w ) end if else for all j not in [ i1, i2, ... , ik ] new_w = w + w(ik,j) New_S = ( k+1, [ i1, i2, ... , ik, j ], new_w ) Search( New_S, Best_S_so_far ) end for endif return end

This algorithm searches all (n-1)! possible paths starting at 1, and keeps the best one. There is a great deal of parallelism, because after k recursive call to Search, there are (n-1)*(n-2)*...*(n-k) independent subtrees to search, which can be farmed out to as many processors. Since subtrees are all equally large, the load balance is perfect.

Naive Branch-and-Bound Solution of TSP w = w(1,2) + w(2,3) + w(3,4) + ... + w(n-1,n) + w(n,1) Best_S_so_far = ( n, [ 1, 2, 3, ... , n-1, n ], w ) S = ( 1, [ 1 ], 0 ) Search( S, Best_S_so_far ) print Best_S_so_far procedure Search( S, Best_S_so_far ) let ( k, [ i1, i2, ... , ik ], w ) = S let ( n, [ i1B, i2B, ... , inB ], wB ) = Best_S_so_far if k = n then new_w = w + w(ik,i1) if new_w < wB then Best_S_so_far = ( k, [ i1, i2, ... , ik ], new_w ) end if else for all j not in [ i1, i2, ... , ik ] new_w = w + w(ik,j) if new_w < wB then New_S = ( k+1, [ i1, i2, ... , ik, j ], new_w ) Search( New_S, Best_S_so_far ) end if end for endif return end

A sequential version of this algorithm may be viewed in /usr/castle/share/proj/cpwen/tsp. You may copy this to your own directory. To make, type "gmake". To run, type "tsp" followed by the maximum number of search nodes (i.e., recursive calls to tsp).

Now the search tree is not longer perfectly balanced, with some subtrees being pruned away. This makes load balancing more difficult. Statically assignment subtrees of the search tree to processors as for the first algorithm could lead to some processors being idle, while others have all the work to do. This requires dynamic load balancing, for which you can use the task queue in Multipol.

----------- | all solns | ----------- / \ ---------------- ----------------- | solns with e_i | | solns w/out e_i | ---------------- ----------------- / \ / \ ----------- ---------- ---------- ----------- | with e_j | | w/out e_j | | with e_k | | w/out e_k | ----------- ---------- ---------- -----------Three questions need to be addressed: 1) How do we bound the weight of the solutions in each subtree? 2) How is the set of solutions represented? 3) How do we choose the splitting edge at a given node in the search space?

**Bounding the solutions: **
Assume the input is given as a dense adjacency matrix. We will put
infinities, rather than zeros on the diagonal to avoid traversing
these self-edges.

i\j 1 2 3 4 5 6 7 \ ________________________________________ 1 |Inf 3 93 13 33 9 57 2 | 4 Inf 77 42 21 16 34 3 | 45 17 Inf 36 16 28 25 4 | 39 90 80 Inf 56 7 91 5 | 28 46 88 33 Inf 25 57 6 | 3 88 18 46 92 Inf 7 7 | 44 26 33 27 84 39 InfWe can subtract a constant from a given row or column, as long as the values remain non-negative. This changes the weight of each tour, but not the set of legal tours or their relative weights. We therefore normalize the matrix by subtracting the minimum value in each row from the row and the minimum value of each column from the column. This results in a matrix with at least one zero in every row and column.

i\j 1 2 3 4 5 6 7 \ ________________________________________ 1 |Inf 0 83 9 30 6 50 2 | 0 Inf 66 37 17 12 26 3 | 29 1 Inf 19 0 12 5 4 | 32 83 66 Inf 49 0 80 5 | 3 21 56 7 Inf 0 28 6 | 0 85 8 42 89 Inf 0 7 | 18 0 0 0 58 13 InfAny solution must use one entry from every row and every column, so the sum of the values we just subtracted is a lower bound on the weight of solution. In this case, we subtracted [3 4 16 7 25 3 26] from the rows and then [0 0 7 1 0 0 4] from the columns, so the lower bound on the weight of any solution is 96. We can now define a search tree in which each node represents a set of solutions along with a lower bound on those solutions.

**Representing the set of solutions:** The adjacency matrix can be
used to represent the set of solutions. When an edge is chosen for the
solution, the row and column containing that edge is deleted. When
an edge is eliminated from the solution, its entry is changed to
infinity so that it will never be chosen.
In the above example, assume we choose to split the search space on
the edge from 4 to 6.
For the *right subtree*, which represents
all solutions *not* containing (4,6),
we replace the (4,6) entry by infinity.
The minimum value in row 4 is now 32, so we can renormalize the
matrix and improve the lower bound to 96+32 = 128. Column 6 has
another 0 entry, so it remains unchanged. The matrix for the
right subtree is:

i\j 1 2 3 4 5 6 7 \ ________________________________________ 1 |Inf 0 83 9 30 6 50 2 | 0 Inf 66 37 17 12 26 3 | 29 1 Inf 19 0 12 5 4 | 0 51 34 Inf 17 Inf 48 5 | 3 21 56 7 Inf 0 28 6 | 0 85 8 42 89 Inf 0 7 | 18 0 0 0 58 13 InfThe

i\j 1 2 3 4 5 7 \ __________________________________ 1 |Inf 0 83 9 30 50 2 | 0 Inf 66 37 17 26 3 | 29 1 Inf 19 0 5 5 | 0 18 53 4 Inf 25 6 | 0 85 8 Inf 89 0 7 | 18 0 0 0 58 InfThe process outlined so far can be used by repeatedly following these steps. Following the left branches we are guaranteed to reach some solution in n levels. Following the right branch, we will eventually have a matrix full of infinities, at which point the lower bound (if we kept searching) would become infinity. Pruning is done by eliminating any subtrees whose lower bound is larger than the best lower bound seen so far for a complete solution.

**Choice of the splitting edge:**
In general, the right branches represent a larger set of solutions than
the left branches. Therefore, in choosing the splitting edge, we look
for something that will raise the lower bound of the right-hand subtree
as much as possible. In the example, edge (4,6) was chosen because its
value was zero and the next larger value in row 4 was 32. There are
other zero entries in the matrix, for example, (3,5), but the improvement
in the lower bound for that case would have been only 1 (for row 3) plus
17 (for column 5). Therefore, the general rule is to search for the zero
entry at (i,j) that maximizes the increase in the lower bound (largest
sum of minimum in row i and minimum in column j, not counting the
zero at (i,j)).

**Insertion of infinities in left-branches:** When expanding a left
branch in the search tree for edge (i,j), we noted that edge (j,i)
should be made infinity. In general, we are trying to avoid the
creation of *non-covering cycles* in the tour, i.e., cycles that
traverse only a subset of the nodes. For the example above, assume
that the left-most search path unfolds with the following edge splits:
(4,6), (3,5), and (2,1). At this point the partial solution contains
three disconnected paths; all cycles were prevented by marking (6,4),
(5,3), and (1,2) as infinity.

----------- | all solns | ----------- / \ ------------ ------------- | with (4,6)| | w/out (4,6)| ------------ ------------- / \ ------------ ------------- | with (3,5)| | w/out (3,5)| ------------ ------------- / \ ------------ ------------ | with (2,1)| | w/out (2,1)| ------------ ------------The next choice of splitting edge is (1,4), so according to our above rule (4,1) should be changed to infinity. However, at this point the adjacency matrix does not contain a row for 4, since it was removed with the (4,6) step. Moreover, the partial solution is now: 2-1-4-6 and 3-5. If (6,2) were now added, we would have a non-covering cycle, so we need to change the (6,2) entry to Inf. Thus, the general rule for inserting infinities on left-branches is to construct the subpaths that exist in the partial solution and remove any edges from the end to beginning of each subpath by replacing them with infinities.