CS267 January 1995 Announcements

Also see the CS267 newsgroup

Mon Jan 30 00:31:29 1995

The Sharks and Fish code mentioned in assignment 2 is now all in place.

Sat Jan 28 14:49:15 1995

Because of various CM5-related problems, Assignment 2 and Assignment 3 (the same problem, but using CMMD and Split-C) are now due on Thursday, February 9th.

Wed Jan 25 18:33:37 1995

Some advice: in general, a fast memory-hierarchy-cognizant matrix multiply routine on one machine will be fast on another comparable machine. If you're getting frustrated with how heavily loaded alfa-romeo is, try doing some of the debugging (and even performance tuning) somewhere else. Admittedly, fine-tuning for the Power2 architecture and AIX compiler isn't possible elsewhere. However, most of the general approaches we've talked about will yield similar performance increases on other machines, especially if you've coded them in a parameterized way. Worried about not having 32 64-bit double-precision floating-point registers? Work temporarily with 32-bit single-precision registers, of which most machines have 32. Once you're happy with your routine's performance, try it out again on alfa-romeo, and then do the architecture-specific tuning. Also, this division of labor is a natural split between partners.
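
To make the "parameterized" suggestion concrete, here is the sort of structure we have in mind. This is only a sketch under assumed conventions (the routine name blocked_mul, the column-major layout, and the particular loop order are illustrative, not a prescribed solution); the point is that moving to a new machine should mostly mean retuning BLOCK rather than rewriting the loops.

    /* Sketch of a parameterized L1-blocked multiply, C = C + A*B, for
     * n x n column-major matrices (layout assumed for illustration).
     * Porting to another machine should mostly mean retuning BLOCK. */
    #define BLOCK 32                      /* tune per machine's cache */
    #define MIN(x, y) ((x) < (y) ? (x) : (y))

    void blocked_mul(int n, const double *A, const double *B, double *C)
    {
        int i, j, k, i0, j0, k0;

        for (j0 = 0; j0 < n; j0 += BLOCK)
            for (k0 = 0; k0 < n; k0 += BLOCK)
                for (i0 = 0; i0 < n; i0 += BLOCK)
                    /* one BLOCK x BLOCK x BLOCK sub-problem */
                    for (j = j0; j < MIN(j0 + BLOCK, n); j++)
                        for (k = k0; k < MIN(k0 + BLOCK, n); k++) {
                            double b = B[k + j * n];
                            for (i = i0; i < MIN(i0 + BLOCK, n); i++)
                                C[i + j * n] += A[i + k * n] * b;
                        }
    }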

Mon Jan 23 22:04:44 1995

  • If anyone is willing to accept one more member in their group, please send mail to zege@grace.lbl.gov.

Mon Jan 23 04:41:05 1995

  • The test driver for the matrix multiply contest is finally ready.

    First, it tests that multiplying a few randomly sized matrices produces the correct answer. If the 'error' is greater than a certain bound, the routine is disqualified, a horrible fate indeed. Jim will discuss the error calculation (and what happens to you if you're disqualified :-) sometime this week.

    Second, it does the timing in two groups. Quad-word (16-byte) aligned matrices of various sizes are timed first. In this case, the number of columns is always even and the matrix pointers are zero in their least significant 4 bits (you may be able to optimize for this case). Then we time matrices of arbitrary size and alignment, where no alignment assumption may be made.
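
    For reference, the alignment condition above can be tested as in the following sketch (quad_aligned is a hypothetical helper, and the cast to unsigned long assumes pointers fit in a long, which they do on the RS/6000):

        /* A pointer is quad-word (16-byte) aligned exactly when its low
         * 4 address bits are zero. */
        int quad_aligned(const void *p)
        {
            return ((unsigned long) p & 0xF) == 0;
        }

        /* In the first timing group you may assume quad_aligned(A),
         * quad_aligned(B), quad_aligned(C), and an even matrix dimension,
         * and branch to a specialized code path accordingly. */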

    The stdout output of the program using ESSL's dgemm is as follows ([matsize,mflops] pairs):

    16 176.690196
    32 240.941176
    64 252.189165
    128 252.116634
    256 247.717283
    23 202.783333
    43 227.162857
    61 241.469149
    79 241.041289
    99 238.080144
    119 238.305313
    151 245.925071
    
    Relative rankings will be determined as follows: each team's mflops rating for each size will be divided by the corresponding ESSL number, and the resulting vector of ratios will be averaged, producing an overall score.
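
    In code, that scoring rule amounts to the sketch below (the names are hypothetical and the driver's own bookkeeping may look different):

        /* Overall score = average over all tested sizes of
         * (your mflops) / (ESSL's mflops) at that size. */
        double overall_score(int nsizes, const double *your_mflops,
                             const double *essl_mflops)
        {
            double sum = 0.0;
            int i;

            for (i = 0; i < nsizes; i++)
                sum += your_mflops[i] / essl_mflops[i];
            return sum / nsizes;
        }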

    You should also be aware that we are essentially ignoring the effects of compulsory cache misses (i.e., initial data not being in the cache) in our timings. For the smaller problem sizes, the entire problem might fit in cache, so we'll be testing in-cache performance after the first iteration. For the larger problem sizes, the problem will never all fit in cache, and the effects of your L1 blocking strategy will start shining through. The 4-way set associativity further complicates matters. The main point? Just realize that we're treating these two sets of problem sizes essentially the same in our timings.
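
    To see why repeated calls wash out the compulsory misses, the timing has roughly the following shape. This is only a sketch, not the actual driver: the helper name, the iteration count, and the use of gettimeofday are all assumptions. The mflops figure counts 2*n^3 flops per call, since each of the n^2 entries of C receives n multiplies and n adds.

        #include <sys/time.h>

        void mul_mfmf_mf(int, const double *, const double *, double *);

        /* Time 'iters' back-to-back calls; the first call's compulsory
         * cache misses are amortized over the rest.  Returns mflops. */
        double time_mul(int n, const double *A, const double *B,
                        double *C, int iters)
        {
            struct timeval t0, t1;
            double secs;
            int it;

            gettimeofday(&t0, 0);
            for (it = 0; it < iters; it++)
                mul_mfmf_mf(n, A, B, C);
            gettimeofday(&t1, 0);

            secs = (t1.tv_sec - t0.tv_sec) + 1e-6 * (t1.tv_usec - t0.tv_usec);
            return 2.0 * n * n * n * iters / secs / 1e6;
        }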

    You have until this Thursday at 6:00pm to have everything up and running. By then, in your alfa-romeo.cs home directory, create a file called 'mul_mfmf_mf.c' containing a routine conforming to:

    void
    mul_mfmf_mf(
        int matdim,
        const double *A,
        const double *B,
        double *C);
    
    We'll compile and run the routine at the contest, so please try and get things working beforehand.
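
    For concreteness, here is a minimal (and deliberately slow) routine conforming to that prototype. Column-major storage with matdim as the leading dimension is assumed here, so check that convention against the assignment handout; note that it accumulates into C, i.e. it computes C = C + A*B, as clarified in the Jan 20 announcement below. A competitive entry will of course need register and cache blocking on top of this.

        /* Minimal reference version: C = C + A*B for matdim x matdim
         * column-major matrices.  Correct but slow -- a starting point
         * only. */
        void
        mul_mfmf_mf(int matdim, const double *A, const double *B,
                    double *C)
        {
            int i, j, k;

            for (j = 0; j < matdim; j++)
                for (k = 0; k < matdim; k++) {
                    double b = B[k + j * matdim];
                    for (i = 0; i < matdim; i++)
                        C[i + j * matdim] += A[i + k * matdim] * b;
                }
        }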

    Also, create a file in your alfa-romeo home directory called 'copts'. This file should contain the C compiler optimization flags you wish to use. The file should be such that

        cc `cat copts` -c mul_mfmf_mf.c 
    
    will work. Remember, -O3 is not necessarily better than -O2. Also, look at the '-q' options. You can view all the options using the program 'xlc' or see IBM's options page.
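
    For example, a copts file might contain a single line such as the following (these particular flags are only an illustration of the format; check xlc's option list before settling on anything, especially the -q spellings):

        -O3 -qarch=pwr2

    so that the compile line above expands to cc -O3 -qarch=pwr2 -c mul_mfmf_mf.c.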

    One last thing. Make sure the files are world-readable (this should be the very last thing you do), since we don't have any privileges on this machine.

    Good luck to everyone. And as Raph said, may the best dgemm win!!!

Fri Jan 20 23:01:59 1995

  • In case there is any confusion, you only need to implement C = C + A*B rather than C = A*B. In other words, you only need to accumulate into C rather than zero it first. The key ideas for a fast implementation are still the same. There was a typo in the routine interface specification in assignment 1.

Fri Jan 20 19:03:43 1995

  • The RS6000/590 accounts are ready on the machine alfa-romeo.cs. You should have received mail about your account; if you did not, then for some reason I did not receive your group information.

Fri Jan 20 13:03:53 1995

  • PLEASE EMAIL ME your groups. I'll be mailing each team its RS6000 account information later today. If you're thinking about taking the class and/or are not in a group for some reason, please post news to the cs267 newsgroup.
  • There was a bug in the 2x2 matrix code in the notes file. The two lines just before the 2nd accumulation into c11 and c12 should read:
       b1 = bp[0]; b2 = bp[1];
       a = ap[0]; ap += Asep;
    
    Thanks to Manuel Fahndrich and a currently unidentified student in discussion section for pointing these out.

Fri Jan 20 01:54:31 1995

  • Notes for lecture 1 and discussion 1 are now online.
  • The programming contest will be held in the Hogan Room, 5th floor of Cory Hall, on Thursday, January 26th, from 6:30PM on.

Thu Jan 19 17:44:55 1995

  • Please send me mail giving the names and email addresses of each person in your group for the matrix multiplication contest. Thanks. -- Jeff

Thu Jan 19 14:38:32 1995

  • We will meet today at 6:30pm in 310 Soda to discuss logging on to the CM5 and matrix multiply.
  • A good set of papers on the general issue of blocking algorithms for memory hierarchies is here. Look at the README file for a list of the paper titles. tile.ps is also interesting.