The test driver for the matrix multiply contest is finally ready.
First, it tests that multiplying a few randomly sized matrices
produces the correct answer. If the 'error' is greater than a certain
bound, the routine is disqualified, a horrible fate indeed. Jim will
discuss the error calculation (and what happens to you if you're
disqualified :-) sometime this week.
Second, it does the timing in two groups. Quad-word (16-byte) aligned
matrices of various sizes are timed first. In this case, the number of
columns is always even and the matrix pointers are zero in their least
significant 4 bits (you may be able to optimize for this case).
Then, we time arbitrarily sized and aligned
matrices, for which no alignment assumption may be made.
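If you want to special-case the aligned group, you first have to detect it. Here is a minimal sketch of such a check, assuming a C99 compiler that provides <stdint.h>. The helper names is_quadword_aligned and can_use_aligned_path are made up for illustration and are not part of the test driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is quad-word (16-byte) aligned,
   i.e. the least significant 4 bits of the address are zero. */
int is_quadword_aligned(const void *p)
{
    return ((uintptr_t)p & 0xF) == 0;
}

/* Hypothetical helper: returns 1 if the inputs match the first timing
   group (even dimension, all three pointers quad-word aligned). */
int can_use_aligned_path(int matdim, const double *A,
                         const double *B, double *C)
{
    assert(matdim > 0);
    return (matdim % 2 == 0)
        && is_quadword_aligned(A)
        && is_quadword_aligned(B)
        && is_quadword_aligned(C);
}
```

A routine could call can_use_aligned_path once on entry and fall back to a general code path whenever it returns 0.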
The stdout output of the program using ESSL's dgemm is
as follows ([matsize,mflops] pairs):
16 176.690196
32 240.941176
64 252.189165
128 252.116634
256 247.717283
23 202.783333
43 227.162857
61 241.469149
79 241.041289
99 238.080144
119 238.305313
151 245.925071
Relative rankings will be determined as follows: each team's mflops rating for
each size will be divided by the corresponding ESSL number, and the
resulting vector will be averaged, producing an overall score.
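To estimate your own score ahead of time, the ranking rule can be sketched as follows. This assumes "averaged" means a plain arithmetic mean over the twelve sizes listed above. The function name overall_score is made up for illustration, and team_mflops must be in the same order as the ESSL list.

```c
#include <assert.h>
#include <stddef.h>

/* ESSL dgemm baseline in mflops, in the order the driver printed it. */
static const double essl_mflops[] = {
    176.690196, 240.941176, 252.189165, 252.116634, 247.717283,
    202.783333, 227.162857, 241.469149, 241.041289, 238.080144,
    238.305313, 245.925071
};
#define NUM_SIZES (sizeof essl_mflops / sizeof essl_mflops[0])

/* Hypothetical helper: mean of the per-size ratios team/ESSL
   (arithmetic mean is an assumption). */
double overall_score(const double *team_mflops, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(n == NUM_SIZES);
    for (i = 0; i < n; i++)
        sum += team_mflops[i] / essl_mflops[i];
    return sum / (double)n;
}
```

Under this reading, a routine that matches ESSL at every size scores 1.0, and one that runs at half of ESSL's rate everywhere scores 0.5.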
You should also be aware that we are essentially ignoring the effects
of compulsory cache misses (i.e., initial data not being in the cache)
in our timings. For the smaller problem sizes, the entire problem
might fit in cache, so we'll be testing in-cache performance after the
first iteration. For the larger problem sizes, the whole problem will never
fit in cache, and the effects of your L1 blocking strategy will
start shining through. The 4-way set associativity further complicates
matters. The main point? Just realize that we're treating these two
sets of problem sizes essentially the same in our timings.
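If you want to time your routine in a similar spirit, here is a rough sketch. It is not the driver's code: the untimed warm-up call, the use of clock(), and the conventional count of 2*n^3 flops per multiply are all assumptions, and the names matmul_fn, mflops_rate and time_matmul are made up for illustration.

```c
#include <assert.h>
#include <time.h>

/* Pointer to a routine with the contest's calling convention. */
typedef void (*matmul_fn)(int, const double *, const double *, double *);

/* Converts a timing into mflops, counting 2*n^3 flops per multiply
   (an assumption about how the driver counts). */
double mflops_rate(int matdim, int iterations, double seconds)
{
    double n = (double)matdim;

    assert(iterations > 0 && seconds > 0.0);
    return 2.0 * n * n * n * (double)iterations / (seconds * 1.0e6);
}

/* Times `iterations` calls of f after one untimed warm-up call, so
   that compulsory cache misses are left out of the measurement.
   Returns 0.0 if the interval was too short for clock() to resolve. */
double time_matmul(matmul_fn f, int matdim, int iterations,
                   const double *A, const double *B, double *C)
{
    clock_t start, stop;
    double seconds;
    int i;

    f(matdim, A, B, C);                /* warm-up, not timed */

    start = clock();
    for (i = 0; i < iterations; i++)
        f(matdim, A, B, C);
    stop = clock();

    seconds = (double)(stop - start) / (double)CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return mflops_rate(matdim, iterations, seconds);
}
```

For small sizes, pick iterations large enough that the timed loop runs well above the resolution of clock(), or the result will be noise.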
You have until this Thursday at 6:00pm to have everything
up and running. By then, in your alfa-romeo.cs home directory,
create a file called 'mul_mfmf_mf.c' containing
a routine conforming to:
    void
    mul_mfmf_mf(
        int matdim,
        const double *A,
        const double *B,
        double *C);
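As a starting point, here is a naive, unoptimized routine with that signature. Two things are assumptions, because the announcement does not state them: the matrices are taken to be matdim x matdim in row-major order, and C is overwritten with A*B rather than accumulated into. Check both against the test driver before relying on this.

```c
#include <assert.h>

/* Naive triple loop: C = A * B for square matdim x matdim matrices.
   Row-major storage and overwrite semantics are assumptions. */
void mul_mfmf_mf(int matdim, const double *A, const double *B, double *C)
{
    int i, j, k;

    assert(matdim > 0);
    for (i = 0; i < matdim; i++) {
        for (j = 0; j < matdim; j++) {
            double sum = 0.0;
            for (k = 0; k < matdim; k++)
                sum += A[i * matdim + k] * B[k * matdim + j];
            C[i * matdim + j] = sum;
        }
    }
}
```

A version like this is mainly useful as a correctness reference to compare a blocked or unrolled routine against.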
We'll compile and run the routine at the contest, so please
try to get things working beforehand.
Also, create a file in your alfa-romeo home directory
called 'copts'. This file should contain the C compiler
optimizations you wish to use. The file should
be such that
cc `cat copts` -c mul_mfmf_mf.c
will work. Remember, -O3 is not necessarily better
than -O2. Also, look at the '-q' options. You
can view all the options using the program 'xlc'
or see IBM's options page.
One last thing. Make sure the files are world readable
(this should be the very last thing you do) since
we don't have any privileges on this machine.
Good luck to everyone. And as Raph said, may the best dgemm win!!!