Notes for first discussion

This file is: http://www.cs.berkeley.edu/~demmel/cs267/disc1.html

Outline

  1. Administrative stuff.
  2. How to log in, compile and run "hello world".
  3. Fast 2x2 Matrix Multiply

How to log in, compile and run "hello world".

  1. Machine names: rodin, moore.
    Add to PATH: /usr/cm5/bin:/usr/cm5/local/bin
    Add to MANPATH: /usr/cm5/man:/usr/cm5/local/man
  2. Basic Architecture (Jim will cover this in class, also see http://www.cs.berkeley.edu/~demmel/cs267/cm5docs/tech-summary).
  3. CM5 executables:
    A CM5 executable consists of a host program, which runs on the host, and a node program, which runs on the nodes.
    Sometimes the host program is invisible to you (split-C, cm-fortran). Each program is a separate, normal object file, each with its own 'main()' routine.
  4. Programs:
          cmps: display the status of current CM processes.
                Useful for telling how loaded the machine is.
                The pid marked with a * is the one currently running.
          djm: distributed job manager
             Method to submit batch jobs. Does a nice
             job of redirecting I/O appropriately.
               -help to get options
             Main interface 'job'
             Useful Commands:
                 status  Show the status of all running jobs
                 submit  Submit a background job
                 run     Run an interactive job
                 signal  Send a signal to a running job
                 kill    Kill a running job
          cmjoin: produce a CM5 executable from
             two standard executable files.
          cmld: CM5 link editor for object files.
              This is what you use in a Makefile.
    
  5. Things to be aware of:
         ts-daemon: Liaison process between host and nodes.
           Sometimes goes down.
         Send mail to fraser@cs if it goes down.
    
         
    
  6. Compiling and linking:
         - host/node programs
         - sometimes it's hidden from you.
         - Standard .o files.
         - Use an already working makefile (take the one we supply)
    

  7. hello-host.c
    #include <stdio.h>
    #include <cm/cmmd.h>

    main()
    {
        CMMD_enable();
        printf("Hello world from host.\n");
    }
  8. hello-node.c
    #include <stdio.h>
    #include <cm/cmmd.h>

    main()
    {
        int myaddress;

        CMMD_fset_io_mode(stdout, CMMD_independent);
        myaddress = CMMD_self_address();
        printf("Node %d: Hello world\n", myaddress);
        CMMD_enable_host();
    }
  9. Makefile
    TARGET          = hello
    HOST_SRCS       = hello-host.c
    HOST_OBJS       = $(HOST_SRCS:%.c=%.o)
    NODE_SRCS       = hello-node.c
    NODE_OBJS       = $(NODE_SRCS:%.c=%.o)
    INCDIRS         = /usr/cm5/include
    HOST_LIBDIRS    = /usr/cm5/lib
    NODE_LIBDIRS    = /usr/cm5/lib
    HOST_LIBS       =
    NODE_LIBS       =
    CC              = gcc
    
    %-host.o: %-host.c
            $(CC) -DCP_CODE $(CFLAGS) $(INCDIRS:%=-I%) -c $<
    %-node.o: %-node.c
            $(CC) $(CFLAGS) $(INCDIRS:%=-I%) -c $<
    $(TARGET):      $(NODE_OBJS) $(HOST_OBJS)
            /usr/cm5/bin/cmmd-ld \
            -comp $(CC) -o $(TARGET) \
            -node $(NODE_OBJS) $(NODE_LIBDIRS:%=-L%) $(NODE_LIBS:%=-l%) \
            -host $(HOST_OBJS) $(HOST_LIBDIRS:%=-L%) $(HOST_LIBS:%=-l%)
    clean:
            rm -f $(TARGET) $(NODE_OBJS) $(HOST_OBJS) *~ .make.state .nse_depinfo
    
    

See the 'CMMD User's Guide' under http://www.cs.berkeley.edu/~demmel/cs267/cm5docs.html

2x2 Matrix Multiply

Suppose you have 4 local variables held in registers and named c11, c12, c21, c22. You also have two 'float*'s, A and B, and you want to do a matrix-matrix multiply-accumulate into the 2x2 matrix defined by the cij. Asep and Bsep are integers that give the distance in floats between two consecutive rows (i.e., the column dimensions of A and B, respectively). Goal: code this in such a way that the compiler can easily use only 8 memory references, 8 multiply-accumulates, and a minimal number of additional 32-bit floating-point registers.
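To make the problem statement concrete, here is a plain, unoptimized reference version of the same operation. It is only a sketch for checking results: the function name ref_mul2x2 and the use of a 2x2 array for the cij are assumptions made for illustration and are not part of the course code. It assumes A and B are stored row by row, as described above.

```c
/* Unoptimized reference: c += A*B for 2x2 blocks.
   ref_mul2x2 is a hypothetical helper, not part of the course code.
   A and B are stored row by row; Asep and Bsep are the distances,
   in floats, between the starts of two consecutive rows. */
void ref_mul2x2(float c[2][2],
                const float *A, int Asep,
                const float *B, int Bsep)
{
    int i, j, k;

    for (i = 0; i < 2; i++)
        for (j = 0; j < 2; j++)
            for (k = 0; k < 2; k++)
                c[i][j] += A[i * Asep + k] * B[k * Bsep + j];
}
```

Written this way, each of the 8 multiply-accumulates reads its own A and B operands from memory, 16 references in all, before counting any traffic for the cij. The goal above is to get that down to 8 by loading each element of A and B exactly once.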

Solution:

#define mul_mfmf_mf2x2(c11,c12,c21,c22,A,Asep,B,Bsep) \
{ \
   const float *bp,*ap; \
   float b1,b2; \
   float a; \
 \
   bp = (B); \
   b1 = bp[0]; b2 = bp[1];   /* load b11, b12 */ \
   bp += (Bsep);             /* bp -> second row of B */ \
 \
   ap = (A); \
   a = ap[0]; ap += (Asep);  /* a = a11; ap -> a21 */ \
 \
   c11 += a*b1; /* c11 += a11*b11 */ \
   c12 += a*b2; /* c12 += a11*b12 */ \
 \
   a = ap[0]; ap = &(A)[1];  /* a = a21; ap -> a12 */ \
 \
   c21 += a*b1; /* c21 += a21*b11 */ \
   c22 += a*b2; /* c22 += a21*b12 */ \
 \
   b1 = bp[0]; b2 = bp[1];   /* load b21, b22 */ \
   a = ap[0]; ap += (Asep);  /* a = a12; ap -> a22 */ \
 \
   c11 += a*b1; /* c11 += a12*b21 */ \
   c12 += a*b2; /* c12 += a12*b22 */ \
 \
   a = ap[0];                /* a = a22 */ \
 \
   c21 += a*b1; /* c21 += a22*b21 */ \
   c22 += a*b2; /* c22 += a22*b22 */ \
}
Discussion: what are the implications of this for the RS6000?
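One way a kernel like mul_mfmf_mf2x2 might be used is as the inner step of a larger multiply, with the four cij kept in local variables across the whole k loop. The sketch below is an illustration under stated assumptions, not the course's reference code: the function name matmul_2x2_blocked is hypothetical, the matrices are assumed square, stored row by row, with even dimension n, and the kernel is restated in compact form only so that the fragment compiles by itself.

```c
/* Compact restatement of the 2x2 kernel so this sketch is self-contained. */
#define mul_mfmf_mf2x2(c11,c12,c21,c22,A,Asep,B,Bsep) \
{ \
   const float *bp = (B), *ap = (A); \
   float b1 = bp[0], b2 = bp[1], a; \
   bp += (Bsep); \
   a = ap[0]; ap += (Asep); \
   c11 += a*b1; c12 += a*b2; \
   a = ap[0]; ap = &(A)[1]; \
   c21 += a*b1; c22 += a*b2; \
   b1 = bp[0]; b2 = bp[1]; \
   a = ap[0]; ap += (Asep); \
   c11 += a*b1; c12 += a*b2; \
   a = ap[0]; \
   c21 += a*b1; c22 += a*b2; \
}

/* Z += X*Y for n x n matrices stored row by row, n even.
   matmul_2x2_blocked is a hypothetical name chosen for this sketch. */
void matmul_2x2_blocked(int n, const float *X, const float *Y, float *Z)
{
    int i, j, k;

    for (i = 0; i < n; i += 2) {
        for (j = 0; j < n; j += 2) {
            /* Keep the 2x2 block of Z in locals so the compiler
               can hold it in registers for the whole k loop. */
            float c11 = Z[i*n + j],       c12 = Z[i*n + j + 1];
            float c21 = Z[(i + 1)*n + j], c22 = Z[(i + 1)*n + j + 1];

            for (k = 0; k < n; k += 2) {
                mul_mfmf_mf2x2(c11, c12, c21, c22,
                               &X[i*n + k], n, &Y[k*n + j], n);
            }

            /* Write the block back once, after all updates. */
            Z[i*n + j] = c11;       Z[i*n + j + 1] = c12;
            Z[(i + 1)*n + j] = c21; Z[(i + 1)*n + j + 1] = c22;
        }
    }
}
```

With this arrangement the block of Z is loaded and stored once per (i, j) pair, so the only memory traffic inside the k loop is the 8 loads per kernel invocation.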