CMSSL RELEASE NOTES
Version 3.2, April 1994
Copyright (c) 1994 by Thinking Machines Corporation.
1 INTRODUCTION
***************
These release notes summarize the changes and new features in the CM
Fortran interface to Version 3.2 of the CM Scientific Software Library
(CMSSL). One section is devoted to each of the following topics:
o Hardware and software requirements
o New features
o Changes from previous versions
o Limitations and restrictions
o Documentation for this release
o Acknowledgments
CMSSL Version 3.2 includes all the functionality of Version 3.1, as
well as new functionality described in Section 3.
2 HARDWARE AND SOFTWARE REQUIREMENTS
*************************************
2.1 HARDWARE REQUIRED
----------------------
CMSSL Version 3.2 supports CM-5 systems with or without vector units,
and supports the nodal CMSSL library. You can also call this version
of CMSSL from global/local CM Fortran programs. The nodal and
global/local execution models require a CM-5 with vector units.
2.2 SOFTWARE REQUIRED
----------------------
CMSSL Version 3.2 for CM Fortran requires prior installation of the
following software:
o CMOST Version 7.2 (or higher)
o CM Fortran Version 2.1 (or higher)
In addition, for CM Fortran programs that run in the single-node
execution model (-node), you must install CMMD Version 3.0 (or
higher). If the default CMMD is not an appropriate version, then you
may need to use
-cmmd_root /usr/cmmd/3.0
in your link line.
2.3 LINKING WITH CMSSL VERSION 3.2
-----------------------------------
After writing a CM Fortran program that calls CMSSL routines, compile
it and link it with the library. Compiling a CM Fortran CMSSL program
is the same as compiling other CM Fortran programs: use the cmf
command. To compile a program named program on a CM-5 and link it
with CMSSL, you can use either of the alternatives indicated below.
Linking with CMSSL Using -lcmssl
If your CM Fortran program calls CMSSL routines, you can compile it
and link it with CMSSL using the -lcmssl switch. This switch links
with the appropriate CMSSL library based on the execution model you
specify in the cmf command line. To use this alternative, issue one
of the following command lines at the UNIX prompt:
o In the CM Fortran (SPARC) nodes model:
cmf -cm5 -sparc -o program program.fcm -lcmssl
o In the CM Fortran vector-units model:
cmf -cm5 -vu -o program program.fcm -lcmssl
o In the CM Fortran single-node model:
cmf -cm5 -vu -node -o program program.fcm -lcmssl
In the single-node model, add -cmmd_root /usr/cmmd/3.0 before
-lcmssl if CMMD Version 3.0 (or higher) is not the default on
your system.
o To compile and link a CM Fortran global/local program that makes
calls to CMSSL routines in both the global and the local program
units:
cmf -cm5 -vu -o program program.fcm -lcmssl -local
local.fcm program.proto -local -lcmssl
Linking with a Selected CMSSL Library Explicitly
Alternatively, if you want to link with the appropriate CMSSL library
explicitly, issue one of the following command lines at the UNIX
prompt:
o In the CM Fortran (SPARC) nodes model:
cmf -cm5 -sparc -o program program.fcm -lcmsslcm5
o In the CM Fortran vector-units model:
cmf -cm5 -vu -o program program.fcm -lcmsslcm5vu
o In the CM Fortran single-node model:
cmf -cm5 -vu -node -o program program.fcm
-lcmsslcm5vu-node
In the CM Fortran single-node model, add -cmmd_root /usr/cmmd/3.0
before -lcmsslcm5vu-node if CMMD Version 3.0 (or higher) is not
the default on your system.
o To compile and link a CM Fortran global/local program that makes
calls to CMSSL routines in both the global and the local program
units:
cmf -cm5 -vu -o program program.fcm -lcmsslcm5vu
-local local.fcm example.proto -local
-lcmsslcm5vu-node
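The single-program command lines above follow a regular pattern. The
Python sketch below reproduces that pattern as an illustration only. The
helper name cmf_command and its parameters are hypothetical; the
switches and library names are the ones shown in this section. The
global/local case is not covered.

```python
# Hypothetical helper: assembles the single-program cmf command lines
# listed in Section 2.3. Only switches shown in these notes are used.

# Compiler switches and explicit library switch for each execution model.
MODELS = {
    "sparc": (["-cm5", "-sparc"], "-lcmsslcm5"),
    "vu": (["-cm5", "-vu"], "-lcmsslcm5vu"),
    "node": (["-cm5", "-vu", "-node"], "-lcmsslcm5vu-node"),
}


def cmf_command(model, program, explicit=False, cmmd_root=None):
    """Return the cmf command line for one execution model as a string."""
    flags, explicit_lib = MODELS[model]
    args = ["cmf"] + flags + ["-o", program, program + ".fcm"]
    if cmmd_root is not None:
        # These notes describe -cmmd_root only for the single-node model.
        if model != "node":
            raise ValueError("-cmmd_root is shown for the single-node model only")
        # The switch goes before the library switch.
        args += ["-cmmd_root", cmmd_root]
    args.append(explicit_lib if explicit else "-lcmssl")
    return " ".join(args)


print(cmf_command("vu", "program"))
print(cmf_command("node", "program", explicit=True, cmmd_root="/usr/cmmd/3.0"))
```

Keeping the per-model switches in one table makes it easy to see that
the two linking alternatives differ only in the final library switch.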
3 NEW FEATURES
****************
The CM Fortran operations listed below are new since Version 3.1. All
chapter numbers refer to CMSSL for CM Fortran, Version 3.2.
o Gaussian elimination with external storage
The external LU factorization and solver routines have been completely
rewritten and optimized in Version 3.2; they have a new
interface that involves reverse communication. (Chapter 5)
o Fixed-machine-size save and restore routines for LU and QR
solvers
The LU and QR routines are joined by new save and restore
routines: save_gen_lu_fms, restore_gen_lu_fms, save_gen_qr_fms,
and restore_gen_qr_fms. These new routines have the same
calling sequences and perform the same functions as the
corresponding older routines (save_gen_lu, restore_gen_lu,
save_gen_qr, and restore_gen_qr, respectively), except that the
new routines use fixed-machine-size I/O while the older ones use
serial-order I/O. (Chapter 5)
o Other new Gaussian elimination routines
The Gaussian elimination routines described in Chapter 5 of CMSSL
for CM Fortran are joined by three new routines in Version 3.2:
gen_lu_apply_l, gen_lu_apply_l_tra, and gen_lu_zero_rows.
(Chapter 5)
o Bidiagonalization
Version 3.2 introduces the gen_bidiag routine, which transforms
dense real matrices to bidiagonal form. Auxiliary routines
transform arbitrary vectors between the basis of the original
dense matrix and the basis of the bidiagonal matrix. (Chapter 9)
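For readers unfamiliar with bidiagonal form, the Python sketch below
reduces a small dense real matrix to upper bidiagonal form with
Householder reflections (the textbook Golub-Kahan approach). It is an
illustration of the mathematical operation only: the function names are
hypothetical, and nothing here is taken from the gen_bidiag
implementation or its calling sequence.

```python
import math


def householder(x):
    """Return a unit vector v with (I - 2 v v^T) x parallel to e1, or None."""
    norm = math.sqrt(sum(t * t for t in x))
    if norm == 0.0:
        return None  # nothing to eliminate
    v = x[:]
    v[0] += math.copysign(norm, x[0])  # sign choice avoids cancellation
    vnorm = math.sqrt(sum(t * t for t in v))
    return [t / vnorm for t in v]


def bidiagonalize(a):
    """Return an upper bidiagonal B = U^T A V for an m x n matrix, m >= n."""
    m, n = len(a), len(a[0])
    b = [[float(x) for x in row] for row in a]
    for k in range(n):
        # Left reflection: zero column k below the diagonal.
        v = householder([b[i][k] for i in range(k, m)])
        if v is not None:
            for j in range(k, n):
                dot = sum(v[i - k] * b[i][j] for i in range(k, m))
                for i in range(k, m):
                    b[i][j] -= 2.0 * dot * v[i - k]
        # Right reflection: zero row k to the right of the superdiagonal.
        if k < n - 2:
            v = householder([b[k][j] for j in range(k + 1, n)])
            if v is not None:
                for i in range(k, m):
                    dot = sum(v[j - k - 1] * b[i][j] for j in range(k + 1, n))
                    for j in range(k + 1, n):
                        b[i][j] -= 2.0 * dot * v[j - k - 1]
    return b


for row in bidiagonalize([[1, 2, 3], [4, 5, 6], [7, 8, 10], [1, 0, 1]]):
    print(["%8.4f" % x for x in row])
```

Because only orthogonal transformations are applied, the bidiagonal
result has the same singular values as the original matrix.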
o Singular values of a bidiagonal matrix
The bidiag_svd_singular_values routine computes all the singular
values of one or more real bidiagonal matrices. (Chapter 9)
o Singular vectors of a bidiagonal matrix
The bidiag_svd_singular_vectors routine computes the singular
vectors corresponding to a set of singular values of one or more
real bidiagonal matrices of the same order. (Chapter 9)
o Singular value decomposition of dense real matrices
The gen_bidiag_singular_system routine performs a singular value
decomposition of one or more dense real matrices. (Chapter 9)
o Range histogram with CM array of bins
Version 3.2 introduces a new histogram routine,
histogram_range_cm, that performs the same function as the
histogram_range routine, but stores the bins in a CM array rather
than a front-end array. (Chapter 14)
o Combination of permutations
The new combine_fe_perms routine returns the combination of two
permutations supplied to it in front-end arrays. (Chapter 15)
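As a rough illustration of what combining two permutations means, the
Python sketch below composes two permutations held in ordinary lists.
The function name compose_perms, the zero-based indexing, and the order
of application (p first, then q) are assumptions made for this example;
see Chapter 15 for the conventions that combine_fe_perms uses.

```python
def compose_perms(p, q):
    """Return r with r[i] = q[p[i]]: apply p first, then q (assumed order)."""
    if len(p) != len(q):
        raise ValueError("permutations must have the same length")
    identity = list(range(len(p)))
    if sorted(p) != identity or sorted(q) != identity:
        raise ValueError("inputs must be permutations of 0..n-1")
    return [q[i] for i in p]


print(compose_perms([1, 0, 2], [0, 2, 1]))
```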
o Zeroing routine
This release introduces a new routine, zero_elements, that zeroes
a CM array. This routine provides faster performance than the
equivalent CM Fortran code, especially for single-precision real
or complex data. (Chapter 15)
4 CHANGES FROM PREVIOUS VERSIONS
**********************************
The following CM Fortran routines have changed since Version 3.1:
o Arbitrary elementwise sparse matrix operations
The irandom, itrace, and trace arguments to the arbitrary
elementwise sparse matrix operations have been renamed to
mapping, motion, and setup, respectively. The mapping and motion
arguments provide new options for permutation method and
communication algorithm, respectively. The mapping argument now
makes available source and destination array permutations
generated internally using the partition_mesh routine, for
problems with symmetric sparsity. The motion argument provides an
option for communication that uses the part_gather and
part_scatter routines, as an alternative to the
sparse_util_gather and sparse_util_scatter routines. The values
accepted by these arguments are as follows:
Value of mapping   Permutation returned
0                  identity (old irandom = 0)
1                  random (old irandom = 1)
2                  based on partitioning
Value of motion    Operations used by communication algorithm
0                  get, send, scan-add (old itrace = 0)
1                  part_gather, part_scatter
2                  sparse_util_gather, sparse_util_scatter
                   (old itrace = 1)
Additionally, in the Version 3.1 manual, the man page for these
routines had placed the where_is_x and where_is_y arguments in
the wrong order; they have been switched in the Version 3.2 man
page. (Chapter 4)
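When updating Version 3.1 calls, the correspondence between old and new
argument values given above can be captured in a small lookup. The
Python sketch below is only a migration aid; the function and constant
names are hypothetical, and the values are those listed in these notes.

```python
# Old-to-new argument values for the arbitrary elementwise sparse matrix
# operations, as listed in these notes.
IRANDOM_TO_MAPPING = {0: 0, 1: 1}  # identity, random
ITRACE_TO_MOTION = {0: 0, 1: 2}    # get/send/scan-add, sparse_util_gather/scatter


def migrate_elementwise_args(irandom, itrace):
    """Translate Version 3.1 (irandom, itrace) values to (mapping, motion)."""
    try:
        return IRANDOM_TO_MAPPING[irandom], ITRACE_TO_MOTION[itrace]
    except KeyError as bad:
        raise ValueError(f"no Version 3.2 equivalent for value {bad}") from None


print(migrate_elementwise_args(1, 1))
```

Note that mapping = 2 and motion = 1 have no Version 3.1 counterpart,
so they never appear as results of this translation.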
o Arbitrary block sparse matrix operations
The irandom, itrace, trace, trace_mask, and setup arguments of
the arbitrary block sparse matrix routines have been changed to
mapping, motion, setup1, setup2, and setup3, respectively.
Permutations based on partitioning are not yet available with
these routines, but the motion values are the same as above
(except that motion = 0 uses gets and send-adds). (Chapter 4)
o QR and LU Routines
The QR routines now allow nblock values greater than 1 with
pivoting. The LU routines now allow you to specify m > n in the
no-pivoting case. (However, you may not use the gen_lu_get_l
routine when m > n.) The LU and QR factors are now clearly
defined in the manual for the non-square case. (Chapter 5)
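To make the m > n no-pivoting case concrete, the Python sketch below
uses one common convention for a tall LU factorization: A (m x n) is
written as L (m x n, unit diagonal, zero above it) times U (n x n,
upper triangular). This convention and the function names are
assumptions for illustration; the manual gives the definition of the
factors that CMSSL actually uses.

```python
from fractions import Fraction


def lu_no_pivot(a):
    """Factor an m x n matrix (m >= n) as L * U without pivoting.

    L is m x n with a unit diagonal; U is n x n upper triangular.
    Exact rational arithmetic keeps the example free of rounding.
    """
    m, n = len(a), len(a[0])
    if m < n:
        raise ValueError("this sketch assumes m >= n")
    w = [[Fraction(x) for x in row] for row in a]  # working copy of A
    l = [[Fraction(int(i == j)) for j in range(n)] for i in range(m)]
    for k in range(n):
        if w[k][k] == 0:
            raise ZeroDivisionError("zero pivot: no-pivoting LU breaks down")
        for i in range(k + 1, m):
            l[i][k] = w[i][k] / w[k][k]
            for j in range(k, n):
                w[i][j] -= l[i][k] * w[k][j]
    u = [w[i][:] for i in range(n)]  # remaining rows of w are now zero
    return l, u


def matmul(x, y):
    """Plain triple-loop matrix product, used to check the factors."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


a = [[2, 1], [4, 5], [6, 9]]
l, u = lu_no_pivot(a)
print(matmul(l, u) == a)
```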
o Range histogram
Beginning with Version 3.2, the histogram_range and
histogram_range_cm routines require the min and range arguments
to have the same data type (integer or real) as the destination
array, A. (Chapter 14)
o Banded System Solvers
The nblock argument of the gen_banded_factor,
block_tridiag_factor, block_tridiag_solve,
block_pentadiag_factor, and block_pentadiag_solve routines has a
new name and meaning. The argument, now called group_size, allows
you to specify (or ask the routine to select) the number of
problem instances per processing element that are treated
together in one step of Gaussian elimination. This feature can
significantly improve performance in comparison to CMSSL Version
3.1. In addition, the work argument is now a scalar integer
rather than a front-end array. (Chapter 6)
o Iterative Solvers
The iterative solvers now support complex data for all algorithms
except CMSSL_cg (which is used for symmetric positive definite
systems) and CMSSL_bicgstab2. Before Version 3.2, all
algorithms required real data. (Chapter 7)
o Generalized Eigensystem Analysis
The sym_tred_gen_eigensystem routine now operates on complex
Hermitian matrices; previously it operated only on real symmetric
matrices. (Chapter 8)
o Simplex routine
The gen_simplex routine now checks the value of ier on input. If
ier is set to 2 (reinvert) or 16 (degenerate), gen_simplex
assumes you are reinverting; otherwise, it assumes you are
passing it a new problem.
Two new performance guidelines take effect with Version 3.2.
These new guidelines, together with the two conditions already
listed in the gen_simplex man page, ensure that gen_simplex does
not copy the CM array A:
Lay out A so that the subgrid size of the first axis is not 1.
Compile the main program with -axisreorder.
(Chapter 12)
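The input check on ier described above amounts to a two-way dispatch.
The Python sketch below restates it; the constant and function names
are hypothetical, and only the values 2 and 16 and their meanings come
from these notes.

```python
# ier values named in these notes; the constant names are hypothetical.
IER_REINVERT = 2
IER_DEGENERATE = 16


def simplex_entry_mode(ier):
    """Return how gen_simplex is described as interpreting ier on input."""
    if ier in (IER_REINVERT, IER_DEGENERATE):
        return "reinvert"
    return "new problem"


print(simplex_entry_mode(16))
```

One practical consequence: a stale ier of 2 or 16 left over from an
earlier call causes a new problem to be treated as a reinversion, so it
is worth resetting ier before passing in a new problem.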
o Matrix transpose performance enhancements
The axis-length restriction that was imposed by the
gen_matrix_transpose routine in order to achieve enhanced performance has
been lifted in Version 3.2. In addition, Version 3.2 adds another
enhancement for three-dimensional and higher-rank arrays. The
gen_matrix_transpose routine yields superior performance when a
local axis is exchanged with a non-local axis that is distributed
across the vector units on the same processing node. By far the
best performance is obtained when the non-local axis spans only
two vector units (that is, only the lowest off-chip bit is set).
The next best performance results when the non-local axis spans
four vector units on the same processing node (the two lowest
off-chip bits are set). For the general case of an axis that
spans multiple processing nodes, the transpose performance does
not depend significantly on the number of contiguous off-chip
bits (except if the axis spans only two nodes, in which case the
performance is slightly better). You can use this fact to improve
the performance of transposes involving arrays of rank greater
than or equal to three. With nodal CMSSL, you can also exploit
this fact with two-dimensional arrays. (Chapter 15)
o Fast Fourier Transform
The fft_setup and fft routines now perform error checking; see
Chapter 10 of CMSSL for CM Fortran for details. In addition, the
enhancements to gen_matrix_transpose mentioned above may help
users who perform transposes explicitly when performing FFTs on
multidimensional arrays. (Chapter 10)
o Sparse gather, scatter, vector gather, and vector scatter
utilities
In the sparse vector gather and scatter utilities, the vectors
are no longer required to lie along the left-most axis; you
choose the vector axes in the source and destination arrays using
two new arguments, x_vector_axis and y_vector_axis.
In addition, the calling sequences of these routines now more
closely resemble those of the partitioned gather and scatter
utilities. The following changes have been made in Version 3.2:
. The trace and trace_mask arguments of sparse_util_gather and its
associated routines were combined into a single setup argument.
. The y_template and x_mask arguments exchanged places in
sparse_util_scatter_setup and sparse_util_vec_scatter_setup.
. The error code argument ier was added to
sparse_util_scatter_setup and sparse_util_vec_scatter_setup.
. The pointers argument p was removed from sparse_util_scatter.
(Chapter 15)
o Partitioning
The following new features were introduced into the
partition_mesh routine in Version 3.2:
. You can now divide a mesh into partitions that contain multiple
processing elements. One use for this feature would be to divide
a very large mesh (which cannot be handled as one piece by your
application) into several smaller pieces that can be processed
sequentially. The numproc argument is now an input as well as
output argument; it helps determine the number of processing
elements in each partition.
. A new argument, storage_option, provides a low-storage option
that is slower than the default operation, but uses less storage
for working arrays.
. A new argument, verbose, prints statistics to standard output.
In addition, in Version 3.2, you can choose the axis along which
to reorder pointers with the reorder_pointers routine.
Previously, this routine always reordered a pointers array along
its last axis. You can also choose which axis counts the mesh
elements in the ien and idual arrays. Moreover, a new element
type (segment) has been added. (Chapter 15)
o Partitioned gather and scatter utilities
The part_gather, part_scatter, part_vector_gather,
part_vector_scatter, and associated routines changed in the
following ways in Version 3.2:
. The trace argument was renamed setup.
. The routines now take advantage of data locality along any axis;
the reordered axis need not be the last axis. For best
performance, the reordered axis should be non-local and all other
axes should be local. (A local axis is either :serial or laid out
with :procs = 1.)
(Chapter 15)
o Optimization Hints for All-to-All Broadcast and Reduction
The performance guidelines for the all-to-all broadcast and
reduction routines have been updated to take into account the CM
Fortran -noaxisreorder switch. (Chapter 15)
o Vector move (extract and deposit)
The vector move routine has a new error code. Additionally, the
description of the vector move operation in CMSSL for CM Fortran,
CM-5 Edition, Version 3.1, was incorrect. The vector move routine
moves one vector from each subgrid of the source CM array into a
subgrid of the destination CM array. (Chapter 15)
5 LIMITATIONS AND RESTRICTIONS
********************************
The routines listed below cannot be used with the nodal CMSSL library
(the library used when you compile your program in the CM Fortran
single-node execution model). Also, if you are using the CM Fortran
global/local execution model, do not call these routines from the
local program unit.
save_gen_lu gen_lu_solve_ext
gen_qr_solve_ext restore_gen_lu
save_gen_qr gen_matrix_mult_ext
save_gen_lu_fms restore_gen_qr
save_fast_rng_temps restore_gen_lu_fms
save_gen_qr_fms restore_fast_rng_temps
gen_lu_setup_ext restore_gen_qr_fms
save_vp_rng_temps gen_lu_factor_ext
gen_qr_factor_ext restore_vp_rng_temps
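If you maintain a list of the CMSSL routines a nodal program or local
program unit calls, a check like the Python sketch below can flag the
restricted ones. The names NODAL_UNSUPPORTED and unsupported_calls are
hypothetical; the routine names are those listed above.

```python
# Routines that these notes list as unavailable with the nodal CMSSL
# library and in local program units of global/local programs.
NODAL_UNSUPPORTED = frozenset({
    "save_gen_lu", "restore_gen_lu", "save_gen_lu_fms", "restore_gen_lu_fms",
    "save_gen_qr", "restore_gen_qr", "save_gen_qr_fms", "restore_gen_qr_fms",
    "gen_lu_setup_ext", "gen_lu_factor_ext", "gen_lu_solve_ext",
    "gen_qr_factor_ext", "gen_qr_solve_ext", "gen_matrix_mult_ext",
    "save_fast_rng_temps", "restore_fast_rng_temps",
    "save_vp_rng_temps", "restore_vp_rng_temps",
})


def unsupported_calls(routine_names):
    """Return, sorted, the given names that appear in the restricted list."""
    return sorted(set(routine_names) & NODAL_UNSUPPORTED)


print(unsupported_calls(["zero_elements", "save_gen_qr", "gen_lu_solve_ext"]))
```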
6 DOCUMENTATION FOR THIS RELEASE
**********************************
The CM Fortran interface to CMSSL Version 3.2 is documented in CMSSL
for CM Fortran, Version 3.2. The information that is summarized in these
release notes is presented in more detail in the manual.
Your software tape includes ASCII and PostScript versions of these
release notes and of CMSSL for CM Fortran. The default location for
the release notes is
/usr/doc/cmssl-3.2-releasenotes
The default location for CMSSL for CM Fortran, Version 3.2 is
/usr/doc/cmssl/cmssl-for-cmf-v1-3.2 (Volume I)
/usr/doc/cmssl/cmssl-for-cmf-v2-3.2 (Volume II)
Within each volume directory you will find a PostScript file for each
chapter. The file called README contains more information.
If you do not find the documents in the default locations, check with
your system administrator or the person who installs the CMSSL at your
site.
6.1 ON-LINE SAMPLE CODE AND MAN PAGES
--------------------------------------
Included with CMSSL are on-line sample programs that demonstrate how
to call each CMSSL routine. You are encouraged to experiment with
these sample programs. Also included with CMSSL are on-line man pages
for all routines.
The on-line sample programs are located in subdirectories of the CMSSL
examples directory. The default location for the examples directory is
/usr/examples/cmssl.
Examples for a given operation are included in the subdirectory
operation/cmf (or operation/sub-operation/cmf)
or
operation/cstar (or operation/sub-operation/cstar)
of the examples directory. For example, CM Fortran sample code for the
routine that performs eigenvector analysis using the Jacobi method is
located in the subdirectory
eigen/jacobi/cmf
of the examples directory. If you do not find the on-line examples in
/usr/examples/cmssl, check with your system administrator (or the
person who installs the CMSSL at your site) to find out where they
were installed.
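The directory layout described above can be expressed as a short path
builder. The Python sketch below is illustrative only: the function
name example_dir and its parameters are hypothetical, and the default
root is the default location given in these notes.

```python
import posixpath

EXAMPLES_ROOT = "/usr/examples/cmssl"  # default location given in these notes


def example_dir(operation, sub_operation=None, language="cmf",
                root=EXAMPLES_ROOT):
    """Build the sample-code directory for an operation."""
    if language not in ("cmf", "cstar"):
        raise ValueError("language must be 'cmf' or 'cstar'")
    parts = [root, operation]
    if sub_operation:
        parts.append(sub_operation)
    parts.append(language)
    return posixpath.join(*parts)


print(example_dir("eigen", "jacobi"))
```

Pass a different root if the examples were installed somewhere other
than the default location at your site.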
To read the on-line man page for a routine, enter the command
man routine_name
at the UNIX prompt.
7 ACKNOWLEDGMENTS
*******************
The bidiagonalization and singular value decomposition routines
introduced in this release are the result of collaborative development
between Thinking Machines Corporation and the Danish Computing Center
for Research and Education (UNI-C).