# U.C. Berkeley Math 221 Home Page

## Matrix Computations / Numerical Linear Algebra

*Fall 2024*

T Th 12:30-2, in 241 Cory Hall

### Instructor:

### Grader:

Sudhanva Kulkarni
See his web page
for contact information and office hours.
### Administrative Assistants:

Ria Briggs / Tami Chouteau
Email: (to Ria) /
(to Tami)
### Announcements:

(8/26) Welcome to Ma221! We look forward to another normal
semester of in-person lectures, but we will continue
to record and post lectures on bcourses;
see Announcements on
bcourses.berkeley.edu
for information on how to connect.
We will also prepost typed notes before each lecture.
(8/26) Please fill out this on-line
Class Survey
(8/26) Homework 1 has been posted.
It is due on Monday Sep 9 at 11:59pm, on
**gradescope**.
Answers to homework will also be posted at
bcourses.berkeley.edu.
(9/23) A list of possible class projects has been posted
here.
Project proposals (1 page per team) are due on Gradescope on Oct 15.
(The Course Overview said bcourses instead of Gradescope, but we will use Gradescope.)
### Handouts

Course Overview in
pdf,
including syllabus, prerequisites, pointers to other references, and grading.
### Textbook

*Applied Numerical Linear Algebra* by J. Demmel, published by
SIAM, 1997.
List of Errata for the Textbook
(Updated May 2024. Suggestions welcome!)
### Other Online Software, Documentation, and Reference Material

Matlab documentation is available from several sources, most notably
by typing ``help'' into the Matlab command window.
Netlib, a repository of numerical software and
related documentation
Netlib Search Facility,
a way to search for the software on Netlib that you need
GAMS - Guide to Available Math Software, another search facility to find numerical
software
Linear Algebra Software Libraries and Collections
LAPACK, state-of-the-art software for dense numerical linear algebra on
workstations and shared-memory parallel computers. Written in Fortran.
LAPACK Manual
LAPACKE, a C interface to LAPACK.
TLAPACK, or "Templated" LAPACK
is an on-going project to create a C++ version of LAPACK that allows one to use
arbitrary arithmetic data types, e.g., very long or short precisions.
ScaLAPACK, a partial version of LAPACK for distributed-memory parallel computers.
Written in Fortran.
ScaLAPACK manual
SLATE is a modern (C++) version
of ScaLAPACK which allows use of hardware accelerators.
MAGMA and PLASMA
are dense linear algebra libraries designed to run on GPUs and multicore platforms, respectively.
LINPACK and
EISPACK are precursors of
LAPACK, dealing with linear systems and eigenvalue problems, respectively.
SuperLU
is a set of fast implementations of sparse Gaussian elimination for
sequential and parallel computers.
Updated survey
of sparse direct linear equation solvers, by
Xiaoye Li
SuiteSparse,
another popular collection of sparse matrix software
BEBOP (Berkeley Benchmarking and
Optimization) is a source for automatic generation of high performance
numerical codes, including OSKI,
a system for producing fast implementations of sparse-matrix-vector-multiplication.
(OSKI stands for Optimized Sparse Kernel Interface, and only coincidentally is
also the name of the Cal Bear mascot :) ).
Sources of test matrices for sparse matrix algorithms
Matrix Market
SuiteSparse Matrix Collection
Templates
for the solution of linear systems,
a collection of iterative methods, with advice on which ones to use.
The web site includes on-line versions of the book
(in html
and postscript)
as well as software.
Templates
for the Solution of Algebraic Eigenvalue Problems
is a survey of algorithms and
software for solving eigenvalue problems. The web site points to
an html version of the book, as well as software.
Resources for Parallel and High Performance Computing
NERSC (National Energy Research Scientific Computing Center),
a DOE supercomputer center at neighboring
LBL (Lawrence Berkeley National Lab), that provides
supercomputer resources for problems of interest to DOE
CS 267, Applications of Parallel Computers,
2022 version,
including slides and videos of lectures on parallel linear algebra
PETSc: Portable, Extensible Toolkit for Scientific Computation
GraphBLAS: Using Linear Algebra for Graph Algorithms
References for Computer Arithmetic and Error Analysis
This article summarizes the changes in
the 2019 version of the IEEE 754 Floating Point Standard and gives a historical
perspective on how it has evolved since its first version in 1985. The
IEEE 754 Floating Point Standard Committee has recently started meeting again,
with the next version due in 2029.
A different floating point standard committee has started meeting recently,
the P3109 Working Group on 8-bit Floating Point for Machine Learning.
8 bits is very low precision for classical uses of floating point, but appears
to be enough to tell dogs from cats and many other ML applications. Here is an
interim report
from the committee. In the meantime, many companies have already agreed on their own
common standard.
This document
describes an on-going project to make the widely used BLAS and LAPACK linear algebra libraries
more consistent and reliable in the way they handle floating point exceptions (possible class projects!).
This paper describes the general relationship between condition numbers
and the distance to the nearest ill-posed problem.
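For matrix inversion, the best-known instance of this relationship can be written down directly. The statement below is the classical 2-norm result, given here only as a worked illustration and not as the paper's general theorem:

$$
\min\left\{ \frac{\|\delta A\|_2}{\|A\|_2} \;:\; A + \delta A \text{ is singular} \right\} \;=\; \frac{1}{\|A\|_2\,\|A^{-1}\|_2} \;=\; \frac{1}{\kappa_2(A)}
$$

So a matrix with condition number $10^{8}$ lies within a relative perturbation of $10^{-8}$ of a singular matrix.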
``Accurate and efficient expression evaluation and linear algebra,''
by J. Demmel, I. Dumitriu, O. Holtz, P. Koev, Acta Numerica, v. 17, May 2008. This paper shows how to
exploit the mathematical structure of a problem to get higher accuracy.
Efficient software for very high precision floating point arithmetic
ARPREC
MPFR
GMP
Efficient software for high precision and reproducible Basic Linear Algebra Subroutines (BLAS)
XBLAS
ReproBLAS
Notes on IEEE Floating Point Arithmetic, by
Prof. W. Kahan
Other notes on arithmetic, error analysis, etc. by
Prof. W. Kahan
Report on the arithmetic error that caused the Ariane 5 rocket crash
Robotic car crash caused by a NaN
The IEEE floating point standard was updated in 2008.
Look here
for a summary.
The IEEE floating point standard was updated again in 2019.
Look here
for a summary.
Another IEEE standard committee started meeting in 2021 to try to
standardize low precision arithmetic for machine learning; we'll see what
happens.
For a variety of papers on solving linear algebra problems with
guaranteed accuracy, by using interval arithmetic,
see Siegfried Rump's web site
For a paper on using mixed precision arithmetic, doing most work in fast, low precision, and a little in
slower, high precision, with the goal of getting the answer to high precision fast, see
A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic.
There is also a recent
Nvidia blog post on this.
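The idea of doing most work in low precision and a little in high precision can be illustrated with classical iterative refinement. The following is a minimal sketch, not taken from the survey: it simulates single precision in pure Python by rounding through `struct`, and the helper names `fl32`, `lu_solve_low`, and `refine` are hypothetical. For brevity it refactors the matrix on every step, whereas a practical code would factor once and reuse the factors.

```python
import struct

def fl32(x):
    """Round a Python float (double) to the nearest IEEE single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def lu_solve_low(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting,
    rounding every arithmetic result to single precision.
    Assumes A is square and nonsingular (no check for zero pivots)."""
    n = len(A)
    # Augmented matrix [A | b], stored in simulated single precision.
    M = [[fl32(v) for v in row] + [fl32(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = fl32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = fl32(M[i][j] - fl32(m * M[k][j]))
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n]
        for j in range(i + 1, n):
            s = fl32(s - fl32(M[i][j] * x[j]))
        x[i] = fl32(s / M[i][i])
    return x

def refine(A, b, steps=5):
    """Iterative refinement: cheap low precision solves,
    residuals and updates accumulated in double precision."""
    n = len(A)
    x = lu_solve_low(A, b)
    for _ in range(steps):
        # Residual r = b - A*x in double precision (the "slow" part).
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = lu_solve_low(A, r)  # correction from the "fast" solver
        x = [x[i] + d[i] for i in range(n)]
    return x
```

On a well-conditioned system, `refine(A, b)` typically recovers close to double precision accuracy even though every solve is done in single precision; for ill-conditioned systems the iteration may converge slowly or not at all.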
For a paper that shows how one might decrease the factor d = problem size in floating point error bounds, see
Stochastic Rounding and its Probabilistic Backward Error Analysis
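A toy experiment, not taken from the paper, gives some intuition for why random rounding can help. It accumulates many small increments on a coarse grid (the integers stand in for a low precision format). Round-to-nearest discards every increment, while stochastic rounding rounds up with probability equal to the discarded fraction, so the errors behave like a random walk instead of adding up. The function names are hypothetical.

```python
import math
import random

def round_stochastic(x, rng):
    """Round x to a neighboring integer: up with probability equal to the
    fractional part, down otherwise, so the result is correct on average."""
    lo = math.floor(x)
    frac = x - lo
    return float(lo + 1) if rng.random() < frac else float(lo)

def accumulate(n, inc, rounder):
    """Add inc to a running sum n times, rounding after every addition."""
    s = 0.0
    for _ in range(n):
        s = rounder(s + inc)
    return s

rng = random.Random(0)  # fixed seed so that runs are repeatable
n, inc = 10000, 0.25    # the exact sum is 2500

nearest = accumulate(n, inc, lambda x: float(round(x)))
stochastic = accumulate(n, inc, lambda x: round_stochastic(x, rng))

print("exact:", n * inc)
print("round to nearest:", nearest)       # stagnates at 0.0
print("stochastic rounding:", stochastic)
```

Round-to-nearest stagnates at zero because each increment of 0.25 is rounded away, so its error grows in proportion to the number of additions. The stochastic result fluctuates around the exact sum with a spread on the order of the square root of the number of additions.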
Variable precision floating point
For a paper on posits, a recently proposed fixed length variable precision arithmetic format, see
Beating Floating Point at its
Own Game by John Gustafson.
I disagree with several claims in this paper. For a paper analyzing posits, see
Posits: The good, the bad and the ugly
For a recorded debate between Gustafson and Kahan on the pros and cons
of Gustafson's earlier proposal for the variable length format called unums, see this
youtube video.
For a (much older) paper showing how to do error analysis with variable precision arithmetic,
many versions of which have been proposed before, see
On Error Analysis in Arithmetic with Varying Relative Precision
References for Communication-Avoiding Algorithms
Minimizing Communication in Numerical Linear Algebra,
G. Ballard, J. Demmel, O. Holtz, O. Schwartz, SIMAX, Sep 2011
(**SIAM Linear Algebra Prize 2012**)
presents a complete proof of the communication lower bound for O(n^3) matrix multiplication
and other classical linear algebra algorithms
Communication-Optimal Parallel and Sequential
QR and LU Factorizations, J. Demmel, L. Grigori, M. Hoemmen, J. Langou,
SIAM J. Sci. Comp., Feb 2012;
(**SIAM Activity Group (SIAG) on Supercomputing, Best Paper Prize, 2016**)
Graph Expansion and Communication Costs of Fast Matrix Multiplication,
G. Ballard, J. Demmel, O. Holtz, O. Schwartz, JACM, Dec 2012;
presents the communication lower bound for Strassen-like matrix multiplication
(**Best Paper Award, at SPAA'11, invited CACM Research Highlight**)
Communication lower bounds and optimal algorithms for numerical linear algebra
G. Ballard, E. Carson, J. Demmel, M. Hoemmen, N. Knight, O. Schwartz,
Acta Numerica, 2014; survey of field, including both direct and iterative methods
For more papers on communication-avoiding algorithms, see the
bebop web page.
References for Randomized Algorithms
RandLAPACK is an ongoing project to extend LAPACK to include randomized algorithms.
The current design document for RandLAPACK is available here.
RandBLAS is the first release of the Randomized BLAS,
a building block for RandLAPACK, just as the BLAS are a building block for LAPACK
More documentation for the RandBLAS can be found here.
Repositories for parla (Python Algorithms for Randomized Linear Algebra)
and marla (Matlab Algorithms for Randomized Linear Algebra), which are partial draft implementations of some of the RandLAPACK algorithms.
"Randomized Numerical Linear Algebra: Foundations and Algorithms,"
P. G. Martinsson, J. Tropp, Acta Numerica 2020, or
arxiv.org
"Finding Structure with Randomness: Probabilistic Algorithms for
Constructing Approximate Matrix Decompositions,"
N. Halko, P. G. Martinsson, J. A. Tropp,
SIAM Review 2011, or arxiv.org
"An Elementary Proof of a Theorem of Johnson and Lindenstrauss,"
S. Dasgupta, A. Gupta,
Random Structures and Algorithms 2003, or
here
"Randomized algorithms for matrices and data," M. Mahoney,
arxiv.org
"Low Rank Approximation and Regression in Input Sparsity Time",
K. Clarkson, D. Woodruff, STOC 2013 (co-winner, Best Paper Award), or
arxiv.org
"Low-distortion Subspace Embeddings in Input-sparsity Time and
Applications to Robust Linear Regression," X. Meng, M. Mahoney,
arxiv.org
Stat 260/CS294 - Randomized Algorithms for Matrices and Data
was taught by Michael Mahoney in Fall 2013.
Reading group on randomized
numerical linear algebra, with archived slides and video presentations,
was taught by Laura Grigori and
James Demmel in Spring 2015
References for Symmetric Eigenproblem and SVD
``Avoiding Communication in Successive Band Reduction'',
G. Ballard, J. Demmel, N. Knight, UCB EECS Tech Report UCB/EECS-2013-131
``New Fast and Accurate Jacobi SVD Algorithm,
Part I and
Part II,''
Z. Drmac, K. Veselic, SIAM J. Mat. Anal. Appl., v 29, 2008
(**SIAM Linear Algebra Prize 2009**)
``Orthogonal Eigenvectors and Relative Gaps,''
I. Dhillon, B. Parlett, SIAM J. Mat. Anal. Appl. v. 25:3, Mar 2004
(**SIAM Linear Algebra Prize 2006**)
``Computing the Singular Value Decomposition with High Relative Accuracy,''
J. Demmel, M. Gu, S. Eisenstat, I. Slapnicar, K. Veselic, Z. Drmac, Lin. Alg. Appl., v 299, 1999
``Jacobi's Method is more accurate than QR,''
J. Demmel, K. Veselic, SIAM J. Mat. Anal. Appl., v 27, 1990
``Accurate singular values of bidiagonal matrices,''
J. Demmel, W. Kahan, SIAM J. Sci. Stat. Comp., v 11, 1990
(**SIAM Linear Algebra Prize 1991**)
References for Nonsymmetric Eigenproblem
``Pseudospectral Shattering, the Sign Function, and Diagonalization in Nearly Matrix Multiplication Time,''
J. Banks, J. Garza-Vargas, A. Kulkarni, N. Srivastava, 2022
``Global Convergence of Hessenberg Shifted QR I: Exact Arithmetic,''
J. Banks, J. Garza-Vargas, N. Srivastava, 2023
``Global Convergence of Hessenberg Shifted QR II: Numerical Stability,''
J. Banks, J. Garza-Vargas, N. Srivastava, 2022
``Global Convergence of Hessenberg Shifted QR III: Approximate Ritz Values via Shifted Inverse Iteration,''
J. Banks, J. Garza-Vargas, N. Srivastava, 2022
References for Iterative Methods
An Introduction to the Conjugate Gradient Method Without the Agonizing Pain
by Jonathan Shewchuk,
is a very easy to understand description of one of the most popular
iterative methods for solving A*x=b. In contrast to the terse treatment
in the course text book, you might want to see Shewchuk's answer to
the question "How could fifteen lines of code take fifty pages to explain?"
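As a rough illustration of how short the core iteration is, here is a minimal sketch of the conjugate gradient method for a small dense symmetric positive definite matrix stored as nested lists. The function names, the zero starting vector, and the stopping rule are choices made for this sketch, not taken from Shewchuk's notes or the textbook.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, tol=1e-12, max_iter=None):
    """Solve A x = b for symmetric positive definite A, starting from x = 0.
    Stops when the residual 2-norm drops to tol or after max_iter steps."""
    n = len(b)
    max_iter = max_iter or 10 * n
    x = [0.0] * n
    r = list(b)          # residual r = b - A*x, with x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 <= tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                       # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                            # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Small example: the exact solution is [1/11, 7/11].
x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(x)
```

Only one matrix-vector product is needed per iteration, which is why the method suits large sparse matrices where `matvec` is cheap.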
Lecture notes on Multigrid, in
powerpoint.
References for Autotuning (automatic performance tuning)
The goal of autotuning is to let the computer write code, by searching over a design space of possible implementations
to find the ``best'' one. ``Best'' usually means fastest, but other goals (or constraints) are possible too, e.g.,
memory and accuracy. It is motivated by the need to make programming easier, since these design spaces can be
complicated and high-dimensional, so that even an expert can take a long time to find a good implementation, and
may miss out on some opportunities.
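The simplest form of this search can be sketched in a few lines: time every candidate in a small design space and keep the fastest. The example below is only an illustration under simple assumptions, namely a pure Python blocked matrix multiply whose single tuning parameter is the block size `bs`. The function names are hypothetical, and real autotuners such as those listed below search far larger spaces with much more care about measurement noise.

```python
import time

def blocked_matmul(A, B, n, bs):
    """C = A*B for n-by-n matrices stored as nested lists, using bs-by-bs blocks."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(kk, min(kk + bs, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + bs, n)):
                            Ci[j] += a * Bk[j]
    return C

def autotune(n, candidates, repeats=3):
    """Time each candidate block size and return (fastest, all timings)."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for bs in candidates:
        best = float("inf")
        for _ in range(repeats):          # keep the minimum to reduce timing noise
            t0 = time.perf_counter()
            blocked_matmul(A, B, n, bs)
            best = min(best, time.perf_counter() - t0)
        timings[bs] = best
    return min(timings, key=timings.get), timings

best_bs, timings = autotune(32, [4, 8, 16, 32])
print("fastest block size on this machine:", best_bs)
```

The winner depends on the machine and the problem size, which is the reason the search is run rather than fixed in advance.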
Tuning the BLAS (and some other linear algebra)
PHiPAC (Portable High Performance ANSI C) was one of
the earliest projects in this area (done at Berkeley), for tuning matrix multiplication, and our
1997 conference paper recently won a Test-of-Time award.
ATLAS (Automatically Tuned Linear Algebra Software) is another project
that started about the same time, aimed at tuning all the BLAS. It is ongoing and also won a Test-of-Time award.
BLIS and
OpenBLAS are two more recent and ongoing projects.
OSKI (Optimized Sparse Kernel Interface) autotunes SpMV (sparse matrix times
dense vector multiplication), in part by choosing an optimal sparse data structure for each sparse matrix.
POSKI is a parallel version.
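For readers who have not seen SpMV written out, here is a minimal sketch using compressed sparse row (CSR) storage, one common baseline data structure. It is not OSKI's interface: the function names and the array names `data`, `indices`, and `indptr` are assumptions made for this illustration. An autotuner like OSKI would compare variants of such a kernel, for example register-blocked formats, on the actual matrix.

```python
def dense_to_csr(M):
    """Convert a dense matrix (nested lists) to CSR arrays.
    data    : nonzero values, row by row
    indices : column index of each nonzero
    indptr  : indptr[i]..indptr[i+1] delimits row i within data/indices"""
    data, indices, indptr = [], [], [0]
    for row in M:
        for j, v in enumerate(row):
            if v != 0:
                data.append(v)
                indices.append(j)
        indptr.append(len(data))
    return data, indices, indptr

def csr_matvec(data, indices, indptr, x):
    """Compute y = A*x, touching only the stored nonzeros of A."""
    y = []
    for i in range(len(indptr) - 1):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]
        y.append(s)
    return y

M = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 0.0],
     [4.0, 0.0, 5.0]]
data, indices, indptr = dense_to_csr(M)
y = csr_matvec(data, indices, indptr, [1.0, 2.0, 3.0])
print(y)  # [7.0, 6.0, 19.0]
```

The indirect access `x[indices[k]]` is what makes SpMV performance depend so strongly on the sparsity pattern, and hence on the choice of data structure.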
FFTW (Fastest Fourier Transform in the West) is another prize-winning project for autotuning FFTs.
Spiral is an autotuner for DSP (digital signal processing) more generally, and targets
the tuning of both software and special purpose hardware.
``Black-box'' autotuners: when a large and complicated program exposes a set of tuning parameters, and one can set these parameters,
run the code, and measure its performance (or accuracy, etc.), then it becomes a machine learning problem to choose the best
parameter values. These parameters could be real numbers, like stopping criteria, or integers, like block sizes, or ``categoricals'',
like "Algorithm A" or "Algorithm B". Since the parameter space can be high dimensional and large, one goal of the autotuner is to
minimize the time spent running the program, by building a model to pick the most promising parameter values to test.
OpenTuner provides an ensemble of search algorithms that it runs simultaneously.
HPBandSter was developed to tune hyperparameters
of machine learning algorithms, for example using Bayesian Optimization. It can be used more generally for other applications too.
GPTune is an on-going Berkeley project that uses Gaussian Process modeling
to tune programs, in particular those that are very expensive to run.
Here is a recent paper.
HiPerBOt is another ongoing project using Bayesian Optimization for autotuning.
Why we are not covering tensors in this class: Most tensor problems are NP-hard