U.C. Berkeley Math 221 Home Page
Matrix Computations / Numerical Linear Algebra
MWF 2-3, in 102 Wheeler Hall
See his web page
for contact information and office hours.
Ria Briggs / Tammy Chouteau
Email: (to Ria) /
(8/18) Welcome to Ma221! We look forward to another normal
semester of in-person lectures, but we will continue
to record and post lectures on bcourses;
see Announcements on
for information on how to connect.
We will also prepost typed notes before each lecture.
(8/18) Please fill out this on-line
(updated Aug 23, 8:55pm).
(8/18) Homework 1 has been posted.
It is due Sep 1 at 11:59pm, on
Answers to homework will also be posted at
(8/29) Homework 1's due date is delayed from Sep 1 to Sep 4.
Course Overview in
including syllabus, prerequisites, pointers to other references, and grading.
Applied Numerical Linear Algebra by J. Demmel, published by
List of Errata for the Textbook
(Updated Fall 2020. Suggestions welcome!)
Other Online Software, Documentation, and Reference Material
Matlab documentation is available from several sources, most notably
by typing ``help'' into the Matlab command window.
Netlib, a repository of numerical software and
Netlib Search Facility,
a way to search for the software on Netlib that you need
GAMS - Guide to Available Math Software, another search facility to find numerical
Linear Algebra Software Libraries and Collections
LAPACK, state-of-the-art software for dense numerical linear algebra on
workstations and shared-memory parallel computers. Written in Fortran.
LAPACKE, a C interface to LAPACK.
ScaLAPACK, a partial version of LAPACK for distributed-memory parallel computers.
EISPACK are precursors of
LAPACK, dealing with linear systems and eigenvalue problems, respectively.
is a set of fast implementations of sparse Gaussian elimination for
sequential and parallel computers, respectively.
of sparse direct linear equation solvers, by
another popular collections of sparse matrix software
BEBOP (Berkeley Benchmarking and
Optimization) is a source for automatic generation of high performance
numerical codes, including OSKI,
a system for producing fast implementations of sparse-matrix-vector-multiplication.
(OSKI stands for Optimized Sparse Kernel Interface, and only coincidentally is
also the name of the Cal Bear mascot :) ).
Sources of test matrices for sparse matrix algorithms
SuiteSparse Matrix Collection
for the solution of linear systems,
a collection of iterative methods, with advice on which ones to use.
The web site includes on-line versions of the book
as well as software.
for the Solution of Algebraic Eigenvalue Problems
is a survey of algorithms and
software for solving eigenvalue problems. The web site points to
an html version of the book, as well as software.
Resources for Parallel and High Performance Computing
NERSC (National Energy Research Scientific Computing Center),
a DOE supercomputer center at neighboring
LBL (Lawrence Berkeley National Lab), that provides
supercomputer resources to problems of interest to DOE
CS 267, Applications of Parallel Computers,
including slides and videos of lectures on parallel linear algebra
PETSc: Portable, Extensible, Toolkit for Scientific Computation
GraphBLAS: Using Linear Algebra for Graph Algorithms
References for Computer Arithmetic and Error Analysis
describes an on-going project to make the widely used BLAS and LAPACK linear algebra libraries
more consistent and reliable in the way they handle floating point exceptions (possible class projects!).
This paper describes the general relationship between condition numbers
and the distance to the nearest ill-posed problem.
``Accurate and efficient expression evaluation and linear algebra,'',
by J. Demmel, I. Dumitriu, O. Holtz, P. Koev Acta Numerica, V. 17, May 2008. This paper shows how to
exploit the mathematical structure of a problem to get higher accuracy.
Efficient software for very high precision floating point arithmetic
Efficient software for high precision and reproducible Basic Linear Algebra Subroutines (BLAS)
Notes on IEEE Floating Point Arithmetic, by
Prof. W. Kahan
Other notes on arithmetic, error analysis, etc. by
Prof. W. Kahan
Report on arithmetic error that cause the Ariane 5 Rocket Crash
Robotic car crash caused by a NaN
The IEEE floating point standard was updated in 2008.
for a summary.
The IEEE floating point standard was updated again in 2019.
for a summary.
Another IEEE standard committee started meeting in 2021, to try to
standardize low precision arithmetic for machine learning, we'll see what
For a variety of papers on solving linear algebra problems with
guaranteed accuracy, by using interval arithmetic,
see Siegfried Rump's web site
For a paper on using mixed precision arithmetic, doing most work in fast, low precision, and a little in
slower, high precision, with the goal of getting the answer to high precision fast, see
A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic.
There is also a recent
Nvidia blog post on this.
For a paper that shows how one might decrease the factor d = problem size in floating point error bounds, see
Stochastic Rounding and its Probabilistic Backward Error Analysis
Variable precision floating point
For a paper on posits, a recently proposed fixed length variable precision arithmetic format, see
Beating Floating Point at its
Own Game by John Gustavson.
I disagree with several claims in this paper. For a paper analyzing posits, see
Posits: The good, the bad and the ugly
For a recorded debate between Gustavson and Kahan on the pros and cons
of Gustavson's earlier proposal for the variable length format called unums, see this
For a (much older) paper showing how to do error analysis with variable precision arithmetic,
many versions of which have been proposed before, see
On Error Analysis in Arithmetic with Varying Relative Precision
References for Communication-Avoiding Algorithms
Minimizing Communication in Numerical Linear Algebra,
G. Ballard, J. Demmel, O. Holtz, O. Schwartz, SIMAX, Sep 2011
(SIAM Linear Algebra Prize 2012)
presents a complete proof of the communication lower bound for O(n^3) matrix multiplication
and other classical linear algebra algorithms
Communication-Optimal Parallel and Sequential
QR and LU Factorizations, J. Demmel, L. Grigori, M. Hoemmen, J. Langou,
SIAM J. Sci. Comp., Feb 2012;
(SIAM Activity Group (SIAG) on Supercomputing, Best Paper Prize, 2016)
Graph Expansion and Communication Costs of Fast Matrix Multiplication,
G. Ballard, J. Demmel, O. Holtz, O. Schwartz, JACM, Dec 2012;
presents the communication lower bound for Strassen-like matrix multiplication
(Best Paper Award, at SPAA'11, invited CACM Research Highlight)
Communication lower bounds and optimal algorithms for numerical linear algebra
G. Ballard, E. Carson, J. Demmel, M. Hoemmen, N. Knight, O. Schwartz,
Acta Numerica, 2014; survey of field, including both direct and iterative methods
For more papers on communication-avoiding algorithms, see the
bebop web page.
References for Randomized Algorithms
RandLAPACK is an outline of a current project to extend LAPACK
to including randomized algorithms.
Repository for parla (Python Algorithms for Randomized Linear Algebra)
and marla (Matlab Algorithms for Randomized Linear Algebra), are partial draft implementations of some of the
"Randomized Numerical Linear Algebra: Foundations and Algorithms,"
P. G. Martinsson, J. Tropp, Acta Numerica 2020, or
"Finding Structure with Randomness: Probabilistic Algorithms for
Constructing Approximate Matrix Decompositions,"
N. Halko, P. G. Martinsson, J. A. Tropp,
SIAM Review 2011, or arxiv.org
"An Elementary Proof of a Theorem of Johnson and Lindenstrauss,"
S. Dasgupta, A. Gupta,
Random Structures and Algorithms 2003, or
"Randomized algorithms for matrices and data," M. Mahoney,
"Low Rank Approximation and Regression in Input Sparsity Time",
K. Clarkson, D. Woodruff, STOC 2013 (co-winner, Best Paper Award), or
"Low-distortion Subspace Embeddings in Input-sparsity Time and
Applications to Robust Linear Regression," X. Meng, M. Mahoney,
Stat 260/CS294 - Randomized Algorithms for Matrices and Data
was taught by Michael Mahoney in Fall 2013.
Reading group on randomized
numerical linear algebra, with archived slides and video presentations,
was taught by Laura Grigori and
James Demmel in Spring 2015
References for Symmetric Eigenproblem and SVD
``Avoiding Communication in Successive Band Reduction'',
G. Ballard, J. Demmel, N. Knight, UCB EECS Tech Report UCB/EECS-2013-131
``New Fast and Accurate Jacobi SVD Algorithm,
Part I and
Z. Drmac, K. Veselic, SIAM J. Mat. Anal. Appl., v 29, 2008
(SIAM Linear Algebra Prize 2009)
``Orthogonal Eigenvectors and Relative Gaps,''
I. Dhillon, B. Parlett, SIAM J. Mat. Anal. Appl. v. 25:3, Mar 2004
(SIAM Linear Algebra Prize 2006)
``Computing the Singular Value Decomposition with High Relative Accuracy,''
J. Demmel, M. Gu, S. Eisenstat, I. Slapnicar, K. Veselic, Z. Drmac, Lin. Alg. Appl., v 299, 1999
``Jacobi's Method is more accurate than QR,''
J. Demmel, K. Veselic, SIAM J. Mat. Anal. Appl., v 27, 1990
``Accurate singular values of bidiagonal matrices,''
J. Demmel, W. Kahan, SIAM J. Sci. Stat. Comp., v 11, 1990
(SIAM Linear Algebra Prize 1991)
References for Iterative Methods
An Introduction to the Conjugate Gradient Method Without the Agonizing Pain
by Jonathan Shewchuk,
is a very easy to understand description of one of the most popular
iterative methods for solving A*x=b. In contrast to the terse treatment
in the course text book, you might want to see Shewchuk's answer to
the question "How could fifteen lines of code take fifty pages to explain?"
Lectures notes on Multigrid, in
References for Autotuning (automatic performance tuning)
The goal of autotuning is to let the computer write code, by searching over a design space of possible implementations
to find the ``best'' one. ``Best'' usually means fastest, but other goals (or constraints) are possible too, eg
memory and accuracy. It is motivated by the need to make programming easier, since these design spaces can be
complicated and high-dimensional, so that even an expert can take a long time to find a good implementation, and
may miss out on some opportunities.
Tuning the BLAS (and some other linear algebra)
PHiPAC (Portable High Performance ANSI C) was one of
the earliest projects in this area (done at Berkeley), for tuning matrix multiplication, and our
1997 conference paper recently won a Test-of-Time award.
ATLAS (Automatically Tuned Linear Algebra Software) is another project
that started about the same time, aimed at tuning all the BLAS, and is ongoing, and also won a Test-of-Time award.
OpenBLAS are two more recent and ongoing projects.
OSKI (Optimized Sparse Kernel Interface) autotunes SpMV (sparse matrix times
dense vector multiplication), in part by choosing an optimal sparse data structure for each sparse matrix.
POSKI is a parallel version.
FFTW (Fastest Fourier Transform in the West) is another prize-winning project for autotuning FFTs.
Spiral is an autotuner for DSP (digital signal processing) more generally, and targets
the tuning of both software and special purpose hardware.
``Black-box'' autotuners: when a large and complicated program exposes a set of tuning parameters, and one can set these parameters,
run the code, and measure its performance (or accuracy, etc.), then it becomes a machine learning problem to choose the best
parameter values. These parameters could be real numbers, like stopping criteria, or integers, like block sizes, or ``categoricals'',
like "Algorithm A" or "Algorithm B". Since the parameter space can be high dimensional and large, one goal of the autotuner is to
minimize the time spent running the program, by building a model to pick the most promising parameter values to test.
Opentuner provide an ensemble of search algorithms, that it runs simulataneously.
HPBandSter was developed to tune hyperparameters
of machine learning algorithms, for example using Bayesian Optimization. It can be used more generally for other applications too.
GPTune is an on-going Berkeley project, that uses Gaussian Process modeling
to tune programs, in particular those that are very expensive to run.
Here is a recent paper.
HiPerBOt is another ongoing project using Bayesian Optimization for autotuning.
Why we are not covering tensors in this class: Most tensor problems are NP-hard