U.C. Berkeley Math 221 Home Page

Matrix Computations / Numerical Linear Algebra

Fall 2024

T Th 12:30-2, in 241 Cory Hall

Instructor:

Offices:
564 Soda Hall ("Virginia"), (510)643-5386
831 Evans Hall

Office Hours: W 10-11 and F 11-12 in 567 Soda Hall (cancelled Sep 6)
You may need to ring the doorbell at the entry to the SLICE Lab, at the SE corner of the 5th floor.

(send email)

Grader:

Sudhanva Kulkarni

See his web page for contact information and office hours.

Administrative Assistants:

Ria Briggs / Tami Chouteau

Email: (to Ria) / (to Tami)

Announcements:

(8/26) Welcome to Ma221! We look forward to another normal semester of in-person lectures, but we will continue to record and post lectures on bcourses; see Announcements on bcourses.berkeley.edu for information on how to connect. We will also prepost typed notes before each lecture.

(8/26) Please fill out this on-line Class Survey

(8/26) Homework 1 has been posted. It is due on Monday Sep 9 at 11:59pm, on gradescope. Answers to homework will also be posted at bcourses.berkeley.edu.

(9/23) A list of possible class projects has been posted here. Project proposals (1 page per team), are due on Gradescope on Oct 15. (The Course Overview said bcourses instead of Gradescope, but we will use Gradescope.)

Handouts

Course Overview in pdf, including syllabus, prerequisites, pointers to other references, and grading.

Textbook

Applied Numerical Linear Algebra by J. Demmel, published by SIAM, 1997.

List of Errata for the Textbook (Updated May 2024. Suggestions welcome!)

Homework assignments

Matlab Programs for Homework Assignments

Lecture Notes

Other Online Software, Documentation, and Reference Material

Matlab documentation is available from several sources, most notably by typing ``help'' into the Matlab command window.

Netlib, a repository of numerical software and related documentation

Netlib Search Facility, a way to search for the software on Netlib that you need

GAMS - Guide to Available Math Software, another search facility to find numerical software

Linear Algebra Software Libraries and Collections

LAPACK, state-of-the-art software for dense numerical linear algebra on workstations and shared-memory parallel computers. Written in Fortran.

LAPACK Manual

LAPACKE, a C interface to LAPACK.

TLAPACK, or "Templated" LAPACK is an on-going project to create a C++ version of LAPACK that allows one to use arbitrary arithmetic data types, eg very long or short precisions.

ScaLAPACK, a partial version of LAPACK for distributed-memory parallel computers. Written in Fortran.

ScaLAPACK manual

SLATE is a modern version (C++) version of ScaLAPACK which allows use of hardware accelerators.

MAGMA and PLASMA are dense linear algebra libraries design to run on GPUs and multicore platforms, resp.

LINPACK and EISPACK are precursors of LAPACK, dealing with linear systems and eigenvalue problems, respectively.

SuperLU is a set of fast implementations of sparse Gaussian elimination for sequential and parallel computers, respectively.

Updated survey of sparse direct linear equation solvers, by Xiaoye Li

SuiteSparse, another popular collection of sparse matrix software

BEBOP (Berkeley Benchmarking and Optimization) is a source for automatic generation of high performance numerical codes, including OSKI, a system for producing fast implementations of sparse-matrix-vector-multiplication. (OSKI stands for Optimized Sparse Kernel Interface, and only coincidentally is also the name of the Cal Bear mascot :) ).

Sources of test matrices for sparse matrix algorithms

Matrix Market

SuiteSparse Matrix Collection

Templates for the solution of linear systems, a collection of iterative methods, with advice on which ones to use. The web site includes on-line versions of the book (in html and postscript) as well as software.

Templates for the Solution of Algebraic Eigenvalue Problems is a survey of algorithms and software for solving eigenvalue problems. The web site points to an html version of the book, as well as software.

Resources for Parallel and High Performance Computing

NERSC (National Energy Research Scientific Computing Center), a DOE supercomputer center at neighboring LBL (Lawrence Berkeley National Lab), that provides supercomputer resources to problems of interest to DOE

CS 267, Applications of Parallel Computers, 2022 version, including slides and videos of lectures on parallel linear algebra

PETSc: Portable, Extensible, Toolkit for Scientific Computation

GraphBLAS: Using Linear Algebra for Graph Algorithms

References for Computer Arithmetic and Error Analysis

This article summarizes the changes in the 2019 version of the IEEE 754 Floating Point Standard and gives a historical perspective on how it has evolved since its first version in 1985. The IEEE 754 Floating Point Standard Committee has recently started meeting again, with the next version due in 2029.

A different floating point standard committee has started meeting recently, the P3109 Working Group on 8-bit Floating Point for Machine Learning. 8 bits is very low precision for classical uses of floating point, but appears to be enough to tell dogs from cats and many other ML applications. Here is an interim report from the committee. In the meantime, many companies have already agreed on their own common standard.

This document describes an on-going project to make the widely used BLAS and LAPACK linear algebra libraries more consistent and reliable in the way they handle floating point exceptions (possible class projects!).

This paper describes the general relationship between condition numbers and the distance to the nearest ill-posed problem.

``Accurate and efficient expression evaluation and linear algebra,'', by J. Demmel, I. Dumitriu, O. Holtz, P. Koev Acta Numerica, V. 17, May 2008. This paper shows how to exploit the mathematical structure of a problem to get higher accuracy.

Efficient software for very high precision floating point arithmetic

ARPREC

MPFR

GMP

Efficient software for high precision and reproducible Basic Linear Algebra Subroutines (BLAS)

XBLAS

ReproBLAS

Notes on IEEE Floating Point Arithmetic, by Prof. W. Kahan

Other notes on arithmetic, error analysis, etc. by Prof. W. Kahan

Report on arithmetic error that cause the Ariane 5 Rocket Crash

Robotic car crash caused by a NaN

The IEEE floating point standard was updated in 2008. Look here for a summary.

The IEEE floating point standard was updated again in 2019. Look here for a summary.

Another IEEE standard committee started meeting in 2021, to try to standardize low precision arithmetic for machine learning, we'll see what happens.

For a variety of papers on solving linear algebra problems with guaranteed accuracy, by using interval arithmetic, see Siegfried Rump's web site

For a paper on using mixed precision arithmetic, doing most work in fast, low precision, and a little in slower, high precision, with the goal of getting the answer to high precision fast, see A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic. There is also a recent Nvidia blog post on this.

For a paper that shows how one might decrease the factor d = problem size in floating point error bounds, see Stochastic Rounding and its Probabilistic Backward Error Analysis

Variable precision floating point

For a paper on posits, a recently proposed fixed length variable precision arithmetic format, see Beating Floating Point at its Own Game by John Gustavson.

I disagree with several claims in this paper. For a paper analyzing posits, see Posits: The good, the bad and the ugly

For a recorded debate between Gustavson and Kahan on the pros and cons of Gustavson's earlier proposal for the variable length format called unums, see this youtube video.

For a (much older) paper showing how to do error analysis with variable precision arithmetic, many versions of which have been proposed before, see On Error Analysis in Arithmetic with Varying Relative Precision

References for Communication-Avoiding Algorithms

Minimizing Communication in Numerical Linear Algebra, G. Ballard, J. Demmel, O. Holtz, O. Schwartz, SIMAX, Sep 2011 (SIAM Linear Algebra Prize 2012) presents a complete proof of the communication lower bound for O(n^3) matrix multiplication and other classical linear algebra algorithms

Communication-Optimal Parallel and Sequential QR and LU Factorizations, J. Demmel, L. Grigori, M. Hoemmen, J. Langou, SIAM J. Sci. Comp., Feb 2012; (SIAM Activity Group (SIAG) on Supercomputing, Best Paper Prize, 2016)

Graph Expansion and Communication Costs of Fast Matrix Multiplication, G. Ballard, J. Demmel, O. Holtz, O. Schwartz, JACM, Dec 2012; presents the communication lower bound for Strassen-like matrix multiplication (Best Paper Award, at SPAA'11, invited CACM Research Highlight)

Communication lower bounds and optimal algorithms for numerical linear algebra G. Ballard, E. Carson, J. Demmel, M. Hoemmen, N. Knight, O. Schwartz, Acta Numerica, 2014; survey of field, including both direct and iterative methods

For more papers on communication-avoiding algorithms, see the bebop web page.

References for Randomized Algorithms

RandLAPACK is an ongoing project to extend LAPACK to including randomized algorithms.

The current design document for RandLAPACK is available here.

RandBLAS is the first release of the Randomized BLAS, a building block for RandLAPACK, just as the BLAS are a building block for LAPACK

More documentation for the RandBLAS can be found here.

Repository for parla (Python Algorithms for Randomized Linear Algebra) and marla (Matlab Algorithms for Randomized Linear Algebra), are partial draft implementations of some of the RandLAPACK algorithms.

"Randomized Numerical Linear Algebra: Foundations and Algorithms," P. G. Martinsson, J. Tropp, Acta Numerica 2020, or arxiv.org

"Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions," N. Halko, P. G. Martinsson, J. A. Tropp, SIAM Review 2011, or arxiv.org

"An Elementary Proof of a Theorem of Johnson and Lindenstrauss," S. Dasgupta, A. Gupta, Random Structures and Algorithms 2003, or here

"Randomized algorithms for matrices and data," M. Mahoney, arxiv.org

"Low Rank Approximation and Regression in Input Sparsity Time", K. Clarkson, D. Woodruff, STOC 2013 (co-winner, Best Paper Award), or arxiv.org

"Low-distortion Subspace Embeddings in Input-sparsity Time and Applications to Robust Linear Regression," X. Meng, M. Mahoney, arxiv.org

Stat 260/CS294 - Randomized Algorithms for Matrices and Data was taught by Michael Mahoney in Fall 2013.

Reading group on randomized numerical linear algebra, with archived slides and video presentations, was taught by Laura Grigori and James Demmel in Spring 2015

References for Symmetric Eigenproblem and SVD

``Avoiding Communication in Successive Band Reduction'', G. Ballard, J. Demmel, N. Knight, UCB EECS Tech Report UCB/EECS-2013-131

``New Fast and Accurate Jacobi SVD Algorithm, Part I and Part II,'', Z. Drmac, K. Veselic, SIAM J. Mat. Anal. Appl., v 29, 2008 (SIAM Linear Algebra Prize 2009)

``Orthogonal Eigenvectors and Relative Gaps,'' I. Dhillon, B. Parlett, SIAM J. Mat. Anal. Appl. v. 25:3, Mar 2004 (SIAM Linear Algebra Prize 2006)

``Computing the Singular Value Decomposition with High Relative Accuracy,'' J. Demmel, M. Gu, S. Eisenstat, I. Slapnicar, K. Veselic, Z. Drmac, Lin. Alg. Appl., v 299, 1999

``Jacobi's Method is more accurate than QR,'' J. Demmel, K. Veselic, SIAM J. Mat. Anal. Appl., v 27, 1990

``Accurate singular values of bidiagonal matrices,'' J. Demmel, W. Kahan, SIAM J. Sci. Stat. Comp., v 11, 1990 (SIAM Linear Algebra Prize 1991)

References for Nonsymmetric Eigenproblem

``Pseudospectral Shattering, the Sign Function, and Diagonalization in Nearly Matrix Multiplication Time,'' J. Banks, J. Garza-Vargas, A. Kulkarni, N. Srivastava, 2022

``Global Convergence of Hessenberg Shifted QR I: Exact Arithmetic,'' J. Banks, J. Garza-Vargas, N. Srivastava, 2023

``Global Convergence of Hessenberg Shifted QR II: Numerical Stability,'' J. Banks, J. Garza-Vargas, N. Srivastava, 2022

``Global Convergence of Hessenberg Shifted QR III: Approximate Ritz Values via Shifted Inverse Iteration,'' J. Banks, J. Garza-Vargas, N. Srivastava, 2022

References for Iterative Methods

An Introduction to the Conjugate Gradient Method Without the Agonizing Pain by Jonathan Shewchuk, is a very easy to understand description of one of the most popular iterative methods for solving A*x=b. In contrast to the terse treatment in the course text book, you might want to see Shewchuk's answer to the question "How could fifteen lines of code take fifty pages to explain?"

Lectures notes on Multigrid, in powerpoint.

References for Autotuning (automatic performance tuning)

The goal of autotuning is to let the computer write code, by searching over a design space of possible implementations to find the ``best'' one. ``Best'' usually means fastest, but other goals (or constraints) are possible too, eg memory and accuracy. It is motivated by the need to make programming easier, since these design spaces can be complicated and high-dimensional, so that even an expert can take a long time to find a good implementation, and may miss out on some opportunities.

Tuning the BLAS (and some other linear algebra)

PHiPAC (Portable High Performance ANSI C) was one of the earliest projects in this area (done at Berkeley), for tuning matrix multiplication, and our 1997 conference paper recently won a Test-of-Time award.

ATLAS (Automatically Tuned Linear Algebra Software) is another project that started about the same time, aimed at tuning all the BLAS, and is ongoing, and also won a Test-of-Time award.

BLIS and OpenBLAS are two more recent and ongoing projects.

OSKI (Optimized Sparse Kernel Interface) autotunes SpMV (sparse matrix times dense vector multiplication), in part by choosing an optimal sparse data structure for each sparse matrix. POSKI is a parallel version.

FFTW (Fastest Fourier Transform in the West) is another prize-winning project for autotuning FFTs.

Spiral is an autotuner for DSP (digital signal processing) more generally, and targets the tuning of both software and special purpose hardware.

``Black-box'' autotuners: when a large and complicated program exposes a set of tuning parameters, and one can set these parameters, run the code, and measure its performance (or accuracy, etc.), then it becomes a machine learning problem to choose the best parameter values. These parameters could be real numbers, like stopping criteria, or integers, like block sizes, or ``categoricals'', like "Algorithm A" or "Algorithm B". Since the parameter space can be high dimensional and large, one goal of the autotuner is to minimize the time spent running the program, by building a model to pick the most promising parameter values to test.

Opentuner provide an ensemble of search algorithms, that it runs simulataneously.

HPBandSter was developed to tune hyperparameters of machine learning algorithms, for example using Bayesian Optimization. It can be used more generally for other applications too.

GPTune is an on-going Berkeley project, that uses Gaussian Process modeling to tune programs, in particular those that are very expensive to run. Here is a recent paper.

HiPerBOt is another ongoing project using Bayesian Optimization for autotuning.

Why we are not covering tensors in this class: Most tensor problems are NP-hard