Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron (MLP) to learn high-frequency functions in low-dimensional problem domains. These results shed light on recent advances in computer vision and graphics that achieve state-of-the-art results by using MLPs to represent complex 3D objects and scenes. Using tools from the neural tangent kernel (NTK) literature, we show that a standard MLP fails to learn high frequencies both in theory and in practice. To overcome this spectral bias, we use a Fourier feature mapping to transform the effective NTK into a stationary kernel with a tunable bandwidth. We suggest an approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities.
Training a network without and with Fourier features
In this paper, we train MLP networks to learn low dimensional functions, such as the function defined by an image that maps each (x, y) pixel coordinate to an output (r, g, b) color. A standard MLP is not able to learn such functions (blue border image). Simply applying a Fourier feature mapping to the input (x, y) points before passing them to the network allows for rapid convergence (orange border image).
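To make the image regression setup concrete, here is a minimal sketch of how an image could be turned into coordinate and color training pairs. The helper name `coordinate_dataset`, the normalization of coordinates to [0, 1), and the 2 x 2 `tiny_image` are illustrative assumptions, not details taken from the released code.

```python
def coordinate_dataset(image):
    """Return (inputs, targets): normalized (x, y) coordinates and (r, g, b) colors."""
    height, width = len(image), len(image[0])
    inputs, targets = [], []
    for row in range(height):
        for col in range(width):
            # Assumption: coordinates are scaled to the half-open interval [0, 1).
            inputs.append((col / width, row / height))
            targets.append(image[row][col])
    return inputs, targets


# Hypothetical 2 x 2 image, used only to show the data layout.
tiny_image = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
              [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]]
inputs, targets = coordinate_dataset(tiny_image)
print(inputs[1], targets[1])  # (0.5, 0.0) (0.0, 1.0, 0.0)
```

The network is then trained to regress each target color from its input coordinate, so the image itself is the function being learned.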
This Fourier feature mapping is very simple. For an input point v (for the example above, (x, y) pixel coordinates) and a random Gaussian matrix B, where each entry is drawn independently from a normal distribution N(0, σ^2), we use
$$\gamma(\mathbf{v}) = \left[\cos(2\pi \mathbf{B}\mathbf{v}),\ \sin(2\pi \mathbf{B}\mathbf{v})\right]^{\mathrm{T}}$$
to map input coordinates into a higher dimensional feature space before passing them through the network.
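As a rough illustration, the mapping can be sketched in a few lines of dependency-free Python, taking it to have the form [cos(2πBv), sin(2πBv)] as in the paper. The function names `sample_b` and `fourier_features`, the choice of 256 rows for B, and the value sigma = 10 are assumptions made for this example, not settings taken from our experiments.

```python
import math
import random


def sample_b(num_features, input_dim, sigma, seed=0):
    """Draw a num_features x input_dim matrix with i.i.d. N(0, sigma^2) entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(input_dim)]
            for _ in range(num_features)]


def fourier_features(v, b):
    """Map a point v to the concatenation [cos(2*pi*B v), sin(2*pi*B v)]."""
    projections = [2.0 * math.pi * sum(b_jk * v_k for b_jk, v_k in zip(row, v))
                   for row in b]
    return [math.cos(p) for p in projections] + [math.sin(p) for p in projections]


# Illustrative values only: 256 random frequencies for a 2D (x, y) input.
b = sample_b(num_features=256, input_dim=2, sigma=10.0)
features = fourier_features([0.25, 0.75], b)
print(len(features))  # 512: one cosine and one sine per row of B
```

The matrix B is sampled once and kept fixed; in this sketch only the MLP that consumes `features` would be trained.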
Fourier features and the Neural Tangent Kernel
Recent theoretical work describes the behavior of deep networks in terms of the neural tangent kernel (NTK), showing that the network's predictions over the course of training closely track the outputs of a kernel regression problem being optimized by gradient descent. In our paper, we show that using a Fourier feature mapping transforms the NTK into a stationary kernel in our low-dimensional problem domains. In this context, the bandwidth of the NTK limits the spectrum of the recovered function.
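One way to see where the stationarity comes from is to look at the inner product of two mapped points: since cos(a)cos(b) + sin(a)sin(b) = cos(a - b), it depends only on the difference between the inputs. The sketch below checks this numerically. It examines only the kernel induced by the features, not the full NTK of an MLP, and the name `feature_kernel` and all numeric values are assumptions chosen for illustration.

```python
import math
import random


def feature_kernel(v1, v2, b):
    """Inner product of the Fourier features of v1 and v2, averaged over rows of B."""
    total = 0.0
    for row in b:
        p1 = 2.0 * math.pi * sum(r * x for r, x in zip(row, v1))
        p2 = 2.0 * math.pi * sum(r * x for r, x in zip(row, v2))
        # Equals cos(p1 - p2), so only the difference v1 - v2 matters.
        total += math.cos(p1) * math.cos(p2) + math.sin(p1) * math.sin(p2)
    return total / len(b)


rng = random.Random(0)
b = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(64)]

v1, v2, shift = [0.1, 0.2], [0.4, 0.3], [0.5, -0.3]
k_original = feature_kernel(v1, v2, b)
k_shifted = feature_kernel([a + s for a, s in zip(v1, shift)],
                           [a + s for a, s in zip(v2, shift)], b)
# Translating both points by the same shift leaves the kernel value unchanged.
print(abs(k_original - k_shifted) < 1e-9)
```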
In the video above, we show how scaling the Fourier feature frequencies provides direct control over the width of the NTK. This allows us to traverse a regime from underfitting (low scale, recovered function too low frequency) to overfitting (high scale, recovered function too high frequency), with the best generalization performance in the middle. Note that each image shown is the output of a different trained MLP network. The networks are supervised on a subsampled 256 x 256 image and tested at the full 512 x 512 resolution.
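The effect of the scale can be sketched numerically for a 1D input. For frequencies drawn from N(0, σ^2), the expected feature inner product at offset Δ is the Gaussian exp(-2π^2 σ^2 Δ^2), so a larger σ gives a narrower kernel. The example below compares a Monte Carlo estimate with that closed form. It illustrates the kernel induced by the features themselves rather than the full NTK, and the helper `kernel_at_offset`, the offset, and the σ values are assumptions chosen for illustration.

```python
import math
import random


def kernel_at_offset(offset, sigma, num_features=4000, seed=0):
    """Average of cos(2*pi*b*offset) over b ~ N(0, sigma^2), for a 1D input."""
    rng = random.Random(seed)
    return sum(math.cos(2.0 * math.pi * rng.gauss(0.0, sigma) * offset)
               for _ in range(num_features)) / num_features


offset = 0.05
for sigma in (1.0, 4.0, 16.0):
    empirical = kernel_at_offset(offset, sigma)
    # Closed-form expectation: a Gaussian kernel whose width shrinks as sigma grows.
    gaussian = math.exp(-2.0 * (math.pi * sigma * offset) ** 2)
    print(f"sigma={sigma:5.1f}  empirical={empirical:.3f}  gaussian={gaussian:.3f}")
```

At a fixed offset, the kernel value falls toward zero as σ increases, which corresponds to the move from the underfitting regime toward the overfitting regime described above.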
Random Fourier features were first proposed in the seminal work of Rahimi & Recht (2007).
The neural tangent kernel was introduced in Jacot et al. (2018).
In our own previous work on neural radiance fields (NeRF), we were surprised to find that a "positional encoding" of input coordinates helped networks learn significantly higher frequency details, inspiring our exploration in this project.
Sitzmann et al. (2020) concurrently introduced sinusoidal representation networks (SIREN), demonstrating exciting progress in coordinate-based MLP representations by using a sine function as the nonlinearity between all layers in the network. This allows the MLPs to accurately represent first- and second-order derivatives of low-dimensional signals.
You can find code to replicate all our experiments on GitHub. If you just want to experiment with the images used on this webpage, you can find the uncompressed originals here: Lion, Greece, Fox.
We thank Ben Recht for advice, and Cecilia Zhang, Tim Brooks, Jascha Sohl-Dickstein, Preetum Nakkiran, and Serena Wang for their comments on the text.
BM is funded by a Hertz Foundation Fellowship and acknowledges support from the Google BAIR Commons program. MT, PS, and SFK are funded by NSF Graduate Fellowships. RR was supported in part by ONR grants N000141712687 and N000142012529 and the Ronald L. Graham Chair. RN was supported in part by an FHL Vive Center Seed Grant. Google University Relations provided a generous donation of compute credits.
The website template was borrowed from Michaël Gharbi.