α-ReQ: Assessing Representation Quality in SSL
with LincLab
The success of self-supervised learning (SSL) algorithms has drastically changed the landscape of training deep neural networks. With well-engineered architectures and training objectives, SSL models learn "useful" representations from large datasets without any dependence on labels (Self-Supervised Learning: The Dark Matter of Intelligence, LeCun & Misra 2021; Advancing Self-Supervised and Semi-Supervised Learning with SimCLR, Chen & Hinton 2020). Notably, when finetuned on an appropriate downstream task, these models can achieve state-of-the-art performance, often with less labeled data than supervised models. However, this success comes with challenges. In particular, SSL algorithms often require careful model selection to avoid representation collapse (Understanding Self-Supervised Learning Dynamics without Contrastive Pairs, Tian et al. 2021). Further, assessing the goodness of a learned representation is difficult, usually requiring additional compute to train linear probes on a labeled dataset.
Task-Agnostic Measures of Representation Quality

The central thesis of this work explores the following question:
In perception, how do we efficiently determine the goodness of representations learned with SSL across a wide range of potential downstream tasks?
To answer this question, one must formally define the notion of goodness and examine statistical
estimators that measure this property without needing downstream evaluation. Beyond theoretical
interest, such a metric would be highly desirable for model selection and designing novel SSL
algorithms.
In search of such a measure, we turn to one of the most efficient learning machines we know of: the mammalian
brain. The hierarchical and distributed organization of neural circuits, especially in the cortex,
yields neural representations that support many behaviors across many domains, ranging from object
categorization to movement detection and motor control (Distributed Hierarchical Processing in the
Primate Cerebral Cortex, Felleman & Van Essen 1991).
Recent breakthroughs in systems neuroscience enable large-scale recordings of such neural activity. By
recording and analyzing responses to visual stimuli, recent studies (High-Dimensional Geometry of
Population Responses in Visual Cortex, Stringer et al. 2018; Increasing Neural Network Robustness
Improves Match to Macaque V1 Eigenspectrum, Spatial Frequency Preference and Predictivity, Kong et al.
2022) find that activations in mouse and macaque monkey V1 exhibit a characteristic
information-geometric structure. In particular, these representations are high-dimensional, yet the
amount of information encoded along different principal directions varies significantly. Notably, this
variance (computed by measuring the eigenspectrum of the empirical covariance matrix) is
well-approximated by a power law, λ_i ∝ i^{-α}, with decay coefficient α ≈ 1.

Simultaneously, advances in theoretical machine learning provide insights into generalization error
bounds for linear regression in overparameterized regimes. For instance, recent work on benign
overfitting in linear regression (Benign Overfitting in Linear Regression, Bartlett et al. 2019)
identifies conditions under which the minimum-norm interpolating solution generalizes well in the
infinite-dimensional regime; crucially, these conditions are stated in terms of how fast the
eigenvalues of the data covariance decay.
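Concretely (in our notation, a standard statement rather than the paper's exact result): for a design matrix X ∈ ℝ^{n×d} with d > n and targets y, the object of study is the minimum-ℓ2-norm interpolator,

```latex
\hat{\theta} \;=\; \arg\min_{\theta}\, \big\{\, \|\theta\|_2 \;:\; X\theta = y \,\big\}
\;=\; X^\top \left( X X^\top \right)^{-1} y ,
```

whose excess risk Bartlett et al. bound via effective ranks of the covariance Σ = E[xx^⊤], quantities determined directly by how fast the eigenvalues λ_1 ≥ λ_2 ≥ … of Σ decay.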
Measuring Eigenspectrum Decay & Generalization
Inspired by these results, we dig deeper into the structure of representations learned by well-trained
feature encoders. One view of pretrained encoders is that they learn a mapping from a high-dimensional
input space to a lower-dimensional manifold. The spectral properties of the resulting representations,
i.e., the eigenspectrum of their empirical covariance, characterize the geometry of this manifold.

Building on the intuition from linear regression, we examine the eigenspectrum of intermediate
representations across layers of canonical pretrained deep models.
An immediate, striking observation from this experiment is that intermediate representations in
canonical deep models also follow a power-law eigenspectrum, λ_i ∝ i^{-α}.

These results suggest that the coefficient of spectral decay, α, is an efficient, task-agnostic
indicator of representation quality.
Mapping the Design Landscape of Barlow Twins
With empirical evidence of power-law eigenspectra in hand, we ask whether the decay coefficient α can guide the design of SSL algorithms.
In particular, consider the Barlow Twins learning objective (Barlow Twins: Self-Supervised Learning via Redundancy Reduction, Zbontar et al. 2021), trained across a grid of hyperparameters.
The SSL loss itself (at a fixed number of gradient steps) is no longer useful for distinguishing models with superior downstream performance. However, the decay coefficient α remains predictive of downstream accuracy.
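For reference, the Barlow Twins objective drives the cross-correlation matrix of two augmented views' embeddings toward the identity: the diagonal term encourages invariance across views, while the off-diagonal term reduces redundancy between feature dimensions. A minimal NumPy sketch of the loss (batch-level, simplified relative to the official implementation):

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    """Barlow Twins objective on two batches of embeddings.

    z_a, z_b: (batch, dim) embeddings of two augmented views.
    lambd:    weight on the redundancy-reduction (off-diagonal) term.
    """
    n, _ = z_a.shape
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / z_a.std(0)
    z_b = (z_b - z_b.mean(0)) / z_b.std(0)
    # Empirical cross-correlation matrix, shape (dim, dim).
    c = z_a.T @ z_b / n
    # Invariance term: pull diagonal entries toward 1.
    on_diag = ((1.0 - np.diag(c)) ** 2).sum()
    # Redundancy-reduction term: push off-diagonal entries toward 0.
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lambd * off_diag
```

Sweeping hyperparameters such as `lambd`, the embedding dimension, or the augmentation strength yields the grid of models studied here.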

Model Selection on a Compute Budget
Learning robust representations under resource constraints has the potential to democratize and
accelerate the development of AI systems. To study this, we consider the task of model selection on a
compute budget: a scenario where we have a fixed compute budget to select the most generalizable model
from a set of models pretrained with different hyperparameters. In particular, we consider the setting
where we can train M models in parallel and require H sequential steps to evaluate each one.
The standard linear evaluation protocol would require us to train a separate linear probe for every
candidate, consuming most of the budget on evaluation alone; estimating α instead requires only
unlabeled data and a single pass over each model's features.
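As an illustrative sketch, candidate encoders can be ranked using only unlabeled features. The policy below (rank by |α − target| with target = 1, motivated by the V1 eigenspectrum result above) is an assumption made for this example, not a prescribed rule:

```python
import numpy as np

def spectral_alpha(features, lo=10, hi=100):
    """Decay coefficient of the feature-covariance eigenspectrum,
    via log-log regression over eigenvalue indices [lo, hi)."""
    x = features - features.mean(0)
    lam = np.linalg.eigvalsh(x.T @ x / (len(x) - 1))[::-1]
    idx = np.arange(lo, min(hi, int(np.sum(lam > 1e-12))))
    slope, _ = np.polyfit(np.log(idx + 1), np.log(lam[idx]), deg=1)
    return -slope

def select_model(feature_sets, target=1.0):
    """Rank candidate encoders by |alpha - target| using only
    unlabeled features; no linear probes are trained.

    feature_sets: dict mapping model name -> (n_samples, dim) array.
    """
    scores = {name: spectral_alpha(f) for name, f in feature_sets.items()}
    best = min(scores, key=lambda k: abs(scores[k] - target))
    return best, scores
```

Under the M-by-H budget above, this replaces M probe-training runs with M label-free spectral estimates, each costing one forward pass over a batch.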

Open Problems
Efficient measures of representation quality learned with SSL, that generalize well across
a suite of downstream tasks, are still in their infancy. In this
work, we identify the eigenspectrum decay coefficient α as one such efficient, task-agnostic measure.
Such metrics could also serve as proxies for neural architecture search (Evolving Neural Networks
through Augmenting Topologies, Stanley & Miikkulainen 2002; Neural Architecture Search with
Reinforcement Learning, Zoph & Le 2016). For instance, estimating α for candidate architectures early
in training could prune the search space without full downstream evaluation.
Further natural questions include: how do we design learning objectives that implicitly or
explicitly optimize these metrics during training? These are exciting research directions, and we hope
our preliminary investigation sparks community efforts toward a principled approach to designing
self-supervised neural networks.