Debiased Learning from
Naturally Imbalanced Pseudo-Labels
Xudong Wang
Zhirong Wu
Long Lian
Stella Yu
UC Berkeley / ICSI, Microsoft
[Preprint] [PDF] [Code] [Citation]
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022)
Pseudo-labels are confident predictions made on unlabeled target data by a classifier trained on labeled source data. They are widely used for adapting a model to unlabeled data, e.g., in a semi-supervised learning setting.
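For concreteness, here is a minimal PyTorch-style sketch of this pseudo-labeling step (FixMatch-style confidence thresholding); `model`, `unlabeled_batch`, and the threshold value are illustrative placeholders, not our released code:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_pseudo_labels(model, unlabeled_batch, threshold=0.95):
    """Keep only confident predictions on unlabeled data as pseudo-labels."""
    probs = F.softmax(model(unlabeled_batch), dim=-1)
    confidence, pseudo_labels = probs.max(dim=-1)
    mask = confidence >= threshold  # discard low-confidence predictions
    return pseudo_labels[mask], mask
```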
Our key insight is that pseudo-labels are naturally imbalanced due to intrinsic data similarity, even when a model is trained on balanced source data and evaluated on balanced target data. If we address this previously unknown imbalanced classification problem arising from pseudo-labels instead of ground-truth training labels, we can remove model biases towards false majorities created by pseudo-labels. We propose a novel and effective debiased learning method with pseudo-labels, based on counterfactual reasoning and adaptive margins: the former removes the classifier response bias, whereas the latter adjusts the margin of each class according to the imbalance of pseudo-labels. Validated by extensive experimentation, our simple debiased learning delivers significant accuracy gains over the state of the art on ImageNet-1K: 26% for semi-supervised learning with 0.2% annotations and 9% for zero-shot learning.
Surprisingly, we find that pseudo-labels of target data produced by typical Semi-Supervised Learning (SSL) and transductive Zero-Shot Learning (ZSL) methods (e.g., FixMatch and CLIP) are highly biased, even when both source and target data are class-balanced or even sampled from the same domain. This imbalanced pseudo-label issue is not unique to one setting: it can be observed across almost all of the datasets we experimented on. Please check our paper for more details.
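The skew is easy to verify: count how often each class appears among the confident pseudo-labels. A hypothetical check, reusing `generate_pseudo_labels` from the sketch above:

```python
import torch

def pseudo_label_distribution(pseudo_labels, num_classes):
    """Per-class frequency of pseudo-labels; a heavily skewed histogram
    reveals the imbalance even when the underlying data are balanced."""
    counts = torch.bincount(pseudo_labels, minlength=num_classes)
    return counts.float() / counts.sum().clamp(min=1)
```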
A simple yet effective method, DebiasPL, is proposed to dynamically alleviate the influence of biased pseudo-labels on a student model, without leveraging any prior knowledge of the true data distribution. It couples counterfactual reasoning, which removes the classifier response bias, with adaptive margins, which adjust each class's margin according to the imbalance of the pseudo-labels; a minimal sketch follows.
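As a rough illustration of these two ingredients, the sketch below debiases the logits with a running average of the model's predictions before generating pseudo-labels, and applies a logit-adjustment-style margin in the loss. The names `ema_probs` and `LAMBDA` and the constant values are ours; this is a simplification of the procedure in the paper, not the official implementation:

```python
import torch
import torch.nn.functional as F

LAMBDA = 0.5       # debiasing strength (illustrative value)
MOMENTUM = 0.999   # EMA momentum for the running average prediction

def update_ema(ema_probs, batch_probs):
    """Running average of model predictions on unlabeled data."""
    return MOMENTUM * ema_probs + (1 - MOMENTUM) * batch_probs.mean(dim=0)

def debiased_probs(logits_weak, ema_probs):
    """Counterfactual debiasing: subtract the (log) average response
    before picking pseudo-labels, removing the classifier response bias."""
    return F.softmax(logits_weak - LAMBDA * torch.log(ema_probs + 1e-12), dim=-1)

def adaptive_margin_loss(logits_strong, pseudo_labels, ema_probs):
    """Adaptive margins: classes that dominate the pseudo-labels receive a
    larger additive term, so rarer classes effectively get bigger margins."""
    margins = LAMBDA * torch.log(ema_probs + 1e-12)
    return F.cross_entropy(logits_strong + margins, pseudo_labels)
```

In a FixMatch-style loop, `debiased_probs` would replace the plain softmax on the weakly augmented view, and `adaptive_margin_loss` would replace the standard cross-entropy on the strongly augmented view.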
As a universal add-on, DebiasPL delivers significantly better performance than the previous state of the art on both semi-supervised learning and transductive zero-shot learning tasks. On ImageNet, it brings gains of 26% for semi-supervised learning with 0.2% annotations and 9% for zero-shot learning.
For zero-shot learning, DebiasPL exhibits stronger robustness to domain shifts. For more results, please check our paper.
Paper | Slides
If you find our work inspiring or use our codebase in your research, please cite our work:
@inproceedings{wang2022debiased,
  author    = {Wang, Xudong and Wu, Zhirong and Lian, Long and Yu, Stella X},
  title     = {Debiased Learning from Naturally Imbalanced Pseudo-Labels},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022},
}
This work was supported, in part, by Berkeley Deep Drive.