Learning to Transform for Generalizable Instance-wise Invariance
ICCV 2023

TL;DR

We predict a distribution over spatial transformations for any input image. This distribution can be used for data augmentation, for aligning instances before classification, and for adapting to out-of-distribution poses.
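To make the idea concrete, here is a minimal PyTorch-style sketch of the averaging step: sample a few transformations predicted for a single image, warp the image with each, and average the classifier's predictions. The names here (`transform_sampler`, `classifier`) are hypothetical placeholders, not the paper's actual API.

```python
import torch
import torch.nn.functional as F

def averaged_prediction(classifier, transform_sampler, image, n_samples=8):
    """Average class predictions over transformations sampled per instance.

    `image` is assumed to have shape (1, C, H, W). `transform_sampler(image, n)`
    is a hypothetical module returning n affine matrices of shape (n, 2, 3)
    predicted for this particular image.
    """
    thetas = transform_sampler(image, n_samples)            # (n, 2, 3)
    batch = image.expand(n_samples, *image.shape[1:])       # replicate the image
    grid = F.affine_grid(thetas, batch.shape, align_corners=False)
    warped = F.grid_sample(batch, grid, align_corners=False)
    logits = classifier(warped)                             # (n, num_classes)
    return logits.softmax(dim=-1).mean(dim=0)               # average over samples
```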

[Figure: method overview]

Abstract

Computer vision research has long aimed to build systems that are robust to spatial transformations found in natural data. Traditionally, this is done using data augmentation or hard-coding invariances into the architecture. However, too much or too little invariance can hurt, and the correct amount is unknown a priori and dependent on the instance. Ideally, the appropriate invariance would be learned from data and inferred at test-time.
We treat invariance as a prediction problem. Given any image, we use a normalizing flow to predict a distribution over transformations and average the classifier's predictions over them. Since this distribution depends only on the instance, we can align instances before classifying them and generalize invariance across classes. The same distribution can also be used to adapt to out-of-distribution poses. This normalizing flow is trained end-to-end and can learn a much larger range of transformations than Augerino and InstaAug. When used as data augmentation, our method shows accuracy and robustness gains on CIFAR-10, CIFAR-10-LT, and TinyImageNet.
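As a rough illustration of the conditional-flow idea, the sketch below maps Gaussian noise to transformation parameters (e.g., angle, translation, log-scale) through affine coupling layers conditioned on image features. This is an illustrative construction under assumed shapes, not the paper's exact architecture; `ConditionalCoupling` and `sample_transforms` are hypothetical names.

```python
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    """One affine coupling layer conditioned on image features.

    Splits the transformation-parameter vector in half; one half is
    scaled and shifted by an MLP that sees the other half plus the features.
    """
    def __init__(self, dim, feat_dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, z, feats):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        scale, shift = self.net(torch.cat([z1, feats], dim=1)).chunk(2, dim=1)
        z2 = z2 * torch.exp(torch.tanh(scale)) + shift  # invertible affine map
        return torch.cat([z1, z2], dim=1)

def sample_transforms(flow_layers, feats, dim, n):
    """Map Gaussian noise through the flow to per-instance transform params."""
    z = torch.randn(n, dim)
    feats = feats.expand(n, -1)  # feats assumed shape (1, feat_dim)
    for layer in flow_layers:
        z = layer(z, feats)
    return z  # e.g. (angle, tx, ty, log-scale, ...) per sample

# Example usage (hypothetical encoder producing 128-d features):
#   feats = encoder(image)                                     # (1, 128)
#   layers = [ConditionalCoupling(dim=6, feat_dim=128) for _ in range(4)]
#   params = sample_transforms(layers, feats, dim=6, n=8)
```

Because the flow is conditioned only on the image, the same sampled parameters can drive augmentation during training or alignment at test time.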

Citation

Acknowledgements

This work was supported, in part, by the BAIR/Google fund.

The website template was borrowed from the Fourier Feature Networks project page and Michaël Gharbi.