DRCT

Saving Image Super-Resolution away from Information Bottlenecks

CVPR 2024

arXiv Code
Feature map intensity on various benchmark datasets. We observed that feature map intensities decrease sharply toward the end of SISR networks, indicating potential information loss. In this paper, we propose DRCT to address this issue by enhancing receptive fields and adding dense connections within residual blocks to mitigate information bottlenecks, thereby improving performance with a simpler model design.
Abstract
In recent years, Vision Transformer-based approaches for low-level vision tasks have achieved widespread success. Unlike CNN-based models, Transformers are more adept at capturing long-range dependencies, enabling image reconstruction with non-local information. In the domain of super-resolution, Swin Transformer-based models have become mainstream due to their capability for global spatial information modeling and their shifted-window attention mechanism, which facilitates the exchange of information between different windows. Many researchers have enhanced model performance by expanding receptive fields or designing meticulous networks, yielding commendable results. However, we observed a general phenomenon: feature map intensity is abruptly suppressed to small values toward the network's end. This implies an information bottleneck and a diminishment of spatial information, implicitly limiting the model's potential. To address this, we propose the Dense-residual-connected Transformer (DRCT), aimed at mitigating the loss of spatial information through dense-residual connections between layers, thereby unleashing the model's potential and saving the model from the information bottleneck. Experimental results indicate that our approach surpasses state-of-the-art methods on benchmark datasets and performs commendably in the NTIRE 2024 Image Super-Resolution (x4) Challenge.
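The dense-residual connections described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the channel counts, the per-pixel linear "blocks", and the fusion projection are all illustrative stand-ins for the Swin Transformer layers and convolutions used in DRCT. The sketch only shows the wiring: every layer receives the concatenation of all preceding feature maps, and the group ends with a residual connection back to its input, so shallow features reach deep layers without being suppressed.

```python
import numpy as np

rng = np.random.default_rng(0)

def block(x, w):
    """One toy 'layer': a channel-wise linear map plus ReLU
    (a stand-in for a real Swin Transformer block)."""
    return np.maximum(x @ w, 0.0)

def dense_residual_group(x, n_layers=4, channels=16):
    """Sketch of dense-residual connections: each layer sees the
    concatenated outputs of all preceding layers (dense connections),
    and the whole group is wrapped in a residual connection."""
    feats = [x]
    for _ in range(n_layers):
        inp = np.concatenate(feats, axis=-1)   # dense connection
        w = rng.normal(scale=0.1, size=(inp.shape[-1], channels))
        feats.append(block(inp, w))
    # Fuse all collected features back to the input channel count.
    fused_in = np.concatenate(feats, axis=-1)
    fuse = rng.normal(scale=0.1, size=(fused_in.shape[-1], channels))
    return x + fused_in @ fuse                 # residual connection

x = rng.normal(size=(8, 8, 16))  # toy H x W x C feature map
y = dense_residual_group(x)
assert y.shape == x.shape
```

Because every layer's output stays in the concatenated feature list, later layers can reuse shallow spatial information directly instead of reconstructing it, which is the mechanism DRCT relies on to avoid the intensity collapse shown in the figures.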
The feature map visualization displays, from top to bottom, SwinIR, HAT, and DRCT, with positions further to the right representing deeper layers within the network. For both SwinIR and HAT, the intensity of the feature maps is significant in the shallower layers but diminishes toward the network's end. We consider that this phenomenon implies the loss of spatial information, leading to limitations and an information bottleneck in SISR tasks. (Zoom in to better observe the color bars beside the feature maps.)
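The intensity drop described in the caption can be quantified with a simple per-layer statistic. The metric below (mean absolute activation) is a plausible assumption for illustration, not necessarily the exact statistic used for the paper's figures, and the layer activations are synthetic toy data.

```python
import numpy as np

def feature_map_intensity(feature_maps):
    """Mean absolute activation per layer: one scalar 'intensity'
    value for each feature map in the list."""
    return [float(np.mean(np.abs(f))) for f in feature_maps]

# Toy stand-in for per-layer activations: the last layer's
# activations are suppressed, mimicking the observed bottleneck.
rng = np.random.default_rng(0)
maps = [rng.normal(scale=s, size=(64, 32, 32)) for s in (1.0, 0.9, 0.1)]
intensities = feature_map_intensity(maps)
assert intensities[0] > intensities[-1]  # intensity collapses at the end
```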
Quantitative comparison with several peer methods on benchmark datasets. Methods that adopt a pre-training strategy on ImageNet, and methods that use the same-task progressive training strategy, are marked accordingly in the table. The top three results are marked in red, blue, and orange, respectively.
Model complexity comparison between SwinIR, HAT, and the proposed DRCT, evaluated on the Urban100 dataset.
If you find our work helpful, please consider citing the following:

BibTeX

@misc{hsu2024drct,
    title={DRCT: Saving Image Super-resolution away from Information Bottleneck},
    author={Chih-Chung Hsu and Chia-Ming Lee and Yi-Shiuan Chou},
    year={2024},
    eprint={2404.00722},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}