TBC: Tied Block Convolution

ABSTRACT

-----

Convolution is the main building block of convolutional neural networks (CNNs). We observe that an optimized CNN often has highly correlated filters as the number of channels increases with depth, which reduces the expressive power of its feature representations. We propose Tied Block Convolution (TBC), which shares the same thinner filters over equal blocks of channels and produces multiple responses with a single filter. The concept of TBC can also be extended to group convolution and fully connected layers, and it can be applied to various backbone networks and attention modules.

Our extensive experiments on classification, detection, instance segmentation, and attention demonstrate TBC's significant across-the-board gains over standard convolution and group convolution. The proposed TiedSE attention module can even use 64$\times$ fewer parameters than the SE module while achieving comparable performance. In particular, standard CNNs often fail to accurately aggregate information in the presence of occlusion, producing multiple redundant partial object proposals. By sharing filters across channels, TBC reduces filter correlation and can effectively handle highly overlapping instances. TBC increases the average precision (AP) for object detection on MS-COCO by 6% when the occlusion ratio is 80%. Our code will be released.

METHOD

-----

Method overview

To generate two activation maps, standard convolution requires two full-size filters and group convolution (with two groups) requires two half-size filters, whereas tied block convolution requires only one half-size filter. TBC therefore uses 4$\times$ fewer parameters than standard convolution (and 2$\times$ fewer than group convolution). The idea of TBC can also be applied to fully connected and group convolutional layers.

Fig 1. Standard Conv vs. Group Conv vs. Tied Block Conv
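The 4$\times$ figure above follows from simple arithmetic. The sketch below counts weights (bias ignored) for the two-activation-map setting described above; the helper names and the choice of $c_i = 64$ and $k = 3$ are illustrative assumptions, not values from the paper.

```python
# Hypothetical helpers: parameter counts (bias ignored) for a layer that maps
# c_in input channels to c_out output channels with a k x k kernel.
def params_standard(c_in, c_out, k):
    # every output channel is connected to all c_in input channels
    return c_out * c_in * k * k


def params_group(c_in, c_out, k, groups):
    # G independent filter banks, each mapping c_in/G -> c_out/G channels
    return groups * (c_out // groups) * (c_in // groups) * k * k


def params_tied_block(c_in, c_out, k, blocks):
    # one (c_out/B) x (c_in/B) x k x k filter bank, reused by all B blocks
    return (c_out // blocks) * (c_in // blocks) * k * k


# two output maps from c_in = 64 channels with 3 x 3 kernels (assumed sizes)
c_in, c_out, k = 64, 2, 3
sc = params_standard(c_in, c_out, k)               # two full-size filters
gc = params_group(c_in, c_out, k, groups=2)        # two half-size filters
tbc = params_tied_block(c_in, c_out, k, blocks=2)  # one half-size filter
print(sc, gc, tbc)  # 1152 576 288
print(sc // tbc)    # 4
```

The ratio does not depend on the assumed $c_i$ or $k$: halving the input width of each filter and also halving the number of distinct filters compounds to a factor of 4.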

Standard Convolution

Let the input feature be denoted by $X \in R^{c_{i} \times h_{i} \times w_{i}}$ and the output feature by $\tilde{X} \in R^{c_{o} \times h_{o} \times w_{o}}$, where $c$, $h$, and $w$ are the number of channels, the height, and the width of the feature maps, respectively. The kernel size is $k \times k$, and the bias term is ignored for clarity. Standard convolution (SC), denoted by $*$, can be formulated as:

\tilde{X} = X * W

where $W \in R^{c_{o} \times c_{i} \times k \times k}$ is the SC kernel. The number of parameters for SC is thus $c_{o} \times c_{i} \times k \times k$.
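A minimal NumPy sketch of the formula above, written (as deep learning libraries commonly do) as a cross-correlation with stride 1 and no padding. These choices and the function name `standard_conv` are assumptions for illustration. The sketch also confirms the parameter count $c_o \times c_i \times k \times k$.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def standard_conv(x, w):
    """Standard convolution X * W (stride 1, no padding, no bias).

    x: input feature of shape (c_i, h_i, w_i)
    w: kernel of shape (c_o, c_i, k, k)
    returns: output of shape (c_o, h_i - k + 1, w_i - k + 1)
    """
    c_o, c_i, k, _ = w.shape
    assert x.shape[0] == c_i, "kernel and input must have the same c_i"
    # patches[c, i, j, p, q] == x[c, i + p, j + q]
    patches = sliding_window_view(x, (k, k), axis=(1, 2))
    # each output channel sums over all input channels and the k x k window
    return np.einsum("cijpq,ocpq->oij", patches, w)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 10, 10))    # c_i = 8, h_i = w_i = 10
w = rng.standard_normal((16, 8, 3, 3))  # c_o = 16, k = 3
y = standard_conv(x, w)
print(y.shape)  # (16, 8, 8)
print(w.size)   # 1152 = c_o * c_i * k * k
```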

Group Convolution

Group convolution (GC) first divides the input feature $X$ along the channel dimension into $G$ equal-sized groups $X_{1}, \dots, X_{G}$, each of size $\frac{c_{i}}{G} \times h_{i} \times w_{i}$. Each group $X_{g}$ is then convolved with its own filters $W_{g}$. The output of GC is computed as:

\tilde{X} = X_{1} * W_{1} \oplus X_{2} * W_{2} \oplus \dots \oplus X_{G} * W_{G}

where $\oplus$ is the concatenation operation along the channel dimension, and $W_{g} \in R^{\frac{c_{o}}{G} \times \frac{c_{i}}{G} \times k \times k}$ are the convolution filters for group $g$, with $g \in \{1, \dots, G\}$. The number of parameters for GC is $G \times \frac{c_{o}}{G} \times \frac{c_{i}}{G} \times k \times k = \frac{c_{o} c_{i} k^{2}}{G}$, i.e., $G$ times fewer than SC.
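The GC formula can be sketched in NumPy by splitting the channels with `np.split`, convolving each group with its own kernel, and concatenating the results (the $\oplus$ operator). The helper names and the stride-1, unpadded convolution are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(x, w):
    # stride-1, unpadded convolution: (c_i, h, w) * (c_o, c_i, k, k) -> (c_o, h-k+1, w-k+1)
    k = w.shape[-1]
    patches = sliding_window_view(x, (k, k), axis=(1, 2))
    return np.einsum("cijpq,ocpq->oij", patches, w)


def group_conv(x, weights):
    """GC: weights is a list of G kernels W_g, each of shape (c_o/G, c_i/G, k, k)."""
    x_groups = np.split(x, len(weights), axis=0)  # X_1, ..., X_G along channels
    # convolve each group with its own filters, then concatenate along channels
    return np.concatenate([conv2d(xg, wg) for xg, wg in zip(x_groups, weights)], axis=0)


rng = np.random.default_rng(0)
c_i, c_o, k, G = 8, 16, 3, 4
x = rng.standard_normal((c_i, 10, 10))
weights = [rng.standard_normal((c_o // G, c_i // G, k, k)) for _ in range(G)]
y = group_conv(x, weights)
n_params = sum(wg.size for wg in weights)
print(y.shape)   # (16, 8, 8)
print(n_params)  # 288 = c_o * c_i * k * k / G
```

Equivalently, GC is a standard convolution whose kernel is zero outside the $G$ diagonal blocks. The $G$-fold saving comes from not storing those zeros.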

Tied Block Convolution (TBC)

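Tied Block Convolution reduces the effective number of filters by reusing the same filters across feature blocks. It divides $X$ along the channel dimension into $B$ equal blocks $X_{1}, \dots, X_{B}$, each of size $\frac{c_{i}}{B} \times h_{i} \times w_{i}$, and convolves every block with the same filters $W^{'}$: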

\tilde{X} = X_{1} * W^{'} \oplus X_{2} * W^{'} \oplus \dots \oplus X_{B} * W^{'}

where $W^{'} \in R^{\frac{c_{o}}{B} \times \frac{c_{i}}{B} \times k \times k}$ are the TBC filters shared among all $B$ blocks. The number of parameters is $\frac{c_{o}}{B} \times \frac{c_{i}}{B} \times k \times k = \frac{c_{o} c_{i} k^{2}}{B^{2}}$, i.e., $B^{2}$ times fewer than SC.
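A minimal NumPy sketch of TBC: the input is split into $B$ channel blocks, and every block is convolved with the same kernel $W'$. The second function shows one possible way to compute the same result in a single contraction by folding the $B$ blocks into a batch-like axis. It is an assumption for illustration, not necessarily how the authors implement TBC. All function names here are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(x, w):
    # stride-1, unpadded convolution: (c_i, h, w) * (c_o, c_i, k, k) -> (c_o, h-k+1, w-k+1)
    k = w.shape[-1]
    patches = sliding_window_view(x, (k, k), axis=(1, 2))
    return np.einsum("cijpq,ocpq->oij", patches, w)


def tied_block_conv(x, w_shared, blocks):
    """TBC: X_1 * W' (+) ... (+) X_B * W', with w_shared of shape (c_o/B, c_i/B, k, k)."""
    x_blocks = np.split(x, blocks, axis=0)  # B equal channel blocks X_1, ..., X_B
    return np.concatenate([conv2d(xb, w_shared) for xb in x_blocks], axis=0)


def tied_block_conv_folded(x, w_shared, blocks):
    # same result: reshape to (B, c_i/B, h, w) and contract all blocks at once
    c_i, height, width = x.shape
    k = w_shared.shape[-1]
    xb = x.reshape(blocks, c_i // blocks, height, width)
    patches = sliding_window_view(xb, (k, k), axis=(2, 3))
    out = np.einsum("bcijpq,ocpq->boij", patches, w_shared)  # (B, c_o/B, h', w')
    return out.reshape(-1, out.shape[2], out.shape[3])       # (c_o, h', w')


rng = np.random.default_rng(0)
c_i, c_o, k, B = 8, 16, 3, 4
x = rng.standard_normal((c_i, 10, 10))
w_shared = rng.standard_normal((c_o // B, c_i // B, k, k))
y = tied_block_conv(x, w_shared, B)
print(y.shape)        # (16, 8, 8)
print(w_shared.size)  # 72, i.e. 1152 / B**2
```

With the same $c_i$, $c_o$, and $k$, a standard convolution would store $16 \times 8 \times 3 \times 3 = 1152$ weights, so the $B^{2} = 16$-fold saving is visible directly.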

Tied Block Group Convolution (TGC)

The idea of tied block filtering can also be directly applied to group convolution, formulated as:

\tilde{X} = (X_{11} * W_{1}^{'} \oplus \dots \oplus X_{1 B} * W_{1}^{'}) \oplus \dots \oplus (X_{G 1} * W_{G}^{'} \oplus \dots \oplus X_{G B} * W_{G}^{'})

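where $X_{g b} \in R^{\frac{c_{i}}{B G} \times h_{i} \times w_{i}}$ is the $b$-th block of the $g$-th group of the input feature map, $W_{g}^{'} \in R^{\frac{c_{o}}{B G} \times \frac{c_{i}}{B G} \times k \times k}$ are the filters shared by all $B$ blocks of group $g$, $g \in [1, G]$, and $b \in [1, B]$. The number of parameters is $G \times \frac{c_{o}}{B G} \times \frac{c_{i}}{B G} \times k \times k = \frac{c_{o} c_{i} k^{2}}{B^{2} G}$.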
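A minimal NumPy sketch of TGC under the same assumptions as the sketches above (stride 1, no padding, hypothetical helper names): the channels are split into $G$ groups, each group is split into $B$ blocks, and all blocks of group $g$ reuse $W'_g$.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(x, w):
    # stride-1, unpadded convolution: (c_i, h, w) * (c_o, c_i, k, k) -> (c_o, h-k+1, w-k+1)
    k = w.shape[-1]
    patches = sliding_window_view(x, (k, k), axis=(1, 2))
    return np.einsum("cijpq,ocpq->oij", patches, w)


def tied_group_conv(x, weights, blocks):
    """TGC: weights[g] is W'_g of shape (c_o/(B*G), c_i/(B*G), k, k)."""
    outputs = []
    for x_g, w_g in zip(np.split(x, len(weights), axis=0), weights):
        # the B blocks X_g1, ..., X_gB of group g all reuse the same W'_g
        outputs += [conv2d(x_gb, w_g) for x_gb in np.split(x_g, blocks, axis=0)]
    return np.concatenate(outputs, axis=0)


rng = np.random.default_rng(0)
c_i, c_o, k, G, B = 16, 32, 3, 2, 4
x = rng.standard_normal((c_i, 10, 10))
weights = [rng.standard_normal((c_o // (B * G), c_i // (B * G), k, k)) for _ in range(G)]
y = tied_group_conv(x, weights, B)
n_params = sum(w_g.size for w_g in weights)
print(y.shape)   # (32, 8, 8)
print(n_params)  # 144 = c_o * c_i * k * k / (B**2 * G)
```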

Tied Block Fully Connected Layer (TFC)

Convolution is a special case of a fully connected (FC) layer, just as an FC layer is a special case of convolution (a $1 \times 1$ convolution on a $1 \times 1$ feature map). We therefore apply the same tied block filtering idea to FC layers. The tied block fully connected layer (TFC) shares the same FC connections among equal blocks of input channels. Like TBC, TFC reduces the number of parameters by a factor of $B^{2}$ and the computational cost by a factor of $B$.
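A minimal NumPy sketch of TFC, assuming the input is a batch of feature vectors and that blocks are contiguous slices of the input features. The function name and sizes are illustrative. The sketch also checks that TFC matches a dense FC layer whose weight matrix is block-diagonal with $B$ copies of the shared weights. That dense layer stores $B^{2}$ times more weights, and it also performs $B$ times more multiply-adds than the blocked computation.

```python
import numpy as np


def tied_block_fc(x, w_shared, blocks):
    """TFC on a batch x of shape (n, c_in); w_shared has shape (c_out/B, c_in/B)."""
    n, c_in = x.shape
    xb = x.reshape(n, blocks, c_in // blocks)  # split features into B equal blocks
    yb = xb @ w_shared.T                       # apply the same weights to every block
    return yb.reshape(n, -1)                   # concatenate the B block outputs


rng = np.random.default_rng(0)
n, c_in, c_out, B = 2, 64, 32, 4
x = rng.standard_normal((n, c_in))
w_shared = rng.standard_normal((c_out // B, c_in // B))
y = tied_block_fc(x, w_shared, B)

# the equivalent dense FC weight is block-diagonal with B copies of w_shared
w_dense = np.kron(np.eye(B), w_shared)  # shape (c_out, c_in)
print(y.shape)                          # (2, 32)
print(np.allclose(y, x @ w_dense.T))    # True
print(w_dense.size // w_shared.size)    # 16 = B**2 fewer stored weights
```

Per sample, the dense layer needs $32 \times 64 = 2048$ multiply-adds, while the blocked computation needs $B \times 8 \times 16 = 512$. This is the $B$-fold reduction in computation.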

Fig 2. Diagram of bottleneck modules for (a) TiedResNet with 4 splits, (b) TiedResNeXt with 4 splits, and (c) TiedResNeSt. Each tied block convolution (TBC) and tied block group convolution (TGC) layer has its own block number.

Fig 3. Diagram of Tied attention modules. (a) The TiedSE module replaces the FC layers in the original squeeze-and-excitation (SE) module with TFC. (b) The TiedGCB module replaces the standard convolutions in the global context block (GCB) with TBC.

Results

-----

Fig 4. #params of backbones vs. their average precision (AP) on the object detection and instance segmentation tasks of MS-COCO val-2017.

Fig 5. Performance of TiedResNet and ResNet on the object detection task of MS-COCO under different occlusion ratios $r$.

Fig 6. Recognition accuracy and model size comparison on ImageNet-1k.

Fig 7. Comparison on the instance segmentation task of the Cityscapes val set.

Fig 8. Comparison of #params of the SE and TiedSE attention modules with various backbones.

Fig 9. Comparison of #params of the GCB and TiedGCB attention modules.

Fig 10. Additional Grad-CAM visualization comparison among ResNet50, ResNeXt50, and TiedResNet50 (Rows 2-4, respectively) for the images in Row 1.


PUBLICATION

-----

Tied Block Convolution: Leaner and Better CNNs with Shared Thinner Filters
Xudong Wang and Stella X. Yu
The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI), 2021.

CITATION

-----

@article{wang2020unsupervised,
title={Tied Block Convolution: Leaner and Better CNNs with Shared Thinner Filters},
author={Wang, Xudong and Yu, Stella X},
journal={arXiv preprint arXiv:2009.12021},
year={2020}
}

Tied Block Convolution (TBC):
Leaner and Better CNNs with Shared Thinner Filters

Xudong Wang¹,², Stella Yu¹,²

¹University of California, Berkeley ²International Computer Science Institute

