Rethinking Long-Tailed Visual Recognition with Dynamic Probability Smoothing and Frequency Weighted Focusing

Wan Jun Nah1, Chun Chet Ng1, Che-Tsung Lin2, Yeong Khang Lee3,
Jie Long Kew1, Zhi Qin Tan1, Chee Seng Chan1, Christopher Zach2, Shang-Hong Lai4
1Universiti Malaya, Kuala Lumpur, Malaysia
2Chalmers University of Technology, Gothenburg, Sweden
3ViTrox Corporation Berhad, Penang, Malaysia
4National Tsing Hua University, Hsinchu, Taiwan

Abstract

Deep learning models trained on long-tailed (LT) datasets often exhibit bias towards head classes with high frequency. This paper highlights the limitations of existing solutions that combine class- and instance-level re-weighting loss in a naive manner. Specifically, we demonstrate that such solutions result in overfitting the training set, significantly impacting the rare classes. To address this issue, we propose a novel loss function that dynamically reduces the influence of outliers and assigns class-dependent focusing parameters. We also introduce a new long-tailed dataset, ICText-LT, featuring various image qualities and greater realism than artificially sampled datasets. Our method has proven effective, outperforming existing methods through superior quantitative results on CIFAR-LT, Tiny ImageNet-LT, and our new ICText-LT datasets.

Video

Contributions

FFDS & D-FFDS Loss

Through our experiments with the CB-FL loss, which combines instance- and class-level re-weighting, we observe that it is prone to overfitting: the gap between training and testing accuracy is large and widens as class frequency decreases (Fig. 1). We hypothesize this is due to an unintentional focus on outliers, which receive large instance-level weights. The overfitting is further exacerbated as the number of samples decreases, because class-level weights amplify the loss of outliers.
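For concreteness, the CB-FL combination critiqued above can be sketched as follows: a class-balanced weight (following the effective-number formulation of Cui et al.) multiplied by a focal term. The `beta` and `gamma` values here are illustrative defaults, not the paper's settings.

```python
import math

def cb_focal_loss(probs, target, class_counts, beta=0.9999, gamma=2.0):
    """Sketch of CB-FL on one sample: class-balanced weight w_y
    times the focal-modulated cross entropy. `probs` is the
    softmax output over C classes; `class_counts` holds n_y."""
    # Effective-number class weights, normalised to sum to C.
    eff = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(eff) / sum(eff)
    w_y = eff[target] * scale
    p_y = probs[target]
    # Instance-level focal term multiplies the class-level weight,
    # which is exactly the interaction that amplifies outlier loss.
    return -w_y * (1.0 - p_y) ** gamma * math.log(p_y)
```

Note how a tail class (small `n_y`) gets a larger `w_y`, so a single misclassified outlier in a rare class dominates the loss.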

To address these issues, we introduce Frequency-weighted Focusing with Dynamic Smoothing Loss (FFDS), comprising two modules: (i) frequency-weighted focusing (FreqFocus) to address intra-class imbalance by re-weighting hard examples of each class based on class frequency, and (ii) dynamic probability smoothing (DynaSmooth) to alleviate overfitting observed in CB-FL. The equation of FFDS is shown in Eq. 1.

To address convergence difficulties and training instability, our deferred variant, D-FFDS, fine-tunes models only after learning meaningful representations with Cross Entropy (CE). In other words, the training process is separated into two phases, with CE used in the first phase and FFDS in the second phase.

Figure 1: Overfitting of CB-FL on CIFAR-100-LT, IF = 100. The colored areas between the upper bound (training acc.) and lower bound (testing acc.) indicate the difference of acc. on training and testing sets. As the class frequency decreases, the gap gets larger. The smaller colored areas prove the superiority of our method (FFDS).

$$L_{\text{FFDS}}(z,y) = - w_{y} (1-\hat{p}_{y})^{\gamma_{y}} \sum\nolimits^{C}_{j} Q(j)\log(p_{j})$$

Equation 1: Equation of our proposed loss function \(L_\text{FFDS}\).
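A minimal sketch of Eq. 1 in plain Python follows. The soft target \(Q\) is assumed here to be a label-smoothed one-hot vector, and \(w_y\), \(\gamma_y\), and the smoothed probability \(\hat{p}_y\) are taken as precomputed inputs (they come from the class-balancing, FreqFocus, and DynaSmooth modules respectively).

```python
import math

def ffds_loss(probs, p_hat_y, target, w_y, gamma_y, smooth=0.1):
    """Sketch of L_FFDS (Eq. 1): class weight w_y, focusing term
    on the smoothed ground-truth probability p_hat_y, and a soft
    target Q (assumed label-smoothed one-hot for illustration)."""
    C = len(probs)
    q = [smooth / C] * C          # Q(j) off-target mass
    q[target] += 1.0 - smooth     # Q(y) on-target mass
    ce = -sum(q[j] * math.log(probs[j]) for j in range(C))
    return w_y * (1.0 - p_hat_y) ** gamma_y * ce
```

With `smooth=0`, `w_y=1`, and `gamma_y=0`, this reduces to plain cross entropy, which is the sanity check one would expect.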

FreqFocus

In many cases, improving the performance of head classes is limited due to the presence of challenging examples, while tail classes suffer from limited data availability. To tackle this, we introduce a focusing parameter, \(\gamma_y\), aimed at encouraging the model to give more attention to hard examples in head classes while treating examples in tail classes equally. We formulate this idea using three possible curves: linear (Eq. 2), convex (Eq. 3) and concave (Eq. 4). Fig. 2 illustrates the relationship between class frequency and \(\gamma_y\), along with their respective accuracy on CIFAR-100-LT, IF = 100.

$$\gamma_{y}=\gamma_{\text{min}}+(\gamma_{\text{max}}-\gamma_{\text{min}})\left( \tfrac{N_{y}-N_{\text{min}}}{N_{\text{max}}-N_{\text{min}}} \right)$$

Equation 2: Linear form defining \(\gamma_y\).

$$\gamma_{y}=\gamma_{\text{min}}+(\gamma_{\text{max}}-\gamma_{\text{min}})\left( \tfrac{N_{y}-N_{\text{min}}}{N_{\text{max}}-N_{\text{min}}} \right)^3$$

Equation 3: Convex form defining \(\gamma_y\).

$$\gamma_{y}=\gamma_{\text{min}}+(\gamma_{\text{max}}-\gamma_{\text{min}})\tanh\left( 4\cdot \tfrac{N_{y}-N_{\text{min}}}{N_{\text{max}}-N_{\text{min}}} \right)$$

Equation 4: Concave form defining \(\gamma_y\).
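The three curves of Eqs. 2-4 can be written as one small function; the bounds `g_min`/`g_max` are illustrative, not the paper's tuned values.

```python
import math

def focusing_gamma(n_y, n_min, n_max, g_min=0.0, g_max=2.0, form="linear"):
    """Map class frequency n_y to the focusing parameter gamma_y
    following Eqs. 2-4: linear, convex (cubic), or concave (tanh)."""
    r = (n_y - n_min) / (n_max - n_min)  # normalised frequency in [0, 1]
    if form == "linear":
        shape = r
    elif form == "convex":
        shape = r ** 3
    else:  # concave
        shape = math.tanh(4.0 * r)
    return g_min + (g_max - g_min) * shape
```

All three forms agree at the endpoints (up to the tanh saturation at `n_max`): the rarest class gets `g_min` (no focusing, examples treated equally) and the most frequent class approaches `g_max` (strong focus on hard examples).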
Figure 2: Relationship between class frequency and \(\gamma_y\), along with their respective accuracy on CIFAR-100-LT, IF = 100.

DynaSmooth

DynaSmooth addresses the issue of over-focusing on outliers by dynamically reducing instance-level weights in proportion to their likelihood of being outliers. To maintain training stability, we compute the likelihood of being an outlier within groups partitioned according to class frequency. This likelihood is defined as proportional to the difference between the predicted probability and the mean predicted probability of the group for the ground-truth class. Calculating the square-root difference prioritizes small deviations when both values are small. The instance-wise loss is then computed on the smoothed predicted probability, \(\hat{p}\), which is shifted towards the mean. This effectively reduces the extremely high loss on outliers, mitigating overfitting. As demonstrated by the boxplot in Fig. 3, the number of outliers is significantly reduced.
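The smoothing step above can be sketched as follows for a single frequency group. This is an illustrative interpretation, not the paper's exact formula: each ground-truth probability is shifted towards the group mean by an amount growing with the square-root deviation, with a hypothetical strength parameter `alpha` and a cap so \(\hat{p}\) never overshoots the mean.

```python
import math

def smooth_probs(p_y_batch, alpha=0.5):
    """Illustrative DynaSmooth step for one frequency group:
    samples far from the group mean (likely outliers) are pulled
    towards it, reducing their otherwise extreme loss values."""
    mu = sum(p_y_batch) / len(p_y_batch)
    smoothed = []
    for p in p_y_batch:
        # Square-root deviation emphasises small differences;
        # min(..., 1.0) keeps p_hat between p and the mean.
        shift = min(1.0, alpha * math.sqrt(abs(p - mu)))
        smoothed.append(p + shift * (mu - p))
    return smoothed
```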

Figure 3: Boxplots showing that DynaSmooth significantly reduces the number of outliers.

ICText-LT Dataset

ICText is an industrial dataset focused on detecting printed characters on chip components of varying quality. It comprises 62 classes (A-Z, a-z, 0-9) and exhibits a long-tailed distribution in both the training and testing sets. Herein, we resample and balance the distribution of the testing set by removing lower-case letters. As a result, the new ICText-LT dataset comprises 36 classes: capital letters A to Z and digits 0 to 9. It contains 68,000 imbalanced training images and 6,300 balanced testing images, yielding a natural imbalance factor of 18. To achieve an imbalance factor of 100, further data re-sampling can be applied. The distribution of ICText-LT's training set is shown in Fig. 4, along with a few samples illustrating the variation in image quality and difficulty.
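The re-sampling to a target imbalance factor presumably follows the standard exponential profile used to build CIFAR-LT-style datasets; a minimal sketch (with illustrative numbers, not the actual ICText-LT counts):

```python
def longtail_counts(n_max, num_classes, imb_factor):
    """Per-class sample counts for an exponential long-tailed
    profile: class i keeps n_max * imb_factor**(-i/(C-1)) samples,
    so the head class has n_max and the tail has n_max/imb_factor."""
    C = num_classes
    return [round(n_max * imb_factor ** (-i / (C - 1))) for i in range(C)]
```

Subsampling each class down to these counts yields a training set whose head-to-tail ratio equals the requested imbalance factor.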

Figure 4: Distribution of ICText-LT’s training set with IF ∈ {18, 100}. Images shown are good (left) and bad (right) quality samples.

Experimental Results

Tab. 1 shows the comparisons of our approach to various one-stage and two-stage competing methods. In both training schemes, our methods (FFDS and D-FFDS) outperform the others, with D-FFDS having the best performance across all datasets.

Table 1: Comparison of testing accuracy on public CIFAR-10-LT, CIFAR-100-LT, Tiny ImageNet-LT and ICText-LT datasets.

BibTeX

If you find our paper and repository useful, please cite:

@inproceedings{icip2023_ffds,
author = {Nah, Wan Jun and Ng, Chun Chet and Lin, Che-Tsung and Lee, Yeong Khang
and Kew, Jie Long and Tan, Zhi Qin and Chan, Chee Seng
and Zach, Christopher and Lai, Shang-Hong},
booktitle = {2023 30th IEEE International Conference on Image Processing (ICIP)},
title = {Rethinking Long-Tailed Visual Recognition with Dynamic Probability Smoothing
and Frequency Weighted Focusing},
year = {2023}
}