Rethinking Long-Tailed Visual Recognition with Dynamic Probability Smoothing and Frequency Weighted Focusing

Wan Jun Nah1, Chun Chet Ng1, Che-Tsung Lin2, Yeong Khang Lee3,
Jie Long Kew1, Zhi Qin Tan1, Chee Seng Chan1, Christopher Zach2, Shang-Hong Lai4
1Universiti Malaya, Kuala Lumpur, Malaysia
2Chalmers University of Technology, Gothenburg, Sweden
3ViTrox Corporation Berhad, Penang, Malaysia
4National Tsing Hua University, Hsinchu, Taiwan

Abstract

Deep learning models trained on long-tailed (LT) datasets often exhibit bias towards head classes with high frequency. This paper highlights the limitations of existing solutions that combine class- and instance-level re-weighting loss in a naive manner. Specifically, we demonstrate that such solutions result in overfitting the training set, significantly impacting the rare classes. To address this issue, we propose a novel loss function that dynamically reduces the influence of outliers and assigns class-dependent focusing parameters. We also introduce a new long-tailed dataset, ICText-LT, featuring various image qualities and greater realism than artificially sampled datasets. Our method has proven effective, outperforming existing methods through superior quantitative results on CIFAR-LT, Tiny ImageNet-LT, and our new ICText-LT datasets.

Video

Contributions

FFDS & D-FFDS Loss

Through our experiments with the CB-FL loss, which combines instance- and class-level re-weighting, we observe that it is prone to overfitting: the gap between training and testing accuracy is large and widens as class frequency decreases (Fig. 1). We hypothesize this is due to an unintentional focus on outliers, which receive large instance-level weights. The overfitting is further exacerbated as the number of samples decreases, because class-level weights amplify the loss of outliers.
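For concreteness, the CB-FL combination critiqued above can be sketched as follows: a class-balanced weight (following the effective-number formulation of Cui et al.) multiplied by a focal term. The `beta` and `gamma` values here are illustrative defaults, not the paper's settings.

```python
import math

def cb_focal_loss(probs, target, class_counts, beta=0.9999, gamma=2.0):
    """Sketch of CB-FL on one sample: class-balanced weight w_y
    times the focal-modulated cross entropy. `probs` is the
    softmax output over C classes; `class_counts` holds n_y."""
    # Effective-number class weights, normalised to sum to C.
    eff = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(eff) / sum(eff)
    w_y = eff[target] * scale
    p_y = probs[target]
    # Instance-level focal term multiplies the class-level weight,
    # which is exactly the interaction that amplifies outlier loss.
    return -w_y * (1.0 - p_y) ** gamma * math.log(p_y)
```

Note how a tail class (small `n_y`) gets a larger `w_y`, so a single misclassified outlier in a rare class dominates the loss.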

To address these issues, we introduce Frequency-weighted Focusing with Dynamic Smoothing Loss (FFDS), comprising two modules: (i) frequency-weighted focusing (FreqFocus) to address intra-class imbalance by re-weighting hard examples of each class based on class frequency, and (ii) dynamic probability smoothing (DynaSmooth) to alleviate overfitting observed in CB-FL. The equation of FFDS is shown in Eq. 1.

To address convergence difficulties and training instability, our deferred variant, D-FFDS, fine-tunes models only after learning meaningful representations with Cross Entropy (CE). In other words, the training process is separated into two phases, with CE used in the first phase and FFDS in the second phase.

Figure 1: Overfitting of CB-FL on CIFAR-100-LT, IF = 100. The colored areas between the upper bound (training acc.) and lower bound (testing acc.) indicate the difference of acc. on training and testing sets. As the class frequency decreases, the gap gets larger. The smaller colored areas prove the superiority of our method (FFDS).

$$L_{\text{FFDS}}(z,y) = - w_{y} (1-\hat{p}_{y})^{\gamma_{y}} \sum\nolimits^{C}_{j} Q(j)\log(p_{j})$$

Equation 1: Equation of our proposed loss function \(L_\text{FFDS}\).
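A minimal sketch of Eq. 1 in plain Python follows. The soft target \(Q\) is assumed here to be a label-smoothed one-hot vector, and \(w_y\), \(\gamma_y\), and the smoothed probability \(\hat{p}_y\) are taken as precomputed inputs (they come from the class-balancing, FreqFocus, and DynaSmooth modules respectively).

```python
import math

def ffds_loss(probs, p_hat_y, target, w_y, gamma_y, smooth=0.1):
    """Sketch of L_FFDS (Eq. 1): class weight w_y, focusing term
    on the smoothed ground-truth probability p_hat_y, and a soft
    target Q (assumed label-smoothed one-hot for illustration)."""
    C = len(probs)
    q = [smooth / C] * C          # Q(j) off-target mass
    q[target] += 1.0 - smooth     # Q(y) on-target mass
    ce = -sum(q[j] * math.log(probs[j]) for j in range(C))
    return w_y * (1.0 - p_hat_y) ** gamma_y * ce
```

With `smooth=0`, `w_y=1`, and `gamma_y=0`, this reduces to plain cross entropy, which is the sanity check one would expect.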

FreqFocus

In many cases, improving the performance of head classes is limited due to the presence of challenging examples, while tail classes suffer from limited data availability. To tackle this, we introduce a focusing parameter, \(\gamma_y\), aimed at encouraging the model to give more attention to hard examples in head classes while treating examples in tail classes equally. We formulate this idea using three possible curves: linear (Eq. 2), convex (Eq. 3) and concave (Eq. 4). Fig. 2 illustrates the relationship between class frequency and \(\gamma_y\), along with their respective accuracy on CIFAR-100-LT, IF = 100.

$$\gamma_{y}=\gamma_{\text{min}}+(\gamma_{\text{max}}-\gamma_{\text{min}})\left( \tfrac{N_{y}-N_{\text{min}}}{N_{\text{max}}-N_{\text{min}}} \right)$$

Equation 2: Linear form defining \(\gamma_y\).

$$\gamma_{y}=\gamma_{\text{min}}+(\gamma_{\text{max}}-\gamma_{\text{min}})\left( \tfrac{N_{y}-N_{\text{min}}}{N_{\text{max}}-N_{\text{min}}} \right)^3$$

Equation 3: Convex form defining \(\gamma_y\).

$$\gamma_{y}=\gamma_{\text{min}}+(\gamma_{\text{max}}-\gamma_{\text{min}})\tanh\left( 4\cdot \tfrac{N_{y}-N_{\text{min}}}{N_{\text{max}}-N_{\text{min}}} \right)$$

Equation 4: Concave form defining \(\gamma_y\).
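The three curves of Eqs. 2-4 can be written as one small function; the bounds `g_min`/`g_max` are illustrative, not the paper's tuned values.

```python
import math

def focusing_gamma(n_y, n_min, n_max, g_min=0.0, g_max=2.0, form="linear"):
    """Map class frequency n_y to the focusing parameter gamma_y
    following Eqs. 2-4: linear, convex (cubic), or concave (tanh)."""
    r = (n_y - n_min) / (n_max - n_min)  # normalised frequency in [0, 1]
    if form == "linear":
        shape = r
    elif form == "convex":
        shape = r ** 3
    else:  # concave
        shape = math.tanh(4.0 * r)
    return g_min + (g_max - g_min) * shape
```

All three forms agree at the endpoints (up to the tanh saturation at `n_max`): the rarest class gets `g_min` (no focusing, examples treated equally) and the most frequent class approaches `g_max` (strong focus on hard examples).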
Figure 2: Relationship between class frequency and \(\gamma_y\), along with their respective accuracy on CIFAR-100-LT, IF = 100.

DynaSmooth

DynaSmooth addresses the issue of over-focusing on outliers by dynamically reducing instance-level weights in proportion to their likelihood of being outliers. To maintain training stability, we compute the likelihood of being an outlier within groups partitioned according to class frequency. This likelihood is defined as proportional to the difference between the predicted probability and the mean predicted probability of the group for the ground-truth class. Calculating the square-root difference prioritizes small deviations when both values are small. The instance-wise loss is then computed on the smoothed predicted probability, \(\hat{p}\), which is shifted towards the mean. This effectively reduces the extremely high loss on outliers, mitigating overfitting. As demonstrated by the boxplot in Fig. 3, the number of outliers is significantly reduced.
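The smoothing step above can be sketched as follows for a single frequency group. This is an illustrative interpretation, not the paper's exact formula: each ground-truth probability is shifted towards the group mean by an amount growing with the square-root deviation, with a hypothetical strength parameter `alpha` and a cap so \(\hat{p}\) never overshoots the mean.

```python
import math

def smooth_probs(p_y_batch, alpha=0.5):
    """Illustrative DynaSmooth step for one frequency group:
    samples far from the group mean (likely outliers) are pulled
    towards it, reducing their otherwise extreme loss values."""
    mu = sum(p_y_batch) / len(p_y_batch)
    smoothed = []
    for p in p_y_batch:
        # Square-root deviation emphasises small differences;
        # min(..., 1.0) keeps p_hat between p and the mean.
        shift = min(1.0, alpha * math.sqrt(abs(p - mu)))
        smoothed.append(p + shift * (mu - p))
    return smoothed
```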

Figure 3: Boxplots showing that DynaSmooth significantly reduces the number of outliers.

ICText-LT Dataset

ICText is an industrial dataset focused on detecting printed characters on chip components of varying quality. It comprises 62 classes (A-Z, a-z, 0-9) and exhibits a long-tailed distribution in both the training and testing sets. Herein, we resample and balance the distribution of the testing set by removing lower-case letters. As a result, the new ICText-LT dataset comprises 36 classes: capital letters A to Z and digits 0 to 9. It contains 68,000 imbalanced training images and 6,300 balanced testing images, yielding a natural imbalance factor of 18. To achieve an imbalance factor of 100, further data re-sampling can be applied. The distribution of ICText-LT's training set is shown in Fig. 4, along with a few samples illustrating the variation in image quality and difficulty.
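The re-sampling to a target imbalance factor presumably follows the standard exponential profile used to build CIFAR-LT-style datasets; a minimal sketch (with illustrative numbers, not the actual ICText-LT counts):

```python
def longtail_counts(n_max, num_classes, imb_factor):
    """Per-class sample counts for an exponential long-tailed
    profile: class i keeps n_max * imb_factor**(-i/(C-1)) samples,
    so the head class has n_max and the tail has n_max/imb_factor."""
    C = num_classes
    return [round(n_max * imb_factor ** (-i / (C - 1))) for i in range(C)]
```

Subsampling each class down to these counts yields a training set whose head-to-tail ratio equals the requested imbalance factor.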

Figure 4: Distribution of ICText-LT’s training set with IF ∈ {18, 100}. Images shown are good (left) and bad (right) quality samples.

Experimental Results

Tab. 1 shows the comparisons of our approach to various one-stage and two-stage competing methods. In both training schemes, our methods (FFDS and D-FFDS) outperform the others, with D-FFDS having the best performance across all datasets.

Table 1: Comparison of testing accuracy on public CIFAR-10-LT, CIFAR-100-LT, Tiny ImageNet-LT and ICText-LT datasets.

BibTeX

If you find our paper and repository useful, please cite:

@inproceedings{icip2023_ffds,
author = {Nah, Wan Jun and Ng, Chun Chet and Lin, Che-Tsung and Lee, Yeong Khang
and Kew, Jie Long and Tan, Zhi Qin and Chan, Chee Seng
and Zach, Christopher and Lai, Shang-Hong},
booktitle = {2023 30th IEEE International Conference on Image Processing (ICIP)},
title = {Rethinking Long-Tailed Visual Recognition with Dynamic Probability Smoothing
and Frequency Weighted Focusing},
year = {2023}
}