CN116129215A

CN116129215A - Long-tail target detection method based on deep learning

Info

Publication number: CN116129215A
Application number: CN202211677431.2A
Authority: CN
Inventors: 陶莹
Original assignee: Zhengzhou University
Current assignee: Zhengzhou University
Priority date: 2022-12-26
Filing date: 2022-12-26
Publication date: 2023-05-16

Abstract

The invention discloses a long-tail target detection method based on deep learning, comprising the following steps: S1, acquiring an image data set: acquire an image data set that follows a long-tail distribution and divide it into a training set and a test set; S2, preprocessing the data set: preprocess the data set and calculate the effective sample number of each category; S3, outputting logits: train a pre-trained model on the training set to obtain the logits output by the network; S4, screening semantically similar categories: set a threshold on the logits output by the network and suppress only the tail categories that are semantically similar to a head category, so that the network pays more attention to the tail categories. The suppression gradient between semantically similar categories is then adjusted adaptively according to the logits output by the network, which makes the tail categories easier to distinguish. Experiments are carried out on LVIS, a long-tail data set with large differences in data volume between categories, and the detection precision of the tail categories is improved.

Description

Long-tail target detection method based on deep learning

Technical Field

The invention relates to the technical field of machine learning, in particular to a long tail target detection method based on deep learning.

Background

The rapid growth of large-scale real-world data sets makes target detection on long-tail data distributions a challenging task for deep-learning-based methods. Deep long-tail learning is one of the most challenging problems in visual recognition; it aims to train a well-performing deep model from a large number of images that follow a long-tail class distribution. In a long-tail distribution, a few head classes account for most of the data, while most tail classes are under-represented. Models trained on long-tail data sets predict tail categories inaccurately, reach low detection precision, and discriminate poorly between head and tail categories with similar semantics. Mainstream long-tail methods reduce the suppression of tail categories to varying degrees, but they ignore the fact that any data set contains categories with similar semantics. In particular, when a tail category is similar to a head category, ignoring the negative gradient generated by the head category in order to reduce the suppression of the tail category makes it hard for the network to learn discriminative features. A long-tail target detection method based on deep learning is therefore needed.

Disclosure of Invention

The invention aims to provide a long-tail target detection method based on deep learning that solves the problems described in the Background.

To achieve this purpose, the invention provides the following technical solution: a long-tail target detection method based on deep learning, comprising the following steps:

S1, acquiring an image data set: acquire an image data set that follows a long-tail distribution and divide it into a training set and a test set;

S2, preprocessing the data set: preprocess the data set and calculate the effective sample number of each category;

S3, outputting logits: train a pre-trained model on the training set to obtain the logits output by the network;

S4, screening semantically similar categories: set a threshold on the logits output by the network and suppress only the tail categories that are semantically similar to a head category;

S5, setting weights: set the weights according to the reciprocal of the effective sample number;

S6, testing the target model: once the weights are set, the final target detection model is obtained and tested on the test set to obtain the test result.

Preferably, the dataset is an LVIS dataset.

Preferably, in step S2, the effective sample number of each category, that is, the number of samples of that category that are actually usable, is calculated with the following formula:

$$E_n = \frac{1-\beta^{n}}{1-\beta}, \qquad \beta = \frac{N-1}{N};$$

where S is the sample data set, N is the number of effective samples, n is the number of samples, and E_n is the expected value for the n-th sample. When the number of samples exceeds the relevant threshold, the effective sample number equals N; this reduces data overlap, so that each sample is unique.
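As a minimal sketch of the formula above, the effective sample number can be computed per category as follows. The function name `effective_number`, the example class counts, and the value chosen for N are illustrative assumptions, not values taken from the patent.

```python
def effective_number(n: int, big_n: float) -> float:
    """Return E_n = (1 - beta**n) / (1 - beta) with beta = (N - 1) / N.

    n     -- number of samples observed for a category
    big_n -- N in the formula (assumed here to be at least 1)
    """
    if n < 0 or big_n < 1:
        raise ValueError("n must be >= 0 and N must be >= 1")
    beta = (big_n - 1.0) / big_n
    return (1.0 - beta ** n) / (1.0 - beta)


# Hypothetical per-category instance counts for a long-tailed data set.
class_counts = {"person": 20000, "bicycle": 900, "unicycle": 12}

# N = 1000 is an arbitrary illustrative choice.
effective = {name: effective_number(count, big_n=1000.0)
             for name, count in class_counts.items()}
```

With these inputs the head category saturates close to N, while the tail category keeps an effective number close to its raw count, which is the behavior the formula is meant to capture.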

Preferably, in step S3, the image is passed through a backbone network to generate a feature map, regions of interest are generated on the feature map, a feature is extracted for each region, and the output of the network is processed to obtain a distribution over the categories; the cross entropy between the estimated distribution and the ground-truth box distribution is then calculated;

the cross entropy is a sigmoid cross-entropy function applied to the logits z_i output by the fully connected layer, where z_i is the class-i logit output by the network; the sigmoid independently estimates the probability p_i of each class and maps it into the interval (0, 1), giving the probability that the current sample k belongs to class i, where the formula is as follows:
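The mapping described above corresponds to the standard logistic sigmoid. Written with the symbols used in this section (z_i for the class-i logit, p_i for the estimated probability), the textbook form is given below for reference; it is an assumption that the patent's formula matches it exactly.

```latex
p_i = \sigma(z_i) = \frac{1}{1 + e^{-z_i}}, \qquad p_i \in (0, 1)
```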

the ground-truth class label y takes one of two values: y_k = 0 when the candidate region is background or does not belong to class i, and y_k = 1 when the candidate region belongs to class i; the cross-entropy loss formula is as follows:
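A minimal sketch of a sigmoid cross-entropy over per-class logits is shown below, assuming the standard binary cross-entropy summed over classes. The function names and the example logits and labels are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_cross_entropy(logits, labels):
    """Sum over classes of -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(z)."""
    loss = 0.0
    for z, y in zip(logits, labels):
        # Stable rewrite of the per-class term: max(z, 0) - y*z + log(1 + exp(-|z|)).
        loss += max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))
    return loss


# One candidate region with three classes; class 0 is the ground truth.
logits = [2.0, -1.0, 0.5]
labels = [1, 0, 0]
loss = sigmoid_cross_entropy(logits, labels)
```

Each class is scored independently here, so the probabilities do not need to sum to 1, unlike a softmax classifier.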

the gradient with respect to the logit z_i output by the fully connected layer is obtained from the cross-entropy loss formula; for every tail class i ≠ k, the cross-entropy loss produces a negative suppression gradient that forces classifier i to output a low confidence;

as follows from the gradient of the sigmoid cross-entropy loss with respect to the logit z_i output by the network, when the long-tail class distribution is not taken into account, generating a negative suppression gradient for the tail classes i ≠ k, which forces the classifier to output a low confidence, acts as a beneficial gradient suppression mechanism.
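The gradient referred to above can be worked out from the standard sigmoid cross-entropy. The derivation below assumes the textbook per-class loss, with y_i ∈ {0, 1} denoting the label of the candidate region for class i and k denoting its ground-truth class; this notation is introduced here for illustration.

```latex
\begin{aligned}
L &= -\sum_{i} \bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\bigr],
  \qquad p_i = \frac{1}{1 + e^{-z_i}}, \\
\frac{\partial L}{\partial z_i} &= p_i - y_i, \\
\frac{\partial L}{\partial z_i} &= p_i > 0 \quad \text{for } i \neq k \ (y_i = 0).
\end{aligned}
```

Because gradient descent moves z_i against this gradient, every class i ≠ k has its logit pushed down on each sample of class k, which is the suppression effect the text describes.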

Preferably, in S4 the logit output by the network is compared with a set threshold: when the logit is greater than the threshold, the current class is suppressed; when the logit is less than the threshold, the current class is not suppressed. The negative suppression gradient that the class receives from the network thus becomes smaller, which improves the accuracy of the tail classes.
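The threshold rule can be sketched as a gate on the per-class gradient p_i - y_i of the sigmoid cross-entropy. In this sketch, which is an assumption rather than the patent's exact formulation, a negative class whose logit does not exceed the threshold simply receives zero gradient; the function name and the example values are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_logit_gradients(logits, true_class, threshold):
    """Per-class gradient of the loss w.r.t. each logit, gated by a logit threshold."""
    grads = []
    for i, z in enumerate(logits):
        p = sigmoid(z)
        if i == true_class:
            grads.append(p - 1.0)   # ground-truth class: always trained
        elif z > threshold:
            grads.append(p)         # high-logit (confusable) negative: suppressed
        else:
            grads.append(0.0)       # low-logit negative: suppression skipped
    return grads


# Class 0 is the ground truth, class 1 responds strongly, class 2 barely responds.
grads = gated_logit_gradients([2.0, 1.5, -3.0], true_class=0, threshold=0.0)
```

Only class 1, whose logit exceeds the threshold, is pushed down in this example; class 2 is left untouched, so a rare class that the network does not confuse with the ground truth is spared the suppression.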

Preferably, step S5 takes the categories selected for suppression in S4 and calculates how strongly each category should be suppressed from its effective sample number, that is, from the reciprocal of the effective sample number, where the specific calculation formula is as follows:

where ξ is the threshold hyperparameter, β' is the effective-sample hyperparameter, and w_i is the overall weight.
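The weight formula is not shown in this text, so the following is only a sketch of the one component the description states explicitly: weights taken as the reciprocal of the effective sample number. It assumes that β' plays the role of β in the effective-number formula, and it rescales the weights to average 1, which is a common convention and not something the patent specifies. How ξ enters the overall weight w_i is not modeled here.

```python
def inverse_effective_weights(class_counts, beta_prime):
    """Per-class weights proportional to 1 / E_n, rescaled so they average 1.

    class_counts -- sample counts per class, each assumed to be >= 1
    beta_prime   -- hyperparameter in [0, 1), assumed to act as beta in E_n
    """
    if not 0.0 <= beta_prime < 1.0:
        raise ValueError("beta_prime must lie in [0, 1)")
    raw = []
    for n in class_counts:
        if n < 1:
            raise ValueError("each class needs at least one sample")
        e_n = (1.0 - beta_prime ** n) / (1.0 - beta_prime)
        raw.append(1.0 / e_n)
    scale = len(raw) / sum(raw)  # assumed normalization: mean weight of 1
    return [w * scale for w in raw]


# Hypothetical counts for a head, a middle, and a tail class.
weights = inverse_effective_weights([20000, 900, 12], beta_prime=0.999)
```

The tail class receives the largest weight, in line with the stated aim of making the network pay more attention to tail categories.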

Compared with the prior art, the invention has the beneficial effects that:

In the invention, the reciprocal of the effective sample number of each category is used as a quantity factor between categories, and the categories are weighted accordingly in the classification task of target detection, so that the network pays more attention to the tail categories. The suppression gradient between semantically similar categories is then adjusted adaptively according to the logits output by the network, which makes the tail categories easier to distinguish. Experiments are carried out on LVIS, a long-tail data set with large differences in data volume between categories, and the detection precision of the tail categories is improved.

Drawings

FIG. 1 is a schematic flow chart of a long tail target detection method based on deep learning;

FIG. 2 is a schematic diagram of a calculation flow of a long tail target detection method based on deep learning;

Detailed Description

The embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that those skilled in the art can obtain from these embodiments without inventive effort fall within the scope of protection of the invention.

Referring to fig. 1-2, the present invention provides a technical solution:

A long-tail target detection method based on deep learning comprises the following steps:

Specifically, the dataset is an LVIS dataset.

Specifically, in S2, the effective sample number of each category, that is, the number of samples of that category that are actually usable, is calculated with the following formula:

$$E_n = \frac{1-\beta^{n}}{1-\beta}, \qquad \beta = \frac{N-1}{N};$$

Specifically, in S3, the image is passed through a backbone network to generate a feature map, regions of interest are generated on the feature map, a feature is extracted for each region, and the output of the network is processed to obtain a distribution over the categories; the cross entropy between the estimated distribution and the ground-truth box distribution is then calculated;

Specifically, in S4 the logit output by the network is compared with a set threshold: when the logit is greater than the threshold, the current class is suppressed; when the logit is less than the threshold, the current class is not suppressed. The negative suppression gradient that the class receives from the network thus becomes smaller, which improves the accuracy of the tail classes.

Specifically, S5 takes the categories selected for suppression in S4 and calculates how strongly each category should be suppressed from its effective sample number, that is, from the reciprocal of the effective sample number, where the specific calculation formula is:

The working principle of the scheme can be summarized as follows. The reciprocal of the effective sample number of each category is used as a quantity factor between categories, and the categories are weighted accordingly in the classification task of target detection, so that the network pays more attention to the tail categories. The suppression gradient between semantically similar categories is then adjusted adaptively according to the logits output by the network, which makes the tail categories easier to distinguish. Experiments are carried out on LVIS, a long-tail data set with large differences in data volume between categories, and the detection precision of the tail categories is improved. The effective sample number of each category is calculated within a sound theoretical framework, the weights are constrained by a quantity factor designed from the effective sample number, and the suppression gradient is then generated adaptively according to the learning state of the network. This avoids the problems caused by resampling and keeps training consistent with the data set. Throughout the computation the model pays more attention to the tail categories, and the discrimination between head and tail categories that are semantically similar is improved.

Parts of the invention that are not described in detail are the same as the prior art or can be implemented with the prior art. Although embodiments of the present invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A long-tail target detection method based on deep learning, characterized by comprising the following steps:

2. The long-tail target detection method based on deep learning as claimed in claim 1, characterized in that the data set is the LVIS data set.

3. The long-tail target detection method based on deep learning as claimed in claim 1, characterized in that, in step S2, the effective sample number of each category is calculated with the following formula:

$$E_n = \frac{1-\beta^{n}}{1-\beta}, \qquad \beta = \frac{N-1}{N};$$

4. The long-tail target detection method based on deep learning as claimed in claim 1, characterized in that, in step S3, the image is passed through a backbone network to generate a feature map, regions of interest are generated on the feature map, a feature is extracted for each region, the output of the network is processed to obtain a distribution over the categories, and the cross entropy between the estimated distribution and the ground-truth box distribution is calculated;

as follows from the gradient of the sigmoid cross-entropy loss with respect to the logit z_i output by the network, when the long-tail class distribution is not taken into account, a negative suppression gradient is generated for the tail classes i ≠ k, forcing the classifier to output a low confidence.

5. The long-tail target detection method based on deep learning as claimed in claim 1, characterized in that, in S4, the logit output by the network is compared with a set threshold; when the logit is greater than the threshold, the current class is suppressed, and when the logit is less than the threshold, the current class is not suppressed, so that the negative suppression gradient the class receives from the network becomes smaller and the accuracy of the tail classes is improved.

6. The long-tail target detection method based on deep learning as claimed in claim 1, characterized in that S5 takes the categories selected for suppression in S4 and calculates the degree of suppression of each category from its effective sample number, that is, from the reciprocal of the effective sample number, where the specific calculation formula is as follows:

where ξ is the threshold hyperparameter, β is the effective-sample hyperparameter, and w_i is the overall weight.