CN116129215A - Long-tail target detection method based on deep learning - Google Patents
Long-tail target detection method based on deep learning
- Publication number
- CN116129215A (application number CN202211677431.2A)
- Authority
- CN
- China
- Prior art keywords
- class
- tail
- network
- output
- long
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a long-tail target detection method based on deep learning, which comprises the following steps: S1, acquiring an image dataset: acquiring an image dataset that follows a long-tail distribution and dividing it into a training set and a test set; S2, preprocessing the dataset: preprocessing the dataset and calculating the effective sample number of each category; S3, outputting logits: training a pre-trained model on the training set to obtain the logits output by the network; S4, screening semantically similar categories: setting a threshold on the logits output by the network so that only tail classes semantically similar to a head class are suppressed, which increases the attention the network pays to tail classes. The suppression gradient between semantically similar categories is then adjusted adaptively according to the logits output by the network, which strengthens the discrimination of tail categories. Experiments are carried out on LVIS, a long-tail dataset with large differences in data volume between categories, and the detection precision of the tail categories is improved.
Description
Technical Field
The invention relates to the technical field of machine learning, and in particular to a long-tail target detection method based on deep learning.
Background
With the rapid growth of large-scale real-world datasets, target detection on long-tail data distributions has become a challenging task for deep-learning-based methods. Deep long-tail learning is one of the most challenging problems in visual recognition; it aims to train a well-performing deep model from a large number of images that follow a long-tail class distribution. In a long-tail distribution, a few head classes account for most of the data, while most tail classes are under-represented. Models trained on long-tail datasets predict tail categories inaccurately, achieve low detection precision, and lack discrimination between semantically similar head and tail categories. Mainstream long-tail methods reduce the suppression of tail categories to varying degrees, but they ignore the semantically similar categories that exist in any dataset. In particular, when a tail category is semantically similar to a head category, ignoring the negative gradient generated by the head category in order to reduce the suppression of the tail category makes it difficult for the network to learn discriminative features. A long-tail target detection method based on deep learning is therefore needed.
Disclosure of Invention
The invention aims to provide a long-tail target detection method based on deep learning, so as to solve the problems described in the background.
In order to achieve the above purpose, the invention provides the following technical solution: a long-tail target detection method based on deep learning, comprising the following steps:
S1, acquiring an image dataset: acquiring an image dataset that follows a long-tail distribution and dividing it into a training set and a test set;
S2, preprocessing the dataset: preprocessing the dataset and calculating the effective sample number of each category;
S3, outputting logits: training a pre-trained model on the training set to obtain the logits output by the network;
S4, screening semantically similar categories: setting a threshold on the logits output by the network, and suppressing only the tail classes that are semantically similar to a head class;
S5, setting weights: setting the weights according to the reciprocal of the effective sample number;
S6, testing the target model: after the weights are set, obtaining the final target detection model and testing it on the test set to obtain the test result.
Preferably, the dataset is the LVIS dataset.
Preferably, in step S2, the effective sample number of each category, i.e. the number of samples of that category that are actually useful, is calculated with the following formula:
$E_n = (1-\beta^n)/(1-\beta)$, $\beta = (N-1)/N$;
where S is the sample dataset, n is the number of samples of the category, $E_n$ is the expected effective sample number after n samples, and N is the number of unique samples that S can contain. When N exceeds the relevant threshold, so that β approaches 1, the effective sample number equals the sample number n; this corresponds to no data overlap, with every sample unique.
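As a rough illustration of the effective-sample-number formula above, the following Python sketch evaluates $E_n$ for two class sizes. The function name `effective_number`, the parameter name `capacity` (standing in for N) and the example values are assumptions made for illustration only; they do not come from the method as filed.

```python
def effective_number(n: int, capacity: float) -> float:
    """Return E_n = (1 - beta**n) / (1 - beta) with beta = (N - 1) / N.

    n        -- number of samples in the class
    capacity -- N in the formula (hypothetical parameter name); must be >= 1
    """
    if n < 0 or capacity < 1:
        raise ValueError("n must be >= 0 and capacity must be >= 1")
    beta = (capacity - 1) / capacity
    if beta == 1.0:
        # N is so large that beta rounds to 1 in floating point;
        # the limit of the formula as beta -> 1 is n itself.
        return float(n)
    return (1.0 - beta ** n) / (1.0 - beta)


# A tail-sized class and a head-sized class under the same N
# (illustrative values only).
for count in (10, 1000):
    print(count, round(effective_number(count, 100), 2))
```

With a fixed N, the effective number grows more slowly than the raw count and saturates below N, which is why its reciprocal gives tail classes a relatively larger weight than head classes.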
Preferably, in step S3, the image is passed through a backbone network to generate a feature map, regions of interest are generated on the feature map, a feature is extracted for each region, and the output of the network is finally processed to obtain the estimated distribution over the categories; the cross entropy between the estimated distribution and the ground-truth box distribution is then calculated;
the cross entropy is a sigmoid cross-entropy function applied to the logit values output by the fully connected layer, where $z_i$ is the logit of class i output by the network; the sigmoid function estimates the probability $p_i$ of each class independently and maps it into the interval (0, 1), giving the probability that the current sample, whose ground-truth class is k, belongs to class i, wherein the formula is as follows:
$p_i = \frac{1}{1+e^{-z_i}}$;
the ground-truth label y of each class takes one of two values: $y_i = 0$ when the candidate region is background or does not belong to class i, and $y_i = 1$ when the candidate region belongs to class i; the cross-entropy loss formula is as follows:
$L = -\sum_{i} \left[ y_i \log p_i + (1-y_i) \log(1-p_i) \right]$;
the gradient with respect to the logit $z_i$ output by the fully connected layer is obtained from the cross-entropy loss formula; for every class i ≠ k, including tail classes, the cross-entropy loss generates a negative suppression gradient that forces classifier i to output a low confidence;
from the gradient formula of the sigmoid cross-entropy loss with respect to the logit $z_i$, it follows that, when the long-tail class distribution is not considered, the negative suppression gradient that forces the classifier to output a low confidence for a class i ≠ k acts as a beneficial suppression mechanism.
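The sigmoid cross entropy and its gradient described above can be sketched in a few lines of Python. This is a minimal sketch that assumes the standard definitions (logistic sigmoid, per-class binary cross entropy, gradient $p_i - y_i$); the function names and the example logits are hypothetical and are not taken from the filing.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_cross_entropy(logits, true_class):
    """Return (loss, gradients) for one candidate region.

    logits     -- list of per-class logits z_i
    true_class -- index k of the ground-truth class, or None for background
    """
    loss = 0.0
    grads = []
    for i, z in enumerate(logits):
        y = 1.0 if i == true_class else 0.0
        # Stable form of -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(z).
        loss += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        # dL/dz_i = p_i - y_i: positive for every class i != k, so a
        # gradient-descent step lowers z_i (the suppression of class i).
        grads.append(sigmoid(z) - y)
    return loss, grads


loss, grads = sigmoid_cross_entropy([2.0, 0.5, -1.0], true_class=0)
print(loss, grads)
```

Every class other than the ground-truth class receives a gradient that pushes its logit down, regardless of how many training samples that class has; this is the suppression that the later steps restrict and reweight.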
Preferably, in step S4, the logit value output by the network is compared with a set threshold: when the logit is greater than the threshold, the current class is suppressed, and when the logit is less than the threshold, the current class is not suppressed, so that the negative suppression gradient it receives from the network becomes smaller and the accuracy of the tail classes is improved.
Preferably, step S5 takes the categories selected for suppression in S4 and calculates the degree of suppression that each category should receive from its effective sample number, namely the reciprocal of the effective sample number of each category; the specific calculation formula is as follows:
where ξ is the threshold hyperparameter, β′ is the effective-sample hyperparameter, and $w_i$ is the overall weight.
Compared with the prior art, the invention has the following beneficial effects:
the invention uses the reciprocal of the effective sample number of each category as a quantity factor between categories and weights the categories differently in the classification task of target detection, which increases the attention the network pays to tail categories; the suppression gradient between semantically similar categories is then adjusted adaptively according to the logits output by the network, which strengthens the discrimination of tail categories. Experiments are carried out on LVIS, a long-tail dataset with large differences in data volume between categories, and the detection precision of the tail categories is improved.
Drawings
FIG. 1 is a schematic flow chart of the long-tail target detection method based on deep learning;
FIG. 2 is a schematic diagram of the calculation flow of the long-tail target detection method based on deep learning.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
Referring to FIGS. 1-2, the invention provides the following technical solution:
a long-tail target detection method based on deep learning, comprising the following steps:
S1, acquiring an image dataset: acquiring an image dataset that follows a long-tail distribution and dividing it into a training set and a test set;
S2, preprocessing the dataset: preprocessing the dataset and calculating the effective sample number of each category;
S3, outputting logits: training a pre-trained model on the training set to obtain the logits output by the network;
S4, screening semantically similar categories: setting a threshold on the logits output by the network, and suppressing only the tail classes that are semantically similar to a head class;
S5, setting weights: setting the weights according to the reciprocal of the effective sample number;
S6, testing the target model: after the weights are set, obtaining the final target detection model and testing it on the test set to obtain the test result.
Specifically, the dataset is the LVIS dataset.
Specifically, in S2, the effective sample number of each category, i.e. the number of samples of that category that are actually useful, is calculated with the following formula:
$E_n = (1-\beta^n)/(1-\beta)$, $\beta = (N-1)/N$;
where S is the sample dataset, n is the number of samples of the category, $E_n$ is the expected effective sample number after n samples, and N is the number of unique samples that S can contain. When N exceeds the relevant threshold, so that β approaches 1, the effective sample number equals the sample number n; this corresponds to no data overlap, with every sample unique.
Specifically, in S3, the image is passed through a backbone network to generate a feature map, regions of interest are generated on the feature map, a feature is extracted for each region, and the output of the network is finally processed to obtain the estimated distribution over the categories; the cross entropy between the estimated distribution and the ground-truth box distribution is then calculated;
the cross entropy is a sigmoid cross-entropy function applied to the logit values output by the fully connected layer, where $z_i$ is the logit of class i output by the network; the sigmoid function estimates the probability $p_i$ of each class independently and maps it into the interval (0, 1), giving the probability that the current sample, whose ground-truth class is k, belongs to class i, wherein the formula is as follows:
$p_i = \frac{1}{1+e^{-z_i}}$;
the ground-truth label y of each class takes one of two values: $y_i = 0$ when the candidate region is background or does not belong to class i, and $y_i = 1$ when the candidate region belongs to class i; the cross-entropy loss formula is as follows:
$L = -\sum_{i} \left[ y_i \log p_i + (1-y_i) \log(1-p_i) \right]$;
the gradient with respect to the logit $z_i$ output by the fully connected layer is obtained from the cross-entropy loss formula; for every class i ≠ k, including tail classes, the cross-entropy loss generates a negative suppression gradient that forces classifier i to output a low confidence;
from the gradient formula of the sigmoid cross-entropy loss with respect to the logit $z_i$, it follows that, when the long-tail class distribution is not considered, the negative suppression gradient that forces the classifier to output a low confidence for a class i ≠ k acts as a beneficial suppression mechanism.
Specifically, in S4, the logit value output by the network is compared with a set threshold: when the logit is greater than the threshold, the current class is suppressed, and when the logit is less than the threshold, the current class is not suppressed, so that the negative suppression gradient it receives from the network becomes smaller and the accuracy of the tail classes is improved.
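The threshold comparison in S4 can be illustrated with a short Python sketch. It assumes that the gradient of the ground-truth class is always kept and that a non-matching class is suppressed only when its logit exceeds the threshold; the function names, the symbol `xi` for the threshold and the example values are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def suppression_mask(logits, true_class, xi):
    """Return 1.0 for classes whose gradient is kept and 0.0 otherwise."""
    mask = []
    for i, z in enumerate(logits):
        if i == true_class:
            # Assumption: the ground-truth class is always trained.
            mask.append(1.0)
        elif z > xi:
            # High logit on a non-matching class: treat it as confusable
            # with the true class and keep its suppression gradient.
            mask.append(1.0)
        else:
            # Low logit: leave the class alone so it is not pushed down.
            mask.append(0.0)
    return mask


def masked_gradients(logits, true_class, xi):
    """Sigmoid cross-entropy gradients (p_i - y_i) with the mask applied."""
    mask = suppression_mask(logits, true_class, xi)
    grads = []
    for i, z in enumerate(logits):
        y = 1.0 if i == true_class else 0.0
        grads.append(mask[i] * (sigmoid(z) - y))
    return grads


# Class 0 is the ground truth; class 1 scores high, class 2 scores low.
print(masked_gradients([2.0, 1.5, -3.0], true_class=0, xi=0.0))
```

In this example only class 1, whose logit is above the threshold, is pushed down; class 2 receives no gradient from this region.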
Specifically, S5 takes the categories selected for suppression in S4 and calculates the degree of suppression that each category should receive from its effective sample number, namely the reciprocal of the effective sample number of each category; the specific calculation formula is as follows:
where ξ is the threshold hyperparameter, β′ is the effective-sample hyperparameter, and $w_i$ is the overall weight.
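The weighting idea in S5 can be sketched as follows. The exact weight formula is not reproduced in this text, so the sketch below is only an assumption: it takes the reciprocal of the effective sample number, computed with the hyperparameter β′ used directly in place of β, and normalises the weights so that they average 1. The function name, the normalisation and the example counts are all hypothetical.

```python
def inverse_effective_weights(counts, beta_prime):
    """Per-class weights proportional to 1 / E_n (illustrative assumption).

    counts     -- number of training samples per class, each >= 1
    beta_prime -- hyperparameter in [0, 1) used in E_n = (1 - b**n) / (1 - b)
    """
    if not 0.0 <= beta_prime < 1.0:
        raise ValueError("beta_prime must lie in [0, 1)")
    if any(n < 1 for n in counts):
        raise ValueError("every class needs at least one sample")
    raw = []
    for n in counts:
        effective = (1.0 - beta_prime ** n) / (1.0 - beta_prime)
        raw.append(1.0 / effective)
    # Assumed normalisation: rescale so the weights sum to the class count.
    scale = len(counts) / sum(raw)
    return [r * scale for r in raw]


# One head class and one tail class (illustrative counts only).
print(inverse_effective_weights([1000, 10], beta_prime=0.999))
```

Under these assumptions the tail class receives a much larger weight than the head class, which matches the stated aim of increasing the attention paid to tail categories.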
The working steps of the technical solution can be summarized as follows. The reciprocal of the effective sample number of each category is used as a quantity factor between categories, and the categories are weighted differently in the classification task of target detection, which increases the attention the network pays to tail categories. The suppression gradient between semantically similar categories is then adjusted adaptively according to the logits output by the network, which strengthens the discrimination of tail categories; experiments are carried out on LVIS, a long-tail dataset with large differences in data volume between categories, and the detection precision of the tail categories is improved. The effective sample number of each category is calculated within a sound theoretical framework, the weights are constrained by a quantity factor designed from the effective sample number, and the suppression gradient is generated adaptively according to the learning state of the network. This avoids the problems caused by resampling and keeps training consistent with the dataset. Throughout the computation the model pays more attention to tail categories, and the discrimination between semantically similar head and tail categories is improved.
Parts not addressed by the invention are the same as the prior art or can be implemented using the prior art. Although embodiments of the invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and alterations can be made to them without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims (6)
1. A long-tail target detection method based on deep learning, characterized by comprising the following steps:
S1, acquiring an image dataset: acquiring an image dataset that follows a long-tail distribution and dividing it into a training set and a test set;
S2, preprocessing the dataset: preprocessing the dataset and calculating the effective sample number of each category;
S3, outputting logits: training a pre-trained model on the training set to obtain the logits output by the network;
S4, screening semantically similar categories: setting a threshold on the logits output by the network, and suppressing only the tail classes that are semantically similar to a head class;
S5, setting weights: setting the weights according to the reciprocal of the effective sample number;
S6, testing the target model: after the weights are set, obtaining the final target detection model and testing it on the test set to obtain the test result.
2. The long-tail target detection method based on deep learning according to claim 1, characterized in that the dataset is the LVIS dataset.
3. The long-tail target detection method based on deep learning according to claim 1, characterized in that, in step S2, the effective sample number of each category is calculated with the following formula:
$E_n = (1-\beta^n)/(1-\beta)$, $\beta = (N-1)/N$;
where S is the sample dataset, n is the number of samples of the category, $E_n$ is the expected effective sample number after n samples, and N is the number of unique samples that S can contain. When N exceeds the relevant threshold, so that β approaches 1, the effective sample number equals the sample number n; this corresponds to no data overlap, with every sample unique.
4. The long-tail target detection method based on deep learning according to claim 1, characterized in that, in step S3, the image is passed through a backbone network to generate a feature map, regions of interest are generated on the feature map, a feature is extracted for each region, the output of the network is finally processed to obtain the estimated distribution over the categories, and the cross entropy between the estimated distribution and the ground-truth box distribution is calculated;
the cross entropy is a sigmoid cross-entropy function applied to the logit values output by the fully connected layer, where $z_i$ is the logit of class i output by the network; the sigmoid function estimates the probability $p_i$ of each class independently and maps it into the interval (0, 1), giving the probability that the current sample, whose ground-truth class is k, belongs to class i, wherein the formula is as follows:
$p_i = \frac{1}{1+e^{-z_i}}$;
the ground-truth label y of each class takes one of two values: $y_i = 0$ when the candidate region is background or does not belong to class i, and $y_i = 1$ when the candidate region belongs to class i; the cross-entropy loss formula is as follows:
$L = -\sum_{i} \left[ y_i \log p_i + (1-y_i) \log(1-p_i) \right]$;
the gradient with respect to the logit $z_i$ output by the fully connected layer is obtained from the cross-entropy loss formula; for every class i ≠ k, including tail classes, the cross-entropy loss generates a negative suppression gradient that forces classifier i to output a low confidence;
from the gradient formula of the sigmoid cross-entropy loss with respect to the logit $z_i$, it follows that, when the long-tail class distribution is not considered, a negative suppression gradient is generated for each class i ≠ k to force the classifier to output a low confidence.
5. The long-tail target detection method based on deep learning according to claim 1, characterized in that, in step S4, the logit value output by the network is compared with a set threshold: when the logit is greater than the threshold, the current class is suppressed, and when the logit is less than the threshold, the current class is not suppressed, so that the negative suppression gradient it receives from the network becomes smaller and the accuracy of the tail classes is improved.
6. The long-tail target detection method based on deep learning according to claim 1, characterized in that step S5 takes the categories selected for suppression in S4 and calculates the degree of suppression of each category from its effective sample number, namely the reciprocal of the effective sample number of each category, wherein the specific calculation formula is as follows:
where ξ is the threshold hyperparameter, β′ is the effective-sample hyperparameter, and $w_i$ is the overall weight.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211677431.2A CN116129215A (en) | 2022-12-26 | 2022-12-26 | Long-tail target detection method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116129215A (en) | 2023-05-16 |
Family
ID=86294885
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211677431.2A Pending CN116129215A (en) | 2022-12-26 | 2022-12-26 | Long-tail target detection method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116129215A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117253095A (en) * | 2023-11-16 | 2023-12-19 | 吉林大学 | Image classification system and method based on biased shortest distance criterion |
CN117253095B (en) * | 2023-11-16 | 2024-01-30 | 吉林大学 | Image classification system and method based on biased shortest distance criterion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109035149B (en) | License plate image motion blur removing method based on deep learning | |
CN107316061B (en) | Deep migration learning unbalanced classification integration method | |
CN108256482B (en) | Face age estimation method for distributed learning based on convolutional neural network | |
CN109086799A (en) | A kind of crop leaf disease recognition method based on improvement convolutional neural networks model AlexNet | |
CN110287777B (en) | Golden monkey body segmentation algorithm in natural scene | |
CN111652317B (en) | Super-parameter image segmentation method based on Bayes deep learning | |
Zhao et al. | Adaptive logit adjustment loss for long-tailed visual recognition | |
CN112766334B (en) | Cross-domain image classification method based on pseudo label domain adaptation | |
CN111209907B (en) | Artificial intelligent identification method for product characteristic image in complex light pollution environment | |
CN110120064A (en) | A kind of depth related objective track algorithm based on mutual reinforcing with the study of more attention mechanisms | |
CN103761726B (en) | Block adaptive image partition method based on FCM | |
CN116012337A (en) | Hot rolled strip steel surface defect detection method based on improved YOLOv4 | |
CN112381030A (en) | Satellite optical remote sensing image target detection method based on feature fusion | |
CN114004333A (en) | Oversampling method for generating countermeasure network based on multiple false classes | |
CN114998603A (en) | Underwater target detection method based on depth multi-scale feature factor fusion | |
CN116129215A (en) | Long-tail target detection method based on deep learning | |
CN114821299B (en) | Remote sensing image change detection method | |
CN110991554B (en) | Improved PCA (principal component analysis) -based deep network image classification method | |
CN110032973B (en) | Unsupervised parasite classification method and system based on artificial intelligence | |
CN110879985A (en) | Anti-noise data face recognition model training method | |
CN111144462A (en) | Unknown individual identification method and device for radar signals | |
CN116704208B (en) | Local interpretable method based on characteristic relation | |
CN114022469A (en) | Cigarette appearance defect image classification method and system | |
CN116433909A (en) | Similarity weighted multi-teacher network model-based semi-supervised image semantic segmentation method | |
CN110136098B (en) | Cable sequence detection method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||