CN116129215A - Long-tail target detection method based on deep learning - Google Patents

Long-tail target detection method based on deep learning Download PDF

Info

Publication number
CN116129215A
CN116129215A (application CN202211677431.2A)
Authority
CN
China
Prior art keywords
class
tail
network
output
long
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211677431.2A
Other languages
Chinese (zh)
Inventor
陶莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University filed Critical Zhengzhou University
Priority to CN202211677431.2A priority Critical patent/CN116129215A/en
Publication of CN116129215A publication Critical patent/CN116129215A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - using classification, e.g. of video objects
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 - Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 - using neural networks
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep-learning-based long-tail target detection method comprising the following steps. S1, acquire an image dataset: obtain an image dataset that follows a long-tail distribution and split it into a training set and a test set. S2, preprocess the dataset: preprocess the data and compute the effective number of samples for each class. S3, output logits: train a pre-trained model on the training set and obtain the logits output by the network. S4, screen semantically similar classes: set a threshold on the network's output logits and suppress only those tail classes that are semantically similar to head classes, which increases the network's attention to tail classes; the suppression gradient between semantically similar classes is then adaptively adjusted according to the logits output by the network, enhancing the discriminability of tail classes. Experiments on the LVIS long-tail dataset, in which the amount of data differs markedly between classes, show improved detection accuracy for tail classes.

Description

Long-tail target detection method based on deep learning
Technical Field
The invention relates to the technical field of machine learning, in particular to a long tail target detection method based on deep learning.
Background
The rapid growth of large-scale datasets in the real world makes target detection under long-tailed data distributions a challenging task for deep-learning-based methods. Deep long-tail learning is one of the most challenging problems in visual recognition; it aims to train a well-performing deep model from a large number of images that follow a long-tailed class distribution. In a long-tailed distribution, a few head classes account for most of the data, while most tail classes are under-represented. On long-tailed datasets, tail-class predictions are inaccurate, detection accuracy is low, and the trained model lacks discrimination between head and tail classes with similar semantics. Mainstream long-tail methods reduce the suppression of tail classes to varying degrees, but they ignore the semantically similar classes present in any dataset. In particular, when a tail class is similar to a head class, ignoring the negative gradient generated by the head class in order to reduce suppression of the tail class makes it hard for the network to learn discriminative features. A long-tail target detection method based on deep learning is therefore needed.
Disclosure of Invention
The invention aims to provide a long tail target detection method based on deep learning, which aims to solve the problems in the background technology.
In order to achieve the above purpose, the present invention provides the following technical solutions: a long tail target detection method based on deep learning comprises the following steps:
S1, acquire an image dataset: obtain an image dataset that follows a long-tail distribution and split it into a training set and a test set;
S2, preprocess the dataset: preprocess the data and compute the effective number of samples for each class;
S3, output logits: train a pre-trained model on the training set and obtain the logits output by the network;
S4, screen semantically similar classes: set a threshold on the network's output logits and suppress only those tail classes that are semantically similar to head classes;
S5, set weights: set the weights according to the inverse of the effective number of samples;
S6, test the target model: after the weights are set, the final target detection model is obtained and tested on the test set to produce the test results (a minimal end-to-end sketch of these steps is given after this list).
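As an illustration of how steps S1 to S6 fit together, the following PyTorch-style skeleton is a minimal sketch only. The helpers load_lvis_split, build_detector and evaluate are hypothetical placeholders rather than functions of the patent or of any particular library, the hyperparameter values are arbitrary, and class_weights and longtail_cls_loss refer to the sketches given after steps S2 and S5 below.

    import torch

    def run(num_epochs=12, logit_threshold=0.0, effective_N=1000.0):
        train_set, test_set = load_lvis_split()                    # S1: long-tailed dataset, split (hypothetical helper)
        samples_per_class = train_set.count_samples_per_class()    # S2: preprocessing (hypothetical helper)
        weights = torch.as_tensor(class_weights(samples_per_class, N=effective_N))  # S2/S5: effective numbers
        model = build_detector(pretrained=True)                    # S3: pre-trained detection model (hypothetical helper)

        optimizer = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.9)
        for _ in range(num_epochs):
            for images, targets in train_set:
                logits = model(images)                             # S3: per-class logits
                loss = longtail_cls_loss(logits, targets, weights, # S4/S5: gated, re-weighted loss
                                         tail_mask=train_set.tail_mask,
                                         threshold=logit_threshold)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

        return evaluate(model, test_set)                           # S6: test on the test set (hypothetical helper)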
Preferably, the dataset is an LVIS dataset.
Preferably, in step S2 the effective number of samples of each class is calculated, giving the number of samples actually available for each class, using the formula:
E_n = (1 - β^n) / (1 - β),  β = (N - 1) / N;
where S is the sample dataset, E_n is the effective number of samples for a class containing n samples, n is the number of samples in the class, and N is a hyperparameter representing the volume of the class's sample space. As N grows large and β approaches 1, E_n approaches n, which corresponds to the case of no data overlap, i.e., every sample is unique.
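As an illustration only, the following short Python sketch computes E_n for each class and derives per-class weights as the reciprocal of the effective sample number. Treating N as one hyperparameter shared by all classes, and normalising the weights so that they sum to the number of classes, are assumptions of this sketch and are not mandated by the text above.

    import numpy as np

    def effective_numbers(samples_per_class, N=1000.0):
        # E_n = (1 - beta**n) / (1 - beta), with beta = (N - 1) / N
        beta = (N - 1.0) / N
        n = np.asarray(samples_per_class, dtype=np.float64)
        return (1.0 - beta ** n) / (1.0 - beta)

    def class_weights(samples_per_class, N=1000.0):
        # weight each class by the inverse of its effective sample number (step S5),
        # normalised so that the weights sum to the number of classes
        e_n = effective_numbers(samples_per_class, N)
        w = 1.0 / e_n
        return w * len(w) / w.sum()

    # e.g. a long-tailed toy distribution: one head class, two tail classes
    print(class_weights([5000, 50, 5]))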
Preferably, in step S3 the image is passed through a backbone network to produce a feature map, regions of interest are generated on the feature map, a feature is extracted for each region, and the network output is finally processed to obtain a per-class distribution; the cross entropy between the estimated distribution and the ground-truth distribution is then computed;
the cross entropy is a sigmoid cross-entropy function applied to the logit value z_i output by the fully connected layer, where z_i is the logit of class i output by the network; the sigmoid function estimates the probability p_i of each class independently and maps it into (0, 1), giving the probability that the current sample k belongs to class i, according to the formula:
p_i = 1 / (1 + e^(-z_i));
The ground-truth label y_k is binary: y_k = 0 when the candidate region is background or does not belong to class i, and y_k = 1 when the candidate region belongs to class i. The cross-entropy loss is as follows:
L_cls = -Σ_i [ y_k·log(p_i) + (1 - y_k)·log(1 - p_i) ]
      = -log(p_i) when y_k = 1, and -log(1 - p_i) when y_k = 0;
Differentiating the cross-entropy loss with respect to the logit z_i output by the fully connected layer shows that, for every tail class i ≠ k, the loss generates a negative suppression gradient that forces classifier i to output a low confidence:
∂L_cls/∂z_i = p_i - y_k, i.e. ∂L_cls/∂z_i = p_i - 1 for i = k, and ∂L_cls/∂z_i = p_i for i ≠ k;
From this gradient of the sigmoid cross-entropy loss with respect to the network output z_i it can be seen that, when the long-tailed class distribution is not taken into account, every sample of a class k ≠ i contributes a suppressing gradient p_i to class i; for a tail class this suppression accumulates over the abundant head-class samples, and the resulting gradient suppression mechanism forces the classifier to output low confidence for the tail class.
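As a quick, non-authoritative check of this gradient expression, the PyTorch snippet below compares autograd's gradient of the per-class sigmoid cross entropy with p_i - y_i; the toy dimensions and the choice k = 2 are arbitrary.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    num_classes = 4
    z = torch.randn(1, num_classes, requires_grad=True)   # logits z_i from the fully connected layer
    y = torch.zeros(1, num_classes)
    y[0, 2] = 1.0                                          # the sample belongs to class k = 2

    # per-class sigmoid cross entropy, summed over classes
    loss = F.binary_cross_entropy_with_logits(z, y, reduction="sum")
    loss.backward()

    p = torch.sigmoid(z.detach())
    # dL/dz_i = p_i - y_i: p_i - 1 for i = k, and a suppressing gradient p_i for i != k
    print(z.grad)
    print(p - y)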
Preferably, in step S4 the logit output by the network is compared with a preset threshold: when the logit is greater than the threshold the current class is suppressed, and when it is smaller than the threshold it is not, so the negative suppression gradient received from the network becomes smaller and the accuracy of the tail classes improves.
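One possible reading of this screening step is sketched below. It assumes the gate is applied per tail class by keeping the suppressing (negative) loss term only when that class's logit exceeds the threshold, and it assumes head classes are always suppressed; neither assumption is spelled out above beyond the comparison with the threshold.

    import torch

    def suppression_mask(logits, tail_mask, threshold):
        # Returns 1 where the suppressing (negative) term is kept, 0 where it is dropped.
        # Head classes are always suppressed; a tail class is suppressed only when its
        # logit already exceeds the threshold, i.e. when the network confuses it with
        # a semantically similar head class.
        keep = torch.ones_like(logits)
        tail = tail_mask.to(dtype=torch.bool).expand_as(logits)
        keep[tail & (logits <= threshold)] = 0.0
        return keep

For example, with a threshold of 0, a tail class whose logit is -2 receives no suppressing gradient under this sketch, while a tail class whose logit is +3 is suppressed as usual.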
Preferably, in step S5, for the classes selected for suppression in step S4, the degree of suppression applied to each class is computed from its effective number of samples, namely the reciprocal of the effective sample number; the specific calculation is as follows:
(The two weight-setting formulas appear in the original only as drawings; they define a logit-threshold term governed by the hyperparameter ξ and the overall per-class weight w_i derived from the inverse of the effective sample number with hyperparameter β'.)
where ξ is the threshold hyperparameter, β' is the effective-sample hyperparameter, and w_i is the overall weight.
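Since the formulas above are only given as drawings, the following is a hedged sketch of how the classification loss might combine the two ingredients: the logit-threshold gate of step S4 and the inverse-effective-number class weights w_i of step S5. The way they are multiplied into the negative term of the sigmoid cross entropy is an assumption of this sketch, not a formula stated by the patent.

    import torch

    def longtail_cls_loss(logits, targets, class_weights, tail_mask, threshold=0.0):
        # Per-class sigmoid cross entropy whose suppressing (negative) term is
        # gated by the logit threshold (step S4) and re-weighted by w_i, the
        # inverse of the effective sample number (step S5).
        p = torch.sigmoid(logits)
        pos = -targets * torch.log(p.clamp_min(1e-12))
        neg = -(1.0 - targets) * torch.log((1.0 - p).clamp_min(1e-12))

        keep = torch.ones_like(logits)
        tail = tail_mask.to(dtype=torch.bool).expand_as(logits)
        keep[tail & (logits.detach() <= threshold)] = 0.0   # same gate as in the previous sketch

        w = class_weights.to(logits).unsqueeze(0)   # broadcast the per-class weights over the batch
        return (pos + w * keep * neg).sum(dim=1).mean()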
Compared with the prior art, the invention has the beneficial effects that:
in the invention, the reciprocal of the number of the effective samples of each category is used as the quantity factor among the categories, and different categories are weighted in the classification task of target detection, so that the attention degree of the network to the tail category is improved; and then, adaptively adjusting the inhibition gradient between semantic similar categories according to the logit output by the network, enhancing the distinction of tail categories, and carrying out experiments on a long tail data set LVIS with obvious difference of data quantity between the categories, thereby improving the detection precision of the tail categories.
Drawings
FIG. 1 is a schematic flow chart of a long tail target detection method based on deep learning;
FIG. 2 is a schematic diagram of a calculation flow of a long tail target detection method based on deep learning;
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1-2, the present invention provides a technical solution:
a long tail target detection method based on deep learning comprises the following steps: the method comprises the following steps:
S1, acquire an image dataset: obtain an image dataset that follows a long-tail distribution and split it into a training set and a test set;
S2, preprocess the dataset: preprocess the data and compute the effective number of samples for each class;
S3, output logits: train a pre-trained model on the training set and obtain the logits output by the network;
S4, screen semantically similar classes: set a threshold on the network's output logits and suppress only those tail classes that are semantically similar to head classes;
S5, set weights: set the weights according to the inverse of the effective number of samples;
S6, test the target model: after the weights are set, the final target detection model is obtained and tested on the test set to produce the test results.
Specifically, the dataset is an LVIS dataset.
Specifically, in S2 the effective number of samples of each class is calculated, giving the number of samples actually available for each class, using the formula:
E_n = (1 - β^n) / (1 - β),  β = (N - 1) / N;
where S is the sample dataset, E_n is the effective number of samples for a class containing n samples, n is the number of samples in the class, and N is a hyperparameter representing the volume of the class's sample space. As N grows large and β approaches 1, E_n approaches n, which corresponds to the case of no data overlap, i.e., every sample is unique.
Specifically, in S3 the image is passed through a backbone network to produce a feature map, regions of interest are generated on the feature map, a feature is extracted for each region, and the network output is finally processed to obtain a per-class distribution; the cross entropy between the estimated distribution and the ground-truth distribution is then computed;
the cross entropy is a sigmoid cross-entropy function applied to the logit value z_i output by the fully connected layer, where z_i is the logit of class i output by the network; the sigmoid function estimates the probability p_i of each class independently and maps it into (0, 1), giving the probability that the current sample k belongs to class i, according to the formula:
p_i = 1 / (1 + e^(-z_i));
The ground-truth label y_k is binary: y_k = 0 when the candidate region is background or does not belong to class i, and y_k = 1 when the candidate region belongs to class i. The cross-entropy loss is as follows:
L_cls = -Σ_i [ y_k·log(p_i) + (1 - y_k)·log(1 - p_i) ]
      = -log(p_i) when y_k = 1, and -log(1 - p_i) when y_k = 0;
Differentiating the cross-entropy loss with respect to the logit z_i output by the fully connected layer shows that, for every tail class i ≠ k, the loss generates a negative suppression gradient that forces classifier i to output a low confidence:
∂L_cls/∂z_i = p_i - y_k, i.e. ∂L_cls/∂z_i = p_i - 1 for i = k, and ∂L_cls/∂z_i = p_i for i ≠ k;
From this gradient of the sigmoid cross-entropy loss with respect to the network output z_i it can be seen that, when the long-tailed class distribution is not taken into account, every sample of a class k ≠ i contributes a suppressing gradient p_i to class i; for a tail class this suppression accumulates over the abundant head-class samples, and the resulting gradient suppression mechanism forces the classifier to output low confidence for the tail class.
Specifically, in S4 the logit output by the network is compared with a preset threshold: when the logit is greater than the threshold the current class is suppressed, and when it is smaller than the threshold it is not, so the negative suppression gradient received from the network becomes smaller and the accuracy of the tail classes improves.
Specifically, in S5, for the classes selected for suppression in S4, the degree of suppression applied to each class is computed from its effective number of samples, namely the reciprocal of the effective sample number; the specific calculation is as follows:
(The two weight-setting formulas appear in the original only as drawings; they define a logit-threshold term governed by the hyperparameter ξ and the overall per-class weight w_i derived from the inverse of the effective sample number with hyperparameter β'.)
where ξ is the threshold hyperparameter, β' is the effective-sample hyperparameter, and w_i is the overall weight.
Summarising the working steps of the above technical scheme: in the invention, the reciprocal of the effective number of samples of each class is used as a between-class quantity factor, and different classes are weighted in the classification task of target detection, which increases the network's attention to tail classes. The suppression gradient between semantically similar classes is then adaptively adjusted according to the logits output by the network, enhancing the discriminability of tail classes; experiments on the LVIS long-tail dataset, in which the amount of data differs markedly between classes, show improved detection accuracy for tail classes. At the same time, the effective number of samples of each class is computed within a sound theoretical framework, the quantity factor designed from the effective sample number constrains the weights, and the suppression gradient is then generated adaptively according to the learning state of the network. This avoids the problems caused by resampling, preserves training consistency at the dataset level, improves the discrimination between tail classes and semantically similar classes, lets the model pay more attention to tail classes throughout the computation, and improves the discrimination between semantically similar head and tail classes.
Anything not described in detail in the present invention is either the same as the prior art or can be implemented using the prior art. Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims (6)

1. A long-tail target detection method based on deep learning, characterized by comprising the following steps:
S1, acquire an image dataset: obtain an image dataset that follows a long-tail distribution and split it into a training set and a test set;
S2, preprocess the dataset: preprocess the data and compute the effective number of samples for each class;
S3, output logits: train a pre-trained model on the training set and obtain the logits output by the network;
S4, screen semantically similar classes: set a threshold on the network's output logits and suppress only those tail classes that are semantically similar to head classes;
S5, set weights: set the weights according to the inverse of the effective number of samples;
S6, test the target model: after the weights are set, the final target detection model is obtained and tested on the test set to produce the test results.
2. The long-tail target detection method based on deep learning as claimed in claim 1, characterized in that the dataset is an LVIS dataset.
3. The long-tail target detection method based on deep learning as claimed in claim 1, characterized in that: in step S2 the effective number of samples of each class is calculated, giving the effective sample number of each class, using the formula:
E_n = (1 - β^n) / (1 - β),  β = (N - 1) / N;
where S is the sample dataset, E_n is the effective number of samples for a class containing n samples, n is the number of samples in the class, and N is a hyperparameter representing the volume of the class's sample space; as N grows large and β approaches 1, E_n approaches n, which corresponds to the case of no data overlap, i.e., every sample is unique.
4. The long-tail target detection method based on deep learning as claimed in claim 1, characterized in that: in step S3 the image is passed through a backbone network to produce a feature map, regions of interest are generated on the feature map, a feature is extracted for each region, the network output is finally processed to obtain a per-class distribution, and the cross entropy between the estimated distribution and the ground-truth distribution is calculated;
the cross entropy is a sigmoid cross-entropy function applied to the logit value z_i output by the fully connected layer, where z_i is the logit of class i output by the network; the sigmoid function estimates the probability p_i of each class independently and maps it into (0, 1), giving the probability that the current sample k belongs to class i, according to the formula:
p_i = 1 / (1 + e^(-z_i));
the ground-truth label y_k is binary: y_k = 0 when the candidate region is background or does not belong to class i, and y_k = 1 when the candidate region belongs to class i; the cross-entropy loss is as follows:
L_cls = -Σ_i [ y_k·log(p_i) + (1 - y_k)·log(1 - p_i) ]
      = -log(p_i) when y_k = 1, and -log(1 - p_i) when y_k = 0;
differentiating the cross-entropy loss with respect to the logit z_i output by the fully connected layer shows that, for every tail class i ≠ k, the loss generates a negative suppression gradient that forces classifier i to output a low confidence:
∂L_cls/∂z_i = p_i - y_k, i.e. ∂L_cls/∂z_i = p_i - 1 for i = k, and ∂L_cls/∂z_i = p_i for i ≠ k;
from this gradient of the sigmoid cross-entropy loss with respect to the network output z_i it follows that, when the long-tailed class distribution is not considered, a negative suppression gradient is generated for every tail class i ≠ k, forcing the classifier to output a low confidence.
5. The long-tail target detection method based on deep learning as claimed in claim 1, characterized in that: in step S4 the logit output by the network is compared with a set threshold; when the logit is greater than the threshold the current class is suppressed, and when it is smaller than the threshold it is not suppressed, so that the negative suppression gradient received from the network becomes smaller and the accuracy of the tail classes improves.
6. The long-tail target detection method based on deep learning as claimed in claim 1, characterized in that: in step S5, for the classes selected for suppression in step S4, the degree of suppression applied to each class is calculated from its effective number of samples, namely the reciprocal of the effective sample number, according to the following formulas:
(The two formulas appear in the original only as drawings; they define a logit-threshold term governed by the hyperparameter ξ and the overall per-class weight w_i derived from the inverse of the effective sample number with hyperparameter β'.)
where ξ is the threshold hyperparameter, β' is the effective-sample hyperparameter, and w_i is the overall weight.
CN202211677431.2A 2022-12-26 2022-12-26 Long-tail target detection method based on deep learning Pending CN116129215A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211677431.2A CN116129215A (en) 2022-12-26 2022-12-26 Long-tail target detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211677431.2A CN116129215A (en) 2022-12-26 2022-12-26 Long-tail target detection method based on deep learning

Publications (1)

Publication Number Publication Date
CN116129215A true CN116129215A (en) 2023-05-16

Family

ID=86294885

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211677431.2A Pending CN116129215A (en) 2022-12-26 2022-12-26 Long-tail target detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN116129215A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117253095A (en) * 2023-11-16 2023-12-19 吉林大学 Image classification system and method based on biased shortest distance criterion
CN117253095B (en) * 2023-11-16 2024-01-30 吉林大学 Image classification system and method based on biased shortest distance criterion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination