CN116229333A - Difficulty target decoupling detection method based on difficulty level self-adaptive dynamic adjustment

Info

Publication number: CN116229333A
Application number: CN202310505377.1A
Authority: CN
Inventors: 孙自伟; 华泽玺
Original assignee: Southwest Jiaotong University
Current assignee: Southwest Jiaotong University
Priority date: 2023-05-08
Filing date: 2023-05-08
Publication date: 2023-06-06
Anticipated expiration: 2043-05-08
Also published as: CN116229333B

Abstract

The invention relates to a difficulty target decoupling detection method based on difficulty level self-adaptive dynamic adjustment, which comprises the following steps: constructing a hard-easy sample decoupling convolutional neural network model, wherein the model comprises a feature extraction backbone network and n+1 hard-easy target decoupling detection heads; collecting a plurality of hard and easy samples, and labeling the targets in all samples with difficulty levels from 0 to n to obtain a training sample set; calculating an initial prior score of each target, and weighting the loss of the hard-easy sample decoupling convolutional neural network model with it so as to train the model; calculating a posterior confidence score based on the initial prior score of the target; and dynamically adjusting the initial prior score through the posterior confidence score to obtain a normalized score, and updating the initial prior score with the normalized score until the hard-easy sample decoupling convolutional neural network model is stable. The invention aims to reduce the missed-detection rate and false-detection rate of target identification.

Description

Difficulty target decoupling detection method based on difficulty level self-adaptive dynamic adjustment

Technical Field

The invention relates to the technical field of video target detection, in particular to a difficulty target decoupling detection method based on difficulty level self-adaptive dynamic adjustment.

Background

When targets are to be detected in video, the data must be labeled manually. When a target that is difficult to identify or data of poor quality is encountered, the data is generally discarded directly, so that it does not cause oscillation during model training and prevent the training from converging.

Traditional hard-sample detection concerns samples that the classifier of a neural network finds difficult to classify as target or background but that are easy to distinguish manually. The neural network is therefore made to pay more attention to these samples during training so that they are classified correctly.

However, some samples are difficult to distinguish even manually. A video may contain both targets that are easy to identify and targets that are difficult to identify: for example, a bird that is relatively close to the camera and flies slowly is clear and easy to identify, whereas a bird that is far from the camera, set against a complex background, or flying fast is difficult to identify. Without historical prior information, such targets cannot be labeled correctly. If targets that are difficult to identify are discarded and do not participate in model training, missed detections occur when the model is deployed; if they participate in model training and are forcibly labeled as positive samples, the training becomes unstable and oscillates, and false detections occur during inference.

Disclosure of Invention

The invention aims to provide a difficulty target decoupling detection method based on difficulty level self-adaptive dynamic adjustment, which avoids the adverse influence of hard samples on model training without discarding them, and reduces the missed-detection rate and false-detection rate of target identification.

In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:

A difficulty target decoupling detection method based on difficulty level self-adaptive dynamic adjustment comprises the following steps:

Step 1, constructing a hard-easy sample decoupling convolutional neural network model, wherein the model comprises a feature extraction backbone network and n+1 hard-easy target decoupling detection heads;

Step 2, collecting a plurality of hard and easy samples, and labeling the targets in all samples with difficulty levels from 0 to n to obtain a training sample set;

Step 3, calculating an initial prior score of each target, and weighting the loss of the hard-easy sample decoupling convolutional neural network model with the initial prior score so as to train the model;

Step 4, calculating a posterior confidence score based on the initial prior score of the target; dynamically adjusting the initial prior score through the posterior confidence score to obtain a normalized score, updating the initial prior score with the normalized score, and repeating step 3 until the hard-easy sample decoupling convolutional neural network model is stable.

Compared with the prior art, the invention has the beneficial effects that:

According to the invention, targets that are difficult to identify are given a longer processing path, which improves their detection rate; the model training process is weighted by how difficult each target is to identify, which prevents hard targets from causing oscillation during training and reduces the false-detection rate; and the prediction scores of the trained model are used to adjust the difficulty of each target adaptively and dynamically, which removes the subjectivity of manual difficulty labeling.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a model structure of a convolutional neural network with decoupling of difficult and easy samples;

FIG. 2 is a schematic diagram of a structure of a decoupling detection head for a difficult target according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of weighting the loss of a hard-easy sample decoupled convolutional neural network model according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of dynamic adjustment of an initial prior score using posterior confidence scores in an embodiment of the present invention;

FIG. 5 is a schematic diagram illustrating the partitioning of initial prior scores according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Also, in the description of the present invention, the terms "first," "second," and the like are used merely to distinguish one from another, and are not to be construed as indicating or implying a relative importance or implying any actual such relationship or order between such entities or operations. In addition, the terms "connected," "coupled," and the like may be used to denote a direct connection between elements, or an indirect connection via other elements.

Examples:

The invention discloses a difficulty target decoupling detection method based on difficulty level self-adaptive dynamic adjustment, which comprises the following steps:

Step 1, constructing a hard-easy sample decoupling convolutional neural network model, wherein the model comprises a feature extraction backbone network and n+1 hard-easy target decoupling detection heads.

Referring to FIG. 1, the feature extraction backbone network is the first part of the hard-easy sample decoupling convolutional neural network model and adopts a network structure such as DarkNet-53. The second part consists of n+1 hard-easy target decoupling detection heads connected in series, wherein the 0th detection head corresponds to the lowest difficulty level and is connected to the feature extraction backbone network, and the n-th detection head corresponds to the highest difficulty level. A detection head with a higher difficulty level sits further back in the chain, so its processing path is longer, and it predicts targets of higher difficulty.

All hard-easy target decoupling detection heads share the same structure. Internally each head is a U-shaped network comprising z downsampling convolution feature layers and z-1 upsampling convolution feature layers; upsampling and downsampling convolution feature layers of the same scale are concatenated along the channel dimension so as to fully fuse shallow localization information with high-level semantic information. Each head predicts the confidence, category and position of a target.

In this embodiment, referring to FIG. 2, each head comprises 4 downsampling convolution feature layers and 3 upsampling convolution feature layers, namely the first to seventh convolution feature layers. The first to fourth convolution feature layers downsample in sequence, and the fifth to seventh convolution feature layers upsample in sequence. The third and fifth convolution feature layers, which have the same scale, are concatenated along the channel dimension; the remaining layers of equal scale are handled in the same way and are not described again.
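The layer pairing inside one detection head can be sketched as follows. This is only an illustration under the assumption that the U-shaped network is symmetric, so that the k-th upsampling layer returns to the scale of the downsampling layer k steps above the bottom; the function name `skip_connections` is hypothetical and not part of the patent.

```python
def skip_connections(z: int) -> list[tuple[int, int]]:
    """Return (upsampling layer, downsampling layer) pairs that share a scale.

    Layers are numbered from 1. Layers 1..z downsample and layers
    z+1..2z-1 upsample. Assuming a symmetric U shape, layer z+k is at
    the same scale as layer z-k, so the two are concatenated along the
    channel dimension.
    """
    if z < 2:
        raise ValueError("z must be at least 2 to have an upsampling layer")
    return [(z + k, z - k) for k in range(1, z)]


# Embodiment of FIG. 2: 4 downsampling layers and 3 upsampling layers.
pairs = skip_connections(4)
print(pairs)
```

For z = 4 the first pair is the fifth and third layers, which matches the pairing named in the embodiment; the other two pairs follow from the symmetry assumption.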

Step 2, collecting a plurality of hard and easy samples, and labeling the targets in all samples with difficulty levels from 0 to n to obtain a training sample set.

A plurality of hard and easy samples are collected, where one sample may contain one or more targets. The category and position of every target in all samples are labeled, and on that basis each target is additionally labeled with a difficulty level from 0 to n according to manual experience, where difficulty level 0 denotes the easiest to identify and difficulty level n the hardest. Although labeling by manual experience is subjective, this influence can be eliminated to a certain extent by the subsequent adaptive dynamic adjustment.

Assuming n = 3, there are 4 hard-easy target decoupling detection heads and 4 difficulty levels: difficulty level 0 (simple, easily identifiable targets), difficulty level 1 (ordinary targets), difficulty level 2 (targets that are slightly difficult to identify) and difficulty level 3 (targets that are difficult to identify). It is easy to understand that targets within the same difficulty level also differ in how easy they are to identify.

Step 3, calculating an initial prior score of each target, and weighting the loss of the hard-easy sample decoupling convolutional neural network model with the initial prior score so as to train the model.

An initial prior score is calculated for each labeled difficulty level. The higher the difficulty level, the lower the corresponding initial prior score, and vice versa, with all initial prior scores lying within (0, 1). The initial prior score is used to weight the loss function during the subsequent adaptive dynamic adjustment.

First, a range of initial prior scores is set for each difficulty level. For example, a target whose initial prior score lies in (0.75, 1) has difficulty level 0, a target whose initial prior score lies in (0.50, 0.75] has difficulty level 1, a target whose initial prior score lies in (0.25, 0.50] has difficulty level 2, and a target whose initial prior score lies in (0, 0.25] has difficulty level 3.

Then the initial prior score of each target is calculated by formula from its difficulty level, where j denotes the j-th target in a hard-easy sample containing m targets in total, the difficulty level of target j is i (i = 0, 1, 2, 3), and S_j is the initial prior score of target j. The calculated initial prior score is 0.875 for difficulty level 0, 0.625 for difficulty level 1, 0.375 for difficulty level 2, and 0.175 for difficulty level 3.
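The formula itself did not survive in this text, so the following is only a sketch under an assumption: each difficulty level receives the midpoint of its score range, with (0, 1) split into n+1 equal ranges. The names `level_score_range` and `initial_prior_score` are hypothetical. The midpoint rule reproduces the listed values 0.875, 0.625 and 0.375 for levels 0 to 2; for level 3 it gives 0.125, whereas the text lists 0.175.

```python
def level_score_range(level: int, n: int) -> tuple[float, float]:
    """Score range (low, high) of a difficulty level.

    Assumes (0, 1) is split into n+1 equal ranges, with level 0 taking
    the highest range and level n the lowest.
    """
    if not 0 <= level <= n:
        raise ValueError("level must be between 0 and n")
    width = 1.0 / (n + 1)
    return (1.0 - (level + 1) * width, 1.0 - level * width)


def initial_prior_score(level: int, n: int) -> float:
    """Assumed initial prior score: the midpoint of the level's range."""
    low, high = level_score_range(level, n)
    return (low + high) / 2.0


# n = 3 gives four difficulty levels, as in the embodiment.
scores = [initial_prior_score(i, 3) for i in range(4)]
print(scores)
```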

A sample picture x (i.e., a hard-easy sample) and its corresponding labels gt_j are randomly taken from the training sample set, where the label gt_j comprises the category, position and difficulty level of target j. Each target in the sample picture x is assigned to the corresponding hard-easy target decoupling detection head according to its difficulty level; for example, if the difficulty level of target j is 1, target j is assigned to hard-easy target decoupling detection head 1.
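The assignment of targets to detection heads amounts to grouping the labels of one picture by difficulty level. A minimal sketch follows; the `Target` record and the function `assign_to_heads` are hypothetical names used only for illustration, and the box format is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Target:
    """One label gt_j: category, position and difficulty level."""
    category: str
    box: tuple[float, float, float, float]  # assumed (x, y, w, h) layout
    level: int


def assign_to_heads(targets: list[Target], n: int) -> dict[int, list[int]]:
    """Map each detection head index to the indices j of its targets."""
    heads: dict[int, list[int]] = defaultdict(list)
    for j, target in enumerate(targets):
        if not 0 <= target.level <= n:
            raise ValueError(f"target {j} has level outside 0..{n}")
        # A target of difficulty level i is handled by detection head i.
        heads[target.level].append(j)
    return dict(heads)


labels = [
    Target("bird", (10.0, 20.0, 30.0, 15.0), level=1),
    Target("bird", (200.0, 40.0, 60.0, 35.0), level=0),
    Target("bird", (90.0, 75.0, 12.0, 8.0), level=1),
    Target("bird", (300.0, 10.0, 6.0, 4.0), level=3),
]
assignment = assign_to_heads(labels, n=3)
print(assignment)
```

Heads that receive no target in a given picture simply do not appear in the mapping.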

The hard-easy sample decoupling convolutional neural network model finally outputs the predicted targets of the sample picture x, out = HardEasyDecoupleModel(x), where HardEasyDecoupleModel denotes the hard-easy sample decoupling convolutional neural network model and out contains all predicted targets in the sample picture x.

Referring to FIG. 3, the target loss L_{i,j} between the label and the output of hard-easy target decoupling detection head i in the hard-easy sample decoupling convolutional neural network model is calculated using a confidence loss function, a category classification loss function and a localization regression loss function.

Here L_{i,j} denotes the target loss between the label and the output for target j assigned to hard-easy target decoupling detection head i; L_conf denotes the confidence loss function, L_cls the category classification loss function, and L_loc the localization regression loss function; out_conf, out_cls and out_loc denote the outputs corresponding to the confidence loss function, the category classification loss function and the localization regression loss function respectively. Each loss function is evaluated against the label of target j for that function: the label of target j in the confidence loss function, the label of target j in the category classification loss function, and the label of target j in the localization regression loss function.
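One plausible reading of the target loss, given the three loss functions and their outputs defined above, is an unweighted sum of the three terms. This is an assumption, since the formula is not reproduced in this text; the label symbols gt_j^{conf}, gt_j^{cls} and gt_j^{loc} are hypothetical stand-ins for the label of target j in each loss function.

```latex
L_{i,j} = L_{conf}\left(out_{conf},\, gt_j^{conf}\right)
        + L_{cls}\left(out_{cls},\, gt_j^{cls}\right)
        + L_{loc}\left(out_{loc},\, gt_j^{loc}\right)
```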

All target losses assigned to hard-easy target decoupling detection head i are weighted by the initial prior scores to obtain the positive sample loss:

(1)

where L_{i,pos} denotes the positive sample loss of hard-easy target decoupling detection head i, m' denotes the number of targets assigned to hard-easy target decoupling detection head i, and m' ≤ m.

The total loss L of all hard-easy target decoupling detection heads is then calculated:

(2)

where L_{i,neg} denotes the negative sample loss of hard-easy target decoupling detection head i, and count() denotes the total number of targets, i.e., m.
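The weighting can be illustrated with a small numeric sketch. Formulas (1) and (2) are not reproduced in this text, so the forms below are assumptions: formula (1) is taken to be the sum of the target losses of a head, each multiplied by the prior score S_j of its target, and formula (2) is taken to be the sum of positive and negative losses over all heads divided by the target count m. The function names are hypothetical.

```python
def positive_loss(target_losses: list[float], prior_scores: list[float]) -> float:
    """Assumed form of formula (1): prior-weighted sum of L_{i,j} for one head."""
    if len(target_losses) != len(prior_scores):
        raise ValueError("one prior score is needed per target loss")
    return sum(s * loss for s, loss in zip(prior_scores, target_losses))


def total_loss(pos_losses: list[float], neg_losses: list[float], m: int) -> float:
    """Assumed form of formula (2): per-head losses summed, divided by m."""
    if len(pos_losses) != len(neg_losses):
        raise ValueError("one negative loss is needed per head")
    if m <= 0:
        raise ValueError("m must be positive")
    return sum(p + q for p, q in zip(pos_losses, neg_losses)) / m


# Head 0 holds one easy target (S_j = 0.875); head 1 holds two harder
# targets (S_j = 0.625 each). The raw losses are made-up numbers.
pos_head0 = positive_loss([2.0], [0.875])
pos_head1 = positive_loss([4.0, 8.0], [0.625, 0.625])
loss = total_loss([pos_head0, pos_head1], [0.25, 0.5], m=3)
print(pos_head0, pos_head1, loss)
```

Because harder targets carry lower prior scores, their losses contribute less to the gradient, which is the mechanism the text relies on to keep hard targets from destabilizing training.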

The total loss L is back-propagated to update the weights of the hard-easy sample decoupling convolutional neural network model. Sample pictures x are repeatedly taken at random from the training sample set and the training is repeated; if N sample pictures are taken in total, N total losses L are obtained, and the weights of the model are adjusted by a gradient descent optimization algorithm. After training for a certain number of steps, or once the loss has converged to an expected value, the weights are fixed to obtain the weights of the trained hard-easy sample decoupling convolutional neural network model.

Assume that the training sample set contains N sample pictures with M targets in total, j ∈ M. The ratio of target counts per difficulty level in the labels of the training sample set, M_0 : M_1 : M_2 : M_3, is computed, and the number of targets at each difficulty level is kept unchanged, where M_0 denotes the number of targets of difficulty level 0 and M_0 + M_1 + M_2 + M_3 = M.

Referring to FIG. 4, the N sample pictures are input into the trained hard-easy sample decoupling convolutional neural network model, and the output is decoded to obtain the prediction score F_j of each target. The posterior confidence score P_j is then calculated by combining the prediction score with the initial prior score S_j of target j.

After all sample pictures have been run, all targets are sorted from high to low by posterior confidence score, and their difficulty levels are re-divided according to the ratio M_0 : M_1 : M_2 : M_3 of target counts per difficulty level in the training sample set, so that each difficulty level keeps its original share of the count(M) targets. The first M_0 targets in the ranking are assigned difficulty level 0, the next M_1 targets difficulty level 1, the next M_2 targets difficulty level 2, and the last M_3 targets difficulty level 3; count(M) denotes the total number of targets M.
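The re-division step can be sketched as a sort followed by slicing the ranking into consecutive segments of sizes M_0, M_1, ..., M_n. The function name `reassign_levels` is hypothetical, and the posterior scores in the example are made-up numbers.

```python
def reassign_levels(posterior: list[float], level_counts: list[int]) -> list[int]:
    """Re-divide difficulty levels by rank while keeping per-level counts.

    posterior[j] is the posterior confidence score of target j and
    level_counts is [M_0, ..., M_n]. Returns the new level of each target.
    """
    if sum(level_counts) != len(posterior):
        raise ValueError("level counts must add up to the number of targets")
    # Rank target indices from the highest posterior score to the lowest.
    order = sorted(range(len(posterior)), key=lambda j: posterior[j], reverse=True)
    levels = [0] * len(posterior)
    start = 0
    for level, count in enumerate(level_counts):
        # The next `count` targets in the ranking receive this level.
        for j in order[start:start + count]:
            levels[j] = level
        start += count
    return levels


posterior_scores = [0.30, 0.90, 0.10, 0.60, 0.45]
new_levels = reassign_levels(posterior_scores, [1, 2, 1, 1])
print(new_levels)
```

The target with the highest score becomes level 0 and the one with the lowest becomes level 3, regardless of the levels assigned manually, while the number of targets per level stays the same.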

The M targets are ranked from high to low by posterior confidence score. Suppose the targets whose ranks fall in the segment re-divided as difficulty level 2 have posterior confidence scores [P_t, ..., P_j, ..., P_k], where P_t is the posterior confidence score of target t at the first rank of the segment, P_k is the posterior confidence score of target k at the last rank of the segment, and P_j is the posterior confidence score of any target j within the segment (j ∈ [t, k]). According to the initial prior scores, the initial prior score of a target with difficulty level 2 lies in (0.25, 0.50], but after recalculation the posterior confidence scores [P_t, ..., P_j, ..., P_k] do not necessarily lie in (0.25, 0.50]. The present scheme therefore normalizes the posterior confidence scores [P_t, ..., P_j, ..., P_k] into the score range of difficulty level 2, thereby realizing the adaptive dynamic adjustment.

In detail, referring to FIG. 5, let the maximum of the initial prior score range of difficulty level 2 be S_i2 and the minimum be S_i1 (i = 2). Because the posterior confidence scores [P_t, ..., P_j, ..., P_k] do not necessarily lie in (0.25, 0.50], there are the following four cases:

1. When P_t < S_i2 and P_k < S_i1:

Set P_max = S_i2, P_min = P_k, S_max = S_i2, S_min = S_i1. The relationship between the posterior confidence score P_j of target j and the normalized score S'_j can then be calculated using formula (3):

(3)

From formula (3), each P_j yields a corresponding S'_j. Substituting the obtained S'_j for S_j in formula (1) updates the loss weighting of the hard-easy sample decoupling convolutional neural network model again. Through repeated adaptive dynamic adjustment of the normalized scores S'_j, the model can be trained more stably.

2. When S_i1 ≤ P_t ≤ S_i2 and S_i1 ≤ P_k ≤ S_i2:

Set P_max = P_t, P_min = P_k, S_max = P_t, S_min = P_k. Formula (3) then gives a corresponding S'_j for each P_j, and the loss weighting of the hard-easy sample decoupling convolutional neural network model is updated in the same way.

3. When S_i2 < P_t and P_k < S_i1:

Set P_max = P_t, P_min = P_k, S_max = S_i2, S_min = S_i1. Formula (3) then gives a corresponding S'_j for each P_j, and the loss weighting of the hard-easy sample decoupling convolutional neural network model is updated in the same way.

4. When S_i2 < P_t and S_i1 < P_k:

Set P_max = P_t, P_min = P_k, S_max = S_i2, S_min = P_k. Formula (3) then gives a corresponding S'_j for each P_j, and the loss weighting of the hard-easy sample decoupling convolutional neural network model is updated in the same way.
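The four cases can be sketched in code. The bounds follow the settings listed above; formula (3) itself is not reproduced in this text, so the sketch assumes it is a linear min-max mapping of [P_min, P_max] onto [S_min, S_max], which is what the four parameters suggest. The function names are hypothetical, and score combinations that match none of the four listed conditions raise an error rather than being guessed at.

```python
def select_bounds(p_t: float, p_k: float, s_i1: float, s_i2: float):
    """Return (P_max, P_min, S_max, S_min) for one difficulty level.

    p_t and p_k are the highest and lowest posterior scores of the
    segment; s_i1 and s_i2 are the minimum and maximum of the level's
    initial prior score range.
    """
    if p_t < s_i2 and p_k < s_i1:                          # case 1
        return s_i2, p_k, s_i2, s_i1
    if s_i1 <= p_t <= s_i2 and s_i1 <= p_k <= s_i2:        # case 2
        return p_t, p_k, p_t, p_k
    if s_i2 < p_t and p_k < s_i1:                          # case 3
        return p_t, p_k, s_i2, s_i1
    if s_i2 < p_t and s_i1 < p_k:                          # case 4
        return p_t, p_k, s_i2, p_k
    raise ValueError("scores match none of the four listed cases")


def normalize(p_j: float, p_max: float, p_min: float,
              s_max: float, s_min: float) -> float:
    """Assumed form of formula (3): linear map of [P_min, P_max] to [S_min, S_max]."""
    if p_max == p_min:
        return s_max  # degenerate segment: every score is identical
    return (p_j - p_min) / (p_max - p_min) * (s_max - s_min) + s_min


# Difficulty level 2 has the range (0.25, 0.50]. The segment below
# spills over both ends of that range, which is case 3.
segment = [0.7, 0.4, 0.1]
bounds = select_bounds(segment[0], segment[-1], s_i1=0.25, s_i2=0.50)
normalized = [normalize(p, *bounds) for p in segment]
print(bounds, normalized)
```

Under this assumed mapping, case 2 leaves the scores unchanged, since the posterior scores already lie inside the level's range.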

In summary, once the normalized scores S'_j are obtained, the training process of step 3 is repeated with them until the training of the hard-easy sample decoupling convolutional neural network model is stable. The hard-easy sample decoupling convolutional neural network model can then be used to accurately identify targets in real-time samples.

The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. The difficulty target decoupling detection method based on the difficulty level self-adaptive dynamic adjustment is characterized by comprising the following steps of: the method comprises the following steps:

2. The difficulty target decoupling detection method based on the difficulty level self-adaptive dynamic adjustment according to claim 1, wherein the method is characterized in that: n+1 difficult and easy target decoupling detection heads are sequentially connected in series, the difficulty level corresponding to the front difficult and easy target decoupling detection head is the lowest, and the difficulty level corresponding to the rear difficult and easy target decoupling detection head is the highest.

3. The difficulty target decoupling detection method based on the difficulty level self-adaptive dynamic adjustment according to claim 1, wherein the method is characterized in that: each difficult and easy target decoupling detection head comprises z downsampled convolution feature layers and z-1 upsampled convolution feature layers, and the upsampled convolution feature layers and the downsampled convolution feature layers with the same scale are spliced in the channel dimension.

4. The difficulty target decoupling detection method based on the difficulty level self-adaptive dynamic adjustment according to claim 1, wherein the method is characterized in that: in the step 2, the target with the difficulty level of 0 is the easiest to identify, and the target with the difficulty level of n is the least easy to identify.

5. The difficulty target decoupling detection method based on the difficulty level self-adaptive dynamic adjustment according to claim 1, wherein the method is characterized in that: in the step 3, an initial prior score of the target is calculated, and the loss of the hard and easy sample decoupling convolutional neural network model is weighted by using the initial prior score, so that the step of training the hard and easy sample decoupling convolutional neural network model comprises the following steps:

calculating an initial prior score for the target:

wherein j denotes the j-th target in a hard-easy sample containing m targets in total; the difficulty level of target j is i, i = 0, 1, ..., n; S_j is the initial prior score of target j;

randomly taking a sample picture x and its corresponding labels gt_j from the training sample set, and assigning each target in the sample picture x to the corresponding hard-easy target decoupling detection head according to its difficulty level;

calculating the target loss L_{i,j} between the label and the output of hard-easy target decoupling detection head i in the hard-easy sample decoupling convolutional neural network model using a confidence loss function, a category classification loss function and a localization regression loss function, each evaluated against the label of target j for that function: the label of target j in the confidence loss function, the label of target j in the category classification loss function, and the label of target j in the localization regression loss function;

（1）

（2）

wherein L_{i,neg} denotes the negative sample loss of hard-easy target decoupling detection head i; count() denotes the total number of targets, i.e., m;

and back-propagating the total loss L to update the weight of the difficult and easy sample decoupling convolutional neural network model, and fixing the weight to obtain the weight of the trained difficult and easy sample decoupling convolutional neural network model after training a certain step length or converging the loss to an expected value.

6. The difficulty target decoupling detection method based on the difficulty level self-adaptive dynamic adjustment according to claim 5, wherein the method is characterized in that: in the step 4, the step of calculating the posterior confidence score based on the initial prior score of the target includes:

the training sample set contains N hard-easy samples with M targets in total, j ∈ M, and the ratio of target counts per difficulty level in the labels of the training sample set is M_0 : ... : M_i : ... : M_n, wherein M_i denotes the number of targets of difficulty level i and M_0 + ... + M_i + ... + M_n = M;

inputting the N hard-easy samples into the trained hard-easy sample decoupling convolutional neural network model, and decoding the output to obtain the prediction score F_j of each target; and calculating the posterior confidence score by combining the prediction score with the initial prior score S_j of target j.

7. The difficulty target decoupling detection method based on the difficulty level self-adaptive dynamic adjustment according to claim 6, wherein the method is characterized in that: in the step 4, the initial prior score is dynamically adjusted through the posterior confidence score to obtain a normalized score, the normalized score is used for updating the initial prior score, the step 3 is repeated until the decoupling convolutional neural network model of the difficult sample reaches stability, and the method comprises the following steps:

after all hard-easy samples have been run, sorting all targets from high to low by posterior confidence score, and re-dividing the difficulty levels of all targets according to the ratio M_0 : ... : M_i : ... : M_n of target counts per difficulty level in the training sample set, so that each difficulty level keeps its original share of the count(M) targets: the first M_0 targets in the ranking are assigned difficulty level 0, the next M_1 targets difficulty level 1, the next M_2 targets difficulty level 2, and so on, with the last M_n targets assigned difficulty level n; count(M) denotes the total number of targets M;

normalizing the posterior confidence scores [P_t, ..., P_j, ..., P_k] of the targets divided into difficulty level i into the score range of difficulty level i.

8. The difficulty target decoupling detection method based on the difficulty level self-adaptive dynamic adjustment according to claim 7, wherein the method is characterized in that: the step of normalizing the posterior confidence scores [P_t, ..., P_j, ..., P_k] of the targets divided into difficulty level i into the score range of difficulty level i comprises:

letting the maximum of the initial prior score range of difficulty level i be S_i2 and the minimum be S_i1;

when P_t < S_i2 and P_k < S_i1, setting P_max = S_i2, P_min = P_k, S_max = S_i2, S_min = S_i1, and calculating the normalized score S'_j using formula (3):

(3)

when S_i1 ≤ P_t ≤ S_i2 and S_i1 ≤ P_k ≤ S_i2, setting P_max = P_t, P_min = P_k, S_max = P_t, S_min = P_k, and obtaining the normalized score S'_j using formula (3);

when S_i2 < P_t and P_k < S_i1, setting P_max = P_t, P_min = P_k, S_max = S_i2, S_min = S_i1, and obtaining the normalized score S'_j using formula (3);

when S_i2 < P_t and S_i1 < P_k, setting P_max = P_t, P_min = P_k, S_max = S_i2, S_min = P_k, and obtaining the normalized score S'_j using formula (3);

substituting the obtained normalized score S'_j for S_j in formula (1), and updating the loss weighting of the hard-easy sample decoupling convolutional neural network model.