CN117437465A - Improved soft-NMS target detection method based on unbalanced data - Google Patents
Improved soft-NMS target detection method based on unbalanced data
- Publication number
- Publication number: CN117437465A (Application No. CN202311375631.7A)
- Authority
- CN
- China
- Prior art keywords
- target
- target detection
- detection
- frames
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses an improved soft-NMS target detection method based on unbalanced data, which comprises the following steps: step one, acquiring unbalanced training data for target detection and calculating a balance coefficient for each target class; step two, training a target detection model based on Faster RCNN; step three, inputting the unbalanced target detection data to be detected into the Faster RCNN target detection model to obtain a plurality of target candidate frames; step four, grouping all target candidate frames according to their category labels and sorting the candidate frames within each category by confidence; step five, for each category, using the improved soft-NMS method based on unbalanced data to update the confidence of the target detection frames in that category and screen the detection frames; and step six, combining the screened target detection frames of all categories to obtain the target detection results. The invention redefines the confidence update formula of the target detection frame and effectively reduces the miss rate of tail-class targets.
Description
Technical Field
The invention relates to the technical field of target detection of image processing, in particular to an improved soft-NMS target detection method based on unbalanced data.
Background
Object detection is an important research direction in the field of computer vision; its goal is to determine whether specific objects exist in an image or video and, if so, to determine their categories and positions. In practice, data with unbalanced distributions are widespread in the application fields of target detection, such as security monitoring, automatic driving, remote sensing, image segmentation and target tracking. An unbalanced distribution is a typical problem in big-data settings and means that the majority classes in a dataset account for a large number of samples while the minority classes have only a small number of samples.
Common target detection models assume that the sample classes in the training dataset are balanced. As a result, the majority classes are over-trained and the minority classes are under-trained, so a target detection model trained on unbalanced data is biased toward the majority classes with abundant training samples and performs poorly on the minority classes with limited samples; this lowers the overall performance of the model, and detection accuracy is particularly poor for small-sized minority-class targets. In practical applications, even when the overall detection rate is high, false or missed detection of minority-class instances can cause serious consequences or high costs, such as incorrect fault diagnosis in industrial processes, undetected hacker intrusions, missed detections in remote sensing, or misjudged obstacles in automatic driving; all of these are minority-class detection errors that greatly harm industrial production and network security.
The non-maximum suppression (NMS) algorithm and the soft-NMS algorithm are methods frequently used in target detection to select detection frames. When the characteristics of unbalanced data are not taken into account, these methods over-suppress minority-class detection frames, causing missed and false detections and reducing detection accuracy. It is therefore important to study NMS-related algorithms under unbalanced data.
Disclosure of Invention
(I) Technical problems to be solved
Aiming at the defects of the prior art, the invention provides an improved soft-NMS target detection method based on unbalanced data. The method takes into account the long-tail data phenomenon widely present in practical target detection applications and redefines the confidence update formula of the target detection frame, thereby protecting tail-class target detection frames from excessive suppression and effectively reducing the miss rate of tail-class targets.
(II) technical scheme
In order to achieve the above purpose, the present invention provides the following technical solutions: an improved soft-NMS target detection method based on imbalance data, comprising the steps of:
step one, acquiring unbalanced training data of target detection, and calculating a balance coefficient of each type of target;
step two, training a target detection model based on Faster RCNN;
step three, inputting unbalanced target detection data to be detected into a Faster RCNN target detection model to obtain a plurality of target candidate frames;
step four, grouping all target candidate frames according to different category labels, and sorting all target candidate frames in each category according to confidence;
step five, for each category, adopting an improved soft-NMS method based on unbalanced data to update the confidence coefficient of a target detection frame in the category and screen the detection frame;
and step six, finally, combining the screened target detection frames of all categories to obtain the target detection results of the data to be detected.
Preferably, in the first step, calculating the balance coefficient of each type of object includes the following operations:
according to the characteristics of the unbalanced data, calculating a balance coefficient k(i) for each target class, wherein the formula is as follows:
where i = 1, 2, …, C; C is the number of target categories in the unbalanced data; and N(i) is the total number of class-i targets.
Preferably, the training of the target detection model based on Faster RCNN in the second step includes the following operations:
s1, inputting an unbalanced data training set picture into a Regnet backbone network, and extracting feature graphs C2, C3, C4 and C5 of 4 different stages;
s2, inputting the characteristic graphs C2, C3, C4 and C5 of the 4 different stages into an FPN characteristic pyramid for fusion to obtain new characteristic graphs P2, P3, P4 and P5 with 4 different resolutions;
s3, extracting suggestion frames from the feature graphs P2, P3, P4 and P5 through an RPN (Region Proposal Network);
s4, inputting the feature graphs P2, P3, P4 and P5 and the suggestion boxes into the RoI align for pooling synthesis, and then classifying and regressing through a fully connected network to obtain a detection result and corresponding loss;
and S5, repeating the steps from S1 to S4 after the training of one round is completed until the set iteration round is reached, and outputting and storing parameters in a network to obtain a target detection model.
Preferably, in the fifth step, the improved soft-NMS method based on unbalanced data comprises the following operations:
(a) Selecting a target detection frame with highest confidence in each type of targets, marking as M, and adding the M into the set D;
(b) Calculating the intersection over union (IoU) between each remaining target detection frame and M, and updating the confidence s_j of each target detection frame according to the following formula, in which the threshold is set in a segmented (piecewise) manner:
where j = 1, 2, …, B(i)-1 denotes the j-th detection frame of the i-th target class, and B(i) is the number of detection frames of the i-th target class; U_IoU(M, b_j) is the intersection over union of the detection frame M and the j-th detection frame b_j; k(i) is the balance coefficient of the unbalanced data; and the two segment thresholds satisfy α < β;
(c) When the target detection frames of all classes obtained in step (b) are empty, the screening of the detection frames is finished.
(III) beneficial effects
Compared with the prior art, the invention provides an improved soft-NMS target detection method based on unbalanced data, which has the following beneficial effects:
1. the invention provides an improved target detection method tailored to the characteristics of unbalanced data. Compared with the prior art, it takes into account the unbalanced-data phenomenon widely present in practical target detection applications: the imbalance between majority-class and minority-class targets lowers the class-recognition accuracy of target detection, and the miss rate of tail-class target frames in the non-maximum suppression algorithm is high. The improved method effectively reduces the miss rate of tail-class targets;
2. the invention provides an improved non-maximum suppression algorithm that introduces the balance coefficient of the unbalanced data, redefines the confidence update formula of the target detection frame, and sets the threshold in a segmented manner. This reduces the influence of data imbalance on the algorithm's performance and improves the flexibility and performance of the non-maximum suppression algorithm;
3. the improved soft-NMS target detection method based on unbalanced data is general and can be extended to other target detection tasks in computer vision.
Drawings
Fig. 1 is a training flow chart of the method of the present invention.
Detailed Description
For a better understanding of the objects, structures and functions of the present invention, the improved soft-NMS object detection method based on unbalanced data of the present invention will be described in further detail with reference to the specific embodiments and the accompanying drawings.
For the experiments, the balanced CIFAR10, CIFAR100 and ImageNet2012 datasets can be downsampled with exponential decay to generate unbalanced datasets, referred to as CIFAR10-LT, CIFAR100-LT and ImageNet-LT. For CIFAR10-LT and CIFAR100-LT, three different training sets are generated according to the imbalance ratios {10, 50, 100}. The imbalance ratio of ImageNet-LT is 256: the largest class contains 1280 pictures and the smallest contains only 5. The validation sets of all datasets remain balanced.
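The patent does not spell out the downsampling rule, so the sketch below only illustrates the usual exponential-decay subsampling used to build such *-LT datasets; the function names and the choice of geometric decay are assumptions, not taken from the patent.

```python
import numpy as np

def long_tail_counts(n_max, num_classes, imbalance_ratio):
    """Per-class counts decaying geometrically from n_max down to n_max / imbalance_ratio."""
    mu = (1.0 / imbalance_ratio) ** (1.0 / (num_classes - 1))
    return [max(1, int(n_max * mu ** c)) for c in range(num_classes)]

def long_tail_indices(labels, num_classes, imbalance_ratio, seed=0):
    """Indices of a long-tailed subset drawn from a balanced label array."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n_max = int((labels == 0).sum())          # balanced source set: any class gives n_max
    counts = long_tail_counts(n_max, num_classes, imbalance_ratio)
    keep = []
    for c, n_c in enumerate(counts):
        cls_idx = np.flatnonzero(labels == c)
        keep.extend(rng.choice(cls_idx, size=min(n_c, cls_idx.size), replace=False))
    return np.sort(np.array(keep))

# e.g. a CIFAR10-LT style split with imbalance ratio 100:
# long_tail_counts(5000, 10, 100) -> [5000, 2997, ..., 50]
```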
As shown in fig. 1, the improved soft-NMS target detection method based on unbalanced data provided by the present invention comprises the following steps:
step one, acquiring unbalanced training data of target detection, and calculating a balance coefficient of each type of target;
step two, training a target detection model based on Faster RCNN;
step three, inputting unbalanced target detection data to be detected into a Faster RCNN target detection model to obtain a plurality of target candidate frames;
step four, grouping all target candidate frames according to different category labels, and sorting all target candidate frames in each category according to confidence;
step five, for each category, adopting an improved soft-NMS method based on unbalanced data to update the confidence coefficient of a target detection frame in the category and screen the detection frame;
and step six, finally, combining the screened target detection frames of all categories to obtain the target detection results of the data to be detected.
Further, in step one, calculating the balance coefficient of each type of object includes the following operations:
according to the characteristics of the unbalanced data, calculating a balance coefficient k(i) for each target class, wherein the formula is as follows:
where i = 1, 2, …, C; C is the number of target categories in the unbalanced data; and N(i) is the total number of class-i targets.
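The balance coefficient formula itself appears only as an image in the original filing and is not reproduced here. The sketch below is therefore a hypothetical placeholder: it simply normalizes each class count N(i) by the largest class count so that k(i) is small for tail classes, which is one plausible reading of a balance coefficient for unbalanced data.

```python
import numpy as np

def balance_coefficients(class_counts):
    """Assumed stand-in for k(i): class frequency relative to the largest class,
    so tail classes get small coefficients. Not the patent's exact formula."""
    counts = np.asarray(class_counts, dtype=float)
    return counts / counts.max()

# e.g. class counts [5000, 500, 50] -> k = [1.0, 0.1, 0.01]
```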
Further, training the target detection model based on the Faster RCNN in the second step comprises the following operations:
s1, inputting an unbalanced data training set picture into a Regnet backbone network, and extracting feature graphs C2, C3, C4 and C5 of 4 different stages;
s2, inputting the characteristic graphs C2, C3, C4 and C5 of the 4 different stages into an FPN characteristic pyramid for fusion to obtain new characteristic graphs P2, P3, P4 and P5 with 4 different resolutions;
s3, extracting suggestion frames from the feature graphs P2, P3, P4 and P5 through an RPN (Region Proposal Network);
s4, inputting the feature graphs P2, P3, P4 and P5 and the suggestion boxes into the RoI align for pooling synthesis, and then classifying and regressing through a fully connected network to obtain a detection result and corresponding loss;
and S5, repeating the steps from S1 to S4 after the training of one round is completed until the set iteration round is reached, and outputting and storing parameters in a network to obtain a target detection model.
Specifically, in the second step, a fast RCNN model is used to train a target detection model, which includes the following operations:
A bottleneck layer is defined in which one branch sequentially passes through a group convolution layer with a 1x1 kernel, a BN regularization layer, a group convolution layer with a 3x3 kernel, a BN layer, a convolution layer with a 1x1 kernel, a BN layer and a ReLU activation function, while the other branch is an identity mapping; the outputs of the two branches are added to give the output of the bottleneck. The pictures in the training set are normalized and input into the RegNet backbone network: they first pass through a convolution layer with a 3x3 kernel, stride 2, padding 1 and 48 output channels, a BN layer and a ReLU activation function; then 2 bottlenecks with 96 output channels give the feature map C2, 6 bottlenecks with 192 output channels give C3, 15 bottlenecks with 432 output channels give C4, and 2 bottlenecks with 1008 output channels give C5.
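A minimal PyTorch sketch of the bottleneck just described follows; the group count, stride handling and the projection used when the two branches differ in shape are assumptions, since the patent only lists the layer sequence.

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Sketch of the described bottleneck: 1x1 group conv -> BN -> 3x3 group conv -> BN
    -> 1x1 conv -> BN -> ReLU on one branch, identity on the other, outputs added."""
    def __init__(self, in_ch, out_ch, groups=8, stride=1):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, groups=groups, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.Conv2d(out_ch, out_ch, 3, stride=stride, padding=1,
                      groups=groups, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.Conv2d(out_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # identity branch; a 1x1 projection is assumed when the shapes differ
        self.shortcut = (nn.Identity() if stride == 1 and in_ch == out_ch else
                         nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False))

    def forward(self, x):
        return self.branch(x) + self.shortcut(x)

# e.g. the C3 stage could be nn.Sequential(*[Bottleneck(192, 192) for _ in range(6)])
```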
C5 passes through a convolution layer with a 1x1 kernel and 256 output channels to obtain the feature map M5, which then passes through a convolution layer with a 3x3 kernel to output the feature map P5;
C4 passes through a convolution layer with a 1x1 kernel and 256 output channels and is added to the feature map obtained by upsampling M5 by a factor of 2 to obtain the feature map M4, which then passes through a convolution layer with a 3x3 kernel to output the feature map P4;
C3 passes through a convolution layer with a 1x1 kernel and 256 output channels and is added to the feature map obtained by upsampling M4 by a factor of 2 to obtain the feature map M3, which then passes through a convolution layer with a 3x3 kernel to output the feature map P3;
C2 passes through a convolution layer with a 1x1 kernel and 256 output channels and is added to the feature map obtained by upsampling M3 by a factor of 2 to obtain the feature map M2, which then passes through a convolution layer with a 3x3 kernel to output the feature map P2.
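The four lateral-plus-top-down steps above can be summarized in a short PyTorch sketch; the input channel widths (96, 192, 432, 1008) follow the backbone description, while everything else (nearest-neighbour upsampling, module names) is an illustrative assumption.

```python
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Top-down pathway: 1x1 lateral convs to 256 channels, 2x upsampling,
    elementwise addition, then a 3x3 smoothing conv per level."""
    def __init__(self, in_channels=(96, 192, 432, 1008), out_ch=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_ch, out_ch, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, c2, c3, c4, c5):
        m5 = self.lateral[3](c5)
        m4 = self.lateral[2](c4) + F.interpolate(m5, scale_factor=2, mode="nearest")
        m3 = self.lateral[1](c3) + F.interpolate(m4, scale_factor=2, mode="nearest")
        m2 = self.lateral[0](c2) + F.interpolate(m3, scale_factor=2, mode="nearest")
        return (self.smooth[0](m2), self.smooth[1](m3),
                self.smooth[2](m4), self.smooth[3](m5))
```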
The aspect ratios and sizes of the anchors are determined, and suggestion boxes are extracted from the feature maps P2, P3, P4 and P5.
The RPN first applies a 3x3 convolution to the feature maps produced by the feature extraction network to fuse their information. Then, according to the anchors, two groups of parallel 1x1 convolutions classify and regress the feature maps respectively: the classification layer performs binary classification on each anchor box, judging whether it belongs to the foreground or the background, and outputs one result per anchor on the feature map; the regression layer predicts the offsets between each anchor box and the ground-truth box, including the center offsets Δx and Δy and the width and height offsets Δw and Δh, and likewise outputs one result per anchor on the feature map. Finally, the results of the two branches are combined: anchor boxes predicted as foreground by the classification layer are selected, and their center points and widths and heights are adjusted using the offsets computed by the regression layer.
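A compact sketch of such an RPN head is shown below; the number of anchors per location is an assumption (it depends on the chosen anchor ratios and scales), and only the two parallel 1x1 prediction branches described above are modelled.

```python
import torch.nn as nn

class RPNHead(nn.Module):
    """Shared 3x3 conv, then two parallel 1x1 convs: per-anchor foreground/background
    scores and per-anchor (dx, dy, dw, dh) box offsets."""
    def __init__(self, in_ch=256, num_anchors=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, in_ch, 3, padding=1)
        self.cls = nn.Conv2d(in_ch, num_anchors * 2, 1)   # 2 classes per anchor
        self.reg = nn.Conv2d(in_ch, num_anchors * 4, 1)   # 4 offsets per anchor

    def forward(self, feat):
        t = self.conv(feat).relu()
        return self.cls(t), self.reg(t)
```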
Each feature map and the suggestion boxes output by the RPN are input into RoI Align for pooled synthesis, producing 7x7 features for each suggestion box on each feature map; these are merged and input into the fully connected network to obtain 6-channel and 24-channel outputs, representing the classification and regression results, from which the target detection loss is calculated.
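The detection head described here (RoI Align output of size 7x7, fully connected layers, a 6-channel classification output and a 24-channel regression output) can be sketched as follows; the hidden width of the fully connected layers is an assumption.

```python
import torch.nn as nn

class RoIHead(nn.Module):
    """Flattened 7x7 RoI features pass through FC layers, then parallel linear
    layers give the 6-way classification and the 24 (= 6 x 4) regression outputs."""
    def __init__(self, in_ch=256, pooled=7, num_classes=6, hidden=1024):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_ch * pooled * pooled, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
        )
        self.cls = nn.Linear(hidden, num_classes)        # 6-channel output
        self.reg = nn.Linear(hidden, num_classes * 4)    # 24-channel output

    def forward(self, roi_feats):                        # (N, 256, 7, 7)
        x = self.fc(roi_feats)
        return self.cls(x), self.reg(x)
```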
After one round of training is completed, the above steps are repeated until the set number of iterations is reached; the network parameters are then output and saved to obtain the Faster RCNN target detection model.
Further, in step five, the improved soft-NMS method based on unbalanced data comprises the following operations:
(a) Selecting a target detection frame with highest confidence in each type of targets, marking as M, and adding the M into the set D;
(b) Calculating the intersection over union (IoU) between each remaining target detection frame and M, and updating the confidence s_j of each target detection frame according to the following formula, in which the threshold is set in a segmented (piecewise) manner:
where j = 1, 2, …, B(i)-1 denotes the j-th detection frame of the i-th target class, and B(i) is the number of detection frames of the i-th target class; U_IoU(M, b_j) is the intersection over union of the detection frame M and the j-th detection frame b_j; k(i) is the balance coefficient of the unbalanced data; and the two segment thresholds satisfy α < β;
(c) When the target detection frames of all classes obtained in step (b) are empty, the screening of the detection frames is finished.
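Putting steps (a)-(c) together, a minimal per-class sketch is given below. The patent's exact piecewise confidence update is shown only as an image and is not reproduced; the decay rule used here (no decay below α, a k(i)-weighted linear decay between α and β, and a full linear decay above β) is an illustrative assumption consistent with the described segmented thresholds and balance coefficient, under which tail classes with small k(i) are suppressed less.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def improved_soft_nms(boxes, scores, k_i, alpha=0.3, beta=0.5, score_thresh=0.001):
    """Per-class improved soft-NMS sketch with an assumed piecewise decay rule."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    keep, idx = [], list(range(len(scores)))
    while idx:
        m = max(idx, key=lambda j: scores[j])          # (a) highest-confidence box M
        keep.append(m)
        idx.remove(m)
        for j in list(idx):                            # (b) update remaining boxes
            u = iou(boxes[m], boxes[j])
            if u < alpha:                              # little overlap: keep score
                continue
            elif u < beta:                             # moderate overlap: gentle,
                scores[j] *= 1.0 - k_i * u             # class-balanced decay
            else:                                      # strong overlap: full decay
                scores[j] *= 1.0 - u
            if scores[j] < score_thresh:
                idx.remove(j)
    return keep, scores                                # (c) stop when no boxes remain
```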
The improved soft-NMS target detection method based on unbalanced data provided by the invention takes into account the long-tail data phenomenon widely present in practical target detection applications, introduces the balance coefficient of the unbalanced data, redefines the confidence update formula of the target detection frame, and sets the thresholds in a segmented manner, thereby reducing the influence of data imbalance on the algorithm's performance, protecting tail-class target detection frames from excessive suppression, and alleviating the missed detection of tail-class targets.
It will be understood that the invention has been described in terms of several embodiments, and that various changes and equivalents may be made to these features and embodiments by those skilled in the art without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.
Claims (4)
1. An improved soft-NMS target detection method based on imbalance data, comprising the steps of:
step one, acquiring unbalanced training data of target detection, and calculating a balance coefficient of each type of target;
step two, training a target detection model based on Faster RCNN;
step three, inputting unbalanced target detection data to be detected into a Faster RCNN target detection model to obtain a plurality of target candidate frames;
step four, grouping all target candidate frames according to different category labels, and sorting all target candidate frames in each category according to confidence;
step five, for each category, adopting an improved soft-NMS method based on unbalanced data to update the confidence coefficient of a target detection frame in the category and screen the detection frame;
and step six, finally, combining the screened target detection frames of all categories to obtain the target detection results of the data to be detected.
2. The improved soft-NMS object detection method based on unbalanced data of claim 1, wherein in the first step, the balance coefficient of each class of objects is calculated, comprising the following operations:
according to the characteristics of the unbalanced data, calculating a balance coefficient k(i) for each target class, wherein the formula is as follows:
where i = 1, 2, …, C; C is the number of target categories in the unbalanced data; and N(i) is the total number of class-i targets.
3. The improved soft-NMS object detection method based on unbalanced data of claim 1, wherein training the object detection model based on Faster RCNN in the second step comprises the following operations:
s1, inputting an unbalanced data training set picture into a Regnet backbone network, and extracting feature graphs C2, C3, C4 and C5 of 4 different stages;
s2, inputting the characteristic graphs C2, C3, C4 and C5 of the 4 different stages into an FPN characteristic pyramid for fusion to obtain new characteristic graphs P2, P3, P4 and P5 with 4 different resolutions;
s3, extracting suggestion frames from the feature graphs P2, P3, P4 and P5 through an RPN (Region Proposal Network);
s4, inputting the feature graphs P2, P3, P4 and P5 and the suggestion boxes into the RoI align for pooling synthesis, and then classifying and regressing through a fully connected network to obtain a detection result and corresponding loss;
and S5, repeating the steps from S1 to S4 after the training of one round is completed until the set iteration round is reached, and outputting and storing parameters in a network to obtain a target detection model.
4. The improved soft-NMS object detection method based on unbalanced data according to claim 1, wherein in the fifth step, the improved soft-NMS method based on unbalanced data comprises the following operations:
(a) Selecting a target detection frame with highest confidence in each type of targets, marking as M, and adding the M into the set D;
(b) Calculating the intersection over union (IoU) between each remaining target detection frame and M, and updating the confidence s_j of each target detection frame according to the following formula, in which the threshold is set in a segmented (piecewise) manner:
where j = 1, 2, …, B(i)-1 denotes the j-th detection frame of the i-th target class, and B(i) is the number of detection frames of the i-th target class; U_IoU(M, b_j) is the intersection over union of the detection frame M and the j-th detection frame b_j; k(i) is the balance coefficient of the unbalanced data; and the two segment thresholds satisfy α < β;
(c) When the target detection frames of all classes obtained in step (b) are empty, the screening of the detection frames is finished.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311375631.7A CN117437465B (en) | 2023-10-23 | 2023-10-23 | Improved soft-NMS target detection method based on unbalanced data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311375631.7A CN117437465B (en) | 2023-10-23 | 2023-10-23 | Improved soft-NMS target detection method based on unbalanced data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117437465A true CN117437465A (en) | 2024-01-23 |
CN117437465B CN117437465B (en) | 2024-06-07 |
Family
ID=89556256
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311375631.7A Active CN117437465B (en) | 2023-10-23 | 2023-10-23 | Improved soft-NMS target detection method based on unbalanced data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117437465B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104134080A (en) * | 2014-08-01 | 2014-11-05 | 重庆大学 | Method and system for automatically detecting roadbed collapse and side slope collapse of road |
CN110348329A (en) * | 2019-06-24 | 2019-10-18 | 电子科技大学 | Pedestrian detection method based on video sequence interframe information |
CN111882546A (en) * | 2020-07-30 | 2020-11-03 | 中原工学院 | Weak supervised learning-based three-branch convolutional network fabric defect detection method |
CN114078106A (en) * | 2020-08-06 | 2022-02-22 | 沈阳中科数控技术股份有限公司 | Defect detection method based on improved Faster R-CNN |
CN114359199A (en) * | 2021-12-28 | 2022-04-15 | 集美大学 | Fish counting method, device, equipment and medium based on deep learning |
CN114821271A (en) * | 2022-05-19 | 2022-07-29 | 平安科技(深圳)有限公司 | Model training method, image description generation device and storage medium |
CN115346135A (en) * | 2022-08-19 | 2022-11-15 | 陕西航天技术应用研究院有限公司 | Optical remote sensing image ship target identification method based on convolutional neural network |
CN116561622A (en) * | 2023-04-25 | 2023-08-08 | 广西师范大学 | Federal learning method for class unbalanced data distribution |
Also Published As
Publication number | Publication date |
---|---|
CN117437465B (en) | 2024-06-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |