CN112561801A - Target detection model training method based on SE-FPN, target detection method and device - Google Patents

Target detection model training method based on SE-FPN, target detection method and device

Info

Publication number
CN112561801A
CN112561801A
Authority
CN
China
Prior art keywords
fpn
target detection
detection model
target
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011560657.5A
Other languages
Chinese (zh)
Inventor
谷晓琳 (Gu Xiaolin)
杨敏 (Yang Min)
张燚 (Zhang Yi)
刘科 (Liu Ke)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sunwise Space Technology Ltd
Original Assignee
Beijing Sunwise Space Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sunwise Space Technology Ltd filed Critical Beijing Sunwise Space Technology Ltd
Priority to CN202011560657.5A priority Critical patent/CN112561801A/en
Publication of CN112561801A publication Critical patent/CN112561801A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4038Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The SE-FPN-based target detection model training method, target detection method and device comprise the following steps: scaling a plurality of training pictures by different scaling coefficients and stitching them into a new picture, wherein the new picture contains a plurality of targets of different sizes; distributing the targets of different sizes to different pyramid feature layers of the SE-FPN target detection model according to a preset distribution strategy; in each pyramid feature layer, finding the m positions closest to the center point according to the truth value of the training samples of that layer, calculating the DIoU D_g between all anchors at the m positions and the truth value, calculating the mean m_g and standard deviation v_g of D_g to obtain a threshold t_g, and selecting the anchors whose D_g is greater than t_g and whose center positions lie inside the target box for output; and calculating a classification loss function and a position regression function and training the model through a back-propagation algorithm. A target detection network model based on SE-FPN is constructed, the image preprocessing method and the sample selection strategy are improved, and the trained model is applied to target detection, improving target detection efficiency.

Description

Target detection model training method based on SE-FPN, target detection method and device
Technical Field
The invention relates to the field of computer vision, in particular to a target detection model training method based on SE-FPN, a target detection method and a target detection device.
Background
Target detection is a fundamental research topic in the field of computer vision and is widely applied in fields such as unmanned driving, intelligent monitoring and automatic target recognition. Traditional target detection methods mainly perform region selection and localization with a sliding window and then classify the target with a classifier such as a Support Vector Machine (SVM). With the development of deep learning, convolutional neural networks have achieved a series of results in target detection; compared with traditional methods, detection methods based on deep convolutional network models have advantages such as autonomous feature extraction and strong generalization ability, and have become one of the important research subjects in the target detection field.
At present, deep-learning-based target detection methods fall mainly into two types. One type is the two-stage object detector, such as Faster-RCNN, which first extracts candidate regions with a region proposal network and then sends the candidate regions to a detection network for object classification and location regression. Two-stage detectors have high detection accuracy but low speed and have difficulty meeting real-time requirements. The other type is the single-stage detector, such as YOLO, which divides the image into multiple grids and directly predicts, for each grid, the probability that it contains a target together with the bounding box and class information. Single-stage detectors are fast and can detect targets in real time, but their detection accuracy is lower, especially for dense targets and small targets.
Two main problems exist in the current single-stage detector:
(1) Target prediction is performed by the grid where the target center is located. When targets are dense, one grid may contain several targets but finally outputs only one set of prediction parameters, so only one target can be predicted accurately while the other targets are ignored or suffer large errors.
(2) The detector detects images hierarchically with a feature pyramid structure, and each feature layer is predicted with specific anchors. In data sets collected in actual projects, however, the target scales are often unevenly distributed: most targets are trained intensively on one pyramid layer while the other two layers cannot be trained well, which not only wastes resources but also affects detection accuracy.
Disclosure of Invention
In view of the above situation, the present invention provides a SE-FPN-based target detection model training method, a target detection method and apparatus, an electronic device, and a readable storage medium. A target detection network model based on SE-FPN is constructed, the image preprocessing method and the sample selection strategy are improved, and the trained model is applied to target detection, improving target detection efficiency.
In order to realize the purpose of the invention, the following scheme is adopted:
a target detection model training method based on SE-FPN comprises the following steps:
scaling a plurality of training pictures acquired from a data set by different scaling coefficients and stitching them into a new picture, and applying the same scaling and stitching to the target labels corresponding to the training pictures, wherein the new picture contains a plurality of targets of different sizes;
distributing a plurality of targets with different sizes to different pyramid feature layers of the SE-FPN target detection model according to a preset distribution strategy;
in each pyramid feature layer, finding the m positions closest to the center point according to the truth value of the training samples assigned to that layer; calculating the DIoU D_g between all anchors at the m positions and the truth value; calculating the mean m_g and standard deviation v_g of D_g to obtain the threshold t_g = max(0.2, |m_g − v_g|); selecting the anchors whose D_g is greater than t_g and whose center positions lie inside the target box for output; if no anchor qualifies, selecting the anchor with the maximum D_g and its center position for output;
and respectively calculating a classification loss function and a position regression function, and training the model through a back propagation algorithm.
Further, the SE-FPN target detection model comprises a three-layer feature pyramid, and the plurality of targets of different sizes are distributed to the different pyramid feature layers of the SE-FPN target detection model according to the following distribution strategy:
if the width and the height of the target are both greater than or equal to the first allocation threshold T_l, the target is assigned to the first feature pyramid layer, i.e. the uppermost pyramid layer;
if the width and the height of the target are both greater than the second allocation threshold T_m but less than the first allocation threshold T_l, the target is assigned to the second pyramid layer;
otherwise, the target is assigned to the third pyramid layer;
the first allocation threshold T_l is greater than the second allocation threshold T_m.
Further, the DIoU D_g is calculated as:
D_g = IoU − ρ²(b, b_gt) / d²
where IoU denotes the IoU value between the prediction and the sample truth, ρ(b, b_gt) denotes the Euclidean distance between the predicted center b and the true target center b_gt, and d denotes the diagonal distance of the smallest rectangle that can cover both the anchor and the target box.
Further, the classification loss function adopts a cross entropy loss function, and the position regression function adopts a CIoU loss function.
Furthermore, the SE-FPN target detection model comprises a backbone network, the SE-FPN and a head detection module;
the backbone network is used for extracting features of the image input into the SE-FPN target detection model to obtain a plurality of feature layers with different scaling scales, which are used for constructing the SE-FPN;
the head detection module comprises two parts: a classification module, which first uses w convolutional layers for feature extraction, then classifies with a fully-connected layer and outputs the classification result; and a position regression module, which uses s convolutional layers for the final position regression and outputs the center position coordinates and scale information of the target;
the SE-FPN is constructed by the following steps:
extracting features f_k of different layers through the backbone network, where k ∈ {1, 2, 3} and f_1, f_2, f_3 run from top to bottom; the multilayer feature pyramid FPN is then constructed layer by layer from top to bottom, building the FPN for the i-th layer with i starting from 1:
S201: if i = 1, the original feature is sent directly to S202 to obtain a new feature; if i > 1, the new feature f'_{i-1} of the upper layer first passes through the SE module to generate a feature s_{i-1} with different channel weights; s_{i-1} is sent into a 1×1 convolution module and up-sampled to generate a feature u_i with the same resolution and channel number as the i-th layer feature; u_i is fused with the original feature f_i to obtain the new feature f'_i of the i-th layer, and then S202 is executed;
S202: the new feature f'_i from S201 passes through n groups of convolutional layers to generate the final feature p_i, which is sent as the i-th layer feature to the head detection module for classification and position regression; i is incremented by 1 and the procedure returns to S201.
An object detection model training device based on SE-FPN comprises:
the image preprocessing module is used for scaling a plurality of training pictures acquired from the data set by different scaling coefficients and stitching them into a new picture, and for applying the same scaling and stitching to the target labels corresponding to the training pictures, wherein the new picture contains a plurality of targets of different sizes;
the target distribution module is used for distributing a plurality of targets with different sizes to different pyramid feature layers of the SE-FPN target detection model according to a preset distribution strategy;
a sample selection module, used, in each pyramid feature layer, for finding the m positions closest to the center point according to the truth value of the training samples assigned to that layer; calculating the DIoU D_g between all anchors at the m positions and the truth value; calculating the mean m_g and standard deviation v_g of D_g to obtain the threshold t_g = max(0.2, |m_g − v_g|); selecting the anchors whose D_g is greater than t_g and whose center positions lie inside the target box for output; and, if no anchor qualifies, selecting the anchor with the maximum D_g and its center position for output;
and the calculation module is used for calculating a classification loss function and a position regression function respectively and training the model through a back propagation algorithm.
A target detection method based on a SE-FPN target detection model comprises the following steps:
acquiring an image to be detected;
inputting an image to be detected into a SE-FPN target detection model obtained by pre-training; the SE-FPN target detection model is obtained by training through a SE-FPN-based target detection model training method;
detecting an image to be detected through an SE-FPN target detection model to obtain a target detection result; the target detection result comprises position information of the target object in the image to be detected.
An object detection device based on an object detection model of SE-FPN, comprising:
the acquisition module is used for acquiring an image to be detected;
the input module is used for inputting the image to be detected into a pre-trained SE-FPN target detection model; the SE-FPN target detection model is obtained by training through a SE-FPN-based target detection model training method;
the detection module is used for detecting the image to be detected through the SE-FPN target detection model to obtain a target detection result; the target detection result comprises position information of the target object in the image to be detected.
An electronic device, comprising: at least one processor and a memory; wherein the memory stores computer-executable instructions; and the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the SE-FPN-based target detection model training method or the target detection method based on the SE-FPN target detection model.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, controls an apparatus in which the storage medium is located to perform a method for training a target detection model based on SE-FPN, or to perform a method for target detection based on a SE-FPN target detection model.
The invention has the beneficial effects that:
1. The SE module is introduced into the FPN network structure, and the importance of different channels is learned during training, so that the network pays more attention to channels containing more effective information and suppresses unimportant channels, better guiding the fusion of high-level and low-level features; the low-level feature map then has not only accurate position information but also rich semantic information.
2. The scaling coefficients are calculated according to the distribution of targets in the data set, so that targets are distributed uniformly after the training samples are scaled; targets are then distributed to different feature layers according to the target distribution strategy, ensuring that each feature pyramid layer obtains sufficient training samples and reducing the under-training of some layers of the network model caused by uneven target distribution.
3. A sample selection strategy is provided in which the training samples lie in the m regions consisting of the grid where the center is located and its nearest neighbors, and positive samples are selected according to the mean and standard deviation of the DIoU, so that dense targets can be dispersed into multiple positive samples, increasing the learning probability of dense targets.
4. The mean m_g indicates the degree of matching between the preset anchors and the truth value: if the mean is high, the threshold should be raised to adjust the positive samples; if the mean is low, the threshold should be lowered. The standard deviation v_g indicates the dispersion of the targets in the layer: a high standard deviation indicates relatively dispersed targets, so the threshold is lowered to ensure that each target has a suitable anchor; a low standard deviation indicates relatively concentrated targets, so the threshold can be raised to ensure high-quality anchors. By combining the mean and the standard deviation, the threshold does not need to be designed manually, and the data information is better used to select positive and negative samples autonomously.
Drawings
The drawings described herein are for illustrative purposes only of selected embodiments and not all possible implementations, and are not intended to limit the scope of the present disclosure.
Fig. 1 is a flow chart of a target detection model training method according to an embodiment of the present application.
FIG. 2 is a schematic structural diagram of a SE-FPN target detection model according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of a CSP-Darknet network structure model used in the backbone network according to the embodiment of the present application.
FIG. 4 is a schematic diagram of a SE-FPN structure according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of an SE module according to an embodiment of the present application.
Fig. 6 shows how the final feature map is obtained by multiplying the channel weights with the original feature map according to the embodiment of the present application.
Fig. 7 is a block diagram of a structure of a target detection model training apparatus according to an embodiment of the present application.
Fig. 8 is a block diagram of a target detection method according to an embodiment of the present application.
Fig. 9 is a block diagram of a target detection apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
Example one
The present embodiment provides a target detection model training method based on SE-FPN, as shown in fig. 1, the process includes the following steps:
s100: firstly, randomly taking out four training pictures from a data set, carrying out scaling according to different scaling coefficients, and then splicing into a new picture, wherein the resolution is the input resolution set by the network model, and the new picture comprises a plurality of targets with different sizes.
The target labels corresponding to the four pictures are scaled and stitched in the same way as the pictures, keeping the new image and the target labels consistent.
When the targets belong to multiple classes, the number of samples of each class is counted first, the probability of each sample being drawn is calculated from the statistics to generate the sampling probabilities of the data set, and samples are drawn according to these sampling probabilities each time.
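A minimal sketch of this preprocessing step, assuming OpenCV-style images and labels as (x1, y1, x2, y2, class) arrays in pixel coordinates; the function and parameter names are illustrative, not from the patent, and for simplicity each picture is scaled to one quadrant, whereas the scaling coefficients may in general differ per picture:

    import cv2
    import numpy as np

    def mosaic(pictures, labels, out_size=608):
        """Scale four pictures and stitch them into one out_size x out_size
        picture; the labels are scaled and shifted identically."""
        canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
        half = out_size // 2
        offsets = [(0, 0), (0, half), (half, 0), (half, half)]  # tile corners
        new_labels = []
        for img, lab, (oy, ox) in zip(pictures, labels, offsets):
            sy, sx = half / img.shape[0], half / img.shape[1]   # per-picture coefficients
            canvas[oy:oy + half, ox:ox + half] = cv2.resize(img, (half, half))
            lab = lab.astype(np.float32)
            lab[:, [0, 2]] = lab[:, [0, 2]] * sx + ox           # x1, x2 scaled + shifted
            lab[:, [1, 3]] = lab[:, [1, 3]] * sy + oy           # y1, y2 scaled + shifted
            new_labels.append(lab)
        return canvas, np.concatenate(new_labels)

    def sampling_probabilities(sample_classes, num_classes):
        """Per-sample draw probability from the class counts, so that rare
        classes are drawn more often (one possible reading of S100)."""
        counts = np.bincount(sample_classes, minlength=num_classes)
        w = 1.0 / np.maximum(counts[sample_classes], 1)
        return w / w.sum()

Samples would then be drawn with, e.g., numpy.random.choice using these probabilities before forming each mosaic.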
S200: and distributing a plurality of targets with different sizes to different pyramid feature layers of the SE-FPN target detection model according to a preset distribution strategy.
The SE-FPN target detection model comprises a three-layer feature pyramid, and the plurality of targets of different sizes are distributed to the different pyramid feature layers of the SE-FPN target detection model according to the following distribution strategy:
if the width and the height of the target are both greater than or equal to the first allocation threshold T_l, the target is assigned to the first feature pyramid layer, i.e. the uppermost pyramid layer; if the width and the height are both greater than the second allocation threshold T_m but less than T_l, the target is assigned to the second pyramid layer; otherwise, the target is assigned to the third pyramid layer; T_l is greater than T_m. In this example T_m = 16 and T_l = 32, i.e.:
layer(x) = 1 if w_gt ≥ T_l and h_gt ≥ T_l; 2 if T_m < w_gt < T_l and T_m < h_gt < T_l; 3 otherwise
where x denotes the target, w_gt its width and h_gt its height.
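Written as code, the allocation strategy is a small pure function (a sketch; the layer indices 1-3 follow the text):

    def assign_pyramid_layer(w_gt, h_gt, t_m=16, t_l=32):
        """Distribute a target to a feature pyramid layer by its
        ground-truth width and height (T_m = 16, T_l = 32 in this example)."""
        if w_gt >= t_l and h_gt >= t_l:
            return 1  # uppermost layer: large targets
        if t_m < w_gt < t_l and t_m < h_gt < t_l:
            return 2  # middle layer: medium targets
        return 3      # bottom layer: small targets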
The SE-FPN target detection model comprises a backbone network, a SE-FPN and a head detection module, and is shown in FIG. 2.
The backbone network is used for extracting features of the image input into the SE-FPN target detection model to obtain a plurality of feature layers with different scaling scales. In this example, the backbone network uses the CSP-Darknet model to extract three sets of features at different scaling scales (typically with scaling factors of 8, 16 and 32) for constructing the multi-level feature pyramid network FPN. CSP-Darknet is a combined network structure, shown in Fig. 3, in which a CSP (Cross Stage Partial) module is added to the original Darknet network, effectively enhancing the learning capacity of the convolutional network while reducing the amount of computation.
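As an illustration only (not the patent's exact CSP-Darknet architecture), a CSP-style block splits the channels, passes one half through the convolution stack, and concatenates it back with the untouched half; PyTorch, even channel counts assumed:

    import torch
    import torch.nn as nn

    class CSPBlock(nn.Module):
        """Illustrative Cross Stage Partial block: half the channels pass
        through the conv stack, the other half bypass it, then concat."""
        def __init__(self, channels, n_convs=2):
            super().__init__()
            half = channels // 2
            layers = []
            for _ in range(n_convs):
                layers += [nn.Conv2d(half, half, 3, padding=1, bias=False),
                           nn.BatchNorm2d(half), nn.LeakyReLU(0.1)]
            self.convs = nn.Sequential(*layers)
            self.fuse = nn.Conv2d(channels, channels, 1)  # recombine both paths
        def forward(self, x):
            a, b = x.chunk(2, dim=1)  # partial split across channels
            return self.fuse(torch.cat([a, self.convs(b)], dim=1))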
A conventional FPN directly connects the high-level features to the low-level features by up-sampling and then outputs the result as new features. The invention adds an SE module to the FPN module and, by introducing an attention mechanism, highlights the channels of the high-level features that are more instructive for the lower layers, so that the two sets of features fuse better; the specific network structure of the SE module is shown in Fig. 5. The SE module first compresses the feature map into a 1×1×C vector through a global pooling layer to obtain channel-level global features; the global features are then compressed to 1×1×(C/r) through a first fully-connected layer, passed through an activation function, and expanded back to a 1×1×C vector through a second fully-connected layer. The relationships among channels are thus learned to obtain the weights of the different channels, which are multiplied with the original feature map to obtain the final feature map; the concrete implementation is shown in Fig. 6. This attention mechanism lets the model focus more on channel features carrying much information while suppressing unimportant channel features.
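A sketch of the SE module as just described: global pooling, an FC layer down to C/r, an activation, an FC layer back to C, then channel-wise multiplication; the reduction ratio r and the ReLU/sigmoid activation choices are assumptions:

    import torch.nn as nn

    class SEModule(nn.Module):
        """Squeeze-and-Excitation: learn per-channel weights and rescale
        the original feature map with them."""
        def __init__(self, channels, r=16):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze to 1 x 1 x C
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // r), nn.ReLU(inplace=True),
                nn.Linear(channels // r, channels), nn.Sigmoid())
        def forward(self, x):
            b, c, _, _ = x.shape
            w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
            return x * w  # multiply weights onto the original feature map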
The head detection module comprises two parts: a classification module, which first uses w convolutional layers for feature extraction, then classifies with a fully-connected layer and outputs the classification result; and a position regression module, which uses s convolutional layers for the final position regression and outputs the center position coordinates and scale information of the target.
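A sketch of the two-branch head under this description (w and s are the convolution-layer counts named above; the channel width and the 4-value box layout are assumptions):

    import torch.nn as nn

    class DetectionHead(nn.Module):
        """Classification branch: w convs then a fully-connected classifier.
        Regression branch: s convs ending in 4 outputs (cx, cy, w, h)."""
        def __init__(self, channels, num_classes, w=2, s=2):
            super().__init__()
            def convs(n):
                return nn.Sequential(*[nn.Sequential(
                    nn.Conv2d(channels, channels, 3, padding=1),
                    nn.ReLU(inplace=True)) for _ in range(n)])
            self.cls_convs = convs(w)
            self.cls_fc = nn.Linear(channels, num_classes)  # per-location classifier
            self.reg = nn.Sequential(convs(s), nn.Conv2d(channels, 4, 1))
        def forward(self, x):
            c = self.cls_convs(x).permute(0, 2, 3, 1)  # B, H, W, C for the FC layer
            return self.cls_fc(c), self.reg(x)         # class scores, box outputs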
The structural diagram of the SE-FPN is shown in Fig. 4, and it is constructed by the following steps:
extracting features f_k of different layers through the backbone network, where k ∈ {1, 2, 3} and f_1, f_2, f_3 run from top to bottom; the multilayer feature pyramid FPN is then constructed layer by layer from top to bottom, building the FPN for the i-th layer with i starting from 1:
S201: if i = 1, the original feature is sent directly to S202 to obtain a new feature; if i > 1, the new feature f'_{i-1} of the upper layer first passes through the SE module to generate a feature s_{i-1} with different channel weights; s_{i-1} is sent into a 1×1 convolution module and up-sampled (methods such as linear interpolation, bilinear interpolation and deconvolution may all be used) to generate a feature u_i with the same resolution and channel number as the i-th layer feature f_i; u_i and the original feature f_i can be fused in various ways, such as concat or element-wise sum; this embodiment adopts the concat method to obtain the new feature f'_i of the i-th layer, and then S202 is executed;
S202: the new feature f'_i from S201 passes through n groups of convolutional layers to generate the final feature p_i, which is sent as the i-th layer feature to the head detection module for classification and position regression; i is incremented by 1 and the procedure returns to S201.
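Putting S201 and S202 together, a top-down construction sketch (reusing the SEModule sketch above; the channel widths, n, and bilinear up-sampling are assumptions, and concat fusion is used as in this embodiment):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SEFPN(nn.Module):
        """Top-down FPN where the upper layer's feature is SE-reweighted,
        1x1-projected, up-sampled and concatenated with the current layer."""
        def __init__(self, channels=(512, 256, 128), n=2):
            super().__init__()
            self.n_layers = len(channels)
            # f'_1 keeps channels[0]; f'_i (i > 1) has 2 * channels[i] after concat
            fused = [channels[0]] + [2 * c for c in channels[1:]]
            self.se = nn.ModuleList(SEModule(fused[i]) for i in range(self.n_layers - 1))
            self.proj = nn.ModuleList(nn.Conv2d(fused[i], channels[i + 1], 1)
                                      for i in range(self.n_layers - 1))
            def conv_stack(c_in, c_out, n):
                mods, c = [], c_in
                for _ in range(n):
                    mods += [nn.Conv2d(c, c_out, 3, padding=1), nn.ReLU(inplace=True)]
                    c = c_out
                return nn.Sequential(*mods)
            self.out = nn.ModuleList(conv_stack(fused[i], channels[i], n)
                                     for i in range(self.n_layers))
        def forward(self, feats):           # feats = [f_1, f_2, f_3], top to bottom
            prev, outs = feats[0], []       # i = 1: f'_1 is the original feature
            outs.append(self.out[0](prev))  # p_1 after n conv groups (S202)
            for i in range(1, self.n_layers):
                s = self.se[i - 1](prev)                              # SE reweighting
                u = F.interpolate(self.proj[i - 1](s), size=feats[i].shape[-2:],
                                  mode='bilinear', align_corners=False)  # u_i
                prev = torch.cat([u, feats[i]], dim=1)                # f'_i by concat
                outs.append(self.out[i](prev))                        # p_i (S202)
            return outs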
S300: in each pyramid feature layer, the m positions closest to the center point are found according to the truth value of the training samples assigned to that layer; the DIoU D_g between all anchors at the m positions and the truth value is calculated; the mean m_g and standard deviation v_g of D_g are calculated to obtain the threshold t_g = max(0.2, |m_g − v_g|); the anchors whose D_g is greater than t_g and whose center positions lie inside the target box are selected for output; if no anchor qualifies, the anchor with the maximum D_g and its center position are output.
The DIoU D_g is calculated as:
D_g = IoU − ρ²(b, b_gt) / d²
where IoU denotes the IoU value between the prediction and the sample truth, ρ(b, b_gt) denotes the Euclidean distance between the predicted center b and the true target center b_gt, and d denotes the diagonal distance of the smallest rectangle that can cover both the anchor and the target box.
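A sketch of S300's DIoU and adaptive threshold, with boxes as (x1, y1, x2, y2) NumPy arrays; the check that a selected anchor's center lies inside the target box is omitted for brevity:

    import numpy as np

    def diou(anchors, gt):
        """D_g = IoU - rho^2(b, b_gt) / d^2 for each anchor vs one truth box."""
        ax1, ay1, ax2, ay2 = anchors.T
        gx1, gy1, gx2, gy2 = gt
        ix1, iy1 = np.maximum(ax1, gx1), np.maximum(ay1, gy1)
        ix2, iy2 = np.minimum(ax2, gx2), np.minimum(ay2, gy2)
        inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
        union = ((ax2 - ax1) * (ay2 - ay1)
                 + (gx2 - gx1) * (gy2 - gy1) - inter)
        iou = inter / np.maximum(union, 1e-9)
        # rho^2: squared distance between anchor centers and the true center
        rho2 = (((ax1 + ax2) - (gx1 + gx2)) ** 2
                + ((ay1 + ay2) - (gy1 + gy2)) ** 2) / 4.0
        # d^2: squared diagonal of the smallest rectangle covering both boxes
        cx1, cy1 = np.minimum(ax1, gx1), np.minimum(ay1, gy1)
        cx2, cy2 = np.maximum(ax2, gx2), np.maximum(ay2, gy2)
        d2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
        return iou - rho2 / np.maximum(d2, 1e-9)

    def select_positive_anchors(anchors, gt):
        """Adaptive positive-sample selection with t_g = max(0.2, |m_g - v_g|);
        falls back to the maximum-D_g anchor if none qualifies."""
        d_g = diou(anchors, gt)
        t_g = max(0.2, abs(d_g.mean() - d_g.std()))
        pos = np.flatnonzero(d_g > t_g)
        return pos if pos.size else np.array([d_g.argmax()])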
S400: the classification loss function and the position regression function are calculated respectively with the samples obtained in S300, and the model is trained through a back-propagation algorithm. In this example the classification loss function is the cross-entropy loss and the position regression function is the CIoU loss:
L_cls = −[C·log(Ĉ) + (1 − C)·log(1 − Ĉ)]
L_CIoU = 1 − IoU + ρ²(b, b_gt)/d² + α·υ, with υ = (4/π²)·(arctan(w_gt/h_gt) − arctan(w_p/h_p))²
where C denotes the sample class, Ĉ denotes the sample class predicted by the network, α is a weight coefficient, and υ measures the similarity of the aspect ratios as defined above; w_gt denotes the sample width, h_gt the sample height, h_p the predicted sample height and w_p the predicted sample width.
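A sketch of the CIoU position loss following the formula above (PyTorch; boxes as (x1, y1, x2, y2) tensors, with α computed without gradient as is common):

    import math
    import torch

    def ciou_loss(pred, target, eps=1e-9):
        """L_CIoU = 1 - IoU + rho^2/d^2 + alpha * v, where
        v = (4/pi^2) * (arctan(w_gt/h_gt) - arctan(w_p/h_p))^2
        and alpha = v / ((1 - IoU) + v)."""
        px1, py1, px2, py2 = pred.unbind(-1)
        gx1, gy1, gx2, gy2 = target.unbind(-1)
        inter = ((torch.min(px2, gx2) - torch.max(px1, gx1)).clamp(0)
                 * (torch.min(py2, gy2) - torch.max(py1, gy1)).clamp(0))
        union = ((px2 - px1) * (py2 - py1)
                 + (gx2 - gx1) * (gy2 - gy1) - inter)
        iou = inter / (union + eps)
        # rho^2: squared distance between the predicted and true centers
        rho2 = (((px1 + px2) - (gx1 + gx2)) ** 2
                + ((py1 + py2) - (gy1 + gy2)) ** 2) / 4.0
        # d^2: squared diagonal of the smallest enclosing rectangle
        d2 = ((torch.max(px2, gx2) - torch.min(px1, gx1)) ** 2
              + (torch.max(py2, gy2) - torch.min(py1, gy1)) ** 2)
        v = (4 / math.pi ** 2) * (torch.atan((gx2 - gx1) / (gy2 - gy1 + eps))
                                  - torch.atan((px2 - px1) / (py2 - py1 + eps))) ** 2
        with torch.no_grad():
            alpha = v / ((1 - iou) + v + eps)
        return 1 - iou + rho2 / (d2 + eps) + alpha * v

In training one would average this per-box loss over the positive samples selected in S300.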
Example two
This example provides a target detection model training device based on SE-FPN; as shown in FIG. 7, the device comprises an image preprocessing module, a feature extraction module, a target distribution module, a sample selection module and a calculation module.
The image preprocessing module zooms a plurality of training pictures acquired from a data set according to different zoom coefficients, then splices the training pictures into a new picture, and performs the same zooming and splicing processing on target labels corresponding to the training pictures, wherein the new picture comprises a plurality of targets with different sizes.
And the target distribution module distributes a plurality of targets with different sizes to different pyramid characteristic layers of the SE-FPN target detection model according to a preset distribution strategy.
In each pyramid feature layer, the sample selection module finds the m positions closest to the center point according to the truth value of the training samples assigned to that layer, calculates the DIoU D_g between all anchors at the m positions and the truth value, calculates the mean m_g and standard deviation v_g of D_g to obtain the threshold t_g = max(0.2, |m_g − v_g|), and selects the anchors whose D_g is greater than t_g and whose center positions lie inside the target box for output; if no anchor qualifies, the anchor with the maximum D_g and its center position are output.
The calculation module respectively calculates a classification loss function and a position regression function, and trains the model through a back propagation algorithm.
EXAMPLE III
The present embodiment provides a target detection method based on a SE-FPN target detection model, as shown in fig. 8, including the following steps:
s100: and acquiring an image to be detected.
S200: inputting the image to be detected into the SE-FPN target detection model obtained by pre-training; the SE-FPN target detection model is trained with the SE-FPN-based target detection model training method of the first embodiment.
S300: detecting an image to be detected through an SE-FPN target detection model to obtain a target detection result; the target detection result comprises position information of the target object in the image to be detected.
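As a usage illustration only (the model file name, input resolution and output layout below are hypothetical):

    import cv2
    import torch

    model = torch.load('se_fpn_detector.pt')  # hypothetical pre-trained model file
    model.eval()
    img = cv2.imread('to_detect.jpg')
    x = torch.from_numpy(cv2.resize(img, (608, 608))).permute(2, 0, 1)
    x = x.float().unsqueeze(0) / 255.0         # 1 x 3 x H x W, normalized
    with torch.no_grad():
        scores, boxes = model(x)               # class scores + target positions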
Example four
The present embodiment provides an object detection apparatus based on an object detection model of SE-FPN, as shown in fig. 9, including: the device comprises an acquisition module, an input module and a detection module.
The acquisition module acquires an image to be detected.
The input module inputs an image to be detected into a SE-FPN target detection model obtained by pre-training; the SE-FPN target detection model is obtained by training by adopting the SE-FPN-based target detection model training method in the first embodiment.
The detection module detects an image to be detected through an SE-FPN target detection model to obtain a target detection result; the target detection result comprises position information of the target object in the image to be detected.
EXAMPLE five
This example provides an electronic device, comprising: at least one processor and a memory; wherein the memory stores computer-executable instructions; and the at least one processor executes the computer-executable instructions stored in the memory, causing it to perform the SE-FPN-based target detection model training method of the first embodiment, or the target detection method based on the SE-FPN target detection model of the third embodiment.
EXAMPLE six
The present example provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the apparatus in which the storage medium is located is controlled to execute the target detection model training method based on SE-FPN according to the first embodiment, or execute the target detection method based on SE-FPN according to the third embodiment.
The foregoing is merely a preferred embodiment of this invention and is not intended to be exhaustive or to limit the invention to the precise form disclosed. It will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the invention.

Claims (10)

1. A target detection model training method based on SE-FPN is characterized by comprising the following steps:
zooming a plurality of training pictures acquired from a data set according to different zooming coefficients, splicing the training pictures into a new picture, and performing the same zooming and splicing treatment on target labels corresponding to the training pictures, wherein the new picture comprises a plurality of targets with different sizes;
distributing the targets with different sizes to different pyramid feature layers of the SE-FPN target detection model according to a preset distribution strategy;
in each pyramid feature layer, finding the m positions closest to the center point according to the truth value of the training samples assigned to that layer; calculating the DIoU D_g between all anchors at the m positions and the truth value; calculating the mean m_g and standard deviation v_g of D_g to obtain the threshold t_g = max(0.2, |m_g − v_g|); selecting the anchors whose D_g is greater than t_g and whose center positions lie inside the target box for output; if no anchor qualifies, selecting the anchor with the maximum D_g and its center position for output;
and respectively calculating a classification loss function and a position regression function, and training the model through a back propagation algorithm.
2. The SE-FPN based target detection model training method as claimed in claim 1, wherein the SE-FPN target detection model comprises a three-layer feature pyramid, and the plurality of targets of different sizes are distributed to the different pyramid feature layers of the SE-FPN target detection model according to the following distribution strategy:
if the width and the height of the target are both greater than or equal to the first allocation threshold T_l, the target is assigned to the first feature pyramid layer, i.e. the uppermost pyramid layer;
if the width and the height of the target are both greater than the second allocation threshold T_m but less than the first allocation threshold T_l, the target is assigned to the second pyramid layer;
otherwise, the target is assigned to the third pyramid layer;
the first allocation threshold T_l is greater than the second allocation threshold T_m.
3. The SE-FPN based target detection model training method as claimed in claim 1, wherein the DIoU D_g is calculated as:
D_g = IoU − ρ²(b, b_gt) / d²
where IoU denotes the IoU value between the prediction and the sample truth, ρ(b, b_gt) denotes the Euclidean distance between the predicted center b and the true target center b_gt, and d denotes the diagonal distance of the smallest rectangle that can cover both the anchor and the target box.
4. The SE-FPN based target detection model training method as claimed in claim 1, wherein the classification loss function adopts a cross entropy loss function, and the position regression function adopts a CIoU loss function.
5. The SE-FPN based target detection model training method as claimed in claim 1, wherein the SE-FPN target detection model comprises a backbone network, the SE-FPN and a head detection module;
the backbone network is used for extracting features of the image input into the SE-FPN target detection model to obtain a plurality of feature layers with different scaling scales, which are used for constructing the SE-FPN;
the head detection module comprises two parts: a classification module, which first uses w convolutional layers for feature extraction, then classifies with a fully-connected layer and outputs the classification result; and a position regression module, which uses s convolutional layers for the final position regression and outputs the center position coordinates and scale information of the target;
the SE-FPN is constructed by the following steps:
extracting features f_k of different layers through the backbone network, where k ∈ {1, 2, 3} and f_1, f_2, f_3 run from top to bottom; the multilayer feature pyramid FPN is then constructed layer by layer from top to bottom, building the FPN for the i-th layer with i starting from 1:
S201: if i = 1, the original feature is sent directly to S202 to obtain a new feature; if i > 1, the new feature f'_{i-1} of the upper layer first passes through the SE module to generate a feature s_{i-1} with different channel weights; s_{i-1} is sent into a 1×1 convolution module and up-sampled to generate a feature u_i with the same resolution and channel number as the i-th layer feature; u_i is fused with the original feature f_i to obtain the new feature f'_i of the i-th layer, and then S202 is executed;
S202: the new feature f'_i from S201 passes through n groups of convolutional layers to generate the final feature p_i, which is sent as the i-th layer feature to the head detection module for classification and position regression; i is incremented by 1 and the procedure returns to S201.
6. An object detection model training device based on SE-FPN is characterized by comprising:
the image preprocessing module is used for zooming a plurality of training pictures acquired from the data set according to different zoom coefficients and splicing the training pictures into a new picture; the target labels corresponding to the multiple training pictures are subjected to the same scaling and splicing treatment; the new picture comprises a plurality of different sized targets;
the target distribution module is used for distributing the targets with different sizes to different pyramid feature layers of the SE-FPN target detection model according to a preset distribution strategy;
a sample selection module, used, in each pyramid feature layer, for finding the m positions closest to the center point according to the truth value of the training samples assigned to that layer; calculating the DIoU D_g between all anchors at the m positions and the truth value; calculating the mean m_g and standard deviation v_g of D_g to obtain the threshold t_g = max(0.2, |m_g − v_g|); selecting the anchors whose D_g is greater than t_g and whose center positions lie inside the target box for output; and, if no anchor qualifies, selecting the anchor with the maximum D_g and its center position for output;
and the calculation module is used for calculating a classification loss function and a position regression function respectively and training the model through a back propagation algorithm.
7. A target detection method based on a SE-FPN target detection model is characterized by comprising the following steps:
acquiring an image to be detected;
inputting the image to be detected into a pre-trained SE-FPN target detection model; wherein, the SE-FPN target detection model is obtained by training by adopting the SE-FPN based target detection model training method of any one of claims 1-5;
detecting the image to be detected through the SE-FPN target detection model to obtain a target detection result; the target detection result comprises position information of a target object in the image to be detected.
8. An object detection device based on an object detection model of SE-FPN, comprising:
the acquisition module is used for acquiring an image to be detected;
the input module is used for inputting the image to be detected to a pre-trained SE-FPN target detection model; wherein, the SE-FPN target detection model is obtained by training by adopting the SE-FPN based target detection model training method of any one of claims 1-5;
the detection module is used for detecting the image to be detected through the SE-FPN target detection model to obtain a target detection result; the target detection result comprises position information of a target object in the image to be detected.
9. An electronic device, comprising: at least one processor and memory; wherein the memory stores computer-executable instructions; wherein execution of computer-executable instructions stored in the memory on the at least one processor causes the at least one processor to perform the SE-FPN based object detection model training method of any of claims 1-5 or to perform the SE-FPN based object detection model object detection method of claim 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, controls an apparatus in which the storage medium is located to perform a method for training an object detection model based on SE-FPN according to any one of claims 1 to 5, or to perform a method for object detection based on an SE-FPN object detection model according to claim 7.
CN202011560657.5A 2020-12-25 2020-12-25 Target detection model training method based on SE-FPN, target detection method and device Pending CN112561801A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011560657.5A CN112561801A (en) 2020-12-25 2020-12-25 Target detection model training method based on SE-FPN, target detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011560657.5A CN112561801A (en) 2020-12-25 2020-12-25 Target detection model training method based on SE-FPN, target detection method and device

Publications (1)

Publication Number Publication Date
CN112561801A true CN112561801A (en) 2021-03-26

Family

ID=75032568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011560657.5A Pending CN112561801A (en) 2020-12-25 2020-12-25 Target detection model training method based on SE-FPN, target detection method and device

Country Status (1)

Country Link
CN (1) CN112561801A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113421187A (en) * 2021-06-10 2021-09-21 山东师范大学 Super-resolution reconstruction method, system, storage medium and equipment
CN113421187B (en) * 2021-06-10 2023-01-03 山东师范大学 Super-resolution reconstruction method, system, storage medium and equipment
CN113452912A (en) * 2021-06-25 2021-09-28 山东新一代信息产业技术研究院有限公司 Pan-tilt camera control method, device, equipment and medium for inspection robot
CN113392857A (en) * 2021-08-17 2021-09-14 深圳市爱深盈通信息技术有限公司 Target detection method, device and equipment terminal based on yolo network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination