CN112561801A - Target detection model training method based on SE-FPN, target detection method and device - Google Patents

Target detection model training method based on SE-FPN, target detection method and device

Info

Publication number
CN112561801A
CN112561801A
Authority
CN
China
Prior art keywords
fpn
target detection
detection model
target
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011560657.5A
Other languages
Chinese (zh)
Inventor
谷晓琳 (Gu Xiaolin)
杨敏 (Yang Min)
张燚 (Zhang Yi)
刘科 (Liu Ke)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sunwise Space Technology Ltd
Original Assignee
Beijing Sunwise Space Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sunwise Space Technology Ltd filed Critical Beijing Sunwise Space Technology Ltd
Priority to CN202011560657.5A priority Critical patent/CN112561801A/en
Publication of CN112561801A publication Critical patent/CN112561801A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4038Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The SE-FPN-based target detection model training method, target detection method and device comprise the following steps: scaling a plurality of training pictures by different scaling coefficients and stitching them into a new picture, wherein the new picture contains a plurality of targets of different sizes; distributing the targets of different sizes to different pyramid feature layers of the SE-FPN target detection model according to a preset distribution strategy; in each pyramid feature layer, finding the m positions closest to the center point according to the truth value of the training samples of that layer, calculating the DIoU D_g between all anchors at the m positions and the truth value, calculating the mean m_g and standard deviation v_g of D_g to obtain a threshold t_g, and selecting the anchors whose D_g is greater than t_g and whose center positions lie inside the target box for output; and calculating a classification loss function and a position regression function and training the model through a back-propagation algorithm. A target detection network model based on SE-FPN is constructed, the image preprocessing method and the sample selection strategy are improved, and the trained model is applied to target detection, improving target detection efficiency.

Description

Target detection model training method based on SE-FPN, target detection method and device
Technical Field
The invention relates to the field of computer vision, in particular to a target detection model training method based on SE-FPN, a target detection method and a target detection device.
Background
Target detection is a fundamental research topic in the field of computer vision and is widely applied in fields such as unmanned driving, intelligent monitoring and automatic target recognition. Traditional target detection methods mainly perform region selection and localization with a sliding window and then classify the target with a classifier such as a Support Vector Machine (SVM). With the development of deep learning, convolutional neural networks have achieved a series of results in target detection; compared with traditional methods, detection methods based on deep convolutional network models have advantages such as autonomous feature extraction and strong generalization ability, and have become one of the important research subjects in the target detection field.
At present, deep-learning-based target detection methods fall mainly into two types. One type is the two-stage object detector, such as Faster-RCNN, which first extracts candidate regions with a region proposal network and then sends the candidate regions to a detection network for object classification and location regression. Two-stage detectors have high detection accuracy but low speed and have difficulty meeting real-time requirements. The other type is the single-stage detector, such as YOLO, which divides the image into multiple grids and directly predicts, for each grid, the probability that it contains a target together with the bounding box and class information. Single-stage detectors are fast and can detect targets in real time, but their detection accuracy is lower, especially for dense targets and small targets.
Two main problems exist in the current single-stage detector:
(1) Target prediction is performed by the grid where the target center is located. When targets are dense, one grid may contain several targets but finally outputs only one set of prediction parameters, so only one target can be predicted accurately while the other targets are ignored or suffer large errors.
(2) The detector detects images hierarchically with a feature pyramid structure, and each feature layer is predicted with specific anchors. In data sets collected in actual projects, however, the target scales are often unevenly distributed: most targets are trained intensively on one pyramid layer while the other two layers cannot be trained well, which not only wastes resources but also affects detection accuracy.
Disclosure of Invention
In view of the above situation, the present invention provides a SE-FPN-based target detection model training method, a target detection method and apparatus, an electronic device, and a readable storage medium. A target detection network model based on SE-FPN is constructed, the image preprocessing method and the sample selection strategy are improved, and the trained model is applied to target detection, improving target detection efficiency.
In order to realize the purpose of the invention, the following scheme is adopted:
a target detection model training method based on SE-FPN comprises the following steps:
scaling a plurality of training pictures acquired from a data set by different scaling coefficients and stitching them into a new picture, and applying the same scaling and stitching to the target labels corresponding to the training pictures, wherein the new picture contains a plurality of targets of different sizes;
distributing a plurality of targets with different sizes to different pyramid feature layers of the SE-FPN target detection model according to a preset distribution strategy;
in each pyramid feature layer, finding the m positions closest to the center point according to the truth value of the training samples assigned to that layer; calculating the DIoU D_g between all anchors at the m positions and the truth value; calculating the mean m_g and standard deviation v_g of D_g to obtain the threshold t_g = max(0.2, |m_g − v_g|); selecting the anchors whose D_g is greater than t_g and whose center positions lie inside the target box for output; if no anchor qualifies, selecting the anchor with the maximum D_g and its center position for output;
and respectively calculating a classification loss function and a position regression function, and training the model through a back propagation algorithm.
Further, the SE-FPN target detection model comprises a three-layer feature pyramid, and the plurality of targets of different sizes are distributed to the different pyramid feature layers of the SE-FPN target detection model according to the following distribution strategy:
if the width and the height of the target are both greater than or equal to the first allocation threshold T_l, the target is assigned to the first feature pyramid layer, i.e. the uppermost pyramid layer;
if the width and the height of the target are both greater than the second allocation threshold T_m but less than the first allocation threshold T_l, the target is assigned to the second pyramid layer;
otherwise, the target is assigned to the third pyramid layer;
the first allocation threshold T_l is greater than the second allocation threshold T_m.
Further, the DIoU D_g is calculated as:
D_g = IoU − ρ²(b, b_gt) / d²
where IoU denotes the IoU value between the prediction and the sample truth, ρ(b, b_gt) denotes the Euclidean distance between the predicted center b and the true target center b_gt, and d denotes the diagonal distance of the smallest rectangle that can cover both the anchor and the target box.
Further, the classification loss function adopts a cross entropy loss function, and the position regression function adopts a CIoU loss function.
Furthermore, the SE-FPN target detection model comprises a backbone network, the SE-FPN and a head detection module;
the backbone network is used for extracting features of the image input into the SE-FPN target detection model to obtain a plurality of feature layers with different scaling scales, which are used for constructing the SE-FPN;
the head detection module comprises two parts: a classification module, which first uses w convolutional layers for feature extraction, then classifies with a fully-connected layer and outputs the classification result; and a position regression module, which uses s convolutional layers for the final position regression and outputs the center position coordinates and scale information of the target;
the SE-FPN is constructed by the following steps:
extracting features f_k of different layers through the backbone network, where k ∈ {1, 2, 3} and f_1, f_2, f_3 run from top to bottom; the multilayer feature pyramid FPN is then constructed layer by layer from top to bottom, building the FPN for the i-th layer with i starting from 1:
S201: if i = 1, the original feature is sent directly to S202 to obtain a new feature; if i > 1, the new feature f'_{i-1} of the upper layer first passes through the SE module to generate a feature s_{i-1} with different channel weights; s_{i-1} is sent into a 1×1 convolution module and up-sampled to generate a feature u_i with the same resolution and channel number as the i-th layer feature; u_i is fused with the original feature f_i to obtain the new feature f'_i of the i-th layer, and then S202 is executed;
S202: the new feature f'_i from S201 passes through n groups of convolutional layers to generate the final feature p_i, which is sent as the i-th layer feature to the head detection module for classification and position regression; i is incremented by 1 and the procedure returns to S201.
An object detection model training device based on SE-FPN comprises:
the image preprocessing module is used for scaling a plurality of training pictures acquired from the data set by different scaling coefficients and stitching them into a new picture, and for applying the same scaling and stitching to the target labels corresponding to the training pictures, wherein the new picture contains a plurality of targets of different sizes;
the target distribution module is used for distributing a plurality of targets with different sizes to different pyramid feature layers of the SE-FPN target detection model according to a preset distribution strategy;
a sample selection module, used, in each pyramid feature layer, for finding the m positions closest to the center point according to the truth value of the training samples assigned to that layer; calculating the DIoU D_g between all anchors at the m positions and the truth value; calculating the mean m_g and standard deviation v_g of D_g to obtain the threshold t_g = max(0.2, |m_g − v_g|); selecting the anchors whose D_g is greater than t_g and whose center positions lie inside the target box for output; and, if no anchor qualifies, selecting the anchor with the maximum D_g and its center position for output;
and the calculation module is used for calculating a classification loss function and a position regression function respectively and training the model through a back propagation algorithm.
A target detection method based on a SE-FPN target detection model comprises the following steps:
acquiring an image to be detected;
inputting an image to be detected into a SE-FPN target detection model obtained by pre-training; the SE-FPN target detection model is obtained by training through a SE-FPN-based target detection model training method;
detecting an image to be detected through an SE-FPN target detection model to obtain a target detection result; the target detection result comprises position information of the target object in the image to be detected.
An object detection device based on an object detection model of SE-FPN, comprising:
the acquisition module is used for acquiring an image to be detected;
the input module is used for inputting the image to be detected into a pre-trained SE-FPN target detection model; the SE-FPN target detection model is obtained by training through a SE-FPN-based target detection model training method;
the detection module is used for detecting the image to be detected through the SE-FPN target detection model to obtain a target detection result; the target detection result comprises position information of the target object in the image to be detected.
An electronic device, comprising: at least one processor and a memory; wherein the memory stores computer-executable instructions; and the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the SE-FPN-based target detection model training method or the target detection method based on the SE-FPN target detection model.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, controls an apparatus in which the storage medium is located to perform a method for training a target detection model based on SE-FPN, or to perform a method for target detection based on a SE-FPN target detection model.
The invention has the beneficial effects that:
1. The SE module is introduced into the FPN network structure, and the importance of different channels is learned during training, so that the network pays more attention to channels containing more effective information and suppresses unimportant channels, better guiding the fusion of high-level and low-level features; the low-level feature map then has not only accurate position information but also rich semantic information.
2. The scaling coefficients are calculated according to the distribution of targets in the data set, so that targets are distributed uniformly after the training samples are scaled; targets are then distributed to different feature layers according to the target distribution strategy, ensuring that each feature pyramid layer obtains sufficient training samples and reducing the under-training of some layers of the network model caused by uneven target distribution.
3. A sample selection strategy is provided in which the training samples lie in the m regions consisting of the grid where the center is located and its nearest neighbors, and positive samples are selected according to the mean and standard deviation of the DIoU, so that dense targets can be dispersed into multiple positive samples, increasing the learning probability of dense targets.
4. The mean m_g indicates the degree of matching between the preset anchors and the truth value: if the mean is high, the threshold should be raised to adjust the positive samples; if the mean is low, the threshold should be lowered. The standard deviation v_g indicates the dispersion of the targets in the layer: a high standard deviation indicates relatively dispersed targets, so the threshold is lowered to ensure that each target has a suitable anchor; a low standard deviation indicates relatively concentrated targets, so the threshold can be raised to ensure high-quality anchors. By combining the mean and the standard deviation, the threshold does not need to be designed manually, and the data information is better used to select positive and negative samples autonomously.
Drawings
The drawings described herein are for illustrative purposes only of selected embodiments and not all possible implementations, and are not intended to limit the scope of the present disclosure.
Fig. 1 is a flow chart of a target detection model training method according to an embodiment of the present application.
FIG. 2 is a schematic structural diagram of a SE-FPN target detection model according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of a CSP-Darknet network structure model used in the backbone network according to the embodiment of the present application.
FIG. 4 is a schematic diagram of a SE-FPN structure according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of an SE module according to an embodiment of the present application.
Fig. 6 shows how the final feature map is obtained by multiplying the channel weights with the original feature map according to the embodiment of the present application.
Fig. 7 is a block diagram of a structure of a target detection model training apparatus according to an embodiment of the present application.
Fig. 8 is a block diagram of a target detection method according to an embodiment of the present application.
Fig. 9 is a block diagram of a target detection apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
Example one
The present embodiment provides a target detection model training method based on SE-FPN, as shown in fig. 1, the process includes the following steps:
s100: firstly, randomly taking out four training pictures from a data set, carrying out scaling according to different scaling coefficients, and then splicing into a new picture, wherein the resolution is the input resolution set by the network model, and the new picture comprises a plurality of targets with different sizes.
The target labels corresponding to the four pictures are scaled and stitched in the same way as the pictures, keeping the new image and the target labels consistent.
When the targets belong to multiple classes, the number of samples of each class is counted first, the probability of each sample being drawn is calculated from the statistics to generate the sampling probabilities of the data set, and samples are drawn according to these sampling probabilities each time.
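A minimal sketch of this preprocessing step, assuming OpenCV-style images and labels as (x1, y1, x2, y2, class) arrays in pixel coordinates; the function and parameter names are illustrative, not from the patent, and for simplicity each picture is scaled to one quadrant, whereas the scaling coefficients may in general differ per picture:

    import cv2
    import numpy as np

    def mosaic(pictures, labels, out_size=608):
        """Scale four pictures and stitch them into one out_size x out_size
        picture; the labels are scaled and shifted identically."""
        canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
        half = out_size // 2
        offsets = [(0, 0), (0, half), (half, 0), (half, half)]  # tile corners
        new_labels = []
        for img, lab, (oy, ox) in zip(pictures, labels, offsets):
            sy, sx = half / img.shape[0], half / img.shape[1]   # per-picture coefficients
            canvas[oy:oy + half, ox:ox + half] = cv2.resize(img, (half, half))
            lab = lab.astype(np.float32)
            lab[:, [0, 2]] = lab[:, [0, 2]] * sx + ox           # x1, x2 scaled + shifted
            lab[:, [1, 3]] = lab[:, [1, 3]] * sy + oy           # y1, y2 scaled + shifted
            new_labels.append(lab)
        return canvas, np.concatenate(new_labels)

    def sampling_probabilities(sample_classes, num_classes):
        """Per-sample draw probability from the class counts, so that rare
        classes are drawn more often (one possible reading of S100)."""
        counts = np.bincount(sample_classes, minlength=num_classes)
        w = 1.0 / np.maximum(counts[sample_classes], 1)
        return w / w.sum()

Samples would then be drawn with, e.g., numpy.random.choice using these probabilities before forming each mosaic.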
S200: and distributing a plurality of targets with different sizes to different pyramid feature layers of the SE-FPN target detection model according to a preset distribution strategy.
The SE-FPN target detection model comprises a three-layer feature pyramid, and the plurality of targets of different sizes are distributed to the different pyramid feature layers of the SE-FPN target detection model according to the following distribution strategy:
if the width and the height of the target are both greater than or equal to the first allocation threshold T_l, the target is assigned to the first feature pyramid layer, i.e. the uppermost pyramid layer; if the width and the height are both greater than the second allocation threshold T_m but less than T_l, the target is assigned to the second pyramid layer; otherwise, the target is assigned to the third pyramid layer; T_l is greater than T_m. In this example T_m = 16 and T_l = 32, i.e.:
layer(x) = 1 if w_gt ≥ T_l and h_gt ≥ T_l; 2 if T_m < w_gt < T_l and T_m < h_gt < T_l; 3 otherwise
where x denotes the target, w_gt its width and h_gt its height.
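Written as code, the allocation strategy is a small pure function (a sketch; the layer indices 1-3 follow the text):

    def assign_pyramid_layer(w_gt, h_gt, t_m=16, t_l=32):
        """Distribute a target to a feature pyramid layer by its
        ground-truth width and height (T_m = 16, T_l = 32 in this example)."""
        if w_gt >= t_l and h_gt >= t_l:
            return 1  # uppermost layer: large targets
        if t_m < w_gt < t_l and t_m < h_gt < t_l:
            return 2  # middle layer: medium targets
        return 3      # bottom layer: small targets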
The SE-FPN target detection model comprises a backbone network, a SE-FPN and a head detection module, and is shown in FIG. 2.
The backbone network is used for extracting features of the image input into the SE-FPN target detection model to obtain a plurality of feature layers with different scaling scales. In this example, the backbone network uses the CSP-Darknet model to extract three sets of features at different scaling scales (typically with scaling factors of 8, 16 and 32) for constructing the multi-level feature pyramid network FPN. CSP-Darknet is a combined network structure, shown in Fig. 3, in which a CSP (Cross Stage Partial) module is added to the original Darknet network, effectively enhancing the learning capacity of the convolutional network while reducing the amount of computation.
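As an illustration only (not the patent's exact CSP-Darknet architecture), a CSP-style block splits the channels, passes one half through the convolution stack, and concatenates it back with the untouched half; PyTorch, even channel counts assumed:

    import torch
    import torch.nn as nn

    class CSPBlock(nn.Module):
        """Illustrative Cross Stage Partial block: half the channels pass
        through the conv stack, the other half bypass it, then concat."""
        def __init__(self, channels, n_convs=2):
            super().__init__()
            half = channels // 2
            layers = []
            for _ in range(n_convs):
                layers += [nn.Conv2d(half, half, 3, padding=1, bias=False),
                           nn.BatchNorm2d(half), nn.LeakyReLU(0.1)]
            self.convs = nn.Sequential(*layers)
            self.fuse = nn.Conv2d(channels, channels, 1)  # recombine both paths
        def forward(self, x):
            a, b = x.chunk(2, dim=1)  # partial split across channels
            return self.fuse(torch.cat([a, self.convs(b)], dim=1))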
A conventional FPN directly connects the high-level features to the low-level features by up-sampling and then outputs the result as new features. The invention adds an SE module to the FPN module and, by introducing an attention mechanism, highlights the channels of the high-level features that are more instructive for the lower layers, so that the two sets of features fuse better; the specific network structure of the SE module is shown in Fig. 5. The SE module first compresses the feature map into a 1×1×C vector through a global pooling layer to obtain channel-level global features; the global features are then compressed to 1×1×(C/r) through a first fully-connected layer, passed through an activation function, and expanded back to a 1×1×C vector through a second fully-connected layer. The relationships among channels are thus learned to obtain the weights of the different channels, which are multiplied with the original feature map to obtain the final feature map; the concrete implementation is shown in Fig. 6. This attention mechanism lets the model focus more on channel features carrying much information while suppressing unimportant channel features.
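A sketch of the SE module as just described: global pooling, an FC layer down to C/r, an activation, an FC layer back to C, then channel-wise multiplication; the reduction ratio r and the ReLU/sigmoid activation choices are assumptions:

    import torch.nn as nn

    class SEModule(nn.Module):
        """Squeeze-and-Excitation: learn per-channel weights and rescale
        the original feature map with them."""
        def __init__(self, channels, r=16):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze to 1 x 1 x C
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // r), nn.ReLU(inplace=True),
                nn.Linear(channels // r, channels), nn.Sigmoid())
        def forward(self, x):
            b, c, _, _ = x.shape
            w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
            return x * w  # multiply weights onto the original feature map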
The head detection module comprises two parts: a classification module, which first uses w convolutional layers for feature extraction, then classifies with a fully-connected layer and outputs the classification result; and a position regression module, which uses s convolutional layers for the final position regression and outputs the center position coordinates and scale information of the target.
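A sketch of the two-branch head under this description (w and s are the convolution-layer counts named above; the channel width and the 4-value box layout are assumptions):

    import torch.nn as nn

    class DetectionHead(nn.Module):
        """Classification branch: w convs then a fully-connected classifier.
        Regression branch: s convs ending in 4 outputs (cx, cy, w, h)."""
        def __init__(self, channels, num_classes, w=2, s=2):
            super().__init__()
            def convs(n):
                return nn.Sequential(*[nn.Sequential(
                    nn.Conv2d(channels, channels, 3, padding=1),
                    nn.ReLU(inplace=True)) for _ in range(n)])
            self.cls_convs = convs(w)
            self.cls_fc = nn.Linear(channels, num_classes)  # per-location classifier
            self.reg = nn.Sequential(convs(s), nn.Conv2d(channels, 4, 1))
        def forward(self, x):
            c = self.cls_convs(x).permute(0, 2, 3, 1)  # B, H, W, C for the FC layer
            return self.cls_fc(c), self.reg(x)         # class scores, box outputs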
The structural diagram of the SE-FPN is shown in Fig. 4, and it is constructed by the following steps:
extracting features f_k of different layers through the backbone network, where k ∈ {1, 2, 3} and f_1, f_2, f_3 run from top to bottom; the multilayer feature pyramid FPN is then constructed layer by layer from top to bottom, building the FPN for the i-th layer with i starting from 1:
S201: if i = 1, the original feature is sent directly to S202 to obtain a new feature; if i > 1, the new feature f'_{i-1} of the upper layer first passes through the SE module to generate a feature s_{i-1} with different channel weights; s_{i-1} is sent into a 1×1 convolution module and up-sampled (methods such as linear interpolation, bilinear interpolation and deconvolution may all be used) to generate a feature u_i with the same resolution and channel number as the i-th layer feature f_i; u_i and the original feature f_i can be fused in various ways, such as concat or element-wise sum; this embodiment adopts the concat method to obtain the new feature f'_i of the i-th layer, and then S202 is executed;
S202: the new feature f'_i from S201 passes through n groups of convolutional layers to generate the final feature p_i, which is sent as the i-th layer feature to the head detection module for classification and position regression; i is incremented by 1 and the procedure returns to S201.
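Putting S201 and S202 together, a top-down construction sketch (reusing the SEModule sketch above; the channel widths, n, and bilinear up-sampling are assumptions, and concat fusion is used as in this embodiment):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SEFPN(nn.Module):
        """Top-down FPN where the upper layer's feature is SE-reweighted,
        1x1-projected, up-sampled and concatenated with the current layer."""
        def __init__(self, channels=(512, 256, 128), n=2):
            super().__init__()
            self.n_layers = len(channels)
            # f'_1 keeps channels[0]; f'_i (i > 1) has 2 * channels[i] after concat
            fused = [channels[0]] + [2 * c for c in channels[1:]]
            self.se = nn.ModuleList(SEModule(fused[i]) for i in range(self.n_layers - 1))
            self.proj = nn.ModuleList(nn.Conv2d(fused[i], channels[i + 1], 1)
                                      for i in range(self.n_layers - 1))
            def conv_stack(c_in, c_out, n):
                mods, c = [], c_in
                for _ in range(n):
                    mods += [nn.Conv2d(c, c_out, 3, padding=1), nn.ReLU(inplace=True)]
                    c = c_out
                return nn.Sequential(*mods)
            self.out = nn.ModuleList(conv_stack(fused[i], channels[i], n)
                                     for i in range(self.n_layers))
        def forward(self, feats):           # feats = [f_1, f_2, f_3], top to bottom
            prev, outs = feats[0], []       # i = 1: f'_1 is the original feature
            outs.append(self.out[0](prev))  # p_1 after n conv groups (S202)
            for i in range(1, self.n_layers):
                s = self.se[i - 1](prev)                              # SE reweighting
                u = F.interpolate(self.proj[i - 1](s), size=feats[i].shape[-2:],
                                  mode='bilinear', align_corners=False)  # u_i
                prev = torch.cat([u, feats[i]], dim=1)                # f'_i by concat
                outs.append(self.out[i](prev))                        # p_i (S202)
            return outs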
S300: in each pyramid feature layer, the m positions closest to the center point are found according to the truth value of the training samples assigned to that layer; the DIoU D_g between all anchors at the m positions and the truth value is calculated; the mean m_g and standard deviation v_g of D_g are calculated to obtain the threshold t_g = max(0.2, |m_g − v_g|); the anchors whose D_g is greater than t_g and whose center positions lie inside the target box are selected for output; if no anchor qualifies, the anchor with the maximum D_g and its center position are output.
The DIoU D_g is calculated as:
D_g = IoU − ρ²(b, b_gt) / d²
where IoU denotes the IoU value between the prediction and the sample truth, ρ(b, b_gt) denotes the Euclidean distance between the predicted center b and the true target center b_gt, and d denotes the diagonal distance of the smallest rectangle that can cover both the anchor and the target box.
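A sketch of S300's DIoU and adaptive threshold, with boxes as (x1, y1, x2, y2) NumPy arrays; the check that a selected anchor's center lies inside the target box is omitted for brevity:

    import numpy as np

    def diou(anchors, gt):
        """D_g = IoU - rho^2(b, b_gt) / d^2 for each anchor vs one truth box."""
        ax1, ay1, ax2, ay2 = anchors.T
        gx1, gy1, gx2, gy2 = gt
        ix1, iy1 = np.maximum(ax1, gx1), np.maximum(ay1, gy1)
        ix2, iy2 = np.minimum(ax2, gx2), np.minimum(ay2, gy2)
        inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
        union = ((ax2 - ax1) * (ay2 - ay1)
                 + (gx2 - gx1) * (gy2 - gy1) - inter)
        iou = inter / np.maximum(union, 1e-9)
        # rho^2: squared distance between anchor centers and the true center
        rho2 = (((ax1 + ax2) - (gx1 + gx2)) ** 2
                + ((ay1 + ay2) - (gy1 + gy2)) ** 2) / 4.0
        # d^2: squared diagonal of the smallest rectangle covering both boxes
        cx1, cy1 = np.minimum(ax1, gx1), np.minimum(ay1, gy1)
        cx2, cy2 = np.maximum(ax2, gx2), np.maximum(ay2, gy2)
        d2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
        return iou - rho2 / np.maximum(d2, 1e-9)

    def select_positive_anchors(anchors, gt):
        """Adaptive positive-sample selection with t_g = max(0.2, |m_g - v_g|);
        falls back to the maximum-D_g anchor if none qualifies."""
        d_g = diou(anchors, gt)
        t_g = max(0.2, abs(d_g.mean() - d_g.std()))
        pos = np.flatnonzero(d_g > t_g)
        return pos if pos.size else np.array([d_g.argmax()])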
S400: the classification loss function and the position regression function are calculated respectively with the samples obtained in S300, and the model is trained through a back-propagation algorithm. In this example the classification loss function is the cross-entropy loss and the position regression function is the CIoU loss:
L_cls = −[C·log(Ĉ) + (1 − C)·log(1 − Ĉ)]
L_CIoU = 1 − IoU + ρ²(b, b_gt)/d² + α·υ, with υ = (4/π²)·(arctan(w_gt/h_gt) − arctan(w_p/h_p))²
where C denotes the sample class, Ĉ denotes the sample class predicted by the network, α is a weight coefficient, and υ measures the similarity of the aspect ratios as defined above; w_gt denotes the sample width, h_gt the sample height, h_p the predicted sample height and w_p the predicted sample width.
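A sketch of the CIoU position loss following the formula above (PyTorch; boxes as (x1, y1, x2, y2) tensors, with α computed without gradient as is common):

    import math
    import torch

    def ciou_loss(pred, target, eps=1e-9):
        """L_CIoU = 1 - IoU + rho^2/d^2 + alpha * v, where
        v = (4/pi^2) * (arctan(w_gt/h_gt) - arctan(w_p/h_p))^2
        and alpha = v / ((1 - IoU) + v)."""
        px1, py1, px2, py2 = pred.unbind(-1)
        gx1, gy1, gx2, gy2 = target.unbind(-1)
        inter = ((torch.min(px2, gx2) - torch.max(px1, gx1)).clamp(0)
                 * (torch.min(py2, gy2) - torch.max(py1, gy1)).clamp(0))
        union = ((px2 - px1) * (py2 - py1)
                 + (gx2 - gx1) * (gy2 - gy1) - inter)
        iou = inter / (union + eps)
        # rho^2: squared distance between the predicted and true centers
        rho2 = (((px1 + px2) - (gx1 + gx2)) ** 2
                + ((py1 + py2) - (gy1 + gy2)) ** 2) / 4.0
        # d^2: squared diagonal of the smallest enclosing rectangle
        d2 = ((torch.max(px2, gx2) - torch.min(px1, gx1)) ** 2
              + (torch.max(py2, gy2) - torch.min(py1, gy1)) ** 2)
        v = (4 / math.pi ** 2) * (torch.atan((gx2 - gx1) / (gy2 - gy1 + eps))
                                  - torch.atan((px2 - px1) / (py2 - py1 + eps))) ** 2
        with torch.no_grad():
            alpha = v / ((1 - iou) + v + eps)
        return 1 - iou + rho2 / (d2 + eps) + alpha * v

In training one would average this per-box loss over the positive samples selected in S300.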
Example two
This example provides a target detection model training device based on SE-FPN; as shown in FIG. 7, the device comprises an image preprocessing module, a feature extraction module, a target distribution module, a sample selection module and a calculation module.
The image preprocessing module zooms a plurality of training pictures acquired from a data set according to different zoom coefficients, then splices the training pictures into a new picture, and performs the same zooming and splicing processing on target labels corresponding to the training pictures, wherein the new picture comprises a plurality of targets with different sizes.
And the target distribution module distributes a plurality of targets with different sizes to different pyramid characteristic layers of the SE-FPN target detection model according to a preset distribution strategy.
In each pyramid feature layer, the sample selection module finds the m positions closest to the center point according to the truth value of the training samples assigned to that layer, calculates the DIoU D_g between all anchors at the m positions and the truth value, calculates the mean m_g and standard deviation v_g of D_g to obtain the threshold t_g = max(0.2, |m_g − v_g|), and selects the anchors whose D_g is greater than t_g and whose center positions lie inside the target box for output; if no anchor qualifies, the anchor with the maximum D_g and its center position are output.
The calculation module respectively calculates a classification loss function and a position regression function, and trains the model through a back propagation algorithm.
EXAMPLE III
The present embodiment provides a target detection method based on a SE-FPN target detection model, as shown in fig. 8, including the following steps:
s100: and acquiring an image to be detected.
S200: inputting the image to be detected into the SE-FPN target detection model obtained by pre-training; the SE-FPN target detection model is trained with the SE-FPN-based target detection model training method of the first embodiment.
S300: detecting an image to be detected through an SE-FPN target detection model to obtain a target detection result; the target detection result comprises position information of the target object in the image to be detected.
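As a usage illustration only (the model file name, input resolution and output layout below are hypothetical):

    import cv2
    import torch

    model = torch.load('se_fpn_detector.pt')  # hypothetical pre-trained model file
    model.eval()
    img = cv2.imread('to_detect.jpg')
    x = torch.from_numpy(cv2.resize(img, (608, 608))).permute(2, 0, 1)
    x = x.float().unsqueeze(0) / 255.0         # 1 x 3 x H x W, normalized
    with torch.no_grad():
        scores, boxes = model(x)               # class scores + target positions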
Example four
The present embodiment provides an object detection apparatus based on an object detection model of SE-FPN, as shown in fig. 9, including: the device comprises an acquisition module, an input module and a detection module.
The acquisition module acquires an image to be detected.
The input module inputs an image to be detected into a SE-FPN target detection model obtained by pre-training; the SE-FPN target detection model is obtained by training by adopting the SE-FPN-based target detection model training method in the first embodiment.
The detection module detects an image to be detected through an SE-FPN target detection model to obtain a target detection result; the target detection result comprises position information of the target object in the image to be detected.
EXAMPLE five
This example provides an electronic device, comprising: at least one processor and a memory; wherein the memory stores computer-executable instructions; and the at least one processor executes the computer-executable instructions stored in the memory, causing it to perform the SE-FPN-based target detection model training method of the first embodiment, or the target detection method based on the SE-FPN target detection model of the third embodiment.
EXAMPLE six
The present example provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the apparatus in which the storage medium is located is controlled to execute the target detection model training method based on SE-FPN according to the first embodiment, or execute the target detection method based on SE-FPN according to the third embodiment.
The foregoing is merely a preferred embodiment of this invention and is not intended to be exhaustive or to limit the invention to the precise form disclosed. It will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the invention.

Claims (10)

1. A target detection model training method based on SE-FPN is characterized by comprising the following steps:
zooming a plurality of training pictures acquired from a data set according to different zooming coefficients, splicing the training pictures into a new picture, and performing the same zooming and splicing treatment on target labels corresponding to the training pictures, wherein the new picture comprises a plurality of targets with different sizes;
distributing the targets with different sizes to different pyramid feature layers of the SE-FPN target detection model according to a preset distribution strategy;
in each pyramid feature layer, finding the m positions closest to the center point according to the truth value of the training samples assigned to that layer; calculating the DIoU D_g between all anchors at the m positions and the truth value; calculating the mean m_g and standard deviation v_g of D_g to obtain the threshold t_g = max(0.2, |m_g − v_g|); selecting the anchors whose D_g is greater than t_g and whose center positions lie inside the target box for output; if no anchor qualifies, selecting the anchor with the maximum D_g and its center position for output;
and respectively calculating a classification loss function and a position regression function, and training the model through a back propagation algorithm.
2. The SE-FPN based target detection model training method as claimed in claim 1, wherein the SE-FPN target detection model comprises a three-layer feature pyramid, and the plurality of targets of different sizes are distributed to the different pyramid feature layers of the SE-FPN target detection model according to the following distribution strategy:
if the width and the height of the target are both greater than or equal to the first allocation threshold T_l, the target is assigned to the first feature pyramid layer, i.e. the uppermost pyramid layer;
if the width and the height of the target are both greater than the second allocation threshold T_m but less than the first allocation threshold T_l, the target is assigned to the second pyramid layer;
otherwise, the target is assigned to the third pyramid layer;
the first allocation threshold T_l is greater than the second allocation threshold T_m.
3. The SE-FPN based target detection model training method as claimed in claim 1, wherein the DIoU D_g is calculated as:
D_g = IoU − ρ²(b, b_gt) / d²
where IoU denotes the IoU value between the prediction and the sample truth, ρ(b, b_gt) denotes the Euclidean distance between the predicted center b and the true target center b_gt, and d denotes the diagonal distance of the smallest rectangle that can cover both the anchor and the target box.
4. The SE-FPN based target detection model training method as claimed in claim 1, wherein the classification loss function adopts a cross entropy loss function, and the position regression function adopts a CIoU loss function.
5. The SE-FPN based target detection model training method as claimed in claim 1, wherein the SE-FPN target detection model comprises a backbone network, the SE-FPN and a head detection module;
the backbone network is used for extracting features of the image input into the SE-FPN target detection model to obtain a plurality of feature layers with different scaling scales, which are used for constructing the SE-FPN;
the head detection module comprises two parts: a classification module, which first uses w convolutional layers for feature extraction, then classifies with a fully-connected layer and outputs the classification result; and a position regression module, which uses s convolutional layers for the final position regression and outputs the center position coordinates and scale information of the target;
the SE-FPN is constructed by the following steps:
extracting features f_k of different layers through the backbone network, where k ∈ {1, 2, 3} and f_1, f_2, f_3 run from top to bottom; the multilayer feature pyramid FPN is then constructed layer by layer from top to bottom, building the FPN for the i-th layer with i starting from 1:
S201: if i = 1, the original feature is sent directly to S202 to obtain a new feature; if i > 1, the new feature f'_{i-1} of the upper layer first passes through the SE module to generate a feature s_{i-1} with different channel weights; s_{i-1} is sent into a 1×1 convolution module and up-sampled to generate a feature u_i with the same resolution and channel number as the i-th layer feature; u_i is fused with the original feature f_i to obtain the new feature f'_i of the i-th layer, and then S202 is executed;
S202: the new feature f'_i from S201 passes through n groups of convolutional layers to generate the final feature p_i, which is sent as the i-th layer feature to the head detection module for classification and position regression; i is incremented by 1 and the procedure returns to S201.
6. An object detection model training device based on SE-FPN is characterized by comprising:
the image preprocessing module is used for zooming a plurality of training pictures acquired from the data set according to different zoom coefficients and splicing the training pictures into a new picture; the target labels corresponding to the multiple training pictures are subjected to the same scaling and splicing treatment; the new picture comprises a plurality of different sized targets;
the target distribution module is used for distributing the targets with different sizes to different pyramid feature layers of the SE-FPN target detection model according to a preset distribution strategy;
a sample selection module, used, in each pyramid feature layer, for finding the m positions closest to the center point according to the truth value of the training samples assigned to that layer; calculating the DIoU D_g between all anchors at the m positions and the truth value; calculating the mean m_g and standard deviation v_g of D_g to obtain the threshold t_g = max(0.2, |m_g − v_g|); selecting the anchors whose D_g is greater than t_g and whose center positions lie inside the target box for output; and, if no anchor qualifies, selecting the anchor with the maximum D_g and its center position for output;
and the calculation module is used for calculating a classification loss function and a position regression function respectively and training the model through a back propagation algorithm.
7. A target detection method based on a SE-FPN target detection model is characterized by comprising the following steps:
acquiring an image to be detected;
inputting the image to be detected into a pre-trained SE-FPN target detection model; wherein, the SE-FPN target detection model is obtained by training by adopting the SE-FPN based target detection model training method of any one of claims 1-5;
detecting the image to be detected through the SE-FPN target detection model to obtain a target detection result; the target detection result comprises position information of a target object in the image to be detected.
8. An object detection device based on an object detection model of SE-FPN, comprising:
the acquisition module is used for acquiring an image to be detected;
the input module is used for inputting the image to be detected to a pre-trained SE-FPN target detection model; wherein, the SE-FPN target detection model is obtained by training by adopting the SE-FPN based target detection model training method of any one of claims 1-5;
the detection module is used for detecting the image to be detected through the SE-FPN target detection model to obtain a target detection result; the target detection result comprises position information of a target object in the image to be detected.
9. An electronic device, comprising: at least one processor and memory; wherein the memory stores computer-executable instructions; wherein execution of computer-executable instructions stored in the memory on the at least one processor causes the at least one processor to perform the SE-FPN based object detection model training method of any of claims 1-5 or to perform the SE-FPN based object detection model object detection method of claim 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, controls an apparatus in which the storage medium is located to perform a method for training an object detection model based on SE-FPN according to any one of claims 1 to 5, or to perform a method for object detection based on an SE-FPN object detection model according to claim 7.
CN202011560657.5A 2020-12-25 2020-12-25 Target detection model training method based on SE-FPN, target detection method and device Pending CN112561801A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011560657.5A CN112561801A (en) 2020-12-25 2020-12-25 Target detection model training method based on SE-FPN, target detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011560657.5A CN112561801A (en) 2020-12-25 2020-12-25 Target detection model training method based on SE-FPN, target detection method and device

Publications (1)

Publication Number Publication Date
CN112561801A true CN112561801A (en) 2021-03-26

Family

ID=75032568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011560657.5A Pending CN112561801A (en) 2020-12-25 2020-12-25 Target detection model training method based on SE-FPN, target detection method and device

Country Status (1)

Country Link
CN (1) CN112561801A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113421187A (en) * 2021-06-10 2021-09-21 山东师范大学 Super-resolution reconstruction method, system, storage medium and equipment
CN113421187B (en) * 2021-06-10 2023-01-03 山东师范大学 Super-resolution reconstruction method, system, storage medium and equipment
CN113452912A (en) * 2021-06-25 2021-09-28 山东新一代信息产业技术研究院有限公司 Pan-tilt camera control method, device, equipment and medium for inspection robot
CN113392857A (en) * 2021-08-17 2021-09-14 深圳市爱深盈通信息技术有限公司 Target detection method, device and equipment terminal based on yolo network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination