CN116051953A - Small target detection method based on selectable convolution kernel network and weighted bidirectional feature pyramid - Google Patents

Small target detection method based on selectable convolution kernel network and weighted bidirectional feature pyramid Download PDF

Info

Publication number
CN116051953A
CN116051953A (application CN202211470248.5A)
Authority
CN
China
Prior art keywords
layer
target detection
small target
network
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211470248.5A
Other languages
Chinese (zh)
Inventor
Wan Jiudi (万久地)
Pan Chunjie (潘纯洁)
Zhang Qianjin (张前进)
Luo Zhengyue (罗正岳)
Jiang Bo (蒋波)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Branch China Tower Co ltd
Original Assignee
Chongqing Branch China Tower Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Branch China Tower Co ltd filed Critical Chongqing Branch China Tower Co ltd
Priority to CN202211470248.5A priority Critical patent/CN116051953A/en
Publication of CN116051953A publication Critical patent/CN116051953A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a small target detection method based on a selectable convolution kernel network and a weighted bidirectional feature pyramid, belonging to the field of deep learning target detection, and specifically comprising the following steps. S1: performing data enhancement on an original image, computing predefined anchor frames by adaptive anchor frame calculation, scaling the images to the same size by adaptive image scaling, and inputting the processed image into a YOLOv5 backbone network that incorporates a spatial-attention-based selectable convolution kernel network; S2: extracting multi-layer features from the input image through the backbone network to obtain features of different layers; S3: performing cross-layer feature fusion on the features of different layers with BiFPN to obtain a plurality of fused features; S4: adding a group of small target detection anchor frames to the YOLOv5 detection layer and carrying out small target detection on the plurality of fused features; S5: training the improved network model and inputting the data set into the trained model to detect small targets.

Description

Small target detection method based on selectable convolution kernel network and weighted bidirectional feature pyramid
Technical Field
The invention belongs to the field of deep learning target detection, and relates to a small target detection method based on a selectable convolution kernel network and a weighted bidirectional feature pyramid.
Background
With the development of deep learning, its application in image recognition has become increasingly wide, and target detection algorithms based on deep learning have become a research hotspot in the image field in recent years. A small target can be defined in both relative and absolute terms: an object whose area ratio in the image is less than 0.1, or, in large data sets, an object smaller than a fixed absolute pixel size (for example, 32×32 pixels in the MS COCO definition).
Deep learning achieves high precision and accuracy on the recognition and detection of large and medium targets, but because small targets occupy few pixels in an image, carry little visual information and are easily affected by environmental factors, the efficiency and precision of small target recognition and detection are far lower than those for large and medium targets.
Moreover, the generalization capability of current small target detection models is weak, and their detection effect is poor.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a small target detection method based on a spatial-attention selectable convolution kernel network and a weighted bidirectional feature pyramid, for improving the accuracy of deep-learning-based small target detection.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a small target detection method based on a selectable convolution kernel network and a weighted bidirectional feature pyramid specifically comprises the following steps:
s1: performing data enhancement on an original image, computing predefined anchor frames by adaptive anchor frame calculation, scaling the images to the same size by adaptive image scaling, and inputting the processed image into a YOLOv5 backbone network that incorporates a spatial-attention-based selectable convolution kernel network;
s2: extracting multi-layer features from the input image through the backbone network to obtain features of different layers;
s3: using the weighted bidirectional feature pyramid network BiFPN to perform cross-layer feature fusion on the features of different layers to obtain a plurality of fused features;
s4: adding a group of small target detection anchor frames to the YOLOv5 detection layer and carrying out small target detection on the plurality of fused features;
s5: training the improved network model and inputting the data set into the trained model to detect small targets.
Further, the data enhancement in step S1 specifically includes: performing Mosaic data enhancement on the original image, in which four pictures are randomly cropped and scaled, then randomly arranged and stitched into a single picture; this enriches the data set, increases the number of small-sample targets, and improves the training speed of the network.
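As an illustration (not part of the patent text), the Mosaic stitching can be sketched as follows, assuming four numpy images; the corresponding shifting and scaling of the box labels is omitted:

```python
import random
import numpy as np

def mosaic4(images, out_size=640):
    """Stitch four HxWx3 images into one mosaic around a random center.

    A minimal sketch: a real implementation would also transform the
    ground-truth boxes of each picture into mosaic coordinates.
    """
    s = out_size
    # random mosaic center, kept away from the borders
    xc = random.randint(s // 4, 3 * s // 4)
    yc = random.randint(s // 4, 3 * s // 4)
    canvas = np.full((s, s, 3), 114, dtype=np.uint8)  # gray padding

    # target quadrants: top-left, top-right, bottom-left, bottom-right
    regions = [(0, 0, xc, yc), (xc, 0, s, yc), (0, yc, xc, s), (xc, yc, s, s)]
    for img, (x1, y1, x2, y2) in zip(images, regions):
        h, w = y2 - y1, x2 - x1
        # nearest-neighbour resize so the sketch has no cv2 dependency
        ys = np.linspace(0, img.shape[0] - 1, h).astype(int)
        xs = np.linspace(0, img.shape[1] - 1, w).astype(int)
        canvas[y1:y2, x1:x2] = img[ys][:, xs]
    return canvas
```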
Further, computing the predefined anchor frames by adaptive anchor frame calculation in step S1 specifically includes: on the basis of the initial anchor frames, comparing the output prediction frames with the ground-truth frames, computing the gap, updating in the reverse direction, and continuously iterating the parameters to obtain the most suitable anchor frame values. The data set is analyzed with k-means clustering and a genetic learning algorithm to obtain preset anchor frames suited to predicting the object bounding boxes in the data set.
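For illustration, the k-means clustering of box sizes can be sketched as below; this is a minimal sketch using the 1 - IoU distance commonly used for anchor clustering, the genetic evolution step is omitted, and k=12 (3 anchors for each of four detection layers) is an assumption here:

```python
import numpy as np

def kmeans_anchors(wh, k=12, iters=50):
    """Cluster ground-truth box sizes (N, 2) into k anchors.

    Distance is 1 - IoU of boxes aligned at the origin, so wide and tall
    boxes end up in different clusters even when their areas are similar.
    """
    n = wh.shape[0]
    anchors = wh[np.random.choice(n, k, replace=False)].astype(float)
    for _ in range(iters):
        # IoU between every box and every anchor (aligned at a corner)
        inter = np.minimum(wh[:, None, 0], anchors[None, :, 0]) * \
                np.minimum(wh[:, None, 1], anchors[None, :, 1])
        union = wh[:, 0:1] * wh[:, 1:2] + \
                (anchors[:, 0] * anchors[:, 1])[None] - inter
        assign = np.argmax(inter / union, axis=1)   # nearest anchor per box
        for j in range(k):
            if np.any(assign == j):                 # keep old anchor if empty
                anchors[j] = np.median(wh[assign == j], axis=0)
    return anchors[np.argsort(anchors.prod(1))]     # sorted by area
```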
Further, the YOLOv5 backbone network incorporating the spatial-attention-based selectable convolution kernel network in step S1 is obtained as follows: the spatial attention mechanism Coordinate Attention is integrated into the selectable convolution kernel network SKNet to obtain the spatial-attention-based selectable convolution kernel network CA-SKNet, and CA-SKNet is integrated into the C3 convolution module to obtain the improved YOLOv5 backbone network.
Further, in the YOLOv5 backbone network incorporating the spatial-attention-based selectable convolution kernel network:
the spatial attention mechanism aggregates the input features along the two spatial directions to obtain a pair of direction-aware feature maps of sizes C×H×1 and C×1×W respectively:

$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i)$$

$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)$$

where C is the number of channels, H the input image height and W the input image width; $x_c(h, i)$ denotes the value at coordinate $(h, i)$ of channel $c$ in the C×H×W feature map and $x_c(j, w)$ the value at coordinate $(j, w)$; $z_c^h$ is the average pooling result along the W direction and $z_c^w$ the average pooling result along the H direction;
the two feature maps of sizes C×H×1 and C×1×W are concatenated and passed through convolution and normalization to obtain an intermediate feature map, whose values are scaled into (0, 1) by a sigmoid activation function to obtain the weights in the two directions:

$$f = \delta(F_1([z^h, z^w]))$$

$$g^h = \sigma(F_h(f^h))$$

$$g^w = \sigma(F_w(f^w))$$

where δ denotes the nonlinear transformation, $F_1$ the convolution operation and σ the activation function; $z^h$ and $z^w$ are the average pooling results along the W and H directions; $f$ is the result after convolution and nonlinear transformation, and $f^h$ and $f^w$ are the portions of $f$ corresponding to $z^h$ and $z^w$; $F_h$ and $F_w$ denote the convolutions applied to $f^h$ and $f^w$; $g^h$ is the weight obtained from $f^h$ through the convolution $F_h$ and the activation function, and $g^w$ the weight obtained from $f^w$ through the convolution $F_w$ and the activation function;
the original feature map is re-weighted to obtain a feature map carrying attention weights in both the width and height directions:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$

where $x_c(i, j)$ is the value at $(i, j)$ of the original C×H×W feature map, $g_c^h(i)$ the height-direction weight obtained through convolution and the activation function, $g_c^w(j)$ the width-direction weight, and $y_c(i, j)$ the new feature map obtained by multiplying the original feature map by the height-direction and width-direction weights.
Further, in step S3, performing cross-layer feature fusion on the features of different layers with the weighted bidirectional feature pyramid network BiFPN specifically includes:
after backbone feature extraction there are four layers of features, located at the second, fourth, sixth and ninth layers;
the ninth-layer features go through one bottom-up upsampling pass and are fused with the four layers of features output by the backbone network, producing one feature map each at the tenth, fourteenth, eighteenth and twentieth layers;
the twentieth-layer feature map goes through one top-down downsampling pass and is fused with the four layers of backbone outputs and with the earlier upsampling outputs, producing one feature map each at the twenty-first, twenty-fourth, twenty-seventh and thirty-first layers, which are sent to the detection layer for detection (the weighted fusion rule is sketched below).
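At each of these merge points BiFPN applies fast normalized fusion: every incoming feature map carries a learnable non-negative weight, and the output is the weighted average sum(w_i * x_i) / (eps + sum(w_j)). A minimal PyTorch sketch, assuming the inputs have already been resized to a common shape (the module name is an assumption of this sketch):

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Sketch of BiFPN's fast normalized fusion for one merge point."""
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, xs):
        w = torch.relu(self.w)            # keep the weights non-negative
        w = w / (w.sum() + self.eps)      # fast normalized fusion
        return sum(wi * x for wi, x in zip(w, xs))

# e.g. fusing a backbone feature, a top-down feature and a bottom-up feature:
fuse = WeightedFusion(num_inputs=3)
```

In a full BiFPN, each node of the bottom-up and top-down paths would typically own such a fusion followed by a convolution.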
In step S4, a group of small target detection layers is added to YOLOv5; after the feature maps output by the weighted bidirectional feature pyramid network BiFPN are obtained, they are sent to the detection layers for small target detection, which improves the accuracy of small target detection.
Further, the small target detection layer consists of four detection layers in which feature maps of different sizes are used to detect target objects of different sizes, detecting respectively the 160×160, 80×80, 40×40 and 20×20 feature maps output by the weighted bidirectional feature pyramid network (BiFPN). Each detection layer outputs the corresponding vectors, and the predicted bounding boxes and categories of the targets are finally generated and marked in the original image.
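For reference, the output tensor shapes of such a four-scale head for a 640×640 input can be sketched as follows; 3 anchors per scale and the class count (e.g. the 10 VisDrone2019 categories) are assumptions of this sketch:

```python
# 5 + nc channels per anchor: 4 box offsets + 1 objectness + nc class scores
nc = 10                              # illustrative class count, not from the patent
for grid in (160, 80, 40, 20):       # the four BiFPN output resolutions
    print(f"{grid}x{grid} head -> (batch, 3, {grid}, {grid}, {5 + nc})")
```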
Further, training the improved network model in step S5 specifically includes the following steps: first, the input is processed by Mosaic data enhancement, adaptive anchor frame calculation and adaptive image scaling and fed to the backbone network, and the YOLOv5 backbone network incorporating the spatial-attention-based selectable convolution kernel network performs multi-layer feature extraction on the input image to obtain features of different layers; next, cross-layer feature fusion is performed on the multi-layer features through the weighted bidirectional feature pyramid network to obtain a plurality of fused features; finally, a group of small target detection layers is added to detect targets on the fused features, the rectangle loss is computed with the CIoU loss, the confidence loss and classification loss are computed with the BCE loss, and the three are weighted to give the total loss. Back-propagation minimizes the loss to update the network parameters, iteration yields the trained model, and the data set is input into the model to detect small targets.
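A minimal sketch of the described loss combination, assuming illustrative weight values (the patent does not state the weights) and binary cross-entropy on logits:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def total_loss(ciou, pred_obj, tgt_obj, pred_cls, tgt_cls,
               w_box=0.05, w_obj=1.0, w_cls=0.5):
    """Weighted combination of rectangle, confidence and class losses.

    `ciou` is the mean (1 - CIoU) over matched boxes; confidence and
    classification use BCE. The weights are illustrative defaults only.
    """
    l_box = ciou
    l_obj = bce(pred_obj, tgt_obj)
    l_cls = bce(pred_cls, tgt_cls)
    return w_box * l_box + w_obj * l_obj + w_cls * l_cls
```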
The invention has the following beneficial effects: the method improves on the YOLOv5 model; integrating a spatial-attention-based selectable convolution kernel network into the backbone network extracts small targets better and avoids the loss of small target features; fusing the features of different layers with the weighted bidirectional feature pyramid enriches the small target information in the features; and adding a small target detection layer improves the detection of small targets in images.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in detail below, by way of preferred embodiments, with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a small target detection method based on a selectable convolution kernel network and a weighted bi-directional feature pyramid according to the present invention;
FIG. 2 is a diagram of the improved YOLOv5 model of the present invention;
FIG. 3 is a diagram of an alternative convolution kernel network (CA-SKNet) architecture based on spatial attention;
fig. 4 is a diagram of the structure of the four detection layers of the head.
Detailed Description
The following describes embodiments of the present invention with reference to specific examples; other advantages and effects of the invention will become readily apparent to those skilled in the art from this disclosure. The invention may also be practiced or carried out in other, different embodiments, and the details of this description may be modified or varied in various respects without departing from the spirit and scope of the present invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the invention schematically, and the following embodiments and the features in the embodiments may be combined with each other provided there is no conflict.
The drawings are for illustrative purposes only, are schematic rather than physical, and are not intended to limit the invention; to better illustrate the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the size of the actual product; and it will be appreciated by those skilled in the art that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numbers in the drawings of the embodiments correspond to the same or similar components. In the description of the present invention, it should be understood that terms such as "upper", "lower", "left", "right", "front" and "rear" indicate orientations or positional relationships based on those shown in the drawings; they are used only for convenience and simplification of description and do not indicate or imply that the referred device or element must have a specific orientation or be constructed and operated in a specific orientation. Such positional terms are therefore merely illustrative, should not be construed as limiting the invention, and their specific meaning can be understood by those of ordinary skill in the art according to the specific circumstances.
As shown in fig. 1, this embodiment discloses a small target detection method based on a spatial-attention selectable convolution kernel network and a weighted bidirectional feature pyramid, one implementation of which proceeds as follows.
The small target data set VisDrone2019, which contains the corresponding preset labels and was captured by various unmanned aerial vehicle cameras, is divided into training and test sets at a ratio of 4:1.
The data set is enriched by data enhancement, the predefined anchor frames are computed by adaptive anchor frame calculation, adaptive image scaling scales the images to the same size of 640×640, and the processed images are input into the backbone network.
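A minimal sketch of such scaling as a letterbox resize, assuming an HxWx3 numpy image and a square 640×640 output padded with gray value 114 (YOLOv5's adaptive scaling actually pads to the minimal rectangle; nearest-neighbour resizing keeps the sketch dependency-free):

```python
import numpy as np

def letterbox(img, new=640, pad_value=114):
    """Resize with preserved aspect ratio, then pad to new x new."""
    h, w = img.shape[:2]
    r = new / max(h, w)                         # uniform scale factor
    nh, nw = round(h * r), round(w * r)
    ys = np.linspace(0, h - 1, nh).astype(int)  # nearest-neighbour resize
    xs = np.linspace(0, w - 1, nw).astype(int)
    resized = img[ys][:, xs]
    out = np.full((new, new, 3), pad_value, dtype=img.dtype)
    top, left = (new - nh) // 2, (new - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out, r, (left, top)                  # scale/offset for the labels
```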
To improve the efficiency and precision of small target detection in images, this embodiment improves on the YOLOv5 model so that it identifies small targets more efficiently and precisely. A spatial-attention-based selectable convolution kernel network (CA-SKNet), shown in fig. 3, is introduced into YOLOv5 to obtain a new backbone network, shown as the Backbone part of fig. 2, and the processed image is input into the improved backbone network for multi-layer feature extraction to obtain features of different layers.
Specifically, in this embodiment CA-SKNet is an improvement on the selectable convolution kernel network (SKNet): the spatial attention mechanism Coordinate Attention is incorporated to remedy the defect that SKNet considers only channel information and ignores spatial information, yielding the improved spatial-attention-based selectable convolution kernel network (CA-SKNet). Different convolution kernel sizes, i.e. different receptive fields, are selected through the attention mechanism, so that features of different sizes can be obtained and new weighted features derived. The spatial-attention-based selectable convolution kernel network is then merged into the C3 module. Inside it, the input features are aggregated along the two spatial directions to obtain a pair of direction-aware feature maps of sizes C×H×1 and C×1×W respectively:
$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i)$$

$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)$$

where C is the number of channels, H the input image height and W the input image width; $x_c(h, i)$ is the value at $(h, i)$ of the C×H×W feature map and $x_c(j, w)$ the value at $(j, w)$; $z_c^h$ is the average pooling result along the W direction and $z_c^w$ the average pooling result along the H direction.
The two feature maps of sizes C×H×1 and C×1×W are concatenated and passed through convolution and normalization steps to obtain an intermediate feature map; after the sigmoid activation function, the feature values are scaled into (0, 1) to obtain the weights in the two directions:

$$f = \delta(F_1([z^h, z^w]))$$

$$g^h = \sigma(F_h(f^h))$$

$$g^w = \sigma(F_w(f^w))$$

where δ denotes the nonlinear transformation, $F_1$ the convolution operation, σ the activation function, $f$ the result after convolution and nonlinear transformation, and $g$ the weights obtained through the activation function.
The original feature map is re-weighted to obtain the feature map carrying attention weights in the width and height directions:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$
The structure of the spatial-attention-based selectable convolution kernel network (CA-SKNet) of this embodiment is shown in fig. 3. The improved backbone network yields several layers of features: the deep features contain rich semantic information but have very low resolution and poor perception of detail, while the shallow features have high resolution and contain more detail information, but also more useless noise. The weighted bidirectional feature pyramid is therefore used to fuse the features of different layers, as in the Neck part of fig. 2, to obtain a plurality of fused features.
Specifically, the weighted bidirectional feature pyramid network (BiFPN) is used. After backbone feature extraction there are four layers of features, located at the second, fourth, sixth and ninth layers. The ninth-layer features go through one bottom-up upsampling pass and are fused with the four layers of features output by the backbone network, producing one feature map each at the tenth, fourteenth, eighteenth and twentieth layers. The twentieth-layer feature map then goes through one top-down downsampling pass and is fused with the four layers of backbone outputs and the earlier upsampling outputs, producing one feature map each at the twenty-first, twenty-fourth, twenty-seventh and thirty-first layers, which are sent to the detection layer for detection.
Then, in this embodiment, a group of small target detection anchor frames is added to the improved YOLOv5 detection layer, as shown in fig. 4: the 160×160 feature map output by the weighted bidirectional feature pyramid network (BiFPN) is sent to the detection layer for small target detection, which improves the accuracy of small target detection.
Finally, the improved network is trained to obtain a trained model, and the data set is input into the model to detect small targets.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims (9)

1. A small target detection method based on a selectable convolution kernel network and a weighted bidirectional feature pyramid, characterized in that the method specifically comprises the following steps:
s1: performing data enhancement on an original image, computing predefined anchor frames by adaptive anchor frame calculation, scaling the images to the same size by adaptive image scaling, and inputting the processed image into a YOLOv5 backbone network that incorporates a spatial-attention-based selectable convolution kernel network;
s2: extracting multi-layer features from the input image through the backbone network to obtain features of different layers;
s3: using the weighted bidirectional feature pyramid network BiFPN to perform cross-layer feature fusion on the features of different layers to obtain a plurality of fused features;
s4: adding a group of small target detection anchor frames to the YOLOv5 detection layer and carrying out small target detection on the plurality of fused features;
s5: training the improved network model and inputting the data set into the trained model to detect small targets.
2. The small target detection method based on the selectable convolution kernel network and the weighted bidirectional feature pyramid according to claim 1, characterized in that the data enhancement in step S1 specifically includes: performing Mosaic data enhancement on the original image, in which four pictures are randomly cropped and scaled, then randomly arranged and stitched into a single picture.
3. The small target detection method based on the selectable convolution kernel network and the weighted bidirectional feature pyramid according to claim 1, characterized in that computing the predefined anchor frames by adaptive anchor frame calculation in step S1 specifically includes: on the basis of the initial anchor frames, comparing the output prediction frames with the ground-truth frames, computing the gap, updating in the reverse direction, and continuously iterating the parameters to obtain the most suitable anchor frame values; and analyzing the data set with k-means clustering and a genetic learning algorithm to obtain preset anchor frames suited to predicting the object bounding boxes in the data set.
4. The small target detection method based on the selectable convolution kernel network and the weighted bidirectional feature pyramid according to claim 1, characterized in that the YOLOv5 backbone network incorporating the spatial-attention-based selectable convolution kernel network in step S1 is obtained as follows: the spatial attention mechanism Coordinate Attention is integrated into the selectable convolution kernel network SKNet to obtain the spatial-attention-based selectable convolution kernel network CA-SKNet, and CA-SKNet is integrated into the C3 convolution module to obtain the improved YOLOv5 backbone network.
5. The small target detection method based on the selectable convolution kernel network and the weighted bidirectional feature pyramid according to claim 1, characterized in that, in the YOLOv5 backbone network incorporating the spatial-attention-based selectable convolution kernel network:
the spatial attention mechanism aggregates the input features along the two spatial directions to obtain a pair of direction-aware feature maps of sizes C×H×1 and C×1×W respectively:

$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i)$$

$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)$$

where C is the number of channels, H the input image height and W the input image width; $x_c(h, i)$ is the value at coordinate $(h, i)$ of the C×H×W feature map and $x_c(j, w)$ the value at coordinate $(j, w)$; $z_c^h$ is the average pooling result along the W direction and $z_c^w$ the average pooling result along the H direction;
the two feature maps of sizes C×H×1 and C×1×W are concatenated and passed through convolution and normalization to obtain an intermediate feature map, whose values are scaled into (0, 1) by a sigmoid activation function to obtain the weights in the two directions:

$$f = \delta(F_1([z^h, z^w]))$$

$$g^h = \sigma(F_h(f^h))$$

$$g^w = \sigma(F_w(f^w))$$

where δ denotes the nonlinear transformation, $F_1$ the convolution operation and σ the activation function; $z^h$ and $z^w$ are the average pooling results along the W and H directions; $f$ is the result after convolution and nonlinear transformation, and $f^h$ and $f^w$ are the portions of $f$ corresponding to $z^h$ and $z^w$; $F_h$ and $F_w$ denote the convolutions applied to $f^h$ and $f^w$; $g^h$ is the weight obtained from $f^h$ through the convolution $F_h$ and the activation function, and $g^w$ the weight obtained from $f^w$ through the convolution $F_w$ and the activation function;
the original feature map is re-weighted to obtain a feature map carrying attention weights in both the width and height directions:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$

where $x_c(i, j)$ is the value at $(i, j)$ of the original C×H×W feature map, $g_c^h(i)$ the height-direction weight obtained through convolution and the activation function, $g_c^w(j)$ the width-direction weight, and $y_c(i, j)$ the new feature map obtained by multiplying the original feature map by the height-direction and width-direction weights.
6. The small target detection method based on the selectable convolution kernel network and the weighted bidirectional feature pyramid according to claim 1, characterized in that performing cross-layer feature fusion on the features of different layers with the weighted bidirectional feature pyramid network BiFPN in step S3 specifically includes:
after backbone feature extraction there are four layers of features, located at the second, fourth, sixth and ninth layers;
the ninth-layer features go through one bottom-up upsampling pass and are fused with the four layers of features output by the backbone network, producing one feature map each at the tenth, fourteenth, eighteenth and twentieth layers;
the twentieth-layer feature map goes through one top-down downsampling pass and is fused with the four layers of backbone outputs and with the earlier upsampling outputs, producing one feature map each at the twenty-first, twenty-fourth, twenty-seventh and thirty-first layers, which are sent to the detection layer for detection.
7. The small target detection method based on the selectable convolution kernel network and the weighted bidirectional feature pyramid according to claim 1, characterized in that, in step S4, a group of small target detection layers is added to YOLOv5, and after the feature maps output by the weighted bidirectional feature pyramid network BiFPN are obtained, they are sent to the detection layers for small target detection, improving the accuracy of small target detection.
8. The small target detection method based on a selectable convolution kernel network and a weighted bidirectional feature pyramid of claim 7, characterized in that the small target detection layer consists of four detection layers in which feature maps of different sizes detect target objects of different sizes, detecting respectively the 160×160, 80×80, 40×40 and 20×20 feature maps output by the weighted bidirectional feature pyramid network; each detection layer outputs the corresponding vectors, and the predicted bounding boxes and categories of the targets are finally generated and marked in the original image.
9. The small target detection method based on the selectable convolution kernel network and the weighted bidirectional feature pyramid according to claim 1, characterized in that training the improved network model in step S5 specifically includes the following steps: first, the input is processed by Mosaic data enhancement, adaptive anchor frame calculation and adaptive image scaling and fed to the backbone network, and the YOLOv5 backbone network incorporating the spatial-attention-based selectable convolution kernel network performs multi-layer feature extraction on the input image to obtain features of different layers; then cross-layer feature fusion is performed on the multi-layer features through the weighted bidirectional feature pyramid network to obtain a plurality of fused features; finally, a group of small target detection layers is added to detect targets on the fused features, the rectangle loss is computed with the CIoU loss, the confidence loss and classification loss are computed with the BCE (binary cross-entropy) loss, the three are weighted to give the total loss, back-propagation minimizes the loss to update the network parameters, iteration yields the trained model, and the data set is input into the model to detect small targets.
CN202211470248.5A 2022-11-23 2022-11-23 Small target detection method based on selectable convolution kernel network and weighted bidirectional feature pyramid Pending CN116051953A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211470248.5A CN116051953A (en) 2022-11-23 2022-11-23 Small target detection method based on selectable convolution kernel network and weighted bidirectional feature pyramid

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211470248.5A CN116051953A (en) 2022-11-23 2022-11-23 Small target detection method based on selectable convolution kernel network and weighted bidirectional feature pyramid

Publications (1)

Publication Number Publication Date
CN116051953A true CN116051953A (en) 2023-05-02

Family

ID=86115214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211470248.5A Pending CN116051953A (en) 2022-11-23 2022-11-23 Small target detection method based on selectable convolution kernel network and weighted bidirectional feature pyramid

Country Status (1)

Country Link
CN (1) CN116051953A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116612087A (en) * 2023-05-22 2023-08-18 山东省人工智能研究院 Coronary artery CTA stenosis detection method based on YOLOv5-LA
CN116612087B (en) * 2023-05-22 2024-02-23 山东省人工智能研究院 Coronary artery CTA stenosis detection method based on YOLOv5-LA
CN116363124A (en) * 2023-05-26 2023-06-30 南京杰智易科技有限公司 Steel surface defect detection method based on deep learning
CN116682014A (en) * 2023-06-07 2023-09-01 无锡照明股份有限公司 Method, device, equipment and storage medium for dividing lamp curtain building image
CN116682014B (en) * 2023-06-07 2024-07-05 无锡照明股份有限公司 Method, device, equipment and storage medium for dividing lamp curtain building image
CN116532046A (en) * 2023-07-05 2023-08-04 南京邮电大学 Microfluidic automatic feeding device and method for spirofluorene xanthene
CN116532046B (en) * 2023-07-05 2023-10-10 南京邮电大学 Microfluidic automatic feeding device and method for spirofluorene xanthene
CN116665156A (en) * 2023-07-28 2023-08-29 苏州中德睿博智能科技有限公司 Multi-scale attention-fused traffic helmet small target detection system and method
CN116664573A (en) * 2023-07-31 2023-08-29 山东科技大学 Downhole drill rod number statistics method based on improved YOLOX
CN116664573B (en) * 2023-07-31 2024-02-09 山东科技大学 Downhole drill rod number statistics method based on improved YOLOX
CN117197475A (en) * 2023-09-20 2023-12-08 南京航空航天大学 Target detection method for large-range multi-interference-source scene
CN117197475B (en) * 2023-09-20 2024-02-20 南京航空航天大学 Target detection method for large-range multi-interference-source scene

Similar Documents

Publication Publication Date Title
CN109977918B (en) Target detection positioning optimization method based on unsupervised domain adaptation
CN116051953A (en) Small target detection method based on selectable convolution kernel network and weighted bidirectional feature pyramid
CN108564097B (en) Multi-scale target detection method based on deep convolutional neural network
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN108154102B (en) Road traffic sign identification method
CN114202672A (en) Small target detection method based on attention mechanism
CN111767882A (en) Multi-mode pedestrian detection method based on improved YOLO model
CN110782420A (en) Small target feature representation enhancement method based on deep learning
CN111079739B (en) Multi-scale attention feature detection method
CN111563502A (en) Image text recognition method and device, electronic equipment and computer storage medium
CN110309842B (en) Object detection method and device based on convolutional neural network
CN110647802A (en) Remote sensing image ship target detection method based on deep learning
CN114724120B (en) Vehicle target detection method and system based on radar vision semantic segmentation adaptive fusion
CN112613343B (en) River waste monitoring method based on improved YOLOv4
CN113052185A (en) Small sample target detection method based on fast R-CNN
CN110136162B (en) Unmanned aerial vehicle visual angle remote sensing target tracking method and device
CN115631344B (en) Target detection method based on feature self-adaptive aggregation
CN116091946A (en) Yolov 5-based unmanned aerial vehicle aerial image target detection method
CN112348056A (en) Point cloud data classification method, device, equipment and readable storage medium
CN111583322A (en) Depth learning-based 2D image scene depth prediction and semantic segmentation method and system
CN117437691A (en) Real-time multi-person abnormal behavior identification method and system based on lightweight network
CN116630637A (en) optical-SAR image joint interpretation method based on multi-modal contrast learning
CN107729992B (en) Deep learning method based on back propagation
CN116958615A (en) Picture identification method, device, equipment and medium
CN113420760A (en) Handwritten Mongolian detection and identification method based on segmentation and deformation LSTM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination