CN114419589A - Road target detection method based on attention feature enhancement module - Google Patents

Road target detection method based on attention feature enhancement module

Info

Publication number
CN114419589A
CN114419589A (application CN202210049982.8A)
Authority
CN
China
Prior art keywords
feature map, attention, feature, module, inputting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210049982.8A
Other languages
Chinese (zh)
Inventor
潘树国 (Pan Shuguo)
孙迎春 (Sun Yingchun)
高旺 (Gao Wang)
彭雅慧 (Peng Yahui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202210049982.8A
Publication of CN114419589A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217 - Validation; Performance evaluation; Active pattern learning techniques
    • G06F 18/2193 - Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Abstract

The invention discloses a road target detection method based on an attention feature enhancement module, belonging to the technical field of target detection. First, a convolutional neural network is constructed to extract features of the road targets to be detected from the original image, yielding input feature maps of different sizes. Then, an attention feature enhancement module comprising a CBAM attention mechanism and a semantic enhancement branch is constructed and used to enhance the obtained feature maps. Finally, based on the enhanced feature maps containing deep semantic information and shallow texture information, a decoupled head performs classification and regression to complete target detection. Detection results on the BDD100K dataset show that the average precision of the disclosed method is improved by 1.8%; detection results on the PASCAL VOC 2007 dataset show an improvement of 0.6%.

Description

Road target detection method based on attention feature enhancement module
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a road target detection method based on an attention feature enhancement module.
Background
With the growing number of automobiles, travel safety problems have become increasingly prominent. Autonomous driving technology based on computer vision provides a new solution to these traffic problems and is receiving increasing attention and research in many countries. When traditional target detection algorithms are used for road target detection, they suffer from poor discrimination in feature selection, a high missed-detection rate and a low recall rate, so improving the accuracy of road target detection algorithms in complex traffic scenes is of great significance.
Deep convolutional neural networks are highly robust because they can learn target features autonomously and extract key information. In recent years, target detection models based on convolutional neural networks have mainly followed two ideas, the target candidate box idea and the regression idea; the corresponding algorithms are called two-stage and single-stage algorithms. Two-stage detection algorithms, represented by R-CNN, Fast R-CNN, Faster R-CNN and R-FCN, first extract target candidate boxes and then complete model training with a detection network based on the extracted candidates. Single-stage detection algorithms, represented by SSD, YOLO and YOLOv3, achieve higher detection speed by directly regressing the target category and position information with the network. However, because different feature maps, and even different regions within the same feature map, contribute differently to the target, the features obtained by current detection algorithms are generic and redundant and cannot precisely meet task requirements.
Disclosure of Invention
In order to solve the above problems, the invention discloses a road target detection method based on an attention feature enhancement module. An attention mechanism is added to extract target features discriminatively and strengthen the feature expression of the task region of interest, while a semantic enhancement branch is added to propagate semantically strong features, effectively improving road target detection accuracy compared with other advanced target detection algorithms.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a road target detection method based on an attention feature enhancement module comprises the following steps:
step 1, acquiring image information of a road target to be detected;
step 2, constructing a convolutional neural network, extracting the features representing the road target from the image information, and obtaining input feature maps of different sizes;
step 3, constructing an attention feature enhancement module comprising a CBAM attention mechanism and a semantic enhancement branch, and performing feature enhancement on the input feature maps obtained in step 2 through the attention feature enhancement module, thereby improving the feature expression of the task region of interest and obtaining feature maps containing deep semantic information and shallow texture information;
step 4, performing classification and regression with a decoupled output head based on the enhanced feature maps obtained in step 3, and outputting the detection result.
In the above road target detection method based on the attention feature enhancement module, the specific steps of constructing the attention feature enhancement module in step 3 comprise:
step 1.3.1, inputting the feature map C5, whose size is 1/32 of the original image, into a CBAM module to obtain the feature map A5; performing a convolution operation, batch normalization and activation function processing on A5 and then upsampling it to obtain the semantic enhanced feature map U4 with a size of 1/16 of the original image;
step 1.3.2, adding the input feature map C4, whose size is 1/16 of the original image, and the semantic enhanced feature map U4 to obtain the feature map S4; inputting S4 into a CBAM module to obtain the feature map A4; concatenating the feature maps A4 and U4 along the channel dimension to obtain the semantic enhanced feature map E4 with a size of 1/16 of the original image;
step 1.3.3, inputting the enhanced feature map E4 into a CSPLayer unit, performing a convolution operation, batch normalization and activation function processing on the resulting feature map, and then upsampling it to obtain the semantic enhanced feature map U3 with a size of 1/8 of the original image;
step 1.3.4, adding the input feature map C3, whose size is 1/8 of the original image, and the semantic enhanced feature map U3 to obtain the feature map S3; inputting S3 into a CBAM module to obtain the feature map A3; concatenating the feature maps A3 and U3 along the channel dimension to obtain the semantic enhanced feature map E3 with a size of 1/8 of the original image;
step 1.3.5, inputting the enhanced feature map E3 into a CSPLayer unit, the resulting feature map being used by the decoupling head to detect the target.
Further, the CBAM module described in step 1.3.1, step 1.3.2 and step 1.3.4 comprises two processing stages, generating a channel attention feature map and generating a spatial attention feature map:
Global mean pooling and global maximum pooling are first performed on the input feature map F to aggregate its spatial information, generating two spatial context descriptors $F_{avg}^{c}$ and $F_{max}^{c}$. The two descriptors are then passed through a multi-layer perceptron with one hidden layer to generate the channel attention map. When the input feature map is $F \in \mathbb{R}^{C \times H \times W}$, the channel attention feature map $M_c(F)$ is computed as:

$$M_c(F) = \sigma\left( W_1\left( R\left( W_0\left( g(F) \right) \right) \right) + W_1\left( R\left( W_0\left( \delta(F) \right) \right) \right) \right)$$

wherein $W_0 \in \mathbb{R}^{C/r \times C}$ and $W_1 \in \mathbb{R}^{C \times C/r}$ are the weights of the multi-layer perceptron; r represents the reduction ratio of the bottleneck structure of the multi-layer perceptron and is set to 16; σ(·) represents the Sigmoid activation function; R(·) represents the ReLU linear rectification function; g(·) is the global mean pooling function; δ(·) is the global maximum pooling function.
The Sigmoid activation function σ(·) is computed as:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

The ReLU linear rectification function R(·) is computed as:

$$R(x) = \max(0, x)$$

The global mean pooling function g(·) is computed as:

$$g(F) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} F(i, j)$$

The global maximum pooling function δ(·) is computed as:

$$\delta(F) = \max_{1 \le i \le H,\ 1 \le j \le W} F(i, j)$$
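For concreteness, the channel attention computation above can be sketched in PyTorch as follows. This is an illustrative sketch, not the patent's reference implementation; the class and variable names are assumptions, and the reduction ratio defaults to the r = 16 given in the text.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel attention M_c(F) = sigma(MLP(g(F)) + MLP(delta(F)))."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP with one hidden layer; the bottleneck is reduced by r.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = f.shape
        avg = self.mlp(f.mean(dim=(2, 3)))   # g(F): global mean pooling
        mx = self.mlp(f.amax(dim=(2, 3)))    # delta(F): global maximum pooling
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return f * scale                     # M_c(F) applied to F element-wise
```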
In generating the spatial attention feature map, the input feature map F′ is first subjected to mean pooling and maximum pooling along the channel axis, generating two two-dimensional feature maps $F_{avg}^{\prime s} \in \mathbb{R}^{1 \times H \times W}$ and $F_{max}^{\prime s} \in \mathbb{R}^{1 \times H \times W}$. The two maps are concatenated along the channel dimension and convolved by a standard convolution layer to generate the spatial attention feature map $M_s(F')$, computed as:

$$M_s(F') = \sigma\left( f^{7 \times 7}\left( \left[ F_{avg}^{\prime s}; F_{max}^{\prime s} \right] \right) \right)$$

wherein $f^{7 \times 7}$ represents a convolution operation with a 7 × 7 convolution kernel.
The attention feature mapping finally output by the CBAM module is computed as:

$$F' = M_c(F) \otimes F, \qquad F'' = M_s(F') \otimes F'$$

wherein ⊗ denotes element-wise multiplication.
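Continuing the sketch above, the spatial attention stage and the serial composition of the two attentions might look as follows (again illustrative, not the patent's code; padding 3 keeps the 7 × 7 convolution size-preserving):

```python
class SpatialAttention(nn.Module):
    """Spatial attention M_s(F') = sigma(f_7x7([F'_avg; F'_max]))."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        avg = f.mean(dim=1, keepdim=True)    # mean pooling along the channel axis
        mx = f.amax(dim=1, keepdim=True)     # maximum pooling along the channel axis
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return f * scale                     # M_s(F') applied to F' element-wise


class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as described in the text."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        return self.sa(self.ca(f))           # F'' = M_s(F') ⊗ F', with F' = M_c(F) ⊗ F
```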
the method for detecting the road target based on the attention feature enhancing module further comprises the following steps: the specific steps of the CSPLAyer unit in step 1.3.3 and step 1.3.5 include:
firstly, the methodInputting a feature map F1Carrying out convolution operation, batch normalization and activation function processing to obtain a characteristic diagram F11
Then inputting a feature map F1Inputting another branch, performing convolution operation, batch normalization and activation function processing to obtain a feature map F21Will F21Successively carrying out three operations in successive residual bottleneck blocks to obtain a characteristic diagram F22
Finally, the feature map F11And characteristic diagram F22Stitching in channel dimension to obtain feature map F31And apply the feature map F31And carrying out convolution operation, batch normalization and activation function processing for subsequent operation.
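A possible PyTorch rendering of the CSPLayer unit follows; the Conv-BN-SiLU composition and the halved branch width are assumptions based on common YOLOX-style implementations rather than details stated in the patent.

```python
class ConvBNAct(nn.Module):
    """Convolution -> batch normalization -> activation function."""

    def __init__(self, c_in: int, c_out: int, k: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.conv(x)))


class Bottleneck(nn.Module):
    """Residual bottleneck block used inside the CSPLayer."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = ConvBNAct(channels, channels, k=1)
        self.conv2 = ConvBNAct(channels, channels, k=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.conv2(self.conv1(x))


class CSPLayer(nn.Module):
    def __init__(self, c_in: int, c_out: int, n_blocks: int = 3):
        super().__init__()
        mid = c_out // 2
        self.branch1 = ConvBNAct(c_in, mid, k=1)    # F1 -> F11
        self.branch2 = ConvBNAct(c_in, mid, k=1)    # F1 -> F21
        # Three consecutive residual bottleneck blocks: F21 -> F22.
        self.blocks = nn.Sequential(*[Bottleneck(mid) for _ in range(n_blocks)])
        self.fuse = ConvBNAct(2 * mid, c_out, k=1)  # conv-BN-act applied to F31

    def forward(self, f1: torch.Tensor) -> torch.Tensor:
        f11 = self.branch1(f1)
        f22 = self.blocks(self.branch2(f1))
        f31 = torch.cat([f11, f22], dim=1)          # channel-wise concatenation
        return self.fuse(f31)
```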
The invention has the following beneficial effects:
The invention provides a road target detection method based on an attention feature enhancement module. Compared with the baseline YOLOX-L algorithm, the proposed method maintains detection speed while improving detection accuracy: the average precision is improved by 1.8% on the BDD100K dataset and by 0.6% on the PASCAL VOC 2007 test set. With the growing number of automobiles, travel safety problems have become increasingly prominent, and autonomous driving technology based on computer vision provides a new solution that is receiving increasing attention and research in many countries. However, traditional target detection algorithms suffer from a high missed-detection rate and a low recall rate when detecting road targets in complex traffic scenes, so improving the accuracy of road target detection algorithms is of great significance.
Drawings
FIG. 1 is a flow chart of the present method;
FIG. 2 is a diagram of a CBAM attention model architecture;
FIG. 3 is a network architecture diagram of the AFE-YOLOX algorithm.
Detailed Description
The present invention will be further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that these embodiments are merely illustrative of the invention and do not limit its scope.
The invention uses the BDD100K data set and the VOC data set to carry out experiments on the proposed road target detection method based on the attention feature enhancement module.
First, image information of the road targets to be detected is acquired, and a convolutional neural network is constructed to extract the features of the targets to be detected from the image information, obtaining input feature maps of different sizes. Then, an attention feature enhancement module consisting of a CBAM attention mechanism and a semantic enhancement branch performs feature enhancement on the input feature maps to obtain enhanced feature maps containing deep semantic information and shallow texture information. Finally, the decoupled head performs classification and regression on the enhanced feature maps to obtain the detection result.
Step 1, constructing a convolutional neural network to extract the features of the target to be detected from the image information and obtain input feature maps of different sizes:
Step 1.1, constructing a convolutional neural network for extracting target features.
The constructed network structure is shown in the CSPDarknet part of FIG. 3. The input image first passes through a Focus network module to obtain a feature map whose width and height are 1/2 of the original image and whose number of channels is 4 times that of the original image, and then passes multiple times through convolution (Conv) modules and CSPLayer units to obtain input feature maps with sizes of 1/8, 1/16 and 1/32 of the original image.
The Focus network module is implemented as follows: the input image is sampled at every other pixel, the four resulting independent feature layers are stacked, and the width and height information is thereby concentrated into the channel dimension. The Conv module is implemented as follows: a convolution operation is applied to the input feature layer, the result is batch-normalized, and an activation function is applied. The CSPLayer unit is implemented as follows: first, a convolution operation, batch normalization and activation function processing are performed on the input feature map F1 to obtain the feature map F11; then the input feature map F1 is fed into another branch, where a convolution operation, batch normalization and activation function processing yield the feature map F21, and F21 is passed successively through three consecutive residual bottleneck blocks to obtain the feature map F22; finally, the feature maps F11 and F22 are concatenated along the channel dimension to obtain the feature map F31, and a convolution operation, batch normalization and activation function processing are applied to F31 for subsequent operations.
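The Focus slicing described here can be sketched as below, building on the ConvBNAct module above (the trailing convolution and its width are assumptions matching typical YOLOX implementations):

```python
class Focus(nn.Module):
    """Sample every other pixel into 4 sub-images and stack them on the channel
    axis, halving width/height and quadrupling channels, then apply a Conv module."""

    def __init__(self, c_in: int = 3, c_out: int = 64):
        super().__init__()
        self.conv = ConvBNAct(4 * c_in, c_out, k=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, 4C, H/2, W/2): four interleaved pixel grids.
        patches = torch.cat(
            [x[..., ::2, ::2], x[..., 1::2, ::2],
             x[..., ::2, 1::2], x[..., 1::2, 1::2]],
            dim=1,
        )
        return self.conv(patches)
```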
Step 2, constructing an attention feature enhancement module comprising a CBAM attention mechanism and a semantic enhancement branch, and performing feature enhancement on the input feature maps obtained in step 1 through the attention feature enhancement module to obtain enhanced feature maps:
Step 2.1, inputting the feature map C5, whose size is 1/32 of the original image, into a CBAM module to obtain the feature map A5; performing a convolution operation, batch normalization and activation function processing on A5 and then upsampling it to obtain the semantic enhanced feature map U4 with a size of 1/16 of the original image.
Step 2.2, adding the input feature map C4, whose size is 1/16 of the original image, and the semantic enhanced feature map U4 to obtain the feature map S4; inputting S4 into a CBAM module to obtain the feature map A4; concatenating the feature maps A4 and U4 along the channel dimension to obtain the semantic enhanced feature map E4 with a size of 1/16 of the original image.
Step 2.3, inputting the enhanced feature map E4 into a CSPLayer unit, performing a convolution operation, batch normalization and activation function processing on the resulting feature map, and then upsampling it to obtain the semantic enhanced feature map U3 with a size of 1/8 of the original image.
Step 2.4, adding the input feature map C3, whose size is 1/8 of the original image, and the semantic enhanced feature map U3 to obtain the feature map S3; inputting S3 into a CBAM module to obtain the feature map A3; concatenating the feature maps A3 and U3 along the channel dimension to obtain the semantic enhanced feature map E3 with a size of 1/8 of the original image.
Step 2.5, inputting the enhanced feature map E3 into a CSPLayer unit; the resulting feature map is used by the decoupled head to detect the target.
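Reading steps 2.1 to 2.5 together, one plausible wiring of the attention feature enhancement module is sketched below, reusing the CBAM, ConvBNAct and CSPLayer sketches above. The channel widths and the choice of concatenation partners follow the reconstruction used in the text and should be checked against FIG. 3.

```python
class AFEModule(nn.Module):
    """Attention feature enhancement over backbone maps C3 (1/8), C4 (1/16), C5 (1/32)."""

    def __init__(self, c3: int = 256, c4: int = 512, c5: int = 1024):
        super().__init__()
        self.cbam5 = CBAM(c5)
        self.reduce5 = ConvBNAct(c5, c4, k=1)   # conv-BN-act before upsampling
        self.cbam4 = CBAM(c4)
        self.csp4 = CSPLayer(2 * c4, c4)
        self.reduce4 = ConvBNAct(c4, c3, k=1)
        self.cbam3 = CBAM(c3)
        self.csp3 = CSPLayer(2 * c3, c3)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, c3, c4, c5):
        a5 = self.cbam5(c5)                         # step 2.1: CBAM on the 1/32 map
        u4 = self.up(self.reduce5(a5))              # semantic enhanced map U4 (1/16)
        a4 = self.cbam4(c4 + u4)                    # step 2.2: add, then CBAM
        e4 = torch.cat([a4, u4], dim=1)             # channel-wise concatenation -> E4
        u3 = self.up(self.reduce4(self.csp4(e4)))   # step 2.3: CSPLayer, conv, upsample
        a3 = self.cbam3(c3 + u3)                    # step 2.4
        e3 = torch.cat([a3, u3], dim=1)
        return self.csp3(e3)                        # step 2.5: fed to the decoupled head
```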
The CBAM module used in step 2.1, step 2.2 and step 2.4 is shown in FIG. 2. The CBAM module comprises two processing stages, generating a channel attention feature map and generating a spatial attention feature map:
Global mean pooling and global maximum pooling are first performed on the input feature map F to aggregate its spatial information, generating two spatial context descriptors $F_{avg}^{c}$ and $F_{max}^{c}$. The two descriptors are then passed through a multi-layer perceptron with one hidden layer to generate the channel attention map. When the input feature map is $F \in \mathbb{R}^{C \times H \times W}$, the channel attention feature map $M_c(F)$ is computed as:

$$M_c(F) = \sigma\left( W_1\left( R\left( W_0\left( g(F) \right) \right) \right) + W_1\left( R\left( W_0\left( \delta(F) \right) \right) \right) \right)$$

wherein $W_0 \in \mathbb{R}^{C/r \times C}$ and $W_1 \in \mathbb{R}^{C \times C/r}$ are the weights of the multi-layer perceptron; r represents the reduction ratio of the bottleneck structure of the multi-layer perceptron and is set to 16; σ(·) represents the Sigmoid activation function; R(·) represents the ReLU linear rectification function; g(·) is the global mean pooling function; δ(·) is the global maximum pooling function.
In generating the spatial attention feature map, the input feature map F′ is first subjected to mean pooling and maximum pooling along the channel axis, generating two two-dimensional feature maps $F_{avg}^{\prime s} \in \mathbb{R}^{1 \times H \times W}$ and $F_{max}^{\prime s} \in \mathbb{R}^{1 \times H \times W}$. The two maps are concatenated along the channel dimension and convolved by a standard convolution layer to generate the spatial attention feature map $M_s(F')$, computed as:

$$M_s(F') = \sigma\left( f^{7 \times 7}\left( \left[ F_{avg}^{\prime s}; F_{max}^{\prime s} \right] \right) \right)$$

wherein $f^{7 \times 7}$ represents a convolution operation with a 7 × 7 convolution kernel.
The attention feature mapping finally output by the CBAM module is computed as:

$$F' = M_c(F) \otimes F, \qquad F'' = M_s(F') \otimes F'$$

wherein ⊗ denotes element-wise multiplication.
Step 3, performing classification and regression with a decoupled output head based on the enhanced feature maps obtained in step 2, and outputting the detection result.
Step 3.1, applying a 1×1 convolution to the enhanced feature map F1 to reduce dimensionality, obtaining the 256-channel feature map F11; inputting F11 into the classification branch and applying a 3×3 convolution, batch normalization and activation function processing to obtain the feature map F12; applying a 1×1 convolution to F12 to obtain the feature map F13, whose number of channels equals the number of target categories.
Step 3.2, applying a 1×1 convolution to the enhanced feature map F1 to reduce dimensionality, obtaining the 256-channel feature map F21; inputting F21 into the regression branch and applying a 3×3 convolution, batch normalization and activation function processing to obtain the feature map F22; applying a 1×1 convolution to F22 to obtain the feature map F23, whose number of channels equals the number of target coordinates.
Step 3.3, applying a 1×1 convolution to the feature map F22 to obtain the feature map F33, whose number of channels equals the number of anchor boxes.
Step 3.4, concatenating the feature maps F13, F23 and F33 to obtain the detection result feature map.
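A simplified sketch of the decoupled head in steps 3.1 to 3.4 follows (illustrative only: strides and loss wiring are omitted, one anchor per location is assumed as in YOLOX, and ConvBNAct is reused from above):

```python
class DecoupledHead(nn.Module):
    def __init__(self, c_in: int, num_classes: int, num_anchors: int = 1):
        super().__init__()
        self.stem_cls = ConvBNAct(c_in, 256, k=1)   # step 3.1: 1x1 reduction -> F11
        self.stem_reg = ConvBNAct(c_in, 256, k=1)   # step 3.2: 1x1 reduction -> F21
        self.cls_branch = ConvBNAct(256, 256, k=3)  # 3x3 conv, BN, activation -> F12
        self.reg_branch = ConvBNAct(256, 256, k=3)  # -> F22
        self.cls_pred = nn.Conv2d(256, num_classes * num_anchors, 1)  # F13: classes
        self.reg_pred = nn.Conv2d(256, 4 * num_anchors, 1)            # F23: box coords
        self.obj_pred = nn.Conv2d(256, num_anchors, 1)                # F33: objectness

    def forward(self, f1: torch.Tensor) -> torch.Tensor:
        f12 = self.cls_branch(self.stem_cls(f1))
        f22 = self.reg_branch(self.stem_reg(f1))
        # Step 3.4: concatenate classification, regression and objectness maps.
        return torch.cat(
            [self.cls_pred(f12), self.reg_pred(f22), self.obj_pred(f22)], dim=1
        )
```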
Table 1 compares, based on the YOLOX-L algorithm, the overall and per-class detection results on the BDD100K dataset before and after adding the attention feature enhancement module. The AFE-YOLOX-L algorithm with the attention feature enhancement module reaches an average precision of 59.0% on 7 classes of road targets. Compared with the YOLOX-L algorithm, the average precision is improved by 1.8%; the person class is improved by 0.7%, rider by 0.9%, car by 0.2%, bus by 0.9%, truck by 1.2%, bike by 4.4% and motor by 4.4%.
TABLE 1 comparison of BDD100K data set test results
(Table 1 is reproduced as an image in the original publication; its values are not recoverable as text.)
Table 2 compares the performance of the AFE-YOLOX-L algorithm with other advanced target detection algorithms on the BDD100K dataset. The comparison shows that the AFE-YOLOX-L algorithm outperforms many advanced target detection algorithms.
TABLE 2 Performance contrast of AFE-YOLOX-L with other advanced object detection algorithms on BDD100K dataset
(Table 2 is reproduced as an image in the original publication; its values are not recoverable as text.)
Table 3 compares, based on the YOLOX-L algorithm, the overall and per-class detection results before and after adding the attention feature enhancement module, using the PASCAL VOC 2007 trainval and PASCAL VOC 2012 trainval sets for training and the PASCAL VOC 2007 test set for evaluation. With a 320 × 320 input image, the AFE-YOLOX-L algorithm with the attention feature enhancement module reaches an average precision of 84.1% on 20 target classes, an improvement of 0.6% over the YOLOX-L algorithm, with 17 classes improved to different degrees.
TABLE 3 comparison of the test results of the PASCAL VOC 2007 test set
(Table 3 is reproduced as an image in the original publication; its values are not recoverable as text.)
Table 4 compares the performance of the AFE-YOLOX-L algorithm with other advanced target detection algorithms on the PASCAL VOC 2007 test set. The comparison shows that the AFE-YOLOX-L algorithm outperforms many advanced target detection algorithms.
TABLE 4 Performance comparison of AFE-YOLOX-L with other advanced target detection algorithms on PASCAL VOC 2007 dataset
(Table 4 is reproduced as an image in the original publication; its values are not recoverable as text.)
It should be noted that the above content merely illustrates the technical idea of the present invention and does not limit its protection scope. It will be obvious to those skilled in the art that various modifications and refinements can be made without departing from the principle of the invention, and such modifications and refinements also fall within the protection scope of the claims of the present invention.

Claims (4)

1. A road target detection method based on an attention feature enhancement module is characterized by comprising the following steps:
step 1, acquiring image information of a road target to be detected;
step 2, constructing a convolutional neural network, extracting the features representing the road target from the image information, and obtaining input feature maps of different sizes;
step 3, constructing an attention feature enhancement module comprising a CBAM attention mechanism and a semantic enhancement branch, and performing feature enhancement on the input feature maps obtained in step 2 through the attention feature enhancement module, thereby improving the feature expression of the task region of interest and obtaining feature maps containing deep semantic information and shallow texture information;
step 4, performing classification and regression with a decoupled output head based on the enhanced feature maps obtained in step 3, and outputting the detection result.
2. The road target detection method based on the attention feature enhancement module according to claim 1, characterized in that: in step 3, the specific steps of constructing the attention feature enhancement module comprise:
step 1.3.1, inputting the feature map C5, whose size is 1/32 of the original image, into a CBAM module to obtain the feature map A5; performing a convolution operation, batch normalization and activation function processing on A5 and then upsampling it to obtain the semantic enhanced feature map U4 with a size of 1/16 of the original image;
step 1.3.2, adding the input feature map C4, whose size is 1/16 of the original image, and the semantic enhanced feature map U4 to obtain the feature map S4; inputting S4 into a CBAM module to obtain the feature map A4; concatenating the feature maps A4 and U4 along the channel dimension to obtain the semantic enhanced feature map E4 with a size of 1/16 of the original image;
step 1.3.3, inputting the enhanced feature map E4 into a CSPLayer unit, performing a convolution operation, batch normalization and activation function processing on the resulting feature map, and then upsampling it to obtain the semantic enhanced feature map U3 with a size of 1/8 of the original image;
step 1.3.4, adding the input feature map C3, whose size is 1/8 of the original image, and the semantic enhanced feature map U3 to obtain the feature map S3; inputting S3 into a CBAM module to obtain the feature map A3; concatenating the feature maps A3 and U3 along the channel dimension to obtain the semantic enhanced feature map E3 with a size of 1/8 of the original image;
step 1.3.5, inputting the enhanced feature map E3 into a CSPLayer unit, the resulting feature map being used by the decoupled head to detect the target.
3. The road target detection method based on the attention feature enhancement module according to claim 2, characterized in that: the CBAM module described in step 1.3.1, step 1.3.2 and step 1.3.4 comprises two processing stages, generating a channel attention feature map and generating a spatial attention feature map:
global mean pooling and global maximum pooling are performed on the input feature map F to aggregate its spatial information, generating two spatial context descriptors $F_{avg}^{c}$ and $F_{max}^{c}$; the two descriptors are passed through a multi-layer perceptron with one hidden layer to generate the channel attention map; when the input feature map is $F \in \mathbb{R}^{C \times H \times W}$, the channel attention feature map $M_c(F)$ is computed as:

$$M_c(F) = \sigma\left( W_1\left( R\left( W_0\left( g(F) \right) \right) \right) + W_1\left( R\left( W_0\left( \delta(F) \right) \right) \right) \right)$$

wherein $W_0 \in \mathbb{R}^{C/r \times C}$ and $W_1 \in \mathbb{R}^{C \times C/r}$ are the weights of the multi-layer perceptron; r represents the reduction ratio of the bottleneck structure of the multi-layer perceptron and is set to 16; σ(·) represents the Sigmoid activation function; R(·) represents the ReLU linear rectification function; g(·) is the global mean pooling function; δ(·) is the global maximum pooling function;

the Sigmoid activation function σ(·) is computed as:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

the ReLU linear rectification function R(·) is computed as:

$$R(x) = \max(0, x)$$

the global mean pooling function g(·) is computed as:

$$g(F) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} F(i, j)$$

the global maximum pooling function δ(·) is computed as:

$$\delta(F) = \max_{1 \le i \le H,\ 1 \le j \le W} F(i, j)$$
in generating the spatial attention feature map, the input feature map F′ is first subjected to mean pooling and maximum pooling along the channel axis, generating two two-dimensional feature maps $F_{avg}^{\prime s} \in \mathbb{R}^{1 \times H \times W}$ and $F_{max}^{\prime s} \in \mathbb{R}^{1 \times H \times W}$; the two maps are concatenated along the channel dimension and convolved by a standard convolution layer to generate the spatial attention feature map $M_s(F')$, computed as:

$$M_s(F') = \sigma\left( f^{7 \times 7}\left( \left[ F_{avg}^{\prime s}; F_{max}^{\prime s} \right] \right) \right)$$

wherein $f^{7 \times 7}$ represents a convolution operation with a 7 × 7 convolution kernel;

the attention feature mapping finally output by the CBAM module is computed as:

$$F' = M_c(F) \otimes F, \qquad F'' = M_s(F') \otimes F'$$

wherein ⊗ denotes element-wise multiplication.
4. The road target detection method based on the attention feature enhancement module according to claim 2, characterized in that: the specific steps of the CSPLayer unit in step 1.3.3 and step 1.3.5 comprise:
first, performing a convolution operation, batch normalization and activation function processing on the input feature map F1 to obtain the feature map F11;
then, inputting the feature map F1 into another branch, performing a convolution operation, batch normalization and activation function processing to obtain the feature map F21, and passing F21 successively through three consecutive residual bottleneck blocks to obtain the feature map F22;
finally, concatenating the feature maps F11 and F22 along the channel dimension to obtain the feature map F31, and performing a convolution operation, batch normalization and activation function processing on F31 for subsequent operations.
CN202210049982.8A 2022-01-17 2022-01-17 Road target detection method based on attention feature enhancement module Pending CN114419589A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210049982.8A CN114419589A (en) 2022-01-17 2022-01-17 Road target detection method based on attention feature enhancement module

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210049982.8A CN114419589A (en) 2022-01-17 2022-01-17 Road target detection method based on attention feature enhancement module

Publications (1)

Publication Number Publication Date
CN114419589A (en) 2022-04-29

Family

ID=81273922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210049982.8A Pending CN114419589A (en) 2022-01-17 2022-01-17 Road target detection method based on attention feature enhancement module

Country Status (1)

Country Link
CN (1) CN114419589A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115690704A (en) * 2022-09-27 2023-02-03 淮阴工学院 LG-CenterNet model-based complex road scene target detection method and device
CN115690704B (en) * 2022-09-27 2023-08-22 淮阴工学院 LG-CenterNet model-based complex road scene target detection method and device
CN116824272A (en) * 2023-08-10 2023-09-29 湖北工业大学 Feature enhanced target detection method based on rotation feature
CN116824272B (en) * 2023-08-10 2024-02-13 湖北工业大学 Feature enhanced target detection method based on rotation feature


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination