CN112183203B - Real-time traffic sign detection method based on multi-scale pixel feature fusion
- Publication number: CN112183203B (application CN202010866848.8A)
- Authority: CN (China)
- Prior art keywords: feature, feature map, channel, fusion, pixel
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V20/582 — Recognition of traffic objects: traffic signs (context exterior to a vehicle, using sensors mounted on the vehicle)
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415 — Classification based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/253 — Fusion techniques of extracted features
- G06N3/045 — Neural network architectures: combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/048 — Activation functions
- G06N3/084 — Learning methods: backpropagation, e.g. using gradient descent
- G06T3/4038 — Scaling of whole images or parts thereof: image mosaicing, e.g. composing plane images from plane sub-images
Abstract
A real-time traffic sign detection method based on multi-scale pixel feature fusion belongs to the field of deep learning and target detection. Firstly, an image containing traffic signs is acquired and preprocessed; secondly, the preprocessed image is input into a MobileNetV2 network for feature extraction; the extracted multi-scale feature maps are input into a pixel feature fusion module for pixel rearrangement and spliced to generate a fusion feature map carrying both semantic and detail information; the fusion feature map is then downsampled to obtain feature maps at six scales, which are input into the efficient channel attention module, where weights are assigned to the feature channels according to their importance; the weighted six-scale feature maps are input into the SSD detection layer to predict bounding box positions and object classes; finally, non-maximum suppression is performed to obtain the optimal traffic sign detection result. The method balances real-time performance and accuracy when detecting traffic sign images and has strong robustness.
Description
Technical Field
The invention belongs to the field of deep learning and target detection, and particularly relates to a real-time traffic sign detection method based on multi-scale pixel feature fusion.
Background
Traffic signs are critical to road traffic safety. In real driving scenes, illumination changes caused by natural conditions such as sunlight and weather, together with special conditions such as fading, deformation and occlusion of the signs themselves, can cause the human eye to miss or misidentify traffic signs; the resulting misjudgment of the road ahead can cause traffic accidents, leading to personal, property and vehicle losses and even threatening life. As an important component of advanced driver assistance systems, real-time and accurate traffic sign detection can help the driver ensure driving safety and avoid danger, and has important applications in traffic safety, autonomous driving and related fields.
In practical applications, a driver assistance system must be extremely sensitive: the category of a traffic sign should be recognized while the vehicle is still far enough away from it, so that the driver or the driving system receives an early warning. This requires the detection algorithm to deliver both high real-time performance and good small-target detection. Current methods for improving small-target detection bring extra computation and parameters, which reduces the real-time performance of the algorithm. How to improve small-target detection performance to meet the needs of a real driver assistance system, while guaranteeing real-time performance and avoiding excessive additional computational cost, is therefore the problem to be solved.
Disclosure of Invention
In order to solve the above technical problems, the invention aims to provide a real-time traffic sign detection method based on multi-scale pixel feature fusion, overcoming the difficulty that deep-learning-based traffic sign detection methods struggle to achieve real-time performance and accuracy at the same time.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
a real-time traffic sign detection method based on multi-scale pixel feature fusion comprises the following steps:
(1) Acquiring an image containing traffic signs, and preprocessing the acquired image;
(2) Inputting the image obtained after the preprocessing in step (1) into a MobileNetV2 network for feature extraction to obtain depth feature maps at three scales;
(3) Inputting the depth feature maps at three scales obtained in step (2) into the pixel feature fusion module for pixel rearrangement, and splicing them to generate a fusion feature map carrying both semantic and detail information;
(4) Downsampling the fusion feature map obtained in step (3) to obtain feature maps at six scales, inputting them into the efficient channel attention module, and assigning weights to the feature channels according to their importance;
(5) Inputting the weighted six-scale feature maps generated in step (4) into the SSD detection layer for classifying and locating traffic signs, and finally performing non-maximum suppression to obtain the optimal traffic sign detection result.
Further, the specific process of step (1) is as follows:
(a) Acquiring images containing traffic signs, and annotating the bounding box and category information of every traffic sign appearing in each image;
(b) When the number of acquired images is small, performing data enhancement on the existing images: more images are created by flipping, translation, rotation, noise addition and similar methods, so that the trained neural network performs better (a short augmentation sketch follows this list);
(c) Uniformly resizing the images to 300 × 300 to fit the network input size;
(d) Adjusting the image set according to the numbers of positive and negative samples, and dividing it into a training image set and a test image set.
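As a concrete illustration of steps (b) and (c), the following is a minimal sketch using PyTorch/torchvision transforms; the library choice, parameter values and the noise helper are assumptions, since the patent only names the operation types. Note that for detection the bounding boxes must be transformed consistently with the image; that bookkeeping is omitted here for brevity.

```python
# Minimal augmentation sketch for steps (b)-(c); library and parameters are
# illustrative assumptions, not part of the patent.
import torch
import torchvision.transforms as T

def add_gaussian_noise(x: torch.Tensor, std: float = 0.02) -> torch.Tensor:
    """Add light Gaussian noise to a tensor image with values in [0, 1]."""
    return (x + std * torch.randn_like(x)).clamp(0.0, 1.0)

augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                     # flipping
    T.RandomAffine(degrees=10, translate=(0.1, 0.1)),  # rotation and translation
    T.Resize((300, 300)),                              # step (c): unify resolution
    T.ToTensor(),
    T.Lambda(add_gaussian_noise),                      # noise addition
])
```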
Further, the specific process of step (2) is as follows:
(A) First, preliminary feature extraction is performed on the 300 × 300 input image by a 3 × 3 standard convolution block to obtain a 150 × 150 × 32 feature map, where 32 is the number of channels of the feature map;
(B) The 150 × 150 × 32 feature map obtained in step (A) then passes sequentially through 6 inverted residual bottleneck blocks for depth feature extraction, yielding depth feature maps A, B, C of 38 × 38 × 32, 19 × 19 × 96 and 10 × 10 × 320 respectively (a backbone sketch follows below).
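The split points below sketch how the three scales could be tapped, assuming torchvision's MobileNetV2 layer layout (which happens to yield exactly the shapes named in step (B) for a 300 × 300 input); the patent does not tie itself to this implementation.

```python
# Sketch of step (2): tapping depth feature maps A, B, C from MobileNetV2.
# The torchvision layer indices are an assumption, not specified by the patent.
import torch
import torchvision

features = torchvision.models.mobilenet_v2(weights=None).features
stage1 = features[:7]     # down to stride 8
stage2 = features[7:14]   # down to stride 16
stage3 = features[14:18]  # down to stride 32, before the final 1280-channel conv

x = torch.randn(1, 3, 300, 300)   # preprocessed input image
A = stage1(x)   # (1, 32, 38, 38)   -> depth feature map A
B = stage2(A)   # (1, 96, 19, 19)   -> depth feature map B
C = stage3(B)   # (1, 320, 10, 10)  -> depth feature map C
```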
Further, the specific process of the step (3) is as follows:
Step (I), pixel rearrangement with an upsampling factor of 4 is performed on the 10 × 10 × 320 depth feature map obtained by the feature extraction in step (2) to obtain a 38 × 38 × 20 upsampled feature map D;
(II) Pixel rearrangement with an upsampling factor of 2 is performed on the 19 × 19 × 96 depth feature map obtained by the feature extraction in step (2) to obtain a 38 × 38 × 24 upsampled feature map E;
(III) The 38 × 38 × 20 upsampled feature map D and the 38 × 38 × 24 upsampled feature map E obtained by pixel rearrangement in steps (I) and (II) are spliced with the 38 × 38 × 32 depth feature map A obtained by the feature extraction in step (2) to generate a 38 × 38 × 76 fusion feature map F carrying both semantic and detail information.
The pixel feature fusion module in step (3) synthesizes the fusion feature map by pixel rearrangement; compared with other upsampling methods, it can enhance the information carried by the feature map without adding any extra parameters or computation. Pixel rearrangement expands the width and height by compressing the number of channels: features at the same pixel position in a low-resolution feature map with $r^2C$ channels and spatial size $H \times W$ are rearranged in a specific order into a high-resolution feature map with $C$ channels and spatial size $rH \times rW$, where $r$ is the upsampling factor. Unlike interpolation and deconvolution, pixel rearrangement introduces no extra parameters or computational cost, while avoiding the artifacts and checkerboard effects those upsampling methods can produce.
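A minimal sketch of step (3) with PyTorch's PixelShuffle. One detail is an assumption: pixel shuffle with $r = 4$ maps the 10 × 10 × 320 map to 40 × 40 × 20, so a parameter-free center crop back to 38 × 38 is used here to align with A; the patent states the result as 38 × 38 × 20 without spelling out the alignment.

```python
# Sketch of step (3): pixel rearrangement followed by channel-wise splicing.
import torch
import torch.nn as nn

shuffle4 = nn.PixelShuffle(4)  # (320, 10, 10) -> (20, 40, 40), no parameters
shuffle2 = nn.PixelShuffle(2)  # (96, 19, 19)  -> (24, 38, 38), no parameters

A = torch.randn(1, 32, 38, 38)
B = torch.randn(1, 96, 19, 19)
C = torch.randn(1, 320, 10, 10)

D = shuffle4(C)[:, :, 1:39, 1:39]   # center crop 40x40 -> 38x38 (assumed alignment)
E = shuffle2(B)                     # (1, 24, 38, 38)
F = torch.cat([D, E, A], dim=1)     # (1, 76, 38, 38): fusion feature map F
```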
Further, the specific process of step (4) is as follows:
(i) Downsampling the 38 × 38 × 76 fusion feature map F obtained in step (3) by a convolution with stride 2 to obtain a 19 × 19 × 256 feature map G; downsampling feature map G by a convolution with stride 2 to obtain a 10 × 10 × 256 feature map H; continuing in the same manner to obtain a 5 × 5 × 256 feature map I, a 3 × 3 × 128 feature map J and a 1 × 1 × 128 feature map K;
(ii) Inputting the 38 × 38 × 76 fusion feature map F obtained in step (3) together with the feature maps G–K (six scales in total) into the efficient channel attention module, which assigns weights to the feature channels according to their importance, yielding weighted feature maps at six scales;
Wherein the efficient channel attention module of step (ii) learns the relationships between channels and assigns channel weights according to channel importance;
First, the feature channel dimension is compressed: the original H × W × C feature map is converted into 1 × 1 × C by global pooling, giving a global feature value for each channel;
Information of each channel and its 5 neighbourhood channels is then extracted and integrated by a one-dimensional convolution with kernel size 5, giving the inter-channel correlation parameter $L_i$:

$$L_i = \sum_{j=1}^{5} \alpha_j\, y_i^j, \qquad y_i^j \in \Omega_i^5$$

where $\alpha_j$ denotes the one-dimensional convolution kernel parameters, initialized by Xavier and updated with network training, $\Omega_i^5$ denotes the set of 5 neighbourhood channels of feature channel $C_i$, and $y_i^j$ is the global feature value of the $j$-th channel in that set;
$L_i$ is then passed through a Sigmoid activation function to give the activation value of each channel, used as the channel weight $\omega_i$:

$$\omega_i = \sigma(L_i)$$

where $\sigma$ denotes the Sigmoid activation function;
Finally, the weights are multiplied with the original channel feature values to obtain the weighted output feature channels; by weighting the feature channels, the network can focus on important subject features.
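A minimal sketch of the efficient channel attention module described above: global average pooling, a one-dimensional convolution of kernel size 5 across the channel dimension (the correlation parameter $L_i$), a Sigmoid giving the weights $\omega_i$, and channel-wise rescaling. The PyTorch realization is an assumption; only the pooling, kernel size, Xavier initialization and Sigmoid come from the text.

```python
import torch
import torch.nn as nn

class EfficientChannelAttention(nn.Module):
    """Weights feature channels by importance, as in step (4)."""
    def __init__(self, k: int = 5):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # H x W x C -> 1 x 1 x C global features
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        nn.init.xavier_uniform_(self.conv.weight)  # Xavier init, trained with the network
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.pool(x)                       # (N, C, 1, 1)
        y = y.squeeze(-1).transpose(1, 2)      # (N, 1, C): channels as a 1-D sequence
        y = self.conv(y)                       # L_i: mixes each channel with 5 neighbours
        w = self.sigmoid(y).transpose(1, 2).unsqueeze(-1)  # (N, C, 1, 1) weights w_i
        return x * w                           # weighted output feature channels

eca = EfficientChannelAttention()
out = eca(torch.randn(1, 76, 38, 38))  # fusion map F re-weighted, shape unchanged
```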
Further, the specific process of step (5) is as follows:
Step (I): taking the weighted six-scale feature maps obtained in step (4) as input, several default boxes are generated for each pixel of each input feature map, and detection is then performed by a localization sub-network and a classification sub-network; the detection output has two parts, bounding box position and category confidence: the localization sub-network predicts a bounding box for each default box, and the classification sub-network predicts the confidence of every category for each default box;
Step (II): non-maximum suppression is applied to the many predicted boxes, each carrying a target category confidence and a position offset relative to its default box, and the predicted box minimizing the target loss function is selected as the optimal predicted box, from which the target category and box position are obtained.
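A sketch of this post-processing; torchvision's nms and the threshold values are stand-in assumptions for the patent's non-maximum suppression step.

```python
import torch
from torchvision.ops import nms

def postprocess(boxes: torch.Tensor, scores: torch.Tensor,
                conf_thr: float = 0.5, iou_thr: float = 0.45):
    """boxes: (M, 4) in xyxy; scores: (M,) best-class confidence per predicted box."""
    keep = scores > conf_thr                  # drop low-confidence predictions
    boxes, scores = boxes[keep], scores[keep]
    idx = nms(boxes, scores, iou_thr)         # suppress heavily overlapping boxes
    return boxes[idx], scores[idx]            # optimal boxes and their confidences
```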
The target loss function $L(x, l, c, g)$ of the detection network in step (II) consists of the classification loss function $L_{conf}(x, c)$ and the localization loss function $L_{loc}(x, l, g)$:

$$L(x, l, c, g) = \frac{1}{N}\big(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\big)$$

where $x$ is the default-box matching indicator on the feature maps, $l$ the predicted boxes, $c$ the confidence predictions of the default boxes for each category, and $g$ the ground-truth boxes; $L_{conf}(x, c)$ is the softmax classification loss of the default boxes over the category score set $c$, $L_{loc}(x, l, g)$ the position loss function, $N$ the number of default boxes matched to ground-truth boxes, and the weight coefficient $\alpha$ is set to 1 by cross-validation. The detection network achieves more accurate target localization and classification by optimizing this loss function.
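A sketch of this loss under the usual SSD choices: softmax cross-entropy standing in for $L_{conf}$ and smooth L1 for $L_{loc}$, normalized by the $N$ matched default boxes with $\alpha = 1$. The matching itself (and SSD's hard negative mining, which the patent does not discuss) is assumed done beforehand.

```python
import torch
import torch.nn.functional as F

def detection_loss(cls_logits, cls_targets, loc_preds, loc_targets, pos_mask, alpha=1.0):
    """L(x, l, c, g) = (1/N) * (L_conf(x, c) + alpha * L_loc(x, l, g)).
    cls_logits: (M, num_classes); cls_targets: (M,) class indices;
    loc_preds/loc_targets: (M, 4) box offsets; pos_mask: (M,) True for the
    N default boxes matched to a ground-truth box."""
    N = pos_mask.sum().clamp(min=1).float()
    l_conf = F.cross_entropy(cls_logits, cls_targets, reduction="sum")
    l_loc = F.smooth_l1_loss(loc_preds[pos_mask], loc_targets[pos_mask],
                             reduction="sum")   # localization only on positives
    return (l_conf + alpha * l_loc) / N
```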
The beneficial effects brought by adopting the technical scheme are that:
The invention provides a multi-scale pixel feature fusion strategy in which the depth feature maps extracted by the MobileNetV2 network are pixel-rearranged and synthesized into a fusion feature map; compared with other upsampling methods, this enhances the small-target information carried by the feature map without adding any extra parameters or computation. An efficient channel attention module is added before the detection network, assigning weights to the feature channels according to their importance, which effectively improves detection performance. The method occupies little memory, detects quickly and locates small targets accurately, enabling high-precision real-time traffic sign detection.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a diagram of a model structure of the present invention;
Detailed Description
For the purposes of making the technical solutions and advantages of the method of the present invention more clear, the following description is given by way of example with reference to the accompanying drawings, which are not intended to limit the invention:
step 1, obtaining images containing traffic signs, and marking boundary boxes and category information of each traffic sign appearing in each image.
When the number of acquired images is small, data enhancement is performed on the existing images: more images are created by flipping, translation, rotation, noise addition and similar methods, so that the trained neural network performs better.
The image resolution is uniformly converted to 300 × 300 to fit the network input size.
The image set is adjusted according to the numbers of positive and negative samples and divided into a training image set and a test image set.
Step 2, preliminary feature extraction is performed on the 300 × 300 input image by a 3 × 3 standard convolution block to obtain a 150 × 150 × 32 feature map, where 32 is the number of channels of the feature map.
The 150 × 150 × 32 feature map then passes sequentially through 6 inverted residual bottleneck blocks for depth feature extraction, yielding 38 × 38 × 32, 19 × 19 × 96 and 10 × 10 × 320 depth feature maps A, B, C respectively.
Step 3, performing pixel rearrangement with an upsampling factor of 4 on the 10×10×320 depth feature map obtained by the feature extraction in step 2 to obtain a 38×38×20 upsampling feature map D;
Performing pixel rearrangement with an image upsampling factor of 2 on the 19×19×96 depth feature image obtained by the feature extraction in the step 2 to obtain a 38×38×24 upsampling feature image E;
The 38 × 38 × 20 upsampled feature map D and the 38 × 38 × 24 upsampled feature map E obtained by pixel rearrangement are spliced with the 38 × 38 × 32 depth feature map A to generate a 38 × 38 × 76 fusion feature map F carrying both semantic and detail information.
Step 4, downsampling the 38 × 38 × 76 fusion feature map F obtained in step 3 by a convolution with stride 2 to obtain a 19 × 19 × 256 feature map G; downsampling feature map G by a convolution with stride 2 to obtain a 10 × 10 × 256 feature map H; continuing in the same manner to obtain a 5 × 5 × 256 feature map I, a 3 × 3 × 128 feature map J and a 1 × 1 × 128 feature map K;
The 38 × 38 × 76 fusion feature map F and the feature maps G–K (six scales in total) are input into the efficient channel attention module; the feature channel dimension is compressed by converting the H × W × C feature map into 1 × 1 × C through global pooling, giving a global feature value for each channel;
Information of each channel and its 5 neighbourhood channels is then extracted and integrated by a one-dimensional convolution with kernel size 5, giving the inter-channel correlation parameter $L_i$:

$$L_i = \sum_{j=1}^{5} \alpha_j\, y_i^j, \qquad y_i^j \in \Omega_i^5$$

where $\alpha_j$ denotes the one-dimensional convolution kernel parameters, initialized by Xavier and updated with network training, $\Omega_i^5$ denotes the set of 5 neighbourhood channels of feature channel $C_i$, and $y_i^j$ is the global feature value of the $j$-th channel in that set;
$L_i$ is then passed through a Sigmoid activation function to give the activation value of each channel, used as the channel weight $\omega_i$:

$$\omega_i = \sigma(L_i)$$

where $\sigma$ denotes the Sigmoid activation function;
Finally, the weights are multiplied with the original channel feature values to obtain weighted feature maps at six scales;
Step 5, taking the weighted six-scale feature maps obtained in step 4 as input, several default boxes are generated for each pixel of each input feature map, and detection is then performed by a localization sub-network and a classification sub-network; the detection output has two parts, bounding box position and category confidence: the localization sub-network predicts a bounding box for each default box, and the classification sub-network predicts the confidence of every category for each default box.
Non-maximum suppression is applied to the many predicted boxes, each carrying a target category confidence and a position offset relative to its default box, and the predicted box minimizing the target loss function is selected as the optimal predicted box, from which the target category and box position are obtained.
The target loss function $L(x, l, c, g)$ of the network consists of the classification loss function $L_{conf}(x, c)$ and the localization loss function $L_{loc}(x, l, g)$:

$$L(x, l, c, g) = \frac{1}{N}\big(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\big)$$

where $x$ is the default-box matching indicator on the feature maps, $l$ the predicted boxes, $c$ the confidence predictions of the default boxes for each category, and $g$ the ground-truth boxes; $L_{conf}(x, c)$ is the softmax classification loss of the default boxes over the category score set $c$, $L_{loc}(x, l, g)$ the position loss function, $N$ the number of default boxes matched to ground-truth boxes, and the weight coefficient $\alpha$ is set to 1 by cross-validation. The detection network achieves more accurate target localization and classification by optimizing this loss function.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereto, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the present invention.
Claims (6)
1. The real-time traffic sign detection method based on multi-scale pixel feature fusion is characterized by comprising the following steps of:
(1) Acquiring an image containing traffic signs, and preprocessing the acquired image;
(2) Inputting the image obtained after the preprocessing in step (1) into a MobileNetV2 network for feature extraction to obtain depth feature maps at three scales;
(3) Inputting the depth feature maps at three scales obtained in step (2) into the pixel feature fusion module for pixel rearrangement, and splicing them to generate a fusion feature map carrying both semantic and detail information;
(4) Downsampling the fusion feature map obtained in step (3) to obtain feature maps at six scales, inputting them into the efficient channel attention module, and assigning weights to the feature channels according to their importance;
(5) Inputting the weighted six-scale feature maps generated in step (4) into the SSD detection layer for classifying and locating traffic signs, and finally performing non-maximum suppression to obtain the optimal traffic sign detection result;
The pixel feature fusion module in step (3) synthesizes the fusion feature map by pixel rearrangement; compared with other upsampling methods, it can enhance the information carried by the feature map without adding any extra parameters or computation; pixel rearrangement expands the width and height by compressing the number of channels: features at the same pixel position in a low-resolution feature map with $r^2C$ channels and spatial size $H \times W$ are rearranged in a specific order into a high-resolution feature map with $C$ channels and spatial size $rH \times rW$, where $r$ is the upsampling factor;
The specific process of the step (4) is as follows:
(i) Downsampling the 38 × 38 × 76 fusion feature map F obtained in step (3) by a convolution with stride 2 to obtain a 19 × 19 × 256 feature map G; downsampling feature map G by a convolution with stride 2 to obtain a 10 × 10 × 256 feature map H; continuing in the same manner to obtain a 5 × 5 × 256 feature map I, a 3 × 3 × 128 feature map J and a 1 × 1 × 128 feature map K;
(ii) Inputting the 38 × 38 × 76 fusion feature map F obtained in step (3) together with the feature maps G–K (six scales in total) into the efficient channel attention module, which assigns weights to the feature channels according to their importance, yielding weighted feature maps at six scales;
The efficient channel attention module of step (ii) learns the relationships between channels and assigns channel weights according to channel importance;
First, the feature channel dimension is compressed: the H × W × C feature map is converted into 1 × 1 × C by global pooling, giving a global feature value for each channel;
Information of each channel and its 5 neighbourhood channels is then extracted and integrated by a one-dimensional convolution with kernel size 5, giving the inter-channel correlation parameter $L_i$:

$$L_i = \sum_{j=1}^{5} \alpha_j\, y_i^j, \qquad y_i^j \in \Omega_i^5$$

where $\alpha_j$ denotes the one-dimensional convolution kernel parameters, initialized by Xavier and updated with network training, $\Omega_i^5$ denotes the set of 5 neighbourhood channels of feature channel $C_i$, and $y_i^j$ is the global feature value of the $j$-th channel in that set;
$L_i$ is then passed through a Sigmoid activation function to give the activation value of each channel, used as the channel weight $\omega_i$:

$$\omega_i = \sigma(L_i)$$

where $\sigma$ denotes the Sigmoid activation function;
Finally, the weights are multiplied with the original channel feature values to obtain the weighted output feature channels; by weighting the feature channels, the network can focus on important subject features.
2. The method for detecting real-time traffic sign based on multi-scale pixel feature fusion as claimed in claim 1, wherein the specific process of the step (1) is as follows:
(a) Acquiring an image containing traffic signs and performing data enhancement operation;
(b) Marking the boundary box and category information of each traffic sign appearing in each image;
(c) Uniformly resizing the images to 300 × 300 to fit the network input size;
(d) Adjusting the image set according to the numbers of positive and negative samples, and dividing it into a training image set and a test image set.
3. The method for detecting real-time traffic sign based on multi-scale pixel feature fusion as claimed in claim 1, wherein the specific process of the step (2) is as follows:
(A) First, preliminary feature extraction is performed on the 300 × 300 input image by a 3 × 3 standard convolution block to obtain a 150 × 150 × 32 feature map, where 32 is the number of channels of the feature map;
(B) The 150 × 150 × 32 feature map obtained in step (A) then passes sequentially through 6 inverted residual bottleneck blocks for depth feature extraction, yielding depth feature maps A, B, C of 38 × 38 × 32, 19 × 19 × 96 and 10 × 10 × 320 respectively.
4. The method for detecting real-time traffic sign based on multi-scale pixel feature fusion as claimed in claim 1, wherein the specific process of the step (3) is as follows:
Step (I), pixel rearrangement with an upsampling factor of 4 is performed on the 10 × 10 × 320 depth feature map obtained by the feature extraction in step (2) to obtain a 38 × 38 × 20 upsampled feature map D;
(II) Pixel rearrangement with an upsampling factor of 2 is performed on the 19 × 19 × 96 depth feature map obtained by the feature extraction in step (2) to obtain a 38 × 38 × 24 upsampled feature map E;
(III) The 38 × 38 × 20 upsampled feature map D and the 38 × 38 × 24 upsampled feature map E obtained by pixel rearrangement in steps (I) and (II) are spliced with the 38 × 38 × 32 depth feature map A obtained by the feature extraction in step (2) to generate a 38 × 38 × 76 fusion feature map F carrying both semantic and detail information.
5. The method for detecting real-time traffic sign based on multi-scale pixel feature fusion as claimed in claim 1, wherein the specific process of the step (5) is as follows:
Step (I): taking the weighted six-scale feature maps obtained in step (4) as input, several default boxes are generated for each pixel of each input feature map, and detection is then performed by a localization sub-network and a classification sub-network; the detection output has two parts, bounding box position and category confidence: the localization sub-network predicts a bounding box for each default box, and the classification sub-network predicts the confidence of every category for each default box;
Step (II): non-maximum suppression is applied to the many predicted boxes, each carrying a target category confidence and a position offset relative to its default box, and the predicted box minimizing the target loss function is selected as the optimal predicted box, from which the target category and box position are obtained.
6. The real-time traffic sign detection method based on multi-scale pixel feature fusion according to claim 5, wherein in step (II) the target loss function $L(x, l, c, g)$ of the detection network consists of the classification loss function $L_{conf}(x, c)$ and the localization loss function $L_{loc}(x, l, g)$:

$$L(x, l, c, g) = \frac{1}{N}\big(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\big)$$

where $x$ is the default-box matching indicator on the feature maps, $l$ the predicted boxes, $c$ the confidence predictions of the default boxes for each category, and $g$ the ground-truth boxes; $L_{conf}(x, c)$ is the softmax classification loss of the default boxes over the category score set $c$, $L_{loc}(x, l, g)$ the position loss function, $N$ the number of default boxes matched to ground-truth boxes, and the weight coefficient $\alpha$ is set to 1 by cross-validation.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010866848.8A | 2020-08-26 | 2020-08-26 | Real-time traffic sign detection method based on multi-scale pixel feature fusion

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010866848.8A | 2020-08-26 | 2020-08-26 | Real-time traffic sign detection method based on multi-scale pixel feature fusion
Publications (2)

Publication Number | Publication Date
---|---
CN112183203A (en) | 2021-01-05
CN112183203B (en) | 2024-05-28
Family

ID=73925715

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202010866848.8A (granted as CN112183203B, active) | Real-time traffic sign detection method based on multi-scale pixel feature fusion | 2020-08-26 | 2020-08-26

Country Status (1)

Country | Link
---|---
CN | CN112183203B (en)
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112784857B (en) * | 2021-01-29 | 2022-11-04 | 北京三快在线科技有限公司 | Model training and image processing method and device |
CN112861987B (en) * | 2021-03-03 | 2024-04-16 | 德鲁动力科技(成都)有限公司 | Target detection method in dim light environment |
CN113449770B (en) * | 2021-05-18 | 2024-02-13 | 科大讯飞股份有限公司 | Image detection method, electronic device and storage device |
CN113313118A (en) * | 2021-06-25 | 2021-08-27 | 哈尔滨工程大学 | Self-adaptive variable-proportion target detection method based on multi-scale feature fusion |
CN113536978B (en) * | 2021-06-28 | 2023-08-18 | 杭州电子科技大学 | Camouflage target detection method based on saliency |
CN113537397B (en) * | 2021-08-11 | 2024-04-19 | 大连海事大学 | Target detection and image definition joint learning method based on multi-scale feature fusion |
CN113902903B (en) * | 2021-09-30 | 2024-08-02 | 北京工业大学 | Downsampling-based double-attention multi-scale fusion method |
CN113723377B (en) * | 2021-11-02 | 2022-01-11 | 南京信息工程大学 | Traffic sign detection method based on LD-SSD network |
CN114241274B (en) * | 2021-11-30 | 2023-04-07 | 电子科技大学 | Small target detection method based on super-resolution multi-scale feature fusion |
CN114463772B (en) * | 2022-01-13 | 2022-11-25 | 苏州大学 | Deep learning-based traffic sign detection and identification method and system |
CN116797890A (en) * | 2022-03-11 | 2023-09-22 | 北京字跳网络技术有限公司 | Image enhancement method, device, equipment and medium |
CN114462555B (en) * | 2022-04-13 | 2022-08-16 | 国网江西省电力有限公司电力科学研究院 | Multi-scale feature fusion power distribution network equipment identification method based on raspberry group |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108268870A (en) * | 2018-01-29 | 2018-07-10 | 重庆理工大学 | Multi-scale feature fusion ultrasonoscopy semantic segmentation method based on confrontation study |
CN109344821A (en) * | 2018-08-30 | 2019-02-15 | 西安电子科技大学 | Small target detecting method based on Fusion Features and deep learning |
CN110110599A (en) * | 2019-04-03 | 2019-08-09 | 天津大学 | A kind of Remote Sensing Target detection method based on multi-scale feature fusion |
CN110287849A (en) * | 2019-06-20 | 2019-09-27 | 北京工业大学 | A kind of lightweight depth network image object detection method suitable for raspberry pie |
CN111179217A (en) * | 2019-12-04 | 2020-05-19 | 天津大学 | Attention mechanism-based remote sensing image multi-scale target detection method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107609536A (en) * | 2017-09-29 | 2018-01-19 | 百度在线网络技术(北京)有限公司 | Information generating method and device |
Application timeline: 2020-08-26 — application CN202010866848.8A filed in China; granted as patent CN112183203B (active).
Non-Patent Citations (3)

Title
---
Anchor-free traffic sign detection; Fan Hongchao, Li Wanzhi, Zhang Chaoquan; Journal of Geo-Information Science, No. 01, 2020-01-25; entire document *
Traffic sign detection and recognition based on a residual single shot multibox detector model; Zhang Shufang, Zhu Tong; Journal of Zhejiang University (Engineering Science), No. 05, 2019-05-09; entire document *
Traffic sign recognition combining multi-scale feature fusion and extreme learning machine; Ma Yongjie, Cheng Shisheng, Ma Yunting, Chen Min; Chinese Journal of Liquid Crystals and Displays, No. 06, 2020-06-15; entire document *
Also Published As

Publication Number | Publication Date
---|---
CN112183203A (en) | 2021-01-05
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant