CN112183203B - Real-time traffic sign detection method based on multi-scale pixel feature fusion - Google Patents

Real-time traffic sign detection method based on multi-scale pixel feature fusion

Info

Publication number
CN112183203B
CN112183203B (application CN202010866848.8A)
Authority
CN
China
Prior art keywords
feature
feature map
channel
fusion
pixel
Prior art date
Legal status
Active
Application number
CN202010866848.8A
Other languages
Chinese (zh)
Other versions
CN112183203A (en)
Inventor
任坤
黄泷
范春奇
陶清扬
冯波
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202010866848.8A
Publication of CN112183203A
Application granted
Publication of CN112183203B

Classifications

    • G06V 20/582 — Recognition of traffic objects: traffic signs
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 — Classification based on parametric or probabilistic models, e.g. likelihood ratio or false acceptance rate versus false rejection rate
    • G06F 18/253 — Fusion techniques of extracted features
    • G06N 3/045 — Neural network architectures: combinations of networks
    • G06N 3/047 — Neural network architectures: probabilistic or stochastic networks
    • G06N 3/048 — Neural network architectures: activation functions
    • G06N 3/084 — Learning methods: backpropagation, e.g. using gradient descent
    • G06T 3/4038 — Image scaling: image mosaicing, e.g. composing plane images from plane sub-images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A real-time traffic sign detection method based on multi-scale pixel feature fusion belongs to the field of deep learning and target detection. First, an image containing traffic signs is acquired and preprocessed; second, the preprocessed image is input into the MobileNetV2 network for feature extraction; the extracted multi-scale feature maps are input into a pixel feature fusion module for pixel rearrangement and spliced to generate a fusion feature map carrying both semantic and detail information; the fusion feature map is then downsampled to obtain feature maps at six scales, which are input into an efficient channel attention module that assigns weights to the feature channels according to their importance; the weighted six-scale feature maps are input into an SSD detection layer to predict bounding box positions and object classes; finally, non-maximum suppression is performed to obtain the optimal traffic sign detection result. The method balances real-time performance and accuracy when detecting traffic sign images and is highly robust.

Description

Real-time traffic sign detection method based on multi-scale pixel feature fusion
Technical Field
The invention belongs to the field of deep learning and target detection, and particularly relates to a real-time traffic sign detection method based on multi-scale pixel feature fusion.
Background
Traffic signs are critical to road traffic safety. In real driving scenes, illumination changes caused by sunlight, weather and other natural conditions, together with special conditions such as fading, deformation and occlusion of traffic signs, can cause the human eye to miss or misidentify a traffic sign. Misjudging the road conditions ahead can cause traffic accidents, leading to personal, property and vehicle losses and even threatening life. Real-time, accurate traffic sign detection, as an important component of advanced driver assistance systems, can help drivers ensure driving safety and avoid danger, and has important applications in traffic safety, automatic driving and other fields.
In practical applications, a driver assistance system is required to be extremely sensitive, i.e. the category of a traffic sign should be identified while the vehicle is still far from it, giving the driver or the driving system an early warning. This requires the detection algorithm to offer both high real-time performance and strong small-target detection. Current methods for improving small-target detection bring additional computation and parameters, which reduces the real-time performance of the detection algorithm. Therefore, how to improve an algorithm's small-target detection performance to meet the requirements of a real driver assistance system while ensuring real-time performance, without introducing excessive additional computational cost, is a problem to be solved.
Disclosure of Invention
In order to solve the above technical problems, the invention aims to provide a real-time traffic sign detection method based on multi-scale pixel feature fusion, which overcomes the difficulty that deep-learning-based traffic sign detection methods struggle to achieve both real-time performance and accuracy.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
a real-time traffic sign detection method based on multi-scale pixel feature fusion comprises the following steps:
(1) Acquiring an image containing traffic signs, and preprocessing the acquired image;
(2) Inputting the image preprocessed in step (1) into the MobileNetV2 network for feature extraction, obtaining depth feature maps at three scales;
(3) Inputting the three-scale depth feature maps obtained in step (2) into a pixel feature fusion module for pixel rearrangement, and splicing them to generate a fusion feature map carrying both semantic and detail information;
(4) Downsampling the fusion feature map obtained in step (3) to obtain feature maps at six scales, inputting them into the efficient channel attention module, and assigning weights to the feature channels according to their importance;
(5) Inputting the weighted six-scale feature maps generated in step (4) into the SSD detection layer for traffic sign classification and localization, and finally performing non-maximum suppression to obtain the optimal traffic sign detection result.
Further, the specific process of step (1) is as follows:
(a) Acquiring images containing traffic signs, and annotating the bounding box and category information of each traffic sign appearing in each image;
(b) When the number of acquired images is small, performing data augmentation on the existing images: more images are created by flipping, translating, rotating or adding noise, so that the trained neural network performs better;
(c) Uniformly converting the image resolution to 300×300 to fit the network input size;
(d) Optimizing the images based on the numbers of positive and negative samples, and dividing them to obtain a training image set and a test image set; a minimal sketch of this preprocessing is given below.
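The following is a minimal, illustrative sketch of steps (a)-(d) in Python; the augmentation choices, noise level and helper names are assumptions made for illustration, not details specified by the method (box annotations would also need the matching geometric transforms, which are omitted here).

```python
import random
import numpy as np
from PIL import Image, ImageOps

def preprocess(image_path, augment=True):
    """Load one training image, optionally augment it, and resize to 300x300."""
    img = Image.open(image_path).convert("RGB")
    if augment:
        if random.random() < 0.5:              # horizontal flip (boxes must be flipped too)
            img = ImageOps.mirror(img)
        if random.random() < 0.5:              # mild additive Gaussian noise
            arr = np.asarray(img).astype(np.float32)
            arr += np.random.normal(0.0, 8.0, arr.shape)
            img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    return img.resize((300, 300))              # unify resolution for the network input
```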
Further, the specific process of step (2) is as follows:
(A) First, preliminary feature extraction is performed on the 300×300 input image through a 3×3 standard convolution block, obtaining a 150×150×32 feature map, where 32 is the number of channels of the feature map;
(B) The 150×150×32 feature map obtained in step (A) then passes through 6 inverted residual bottleneck blocks in sequence for depth feature extraction, obtaining depth feature maps A, B and C of sizes 38×38×32, 19×19×96 and 10×10×320 respectively.
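A sketch of this three-scale extraction is shown below, assuming torchvision's MobileNetV2 implementation; the tap indices 6, 13 and 17 (which yield 32-, 96- and 320-channel maps of the stated sizes on a 300×300 input) are a property of that implementation, not something the method prescribes.

```python
import torch
from torchvision.models import mobilenet_v2

backbone = mobilenet_v2(weights=None).features  # feature-extraction layers only
TAPS = {6: "A", 13: "B", 17: "C"}               # assumed layer indices (torchvision layout)

def extract_features(x: torch.Tensor) -> dict:
    """x: (N, 3, 300, 300) -> {'A': 38x38x32, 'B': 19x19x96, 'C': 10x10x320}."""
    feats = {}
    for i, layer in enumerate(backbone):
        x = layer(x)
        if i in TAPS:
            feats[TAPS[i]] = x
        if i == max(TAPS):                      # no need to run the remaining layers
            break
    return feats
```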
Further, the specific process of the step (3) is as follows:
Step (I): pixel rearrangement with an upsampling factor of 4 is performed on the 10×10×320 depth feature map obtained in step (2), obtaining a 38×38×20 upsampled feature map D;
(II) pixel rearrangement with an upsampling factor of 2 is performed on the 19×19×96 depth feature map obtained in step (2), obtaining a 38×38×24 upsampled feature map E;
(III) the 38×38×20 upsampled feature map D and the 38×38×24 upsampled feature map E obtained by pixel rearrangement in steps (I) and (II) are spliced with the 38×38×32 depth feature map A obtained in step (2), generating a 38×38×76 fusion feature map F carrying both semantic and detail information.
The pixel feature fusion module in step (3) synthesizes the fusion feature map by pixel rearrangement. Compared with other upsampling methods, pixel rearrangement enhances the information carried by the feature map without adding any extra parameters or computation. Pixel rearrangement expands the width and height by compressing the number of channels: features at the same pixel position in a low-resolution feature map with r²C channels and spatial size H×W are rearranged in a specific order into a high-resolution feature map with C channels and spatial size rH×rW, where r is the upsampling factor. Unlike interpolation and deconvolution, pixel rearrangement introduces no additional parameters or computational expense, while avoiding the artifacts and checkerboard effects of interpolation and deconvolution.
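A sketch of this fusion with PyTorch's nn.PixelShuffle (which implements exactly the r²C×H×W → C×rH×rW rearrangement described above) follows. Note one assumption: pixel-shuffling the 10×10×320 map by a factor of 4 mathematically yields 40×40×20, so a center crop to 38×38 is used here to match the stated size of D.

```python
import torch
import torch.nn as nn

shuffle4 = nn.PixelShuffle(4)   # (N, 320, 10, 10) -> (N, 20, 40, 40)
shuffle2 = nn.PixelShuffle(2)   # (N,  96, 19, 19) -> (N, 24, 38, 38)

def fuse(A, B, C):
    """A: (N,32,38,38), B: (N,96,19,19), C: (N,320,10,10) -> F: (N,76,38,38)."""
    D = shuffle4(C)[:, :, 1:39, 1:39]   # center-crop 40x40 -> 38x38 (assumption)
    E = shuffle2(B)
    return torch.cat([A, D, E], dim=1)  # 32 + 20 + 24 = 76 channels
```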
Further, the specific process of step (4) is as follows:
(i) The 38×38×76 fusion feature map F obtained in step (3) is downsampled by a convolution with stride 2, obtaining a 19×19×256 feature map G; feature map G is downsampled by a convolution with stride 2, obtaining a 10×10×256 feature map H; continuing in the same way, a 5×5×256 feature map I, a 3×3×128 feature map J and a 1×1×128 feature map K are obtained in sequence;
(ii) The 38×38×76 fusion feature map F obtained in step (3) and the feature maps G-K, i.e. feature maps at six scales, are each input into the efficient channel attention module, which assigns weights to the feature channels according to their importance, giving weighted feature maps at six scales;
The efficient channel attention module of step (ii) learns the relationships between channels and assigns channel weights based on channel importance;
First, the feature channel dimension is compressed: the original H×W×C feature channels are converted to 1×1×C by global pooling, obtaining a global feature value in the channel dimension;
Then, each channel and its 5 neighbourhood channels are integrated by a one-dimensional convolution with kernel size 5, obtaining the inter-channel correlation parameter $L_i$:

$$L_i = \sum_{j=1}^{5} \alpha_j y_i^j$$

where $\alpha_j$ denotes the one-dimensional convolution kernel parameters, which are initialized by Xavier initialization and updated during network training, and $y_i^j$ denotes the global feature value of the $j$-th of the 5 neighbourhood channels of feature channel $C_i$;
Then $L_i$ is passed through a Sigmoid activation function to obtain the activation value of each channel as the channel weight $\omega_i$:

$$\omega_i = \sigma(L_i)$$

where $\sigma$ denotes the Sigmoid activation function;
Finally, the weights are multiplied with the original channel feature values, obtaining the weighted output feature channels; by weighting the feature channels, the network can focus on important subject features.
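A compact sketch of this efficient-channel-attention step (global pooling, a kernel-size-5 one-dimensional convolution across neighbouring channels, Sigmoid weighting) follows; it mirrors the ECA design described above, with the module and variable names being illustrative.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: weight channels by local cross-channel interaction."""
    def __init__(self, k: int = 5):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                 # HxWxC -> 1x1xC global features
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        nn.init.xavier_uniform_(self.conv.weight)           # Xavier initialization
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (N, C, H, W)
        y = self.pool(x).squeeze(-1).transpose(1, 2)        # (N, 1, C)
        w = self.sigmoid(self.conv(y))                      # L_i, then sigma(L_i) per channel
        return x * w.transpose(1, 2).unsqueeze(-1)          # reweighted feature map
```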
Further, the specific process of step (5) is as follows:
Step (I): taking the six weighted-scale feature maps obtained in step (4) as input, several default boxes are generated for each pixel of the input feature maps, and detection is then performed by a localization sub-network and a classification sub-network respectively; the detection values comprise two parts: bounding box position and category confidence; the localization sub-network predicts a bounding box for each default box; the classification sub-network predicts the confidence of every category for each default box;
Step (II): non-maximum suppression is applied over the category confidences and the position offsets of the predicted boxes relative to the default boxes, and the predicted box with the smallest objective loss function is selected as the optimal predicted box, obtaining the target category and box position.
The objective loss function L(x, l, c, g) of the detection network in step (II) consists of a classification loss function $L_{conf}(x, c)$ and a localization loss function $L_{loc}(x, l, g)$:

$$L(x, l, c, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where x indicates the matching of default boxes on the feature maps, l is the predicted box, c is the set of confidence predictions of the default boxes over the categories, g is the ground-truth box, $L_{conf}(x, c)$ is the softmax classification loss of the default boxes over the category score set c, $L_{loc}(x, l, g)$ is the position loss function, N is the number of default boxes matched to ground-truth boxes, and the weight coefficient $\alpha$ is set to 1 by cross-validation. By optimizing this loss function, the detection network achieves more accurate target localization and classification.
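A sketch of this objective with α = 1 is given below; it assumes default boxes have already been matched to ground truth, and omits SSD's hard negative mining, so the names and shapes are illustrative only.

```python
import torch
import torch.nn.functional as F

def multibox_loss(conf_pred, loc_pred, labels, loc_target, pos_mask, alpha=1.0):
    """L(x,l,c,g) = (1/N) * (L_conf(x,c) + alpha * L_loc(x,l,g)).

    conf_pred: (M, num_classes)   loc_pred / loc_target: (M, 4)
    labels:    (M,) class index per default box (0 = background)
    pos_mask:  (M,) True where a default box matched a ground-truth box
    """
    N = pos_mask.sum().clamp(min=1).float()                        # matched default boxes
    l_conf = F.cross_entropy(conf_pred, labels, reduction="sum")   # softmax classification loss
    l_loc = F.smooth_l1_loss(loc_pred[pos_mask], loc_target[pos_mask],
                             reduction="sum")                      # position loss on positives
    return (l_conf + alpha * l_loc) / N
```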
The beneficial effects brought by adopting the technical scheme are that:
The invention provides a multi-scale pixel feature fusion strategy: the depth feature maps extracted by the MobileNetV2 network are pixel-rearranged and synthesized into a fusion feature map, which, compared with other upsampling methods, enhances the small-target information carried by the feature map without adding any extra parameters or computation. An efficient channel attention module is added before the detection network, assigning weights to the feature channels according to their importance, which effectively improves detection performance. The method has a small memory footprint, high detection speed and accurate small-target detection, and can realize high-precision real-time traffic sign detection.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a diagram of a model structure of the present invention;
Detailed Description
To make the technical solutions and advantages of the method of the present invention clearer, the following description is given by way of example with reference to the accompanying drawings, which are not intended to limit the invention:
step 1, obtaining images containing traffic signs, and marking boundary boxes and category information of each traffic sign appearing in each image.
When the number of acquired images is small, data augmentation is performed on the existing images: more images are created by flipping, translating, rotating or adding noise, so that the trained neural network performs better.
The image resolution is uniformly converted to 300×300 to fit the input size.
The images are optimized based on the numbers of positive and negative samples and divided to obtain a training image set and a test image set.
Step 2, preliminary feature extraction is performed on the 300×300 input image through a 3×3 standard convolution block, obtaining a 150×150×32 feature map, where 32 is the number of channels of the feature map.
The 150×150×32 feature map then passes through 6 inverted residual bottleneck blocks in sequence for depth feature extraction, obtaining depth feature maps A, B and C of sizes 38×38×32, 19×19×96 and 10×10×320 respectively.
Step 3, performing pixel rearrangement with an upsampling factor of 4 on the 10×10×320 depth feature map obtained by the feature extraction in step 2 to obtain a 38×38×20 upsampling feature map D;
Performing pixel rearrangement with an image upsampling factor of 2 on the 19×19×96 depth feature image obtained by the feature extraction in the step 2 to obtain a 38×38×24 upsampling feature image E;
The 38×38×20 upsampled feature map D and the 38×38×24 upsampled feature map E obtained by pixel rearrangement are then spliced with the 38×38×32 depth feature map A to generate a 38×38×76 fusion feature map F carrying both semantic and detail information.
Step 4, the 38×38×76 fusion feature map F obtained in step 3 is downsampled by a convolution with stride 2, obtaining a 19×19×256 feature map G; feature map G is downsampled by a convolution with stride 2, obtaining a 10×10×256 feature map H; continuing in the same way, a 5×5×256 feature map I, a 3×3×128 feature map J and a 1×1×128 feature map K are obtained in sequence (a sketch of this chain follows);
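One possible stride-2 convolution chain realizing these sizes is sketched below; the kernel sizes and paddings are assumptions chosen so the spatial sizes match (38 → 19 → 10 → 5 → 3 → 1). The final layer uses an unpadded 3×3 convolution, since stride 2 alone cannot map 3×3 to 1×1; normalization and activations are omitted.

```python
import torch.nn as nn

downs = nn.ModuleList([
    nn.Conv2d(76, 256, 3, stride=2, padding=1),   # F 38x38x76 -> G 19x19x256
    nn.Conv2d(256, 256, 3, stride=2, padding=1),  # G -> H 10x10x256
    nn.Conv2d(256, 256, 3, stride=2, padding=1),  # H -> I 5x5x256
    nn.Conv2d(256, 128, 3, stride=2, padding=1),  # I -> J 3x3x128
    nn.Conv2d(128, 128, 3, stride=1, padding=0),  # J -> K 1x1x128 (valid conv)
])

def pyramid(F):
    """F: (N, 76, 38, 38) -> [F, G, H, I, J, K], the six detection scales."""
    maps = [F]
    for conv in downs:
        maps.append(conv(maps[-1]))
    return maps
```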
The 38×38×76 fusion feature map F and the feature maps G-K, i.e. feature maps at six scales, are each input into the efficient channel attention module; the feature channel dimension is compressed, and the original H×W×C feature channels are converted to 1×1×C by global pooling, obtaining a global feature value in the channel dimension;
Then, each channel and its 5 neighbourhood channels are integrated by a one-dimensional convolution with kernel size 5, obtaining the inter-channel correlation parameter $L_i$:

$$L_i = \sum_{j=1}^{5} \alpha_j y_i^j$$

where $\alpha_j$ denotes the one-dimensional convolution kernel parameters, which are initialized by Xavier initialization and updated during network training, and $y_i^j$ denotes the global feature value of the $j$-th of the 5 neighbourhood channels of feature channel $C_i$;
Then $L_i$ is passed through a Sigmoid activation function to obtain the activation value of each channel as the channel weight $\omega_i$:

$$\omega_i = \sigma(L_i)$$

where $\sigma$ denotes the Sigmoid activation function;
Finally, the weights are multiplied with the original channel feature values, obtaining weighted feature maps at six scales;
Step 5, taking the six weighted-scale feature maps obtained in step 4 as input, several default boxes are generated for each pixel of the input feature maps, and detection is then performed by a localization sub-network and a classification sub-network respectively; the detection values comprise two parts: bounding box position and category confidence; the localization sub-network predicts a bounding box for each default box; the classification sub-network predicts the confidence of every category for each default box.
Non-maximum suppression is then applied over the category confidences and the position offsets of the predicted boxes relative to the default boxes, and the predicted box with the smallest objective loss function is selected as the optimal predicted box, obtaining the target category and box position (a sketch of this step follows).
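A minimal sketch of the suppression step using torchvision's NMS operator follows; the score and IoU thresholds are illustrative assumptions, and per-class handling is omitted.

```python
import torch
from torchvision.ops import nms

def select_boxes(boxes, scores, score_thresh=0.5, iou_thresh=0.45):
    """boxes: (M, 4) as (x1, y1, x2, y2); scores: (M,) confidences for one class."""
    keep = scores > score_thresh               # drop low-confidence predictions
    boxes, scores = boxes[keep], scores[keep]
    kept = nms(boxes, scores, iou_thresh)      # indices of retained, non-overlapping boxes
    return boxes[kept], scores[kept]
```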
The objective loss function L(x, l, c, g) of the network consists of a classification loss function $L_{conf}(x, c)$ and a localization loss function $L_{loc}(x, l, g)$:

$$L(x, l, c, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where x indicates the matching of default boxes on the feature maps, l is the predicted box, c is the set of confidence predictions of the default boxes over the categories, g is the ground-truth box, $L_{conf}(x, c)$ is the softmax classification loss of the default boxes over the category score set c, $L_{loc}(x, l, g)$ is the position loss function, N is the number of default boxes matched to ground-truth boxes, and the weight coefficient $\alpha$ is set to 1 by cross-validation. By optimizing this loss function, the detection network achieves more accurate target localization and classification.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereto, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the present invention.

Claims (6)

1. The real-time traffic sign detection method based on multi-scale pixel feature fusion is characterized by comprising the following steps of:
(1) Acquiring an image containing traffic signs, and preprocessing the acquired image;
(2) Inputting the image preprocessed in step (1) into the MobileNetV2 network for feature extraction, obtaining depth feature maps at three scales;
(3) Inputting the three-scale depth feature maps obtained in step (2) into a pixel feature fusion module for pixel rearrangement, and splicing them to generate a fusion feature map carrying both semantic and detail information;
(4) Downsampling the fusion feature map obtained in step (3) to obtain feature maps at six scales, inputting them into the efficient channel attention module, and assigning weights to the feature channels according to their importance;
(5) Inputting the weighted six-scale feature maps generated in step (4) into the SSD detection layer for traffic sign classification and localization, and finally performing non-maximum suppression to obtain the optimal traffic sign detection result;
The pixel feature fusion module in step (3) synthesizes the fusion feature map by pixel rearrangement; compared with other upsampling methods, pixel rearrangement enhances the information carried by the feature map without adding any extra parameters or computation; pixel rearrangement expands the width and height by compressing the number of channels: features at the same pixel position in a low-resolution feature map with r²C channels and spatial size H×W are rearranged in a specific order into a high-resolution feature map with C channels and spatial size rH×rW, where r is the upsampling factor;
The specific process of the step (4) is as follows:
(i) Downsampling the 38×38×76 fusion feature map F obtained in step (3) by a convolution with stride 2 to obtain a 19×19×256 feature map G; downsampling feature map G by a convolution with stride 2 to obtain a 10×10×256 feature map H; continuing in the same way, obtaining a 5×5×256 feature map I, a 3×3×128 feature map J and a 1×1×128 feature map K in sequence;
(ii) Inputting the 38×38×76 fusion feature map F obtained in step (3) and the feature maps G-K, i.e. feature maps at six scales, into the efficient channel attention module, which assigns weights to the feature channels according to their importance, obtaining weighted feature maps at six scales;
the efficient channel attention module of step (ii) is capable of learning relationships between channels, assigning channel weights based on channel importance;
First, the feature channel dimension is compressed, and the original H×W×C feature map is converted to 1×1×C by global pooling, obtaining a global feature value in the channel dimension;
Then, each channel and its 5 neighbourhood channels are integrated by a one-dimensional convolution with kernel size 5, obtaining the inter-channel correlation parameter $L_i$:

$$L_i = \sum_{j=1}^{5} \alpha_j y_i^j$$

where $\alpha_j$ denotes the one-dimensional convolution kernel parameters, which are initialized by Xavier initialization and updated during network training, and $y_i^j$ denotes the global feature value of the $j$-th of the 5 neighbourhood channels of feature channel $C_i$;
Then $L_i$ is passed through a Sigmoid activation function to obtain the activation value of each channel as the channel weight $\omega_i$:

$$\omega_i = \sigma(L_i)$$

where $\sigma$ denotes the Sigmoid activation function;
Finally, the weights are multiplied with the original channel feature values, obtaining the weighted output feature channels; by weighting the feature channels, the network can focus on important subject features.
2. The method for detecting real-time traffic sign based on multi-scale pixel feature fusion as claimed in claim 1, wherein the specific process of the step (1) is as follows:
(a) Acquiring images containing traffic signs and performing data augmentation;
(b) Annotating the bounding box and category information of each traffic sign appearing in each image;
(c) Uniformly converting the image resolution to 300×300 to fit the input size;
(d) Optimizing the images based on the numbers of positive and negative samples, and dividing them to obtain a training image set and a test image set.
3. The method for detecting real-time traffic sign based on multi-scale pixel feature fusion as claimed in claim 1, wherein the specific process of the step (2) is as follows:
(A) First, performing preliminary feature extraction on the 300×300 input image through a 3×3 standard convolution block to obtain a 150×150×32 feature map, where 32 is the number of channels of the feature map;
(B) Passing the 150×150×32 feature map obtained in step (A) through 6 inverted residual bottleneck blocks in sequence for depth feature extraction, obtaining depth feature maps A, B and C of sizes 38×38×32, 19×19×96 and 10×10×320 respectively.
4. The method for detecting real-time traffic sign based on multi-scale pixel feature fusion as claimed in claim 1, wherein the specific process of the step (3) is as follows:
Step (I): pixel rearrangement with an upsampling factor of 4 is performed on the 10×10×320 depth feature map obtained in step (2), obtaining a 38×38×20 upsampled feature map D;
(II) pixel rearrangement with an upsampling factor of 2 is performed on the 19×19×96 depth feature map obtained in step (2), obtaining a 38×38×24 upsampled feature map E;
(III) the 38×38×20 upsampled feature map D and the 38×38×24 upsampled feature map E obtained by pixel rearrangement in steps (I) and (II) are spliced with the 38×38×32 depth feature map A obtained in step (2), generating a 38×38×76 fusion feature map F carrying both semantic and detail information.
5. The method for detecting real-time traffic sign based on multi-scale pixel feature fusion as claimed in claim 1, wherein the specific process of the step (5) is as follows:
Step (I): taking the six weighted-scale feature maps obtained in step (4) as input, generating several default boxes for each pixel of the input feature maps, and then performing detection by a localization sub-network and a classification sub-network respectively; the detection values comprise two parts: bounding box position and category confidence; the localization sub-network predicts a bounding box for each default box; the classification sub-network predicts the confidence of every category for each default box;
Step (II): applying non-maximum suppression over the category confidences and the position offsets of the predicted boxes relative to the default boxes, and selecting the predicted box with the smallest objective loss function as the optimal predicted box, obtaining the target category and box position.
6. The method for detecting traffic signs in real time based on multi-scale pixel feature fusion according to claim 5, wherein in step (II) the objective loss function L(x, l, c, g) of the detection network consists of a classification loss function $L_{conf}(x, c)$ and a localization loss function $L_{loc}(x, l, g)$:

$$L(x, l, c, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where x indicates the matching of default boxes on the feature maps, l is the predicted box, c is the set of confidence predictions of the default boxes over the categories, g is the ground-truth box, $L_{conf}(x, c)$ is the softmax classification loss of the default boxes over the category score set c, $L_{loc}(x, l, g)$ is the position loss function, N is the number of default boxes matched to ground-truth boxes, and the weight coefficient $\alpha$ is set to 1 by cross-validation.
CN202010866848.8A 2020-08-26 2020-08-26 Real-time traffic sign detection method based on multi-scale pixel feature fusion Active CN112183203B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010866848.8A CN112183203B (en) 2020-08-26 2020-08-26 Real-time traffic sign detection method based on multi-scale pixel feature fusion


Publications (2)

Publication Number Publication Date
CN112183203A (en) 2021-01-05
CN112183203B (en) 2024-05-28

Family

ID=73925715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010866848.8A Active CN112183203B (en) 2020-08-26 2020-08-26 Real-time traffic sign detection method based on multi-scale pixel feature fusion

Country Status (1)

Country Link
CN (1) CN112183203B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784857B (en) * 2021-01-29 2022-11-04 北京三快在线科技有限公司 Model training and image processing method and device
CN112861987B (en) * 2021-03-03 2024-04-16 德鲁动力科技(成都)有限公司 Target detection method in dim light environment
CN113449770B (en) * 2021-05-18 2024-02-13 科大讯飞股份有限公司 Image detection method, electronic device and storage device
CN113313118A (en) * 2021-06-25 2021-08-27 哈尔滨工程大学 Self-adaptive variable-proportion target detection method based on multi-scale feature fusion
CN113536978B (en) * 2021-06-28 2023-08-18 杭州电子科技大学 Camouflage target detection method based on saliency
CN113537397B (en) * 2021-08-11 2024-04-19 大连海事大学 Target detection and image definition joint learning method based on multi-scale feature fusion
CN113902903B (en) * 2021-09-30 2024-08-02 北京工业大学 Downsampling-based double-attention multi-scale fusion method
CN113723377B (en) * 2021-11-02 2022-01-11 南京信息工程大学 Traffic sign detection method based on LD-SSD network
CN114241274B (en) * 2021-11-30 2023-04-07 电子科技大学 Small target detection method based on super-resolution multi-scale feature fusion
CN114463772B (en) * 2022-01-13 2022-11-25 苏州大学 Deep learning-based traffic sign detection and identification method and system
CN116797890A (en) * 2022-03-11 2023-09-22 北京字跳网络技术有限公司 Image enhancement method, device, equipment and medium
CN114462555B (en) * 2022-04-13 2022-08-16 国网江西省电力有限公司电力科学研究院 Multi-scale feature fusion power distribution network equipment identification method based on raspberry group

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108268870A (en) * 2018-01-29 2018-07-10 重庆理工大学 Multi-scale feature fusion ultrasonoscopy semantic segmentation method based on confrontation study
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN110110599A (en) * 2019-04-03 2019-08-09 天津大学 A kind of Remote Sensing Target detection method based on multi-scale feature fusion
CN110287849A (en) * 2019-06-20 2019-09-27 北京工业大学 A kind of lightweight depth network image object detection method suitable for raspberry pie
CN111179217A (en) * 2019-12-04 2020-05-19 天津大学 Attention mechanism-based remote sensing image multi-scale target detection method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609536A (en) * 2017-09-29 2018-01-19 百度在线网络技术(北京)有限公司 Information generating method and device


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Anchor-free traffic sign detection; 范红超; 李万志; 章超权; Journal of Geo-Information Science; 2020-01-25 (Issue 01); full text *
Traffic sign detection and recognition based on a residual single shot multibox detector model; 张淑芳; 朱彤; Journal of Zhejiang University (Engineering Science); 2019-05-09 (Issue 05); full text *
Traffic sign recognition combining multi-scale feature fusion with an extreme learning machine; 马永杰; 程时升; 马芸婷; 陈敏; Chinese Journal of Liquid Crystals and Displays; 2020-06-15 (Issue 06); full text *

Also Published As

Publication number Publication date
CN112183203A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN112183203B (en) Real-time traffic sign detection method based on multi-scale pixel feature fusion
CN108647585B (en) Traffic identifier detection method based on multi-scale circulation attention network
CN110188705B (en) Remote traffic sign detection and identification method suitable for vehicle-mounted system
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
WO2021244621A1 (en) Scenario semantic parsing method based on global guidance selective context network
CN111563508A (en) Semantic segmentation method based on spatial information fusion
CN111461039B (en) Landmark identification method based on multi-scale feature fusion
CN110781744A (en) Small-scale pedestrian detection method based on multi-level feature fusion
CN113888547A (en) Non-supervision domain self-adaptive remote sensing road semantic segmentation method based on GAN network
CN116665176B (en) Multi-task network road target detection method for vehicle automatic driving
CN113688836A (en) Real-time road image semantic segmentation method and system based on deep learning
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN114202743A (en) Improved fast-RCNN-based small target detection method in automatic driving scene
CN113673562B (en) Feature enhancement method, object segmentation method, device and storage medium
CN117079163A (en) Aerial image small target detection method based on improved YOLOX-S
CN113505640A (en) Small-scale pedestrian detection method based on multi-scale feature fusion
CN116630702A (en) Pavement adhesion coefficient prediction method based on semantic segmentation network
CN112613434A (en) Road target detection method, device and storage medium
CN116597270A (en) Road damage target detection method based on attention mechanism integrated learning network
CN114495050A (en) Multitask integrated detection method for automatic driving forward vision detection
CN113378642A (en) Method for detecting illegal occupation buildings in rural areas
CN117115616A (en) Real-time low-illumination image target detection method based on convolutional neural network
CN117115770A (en) Automatic driving method based on convolutional neural network and attention mechanism
CN116863227A (en) Hazardous chemical vehicle detection method based on improved YOLOv5
CN113537397B (en) Target detection and image definition joint learning method based on multi-scale feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant