CN111582029B - Traffic sign identification method based on dense connection and attention mechanism - Google Patents


Info

Publication number
CN111582029B
Authority
CN
China
Prior art keywords
traffic sign
output
path
branch
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010255951.9A
Other languages
Chinese (zh)
Other versions
CN111582029A (en)
Inventor
褚晶辉
黄浩
吕卫
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202010255951.9A
Publication of CN111582029A
Application granted
Publication of CN111582029B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 20/582: of traffic signs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a traffic sign recognition method based on dense connections and a channel attention mechanism, comprising the following steps: constructing a data set and performing data preprocessing; building a traffic sign recognition neural network based on dense connections and an attention mechanism with a deep learning framework; feeding the training-set images into the network, obtaining traffic sign category and position information through forward propagation, computing the error against the ground-truth information, back-propagating, and updating the network parameters until the error no longer decreases; and inputting an image containing a traffic sign, loading the trained model, and outputting the recognition result image. The invention makes full use of the deep features of the network, giving the network stronger representation capability and a better mix of global and local information.

Description

Traffic sign identification method based on dense connection and attention mechanism
Technical Field
The invention relates to the fields of driver-assistance systems and object detection, in particular to a traffic sign recognition method based on dense connections and an attention mechanism.
Background
In existing vision-based autonomous driving systems, object detection is the core task, covering lane-line detection, vehicle detection, non-motor-vehicle detection, pedestrian detection, traffic sign detection, and so on. When driving on a real road, an autonomous vehicle must obey traffic regulations and make decisions according to traffic signs and the actual conditions of the road; facing complex and changeable road scenes, the vehicle needs "prompts" for standard driving from its surroundings, so a traffic sign detection algorithm is an indispensable part of an autonomous driving system. In early research, scholars at home and abroad mainly combined various image processing methods to tackle traffic sign recognition, because traffic signs have regular shapes and bright colors. In recent years, with the continued deepening of neural network research, deep learning methods have been applied ever more widely in the autonomous driving field and occupy an important position.
Deep-learning-based traffic sign recognition algorithms achieve high accuracy and cope better with special conditions such as occlusion and rain or snow. At present, common traffic sign recognition methods based on convolutional neural networks include Faster R-CNN [1], SSD [2], and YOLO [3]. The patent "A traffic sign recognition method based on SRCNN" (CN110321803A) improves recognition accuracy on low-resolution images while keeping the computation small. The patent "A traffic sign recognition method based on an improved SSD network" (CN110287806A) aims to raise the SSD network's detection rate on small targets, realizing detection of small traffic signs with the SSD algorithm.
In the field of traffic sign recognition, anchor boxes are usually treated as a set of prior boxes of different sizes; this heuristic prior information plays a large role in frameworks such as Faster R-CNN and SSD. However, a neural network typically has to generate a very large set of anchor boxes of which only a small fraction overlaps the ground truth, creating a severe imbalance between positive and negative samples and slowing down training.
Reference to the literature
[1] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]//Advances in Neural Information Processing Systems. 2015: 91-99.
[2] Huang J, Rathod V, Sun C, et al. Speed/accuracy trade-offs for modern convolutional object detectors[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 7310-7311.
[3] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 779-788.
Disclosure of Invention
The invention provides a traffic sign recognition method based on dense connections and an attention mechanism that abandons anchor boxes, uses the densely connected network DenseNet121 as the backbone for feature extraction, and introduces a channel attention mechanism, described in detail as follows:
A traffic sign identification method based on dense connections and a channel attention mechanism comprises the following steps:
constructing a data set and carrying out data preprocessing;
building a traffic sign recognition neural network based on dense connections and an attention mechanism with a deep learning framework;
feeding the training-set images into the network, obtaining traffic sign category and position information through forward propagation, computing the error against the ground-truth information, back-propagating, and updating the network parameters until the error no longer decreases;
inputting an image containing a traffic sign, loading the trained model, and outputting the recognition result image.
The traffic sign recognition neural network based on dense connections and the attention mechanism, built with a deep learning framework, is structured as follows:
the encoding path, the decoding path, the output prediction part, and four parallel branches together form a U-shaped encoder-decoder network.
Further, the decoding path is divided into four parts,
each consisting of a residual block and a channel attention module; the feature map in the decoding path passes through the residual block and then enters the channel attention module, which increases the weights of effective channels and reduces the weights of ineffective channels.
Wherein the number of parallel branches is four:
the first branch feeds the output of the fourth part of the encoding path into the decoding path;
the second branch fuses the output of the third part of the encoding path with the output of the first part of the decoding path, the result serving as input to the second part of the decoding path;
the third branch fuses the output of the second part of the encoding path with the output of the second part of the decoding path and feeds the result into the third part of the decoding path;
and the fourth branch fuses the output of the first part of the encoding path with the output of the third part of the decoding path and feeds the result into the fourth part of the decoding path.
Further, the output prediction part is formed by three more parallel branches:
after two convolution blocks, the fifth branch yields an N-channel feature map, where N is the number of traffic sign categories, and the probability that the sign belongs to each of the N categories is computed from this feature map;
after two convolution blocks, the sixth branch yields a two-channel feature map; computing over the two channels gives two values X and Y, predicting one coordinate, namely the center point of the traffic sign;
and the last branch likewise yields a two-channel feature map after two convolution blocks; computing over the two channels gives two values w and h, namely the width and height of the traffic sign detection box.
The technical scheme provided by the invention has the following beneficial effects:
1. DenseNet121 is adopted as the backbone for feature extraction; with dense connectivity, each layer receives all preceding layers as additional input. DenseNet achieves feature reuse with fewer parameters and more efficient computation, and promotes backpropagation of gradients, making the network easier to train;
2. a U-shaped encoder-decoder network fuses the deep and shallow features of the network to detect traffic signs at different scales; the repeated encoding and decoding operations make full use of deep features, giving the network stronger representation capability and a better mix of global and local information;
3. in the decoding network, a channel attention module is introduced before each feature fusion branch, fully filtering out useless channel information while retaining beneficial information and fusing it into the feature map, which improves the accuracy of traffic sign recognition; meanwhile, the many residual structures used in the decoding network increase its nonlinear capacity and alleviate network degradation;
4. the channel attention module of the invention uses average pooling and maximum pooling simultaneously, combining the two to increase the weights of effective channels and reduce the weights of ineffective channels.
Drawings
FIG. 1 is a diagram of an overall neural network architecture for a traffic sign recognition method based on a dense connection and attention mechanism;
FIG. 2 is a schematic diagram of a neural network structure of a Dense connection module Dense Block;
FIG. 3 is a schematic diagram of a neural network structure of an attention module;
fig. 4 is a diagram of the recognition effect of the traffic sign recognition method based on the dense connection and attention mechanism.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
The innovation points of the invention are as follows:
1) the method uses a densely connected network as the backbone for feature extraction; through feature reuse and bypass connections it greatly reduces the network's parameter count and, to some extent, alleviates the vanishing-gradient problem;
2) the feature maps output by different DenseNet121 modules are fused, making full use of multi-scale information; the network's final feature map carries information about targets of various sizes, so large targets are easily distinguished and the recognition rate of small targets is improved;
3) the network uses a channel attention mechanism, computing the relative importance of channels through a fully connected layer and thereby filtering out unimportant channel values. The network concentrates attention on important parameters, selects key information, ignores the rest, increases the weights of beneficial parameters, and reduces the weights of useless ones;
4) traffic signs are recognized with an anchor-free method, avoiding the imbalance between positive and negative samples caused by anchor boxes and improving the accuracy of traffic sign recognition.
Example 1
The embodiment of the invention provides a traffic sign identification method based on dense connection and channel attention mechanism, and referring to fig. 1, the method comprises the following steps:
101: constructing a data set and carrying out data preprocessing;
the step 101 is specifically: a data set was downloaded, the data source being the chinese traffic sign data set TT100K (Tsinghua-Tencent 100K) published by the university of qinghua, which was taken from a street view panorama in Tencent. The training set of data sets contained 6107 pictures, and the test set contained 3073 pictures with image sizes of 2048 × 2048 pixels. The invention selects the category with the frequency of appearance more than 100 in the data set for training, and the category has 45 categories in total.
102: building a traffic sign recognition neural network based on dense connections and an attention mechanism with the deep learning framework PyTorch;
the traffic sign recognition neural network based on the dense connection and attention mechanism comprises four parts, namely an encoding path, a decoding path, an output prediction part and four parallel branches, and a U-shaped encoding and decoding network is formed together.
The encoding path consists of the feature extraction network DenseNet121; the input image first passes through a 7 × 7 convolution module for preliminary extraction of shallow features. The encoding path is divided into four parts, each composed of a convolution block and a dense block. The four dense blocks contain 6, 12, 24, and 16 densely connected layers respectively, for thorough extraction of image features. A pooling layer between every two adjacent dense blocks performs down-sampling, expanding the network's receptive field and yielding feature maps at multiple scales. The feature map obtained after each down-sampling is fused, via a parallel branch, with the feature map of the same scale in the decoding path, giving the network more global feature information.
The decoding path is divided into four parts, each consisting of a residual block and a channel attention module. The feature map in the decoding path passes through the residual block and then enters the channel attention module, which increases the weights of effective channels and reduces the weights of ineffective channels.
There are four parallel branches: the first branch feeds the output of the fourth part of the encoding path into the decoding path; the second branch fuses the output of the third part of the encoding path with the output of the first part of the decoding path, the result serving as input to the second part of the decoding path; the third branch fuses the output of the second part of the encoding path with the output of the second part of the decoding path and feeds the result into the third part of the decoding path; the fourth branch fuses the output of the first part of the encoding path with the output of the third part of the decoding path and feeds the result into the fourth part of the decoding path.
After the encoder-decoder network, the feature map is restored to the original size and contains multi-scale and deep semantic features; this high-dimensional feature map, called a heat map, can predict traffic signs of various sizes. The heat map is sent to the output prediction part, which consists of three parallel branches. After two convolution blocks, the fifth branch yields an N-channel feature map, where N is the number of traffic sign categories; the probability that the sign belongs to each of the N categories is computed from this feature map, predicting the sign's category. After two convolution blocks, the sixth branch yields a two-channel feature map; computing over the two channels gives two values X and Y, predicting one coordinate, namely the coordinates (X, Y) of the traffic sign's center point. The last (seventh) branch likewise yields a two-channel feature map after two convolution blocks; computing over the two channels gives two values w and h, the width and height of the traffic sign detection box.
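The three output heads described above can be sketched in PyTorch (the patent's stated framework). The intermediate channel counts and the head layout here are assumptions for illustration; only the output channel counts (45 classes, 2 for the center, 2 for width/height) come from the text.

```python
import torch
import torch.nn as nn

# Hedged sketch of the output prediction part: three parallel heads on the
# heat map, producing the class map (N = 45 channels), the center point
# (2 channels), and the box width/height (2 channels).
class PredictionHeads(nn.Module):
    def __init__(self, in_ch=256, num_classes=45):
        super().__init__()
        self.cls_head = self._head(in_ch, num_classes)  # fifth branch: categories
        self.xy_head = self._head(in_ch, 2)             # sixth branch: center (X, Y)
        self.wh_head = self._head(in_ch, 2)             # seventh branch: (w, h)

    @staticmethod
    def _head(in_ch, out_ch):
        # "two convolution blocks" per branch: a 3x3 block then a 1x1 projection
        return nn.Sequential(
            nn.Conv2d(in_ch, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.Conv2d(128, out_ch, 1),
        )

    def forward(self, heatmap):
        return self.cls_head(heatmap), self.xy_head(heatmap), self.wh_head(heatmap)

cls_map, xy_map, wh_map = PredictionHeads()(torch.randn(1, 256, 128, 128))
print(cls_map.shape, xy_map.shape, wh_map.shape)
```

Each head shares the same 3 × 3 block structure and differs only in its final channel count, matching the branch descriptions above.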
103: training a model;
and inputting the pictures in the training set into a traffic sign recognition neural network based on dense connection and attention mechanism, obtaining traffic sign category and position information through forward propagation, performing backward propagation with information calculation errors in a ground route, and continuously updating network parameters until the errors are not reduced. And storing the trained network parameters as a model.
104: and inputting a picture with a traffic sign, loading the trained model in the third step, and outputting a traffic sign recognition result picture.
Example 2
The scheme of example 1 is described in further detail below with reference to specific examples:
201: constructing a data set and carrying out data preprocessing:
(1) The invention uses the public TT100K (Tsinghua-Tencent 100K) data set, divided into a training set and a test set. The training set contains 6107 pictures and the test set contains 3073 pictures, each 2048 × 2048 pixels. TT100K was captured from Tencent street-view panoramas and covers 180 Chinese traffic sign categories in total, but many of these categories are rare and occur infrequently in the data set. The invention trains on the 45 traffic sign categories that appear more than 100 times.
(2) Because GPU memory limits prevent training directly on the whole image, the pictures in (1) are cropped: the 2048 × 2048 training images are cut into 512 × 512 crops. The data set is then processed and converted to JSON files in the standard COCO data format for the network to read.
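The cropping step above can be sketched as a small helper that enumerates the crop windows. The patent does not state whether crops overlap; this sketch assumes non-overlapping 512 × 512 tiles, and the function name is a hypothetical choice.

```python
# Hypothetical sketch of the tiling step: a 2048x2048 training image is too
# large for GPU memory, so it is cut into 512x512 crops. The stride (no
# overlap) is an assumption; the patent only gives the two sizes.
def crop_windows(image_size=2048, crop_size=512, stride=512):
    """Return (left, top, right, bottom) boxes covering the full image."""
    boxes = []
    for top in range(0, image_size, stride):
        for left in range(0, image_size, stride):
            boxes.append((left, top, left + crop_size, top + crop_size))
    return boxes

windows = crop_windows()
print(len(windows))  # 16 non-overlapping 512x512 tiles from one 2048x2048 image
```

Each window, together with the ground-truth boxes that fall inside it, would then be written into the COCO-format JSON annotation file.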
202: a traffic sign recognition neural network based on dense connections and an attention mechanism is built with the deep learning framework PyTorch; the network has a U-shaped structure, described in detail below:
(1) The main structure of the neural network is shown in fig. 1. A U-shaped encoder-decoder network is formed by four parts: an encoding path formed by the feature extraction network DenseNet121, a decoding path formed by channel attention modules and residual blocks, an output prediction part, and four parallel branches.
a) The encoding path consists of the feature extraction network DenseNet121 and is divided into four parts. The first part consists of a 7 × 7 convolution block and a dense block (Dense Block): the 7 × 7 convolution block, for preliminary extraction of shallow features, contains 1 convolution layer with kernel size 7 × 7, dilation rate 1 × 1, 64 channels, and stride 2, 1 BatchNorm layer, 1 ReLU layer, and one pooling layer (max pooling); the dense block contains 6 densely connected layers (L = 6). The second part consists of a 3 × 3 convolution block and a dense block: the 3 × 3 convolution block contains 1 convolution layer with kernel size 3 × 3, dilation rate 1 × 1, 128 channels, and stride 1, 1 BatchNorm layer, 1 ReLU layer, and one max pooling layer; the dense block contains 12 densely connected layers (L = 12). The third part has the same structure with 256 channels and a dense block of 24 densely connected layers (L = 24). The fourth part has the same structure with 512 channels and a dense block of 16 densely connected layers (L = 16). Each convolution block carries a pooling layer for down-sampling, enlarging the network's receptive field and yielding feature maps at multiple scales. The feature map obtained after each down-sampling is fused, via a parallel branch, with the feature map of the same scale in the decoding path, giving the network more global feature information.
b) The decoding path is divided into four parts, each consisting of a residual block and a channel attention module. The feature map passes through the residual block and then enters the channel attention module, which increases the weights of effective channels and reduces the weights of ineffective channels. The four residual blocks are identical in structure; each contains two convolution blocks in series, and each of these convolution blocks contains 1 convolution layer with kernel size 3 × 3, dilation rate 1 × 1, stride 1, and 256 channels, 1 BatchNorm layer, and 1 ReLU layer. A channel attention module follows each residual block, fully filtering out useless channel information while retaining beneficial information and fusing it into the feature map. After the encoder-decoder network, the feature map is restored to the original size and contains multi-scale and deep semantic features; this feature map, called a heat map, can predict traffic signs of various sizes.
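A decoder residual block as described above can be sketched as follows. The placement of the identity shortcut (added after the two convolution blocks) is the standard residual pattern and is an assumption; the text only specifies the two 3 × 3, 256-channel convolution blocks in series.

```python
import torch
import torch.nn as nn

# Minimal sketch of one decoder residual block: two 3x3 conv blocks
# (Conv + BatchNorm + ReLU, 256 channels, stride 1) in series, plus an
# identity skip connection to counter network degradation.
class ResidualBlock(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)  # identity shortcut preserves gradient flow

out = ResidualBlock()(torch.randn(2, 256, 16, 16))
print(out.shape)  # torch.Size([2, 256, 16, 16])
```

Because the shortcut is an identity, input and output shapes match, so the block can be dropped into any stage of the decoding path.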
c) The number of parallel branches is 4: the first branch passes the output of the fourth part of the encoding path through a 1 × 1 convolution block and feeds it into the decoding path; the second branch passes the output of the third part of the encoding path through a 1 × 1 convolution block, fuses it with the output of the first part of the decoding path, and uses the result as input to the second part of the decoding path; the third branch does the same with the output of the second part of the encoding path and the output of the second part of the decoding path, feeding the third part of the decoding path; the fourth branch does the same with the output of the first part of the encoding path and the output of the third part of the decoding path, feeding the fourth part of the decoding path. Each of the four 1 × 1 convolution blocks contains 1 convolution layer with kernel size 1 × 1, dilation rate 1 × 1, and stride 1, 1 BatchNorm layer, and 1 ReLU layer.
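One such lateral branch can be sketched as below. The patent only says "feature fusion"; element-wise addition is an assumption here (concatenation would be an equally plausible reading), and the channel counts are illustrative.

```python
import torch
import torch.nn as nn

# Sketch of one parallel branch: the encoder feature passes through a
# 1x1 conv block (Conv + BatchNorm + ReLU) and is fused with the decoder
# feature of the same scale. Fusion by addition is an assumption.
class LateralFusion(nn.Module):
    def __init__(self, in_ch, out_ch=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, encoder_feat, decoder_feat):
        # project the encoder feature to the decoder's channel count, then fuse
        return self.proj(encoder_feat) + decoder_feat

fused = LateralFusion(512)(torch.randn(1, 512, 64, 64), torch.randn(1, 256, 64, 64))
print(fused.shape)  # torch.Size([1, 256, 64, 64])
```

The 1 × 1 projection lets encoder stages with different channel counts feed the same decoder width, which is what makes the four branches interchangeable in structure.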
d) The output prediction part first passes through a 3 × 3 convolution block containing 1 convolution layer with kernel size 3 × 3, dilation rate 1 × 1, stride 1, and 256 channels, 1 BatchNorm layer, and 1 ReLU layer, and then splits into three parallel branches. Each branch consists of a 3 × 3 convolution block and a 1 × 1 convolution block; the 3 × 3 convolution blocks of the three branches are identical, each containing 1 convolution layer with kernel size 3 × 3, dilation rate 1 × 1, and 128 channels, 1 BatchNorm layer, and 1 ReLU layer. The 1 × 1 convolution block of the fifth branch contains 1 convolution layer with kernel size 1 × 1, dilation rate 1 × 1, stride 1, and 45 channels, 1 BatchNorm layer, and 1 ReLU layer; after its two convolution blocks this branch yields an N-channel feature map, where N is the number of traffic sign categories, and the probability that the sign belongs to each of the N categories is computed from it, predicting the sign's category. The other two branches (the sixth and seventh) have identical 1 × 1 convolution blocks, each containing 1 convolution layer with kernel size 1 × 1, dilation rate 1 × 1, stride 1, and 2 channels, 1 BatchNorm layer, and 1 ReLU layer. The sixth and seventh branches predict, respectively, the center-point coordinates of the traffic sign and the width and height of its detection box: the sixth branch, after its two convolution blocks, yields a two-channel feature map from which two values X and Y are computed, predicting one coordinate, namely the coordinates (X, Y) of the traffic sign's center point; the seventh branch likewise yields a two-channel feature map after its two convolution blocks, from which two values w and h, the width and height of the detection box, are computed. From the prediction information of the fifth, sixth, and seventh branches, the type of the traffic sign can be identified and its position in the picture calculated.
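How the three head outputs combine into one detection can be illustrated with a toy decode step. The patent does not spell out its post-processing, so this single-peak decode (take the heatmap maximum, refine the center with the offset channels, read w/h at the same cell) is an illustrative assumption in the style of anchor-free center-point detectors.

```python
# Illustrative decode step (an assumption; the patent does not detail
# post-processing): take the peak of the class heat map as the sign's class
# and center cell, then read the offset and w/h channels at that cell.
def decode_peak(cls_map, xy_map, wh_map):
    """cls_map: [N][H][W] scores; xy_map/wh_map: [2][H][W]. Returns one box."""
    best = None
    for c, plane in enumerate(cls_map):
        for y, row in enumerate(plane):
            for x, score in enumerate(row):
                if best is None or score > best[0]:
                    best = (score, c, x, y)
    score, c, x, y = best
    cx = x + xy_map[0][y][x]          # refine the center with predicted offsets
    cy = y + xy_map[1][y][x]
    w, h = wh_map[0][y][x], wh_map[1][y][x]
    return c, (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2), score

# Tiny 2-class, 2x2 example: the peak is class 1 at cell (0, 0), box 2x2.
cls_map = [[[0.1, 0.2], [0.3, 0.4]],
           [[0.9, 0.0], [0.0, 0.0]]]
xy_map = [[[0.0, 0.0], [0.0, 0.0]],
          [[0.0, 0.0], [0.0, 0.0]]]
wh_map = [[[2.0, 0.0], [0.0, 0.0]],
          [[2.0, 0.0], [0.0, 0.0]]]
c, box, score = decode_peak(cls_map, xy_map, wh_map)
print(c, box, score)  # 1 (-1.0, -1.0, 1.0, 1.0) 0.9
```

A real decoder would keep all local maxima above a score threshold rather than a single peak, but the per-cell arithmetic is the same.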
(2) The structure of the dense block (Dense Block, L = 6) is shown in fig. 2; it contains 6 densely connected layers (Dense Layer). In general a dense block is composed of L densely connected layers. The neural network adopts DenseNet121 as the encoding path; its four dense blocks contain 6, 12, 24, and 16 densely connected layers respectively, for thorough extraction of image features. Each densely connected layer consists of one 3 × 3 convolution block and one 1 × 1 convolution block; the 3 × 3 convolution block contains 1 convolution layer with kernel size 3 × 3, dilation rate 1 × 1, and stride 1, 1 BatchNorm layer, and 1 ReLU layer, and the 1 × 1 convolution block contains 1 convolution layer with kernel size 1 × 1, dilation rate 1 × 1, and stride 1, 1 BatchNorm layer, and 1 ReLU layer.
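The dense connectivity described above can be sketched as follows. Note one assumption: the text lists each dense layer as a 3 × 3 block and a 1 × 1 block, while this sketch follows the standard DenseNet bottleneck ordering (1 × 1 then 3 × 3) with growth rate 32, as in DenseNet121.

```python
import torch
import torch.nn as nn

# Sketch of a dense block: each dense layer produces `growth` new channels,
# and every layer receives the concatenation of all previous outputs.
class DenseLayer(nn.Module):
    def __init__(self, in_ch, growth=32):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(True),
            nn.Conv2d(in_ch, 4 * growth, 1, bias=False),   # 1x1 bottleneck
            nn.BatchNorm2d(4 * growth), nn.ReLU(True),
            nn.Conv2d(4 * growth, growth, 3, padding=1, bias=False),
        )

    def forward(self, x):
        return torch.cat([x, self.block(x)], dim=1)  # dense connectivity

class DenseBlock(nn.Sequential):
    def __init__(self, in_ch, num_layers=6, growth=32):
        # layer i sees in_ch + i*growth input channels (all earlier outputs)
        super().__init__(*[DenseLayer(in_ch + i * growth, growth)
                           for i in range(num_layers)])

out = DenseBlock(64, num_layers=6)(torch.randn(1, 64, 32, 32))
print(out.shape)  # 64 + 6*32 = 256 output channels
```

The channel count growing by a fixed amount per layer is what lets DenseNet reuse features with comparatively few parameters.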
(3) The channel attention module, shown in fig. 3, has three branches (the eighth, ninth, and tenth). The eighth branch applies global max pooling (MaxPool) to the input feature map, then a linear transformation (Linear), then normalization with a Sigmoid function. The ninth branch applies global average pooling (AvgPool) to the input feature map, then a linear transformation (Linear), then normalization with a Sigmoid function, and its result is added to the output of the eighth branch. The sum of the eighth and ninth branch outputs is normalized once more by a Sigmoid function and multiplied by the feature map of the tenth branch, which is the module's original input feature map.
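The three-branch module of fig. 3 maps directly onto a few tensor operations. Whether the two Linear layers share weights is not stated; this sketch assumes separate weights per branch.

```python
import torch
import torch.nn as nn

# Sketch of the channel attention module: parallel global max and average
# pooling, a linear transform and sigmoid per branch, a second sigmoid on
# their sum, and channel-wise rescaling of the original input.
class ChannelAttention(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.fc_max = nn.Linear(channels, channels)  # eighth-branch Linear
        self.fc_avg = nn.Linear(channels, channels)  # ninth-branch Linear

    def forward(self, x):
        b, c, _, _ = x.shape
        flat = x.view(b, c, -1)
        mx = torch.sigmoid(self.fc_max(flat.max(dim=2).values))  # MaxPool branch
        av = torch.sigmoid(self.fc_avg(flat.mean(dim=2)))        # AvgPool branch
        weights = torch.sigmoid(mx + av)       # add, then normalize once more
        return x * weights.view(b, c, 1, 1)    # tenth branch: rescale the input

y = ChannelAttention()(torch.randn(2, 256, 16, 16))
print(y.shape)  # torch.Size([2, 256, 16, 16])
```

Because the weights lie in (0, 1) per channel, effective channels are passed through nearly unchanged while ineffective ones are suppressed, matching the stated goal of the module.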
203: training a model;
and inputting the image which is cut in the first step into the traffic sign recognition neural network which is built in the second step and is based on the dense connection and attention mechanism, and obtaining the category information of the traffic sign and the position information of the detection frame through forward propagation. And calculating the error of the traffic sign category and position information predicted by the neural network and the label information in the ground route, reversely propagating the error term from the output layer to the hidden layer by layer, updating network parameters until the network parameters reach the input layer, and continuously feeding back and optimizing by using an ADAM (adaptive dynamic adaptive analysis) optimizer until the error is not reduced any more.
The batch_size of the network is set to 4, i.e., 4 traffic sign pictures of size 512 × 512 are trained in each iteration; the epoch is set to 110, i.e., the whole network requires 110 rounds of training. The trained network parameters are stored as a model.
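The Adam parameter update used by the optimizer above can be sketched as follows. The hyperparameters (learning rate, β1 = 0.9, β2 = 0.999, ε = 1e-8) are the common defaults and are an assumption; the patent does not specify them.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update over a flat list of parameters.

    Assumption: default Adam hyperparameters; t is the 1-based step count
    used for bias correction.
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(theta, grad, m, v):
        mi = b1 * mi + (1 - b1) * g          # first-moment (mean) estimate
        vi = b2 * vi + (1 - b2) * g * g      # second-moment (variance) estimate
        m_hat = mi / (1 - b1 ** t)           # bias-corrected moments
        v_hat = vi / (1 - b2 ** t)
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v

# Toy usage: a single parameter with gradient 0.5 at the first step.
theta, m, v = adam_step([1.0], [0.5], [0.0], [0.0], t=1)
```

At the first step the bias correction makes the update approximately lr · sign(grad), so the parameter moves from 1.0 to roughly 0.999 regardless of the gradient's magnitude, which is what makes Adam robust to gradient scale early in training.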
204: a traffic sign picture to be detected and recognized is input, and the model trained in the third step is loaded, whereupon the recognition result picture is output, as shown in fig. 4.
205: the invention uses precision ratio (AP) and recall ratio (AR) to measure the effect of the algorithm. 3073 test set pictures are input for detection and calculation, and then AP is 95.5 and AR is 99.6 are calculated.
In the embodiments of the present invention, unless the model of a device is specifically described, the model of the device is not limited, as long as the device can perform the functions described above.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (3)

1. A traffic sign identification method based on a dense connection and channel attention mechanism is characterized by comprising the following steps:
constructing a data set and carrying out data preprocessing;
building a traffic sign recognition neural network based on the dense connection and attention mechanism through a deep learning framework;
inputting the pictures in the training set into the neural network, obtaining traffic sign category and position information through forward propagation, calculating the error against the ground-truth information, performing backward propagation, and continuously updating the network parameters until the error no longer decreases;
inputting a picture with a traffic sign, loading the trained model, and outputting a traffic sign recognition result picture;
wherein the building of the traffic sign recognition neural network based on the dense connection and attention mechanism through the deep learning framework comprises the following steps:
the coding path, the decoding path, the output prediction part and the four parallel branches jointly form a U-shaped coding and decoding network;
wherein the decoding path is divided into four parts,
each part consists of a residual block and a channel attention module; the feature map of the decoding path first passes through the residual block and then enters the channel attention module, so as to increase the weight of effective channels and reduce the weight of ineffective channels.
2. The traffic sign recognition method based on the dense connection and channel attention mechanism as claimed in claim 1, wherein the number of the parallel branches is four,
the first branch inputs the output of the fourth part of the coding path into the decoding path;
the second branch performs feature fusion on the output of the third part of the coding path and the output of the first part of the decoding path, and the result serves as the input of the second part of the decoding path;
the third branch performs feature fusion on the output of the second part of the coding path and the output of the second part of the decoding path, and the result serves as the input of the third part of the decoding path;
and the fourth branch performs feature fusion on the output of the first part of the coding path and the output of the third part of the decoding path, and the result serves as the input of the fourth part of the decoding path.
3. The traffic sign recognition method based on the dense connection and channel attention mechanism as claimed in claim 1, wherein the output prediction part is formed by three further branches connected in parallel,
the fifth branch passes through two convolution blocks to obtain a feature map of N channels, where N is the number of traffic sign categories, and the probabilities that the traffic sign belongs to each of the N categories are calculated from the N-channel feature map;
the sixth branch passes through two convolution blocks to obtain a feature map of two channels; from the two channels of the feature map two values X and Y are obtained, predicting one coordinate, namely the coordinate of the center point of the traffic sign;
and the last branch likewise passes through two convolution blocks to obtain a feature map of two channels, and from the two channels of the feature map two values w and h are obtained, namely the width and the height of the traffic sign detection frame.
CN202010255951.9A 2020-04-02 2020-04-02 Traffic sign identification method based on dense connection and attention mechanism Active CN111582029B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010255951.9A CN111582029B (en) 2020-04-02 2020-04-02 Traffic sign identification method based on dense connection and attention mechanism


Publications (2)

Publication Number Publication Date
CN111582029A CN111582029A (en) 2020-08-25
CN111582029B true CN111582029B (en) 2022-08-12

Family

ID=72122485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010255951.9A Active CN111582029B (en) 2020-04-02 2020-04-02 Traffic sign identification method based on dense connection and attention mechanism

Country Status (1)

Country Link
CN (1) CN111582029B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112036467B (en) * 2020-08-27 2024-01-12 北京鹰瞳科技发展股份有限公司 Abnormal heart sound identification method and device based on multi-scale attention neural network
CN112163506A (en) * 2020-09-25 2021-01-01 伏羲九针智能科技(北京)有限公司 Vein blood vessel identification method, device and equipment based on ultrasound
CN112364193A (en) * 2020-11-17 2021-02-12 同济大学 Image retrieval-oriented method for fusing multilayer characteristic deep neural network model
CN112598126A (en) * 2020-12-04 2021-04-02 北京迈格威科技有限公司 Neural network construction method, device, equipment and medium
CN113887373B (en) * 2021-09-27 2022-12-16 中关村科学城城市大脑股份有限公司 Attitude identification method and system based on urban intelligent sports parallel fusion network
CN114463772B (en) * 2022-01-13 2022-11-25 苏州大学 Deep learning-based traffic sign detection and identification method and system
CN116721403A (en) * 2023-06-19 2023-09-08 山东高速集团有限公司 Road traffic sign detection method
CN116978051A (en) * 2023-08-03 2023-10-31 杭州海量信息技术有限公司 Method and device for extracting key information of form image

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619279A (en) * 2019-08-22 2019-12-27 天津大学 Road traffic sign instance segmentation method based on tracking
CN110909674A (en) * 2019-11-21 2020-03-24 清华大学苏州汽车研究院(吴江) Traffic sign identification method, device, equipment and storage medium
CN110930397A (en) * 2019-12-06 2020-03-27 陕西师范大学 Magnetic resonance image segmentation method and device, terminal equipment and storage medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Attention-based Neural Network for Traffic Sign Detection; J. Zhang et al.; 《2018 24th International Conference on Pattern Recognition (ICPR)》; 20181129; entire document *
Small object detection algorithm with a multi-scale non-local attention network; Li Jinbao et al.; 《计算机科学与探索》; 20191231; vol. 14, no. 10, pp. 1744-1749 *
Traffic sign recognition combined with an attention mechanism; Ma Ping et al.; 《科技风》; 20190726 (no. 21); entire document *


Similar Documents

Publication Publication Date Title
CN111582029B (en) Traffic sign identification method based on dense connection and attention mechanism
CN107665603B (en) Real-time detection method for judging parking space occupation
CN111695448B (en) Roadside vehicle identification method based on visual sensor
CN108537824B (en) Feature map enhanced network structure optimization method based on alternating deconvolution and convolution
CN111079604A (en) Method for quickly detecting tiny target facing large-scale remote sensing image
CN111753682B (en) Hoisting area dynamic monitoring method based on target detection algorithm
CN114898352A (en) Method for simultaneously realizing image defogging and license plate detection
CN108304786A (en) A kind of pedestrian detection method based on binaryzation convolutional neural networks
CN111462140B (en) Real-time image instance segmentation method based on block stitching
CN115205264A (en) High-resolution remote sensing ship detection method based on improved YOLOv4
CN114267025A (en) Traffic sign detection method based on high-resolution network and light-weight attention mechanism
CN113313031B (en) Deep learning-based lane line detection and vehicle transverse positioning method
CN115861951B (en) Complex environment lane line accurate detection method based on dual-feature extraction network
CN112819000A (en) Streetscape image semantic segmentation system, streetscape image semantic segmentation method, electronic equipment and computer readable medium
CN113298817A (en) High-accuracy semantic segmentation method for remote sensing image
CN115100549A (en) Transmission line hardware detection method based on improved YOLOv5
CN115661032A (en) Intelligent pavement disease detection method suitable for complex background
CN110852157A (en) Deep learning track line detection method based on binarization network
CN103605960B (en) A kind of method for identifying traffic status merged based on different focal video image
CN112785610B (en) Lane line semantic segmentation method integrating low-level features
CN116129327A (en) Infrared vehicle detection method based on improved YOLOv7 algorithm
CN114782949B (en) Traffic scene semantic segmentation method for boundary guide context aggregation
CN114708560B (en) YOLOX algorithm-based illegal parking detection method and system
CN114820931B (en) Virtual reality-based CIM (common information model) visual real-time imaging method for smart city
CN116503709A (en) Vehicle detection method based on improved YOLOv5 in haze weather

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant