CN108416270B - Traffic sign identification method based on multi-attribute combined characteristics - Google Patents

Traffic sign identification method based on multi-attribute combined characteristics

Info

Publication number
CN108416270B
CN108416270B (application CN201810117900.2A)
Authority
CN
China
Prior art keywords
layer
size
convolutional
feature
layers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810117900.2A
Other languages
Chinese (zh)
Other versions
CN108416270A (en)
Inventor
孙伟
杜宏吉
张小瑞
赵玉舟
施顺顺
杨翠芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANJING LOONG SHIELD INTELLIGENT TECHNOLOGY CO.,LTD.
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN201810117900.2A priority Critical patent/CN108416270B/en
Publication of CN108416270A publication Critical patent/CN108416270A/en
Application granted granted Critical
Publication of CN108416270B publication Critical patent/CN108416270B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads, of traffic signs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155: Generating training patterns; Bootstrap methods, e.g. bagging or boosting, characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Abstract

The invention relates to a traffic sign recognition method based on multi-attribute combined features. The method preprocesses the image, designs a convolutional neural network (CNN) structure, and automatically extracts traffic sign features through the CNN. To make full use of the useful information in the multi-layer CNN features, the feature maps of the last 3 layers are extracted, a multi-scale pooling operation is applied to each extracted feature map to form feature matrices at 3 scales, and the feature matrices of the 3 scales are expanded by columns and concatenated into column vectors. The 3 resulting column vectors are then concatenated into a multi-scale, multi-attribute combined feature vector. Finally, the combined feature vectors are classified by an ELM classifier, so that traffic sign recognition and classification is completed efficiently.

Description

Traffic sign identification method based on multi-attribute combined characteristics
Technical Field
The invention relates to a traffic sign identification method based on multi-attribute combined characteristics, and belongs to the field of traffic sign identification in intelligent traffic systems.
Background
In recent years, traffic sign recognition has been widely applied in driver assistance systems, driverless intelligent vehicles, highway maintenance and similar areas, but traditional traffic sign recognition methods struggle to meet the requirements of high accuracy and real-time performance.
In the past two years, traffic sign recognition methods based on deep learning have become a popular research topic; for example, convolutional neural networks (CNNs) have been successfully applied to traffic sign recognition systems. However, it is common to use only the last layer of CNN features for classifier training, and those features may not contain enough useful information to classify traffic signs well. If the features extracted by the multiple layers of the network are fully exploited for classifier training, the recognition accuracy can be improved while the training time and the amount of computation are reduced to a certain extent.
Most current CNN research focuses on improving classification precision while neglecting learning speed, yet for traffic sign recognition it is precisely the learning speed that must be improved to meet real-time requirements. Meanwhile, research shows that the generalization capability of CNNs has certain limits, whereas the extreme learning machine (ELM) offers good generalization performance. The ELM is a single-hidden-layer feedforward neural network whose learning speed is faster than that of other traditional learning algorithms while the learning precision is maintained.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the problems and deficiencies in the background art, a traffic sign identification method based on multi-attribute combined features is provided, which can accurately identify traffic signs and extract effective information, providing technical support for driverless intelligent vehicles, driver assistance systems, road maintenance and the like.
The technical scheme provided by the invention is as follows: a traffic sign identification method based on multi-attribute combined features comprises the following steps:
step 1, preprocessing a traffic sign image and normalizing image data;
step 2, designing a convolutional neural network structure;
step 3, training the convolutional neural network in the step 2, and extracting the preprocessed traffic sign image features by using the network;
step 4, extracting the feature maps of the last three layers of the convolutional neural network, applying a multi-scale pooling operation to each of the extracted feature maps to form feature matrices at three scales, expanding and concatenating the feature matrices of the three scales by columns into column vectors, and then concatenating the three resulting column vectors into a multi-scale, multi-attribute combined feature vector;
step 5, designing an ELM classifier model;
and 6, classifying the combined feature vectors through an ELM classifier.
The technical scheme is further designed as follows: the convolutional neural network structure in step 2 is composed of 8 layers, comprising 1 input layer, 1 fully-connected layer, 3 convolutional layers and 3 pooling layers, with the convolutional layers and the pooling layers arranged alternately.
The number of iterations in training the convolutional neural network in step 3 is set to 30.
The ELM classifier model is as follows:
o_i = Σ_{j=1}^{M} β_j g(w_ij · f_i + b_j),  i = 1, 2, ..., N
The labeled feature vectors input to the ELM classifier are denoted (f_i, t_i), i = 1, 2, ..., N, where N represents the number of training samples; f_i = [f_i1, f_i2, ..., f_in]^T ∈ R^n represents the combined feature vector of the ith sample, and n is the number of input neurons of the ELM model; t_i = [t_i1, t_i2, ..., t_im]^T ∈ R^m represents the label vector of the ith sample, and m is the number of output neurons of the ELM model; j = 1, 2, ..., M, where M denotes the number of hidden layer neurons; β_j represents the weight connecting the jth hidden node and the output nodes; w_ij represents the weight vector connecting the ith sample and the jth hidden node; b_j represents the bias of the jth hidden node; o_i represents the output vector of the ith sample; t_i represents the label vector of the ith sample; g(·) represents the activation function.
The invention has the beneficial effects that:
(1) The invention uses a convolutional neural network for feature extraction and an extreme learning machine for classification, fully combining the advantages of multi-layer feature extraction by the convolutional neural network with the high generalization performance and fast learning speed of the extreme learning machine; this not only improves the recognition accuracy of traffic signs but also meets the real-time requirement of traffic sign recognition.
(2) The invention uses the convolutional neural network to extract multi-layer traffic sign feature maps; the multi-layer features contain more detailed information and have a more stable structure, and making full use of this information effectively improves the recognition accuracy of traffic signs.
(3) The method samples the extracted feature maps with multi-scale pooling, so the convolutional neural network can process input of any size and produces output of the same dimensionality for inputs of different sizes, removing the fixed input-size requirement of a traditional convolutional neural network; at the same time, applying the multi-scale pooling operation to the feature maps after convolution enhances the invariance of the features extracted by the convolutional neural network, improving the accuracy and robustness of target recognition.
(4) Compared with existing traffic sign recognition technology, the method requires only simple normalization during preprocessing, greatly simplifying the previously complicated preprocessing pipeline.
Drawings
FIG. 1 is a flow chart of the identification method of the present invention.
Fig. 2 is a schematic structural diagram of CNN extracting features of the last three layers.
FIG. 3 is a schematic diagram of a multi-scale pooling operation.
FIG. 4 is a diagram of an ELM classifier model.
Detailed Description
The traffic sign identification method based on multi-attribute combined features is further explained below with reference to the accompanying drawings.
As shown in fig. 1, the traffic sign recognition method of the present invention includes the following steps:
step 1: preprocessing a database image;
the traffic sign images used in this embodiment mainly come from the GTSRB data set and from natural-scene traffic sign images captured by an intelligent camera; in total there are 5000 training samples and 1000 test samples covering 43 classes of traffic signs.
First, the picture size is uniformly adjusted to 48 × 48 pixels, and normalization is performed. The training samples need to carry labels; a labeled training sample is denoted (x_i, t_i), i = 1, 2, ..., N, where N represents the number of training samples, x_i represents the feature vector of the ith sample, and t_i represents the label vector of the ith sample.
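As a concrete illustration of this preprocessing step, the following is a minimal sketch assuming OpenCV and NumPy are available; scaling pixel values to [0, 1] is an assumption, since the text only states that normalization is performed, and the function name is illustrative.

```python
import cv2
import numpy as np

def preprocess(image_path):
    """Load a traffic sign image, resize it to 48 x 48 pixels and normalize it."""
    img = cv2.imread(image_path)            # BGR image of arbitrary size
    img = cv2.resize(img, (48, 48))         # uniform 48 x 48 size
    img = img.astype(np.float32) / 255.0    # normalization (assumed: scale to [0, 1])
    return img
```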
Step 2: designing a CNN network structure;
the CNN structure designed in this patent comprises 8 layers: 1 input layer, 1 fully-connected layer, 3 convolutional layers (C1, C3, C5) and 3 pooling layers (P2, P4, P6), with the convolutional and pooling layers arranged alternately. The input layer takes a traffic sign image sample of 48 × 48 pixels, split into the three RGB color channels. Convolutional layer C1 has 100 feature maps of size 46 × 46, with convolution kernel size 3 × 3 and stride 1; pooling layer P2 has 100 feature maps of size 23 × 23, with pooling kernel size 2 × 2 and stride 2; convolutional layer C3 has 150 feature maps of size 20 × 20, with kernel size 4 × 4 and stride 1; pooling layer P4 has 150 feature maps of size 10 × 10, with pooling kernel size 2 × 2 and stride 2; convolutional layer C5 has 250 feature maps of size 8 × 8, with kernel size 3 × 3 and stride 1; pooling layer P6 has 250 feature maps of size 4 × 4, with pooling kernel size 2 × 2 and stride 2. Since the CNN is used only for feature extraction rather than classification, the final fully-connected layer is equivalent to a single-hidden-layer feedforward network (SLFN) classifier with 43 neurons in total, representing the 43 different traffic sign classes.
The activation function of each neuron in this patent uses a hyperbolic tangent function:
f(x) = tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)).
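To make the layer dimensions above concrete, here is a minimal PyTorch sketch of the 8-layer structure (input, C1, P2, C3, P4, C5, P6, fully-connected) with the tanh activation; the use of max pooling is an assumption, since the pooling type is not stated, and the class name is illustrative.

```python
import torch
import torch.nn as nn

class TrafficSignCNN(nn.Module):
    def __init__(self, num_classes=43):
        super().__init__()
        self.c1 = nn.Conv2d(3, 100, kernel_size=3, stride=1)    # 48x48 -> 46x46, 100 maps
        self.p2 = nn.MaxPool2d(kernel_size=2, stride=2)         # 46x46 -> 23x23
        self.c3 = nn.Conv2d(100, 150, kernel_size=4, stride=1)  # 23x23 -> 20x20, 150 maps
        self.p4 = nn.MaxPool2d(kernel_size=2, stride=2)         # 20x20 -> 10x10
        self.c5 = nn.Conv2d(150, 250, kernel_size=3, stride=1)  # 10x10 -> 8x8, 250 maps
        self.p6 = nn.MaxPool2d(kernel_size=2, stride=2)         # 8x8   -> 4x4
        self.fc = nn.Linear(250 * 4 * 4, num_classes)           # SLFN-style head, 43 classes

    def forward(self, x):
        x = self.p2(torch.tanh(self.c1(x)))
        f_p4 = self.p4(torch.tanh(self.c3(x)))        # P4 feature maps, 150 x 10 x 10
        f_c5 = torch.tanh(self.c5(f_p4))              # C5 feature maps, 250 x 8 x 8
        f_p6 = self.p6(f_c5)                          # P6 feature maps, 250 x 4 x 4
        logits = self.fc(f_p6.flatten(1))
        return logits, (f_p4, f_c5, f_p6)             # keep the last-three-layer features
```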
Step 3: training the CNN network and automatically extracting the traffic sign image features of the sample at each level;
this embodiment trains the CNN designed in step 2 with the online gradient descent method. The number of training iterations is set to 30: too many iterations can cause overfitting, while too few can cause underfitting. The initial network weights are initialized within the range [-0.01, 0.01].
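A hedged sketch of this training set-up is shown below: per-sample (online) gradient descent for 30 epochs with weights initialized in [-0.01, 0.01]. The learning rate and the cross-entropy loss are assumptions not specified in the text, and the function name is illustrative.

```python
import torch
import torch.nn as nn

def train_cnn(model, samples, labels, epochs=30, lr=0.01):
    """samples: list of 3x48x48 float tensors; labels: list of scalar class-index tensors."""
    for p in model.parameters():                     # initialize weights in [-0.01, 0.01]
        nn.init.uniform_(p, -0.01, 0.01)
    criterion = nn.CrossEntropyLoss()                # assumed loss
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):                          # 30 training iterations
        for x, t in zip(samples, labels):            # online: one sample at a time
            logits, _ = model(x.unsqueeze(0))
            loss = criterion(logits, t.unsqueeze(0))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```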
Step 4: extracting the feature maps of the last three layers of the CNN and combining them, using the multi-scale pooling operation, into a multi-scale, multi-attribute combined feature vector;
a traditional CNN usually uses only the last layer of network features for recognition and classification, which cannot fully reveal the detailed characteristics of a traffic sign; combining multi-layer features represents the multi-attribute characteristics of the traffic sign more comprehensively and more richly. First, the feature maps of the last three layers (P4, C5 and P6) are extracted during feed-forward propagation; then the multi-scale pooling operation is used to form 3 feature matrices; finally, the 3 feature matrices are expanded by columns and concatenated into a multi-scale, multi-feature column vector. The structure used by the CNN to extract the last-three-layer features is shown in fig. 2.
The multi-scale pooling adopts a plurality of sampling sizes and sampling step sizes, and no matter how large the size of the extracted feature map is, the multi-scale pooling outputs feature matrixes of 3 different scales, namely 1 × 1 × a, 2 × 2 × a and 3 × 3 × a, wherein a represents the number of the extracted feature maps, and the multi-scale pooling operation is schematically shown in fig. 3.
In this embodiment, the P4 layer is subjected to multi-scale pooling to obtain 3 feature matrices of sizes 1 × 150, 4 × 150 and 9 × 150, which are concatenated in sequence into a fixed-size feature column vector of 14 × 150 = 2100 elements (2100 × 1); the C5 layer yields 3 feature matrices of sizes 1 × 250, 4 × 250 and 9 × 250 after multi-scale pooling, concatenated in sequence into a fixed-size feature column vector of 14 × 250 = 3500 elements (3500 × 1); the P6 layer yields 3 feature matrices of sizes 1 × 250, 4 × 250 and 9 × 250 after multi-scale pooling, concatenated in sequence into a fixed-size feature column vector of 14 × 250 = 3500 elements (3500 × 1).
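The multi-scale pooling and concatenation described above can be sketched as follows; adaptive max pooling and the flattening order are assumptions, and the helper names are illustrative only.

```python
import torch
import torch.nn.functional as F

def multi_scale_pool(feature_map):
    """feature_map: tensor of shape (a, H, W) -> column vector of length 14*a."""
    parts = [F.adaptive_max_pool2d(feature_map, s).reshape(-1)   # 1x1xa, 2x2xa, 3x3xa
             for s in (1, 2, 3)]
    return torch.cat(parts)                                      # (1 + 4 + 9) * a = 14 * a

def joint_feature_vector(f_p4, f_c5, f_p6):
    """Concatenate the multi-scale pooled P4, C5 and P6 features of one sample."""
    return torch.cat([multi_scale_pool(f_p4),    # 14 * 150 = 2100
                      multi_scale_pool(f_c5),    # 14 * 250 = 3500
                      multi_scale_pool(f_p6)])   # 14 * 250 = 3500, total 9100
```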
Step 5: designing an ELM classifier;
the feature maps extracted from the P4, C5 and P6 layers, after the multi-scale pooling operation, yield 3 column vectors that are concatenated into a multi-scale, multi-attribute combined feature vector f_i, which serves as the input of the ELM model. The labeled feature vectors input to the ELM classifier are denoted (f_i, t_i), i = 1, 2, ..., N, where N represents the number of training samples, f_i = [f_i1, f_i2, ..., f_in]^T ∈ R^n represents the combined feature vector of the ith sample, n is the number of input neurons of the ELM model, t_i = [t_i1, t_i2, ..., t_im]^T ∈ R^m, and m is the number of output neurons of the ELM model. The ELM classifier model is shown in FIG. 4.
The ELM model is expressed as:
o_i = Σ_{j=1}^{M} β_j g(w_ij · f_i + b_j),  i = 1, 2, ..., N    (1)
wherein M represents the number of hidden layer neurons, j = 1, 2, ..., M; β_j represents the weight connecting the jth hidden node and the output nodes; w_ij represents the weight vector connecting the ith sample and the jth hidden node; b_j represents the bias of the jth hidden node; o_i represents the output vector of the ith sample; t_i represents the label vector of the ith sample; g(·) represents the activation function.
With the combined feature vectors {f_i} of all training samples as input and y_i denoting the actual output vector, equation (1) can be simplified further to:
H_(w,b,f) β = Y    (2)
wherein beta is the output weight between the hidden layer and the output neuron,
β = [β_1, β_2, ..., β_M]^T, and H is the output matrix of the hidden layer neurons, an N × M matrix whose element in row i and column j is g(w_ij · f_i + b_j); Y = [y_1, y_2, ..., y_N]^T is the matrix of actual outputs.
The training goal is to minimize both the training error ||T − Hβ||^2 and the norm of the output weights ||β||, so the constrained optimization problem during training is:
min_{β,ξ} (1/2)||β||^2 + (C/2) Σ_{i=1}^{N} ||ξ_i||^2,  s.t. Hβ = T − ξ    (3)
wherein the constant C is a cost parameter representing the regularization factor, set to C = 2000; ξ represents the error tolerance parameter introduced to ensure that the ELM model fits all the training samples. Solving equation (3) with the Lagrange multiplier method gives:
β = H^T (I/C + H H^T)^(-1) T    (4)
step 6: training an ELM classifier;
the method comprises the following specific steps:
6.1, input the combined features of the training samples (f_i, t_i); the sigmoid function is used as the activation function, in the form
g(x) = 1/(1 + e^(-x))
and the number of hidden layer nodes M is set to 10000;
6.2, randomly generating parameters (w, b) of the hidden layer;
6.3, calculating a hidden layer output matrix H;
6.4, calculating an output weight beta according to a formula (4);
6.5, calculating the output vector o_i according to formula (1); o_i is a binary target vector; the number of output neurons m corresponds to the traffic sign classes, and m = 43 in this patent; if the ith training sample x_i belongs to the kth class of traffic sign, the kth element of o_i is 1 and the other elements are 0 (an illustrative sketch of steps 6.1 to 6.5 is given below).
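As an illustration, the NumPy sketch below walks through steps 6.1 to 6.5 under stated assumptions: F is the N × n matrix of combined feature vectors, T is the N × 43 matrix of binary (one-hot) label vectors, the hidden layer parameters are drawn uniformly from [-1, 1] (a range the text does not specify), and the function names are illustrative.

```python
import numpy as np

def train_elm(F, T, M=10000, C=2000.0, seed=0):
    """Return random hidden parameters (W, b) and the output weights beta."""
    rng = np.random.default_rng(seed)
    n = F.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n, M))        # 6.2: random input weights w
    b = rng.uniform(-1.0, 1.0, size=M)             # 6.2: random hidden biases b
    H = 1.0 / (1.0 + np.exp(-(F @ W + b)))         # 6.1/6.3: sigmoid hidden outputs, N x M
    N = F.shape[0]
    beta = H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, T)   # 6.4: formula (4)
    return W, b, beta

def elm_predict(F, W, b, beta):
    """6.5 / step 7: compute output vectors and return the predicted class indices."""
    H = 1.0 / (1.0 + np.exp(-(F @ W + b)))
    scores = H @ beta                               # output vectors o_i
    return np.argmax(scores, axis=1)                # index k of the predicted sign class
```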
Step 7: traffic sign detection
According to step 1, the traffic sign image sample to be detected is preprocessed; after preprocessing, the feature maps of the P4, C5 and P6 layers are extracted with the trained CNN; after the multi-scale pooling operation, the feature column vectors are concatenated into a multi-scale, multi-attribute combined feature vector, which is then used as the input of the ELM classifier to identify the traffic sign; if the kth element of the output vector o_i is 1, the sample to be detected belongs to the kth class of traffic sign.
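An end-to-end inference sketch for this detection step is given below, assuming the hypothetical helpers sketched earlier (preprocess, TrafficSignCNN, joint_feature_vector, elm_predict) together with a trained CNN and trained ELM parameters (W, b, beta).

```python
import numpy as np
import torch

def recognize_sign(image_path, cnn, W, b, beta):
    img = preprocess(image_path)                                 # step 1 preprocessing
    x = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)      # 1 x 3 x 48 x 48
    with torch.no_grad():
        _, (f_p4, f_c5, f_p6) = cnn(x)                           # last-three-layer features
    f = joint_feature_vector(f_p4[0], f_c5[0], f_p6[0]).numpy()  # 9100-dim combined vector
    return elm_predict(f[np.newaxis, :], W, b, beta)[0]          # index k of the sign class
```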
The technical solutions of the present invention are not limited to the above embodiment; all technical solutions obtained by equivalent substitution fall within the protection scope of the present invention.

Claims (3)

1. A traffic sign identification method based on multi-attribute combined features is characterized by comprising the following steps:
step 1, preprocessing a traffic sign image;
step 2, designing a convolutional neural network structure; the convolutional neural network comprises 1 input layer, 1 full-connection layer, 3 convolutional layers C1, C3, C5 and 3 pooling layers P2, P4 and P6;
convolutional layer C1 has 100 feature maps of size 46 × 46, convolutional kernel size 3 × 3, and convolutional step size 1; pooling layer P2 has 100 feature maps of 23 × 23, pooled kernel size of 2 × 2, and stride of 2; convolutional layer C3 has 150 feature maps of size 20 × 20, convolutional kernel size 4 × 4, and stride 1; pooling layer P4 has 150 feature maps of size 10 × 10, pooled kernel size 2 × 2, and stride 2; convolutional layer C5 has 250 feature maps of size 8 × 8, convolutional kernel size 3 × 3, and stride 1; pooling layer P6 has 250 feature maps of size 4 × 4, pooled kernel size 2 × 2, and stride 2;
step 3, training the convolutional neural network in the step 2, and extracting the preprocessed traffic sign image features by using the network;
step 4, extracting the feature maps of the last three layers of the convolutional neural network, applying a multi-scale pooling operation to each of the extracted feature maps to form feature matrices at three scales, expanding and concatenating the feature matrices of the three scales by columns into column vectors, and then concatenating the three resulting column vectors into a multi-scale, multi-attribute combined feature vector;
the P4 layer is subjected to multi-scale pooling to obtain 3 feature matrices of sizes 1 × 150, 4 × 150 and 9 × 150, which are concatenated in sequence into a fixed-size feature column vector of 14 × 150 = 2100 elements (2100 × 1); the C5 layer is subjected to multi-scale pooling to obtain 3 feature matrices of sizes 1 × 250, 4 × 250 and 9 × 250, which are concatenated in sequence into a fixed-size feature column vector of 14 × 250 = 3500 elements (3500 × 1); the P6 layer is subjected to multi-scale pooling to obtain 3 feature matrices of sizes 1 × 250, 4 × 250 and 9 × 250, which are concatenated in sequence into a fixed-size feature column vector of 14 × 250 = 3500 elements (3500 × 1);
step 5, designing an ELM classifier model;
the constraint optimization formula of the ELM classifier model in the training process is as follows:
min_{β,ξ} (1/2)||β||^2 + (C/2) Σ_{i=1}^{N} ||ξ_i||^2
s.t. Hβ = T − ξ
wherein the constant C is a cost parameter and represents a regularization factor; ξ represents the error tolerance parameter introduced to ensure that the ELM model fits all the training samples;
solving the above equation by the Lagrange multiplier method gives:
β = H^T (I/C + H H^T)^(-1) T
and 6, classifying the combined feature vectors through an ELM classifier.
2. The traffic sign identification method based on multi-attribute combined features as claimed in claim 1, wherein: the convolutional neural network structure in step 2 is composed of 8 layers, comprising 1 input layer, 1 fully-connected layer, 3 convolutional layers and 3 pooling layers, with the convolutional layers and the pooling layers arranged alternately.
3. The method for recognizing traffic signs based on multi-attribute combined features as claimed in claim 2, wherein: the ELM classifier model is as follows:
o_i = Σ_{j=1}^{M} β_j g(w_ij · f_i + b_j),  i = 1, 2, ..., N
the labeled feature vectors input to the ELM classifier are denoted (f_i, t_i), i = 1, 2, ..., N, where N represents the number of training samples; f_i = [f_i1, f_i2, ..., f_in]^T ∈ R^n represents the combined feature vector of the ith sample, and n is the number of input neurons of the ELM model; t_i = [t_i1, t_i2, ..., t_im]^T ∈ R^m represents the label vector of the ith sample, and m is the number of output neurons of the ELM model; j = 1, 2, ..., M, where M denotes the number of hidden layer neurons; β_j represents the weight connecting the jth hidden node and the output nodes; w_ij represents the weight vector connecting the ith sample and the jth hidden node; b_j represents the bias of the jth hidden node; o_i represents the output vector of the ith sample; t_i represents the label vector of the ith sample; g(·) represents the activation function.
CN201810117900.2A 2018-02-06 2018-02-06 Traffic sign identification method based on multi-attribute combined characteristics Active CN108416270B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810117900.2A CN108416270B (en) 2018-02-06 2018-02-06 Traffic sign identification method based on multi-attribute combined characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810117900.2A CN108416270B (en) 2018-02-06 2018-02-06 Traffic sign identification method based on multi-attribute combined characteristics

Publications (2)

Publication Number Publication Date
CN108416270A CN108416270A (en) 2018-08-17
CN108416270B true CN108416270B (en) 2021-07-06

Family

ID=63127773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810117900.2A Active CN108416270B (en) 2018-02-06 2018-02-06 Traffic sign identification method based on multi-attribute combined characteristics

Country Status (1)

Country Link
CN (1) CN108416270B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109002562B (en) * 2018-08-30 2021-04-13 北京信立方科技发展股份有限公司 Instrument recognition model training method and device and instrument recognition method and device
CN109492661B (en) * 2018-09-27 2021-08-13 桂林电子科技大学 Traffic sign identification method based on WWCNN model and TVPDE algorithm
CN109993058A (en) * 2019-02-27 2019-07-09 北京大学 The recognition methods of road signs based on multi-tag classification
CN111178153A (en) * 2019-12-09 2020-05-19 武汉光庭信息技术股份有限公司 Traffic sign detection method and system
CN111767860A (en) * 2020-06-30 2020-10-13 阳光学院 Method and terminal for realizing image recognition through convolutional neural network
CN116912518B (en) * 2023-09-12 2024-01-05 深圳须弥云图空间科技有限公司 Image multi-scale feature processing method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9811775B2 (en) * 2012-12-24 2017-11-07 Google Inc. Parallelizing neural networks during training
CN104680144B (en) * 2015-03-02 2018-06-05 华为技术有限公司 Based on the lip reading recognition methods and device for projecting very fast learning machine

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104992165A (en) * 2015-07-24 2015-10-21 天津大学 Extreme learning machine based traffic sign recognition method
CN105956524A (en) * 2016-04-22 2016-09-21 北京智芯原动科技有限公司 Method and device for identifying traffic signs
CN107122776A (en) * 2017-04-14 2017-09-01 重庆邮电大学 A kind of road traffic sign detection and recognition methods based on convolutional neural networks
CN107301383A (en) * 2017-06-07 2017-10-27 华南理工大学 A kind of pavement marking recognition methods based on Fast R CNN

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Joint Multiple Fully Connected Convolutional Neural Network with Extreme Learning Machine for Hepatocellular Carcinoma Nuclei Grading; Siqi Li et al.; Computers in Biology and Medicine; 2017-03-21; main text, pages 11-14 *

Also Published As

Publication number Publication date
CN108416270A (en) 2018-08-17

Similar Documents

Publication Publication Date Title
CN108416270B (en) Traffic sign identification method based on multi-attribute combined characteristics
Lin et al. Transfer learning based traffic sign recognition using inception-v3 model
CN110298266B (en) Deep neural network target detection method based on multiscale receptive field feature fusion
CN110334705B (en) Language identification method of scene text image combining global and local information
CN109241817B (en) Crop image recognition method shot by unmanned aerial vehicle
CN112446388A (en) Multi-category vegetable seedling identification method and system based on lightweight two-stage detection model
CN110321967B (en) Image classification improvement method based on convolutional neural network
CN111191583B (en) Space target recognition system and method based on convolutional neural network
CN108985252B (en) Improved image classification method of pulse depth neural network
CN107451565B (en) Semi-supervised small sample deep learning image mode classification and identification method
CN111460980B (en) Multi-scale detection method for small-target pedestrian based on multi-semantic feature fusion
Girisha et al. Uvid-net: Enhanced semantic segmentation of uav aerial videos by embedding temporal information
CN109241995B (en) Image identification method based on improved ArcFace loss function
CN111709311A (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
Hasan An application of pre-trained CNN for image classification
Roecker et al. Automatic vehicle type classification with convolutional neural networks
CN108230330B (en) Method for quickly segmenting highway pavement and positioning camera
CN112215296B (en) Infrared image recognition method based on transfer learning and storage medium
Yang et al. Local label descriptor for example based semantic image labeling
CN114241053A (en) FairMOT multi-class tracking method based on improved attention mechanism
CN114821014A (en) Multi-mode and counterstudy-based multi-task target detection and identification method and device
CN116258990A (en) Cross-modal affinity-based small sample reference video target segmentation method
CN115063832A (en) Global and local feature-based cross-modal pedestrian re-identification method for counterstudy
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
Bui et al. Deep learning architectures for hard character classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211118

Address after: 210000 Building 1, 116 Shiyang Road, Qinhuai District, Nanjing City, Jiangsu Province

Patentee after: NANJING LOONG SHIELD INTELLIGENT TECHNOLOGY CO.,LTD.

Address before: 210044, No. 219, Ning six road, Pukou District, Jiangsu, Nanjing

Patentee before: NANJING University OF INFORMATION SCIENCE & TECHNOLOGY

TR01 Transfer of patent right