CN113255555A

CN113255555A - Method, system, processing equipment and storage medium for identifying Chinese traffic sign board

Info

Publication number: CN113255555A
Application number: CN202110628945.8A
Authority: CN
Inventors: 江昆; 杨殿阁; 冯润泽; 于伟光; 杨蒙蒙
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2021-06-04
Filing date: 2021-06-04
Publication date: 2021-08-13

Abstract

The invention relates to a method, a system, a processing device and a storage medium for identifying Chinese traffic signs. The method comprises the following steps: labeling a traffic sign image data set with the two-dimensional bounding box information and the category information of each traffic sign; splitting the labeled data set to obtain training sets and test sets for the major classes and the minor classes; designing a detection network and a classification network; and training the detection network and the classification network on the acquired training data for recognizing Chinese traffic signs. The invention is based on deep neural networks, improves recognition and detection precision, and can be widely applied to traffic sign recognition in complex road scenes in China.

Description

Method, system, processing equipment and storage medium for identifying Chinese traffic sign board

Technical Field

The invention relates to the technical field of traffic sign recognition, and in particular to a computer-vision-based method, system, processing device and storage medium for identifying Chinese traffic signs using multi-stage recognition (detection, major-class classification, minor-class classification).

Background

It is crucial for autonomous vehicles to be able to accurately and consistently identify traffic signs in a road environment. As deep neural networks have improved, they have become the common choice of the academic community for target recognition. Because training a deep neural network requires a large amount of data, many foreign enterprises and universities have released traffic sign data sets, such as the German traffic sign data set GTSRB, the Belgian traffic sign data set BelgiumTS, and the U.S. traffic sign data set LISA.

However, traffic signs in China differ from foreign traffic signs, so neural networks trained on foreign data are not suitable for the complex traffic scenes in China. Zhu Zheng et al. proposed the Chinese traffic sign data set TT-100K (Tsinghua Tencent 100K) and CCTSDB (Changsha University of Science and Technology Chinese traffic sign detection benchmark). Domestic data sets represented by TT-100K and CCTSDB do not include lane marks, so lane marks are frequently misrecognized as indication marks in practical applications; foreign data sets use classification schemes similar to those of the domestic data sets and have the same problem.

For an autonomous vehicle, the real-time performance of visual detection is very important, and single-stage target recognition algorithms offer good real-time performance together with good recognition results. Currently popular single-stage target recognition algorithms include the YOLO series and the SSD algorithm. Within the YOLO series, the YOLOv3-spp (spatial pyramid pooling) algorithm has stronger generalization capability and a higher recall rate for small targets, which makes it well suited to recognizing traffic signs. However, because YOLOv3-spp is a single-stage recognition algorithm, its classification accuracy is not high.

Disclosure of Invention

In view of the above problems, an object of the present invention is to provide a method, a system, a processing device and a storage medium for identifying Chinese traffic signs, which adopt multi-stage recognition and can effectively improve detection accuracy.

In order to achieve the purpose, the invention adopts the following technical scheme:

in a first aspect, the invention provides a method for identifying a Chinese traffic sign board, which comprises the following steps:

labeling a traffic sign image data set with the two-dimensional bounding box information of each traffic sign and the category information of the sign;

classifying the labeled data sets to obtain training sets and test sets of major classes and minor classes;

designing a detection network and a classification network;

and training the detection network and the classification network according to the acquired training data for recognizing the Chinese traffic sign.

Further, the two-dimensional bounding box information of the traffic sign refers to the position coordinates x, y (unit: pixel) of the geometric center point of the bounding box in the image coordinate system and the width and height w, h (unit: pixel) of the bounding box; the category information of the traffic sign comprises the major class C and the minor class Sc to which the traffic sign belongs, so that one traffic sign instance in the image is represented by a 6-dimensional array [x, y, w, h, C, Sc].

Further, the process of classifying the labeled data set to obtain the training set and the test set of the major class and the minor class includes:

dividing pictures in the data set into a training set and a test set according to a set proportion;

extracting the traffic sign instances in the data set and dividing them into a training set and a test set according to a set proportion, for training a major-class classification network that classifies the signs into the prohibition class, the indication class and the warning class;

and, within each major class, dividing the instances into a training set and a test set according to a set proportion, for training a minor-class classification network capable of understanding the specific semantic information contained in the traffic sign.

Further, the improved YOLOv3-spp detection network is adopted to detect the traffic sign example in the image, namely, a prediction frame of the traffic sign in the picture is generated, and the specific process is as follows:

the loss function of the YOLOv3-spp algorithm comprises 3 parts, namely the error in the position and size of the prediction box, the error in the confidence, and the error in the probability of each class to which the prediction box belongs; the modified loss function omits the class-probability error and retains the other two parts.

In the modified loss function, $\lambda_{coord}$ is the weight of the error in the position of the prediction box; $\lambda_{noobj}$ and $\lambda_{obj}$ are the weights of the confidence error of the prediction box; $S^2$ is the number of grid cells contained in the feature map generated by the algorithm; $B$ is the number of prior boxes generated for each grid cell; $x_i, y_i, w_i, h_i, c_i$ are the true center-point abscissa, center-point ordinate, width, height and confidence of the sign instance whose center point lies in the $i$-th grid cell; $\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i, \hat{c}_i$ are the center-point abscissa, center-point ordinate, width, height and confidence of the prediction box; and $\mathbb{1}_{ij}^{obj}$ and $\mathbb{1}_{ij}^{noobj}$ indicate whether the $j$-th prediction box of the $i$-th grid cell contains foreground: if so, $\mathbb{1}_{ij}^{obj}=1$ and $\mathbb{1}_{ij}^{noobj}=0$; otherwise, $\mathbb{1}_{ij}^{obj}=0$ and $\mathbb{1}_{ij}^{noobj}=1$.
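For orientation, one plausible form of such a two-part loss is sketched below. It is an assumption assembled from the symbol definitions in this section, not the formula of the filing: the box term is written as $1-\mathrm{GIoU}$ and the confidence terms as squared errors purely for illustration, with $\mathbb{1}_{ij}^{obj}$ and $\mathbb{1}_{ij}^{noobj}$ denoting the foreground and background indicators of the $j$-th prediction box in the $i$-th grid cell.

```latex
\begin{aligned}
L ={}& \lambda_{coord}\sum_{i=1}^{S^2}\sum_{j=1}^{B}
        \mathbb{1}_{ij}^{obj}\,\bigl(1-\mathrm{GIoU}_{ij}\bigr) \\
     &+ \lambda_{obj}\sum_{i=1}^{S^2}\sum_{j=1}^{B}
        \mathbb{1}_{ij}^{obj}\,\bigl(c_i-\hat{c}_i\bigr)^2 \\
     &+ \lambda_{noobj}\sum_{i=1}^{S^2}\sum_{j=1}^{B}
        \mathbb{1}_{ij}^{noobj}\,\bigl(c_i-\hat{c}_i\bigr)^2
\end{aligned}
```

Note that no class-probability term appears, which is the modification the text describes; the choice of error measure inside each sum may differ in an actual implementation.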

The box-error part of the loss function is determined by the generalized intersection over union (GIoU) of the bounding box predicted by the algorithm and the real bounding box labeled in the data set, which is defined as:

$$\mathrm{GIoU} = \mathrm{IoU} - \frac{A_c - U}{A_c}$$

wherein IoU is the intersection over union, that is, the ratio of the intersection area to the union area of the target prediction box and the real box; $U$ is the union area of the target prediction box and the real box; and $A_c$ is the area of the smallest region enclosing both the target prediction box and the real box.
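The GIoU computation can be illustrated with a short Python sketch. It uses the standard definition GIoU = IoU - (A_c - U) / A_c and assumes boxes in the (x, y, w, h) center format used for labeling, with positive width and height; the function name `giou` and the sample boxes are illustrative only.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes in (x, y, w, h) center format, w and h > 0."""
    def corners(box):
        x, y, w, h = box
        return x - w / 2, y - h / 2, x + w / 2, y + h / 2

    ax1, ay1, ax2, ay2 = corners(box_a)
    bx1, by1, bx2, by2 = corners(box_b)

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area U and area A_c of the smallest axis-aligned enclosing box.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    enclosing = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    return inter / union - (enclosing - union) / enclosing


same = giou((10, 10, 4, 4), (10, 10, 4, 4))   # identical boxes
apart = giou((1, 1, 2, 2), (5, 1, 2, 2))      # disjoint boxes
print(same, apart)
```

Identical boxes give a GIoU of 1, while disjoint boxes give a negative value that grows in magnitude as the boxes move apart, which is why GIoU still provides a useful training signal when IoU is zero.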

Further, a classification network is designed with EfficientNet-B6 selected as its backbone, and the EfficientNet-B6 network is implemented as follows:

first, a convolutional layer with a 3 × 3 kernel transforms the input into the dimensions required by the mobile inverted bottleneck convolution (MBConv) layers;

then, a feature map is extracted by 43 MBConv layers with convolution kernels of 3 × 3 or 5 × 5;

then, borrowing the idea of the fully convolutional network (FCN), the feature map is input into a convolutional layer with a 1 × 1 kernel, which converts a feature map of any size to a specified number of channels;

and finally, 1 pooling layer and 1 fully connected layer yield the probability that the input image belongs to each category; the cross entropy between the network prediction and the real category labeled in the data set is used as the loss function, and the network parameters are optimized with the Adam algorithm.

Further, the output dimension of the fully connected layer matches the number of classes to be distinguished, specifically:

the major-class classification network divides the traffic sign instances into three classes, namely the "prohibition" class, the "indication" class and the "warning" class, so the output dimension of the final fully connected layer of this classification network is 3, representing the probabilities that the input image belongs to the "prohibition" class, the "indication" class and the "warning" class respectively;

the minor-class classification network of the "prohibition" class divides traffic signs carrying "prohibition" information into 17 "prohibition" minor classes, so the output dimension of its final fully connected layer is 17, representing the probability that the input image belongs to each "prohibition" minor class;

the minor-class classification network of the "indication" class divides traffic signs carrying "indication" information into 27 "indication" minor classes, so the output dimension of its final fully connected layer is 27, representing the probability that the input image belongs to each "indication" minor class;

the minor-class classification network of the "warning" class divides traffic signs carrying "warning" information into 9 "warning" minor classes, so the output dimension of its final fully connected layer is 9, representing the probability that the input image belongs to each "warning" minor class.

Further, the detection network and the classification network are trained on the acquired training data for recognizing Chinese traffic signs, and the specific process is as follows:

the input of the YOLOv3-spp network is a square image; the input image is scaled to 325 × 325 pixels and used as the input of the detection network, and the detection network is obtained through training;

the input of the EfficientNet-B6 network is an RGB three-channel image; each traffic sign instance is scaled, without preserving its aspect ratio, to 528 × 528 pixels and input into the classification network for training, so as to obtain the classification network.

In a second aspect, the present invention also provides a system for identifying Chinese traffic signs, the system comprising:

the data set labeling unit is configured to perform data set labeling on the traffic sign image data set by adopting two-dimensional bounding box information of the traffic sign and category information of the sign;

the data set splitting unit is configured to split the labeled data set to obtain training sets and test sets for the major classes and the minor classes;

a network design unit configured to design a detection network and a classification network;

and the network training unit is configured to train the detection network and the classification network according to the acquired training data, and is used for carrying out Chinese traffic sign identification.

In a third aspect, the present invention further provides a processing device, which at least includes a processor and a memory, the memory storing a computer program, characterized in that the processor, when running the computer program, implements the method for identifying Chinese traffic signs.

In a fourth aspect, the present invention further provides a computer storage medium having computer readable instructions stored thereon, which are executable by a processor to implement the method for identifying Chinese traffic signs.

Due to the adoption of the technical scheme, the invention has the following advantages:

The invention is based on deep neural networks. To improve recognition and detection precision, the images in the source data sets TT-100K and CCTSDB are re-labeled, and the recognition algorithm comprises three stages: a detection stage based on YOLOv3-spp, a stage that divides the detected instances into major classes, and a stage that divides the instances within each major class into minor classes. Compared with the reference YOLOv3-spp algorithm, the detection accuracy is improved by 2.8% with the recall rate and the detection speed unchanged, and the confusion between indication marks and lane marks is effectively resolved in practical applications. The invention can be widely applied to traffic sign recognition in complex road scenes in China.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Like reference numerals refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a schematic flow chart illustrating the implementation of an embodiment of the present invention;

FIG. 2 is a schematic diagram of an algorithm framework of an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

It is to be understood that the terminology used herein is for the purpose of describing particular example embodiments only, and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" may be intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," "including," and "having" are inclusive and therefore specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order described or illustrated, unless specifically identified as an order of performance. It should also be understood that additional or alternative steps may be used.

For convenience of description, spatially relative terms, such as "inner", "outer", "lower", "upper", and the like, may be used herein to describe one element or feature's relationship to another element or feature as illustrated in the figures. Such spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures.

Example one

As shown in fig. 1, the method for identifying Chinese traffic signs using multi-stage recognition according to an embodiment of the present invention includes the following steps:

S1, labeling the traffic sign image data set with the two-dimensional bounding box information of each traffic sign in the image and the category information of the sign.

Specifically, traffic sign instances in complex road scenes in China vary greatly in size and belong to many categories. In this embodiment, each instance is labeled with the two-dimensional bounding box information of the traffic sign in the image and the category information of the traffic sign. The two-dimensional bounding box information refers to the position coordinates x, y (unit: pixel) of the geometric center point of the bounding box in the image coordinate system and the width and height w, h (unit: pixel) of the bounding box; the category information includes the major class C ("indication", "warning", "prohibition") and the minor class Sc ("driving on the right", "sharp turn ahead", "no stop", etc.) to which the traffic sign belongs. One traffic sign instance in the image can therefore be represented by a 6-dimensional array [x, y, w, h, C, Sc].
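As a minimal sketch of this annotation format, the 6-dimensional array could be held in a small Python record. The class name `SignInstance`, the string labels and the `corners` helper are illustrative assumptions and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignInstance:
    """One labeled traffic sign, [x, y, w, h, C, Sc] (hypothetical container)."""
    x: float   # abscissa of the bounding-box center, in pixels
    y: float   # ordinate of the bounding-box center, in pixels
    w: float   # bounding-box width, in pixels
    h: float   # bounding-box height, in pixels
    C: str     # major class, e.g. "prohibition"
    Sc: str    # minor class, e.g. "no parking"

    def corners(self):
        """Return (x_min, y_min, x_max, y_max) derived from center and size."""
        return (self.x - self.w / 2, self.y - self.h / 2,
                self.x + self.w / 2, self.y + self.h / 2)


sign = SignInstance(x=100.0, y=60.0, w=40.0, h=20.0,
                    C="prohibition", Sc="no parking")
print(sign.corners())  # (80.0, 50.0, 120.0, 70.0)
```

The corner form is convenient when an instance has to be cropped out of the image for the classification stages.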

In this embodiment, the pictures in the TT-100K and CCTSDB traffic sign data sets are relabeled, and the resulting data set contains 18955 images with 41028 traffic sign instances.

S2, splitting the data set to obtain training sets and test sets for the major classes and the minor classes, specifically:

S21, the pictures in the data set are divided into a training set and a test set at a ratio of, for example, 4:1;

S22, the traffic sign instances in the data set are extracted and divided at a ratio of 4:1 into a training set and a test set, which are used for training a major-class classification network that classifies traffic signs into the "indication" class, the "prohibition" class and the "warning" class;

S23, the traffic sign instances are grouped into the three major classes (prohibition, indication and warning), and within each major class they are divided at a ratio of 4:1 into a training set and a test set for training a minor-class classification network capable of understanding the specific semantic information ("no parking", "right turn driving", "turn around driving", etc.) contained in the traffic sign.
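The 4:1 splits of steps S21 to S23 can be sketched as follows. The function names, the fixed random seed and the toy labels are assumptions for illustration; the source does not state how the split is randomized.

```python
import random


def split_4_to_1(items, seed=0):
    """Shuffle a copy of items and split it into (train, test) at 4:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def split_per_major_class(instances, seed=0):
    """Group (major, minor) labels by major class, then split each group 4:1."""
    groups = {}
    for major, minor in instances:
        groups.setdefault(major, []).append((major, minor))
    return {major: split_4_to_1(group, seed) for major, group in groups.items()}


# Toy instance labels standing in for cropped sign images.
instances = ([("prohibition", "no parking")] * 10
             + [("warning", "sharp turn ahead")] * 5)
splits = split_per_major_class(instances)
train, test = splits["prohibition"]
print(len(train), len(test))  # 8 2
```

Splitting inside each major class, as in S23, keeps every minor-class network's test set drawn from the same major class as its training set.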

S3, designing a detection network and a classification network, specifically:

S31, using the improved YOLOv3-spp detection network to detect the traffic sign instances in the image, i.e. to generate a prediction box for each traffic sign in the picture.

The loss function of the YOLOv3-spp algorithm contains 3 parts, namely the error in the position and size of the prediction box, the error in the confidence, and the error in the probability of each class. The invention removes the class-probability term from the original loss function, so that the modified loss retains only the first two parts.

The box-error part of the loss function is determined by the generalized intersection over union (GIoU) of the bounding box predicted by the algorithm and the real bounding box labeled in the data set, which is defined as:

$$\mathrm{GIoU} = \mathrm{IoU} - \frac{A_c - U}{A_c}$$

where IoU is the intersection over union of the prediction box and the real box, $U$ is their union area, and $A_c$ is the area of the smallest region enclosing both boxes.

In the invention, the part of the loss determined by the class of the prediction box is removed, so that the YOLOv3-spp network is dedicated to detecting targets.

S32, designing a classification network.

EfficientNet-B6 is selected as the backbone of the classification network. The input of the EfficientNet-B6 network is an RGB three-channel image with a resolution of 528 × 528 pixels, and the network is implemented as follows:

first, a convolutional layer with a 3 × 3 kernel transforms the input into the dimensions required by the mobile inverted bottleneck convolution (MBConv) layers;

then, a feature map is extracted by 43 MBConv layers with convolution kernels of 3 × 3 or 5 × 5, wherein the number of MBConv layers, the number of channels of each layer and the convolution kernel sizes have been tuned so that the performance of the network (accuracy on ImageNet) is optimal for a given number of network parameters;

then, borrowing the idea of the fully convolutional network (FCN), the feature map is input into a convolutional layer with a 1 × 1 kernel, which can convert a feature map of any size to a specified number of channels;

and finally, 1 pooling layer and 1 fully connected layer (whose output dimension equals the number of classification categories) yield the probability that the input image belongs to each category.

The cross entropy between the network prediction and the real category labeled in the data set is used as the loss function, and the network parameters are optimized with the Adam algorithm.
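The cross-entropy loss on the output of the final fully connected layer can be illustrated in plain Python. This sketch assumes the layer produces raw scores (logits) that are turned into probabilities with a softmax; the Adam parameter update is omitted, and the sample logits are made up.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)                       # subtract the max for stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_index):
    """Negative log-probability assigned to the labeled class."""
    return -math.log(softmax(logits)[true_index])


# Hypothetical 3-way output of the major-class network for one sign crop.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
loss = cross_entropy(logits, true_index=0)
print(probs, loss)
```

The loss is small when the network assigns high probability to the labeled class and grows without bound as that probability approaches zero.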

In some specific implementations, the output dimension of the fully connected layer is equal to the number of classification categories, specifically:

the traffic sign instances need to be divided into three major classes, namely the "prohibition" class, the "indication" class and the "warning" class, so the output dimension of the final fully connected layer of the major-class classification network is 3, representing the probabilities that the input image belongs to the "prohibition" class, the "indication" class and the "warning" class respectively.

The minor-class classification network of the "prohibition" class needs to divide traffic signs carrying "prohibition" information into 17 "prohibition" minor classes, such as "no stop", "no entry" and "no turn around", so the output dimension of its final fully connected layer is 17, representing the probability that the input image belongs to each "prohibition" minor class.

Similarly, the minor-class classification network of the "indication" class needs to divide traffic signs carrying "indication" information into 27 "indication" minor classes, such as "driving right", "going straight lane" and "motor lane", so the output dimension of its final fully connected layer is 27, representing the probability that the input image belongs to each "indication" minor class.

The minor-class classification network of the "warning" class needs to divide traffic signs carrying "warning" information into 9 "warning" minor classes, such as "pay attention to children", "pay attention to rivers" and "slow down driving", so the output dimension of its final fully connected layer is 9, representing the probability that the input image belongs to each "warning" minor class.

In summary, 1 classification network assigns each traffic sign instance output by the detection network to a major class ("prohibition", "indication" or "warning"), and another 3 classification networks determine the specific minor class of the instances under each major class.
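The relationship between the four classification networks and their output dimensions can be summarized in a small lookup sketch. The dictionary keys, the class ordering and the helper names are illustrative assumptions; only the dimensions 3, 17, 27 and 9 come from the text.

```python
# Output dimension of the final fully connected layer of each network.
HEAD_DIMS = {"major": 3, "prohibition": 17, "indication": 27, "warning": 9}

# Assumed ordering of the three outputs of the major-class network.
MAJOR_CLASSES = ["prohibition", "indication", "warning"]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def select_minor_network(major_probs):
    """Pick the minor-class network that should handle this instance."""
    major = MAJOR_CLASSES[argmax(major_probs)]
    return major, HEAD_DIMS[major]


print(select_minor_network([0.1, 0.7, 0.2]))  # ('indication', 27)
```

Across the three minor-class networks this gives 17 + 27 + 9 = 53 minor classes in total.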

S4, training the detection network and the classification network on the acquired training data.

The input of the YOLOv3-spp network needs to be a square image. Taking memory consumption and detection precision into account, in this embodiment the input image is scaled to 325 × 325 pixels and then used as the input of the detection network, and the detection network is obtained through training. Specifically, in this embodiment $\lambda_{noobj}$ and $\lambda_{obj}$ are set to 1, and $\lambda_{coord}$ is set to 1.54.

The input of the EfficientNet-B6 network is an RGB three-channel image of 528 × 528 pixels. In the training process of this embodiment, each traffic sign instance is scaled, without preserving its aspect ratio, to 528 × 528 pixels and then input into the classification network for training, so as to obtain the classification network.
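Scaling a crop to 528 × 528 pixels without preserving its aspect ratio means the horizontal and vertical scale factors are chosen independently. The sketch below uses nearest-neighbor sampling on a list-of-rows image; the interpolation method is an assumption, since the text does not specify one, and `resize_nearest` is a hypothetical helper.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a list-of-rows image with independent row and column scaling.

    Because rows and columns are mapped separately, the aspect ratio of the
    input is not preserved when out_h / out_w differs from in_h / in_w.
    """
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]


crop = [[0, 1, 2],
        [3, 4, 5]]                 # toy 2 x 3 single-channel crop
resized = resize_nearest(crop, 528, 528)
print(len(resized), len(resized[0]))  # 528 528
```

For an RGB image, each pixel value would be a 3-element tuple and the same indexing applies unchanged.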

S5, testing and comparison.

After the training of the detection network and the four classification networks is completed, the recognition algorithm is tested on the test set. The algorithm framework is shown in fig. 2:

in the first stage, the traffic sign instances in the input image are identified by the detection network;

in the second stage, each traffic sign instance is cropped from the image, scaled, and input into the major-class classification network to obtain the major class (indication, warning or prohibition) to which it belongs;

and in the third stage, each traffic sign instance is input into the corresponding minor-class classification network (for example, instances belonging to the indication class are input into the indication minor-class network) to obtain the minor class to which it belongs (such as "driving on the right" or "pay attention to rivers").
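The three stages can be chained as in the following sketch. The detector and classifiers are passed in as plain callables and replaced here by stubs, so the example shows only the control flow; all names are hypothetical, and the scaling of each crop before classification is left out for brevity.

```python
def crop_box(image, box):
    """Cut an (x, y, w, h) center-format box out of a list-of-rows image."""
    x, y, w, h = box
    x1, y1 = max(0, int(x - w / 2)), max(0, int(y - h / 2))
    x2, y2 = int(x + w / 2), int(y + h / 2)
    return [row[x1:x2] for row in image[y1:y2]]


def recognize(image, detector, major_classifier, minor_classifiers):
    """Detection, then major-class, then minor-class recognition."""
    results = []
    for box in detector(image):                   # stage 1: detection
        crop = crop_box(image, box)
        major = major_classifier(crop)            # stage 2: major class
        minor = minor_classifiers[major](crop)    # stage 3: minor class
        results.append((box, major, minor))
    return results


# Stub networks standing in for the trained models.
image = [[0] * 8 for _ in range(8)]
detector = lambda img: [(4, 4, 4, 2)]
major_classifier = lambda crop: "indication"
minor_classifiers = {
    "prohibition": lambda crop: "no parking",
    "indication": lambda crop: "driving on the right",
    "warning": lambda crop: "sharp turn ahead",
}
out = recognize(image, detector, major_classifier, minor_classifiers)
print(out)
```

Routing each crop through a dictionary keyed by major class mirrors the text's design, in which only the minor-class network matching the predicted major class is run for a given instance.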

Meanwhile, the trained improved YOLOv3-spp algorithm is also used for testing. The results show that the recognition speed of the two algorithms is the same and that, at the same recognition recall rate, the recognition accuracy of the algorithm provided by the invention is 2.8% higher.

Example two

The first embodiment provides a method for identifying Chinese traffic signs using multi-stage recognition; correspondingly, this embodiment provides a system for identifying Chinese traffic signs. The system provided in this embodiment can implement the method of the first embodiment, and may be implemented in software, hardware, or a combination of software and hardware. For example, the system may comprise integrated or separate functional modules or units that perform the corresponding steps of the method of the first embodiment. Since the system of this embodiment is substantially similar to the method embodiment, it is described only briefly, and reference may be made to the relevant parts of the first embodiment.

This embodiment provides a Chinese traffic sign recognition system, the system comprising:

the data set labeling unit is configured to label the traffic sign image data set with the two-dimensional bounding box information of each traffic sign in the image and the category information of the sign;

the data set splitting unit is configured to classify the data set to obtain a training set and a test set of a major class and a minor class;

Example three

This embodiment provides a processing device for implementing the method for identifying Chinese traffic signs provided in the first embodiment. The processing device may be a client-side device, such as a mobile phone, a laptop, a tablet computer or a desktop computer, so as to execute the method of the first embodiment.

The processing device comprises a processor, a memory, a communication interface and a bus, wherein the processor, the memory and the communication interface are connected through the bus so as to communicate with one another. The memory stores a computer program capable of running on the processor, and the processor executes the method for identifying Chinese traffic signs provided by the first embodiment when running the computer program.

Preferably, the memory may be a high-speed random access memory (RAM), and may also include a non-volatile memory, such as at least one disk memory.

Preferably, the processor may be various general processors such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), and the like, which are not limited herein.

Example four

The method for identifying Chinese traffic signs according to the first embodiment may be embodied as a computer program product, which may include a computer readable storage medium carrying computer readable program instructions for executing the method of the first embodiment.

The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any combination of the foregoing.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that modifications may be made to the arrangements described in the embodiments, or equivalents may be substituted for some of their features, without departing from the spirit or scope of the present invention.

Claims

1. A method for identifying a Chinese traffic sign board is characterized by comprising the following steps:

designing a detection network and a classification network;

2. The method for identifying a Chinese traffic sign according to claim 1, wherein the two-dimensional bounding box information of the traffic sign refers to the position coordinates x, y (unit: pixel) of the geometric center point of the bounding box in the image coordinate system and the width and height w, h (unit: pixel) of the bounding box; the category information of the traffic sign comprises the major class C and the minor class Sc to which the traffic sign belongs, so that one traffic sign instance in the image is represented by a 6-dimensional array [x, y, w, h, C, Sc].

3. The method for identifying Chinese traffic signs according to claim 1, wherein the process of classifying the labeled data sets to obtain training sets and test sets of major and minor classes comprises:

4. The method for recognizing Chinese traffic signs according to claim 1, wherein the improved YOLOv3-spp detection network is adopted to detect the traffic signs in the images, namely, a prediction frame for the traffic signs in the images is generated, and the specific process is as follows:

wherein $\lambda_{coord}$ is the weight of the error in the position of the prediction box; $\lambda_{noobj}$ and $\lambda_{obj}$ are the weights of the confidence error of the prediction box; $S^2$ is the number of grid cells contained in the feature map generated by the algorithm; $B$ is the number of prior boxes generated for each grid cell; and $x_i, y_i, w_i, h_i, c_i$ are the true center-point abscissa, center-point ordinate, width, height and confidence of the sign instance whose center point lies in the $i$-th grid cell.

5. The method for identifying Chinese traffic signs according to claim 1, wherein a classification network is designed with EfficientNet-B6 selected as its backbone, and the EfficientNet-B6 network is implemented as follows:

6. The method for identifying a Chinese traffic sign according to claim 5, wherein the output dimension of the fully connected layer matches the number of classes to be distinguished, specifically:

7. The method for recognizing Chinese traffic signs according to claim 4, wherein the detection network and the classification network are trained according to the obtained training data for recognizing Chinese traffic signs, and the specific process is as follows:

8. A Chinese traffic sign board recognition system is characterized by comprising:

9. A processing device comprising at least a processor and a memory, the memory having stored thereon a computer program, characterized in that the processor, when running the computer program, implements the method for identifying Chinese traffic signs according to any one of claims 1 to 7.

10. A computer storage medium having computer readable instructions stored thereon which are executable by a processor to implement the method for identifying Chinese traffic signs according to any one of claims 1 to 7.