CN113255555A - Method, system, processing equipment and storage medium for identifying Chinese traffic sign board - Google Patents

Info

Publication number
CN113255555A
CN113255555A CN202110628945.8A CN202110628945A CN113255555A CN 113255555 A CN113255555 A CN 113255555A CN 202110628945 A CN202110628945 A CN 202110628945A CN 113255555 A CN113255555 A CN 113255555A
Authority
CN
China
Prior art keywords
network
traffic sign
class
training
chinese
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110628945.8A
Other languages
Chinese (zh)
Inventor
江昆
杨殿阁
冯润泽
于伟光
杨蒙蒙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202110628945.8A
Publication of CN113255555A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention relates to a method, a system, a processing device and a storage medium for identifying Chinese traffic signs. The method comprises the following steps: labeling a traffic sign image data set with the two-dimensional bounding box information and the category information of each traffic sign; dividing the labeled data set to obtain training sets and test sets for the major classes and the minor classes; designing a detection network and a classification network; and training the detection network and the classification network on the obtained training data for recognizing Chinese traffic signs. The invention is based on deep neural networks, improves recognition and detection accuracy, and can be widely applied to the recognition of traffic signs in complex road scenes in China.

Description

Method, system, processing equipment and storage medium for identifying Chinese traffic sign board
Technical Field
The invention relates to the technical field of traffic sign recognition, and in particular to a method, a system, a processing device and a storage medium for identifying Chinese traffic signs that are based on computer vision and adopt multi-stage recognition (detection, major-class classification, minor-class classification).
Background
It is crucial for autonomous vehicles to identify traffic signs in the road environment accurately and consistently. As deep neural networks have become more effective, they are now commonly used in academia for object recognition. Because training a deep neural network requires a large amount of data, many foreign enterprises and universities have released traffic sign data sets, such as the German data set GTSRB, the Belgian data set BelgiumTS, and the U.S. data set LISA.
However, traffic signs in China differ from foreign traffic signs, so neural networks trained on foreign data are not suitable for the complex traffic scenes in China. Chinese traffic sign data sets have since been proposed, including TT-100K (Tsinghua-Tencent 100K) by Zhu Zheng et al. and CCTSDB (the Changsha University of Science and Technology Chinese traffic sign detection benchmark). Domestic data sets represented by TT-100K and CCTSDB do not include lane marks, so lane marks are frequently misrecognized as indication marks in practical applications; foreign data sets use similar classification schemes and have the same problem.
For an autonomous vehicle, real-time visual detection is very important, and single-stage object recognition algorithms offer good real-time performance together with good recognition results. Currently popular single-stage algorithms include the YOLO series and SSD. Within the YOLO series, YOLOv3-spp (spatial pyramid pooling) has strong generalization capability and a high recall rate for small targets, which makes it well suited to traffic sign recognition. However, because YOLOv3-spp is a single-stage recognition algorithm, its classification accuracy is not high.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a method, a system, a processing device and a storage medium for identifying Chinese traffic signs that adopt multi-stage recognition and can effectively improve detection accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, the invention provides a method for identifying Chinese traffic signs, which comprises the following steps:
labeling a traffic sign image data set with the two-dimensional bounding box information and the category information of each traffic sign;
dividing the labeled data set to obtain training sets and test sets for the major classes and the minor classes;
designing a detection network and a classification network;
and training the detection network and the classification network on the obtained training data for recognizing Chinese traffic signs.
Further, the two-dimensional bounding box information of a traffic sign refers to the position coordinates x, y (unit: pixel) of the geometric center point of the bounding box in the image coordinate system and the width and height w, h (unit: pixel) of the bounding box; the category information of the traffic sign comprises the major class C and the minor class Sc to which the sign belongs, so that one traffic sign instance in the image is represented by a 6-dimensional array [x, y, w, h, C, Sc].
Further, dividing the labeled data set to obtain the training sets and test sets of the major classes and the minor classes includes:
dividing the pictures in the data set into a training set and a test set according to a set proportion;
extracting the traffic sign instances from the data set and dividing them into a training set and a test set according to a set proportion, for training a major-class classification network, wherein the major classes comprise a prohibition class, an indication class and a warning class;
and, within each major class, dividing the instances into a training set and a test set according to a set proportion, for training a minor-class classification network that can understand the specific semantic information contained in the traffic sign.
Further, an improved YOLOv3-spp detection network is adopted to detect the traffic sign instances in the image, that is, to generate prediction boxes for the traffic signs in the picture. The specific process is as follows:
the loss function of the YOLOv3-spp algorithm comprises 3 parts, namely the error of the position and size of the prediction box, the error of the confidence, and the error of the probability of each class to which the prediction box belongs; the modified loss function does not include the class-probability error and is given by the following formula:
[Equation image BDA0003100603330000021: modified loss function, not reproduced in the text]
where $\lambda_{coord}$ is the weight of the prediction box position error; $\lambda_{noobj}$ and $\lambda_{obj}$ are the weights of the prediction box confidence error; $S^2$ is the number of grid cells in the feature map generated by the algorithm; $B$ is the number of prior boxes generated per grid cell; $x_i, y_i, w_i, h_i, c_i$ are the true center abscissa, center ordinate, width, height and confidence of the sign instance whose center point lies in the $i$-th grid cell; $\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i, \hat{c}_i$ are the center abscissa, center ordinate, width, height and confidence of the prediction box; and $\mathbb{1}_{ij}^{obj}$ and $\mathbb{1}_{ij}^{noobj}$ indicate whether the $j$-th prediction box of the $i$-th grid cell contains foreground: if so, $\mathbb{1}_{ij}^{obj} = 1$ and $\mathbb{1}_{ij}^{noobj} = 0$; otherwise $\mathbb{1}_{ij}^{obj} = 0$ and $\mathbb{1}_{ij}^{noobj} = 1$.
The prediction box part of the loss function is determined by the generalized intersection over union (GIoU) of the bounding box predicted by the algorithm and the real bounding box labeled in the data set, which is defined as:
$$\mathrm{GIoU} = \mathrm{IoU} - \frac{A_c - U}{A_c}$$
where IoU is the intersection over union, that is, the ratio of the intersection area to the union area of the target prediction box and the real box; $U$ is the union area of the target prediction box and the real box; and $A_c$ is the area of the smallest region enclosing both the target prediction box and the real box.
Further, a classification network is designed with EfficientNet-B6 as its backbone. The EfficientNet-B6 network is implemented as follows:
first, the input passes through a convolutional layer with a 3 × 3 kernel, which produces the input dimension required by the mobile inverted bottleneck convolution (MBConv) layers;
then, a feature map is extracted by 43 MBConv layers with 3 × 3 or 5 × 5 convolution kernels;
then, borrowing the idea of the fully convolutional network (FCN), the feature map is input into a convolutional layer with a 1 × 1 kernel, which converts a feature map of any size into a specific number of channels;
and finally, the probability that the input image belongs to each category is obtained through 1 pooling layer and 1 fully connected layer; the cross entropy between the network prediction and the real category labeled in the data set is used as the loss function, and the network parameters are optimized with the Adam algorithm.
Further, the output dimension of the fully connected layer matches the number of classes to be distinguished, specifically:
the major-class classification network divides the traffic sign instances into three classes, "prohibition", "indication" and "warning", so the output dimension of the last fully connected layer of this classification network is 3, representing the probabilities that the input image belongs to the "prohibition", "indication" and "warning" classes respectively;
the minor-class classification network of the "prohibition" class divides traffic signs carrying "prohibition" information into 17 "prohibition" subclasses, so the output dimension of its last fully connected layer is 17, representing the probability that the input image belongs to each "prohibition" subclass;
the minor-class classification network of the "indication" class divides traffic signs carrying "indication" information into 27 "indication" subclasses, so the output dimension of its last fully connected layer is 27, representing the probability that the input image belongs to each "indication" subclass;
the minor-class classification network of the "warning" class divides traffic signs carrying "warning" information into 9 "warning" subclasses, so the output dimension of its last fully connected layer is 9, representing the probability that the input image belongs to each "warning" subclass.
Further, training the detection network and the classification network on the obtained training data for recognizing Chinese traffic signs proceeds as follows:
the input of the YOLOv3-spp network is a square image; each input image is scaled to 325 × 325 pixels and used as the input of the detection network, which is trained to obtain the detection network;
the input of the EfficientNet-B6 network is an RGB three-channel image; each traffic sign instance is scaled to 528 × 528 pixels without preserving its aspect ratio and input into the classification network for training to obtain the classification network.
In a second aspect, the present invention also provides a system for identifying Chinese traffic signs, the system comprising:
a data set labeling unit configured to label a traffic sign image data set with the two-dimensional bounding box information and the category information of each traffic sign;
a data set splitting unit configured to divide the labeled data set to obtain training sets and test sets for the major classes and the minor classes;
a network design unit configured to design a detection network and a classification network;
and a network training unit configured to train the detection network and the classification network on the obtained training data for recognizing Chinese traffic signs.
In a third aspect, the present invention further provides a processing device comprising at least a processor and a memory storing a computer program, wherein the processor executes the computer program to implement the above method for identifying Chinese traffic signs.
In a fourth aspect, the present invention further provides a computer storage medium storing computer readable instructions executable by a processor to implement the above method for identifying Chinese traffic signs.
Due to the adoption of the technical scheme, the invention has the following advantages:
The invention is based on deep neural networks. To improve recognition and detection accuracy, the images in the source data sets TT-100K and CCTSDB are re-labeled, and the recognition algorithm comprises three stages: a detection stage based on YOLOv3-spp, a stage that assigns each instance to a major class, and a stage that assigns the instances in each major class to minor classes. Compared with the reference YOLOv3-spp algorithm, the detection accuracy is improved by 2.8% with the recall rate and detection speed unchanged, and the confusion between indication marks and lane marks is effectively resolved in practical application. The invention can be widely applied to the recognition of traffic signs in complex road scenes in China.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Like reference numerals refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a schematic flow chart illustrating the implementation of an embodiment of the present invention;
FIG. 2 is a schematic diagram of an algorithm framework of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
It is to be understood that the terminology used herein is for the purpose of describing particular example embodiments only, and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" may be intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," "including," and "having" are inclusive and therefore specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order described or illustrated, unless specifically identified as an order of performance. It should also be understood that additional or alternative steps may be used.
For convenience of description, spatially relative terms, such as "inner", "outer", "lower", "upper", and the like, may be used herein to describe one element or feature's relationship to another element or feature as illustrated in the figures. Such spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures.
Example one
As shown in FIG. 1, the method for identifying Chinese traffic signs using multi-stage recognition according to this embodiment of the present invention includes the following steps:
S1: label the traffic sign image data set with the two-dimensional bounding box information and the category information of each traffic sign in the image.
Specifically, traffic sign instances in complex road scenes in China vary widely in size and come in many types. This embodiment labels each instance with its two-dimensional bounding box information and its category information. The bounding box information refers to the position coordinates x, y (unit: pixel) of the geometric center point of the bounding box in the image coordinate system and the width and height w, h (unit: pixel) of the bounding box. The category information includes the major class C ("indication", "warning" or "prohibition") and the minor class Sc ("driving on the right", "sharp turn ahead", "no stop", etc.) to which the sign belongs. One traffic sign instance in the image can therefore be represented by a 6-dimensional array [x, y, w, h, C, Sc].
In this embodiment, the pictures in the TT-100K and CCTSDB traffic sign data sets are re-labeled, and the resulting data set contains 18955 images with 41028 traffic sign instances.
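The 6-dimensional label described above can be sketched as a small record type. This is only an illustration: the class name `SignInstance`, its field names and the `corners` helper are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignInstance:
    """One labeled sign, [x, y, w, h, C, Sc]; all names here are illustrative."""
    x: float    # abscissa of the bounding-box center in image coordinates, pixels
    y: float    # ordinate of the bounding-box center, pixels
    w: float    # bounding-box width, pixels
    h: float    # bounding-box height, pixels
    major: str  # major class C: "indication", "warning" or "prohibition"
    minor: str  # minor class Sc, e.g. "no stop"

    def corners(self):
        """Return (x_min, y_min, x_max, y_max), e.g. for cropping the instance."""
        return (self.x - self.w / 2, self.y - self.h / 2,
                self.x + self.w / 2, self.y + self.h / 2)

    def as_array(self):
        """Return the 6-dimensional array form used in the text."""
        return [self.x, self.y, self.w, self.h, self.major, self.minor]


sign = SignInstance(x=100.0, y=60.0, w=40.0, h=20.0,
                    major="prohibition", minor="no stop")
print(sign.corners())  # (80.0, 50.0, 120.0, 70.0)
```

Storing the center and size matches the labeling convention in the text, while the corner form is the one usually needed when cropping an instance out of the picture.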
S2: divide the data set to obtain training sets and test sets for the major classes and the minor classes, specifically:
S21: divide the pictures in the data set into a training set and a test set at a ratio of, for example, 4:1;
S22: extract the traffic sign instances from the data set and divide them into a training set and a test set at a ratio of 4:1, for training a major-class classification network that classifies traffic signs into the "indication", "prohibition" and "warning" classes;
S23: divide the traffic sign instances into the three major classes "prohibition", "indication" and "warning", and within each major class divide the instances into a training set and a test set at a ratio of 4:1, for training a minor-class classification network that can understand the specific semantic information ("no parking", "right turn driving", "turn around driving", etc.) contained in the traffic sign.
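The 4:1 splits of steps S22 and S23 might be sketched as follows. The patent does not say how the split is drawn, so the seeded random shuffle, the function names and the toy instance dictionaries are all assumptions made for illustration.

```python
import random


def split_4_to_1(items, train_parts=4, test_parts=1, seed=0):
    """Shuffle items and split them into (train, test) at train_parts:test_parts."""
    items = list(items)
    random.Random(seed).shuffle(items)  # assumption: a seeded random split
    cut = len(items) * train_parts // (train_parts + test_parts)
    return items[:cut], items[cut:]


def split_by_major_class(instances, seed=0):
    """Split instances 4:1 separately inside each major class (step S23)."""
    by_class = {}
    for inst in instances:
        by_class.setdefault(inst["major"], []).append(inst)
    return {major: split_4_to_1(group, seed=seed)
            for major, group in by_class.items()}


# Hypothetical toy data: 10 instances per major class.
instances = [{"id": i, "major": major}
             for major in ("prohibition", "indication", "warning")
             for i in range(10)]

train, test = split_4_to_1(instances)        # S22: data for the major-class network
per_class = split_by_major_class(instances)  # S23: one split per minor-class network
print(len(train), len(test))                 # 24 6
```

Splitting inside each major class keeps every minor-class network's test set drawn from the same major class as its training set.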
S3: design a detection network and a classification network, specifically:
S31: an improved YOLOv3-spp detection network is used to detect the traffic sign instances in the image, that is, to generate prediction boxes for the traffic signs in the picture.
The loss function of the YOLOv3-spp algorithm contains 3 parts, namely the error of the position and size of the prediction box, the error of the confidence, and the error of the probability of each class to which the prediction box belongs. The invention removes the class-probability error from the original loss function, and the modified loss is given by the following formula:
[Equation image BDA0003100603330000061: modified loss function, not reproduced in the text]
where $\lambda_{coord}$ is the weight of the prediction box position error; $\lambda_{noobj}$ and $\lambda_{obj}$ are the weights of the prediction box confidence error; $S^2$ is the number of grid cells in the feature map generated by the algorithm; $B$ is the number of prior boxes generated per grid cell; $x_i, y_i, w_i, h_i, c_i$ are the true center abscissa, center ordinate, width, height and confidence of the sign instance whose center point lies in the $i$-th grid cell; $\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i, \hat{c}_i$ are the center abscissa, center ordinate, width, height and confidence of the prediction box; and $\mathbb{1}_{ij}^{obj}$ and $\mathbb{1}_{ij}^{noobj}$ indicate whether the $j$-th prediction box of the $i$-th grid cell contains foreground: if so, $\mathbb{1}_{ij}^{obj} = 1$ and $\mathbb{1}_{ij}^{noobj} = 0$; otherwise $\mathbb{1}_{ij}^{obj} = 0$ and $\mathbb{1}_{ij}^{noobj} = 1$.
The prediction box part of the loss function is determined by the generalized intersection over union (GIoU) of the bounding box predicted by the algorithm and the real bounding box labeled in the data set, which is defined as:
$$\mathrm{GIoU} = \mathrm{IoU} - \frac{A_c - U}{A_c}$$
where IoU is the intersection over union, that is, the ratio of the intersection area to the union area of the target prediction box and the real box; $U$ is the union area of the target prediction box and the real box; and $A_c$ is the area of the smallest region enclosing both the target prediction box and the real box.
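As a minimal sketch of the GIoU quantity just defined, the function below evaluates IoU, the union area U and the enclosing area A_c for two axis-aligned boxes given in the (x, y, w, h) center format used for the labels. The function name and the example boxes are illustrative, and both boxes are assumed to have positive width and height.

```python
def giou(box_a, box_b):
    """GIoU of two boxes given as (x_center, y_center, width, height)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area U and area A_c of the smallest box enclosing both.
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    iou = inter / union
    return iou - (enclose - union) / enclose


print(giou((1, 1, 2, 2), (1, 1, 2, 2)))      # identical boxes
print(giou((1, 1, 2, 2), (2, 2, 2, 2)))      # partial overlap
print(giou((0.5, 0.5, 1, 1), (2.5, 0.5, 1, 1)))  # disjoint boxes
```

Unlike plain IoU, the value stays informative for disjoint boxes: it becomes negative and decreases as the boxes move apart, which is why it is usable as a box regression signal.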
In the invention, the loss term determined by the class of the prediction box is thus removed from the loss function, so the YOLOv3-spp network is dedicated to detecting targets.
S32: design a classification network.
EfficientNet-B6 is selected as the backbone of the classification network. The input of the EfficientNet-B6 network is an RGB three-channel image with a resolution of 528 × 528 pixels, and the network is implemented as follows:
first, the input passes through a convolutional layer with a 3 × 3 kernel, which produces the input dimension required by the mobile inverted bottleneck convolution (MBConv) layers;
then, a feature map is extracted by 43 MBConv layers with 3 × 3 or 5 × 5 convolution kernels, where the number of MBConv layers, the number of channels of each layer and the kernel sizes have been tuned so that the network performance (accuracy on ImageNet) is optimal for a given number of network parameters;
then, borrowing the idea of the fully convolutional network (FCN), the feature map is input into a convolutional layer with a 1 × 1 kernel, which can convert a feature map of any size into a specific number of channels;
and finally, the probability that the input image belongs to each category is obtained through 1 pooling layer and 1 fully connected layer (whose output dimension equals the number of classification categories).
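The last two steps (1 × 1 convolution, then pooling and a fully connected layer) can be illustrated with a toy forward pass in plain Python. This is a sketch under stated assumptions, not the EfficientNet-B6 implementation: the pooling layer is assumed to be global average pooling, a softmax is assumed to turn the outputs into probabilities, and all weights and feature maps below are made-up toy values.

```python
import math


def conv1x1(feature_map, weights):
    """1x1 convolution: mix channels at every pixel; height and width are unchanged."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [[[sum(w * feature_map[c][y][x] for c, w in enumerate(row))
              for x in range(width)]
             for y in range(height)]
            for row in weights]


def global_average_pool(feature_map):
    """Average each channel over all positions: one value per channel."""
    return [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in feature_map]


def fully_connected(vector, weights, bias):
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature_map, conv_w, fc_w, fc_b):
    pooled = global_average_pool(conv1x1(feature_map, conv_w))
    return softmax(fully_connected(pooled, fc_w, fc_b))


# Toy weights: 2 input channels -> 4 channels -> 3 classes.
CONV_W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
FC_W = [[0.1, 0.0, 0.0, 0.0], [0.0, 0.1, 0.0, 0.0], [0.0, 0.0, 0.1, 0.0]]
FC_B = [0.0, 0.0, 0.0]

small = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
         [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]]            # 2 channels, 2 x 3
large = [[[1.0] * 5 for _ in range(4)],
         [[2.0] * 5 for _ in range(4)]]                 # 2 channels, 4 x 5

pooled_small = global_average_pool(conv1x1(small, CONV_W))
probs_small = classify(small, CONV_W, FC_W, FC_B)
probs_large = classify(large, CONV_W, FC_W, FC_B)
print(len(probs_small), len(probs_large))               # 3 3
```

Both feature maps yield a 3-way probability vector even though their spatial sizes differ, because the 1 × 1 convolution fixes the channel count and the pooling removes the spatial dimensions before the fully connected layer.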
The cross entropy between the network prediction and the real category labeled in the data set is used as the loss function, and the network parameters are optimized with the Adam algorithm.
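To make the training objective concrete, the sketch below minimizes a cross-entropy loss with a hand-written Adam update on a toy problem in which three logits are themselves the trainable parameters. The hyperparameters (learning rate 0.1, beta1 0.9, beta2 0.999) and the toy setup are assumptions for illustration; the patent does not state the settings it used.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Cross entropy against a one-hot label: -log of the true-class probability."""
    return -math.log(probs[true_index])


class Adam:
    """Adam update with bias-corrected first and second moment estimates."""

    def __init__(self, n, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = [0.0] * n, [0.0] * n, 0

    def step(self, params, grads):
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# Toy problem: the 3 logits are the parameters; the true class index is 1.
logits, true_index = [0.0, 0.0, 0.0], 1
optimizer = Adam(n=3)
initial_loss = cross_entropy(softmax(logits), true_index)
for _ in range(200):
    probs = softmax(logits)
    # Gradient of the cross entropy w.r.t. the logits: probs minus the one-hot label.
    grads = [p - (1.0 if k == true_index else 0.0) for k, p in enumerate(probs)]
    logits = optimizer.step(logits, grads)
final_loss = cross_entropy(softmax(logits), true_index)
print(initial_loss > final_loss)
```

In a real network the gradients would come from backpropagation through all layers; only the loss and the update rule are shown here.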
In some specific implementations, the output dimension of the fully connected layer equals the number of classification categories, specifically:
the major-class classification network needs to divide the traffic sign instances into three classes, "prohibition", "indication" and "warning", so the output dimension of its last fully connected layer is 3, representing the probabilities that the input image belongs to the "prohibition", "indication" and "warning" classes respectively.
The minor-class classification network of the "prohibition" class needs to divide traffic signs carrying "prohibition" information into 17 "prohibition" subclasses, such as "no stop", "no entry" and "no turn around", so the output dimension of its last fully connected layer is 17, representing the probability that the input image belongs to each "prohibition" subclass.
Similarly, the minor-class classification network of the "indication" class needs to divide traffic signs carrying "indication" information into 27 "indication" subclasses, such as "driving right", "going straight lane" and "motor lane", so the output dimension of its last fully connected layer is 27, representing the probability that the input image belongs to each "indication" subclass.
The minor-class classification network of the "warning" class needs to divide traffic signs carrying "warning" information into 9 "warning" subclasses, such as "pay attention to children", "pay attention to rivers" and "slow down driving", so the output dimension of its last fully connected layer is 9, representing the probability that the input image belongs to each "warning" subclass.
In summary, 1 classification network assigns the traffic sign instances output by the detection network to a major class ("prohibition", "indication" or "warning"), and another 3 classification networks assign the instances under each major class to their specific subclasses.
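The four output sizes and the routing from the major-class result to a minor-class network can be summarized in a few lines. The dictionary, tuple and function names are hypothetical; the sizes and the order of the three major-class outputs follow the text above.

```python
# Output dimension of the last fully connected layer of each classification network.
HEAD_OUTPUT_DIMS = {"major": 3, "prohibition": 17, "indication": 27, "warning": 9}

# Order of the 3 major-class outputs as listed in the text.
MAJOR_CLASSES = ("prohibition", "indication", "warning")


def minor_head_for(major_probs):
    """Pick the minor-class network matching the most probable major class."""
    best = max(range(len(major_probs)), key=lambda k: major_probs[k])
    return MAJOR_CLASSES[best]


total_minor_classes = sum(HEAD_OUTPUT_DIMS[name] for name in MAJOR_CLASSES)
print(total_minor_classes)                 # 53
print(minor_head_for([0.1, 0.7, 0.2]))     # indication
```

Taking the most probable major class is an assumption about how the second stage hands over to the third; the text only states that each instance goes to the network of the major class it belongs to.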
S4: train the detection network and the classification network on the obtained training data.
The input of the YOLOv3-spp network must be a square image. Weighing memory consumption against detection accuracy, this embodiment scales each input image to 325 × 325 pixels and uses it as the input of the detection network, which is trained to obtain the detection network. Specifically, in this embodiment $\lambda_{noobj}$ and $\lambda_{obj}$ are set to 1, and $\lambda_{coord}$ is set to 1.54.
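How the three weights enter a class-free detection loss can be sketched as below. The exact formula is only available as an image in the source, so the per-box terms here are assumptions: the box term is taken as (1 − GIoU) for boxes that contain an object, and the confidence term as a squared error. Only the weight values 1.54, 1 and 1 come from the embodiment; the dictionary keys and example boxes are hypothetical.

```python
LAMBDA_COORD = 1.54  # weight of the box position error (value from the embodiment)
LAMBDA_OBJ = 1.0     # weight of the confidence error for boxes with an object
LAMBDA_NOOBJ = 1.0   # weight of the confidence error for boxes without an object


def detection_loss(boxes):
    """Sum a class-free loss over prediction boxes.

    Each box is a dict with hypothetical keys: has_object (bool), giou (float,
    used only when has_object is True), conf_pred and conf_true (floats).
    """
    loss = 0.0
    for box in boxes:
        conf_err = (box["conf_true"] - box["conf_pred"]) ** 2  # assumed squared error
        if box["has_object"]:
            # Assumed box term: 1 - GIoU, which is zero for a perfect match.
            loss += LAMBDA_COORD * (1.0 - box["giou"]) + LAMBDA_OBJ * conf_err
        else:
            loss += LAMBDA_NOOBJ * conf_err
    return loss


boxes = [
    {"has_object": True, "giou": 0.5, "conf_pred": 0.8, "conf_true": 1.0},
    {"has_object": False, "giou": 0.0, "conf_pred": 0.1, "conf_true": 0.0},
]
print(detection_loss(boxes))
```

Note that no term depends on a class probability, which mirrors the removal of the class error described for the modified loss.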
The input of the EfficientNet-B6 network is an RGB three-channel image of 528 × 528 pixels. During training in this embodiment, each traffic sign instance is scaled to 528 × 528 pixels without preserving its aspect ratio and then input into the classification network for training, yielding the classification network.
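Scaling without preserving the aspect ratio means the horizontal and vertical scale factors are chosen independently so that any crop becomes 528 × 528. The sketch below uses nearest-neighbor sampling on a row-major list of pixels; the interpolation method is an assumption, since the patent does not specify one, and the function name and toy crop are illustrative.

```python
def resize_nearest(image, out_h=528, out_w=528):
    """Resize a row-major image (list of rows of pixels) to out_h x out_w.

    The x and y scale factors are independent, so the aspect ratio of the
    input is not preserved.
    """
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)]
            for y in range(out_h)]


# Hypothetical 2 x 4 crop whose pixels are RGB tuples.
crop = [[(r, c, 0) for c in range(4)] for r in range(2)]
resized = resize_nearest(crop)
print(len(resized), len(resized[0]))  # 528 528
```

For this 2 × 4 crop the vertical scale factor is 264 and the horizontal one is 132, so the sign is stretched more vertically than horizontally.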
S5: test and compare.
After the detection network and the four classification networks have been trained, the recognition algorithm is tested on the test set. The algorithm framework is shown in FIG. 2:
in the first stage, the detection network identifies the traffic sign instances in the input image;
in the second stage, each traffic sign instance is cropped from the image, scaled, and input into the major-class classification algorithm to obtain the major class ("indication", "warning" or "prohibition") to which it belongs;
and in the third stage, the traffic sign instance is input into the corresponding minor-class classification algorithm (for example, an instance belonging to the "indication" class is input into the "indication" minor-class classification algorithm) to obtain the minor class to which it belongs (such as "driving on the right" or "pay attention to rivers").
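The three stages can be wired together as in the following sketch. The trained networks are replaced by stub callables, so every function and value below is a hypothetical stand-in that only shows the control flow, not the patent's implementation.

```python
def recognize(image, detect, crop_and_scale, classify_major, minor_classifiers):
    """Run detection, major-class and minor-class classification on one image."""
    results = []
    for box in detect(image):                    # stage 1: prediction boxes
        patch = crop_and_scale(image, box)       # crop the instance and rescale it
        major = classify_major(patch)            # stage 2: major class
        minor = minor_classifiers[major](patch)  # stage 3: matching minor-class network
        results.append((box, major, minor))
    return results


# Stub stand-ins for the trained networks (illustrative only).
def fake_detect(image):
    return [(100, 60, 40, 20)]  # one box in (x, y, w, h) form


def fake_crop_and_scale(image, box):
    return ("patch", box)


def fake_classify_major(patch):
    return "indication"


fake_minor_classifiers = {
    "prohibition": lambda patch: "no stop",
    "indication": lambda patch: "driving on the right",
    "warning": lambda patch: "pay attention to rivers",
}

output = recognize("image", fake_detect, fake_crop_and_scale,
                   fake_classify_major, fake_minor_classifiers)
print(output)  # [((100, 60, 40, 20), 'indication', 'driving on the right')]
```

Keeping the minor-class networks in a mapping keyed by major class makes the routing of the third stage explicit: only one of the three minor-class networks runs per instance.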
Meanwhile, the trained reference YOLOv3-spp algorithm is tested for comparison. The results show that the two algorithms run at the same recognition speed and that, at the same recall rate, the recognition accuracy of the algorithm provided by the invention is 2.8% higher.
Example two
The first embodiment provides a method for identifying Chinese traffic signs using multi-stage recognition; correspondingly, this embodiment provides a system for identifying Chinese traffic signs. The system provided in this embodiment can implement the method of the first embodiment, and it may be implemented in software, hardware, or a combination of the two. For example, the system may comprise integrated or separate functional modules or units that perform the corresponding steps of the method of the first embodiment. Since the system of this embodiment is essentially similar to the method embodiment, it is described only briefly here; for the relevant details, refer to the description of the first embodiment.
This embodiment provides a Chinese traffic sign recognition system, which includes:
a data set labeling unit configured to label the image data set with the two-dimensional bounding box information and the category information of each traffic sign in the image;
a data set splitting unit configured to divide the data set to obtain training sets and test sets for the major classes and the minor classes;
a network design unit configured to design a detection network and a classification network;
and a network training unit configured to train the detection network and the classification network on the obtained training data for recognizing Chinese traffic signs.
Example three
This embodiment provides a processing device for implementing the method for identifying Chinese traffic signs provided in the first embodiment. The processing device may be a client-side device, such as a mobile phone, a laptop, a tablet computer or a desktop computer, that executes the method of the first embodiment.
The processing device comprises a processor, a memory, a communication interface and a bus; the processor, the memory and the communication interface are connected through the bus so that they can communicate with one another. The memory stores a computer program that can run on the processor, and the processor executes the method for identifying Chinese traffic signs provided in the first embodiment when it runs the computer program.
Preferably, the Memory may be a high-speed Random Access Memory (RAM), and may also include a non-volatile Memory, such as at least one disk Memory.
Preferably, the processor may be various general processors such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), and the like, which are not limited herein.
Example four
The method for identifying Chinese traffic signs according to the first embodiment is embodied as a computer program product, which may include a computer readable storage medium carrying computer readable program instructions for executing the method for identifying Chinese traffic signs according to the first embodiment.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any combination of the foregoing.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the arrangements described in the embodiments may be modified, or some of their features replaced by equivalents, without departing from the spirit or scope of the present invention.

Claims (10)

1. A method for identifying a Chinese traffic sign board is characterized by comprising the following steps:
labeling a traffic sign image data set using the two-dimensional bounding box information of the traffic signs and the category information of the signs;
classifying the labeled data sets to obtain training sets and test sets of major classes and minor classes;
designing a detection network and a classification network;
and training the detection network and the classification network according to the acquired training data for recognizing the Chinese traffic sign.
2. The method for identifying a Chinese traffic sign according to claim 1, wherein the two-dimensional bounding box information of the traffic sign comprises the position coordinates x, y (unit: pixel) of the geometric center point of the bounding box in the image coordinate system, and the width and height w, h (unit: pixel) of the bounding box; the category information of the traffic sign comprises the major class C and the minor class Sc of the traffic sign; that is, a traffic sign instance in the image is represented by a 6-dimensional array [x, y, w, h, C, Sc].
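As a hedged illustration of the 6-dimensional label [x, y, w, h, C, Sc], the sketch below stores one instance and converts its center-based box to corner coordinates. The class name `SignInstance`, its field names, and the use of integer indices for C and Sc are assumptions made for this example; the claim does not state how the classes are encoded.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SignInstance:
    x: float      # center abscissa in the image coordinate system, pixels
    y: float      # center ordinate, pixels
    w: float      # bounding box width, pixels
    h: float      # bounding box height, pixels
    major: int    # C: major class index (assumed integer encoding)
    minor: int    # Sc: minor class index within the major class (assumed)

    def as_array(self) -> List[float]:
        """Return the label in the order [x, y, w, h, C, Sc]."""
        return [self.x, self.y, self.w, self.h, self.major, self.minor]

    def corners(self) -> Tuple[float, float, float, float]:
        """Return (x_min, y_min, x_max, y_max), e.g. for cropping the instance."""
        return (self.x - self.w / 2, self.y - self.h / 2,
                self.x + self.w / 2, self.y + self.h / 2)
```

For example, `SignInstance(100, 50, 20, 10, 0, 3).corners()` gives the crop window `(90.0, 45.0, 110.0, 55.0)`.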
3. The method for identifying Chinese traffic signs according to claim 1, wherein the process of classifying the labeled data sets to obtain training sets and test sets of major and minor classes comprises:
dividing pictures in the data set into a training set and a test set according to a set proportion;
extracting the traffic sign instances in the data set and dividing them into a training set and a test set according to a set proportion, for training a major-class classification network, wherein the major classes comprise a forbidden class, an indication class and a warning class;
and dividing the examples into a training set and a test set according to a set proportion in each major category, and using the training sets to train a subclass classification network capable of understanding the specific semantic information contained in the traffic sign.
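The splitting steps above can be sketched as follows. The 0.8 training ratio, the fixed random seed, and the representation of an instance as a `(major_class, subclass)` pair are assumptions for illustration only; the claim says only that a "set proportion" is used.

```python
import random
from collections import defaultdict


def split_by_ratio(items, train_ratio=0.8, seed=0):
    """Shuffle a copy of items and cut it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def split_within_major_classes(instances, train_ratio=0.8, seed=0):
    """Group (major_class, subclass) pairs by major class and split each group."""
    by_major = defaultdict(list)
    for instance in instances:
        by_major[instance[0]].append(instance)
    return {major: split_by_ratio(group, train_ratio, seed)
            for major, group in by_major.items()}
```

`split_by_ratio` covers the first two steps (whole pictures for the detection network, all instances for the major-class network); `split_within_major_classes` covers the third, so that each subclass network is trained and tested only on instances of its own major class.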
4. The method for recognizing Chinese traffic signs according to claim 1, wherein the improved YOLOv3-spp detection network is adopted to detect the traffic signs in the images, namely, a prediction frame for the traffic signs in the images is generated, and the specific process is as follows:
the loss function of the YOLOv3-spp algorithm comprises three parts, namely the error in the position and size of the prediction box, the confidence error, and the error in the probability of each class; the modified loss function omits the error in the probability of each class to which the prediction box belongs, and is given by the following formula:
Figure FDA0003100603320000011
wherein $\lambda_{coord}$ is the weight of the position error of the prediction box; $\lambda_{noobj}$ and $\lambda_{obj}$ are the weights of the confidence error of the prediction box; $S^2$ is the number of grid cells contained in the feature map generated by the algorithm; $B$ is the number of prior boxes generated for each grid cell; $x_i, y_i, w_i, h_i, c_i$ are the true center-point abscissa, ordinate, width, height and confidence of the sign instance whose center point lies in the i-th grid cell; $\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i, \hat{c}_i$ are the center-point abscissa, ordinate, width, height and confidence of the prediction box; and $\mathbb{1}_{ij}^{obj}$ and $\mathbb{1}_{ij}^{noobj}$ indicate whether the j-th prediction box of the i-th grid cell contains foreground: if it does, $\mathbb{1}_{ij}^{obj} = 1$ and $\mathbb{1}_{ij}^{noobj} = 0$; otherwise $\mathbb{1}_{ij}^{obj} = 0$ and $\mathbb{1}_{ij}^{noobj} = 1$.
The part of the loss function concerning the generated prediction box is determined by the generalized intersection over union (GIoU) between the bounding box predicted by the algorithm and the real bounding box labeled in the data set, which is defined as:
$$GIoU = IoU - \frac{A_c - U}{A_c}$$
wherein IoU is the intersection over union, i.e. the ratio of the intersection area to the union area of the target prediction box and the real box; $U$ is the union area of the target prediction box and the real box; and $A_c$ is the area of the smallest region enclosing both the target prediction box and the real box.
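A minimal sketch of the generalized intersection over union follows, computing IoU minus the fraction of the smallest enclosing region not covered by the union. It assumes axis-aligned boxes with positive area given as corner coordinates `(x_min, y_min, x_max, y_max)`; the function name `giou` and this box format are choices made for the example, whereas the claims describe boxes by center, width and height.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area U and plain IoU.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Area A_c of the smallest axis-aligned region enclosing both boxes.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    return iou - (enclose - union) / enclose
```

Unlike plain IoU, which is zero for every pair of non-overlapping boxes, this quantity keeps decreasing as the boxes move apart, so it still gives a useful training signal when a prediction misses the real box entirely.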
5. The method for identifying Chinese traffic signs according to claim 1, wherein, in designing the classification network, EfficientNet-B6 is selected as the framework of the classification network, and the EfficientNet-B6 network is implemented as follows:
first, a convolutional layer with a 3 × 3 kernel converts the input into the dimensions required by the mobile inverted bottleneck convolution layers;
then, a feature map is extracted by 43 mobile inverted bottleneck convolution layers with convolution kernels of 3 × 3 or 5 × 5;
then, borrowing the idea of the fully convolutional network (FCN), the feature map is input into a convolutional layer with a 1 × 1 kernel, which converts a feature map of any size to a specific number of channels;
and finally, the probability that the input image belongs to each category is obtained through one pooling layer and one fully connected layer; the cross entropy between the network prediction and the real category labeled in the data set is used as the loss function, and the network parameters are optimized with the Adam algorithm.
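The final step, turning the fully connected layer's outputs into class probabilities and scoring them with cross entropy, can be illustrated with the small sketch below. The use of a softmax to obtain the probabilities is an assumption (the claim does not name the normalization), and `softmax` and `cross_entropy` are illustrative helper names, not part of EfficientNet or of any library API.

```python
import math


def softmax(logits):
    """Convert raw fully connected outputs into probabilities that sum to 1."""
    peak = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_index):
    """Cross-entropy loss for one sample whose labeled class is true_index."""
    return -math.log(softmax(logits)[true_index])
```

With three equal outputs, for instance, every class gets probability 1/3 and the loss equals ln 3; the loss falls toward zero as the output for the labeled class grows relative to the others, which is the quantity an optimizer such as Adam would then minimize.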
6. The method for identifying a Chinese traffic sign according to claim 5, wherein the output dimension of the fully connected layer matches the number of classes to be distinguished, specifically:
the major-class classification network divides traffic sign instances into three classes, namely the 'forbidden' class, the 'indication' class and the 'warning' class, so the output dimension of the last fully connected layer of the classification network is 3, representing the probabilities that the image input to the network belongs to the 'forbidden' class, the 'indication' class and the 'warning' class respectively;
the subclass classification network of the 'forbidden' class divides traffic signs containing 'forbidden' information into 17 'forbidden' subclasses, so the output dimension of the last fully connected layer of the classification network is 17, representing the probability that the image input to the network belongs to each 'forbidden' subclass;
the subclass classification network of the 'indication' class divides traffic signs containing 'indication' information into 27 'indication' subclasses, so the output dimension of the last fully connected layer of the classification network is 27, representing the probability that the image input to the network belongs to each 'indication' subclass;
the subclass classification network of the 'warning' class divides traffic signs containing 'warning' information into 9 'warning' subclasses, so the output dimension of the last fully connected layer of the classification network is 9, representing the probability that the image input to the network belongs to each 'warning' subclass.
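The output dimensions listed in this claim can be collected in a small lookup, sketched below. The constant names, the dictionary keys and the helper `head_output_dim` are hypothetical; only the numbers 3, 17, 27 and 9 come from the claim.

```python
# Output dimension of the last fully connected layer for each classification network.
NUM_MAJOR_CLASSES = 3
SUBCLASS_COUNTS = {"forbidden": 17, "indication": 27, "warning": 9}


def head_output_dim(network: str) -> int:
    """Return the output size for 'major' or for one major class's subclass network."""
    if network == "major":
        return NUM_MAJOR_CLASSES
    return SUBCLASS_COUNTS[network]


# Total number of distinct subclasses across the three subclass networks.
TOTAL_SUBCLASSES = sum(SUBCLASS_COUNTS.values())
```

Summing the three subclass counts gives 53 subclasses in total, which the hierarchy spreads over three heads of at most 27 outputs each instead of one 53-way classifier.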
7. The method for recognizing Chinese traffic signs according to claim 4, wherein the detection network and the classification network are trained according to the obtained training data for recognizing Chinese traffic signs, and the specific process is as follows:
the input of the YOLOv3-spp network is a square image; the input image is scaled to 325 × 325 pixels and used as the input of the detection network, and the trained detection network is obtained through training;
the input of the EfficientNet-B6 network is an RGB three-channel image; each traffic sign instance is scaled non-uniformly (that is, without preserving its aspect ratio) to 528 × 528 pixels and input into the classification network for training, to obtain the trained classification network.
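Non-uniform scaling means the horizontal and vertical scale factors are chosen independently so that every cropped instance ends up exactly 528 × 528. The sketch below computes those factors and resizes a tiny single-channel image stored as nested lists; nearest-neighbor sampling and the helper names are assumptions for illustration, since the claim does not specify an interpolation method.

```python
DETECTOR_INPUT = 325     # side length of the square detection input, pixels
CLASSIFIER_INPUT = 528   # side length of the square classification input, pixels


def nonuniform_scale_factors(width, height, target=CLASSIFIER_INPUT):
    """Independent horizontal and vertical factors that map the crop to target x target."""
    return target / width, target / height


def resize_nearest(pixels, out_w, out_h):
    """Resize a 2D list of pixel values with nearest-neighbor sampling."""
    in_h, in_w = len(pixels), len(pixels[0])
    return [[pixels[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]
```

A 66 × 132 pixel crop, for example, is stretched by a factor of 8 horizontally and 4 vertically, so its aspect ratio changes while the classifier input size stays fixed.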
8. A Chinese traffic sign board recognition system is characterized by comprising:
the data set labeling unit is configured to perform data set labeling on the traffic sign image data set by adopting two-dimensional bounding box information of the traffic sign and category information of the sign;
the data set splitting unit is configured to classify the labeled data set to obtain training sets and test sets of major classes and minor classes;
a network design unit configured to design a detection network and a classification network;
and the network training unit is configured to train the detection network and the classification network according to the acquired training data, and is used for carrying out Chinese traffic sign identification.
9. A processing device comprising at least a processor and a memory, the memory having a computer program stored thereon, characterized in that the processor, when running the computer program, implements the method for identifying Chinese traffic signs according to any one of claims 1 to 7.
10. A computer storage medium having computer readable instructions stored thereon, the instructions being executable by a processor to implement the method for identifying Chinese traffic signs according to any one of claims 1 to 7.
CN202110628945.8A 2021-06-04 2021-06-04 Method, system, processing equipment and storage medium for identifying Chinese traffic sign board Pending CN113255555A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110628945.8A CN113255555A (en) 2021-06-04 2021-06-04 Method, system, processing equipment and storage medium for identifying Chinese traffic sign board

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110628945.8A CN113255555A (en) 2021-06-04 2021-06-04 Method, system, processing equipment and storage medium for identifying Chinese traffic sign board

Publications (1)

Publication Number Publication Date
CN113255555A true CN113255555A (en) 2021-08-13

Family

ID=77186632

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110628945.8A Pending CN113255555A (en) 2021-06-04 2021-06-04 Method, system, processing equipment and storage medium for identifying Chinese traffic sign board

Country Status (1)

Country Link
CN (1) CN113255555A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679508A (en) * 2017-10-17 2018-02-09 广州汽车集团股份有限公司 Road traffic sign detection recognition methods, apparatus and system
US20190171904A1 (en) * 2017-12-01 2019-06-06 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for training fine-grained image recognition model, fine-grained image recognition method and apparatus, and storage mediums
CN108009518A (en) * 2017-12-19 2018-05-08 大连理工大学 A kind of stratification traffic mark recognition methods based on quick two points of convolutional neural networks
WO2020000253A1 (en) * 2018-06-27 2020-01-02 潍坊学院 Traffic sign recognizing method in rain and snow

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
慕思侣: "《https://blog.csdn.net/u014090429》", 14 May 2021 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114973207A (en) * 2022-08-01 2022-08-30 成都航空职业技术学院 Road sign identification method based on target detection
CN114973207B (en) * 2022-08-01 2022-10-21 成都航空职业技术学院 Road sign identification method based on target detection
CN115620265A (en) * 2022-12-19 2023-01-17 华南理工大学 Locomotive signboard information intelligent identification method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination