CN113591543A

CN113591543A - Traffic sign recognition method and device, electronic equipment and computer storage medium

Info

Publication number: CN113591543A
Application number: CN202110636627.6A
Authority: CN
Inventors: 李晓欢; 马新舒; 陈倩; 唐欣
Original assignee: Guangxi Comprehensive Transportation Big Data Research Institute; Guilin University of Electronic Technology
Current assignee: Guangxi Comprehensive Transportation Big Data Research Institute; Guilin University of Electronic Technology
Priority date: 2021-06-08
Filing date: 2021-06-08
Publication date: 2021-11-02
Anticipated expiration: 2041-06-08
Also published as: CN113591543B

Abstract

The embodiment of the application provides a traffic sign identification method and device, electronic equipment and a computer storage medium, and relates to the technical field of image identification. The method comprises the following steps: acquiring a preset traffic sign data set, clustering traffic signs in the traffic sign data set by adopting a preset clustering algorithm, and determining the sizes of various traffic signs; and identifying the traffic sign to be identified by adopting the trained traffic sign identification model, and confirming the category of the traffic sign to be identified. According to the embodiment of the application, the data set is clustered through a preset clustering method, the traffic sign recognition model is trained, the receptive field information of different scales in the image can be obtained, the obtained information of different scales can be fused, the capability of a shallow network for predicting the traffic sign is enhanced, and the detection accuracy of the traffic sign is improved. The network parameters are reduced, the network complexity is reduced, the detection speed of the traffic sign is increased, and the real-time detection of the traffic sign is achieved.

Description

Traffic sign recognition method and device, electronic equipment and computer storage medium

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for identifying a traffic sign, an electronic device, and a computer storage medium.

Background

In recent years, unmanned driving has attracted increasing attention worldwide. To ensure that an unmanned vehicle can travel safely on the road, it must sense information about its surrounding environment. In the prior art, a laser radar determines the actual distance between a target ahead and the vehicle, but it cannot determine the type of the target, so an auxiliary camera is needed to detect all targets on the road accurately and in real time. The farther away a target can be identified, the more response time the vehicle has to brake in time and avoid a collision. Traffic sign detection and recognition is an important part of the perception component of unmanned driving: by acquiring the type and distance information of a traffic sign, an automatic driving system can make correct decisions in time.

In real scenes, a traffic sign occupies only 0.001% to 5% of the pixels of a field-of-view image. Because the sign is small and occupies few pixels, its features are not distinctive, which makes it harder to detect than a large target. Detection is also affected by weather: severe conditions such as fog, dim light and complex scenes make traffic signs difficult to recognize. As a result, existing deep learning detection algorithms struggle to detect and recognize small-target traffic signs effectively and accurately in real scenes.

Therefore, the accuracy of the identification of the small target traffic sign in the prior art is not high, and improvement is needed.

Disclosure of Invention

The present application aims to solve at least one of the above technical drawbacks, in particular, the technical drawback of the prior art that the accuracy of the identification of small target traffic signs is not high.

According to one aspect of the present application, there is provided a traffic sign recognition method, including:

acquiring a preset traffic sign data set, wherein the traffic sign data set comprises a training set and a verification set;

clustering the traffic signs in the traffic sign data set by adopting a preset clustering algorithm, and determining the sizes of all types of traffic signs;

training a preset traffic sign recognition model by using the training set to obtain a trained traffic sign recognition model;

and identifying the traffic sign to be identified by adopting the trained traffic sign identification model, and confirming the category of the traffic sign to be identified.

As a possible implementation manner of the present application, in this implementation manner, the clustering traffic signs in the traffic sign data set by using a preset clustering algorithm to determine sizes of the various types of traffic signs includes:

setting a preset number of clusters, and determining a corresponding number of initial cluster centers;

calculating the distance between each data point of the traffic sign data in the traffic sign data set and the initial cluster centers;

assigning the data points whose distance is within a preset range to the cluster of the corresponding cluster center;

and taking the center point of all the data in each cluster as the new center point of the cluster, repeating until the center points of the clusters no longer move, and taking the center point of each cluster as the size of the traffic signs of that cluster.

As a possible embodiment of the present application, in this embodiment, the training step of the traffic sign recognition model includes:

inputting the training data into a first branch to calculate a first output, wherein the first branch comprises a convolution module;

inputting the training data into a second branch to calculate to obtain a second output, wherein the second branch comprises two convolution modules;

inputting the training data into a third branch to obtain a third output through calculation, wherein the third branch comprises four convolution modules;

inputting the training data into a fourth branch to calculate a fourth output, wherein the fourth branch comprises a maximum pooling module;

and fusing the first output, the second output, the third output and the fourth output, and performing one convolution calculation on the fused result to obtain an output result.

As a possible embodiment of the present application, in this embodiment, the method further includes:

and calculating the loss error of the trained traffic sign recognition model by adopting the training set, and obtaining the trained traffic sign recognition model when the loss error is within a preset range.

As a possible embodiment of the present application, in this embodiment, one convolution module of the first branch is a 1 × 1 convolution module, two convolution modules of the second branch are respectively a 1 × 1 convolution module and a 3 × 3 convolution module, and four convolution modules of the third branch are respectively a 1 × 1 convolution module and three 3 × 3 convolution modules.

As a possible embodiment of the present application, in this embodiment,

the traffic sign recognition model further comprises a deep network feature map module for outputting a large target prediction size and a shallow network feature map module for outputting a medium target prediction size and a small target prediction size;

and fusing the large target prediction size, the medium target prediction size and the small target prediction size with the output result to obtain a final output result.

According to another aspect of the present application, there is provided a traffic sign recognition apparatus, including:

the data acquisition module is used for acquiring a preset traffic sign data set, wherein the traffic sign data set comprises a training set and a verification set;

the clustering module is used for clustering the traffic signs in the traffic sign data set by adopting a preset clustering algorithm and determining the sizes of all types of traffic signs;

the training module is used for training a preset traffic sign recognition model by using the training set to obtain a trained traffic sign recognition model;

and the recognition module is used for recognizing the traffic sign to be recognized by adopting the trained traffic sign recognition model and confirming the category of the traffic sign to be recognized.

According to another aspect of the present application, there is provided an electronic device including:

one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the above-described traffic sign recognition method.

According to yet another aspect of the present application, there is provided a computer storage medium having stored thereon at least one instruction, at least one program, set of codes, or set of instructions that is loaded and executed by the processor to implement the above-mentioned traffic sign recognition method.

According to the embodiment of the application, the data set is clustered through a preset clustering method and the traffic sign recognition model is trained with the clustered data set. Receptive field information of different scales in an image can thereby be acquired and fused, which enhances the capability of the shallow network to predict traffic signs and further improves the detection accuracy of traffic signs. In addition, the network parameters and the network complexity are reduced, which increases the detection speed of traffic signs and achieves real-time detection of traffic signs.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.

Fig. 1 is a schematic flow chart of a traffic sign identification method according to an embodiment of the present disclosure;

Fig. 2 is a schematic flow chart of a data point clustering method according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of an Inception-refined module according to an embodiment of the present disclosure;

Fig. 4 is a schematic diagram of a deep network structure according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a traffic sign recognition apparatus according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

The above and other features, advantages and aspects of various embodiments of the present application will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.

Detailed Description

Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present application. It should be understood that the drawings and embodiments of the present application are for illustration purposes only and are not intended to limit the scope of the present application.

It should be understood that the various steps recited in the method embodiments of the present application may be performed in a different order and/or in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present application is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present application are only used for distinguishing the devices, modules or units, and are not used for limiting the devices, modules or units to be different devices, modules or units, and are not used for limiting the sequence or interdependence relationship of the functions executed by the devices, modules or units.

It should be noted that the modifiers "a", "an", and "the" in this application are intended to be illustrative rather than limiting, and those skilled in the art will recognize that they should be understood as "one or more" unless the context clearly dictates otherwise.

The names of messages or information exchanged between a plurality of devices in the embodiments of the present application are for illustrative purposes only, and are not intended to limit the scope of the messages or information. In order to make the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

The traffic sign recognition method provided by the embodiment of the application can be applied to an unmanned driving system to recognize traffic signs and thereby plan the route correctly. In real scenes, a traffic sign occupies only 0.001% to 5% of the pixels of a field-of-view image. Because the sign is small and occupies few pixels, its features are not distinctive, which makes it harder to detect than a large target. Detection is also affected by weather: severe conditions such as fog, dim light and complex scenes make traffic signs difficult to recognize, so existing deep learning detection algorithms struggle to detect and recognize small-target traffic signs effectively and accurately in real scenes. According to the embodiment of the application, the data set is clustered through a preset clustering method and the traffic sign recognition model is trained with the clustered data set. Receptive field information of different scales in an image can thereby be obtained and fused, which enhances the capability of the shallow network to predict traffic signs and further improves the detection accuracy of traffic signs. In addition, the network parameters and the network complexity are reduced, which increases the detection speed of traffic signs and achieves real-time detection of traffic signs.

The present application provides a traffic sign recognition method, apparatus, electronic device and computer-readable storage medium, which are intended to solve the above technical problems in the prior art.

The following describes the technical solution of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

The embodiment of the application provides a traffic sign identification method, and as shown in fig. 1, the method comprises the following steps:

step S101, acquiring a preset traffic sign data set, wherein the traffic sign data set comprises a training set and a verification set;

step S102, clustering the traffic signs in the traffic sign data set by adopting a preset clustering algorithm, and determining the sizes of all the traffic signs;

step S103, training a preset traffic sign recognition model by using the training set to obtain a trained traffic sign recognition model;

and step S104, identifying the traffic sign to be identified by adopting the trained traffic sign identification model, and confirming the category of the traffic sign to be identified.

In the embodiment of the present application, the traffic sign data set refers to image data containing traffic signs, which may be pictures or videos. As an embodiment of the present application, the traffic sign data set may be the TT100K data set. The data set is divided into a training set and a verification set, where the training set may contain most of the data and the verification set only needs to contain a small part of it. After the traffic sign data set is obtained, the data in it are clustered with a preset clustering algorithm to determine the types of traffic signs in the data set, where the types are distinguished by the size of the traffic sign. The data in the traffic sign data set are thereby classified and the traffic sign size of each type is determined. Based on these sizes, a preset target detection model identifies the image data to be identified and determines the target traffic sign to be detected, where the size of the target traffic sign corresponds to one of the sizes obtained by clustering. The preset traffic sign recognition model is trained with the training set, and the trained model is then used to recognize the traffic sign to be recognized and obtain its category.
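The division into a training set and a verification set described above can be sketched as follows. This is only an illustration: the function name `split_dataset`, the 10% verification fraction, the fixed seed and the `img_…` identifiers are assumptions, since the application only states that the training set holds most of the data and the verification set a small part.

```python
import random


def split_dataset(samples, val_fraction=0.1, seed=0):
    """Shuffle samples reproducibly and return (training_set, verification_set)."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one sample in the verification set.
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical image identifiers standing in for annotated traffic sign images.
image_ids = [f"img_{i:04d}" for i in range(100)]
train_set, val_set = split_dataset(image_ids)
print(len(train_set), len(val_set))  # 90 10
```

Fixing the seed keeps the split reproducible between runs, so that models trained at different times are verified against the same held-out images.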

In this implementation, as shown in fig. 2, the clustering traffic signs in the traffic sign data set by using a preset clustering algorithm to determine sizes of various types of traffic signs includes:

step S201, setting a preset number of clusters, and determining a corresponding number of initial cluster centers;

step S202, calculating the distance between the data point in each traffic sign data in the traffic sign data set and the initial clustering center;

step S203, assigning the data points whose distance is within a preset range to the cluster of the corresponding cluster center;

and step S204, taking the center point of all the data in each cluster as the new center point of the cluster, repeating until the center points of the clusters no longer move, and taking the center point of each cluster as the size of the traffic signs of that cluster.

In the embodiment of the application, a K-means clustering algorithm is used to cluster the data. When the prediction feature map predicts traffic signs, the signs can take countless sizes, so the prediction needs to refer to the sizes of anchor frames, where an anchor frame is the width and height closest to those of the traffic signs. First, the number of clusters K is selected manually and K initial cluster centers are selected at random. The distance d from each data point to each cluster center is then calculated in a loop, each data point is assigned to the cluster whose center gives the smallest d, and the center of all the data points in each cluster is taken as the new cluster center. After multiple iterations, the final cluster centers are taken as the anchor frames, and the anchor values in the configuration file are replaced with the proportions of the 9 anchor frames generated by the K-means clustering algorithm.

By adopting the K-means clustering algorithm, the embodiment of the application can effectively confirm the proportion of the anchor frame and is convenient for the identification of the subsequent traffic signs.
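The clustering loop described above can be sketched in a few lines. This is a minimal illustration, not the applicant's implementation: the application does not specify the distance d, so squared Euclidean distance on (width, height) pairs is assumed here, and the function name `kmeans_anchors` and the sample box sizes are hypothetical.

```python
import random


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster (width, height) pairs with plain K-means; return k centers sorted by area."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)  # K randomly chosen initial cluster centers
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            # Distance d to every center (squared Euclidean: an assumption).
            d = [(w - cw) ** 2 + (h - ch) ** 2 for cw, ch in centers]
            clusters[d.index(min(d))].append((w, h))
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                new_centers.append((sum(w for w, _ in members) / len(members),
                                    sum(h for _, h in members) / len(members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its previous center
        if new_centers == centers:  # the centers no longer move
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


# Hypothetical labelled box sizes in pixels; a real run would read them from the data set.
boxes = [(10, 12), (12, 10), (11, 11), (30, 32), (32, 30),
         (31, 31), (60, 64), (64, 60), (62, 62)]
anchors = kmeans_anchors(boxes, k=3)
print(anchors)
```

Because the initial centers are random, different seeds can converge to different anchor sets. With k=9 on a full data set, the sorted result would supply the 9 anchor values written to the configuration file.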

The first output, the second output, the third output and the fourth output are fused, and a convolution calculation is performed on the fused result to obtain an output result.

The one convolution module of the first branch is a 1 × 1 convolution module, the two convolution modules of the second branch are a 1 × 1 convolution module and a 3 × 3 convolution module respectively, and the four convolution modules of the third branch are a 1 × 1 convolution module and three 3 × 3 convolution modules respectively.

The traffic sign recognition model further comprises a deep network feature map module and a shallow network feature map module, wherein the deep network feature map module is used for outputting the large target prediction size, and the shallow network feature map module is used for outputting the medium target prediction size and the small target prediction size.

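The method further comprises: fusing the large target prediction size, the medium target prediction size and the small target prediction size with the output result to obtain a final output result.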

In the embodiment of the present application, fig. 3 shows a schematic structural diagram of the Inception-refined module provided in the embodiment of the present application. The structure includes four branches: the first branch consists of a 1 × 1 convolution; the second branch consists of a 1 × 1 convolution followed by a 3 × 3 convolution; the third branch consists of a 1 × 1 convolution followed by three 3 × 3 convolutions; and the fourth branch consists of a max pooling layer. A 7 × 7 convolution is very effective in providing wide-range semantic information for small targets, and using three 3 × 3 convolutions in place of one 7 × 7 convolution reduces the amount of computation by a factor of (7 × 7)/(3 × 3 × 3) ≈ 1.81, which in turn increases the computation speed. The 3 × 3 convolution layers are preceded by a 1 × 1 convolution layer, which reduces the number of input channels, greatly reduces the number of network parameters and increases the depth of the network, while the pooling layer is used for extracting image features. The four branches extract features at different scales, which increases the adaptability of the network to different scales and yields multi-scale information. The feature maps of the four branches are fused, and finally a 1 × 1 convolution layer reduces the number of output channels.
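The 1.81 figure and the equivalence between three 3 × 3 convolutions and one 7 × 7 convolution can be checked with a short calculation. The sketch below assumes stride-1 convolutions, the same channel count in every layer and no bias terms; the function names are illustrative only.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of stride-1 convolutions applied one after another."""
    field = 1
    for k in kernel_sizes:
        field += k - 1  # each stride-1 layer widens the field by (k - 1)
    return field


def weights_per_channel_pair(kernel_sizes):
    """Kernel weights per (input channel, output channel) pair, summed over the layers."""
    return sum(k * k for k in kernel_sizes)


three_3x3 = [3, 3, 3]
one_7x7 = [7]

# Both options cover the same 7 x 7 input window.
print(stacked_receptive_field(three_3x3), stacked_receptive_field(one_7x7))  # 7 7

# 49 weights against 27 weights per channel pair.
ratio = weights_per_channel_pair(one_7x7) / weights_per_channel_pair(three_3x3)
print(round(ratio, 2))  # 1.81
```

Under these assumptions the third branch keeps the 7 × 7 receptive field while needing 27 rather than 49 weights per channel pair, which is where the factor of about 1.81 comes from.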

An Inception-refined module structure is introduced between each of the 52 × 52 and 26 × 26 output feature maps of the traffic sign recognition model and the cascade (concatenate) module. As shown in fig. 4, the shallow network feature map (shallow-network layer) is connected to the deep network feature map (deep-network layer), and the deep feature information and the shallow feature information are combined, so that the prediction capability of the shallow network for traffic signs is enhanced and the detection accuracy is further improved.

In the embodiment of the application, the traffic sign recognition model performs downsampling 5 times using convolutions with a kernel size of 3 × 3 and a stride of 2, forming feature maps of 5 sizes: 208 × 208, 104 × 104, 52 × 52, 26 × 26 and 13 × 13. The 13 × 13 output feature map outputs the large target prediction scale Y1 after the ConvSet, 3 × 3 and 1 × 1 convolutions. The 26 × 26 output feature map first undergoes an upsampling operation with a step size of 2, is then connected to an Inception-refined module structure through a concatenate operation, and outputs the medium target prediction scale Y2 after the ConvSet, 3 × 3 and 1 × 1 convolutions. Similarly, the 52 × 52 output feature map undergoes an upsampling operation with a step size of 2, is connected to an Inception-refined module structure through a concatenate operation, and outputs the small target prediction scale Y3 after the ConvSet, 3 × 3 and 1 × 1 convolutions. Traffic sign prediction is thus carried out simultaneously on the three different scales Y1, Y2 and Y3.
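The five feature map sizes follow from the standard convolution output-size formula. The sketch below assumes a 416 × 416 input (inferred from the first feature map being 208 × 208, not stated in the application) and a padding of 1, which is also an assumption; the function names are illustrative.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample_sizes(input_size, times=5):
    """Feature map sizes after each of `times` successive stride-2 convolutions."""
    sizes = []
    size = input_size
    for _ in range(times):
        size = conv_output_size(size)
        sizes.append(size)
    return sizes


sizes = downsample_sizes(416)  # assumed 416 x 416 input image
print(sizes)  # [208, 104, 52, 26, 13]

# The three prediction scales named in the description.
prediction_scales = {"Y1": sizes[4], "Y2": sizes[3], "Y3": sizes[2]}
print(prediction_scales)  # {'Y1': 13, 'Y2': 26, 'Y3': 52}
```

The coarse 13 × 13 grid serves large targets, while the finer 26 × 26 and 52 × 52 grids serve medium and small targets respectively.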

In the embodiment of the application, the basic structure is pruned to remove 8 residual modules, which reduces the network parameters and the complexity of the model and ultimately improves the detection speed of the model.

An embodiment of the present application provides a traffic sign recognition apparatus, and as shown in fig. 5, the traffic sign recognition apparatus 50 may include: a data acquisition module 501, a clustering module 502, a training module 503, and a recognition module 504, wherein,

a data obtaining module 501, configured to obtain a preset traffic sign data set, where the traffic sign data set includes a training set and a verification set;

a clustering module 502, configured to cluster the traffic signs in the traffic sign data set by using a preset clustering algorithm, and determine sizes of the various traffic signs;

the training module 503 is configured to train a preset traffic sign recognition model by using the training set to obtain a trained traffic sign recognition model;

the identification module 504 is configured to identify the traffic sign to be identified by using the trained traffic sign identification model, and determine the category of the traffic sign to be identified.

As a possible implementation manner of the present application, in this implementation manner, the clustering module 502, when clustering the traffic signs in the traffic sign data set by using a preset clustering algorithm and determining the sizes of the various types of traffic signs, may be configured to:

calculating the distance between each data point in the traffic sign data and the initial cluster centers;

and taking the center point of all the data in each cluster as the center point of the cluster, and determining the size of the traffic sign corresponding to the cluster.

as a possible embodiment of the present application, in this embodiment, the apparatus is further configured to:

As a possible embodiment of the present application, in this embodiment,

An embodiment of the present application provides an electronic device, including: a memory and a processor; at least one program stored in the memory for, when executed by the processor, obtaining a preset traffic sign data set, the traffic sign data set including a training set and a validation set; clustering the traffic signs in the traffic sign data set by adopting a preset clustering algorithm, and determining the sizes of all types of traffic signs; training a preset traffic sign recognition model by using the training set and the verification set to obtain a trained traffic sign recognition model; and identifying the traffic sign to be identified by adopting the trained traffic sign identification model, and confirming the category of the traffic sign to be identified.

Compared with the prior art, the method can realize that: according to the embodiment of the application, the data set is clustered through a preset clustering method, the traffic sign recognition model is trained through the clustered data set, the receptive field information of different scales in an image can be obtained, the obtained information of different scales can be fused, the capacity of a shallow network for predicting the traffic sign is enhanced, the detection accuracy of the traffic sign is improved, network parameters are reduced, the network complexity is reduced, the detection speed of the traffic sign is improved, and the real-time detection of the traffic sign is achieved.

In an alternative embodiment, an electronic device is provided. As shown in fig. 6, the electronic device 4000 comprises a processor 4001 and a memory 4003. The processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, which may be used for data interaction between the electronic device and other electronic devices, such as data transmission and/or data reception. In practical applications the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.

The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein. The processor 4001 may also be a combination that performs a computational function, including, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, or the like.

Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.

The memory 4003 may be a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.

The memory 4003 is used for storing application program codes (computer programs) for executing the present scheme, and is controlled by the processor 4001 to execute. Processor 4001 is configured to execute application code stored in memory 4003 to implement what is shown in the foregoing method embodiments.

The present application provides a computer-readable storage medium, which stores a computer program, and when the computer program runs on a computer, the computer can execute the corresponding content in the foregoing method embodiments.

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts of the figures may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times and in different orders, and may be performed alternately or in turns with other steps or with at least a portion of the sub-steps or stages of other steps.

The foregoing describes only some of the embodiments of the present application. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims

1. A traffic sign recognition method, comprising:

2. The method of claim 1, wherein the determining the size of each type of traffic sign by clustering the traffic signs in the traffic sign data set using a predetermined clustering algorithm comprises:

3. The traffic sign recognition method of claim 1, wherein the training step of the traffic sign recognition model comprises:

4. The traffic sign recognition method of claim 3, further comprising:

5. The method according to claim 3, wherein the convolution module of the first branch is a 1 x 1 convolution module, the two convolution modules of the second branch are a 1 x 1 convolution module and a 3 x 3 convolution module, respectively, and the four convolution modules of the third branch are a 1 x 1 convolution module and three 3 x 3 convolution modules, respectively.

6. The method of claim 3, wherein the traffic sign recognition model further comprises a deep network feature map module for outputting a large target prediction size and a shallow network feature map module for outputting a target prediction size and a small target prediction size.

7. The traffic sign recognition method of claim 6, further comprising:

8. A traffic sign recognition apparatus, comprising:

9. An electronic device, characterized in that the electronic device comprises:

one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the traffic sign recognition method according to any one of claims 1 to 7.

10. A computer readable storage medium storing at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement a method of identifying a traffic sign according to any one of claims 1 to 7.