CN113591543B

CN113591543B - Traffic sign recognition method, device, electronic equipment and computer storage medium

Info

Publication number: CN113591543B
Application number: CN202110636627.6A
Authority: CN
Inventors: 李晓欢; 马新舒; 陈倩; 唐欣
Original assignee: Guangxi Comprehensive Transportation Big Data Research Institute; Guilin University of Electronic Technology
Current assignee: Guangxi Comprehensive Transportation Big Data Research Institute; Guilin University of Electronic Technology
Priority date: 2021-06-08
Filing date: 2021-06-08
Publication date: 2024-03-26
Anticipated expiration: 2041-06-08
Also published as: CN113591543A

Abstract

Embodiments of the application provide a traffic sign recognition method, a traffic sign recognition device, electronic equipment and a computer storage medium, and relate to the technical field of image recognition. The method comprises the following steps: acquiring a preset traffic sign data set; clustering the traffic signs in the traffic sign data set by adopting a preset clustering algorithm, and determining the sizes of various traffic signs; and identifying the traffic sign to be identified by adopting the trained traffic sign identification model, and confirming the category of the traffic sign to be identified. By first clustering the data set with the preset clustering method and then training the traffic sign recognition model, the embodiments obtain receptive field information of different scales in the image and fuse that information, which enhances the capability of the shallow network to predict traffic signs and further improves the detection accuracy of traffic signs. Network parameters and network complexity are also reduced, which improves the detection speed and achieves real-time detection of traffic signs.

Description

Traffic sign recognition method, device, electronic equipment and computer storage medium

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a traffic sign recognition method, a traffic sign recognition device, an electronic device, and a computer storage medium.

Background

In recent years, unmanned vehicles have received increasing attention worldwide. To drive safely on the road, an unmanned vehicle must perceive information about its surroundings. In the prior art, a laser radar determines the actual distance between the vehicle and a target ahead, but it cannot determine the type of the target, so an auxiliary camera is needed to detect all targets on the road surface accurately and in real time. The farther away a target can be identified, the more response time the vehicle has to brake in time and avoid a collision. Traffic sign detection and recognition is an important part of unmanned-vehicle perception: by acquiring the type and distance information of traffic signs, the automatic driving system can make correct decisions in time.

In real scenes, a traffic sign accounts for only about 0.001% to 5% of the pixels of the field-of-view image. Because the sign is small, occupies few pixels, and has indistinct features, it is harder to detect than a large target. Detection is also affected by the environment: adverse conditions such as fog, dim light, and complex scenes make traffic signs difficult to recognize. As a result, existing deep learning detection algorithms have difficulty detecting and recognizing small-target traffic signs in real scenes effectively and with high precision.

It follows that the accuracy of the identification of small target traffic signs in the prior art is not high and improvements are needed.

Disclosure of Invention

The object of the present application is to solve at least one of the above technical drawbacks, in particular the low accuracy of the prior art in identifying small target traffic signs.

According to one aspect of the present application, there is provided a traffic sign recognition method, the method comprising:

acquiring a preset traffic sign data set, wherein the traffic sign data set comprises a training set and a verification set;

clustering the traffic signs in the traffic sign data set by adopting a preset clustering algorithm, and determining the sizes of various traffic signs;

training a preset traffic sign recognition model by using the training set to obtain a trained traffic sign recognition model;

and identifying the traffic sign to be identified by adopting the trained traffic sign identification model, and confirming the category of the traffic sign to be identified.

As a possible implementation manner of the present application, clustering the traffic signs in the traffic sign data set by adopting a preset clustering algorithm and determining the sizes of various traffic signs includes:

setting a preset number of clusters, and determining a corresponding number of initial cluster centers;

calculating the distance between each data point of the traffic sign data in the traffic sign data set and each initial cluster center;

assigning the data points whose distances are within a preset range to the cluster in which the corresponding cluster center is located;

and taking the center point of all the data in each cluster as the new center point of that cluster until the center point no longer moves, and taking the final center point of the cluster as the traffic sign size of that cluster.

As a possible embodiment of the present application, in this embodiment, the training step of the traffic sign recognition model includes:

inputting the training data into a first branch to obtain a first output, wherein the first branch comprises one convolution module;

inputting the training data into a second branch to obtain a second output, wherein the second branch comprises two convolution modules;

inputting the training data into a third branch to obtain a third output, wherein the third branch comprises four convolution modules;

inputting the training data into a fourth branch to obtain a fourth output, wherein the fourth branch comprises a maximum pooling module;

and fusing the first output, the second output, the third output and the fourth output, and performing one convolution calculation on the fused result to obtain an output result.

As a possible embodiment of the present application, in this embodiment, the method further includes:

and calculating the loss error of the trained traffic sign recognition model by adopting the training set, and obtaining the trained traffic sign recognition model when the loss error is within a preset range.

As a possible implementation manner of the present application, in this implementation manner, the one convolution module of the first branch is a 1*1 convolution module; the two convolution modules of the second branch are a 1*1 convolution module and a 3*3 convolution module, respectively; and the four convolution modules of the third branch are one 1*1 convolution module and three 3*3 convolution modules.

As one possible embodiment of the present application, in this embodiment,

the traffic sign recognition model further comprises a deep network feature map module and a shallow network feature map module, wherein the deep network feature map module is used for outputting a large target prediction size, and the shallow network feature map module is used for outputting a medium target prediction size and a small target prediction size;

and the large target prediction size, the medium target prediction size and the small target prediction size are fused with the output result to obtain a final output result.

According to another aspect of the present application, there is provided a traffic sign recognition apparatus including:

the data acquisition module is used for acquiring a preset traffic sign data set, wherein the traffic sign data set comprises a training set and a verification set;

the clustering module is used for clustering the traffic signs in the traffic sign data set by adopting a preset clustering algorithm, and determining the sizes of various traffic signs;

the training module is used for training a preset traffic sign recognition model by using the training set to obtain a trained traffic sign recognition model;

and the identification module is used for identifying the traffic sign to be identified by adopting the trained traffic sign identification model and confirming the category of the traffic sign to be identified.

According to another aspect of the present application, there is provided an electronic device including:

one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the traffic sign recognition method described above.

According to yet another aspect of the present application, there is provided a computer storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the traffic sign recognition method described above.

According to the embodiment of the application, the data set is clustered by the preset clustering method and the clustered data set is used to train the traffic sign recognition model, so that receptive field information of different scales in the image can be obtained and fused. This enhances the capability of the shallow network to predict traffic signs and further improves the detection accuracy of traffic signs. Network parameters and network complexity are also reduced, which improves the detection speed and achieves real-time detection of traffic signs.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.

Fig. 1 is a schematic flow chart of a traffic sign recognition method according to an embodiment of the present application;

Fig. 2 is a schematic flow chart of a data point clustering method according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of an Inception-redefined module according to an embodiment of the present application;

Fig. 4 is a schematic diagram of a deep network structure according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a traffic sign recognition device according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

The above and other features, advantages, and aspects of embodiments of the present application will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.

Detailed Description

Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present application are shown in the drawings, it is to be understood that the present application may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present application. It should be understood that the drawings and examples of the present application are for illustrative purposes only and are not intended to limit the scope of the present application.

It should be understood that the various steps recited in the method embodiments of the present application may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present application is not limited in this respect.

The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below.

It should be noted that the terms "first," "second," and the like herein are used merely to distinguish one device, module, or unit from another device, module, or unit, and are not intended to limit the order or interdependence of the functions performed by the devices, modules, or units.

It should be noted that the modifiers "one" and "a plurality" in this application are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be interpreted as "one or more" unless the context clearly indicates otherwise.

The names of messages or information interacted between the various devices in the embodiments of the present application are for illustrative purposes only and are not intended to limit the scope of such messages or information. For the purpose of making the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

The traffic sign recognition method provided by the embodiment of the application can be applied to an unmanned driving system to recognize traffic signs and then carry out correct road planning. In real scenes, a traffic sign accounts for only about 0.001% to 5% of the pixels of the field-of-view image. Because the sign is small, occupies few pixels, and has indistinct features, it is harder to detect than a large target. Detection is also affected by the environment: adverse conditions such as fog, dim light, and complex scenes make traffic signs difficult to recognize, and existing deep learning detection algorithms have difficulty detecting and recognizing small-target traffic signs in real scenes effectively and accurately. According to the embodiment of the application, the data set is clustered by the preset clustering method and the clustered data set is used to train the traffic sign recognition model, so that receptive field information of different scales in the image can be obtained and fused. This enhances the capability of the shallow network to predict traffic signs and further improves the detection accuracy of traffic signs. Network parameters and network complexity are also reduced, which improves the detection speed and achieves real-time detection of traffic signs.

The traffic sign recognition method, the traffic sign recognition device, the electronic equipment and the computer readable storage medium aim to solve the technical problems in the prior art.

The technical scheme of the present application, and how it solves the above technical problems, are described in detail below with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

The embodiment of the application provides a traffic sign recognition method, as shown in fig. 1, which comprises the following steps:

Step S101, acquiring a preset traffic sign data set, wherein the traffic sign data set comprises a training set and a verification set;

Step S102, clustering the traffic signs in the traffic sign data set by adopting a preset clustering algorithm, and determining the sizes of various traffic signs;

Step S103, training a preset traffic sign recognition model by using the training set to obtain a trained traffic sign recognition model;

Step S104, identifying the traffic sign to be identified by adopting the trained traffic sign identification model, and confirming the category of the traffic sign to be identified.

In this embodiment of the present application, the traffic sign data set refers to image data containing traffic signs, which may be pictures or video. As one embodiment of the present application, the traffic sign data set may be the TT100K data set. The data set is divided into a training set and a verification set, where the training set may contain most of the data and the verification set only needs to contain a small portion of the data. After the traffic sign data set is obtained, the data in it are clustered by a preset clustering algorithm to determine the types of traffic signs in the data set, where the types are distinguished by traffic sign size, and the traffic sign size of each type is determined once the data have been grouped. Based on these sizes, a preset target detection model identifies the image data to be recognized and determines the target traffic sign to be detected, whose size corresponds to one of the sizes obtained by clustering. The preset traffic sign recognition model is trained with the training set, and the trained model then recognizes the traffic sign to be recognized to obtain its category. By clustering the data set with the preset clustering method and training the traffic sign recognition model with the clustered data set, receptive field information of different scales in the image can be obtained and fused, which enhances the capability of the shallow network to predict traffic signs and further improves the detection accuracy of traffic signs. Network parameters and network complexity are also reduced, which improves the detection speed and achieves real-time detection of traffic signs.
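The division of the data set into a large training set and a small verification set can be sketched as follows. This is only an illustrative sketch: the 90/10 ratio, the fixed random seed, the function name `split_dataset`, and the placeholder image identifiers are assumptions, since the description only states that the training set holds most of the data.

```python
import random


def split_dataset(samples, train_fraction=0.9, seed=0):
    """Shuffle the samples and split them into (training set, verification set).

    train_fraction is an assumed value; the description only says that the
    training set contains most of the data.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical identifiers standing in for annotated traffic sign images.
image_ids = [f"img_{i:04d}" for i in range(100)]
train_set, val_set = split_dataset(image_ids)
print(len(train_set), len(val_set))
```

Shuffling before the cut keeps both subsets representative of the whole data set when the source files are ordered, for example by capture location.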

The embodiment of the present application provides a possible implementation manner in which, as shown in fig. 2, clustering the traffic signs in the traffic sign data set by adopting a preset clustering algorithm and determining the sizes of various traffic signs includes:

Step S201, setting a preset number of clusters, and determining a corresponding number of initial cluster centers;

Step S202, calculating the distance between each data point of the traffic sign data in the traffic sign data set and each initial cluster center;

Step S203, assigning the data points whose distances are within a preset range to the cluster in which the corresponding cluster center is located;

Step S204, taking the center point of all the data in each cluster as the new center point of that cluster until the center point no longer moves, and taking the final center point of the cluster as the traffic sign size of that cluster.

In the embodiment of the application, a K-means clustering algorithm is adopted to cluster the data. When the prediction feature map predicts traffic signs, the signs may have any number of different sizes, so the prediction refers to the sizes of anchor boxes, where an anchor box is the width and height closest to those of the traffic signs. First, the number of clusters K is chosen manually and K initial cluster centers are selected randomly. The distance d between each data point and each cluster center is then calculated in a loop, each data point is assigned to the cluster whose center gives the smaller d value, and the center of all data points in each cluster is selected as the new cluster center. After multiple iterations, the final cluster centers are taken as the anchor boxes, and the 9 anchor boxes generated by the K-means clustering algorithm replace the anchor values in the configuration file.

According to the embodiment of the application, adopting the K-means clustering algorithm makes it possible to determine the anchor box proportions effectively, which facilitates the subsequent identification of traffic signs.
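The clustering loop described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the implementation of the application: the distance d is taken to be the squared Euclidean distance between (width, height) pairs because the description does not name a metric (YOLO-style detectors often use 1 - IoU instead), and the function name `kmeans_anchors` and the synthetic box sizes are hypothetical.

```python
import random


def kmeans_anchors(boxes, k, max_iterations=100, seed=0):
    """Cluster (width, height) pairs into k anchor box sizes with plain K-means."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)  # K randomly selected initial cluster centers
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            # Distance d to every center (squared Euclidean, an assumed metric).
            dists = [(w - cw) ** 2 + (h - ch) ** 2 for cw, ch in centers]
            clusters[dists.index(min(dists))].append((w, h))
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                n = len(members)
                new_centers.append((sum(w for w, _ in members) / n,
                                    sum(h for _, h in members) / n))
            else:
                new_centers.append(center)  # an empty cluster keeps its center
        if new_centers == centers:  # centers no longer move: stop iterating
            break
        centers = new_centers
    # Sort by area so that small anchors come first.
    return sorted(centers, key=lambda c: c[0] * c[1])


# Synthetic box sizes in pixels, standing in for annotated traffic signs.
data_rng = random.Random(1)
boxes = [(data_rng.uniform(8, 120), data_rng.uniform(8, 120)) for _ in range(300)]
anchors = kmeans_anchors(boxes, k=9)
for width, height in anchors:
    print(f"{width:.1f} x {height:.1f}")
```

The 9 resulting (width, height) pairs play the role of the anchor values that would be written into the configuration file. Because the initial centers are random, different seeds can give different anchors.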

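The embodiment of the present application provides a possible implementation manner in which the training step of the traffic sign recognition model includes:

inputting the training data into a first branch to obtain a first output, wherein the first branch comprises one convolution module;

inputting the training data into a second branch to obtain a second output, wherein the second branch comprises two convolution modules;

inputting the training data into a third branch to obtain a third output, wherein the third branch comprises four convolution modules;

inputting the training data into a fourth branch to obtain a fourth output, wherein the fourth branch comprises a maximum pooling module;

and fusing the first output, the second output, the third output and the fourth output, and performing convolution calculation on the fused result to obtain an output result.

The one convolution module of the first branch is a 1*1 convolution module; the two convolution modules of the second branch are a 1*1 convolution module and a 3*3 convolution module, respectively; and the four convolution modules of the third branch are one 1*1 convolution module and three 3*3 convolution modules.

The method further comprises the steps of:

calculating the loss error of the trained traffic sign recognition model by adopting the training set, and obtaining the trained traffic sign recognition model when the loss error is within a preset range.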

In this embodiment of the present application, fig. 3 shows a schematic structural diagram of the Inception-redefined module provided in this embodiment. The structure includes four branches in total: the first branch consists of one 1*1 convolution; the second branch consists of one 1*1 convolution followed by one 3*3 convolution; the third branch consists of one 1*1 convolution followed by three 3*3 convolutions; and the fourth branch consists of a maximum pooling layer. A 7*7 convolution is very effective in providing semantic information for a wide range of small objects, and three stacked 3*3 convolutions cover the same 7*7 receptive field as one 7*7 convolution while reducing the computational load by a factor of (7*7)/(3*3*3) = 49/27 ≈ 1.81, thereby improving the computation speed. The 3*3 convolution layer is preceded by a 1*1 convolution layer that reduces the number of input channels, which greatly reduces the number of network parameters while increasing the depth of the network, and the pooling layer is used for extracting image features. The four branches perform feature extraction at different scales, which improves the adaptability of the network to different scales and yields information at multiple scales. The feature maps of the four branches are then fused, and finally the number of output channels is reduced by a 1*1 convolution layer.
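The figure of about 1.81 can be checked with a short calculation. The sketch below assumes stride-1 convolutions with the same number of channels in every layer and ignores biases; the function names are hypothetical.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of stride-1 convolutions applied one after another."""
    field = 1
    for k in kernel_sizes:
        field += k - 1  # each k*k layer widens the field by k - 1 pixels
    return field


def weights_per_channel_pair(kernel_sizes):
    """Kernel weights per (input channel, output channel) pair, biases ignored."""
    return sum(k * k for k in kernel_sizes)


three_3x3 = [3, 3, 3]
one_7x7 = [7]

# Both arrangements see the same 7*7 neighbourhood of the input.
print(stacked_receptive_field(three_3x3), stacked_receptive_field(one_7x7))

# 49 weights for one 7*7 kernel versus 27 for three 3*3 kernels.
ratio = weights_per_channel_pair(one_7x7) / weights_per_channel_pair(three_3x3)
print(round(ratio, 2))
```

Under these assumptions the multiply-accumulate count per output position scales with the same weight count, so the ratio applies to the computational load as well as to the number of parameters.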

An Inception-redefined module structure is introduced between each of the 52 × 52 and 26 × 26 output feature maps of the traffic sign recognition model and the cascading (concatenation) module. As shown in fig. 4, the shallow network feature map (shallow-network layer) is connected to the feature map of the deep network (deep-network layer), so that deep feature information is combined with shallow feature information, which enhances the prediction capability of the shallow network for traffic signs and improves the detection accuracy.

In the embodiment of the present application, the traffic sign recognition model uses convolutions with a kernel size of 3*3 and a stride of 2 to perform downsampling 5 times, forming 5 feature maps of sizes 208 × 208, 104 × 104, 52 × 52, 26 × 26, and 13 × 13. The 13 × 13 output feature map passes through ConvSet, a 3*3 convolution, and a 1*1 convolution to output the large target prediction scale Y1. The 26 × 26 output feature map first undergoes an up-sampling operation with a stride of 2, is then connected to the Inception-redefined module structure through concat, and outputs the medium target prediction scale Y2 after ConvSet, a 3*3 convolution, and a 1*1 convolution. Similarly, the 52 × 52 output feature map undergoes an up-sampling operation with a stride of 2, is then connected to the Inception-redefined module structure through concat, and outputs the small target prediction scale Y3 after ConvSet, a 3*3 convolution, and a 1*1 convolution. Traffic sign prediction is then carried out simultaneously on the three different scales Y1, Y2 and Y3.
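The five feature map sizes follow from the standard output-size formula for a strided convolution. The sketch below assumes a 416 × 416 input and a padding of 1 pixel; neither value is stated in the description, but both are consistent with the listed sizes. The function names are hypothetical.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample_sizes(input_size, steps=5):
    """Feature map sizes after each of `steps` stride-2 3*3 convolutions."""
    sizes = []
    size = input_size
    for _ in range(steps):
        size = conv_output_size(size)
        sizes.append(size)
    return sizes


# 416 is an assumed input resolution (twice the first listed size, 208).
print(downsample_sizes(416))
```

Each stride-2 convolution halves the side length, so five of them reduce the assumed 416-pixel side by a factor of 32, down to 13.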

In the embodiment of the application, pruning the base structure removes 8 residual modules in total, which reduces the network parameters and the model complexity and ultimately improves the detection speed of the model.

The embodiment of the application provides a traffic sign recognition device, as shown in fig. 5, the traffic sign recognition device 50 may include: a data acquisition module 501, a clustering module 502, a training module 503, and an identification module 504, wherein,

the data acquisition module 501 is configured to acquire a preset traffic sign data set, where the traffic sign data set includes a training set and a verification set;

the clustering module 502 is configured to cluster traffic signs in the traffic sign dataset by using a preset clustering algorithm, and determine the sizes of the traffic signs;

the training module 503 is configured to train a preset traffic sign recognition model by using the training set, so as to obtain a trained traffic sign recognition model;

and the identifying module 504 is configured to identify the traffic sign to be identified by using the trained traffic sign identifying model, and confirm the category of the traffic sign to be identified.

As a possible implementation manner of the present application, when clustering the traffic signs in the traffic sign data set by adopting a preset clustering algorithm, the clustering module 502 may be configured to:

set a preset number of clusters, and determine a corresponding number of initial cluster centers;

calculate the distance between each data point of the traffic sign data and each initial cluster center;

assign the data points whose distances are within a preset range to the cluster in which the corresponding cluster center is located;

and take the center point of all the data in each cluster as the new center point of that cluster until the center point no longer moves, and determine the traffic sign size corresponding to the cluster from its final center point.

As a possible embodiment of the present application, in this embodiment, the device is further configured to:

As one possible embodiment of the present application, in this embodiment,

An embodiment of the present application provides an electronic device, including a memory and a processor, and at least one program stored in the memory which, when executed by the processor, implements the following: acquiring a preset traffic sign data set, wherein the traffic sign data set comprises a training set and a verification set; clustering the traffic signs in the traffic sign data set by adopting a preset clustering algorithm, and determining the sizes of various traffic signs; training a preset traffic sign recognition model by using the training set and the verification set to obtain a trained traffic sign recognition model; and identifying the traffic sign to be identified by adopting the trained traffic sign identification model, and confirming the type of the traffic sign to be identified.

Compared with the prior art, the following can be achieved: the data set is clustered by the preset clustering method and the clustered data set is used to train the traffic sign recognition model, so that receptive field information of different scales in the image can be obtained and fused. This enhances the capability of the shallow network to predict traffic signs and further improves the detection accuracy of traffic signs. Network parameters and network complexity are also reduced, which improves the detection speed and achieves real-time detection of traffic signs.

In an alternative embodiment, an electronic device is provided. As shown in fig. 6, the electronic device 4000 includes a processor 4001 and a memory 4003, wherein the processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004, which may be used for data interaction between the electronic device and other electronic devices, such as transmitting and/or receiving data. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.

The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with the disclosure of this application. The processor 4001 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

Bus 4002 may include a path that transfers information between the aforementioned components. Bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 can be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 6, but this does not mean that there is only one bus or only one type of bus.

The memory 4003 may be, but is not limited to, a ROM (Read Only Memory) or another type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

The memory 4003 is used for storing application program codes (computer programs) for executing the present application, and execution is controlled by the processor 4001. The processor 4001 is configured to execute application program codes stored in the memory 4003 to realize what is shown in the foregoing method embodiment.

The present application provides a computer readable storage medium having a computer program stored thereon, which when run on a computer, causes the computer to perform the corresponding method embodiments described above.

It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.

The foregoing describes only some embodiments of the present application. It should be noted that a person skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims

1. A traffic sign recognition method, comprising:

identifying the traffic sign to be identified by adopting the trained traffic sign identification model, and confirming the category of the traffic sign to be identified;

clustering the traffic signs in the traffic sign dataset by adopting a preset clustering algorithm to determine the sizes of various traffic signs, wherein the method comprises the following steps:

taking the central points of all data of the clusters as the central points of the clusters until the central points of the clusters are not moved, and taking the central points of the clusters as the traffic sign size of the clusters;

the training step of the traffic sign recognition model comprises the following steps:

fusing the first output, the second output, the third output and the fourth output, and performing convolution calculation on the fused result to obtain an output result;

2. The traffic sign recognition method according to claim 1, further comprising:

3. The traffic sign recognition method of claim 1, wherein the traffic sign recognition model further comprises a deep network feature map module and a shallow network feature map module, the deep network feature map module being configured to output a large target predicted size and the shallow network feature map module being configured to output a medium target predicted size and a small target predicted size.

4. The traffic sign recognition method of claim 3, further comprising:

5. A traffic sign recognition device, comprising:

the recognition module is used for recognizing the traffic sign to be recognized by adopting the trained traffic sign recognition model and confirming the category of the traffic sign to be recognized;

6. An electronic device, the electronic device comprising:

one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the traffic sign recognition method according to any one of claims 1 to 4.

7. A computer readable storage medium storing at least one instruction, at least one program, code set, or instruction set, the at least one instruction, the at least one program, the code set, or instruction set being loaded and executed by a processor to implement the traffic sign recognition method of any one of claims 1-4.