CN111597875A - Traffic sign identification method, device, equipment and storage medium - Google Patents


Info

Publication number
CN111597875A
Authority
CN
China
Prior art keywords
traffic sign
image
layer
feature
information
Prior art date
Legal status
Pending
Application number
CN202010249474.5A
Other languages
Chinese (zh)
Inventor
许成舜
施亮
张骋
Current Assignee
Zhejiang Geely Holding Group Co Ltd
Geely Automobile Research Institute Ningbo Co Ltd
Original Assignee
Zhejiang Geely Holding Group Co Ltd
Geely Automobile Research Institute Ningbo Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Geely Holding Group Co Ltd, Geely Automobile Research Institute Ningbo Co Ltd filed Critical Zhejiang Geely Holding Group Co Ltd
Priority to CN202010249474.5A
Publication of CN111597875A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a traffic sign identification method, which comprises the following steps: acquiring an image to be identified; preprocessing the image to be identified and generating an image pyramid; respectively extracting the edge features and the texture features of each layer of image in the image pyramid; performing feature association fusion on the edge features and the texture features to obtain traffic sign information in the image to be identified; and carrying out identification processing based on the traffic sign information and a traffic sign classifier to obtain the traffic sign category corresponding to the traffic sign information. The invention also discloses a traffic sign identification device, equipment and a storage medium. By adopting the invention, the position and size of a traffic sign can be detected more accurately and the detection capability for traffic sign areas is improved; the identification speed and the identification rate are also increased.

Description

Traffic sign identification method, device, equipment and storage medium
Technical Field
The present invention relates to image recognition technologies, and in particular, to a method, an apparatus, a device, and a storage medium for recognizing a traffic sign.
Background
At present, traffic sign detection methods are based on two kinds of algorithms: traditional image-processing algorithms and deep learning algorithms. The traditional approach is easily affected by factors such as lighting, deformation and occlusion, and generally has a high false-recognition rate. Deep learning algorithms are based on deep neural networks; through layer-by-layer feature extraction and sampling they achieve strong classification performance, but they require massive data and powerful hardware, with high power consumption and high cost.
Disclosure of Invention
In order to solve the above technical problem, in a first aspect, the present invention discloses a traffic sign identification method, where the identification method includes:
acquiring an image to be identified;
preprocessing the image to be identified and generating an image pyramid;
respectively extracting the edge features and the texture features of each layer of image in the image pyramid;
performing feature association fusion on the edge features and the texture features to obtain traffic sign information in the image to be identified;
performing identification processing based on the traffic sign information and a traffic sign classifier to obtain a traffic sign category corresponding to the traffic sign information;
the traffic sign classifier is determined by performing machine learning training based on a traffic sign sample image and a corresponding traffic sign category, and the category corresponding to the traffic sign sample image and the category corresponding to the image to be identified belong to the same type of category.
Further, the acquiring the image to be recognized includes:
acquiring a video image acquired by a camera;
and extracting the video image in alternate lines to obtain the image to be identified.
Further, the preprocessing the image to be recognized and generating the image pyramid includes:
carrying out gray level processing on the image to be identified;
and zooming the image to be recognized after the gray processing according to a preset zooming factor to obtain the image pyramid with a preset number of layers.
Further, the separately extracting the edge feature and the texture feature of each layer of image in the image pyramid includes:
detecting each layer of image in the image pyramid through a detection window;
calculating a feature operator of the current detection window in a sliding manner in the current detection window through a sublayer feature integration unit; the sublayer integration unit comprises a plurality of minimum feature detection units, and each minimum feature detection unit consists of a plurality of pixels;
calculating and generating an edge vector as an edge feature according to the feature operator of each detection window;
acquiring the pixel value of a current pixel point;
acquiring the neighborhood pixel points of the current pixel point and the pixel values of the neighborhood pixel points;
acquiring feature information in a neighborhood range according to the pixel value of the current pixel point and the pixel value of the neighborhood pixel point;
and calculating to obtain texture features according to the feature information.
Further, obtaining feature information in a neighborhood range according to the pixel value of the current pixel point and the pixel value of the neighborhood pixel point includes:
acquiring the position information of the current pixel point and the neighborhood pixel point;
and generating a feature histogram of the neighborhood range according to the position information, the pixel value of the current pixel point and the pixel value of the neighborhood pixel point, and taking the feature histogram as feature information.
Further, before the acquiring the image to be recognized, the method further includes:
acquiring a training sample image marked with a traffic sign category;
preprocessing the training sample image and generating an image pyramid;
respectively extracting the edge features and the texture features of each layer of image in the image pyramid;
performing feature association fusion on the edge features and the texture features to obtain traffic sign information in the training sample image;
taking the traffic sign information as input information of a traffic sign classifier, and taking the traffic sign category as output information of the traffic sign classifier;
comparing the output information of the traffic sign classifier with the training sample image;
and training and generating the traffic sign classifier according to the comparison result.
Further, the traffic sign classifier comprises an input layer, a convolution layer, a down-sampling layer, a widening layer, a full-connection layer and an output layer;
the input layer is used for acquiring the traffic sign information and generating a two-dimensional image matrix after adjusting the size of the traffic sign information to a preset size;
the convolution layer is used for performing discrete convolution operation on the two-dimensional image matrix to obtain a convolution result;
the down-sampling layer is used for selecting the pixel value of a certain pixel point in the pooling domain as the whole pixel value of the pooling domain area;
the widening layer is used for increasing the network width of the traffic sign classifier;
the full connection layer is used for extracting output results of all layers;
and the output layer counts the probability of each traffic sign category through a softmax function to obtain the category of the traffic sign.
In a second aspect, a traffic sign recognition apparatus, the recognition apparatus comprising:
the image to be recognized acquisition module is used for acquiring an image to be recognized;
the image pyramid generation module is used for preprocessing the image to be identified and generating an image pyramid;
the characteristic extraction module is used for respectively extracting the edge characteristic and the texture characteristic of each layer of image in the image pyramid;
the traffic sign information acquisition module is used for performing feature association fusion on the edge features and the texture features to obtain traffic sign information in the image to be identified;
the traffic sign identification module is used for carrying out identification processing based on the traffic sign information and a traffic sign classifier to obtain a traffic sign category corresponding to the traffic sign information;
the traffic sign classifier is determined by performing machine learning training based on a traffic sign sample image and a corresponding traffic sign category, and the category corresponding to the traffic sign sample image and the category corresponding to the image to be identified belong to the same type of category.
In a third aspect, the present invention provides an apparatus comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the traffic sign identification method according to any one of the preceding aspects.
In a fourth aspect, the present invention provides a storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the traffic sign identification method according to any one of the preceding aspects.
By adopting the technical scheme, the invention has the following beneficial effects: after the edge features and the texture features are fused, the position and the size of the traffic sign can be better detected, and the detection capability of a traffic sign area is improved. In addition, the deep network has strong capability of extracting features, has good generalization capability on fuzzy images, adhered characters, incomplete structures and other targets, and has strong recognition capability. Therefore, the recognition speed is increased, and the recognition rate is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a traffic sign identification method according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a process of acquiring an image to be recognized according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of processing an image to be recognized according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of a process for extracting edge features according to an embodiment of the present invention;
fig. 5 is a schematic flow chart of extracting texture features according to an embodiment of the present invention;
fig. 6 is a schematic flowchart of the implementation of step S323 provided in the embodiment of the present invention;
fig. 7 is a diagram of a cell feature structure according to an embodiment of the present invention;
fig. 8 is a cell feature histogram according to an embodiment of the present invention;
FIG. 9 is a flowchart illustrating a method for training a traffic sign classifier according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating a deep neural network according to an embodiment of the present invention;
fig. 11 is a schematic diagram of a module of a widened layer network layer according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of a traffic sign recognition apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The following describes an embodiment of the traffic sign recognition method of the present invention. Fig. 1 is a schematic flow chart of a traffic sign recognition method according to an embodiment of the present invention; the embodiment provides the operation steps described in the flow chart, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiment is merely one of many possible execution orders and does not represent the only order of execution. As shown in fig. 1, the traffic sign recognition method may include:
s100: and acquiring an image to be identified.
In particular, the invention can be applied to the identification of traffic signs in the surrounding environment during the driving of a vehicle. When the method is applied to vehicle driving, the image to be recognized can be acquired by a camera arranged on the vehicle, and specifically, as shown in fig. 2, the method may include the following steps:
s110: acquiring a video image acquired by a camera;
s120: and extracting the video image in alternate lines to obtain the image to be identified.
In specific implementation, the extraction is carried out through alternate lines, so that the interference caused by image shaking can be reduced, and the reliability of the image to be identified is improved.
In addition, the video image may also be scaled to a uniform resolution before or after the interlaced extraction. For example, in the traffic sign detection stage, the 1280 × 720 video image collected by the forward-looking camera is reduced to a resolution of 640 × 360 by extracting every other pixel in each row and column, yielding the image to be identified.
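The interlaced extraction above can be sketched as follows (NumPy arrays as frames and the `decimate` helper name are illustrative assumptions, not part of the patent):

```python
import numpy as np

def decimate(frame, step=2):
    # Keep every `step`-th row and column; with step == 2 this halves
    # each dimension, e.g. 1280 x 720 down to 640 x 360 as in the text.
    return frame[::step, ::step]
```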
S200: and preprocessing the image to be identified and generating an image pyramid.
An image pyramid is a kind of multi-scale representation of an image, which is a structure that interprets an image in multiple resolutions. Typically, a pyramid of an image is a series of image sets of progressively lower resolution arranged in a pyramid shape and derived from the same original image. In particular implementation, as shown in fig. 3, the image to be recognized may be preprocessed and the image pyramid may be generated by:
s210: and carrying out gray processing on the image to be identified.
In an embodiment of the present invention, a histogram of gradient directions may be used for traffic sign identification. The gray level preprocessing is carried out on the image to be recognized, so that unnecessary color information can be removed, the data processing amount is reduced, and the performance is improved. In specific implementation, the image to be identified, which is zoomed to a uniform size, can be subjected to gray scale conversion, specifically, three-channel RGB images can be converted into a gray scale image of a single channel, and the gray scale value range is 0-255.
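The three-channel-to-single-channel conversion can be sketched as follows (the patent only requires a single-channel image in the 0–255 range; the BT.601 luminance weights used here are an assumption):

```python
import numpy as np

def to_grayscale(rgb):
    # Weighted sum of the R, G, B channels; the weights are the common
    # BT.601 luminance coefficients, assumed here for illustration.
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return np.round(gray).astype(np.uint8)  # single channel, 0-255
```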
S220: and zooming the image to be recognized after the gray processing according to a preset zooming factor to obtain the image pyramid with a preset number of layers.
In specific implementation, the preset scaling factor may be one scaling value or a plurality of scaling values; the image to be recognized is scaled according to the preset scaling factor, and the resolution is gradually reduced as the number of layers increases. For example, with a scaling factor of 1.15 and 5 scaled layers, multi-scale scaling of the 640 × 360 grayscale image yields a 5-layer image pyramid with a first layer of 556 × 313, a second layer of 483 × 272, a third layer of 420 × 236, a fourth layer of 365 × 205, and a fifth layer of 317 × 178.
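The per-layer resolutions in this example can be reproduced with a short sketch (the truncation-to-whole-pixels convention is an assumption that happens to match the listed sizes):

```python
def pyramid_sizes(width, height, scale=1.15, layers=5):
    # Divide each dimension by the scale factor layer by layer,
    # truncating to whole pixels, to get the per-layer resolutions.
    sizes = []
    for _ in range(layers):
        width, height = int(width / scale), int(height / scale)
        sizes.append((width, height))
    return sizes
```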
In specific implementation, histogram equalization can be performed on each scale image to enhance contrast, and pixel filling is performed on image boundaries to avoid losing image edge information.
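A minimal NumPy sketch of these two preprocessing steps (the CDF-based equalization variant and replicate padding are common choices assumed here; the patent does not specify the exact variants):

```python
import numpy as np

def equalize_histogram(gray):
    # Remap gray levels so their cumulative distribution is roughly
    # uniform, stretching contrast. Assumes a non-constant image.
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    lut = np.round((cdf - cdf_min) / (gray.size - cdf_min) * 255.0)
    return lut.astype(np.uint8)[gray]

def pad_border(gray, pad=1):
    # Replicate the border pixels so image edge information is not
    # lost when windows later slide over the boundary.
    return np.pad(gray, pad, mode="edge")
```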
S300: and respectively extracting the edge features and the texture features of each layer of image in the image pyramid.
In particular implementation, as shown in fig. 4, the edge feature may be extracted by:
s311: detecting each layer of image in the image pyramid through a detection window;
s312: calculating a feature operator of the current detection window in a sliding manner in the current detection window through a sublayer feature integration unit; the sublayer integration unit comprises a plurality of minimum feature detection units, and each minimum feature detection unit consists of a plurality of pixels;
s313: and calculating and generating an edge vector as an edge feature according to the feature operator of each detection window.
In practical implementation, a minimum feature detection unit cell is designed, wherein the cell is composed of a plurality of pixels, and the shape of the cell is square; and designing a sub-layer feature integration unit block, wherein the block generally comprises a plurality of cells, each block slides in a view finding window win, and feature operators collected by all blocks are integrated to form a group of edge vectors capable of describing target features, so that the edge features are extracted. The edge feature can be calculated by the following method, and if the pixel value of the current pixel point (x, y) is pixel (x, y), then:
Horizontal edge: gradient_x = pixel(x+1, y) - pixel(x-1, y)
Vertical edge: gradient_y = pixel(x, y+1) - pixel(x, y-1)
Modulus of the point edge: sqrt(gradient_x² + gradient_y²)
Orientation of the point edge: tan⁻¹(gradient_y / gradient_x)
The edge vector dimension is calculated as:
N = ((winsize - block) / step + 1)² * (block / cell)² * bin
where winsize is the size of the detection window, block is the size of the feature block, step is the sliding step length, cell is the size of the cell, and bin is the number of angle separation intervals.
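The gradient and dimension formulas above can be sketched as follows (the axis convention and helper names are assumptions; the patent does not fix an implementation):

```python
import numpy as np

def gradients(img):
    # Central differences matching the horizontal/vertical edge
    # formulas; the one-pixel border is left at zero for simplicity.
    # Axis 0 is taken as x and axis 1 as y, an arbitrary choice.
    img = img.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[1:-1, :] = img[2:, :] - img[:-2, :]   # pixel(x+1, y) - pixel(x-1, y)
    gy[:, 1:-1] = img[:, 2:] - img[:, :-2]   # pixel(x, y+1) - pixel(x, y-1)
    magnitude = np.hypot(gx, gy)             # modulus of the point edge
    orientation = np.arctan2(gy, gx)         # orientation of the point edge
    return magnitude, orientation

def edge_vector_dim(winsize, block, step, cell, bins):
    # N = ((winsize - block) / step + 1)^2 * (block / cell)^2 * bin
    return ((winsize - block) // step + 1) ** 2 * (block // cell) ** 2 * bins
```

With a 64-pixel window, 16-pixel blocks, an 8-pixel stride, 8-pixel cells and 9 angle bins, for instance, `edge_vector_dim` gives 7² × 2² × 9 = 1764 (these parameter values are hypothetical).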
In particular implementation, as shown in fig. 5, the texture features may be extracted by:
s321: and acquiring the pixel value of the current pixel point.
S322: and acquiring a neighborhood pixel point where the current pixel point is located and the pixel value of the neighborhood pixel point.
S323: and acquiring feature information in a neighborhood range according to the pixel value of the current pixel point and the pixel value of the neighborhood pixel point.
In a specific implementation, as shown in fig. 6, the step S323 may include the following steps:
s3231: acquiring the position information of the current pixel point and the neighborhood pixel point;
s3232: and generating a feature histogram of the neighborhood range according to the position information, the pixel value of the current pixel point and the pixel value of the neighborhood pixel point, and taking the feature histogram as feature information.
S324: and calculating to obtain texture features according to the feature information.
In actual implementation, when the pixel value of the current pixel point (x, y) is pixel(x, y), the texture feature of the point may be extracted as follows: the pixel value pixel(x, y) is set as a threshold T, and the pixel values of the m feature pixel points at a neighborhood distance of d are compared with the threshold T to obtain weights w_i. If a pixel value is larger than the threshold T, it is kept and the weight is set to 1; if it is smaller than the threshold, it is discarded and the weight is set to 0:

w_i = 1 if pixel_i > T, otherwise w_i = 0  (i = 0, 1, …, m-1)
A table look-up method is then used, in which the value of each neighborhood pixel point carries a power-of-2 weight: from left to right and from top to bottom in the image coordinate system, the weights are 2⁰, 2¹, …, 2ⁿ⁻¹, where n is the total number of pixel points in the cell. Accumulating the binary values from the comparison with T at their corresponding positions yields the texture feature of the point:

texture(x, y) = Σ w_i * 2^i, summing i from 0 to n-1
By the above method, each point in the cell is traversed and counted in turn to obtain the feature statistical histogram of the cell, and the feature statistical histograms of the cells in the view window are then integrated to obtain the overall texture feature. When the neighborhood distance d = 2 and the number of feature pixels m = 8, the cell feature structure is shown in fig. 7.
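The per-point texture code can be sketched as follows (the ordering of `neighbors` is illustrative; the patent specifies strictly greater/smaller, which is followed here):

```python
def lbp_code(center, neighbors):
    # Threshold each neighbor against the center value T (= center)
    # and accumulate the kept bits with weights 2^0, 2^1, ..., 2^(n-1).
    code = 0
    for i, p in enumerate(neighbors):
        if p > center:        # w_i = 1 when the neighbor exceeds T
            code += 1 << i    # table look-up weight 2^i
    return code
```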
As shown in fig. 8, the feature histogram of a cell has 2²⁴ bins, a data volume so large that it is unfavorable for feature expression and training; the feature dimension can be reduced by keeping only codes whose binary digits jump between 0 and 1 no more than 2 times. Specifically, the original binary number in the feature histogram is cyclically shifted by one bit, a bitwise XOR with the original code is performed, and the number of "1" bits in the XOR result is counted. For example, for the 8-bit source code 00000001, the cyclic shift gives 10000000, the XOR result is 10000001, and the number of "1" bits in the result is 2. The classes are then assigned as follows: there are 8 × 7 codes whose XOR result contains exactly two "1" bits, each counted as one feature class; there are 2 codes whose XOR result equals 0; and all other code words are uniformly grouped into a single class. For 8-bit codes this gives 8 × 7 + 2 + 1 classes. For the 24-bit codes of the present invention, there are 24 × 23 + 2 + 1 classes, far fewer than 2²⁴.
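The transition-count dimensionality reduction can be sketched as follows (function names are illustrative; only the 8-bit case is small enough to enumerate quickly here):

```python
def is_kept_class(code, bits=8, max_jumps=2):
    # XOR the code with its one-bit cyclic shift and count the "1"
    # bits; at most `max_jumps` 0/1 transitions means the pattern
    # keeps its own histogram class.
    mask = (1 << bits) - 1
    shifted = ((code << 1) | (code >> (bits - 1))) & mask
    return bin(code ^ shifted).count("1") <= max_jumps

def num_classes(bits=8):
    # One class per kept pattern, plus one shared class for the rest.
    kept = sum(is_kept_class(c, bits) for c in range(1 << bits))
    return kept + 1
```

For 8-bit codes this reproduces the 8 × 7 + 2 + 1 = 59 classes in the text; the same counting for 24-bit codes gives 24 × 23 + 2 + 1 = 555.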
S400: and performing feature association fusion on the edge features and the texture features to obtain the traffic sign information in the image to be identified.
In specific implementation, the edge features and the texture features can be fused by pixel-level feature cascading of the feature maps, thereby highlighting the detection area of the traffic sign. The feature vectors of the detection window are then counted with normalization processing and sent to a traffic sign classifier for classification, so as to judge whether a traffic sign exists in the image to be identified; if so, the specific position coordinates and size of the traffic sign in the image are output. For example, its top-left corner coordinate in the image is tl(x, y), its width is width and its height is height. If no traffic sign exists, the method jumps to the next frame of the video to continue detection.
Detecting the traffic sign in this way extracts, through binary local features, a feature that fuses texture and edge information. The texture feature emphasizes describing the details of the target, in particular of warning signs, while the edge feature emphasizes describing target contour information, such as the gradient characteristics of circles and triangles. When the edge features are disturbed, for example by fading, shadow interference or glare on the traffic sign board, detection based on a single feature performs poorly; after the texture features are fused in, the position and size of the traffic sign are detected better than with a single texture or edge feature, and the detection capability for the traffic sign area is improved.
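One plausible reading of the cascade fusion and normalization, as a sketch (concatenation plus L2 normalization is an assumption; the patent does not name the norm):

```python
import numpy as np

def fuse_features(edge_vec, texture_hist):
    # Cascade (concatenate) the edge vector and texture histogram of
    # a detection window, then normalize so windows are comparable.
    fused = np.concatenate([edge_vec, texture_hist]).astype(np.float64)
    norm = np.linalg.norm(fused)
    return fused / norm if norm > 0 else fused
```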
S500: performing identification processing based on the traffic sign information and a traffic sign classifier to obtain a traffic sign category corresponding to the traffic sign information;
the traffic sign classifier is determined by performing machine learning training based on a traffic sign sample image and a corresponding traffic sign category, and the category corresponding to the traffic sign sample image and the category corresponding to the image to be identified belong to the same type of category.
In some possible embodiments, before acquiring the image to be identified, as shown in fig. 9, the method further includes:
s010: and acquiring a training sample image marked with a traffic sign category.
The training sample images may include training samples, verification samples and test samples: the training samples are used for the network to learn target class characteristics, the verification samples are used to verify the learning effect and optimize the learned network parameters, and the test samples are used to test the classification effect of the generated model. In specific implementation, video samples recorded by the forward-looking camera of a test vehicle can be collected, the recorded video uploaded to a workstation, the video inspected for traffic signs, and images of the traffic signs that appear captured as original sample data. The original samples can be expanded by rotation, tilting, scale transformation, artificial noise and similar methods, improving the richness of the sample set and the generalization of the algorithm. Specifically, for a video with a frame rate of 30, a sample is captured every 10 frames during the appearance of a traffic sign to produce 10000 traffic sign samples of 3 types, of which the training set, verification set and test set may respectively account for 70%, 15% and 15% of the total. The expanded sample set can be used for target detection and classification at the same time: one part is used for gradient direction feature extraction and support vector machine (SVM) classifier training, and the other part is used for training the deep learning network model in the classification method.
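The 70%/15%/15% partition of the 10000 captured samples can be sketched as follows (the shuffling step and helper name are illustrative assumptions):

```python
import random

def split_samples(samples, train_pct=70, val_pct=15, seed=0):
    # Shuffle deterministically, then cut into training, verification
    # and test sets using integer percentages of the total.
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```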
S020: and preprocessing the training sample image and generating an image pyramid.
S030: and respectively extracting the edge features and the texture features of each layer of image in the image pyramid.
S040: and performing feature association fusion on the edge features and the texture features to obtain the traffic sign information in the training sample image.
S050: and taking the traffic sign information as input information of a traffic sign classifier, and taking the traffic sign category as output information of the traffic sign classifier.
S060: and comparing the output information of the traffic sign classifier with the training sample image.
S070: and training and generating the traffic sign classifier according to the comparison result.
In some possible embodiments, the traffic sign classifier includes an input layer, a convolution layer, a down-sampling layer, a widening layer, a fully connected layer, and an output layer;
the input layer is used for acquiring the traffic sign information and generating a two-dimensional image matrix after adjusting the size of the traffic sign information to a preset size;
the convolution layer is used for performing discrete convolution operation on the two-dimensional image matrix to obtain a convolution result;
the down-sampling layer is used for selecting the pixel value of a certain pixel point in the pooling domain as the whole pixel value of the pooling domain area;
the widening layer is used for increasing the network width of the traffic sign classifier;
the full connection layer is used for extracting output results of all layers;
and the output layer counts the probability of each traffic sign category through a softmax function to obtain the category of the traffic sign.
The traffic sign classifier is obtained by constructing a deep classification neural network and training, the deep classification neural network can extract the characteristics of the content of the detected traffic sign, the deep network has strong characteristic extraction capability, has good generalization capability on fuzzy images, adhesive characters, incomplete structures and other targets, and has strong recognition capability.
In some possible embodiments, a deep neural network with 3 convolutional layers (Convolution), 5 downsampling layers (Down-sampling), 9 widening layers (Inception), and 1 fully-connected layer (Fully Connected) may be constructed. With a layer depth of 2 for each widening layer, this yields a deep neural network with 21 network layers. The input layer scales the acquired traffic sign information to a uniform size and feeds it into the network for layer-by-layer feature extraction: the convolutional layers extract features from the previous layer's image; the downsampling layers reduce the weights and simplify the network parameters to prevent overfitting; the fully-connected layer integrates the feature vectors and passes them to an activation function for processing, finally achieving classification. Specifically, the convolutional layer extracts image features by convolving a convolution kernel with the input image; as the network depth increases, the extracted features become higher-level and express the target characteristics more abstractly. The downsampling layer represents the overall pixel value of a region by the value of one pixel within it. For example, sampling can be performed by maximum downsampling (max pooling), in which the value of the largest pixel in the region represents the region's feature: points with small pixel values are ignored, so their contribution to the subsequent linked layer is omitted, the number of link weights is reduced, computational efficiency is improved, and the translation invariance of the feature representation is increased. Sampling may also be performed by averaging (mean pooling).
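The maximum and mean downsampling described above can be sketched as follows. This is an illustrative NumPy implementation, not code from the patent; the 2 × 2 window and stride of 2 are chosen for the example.

```python
import numpy as np

def pool2d(image, size=2, stride=2, mode="max"):
    """Downsample a 2-D image by taking the max or mean of each
    size x size pooling region, as described for the downsampling layer."""
    h, w = image.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=image.dtype if mode == "max" else float)
    reduce_fn = np.max if mode == "max" else np.mean
    for i in range(out_h):
        for j in range(out_w):
            region = image[i * stride:i * stride + size,
                           j * stride:j * stride + size]
            out[i, j] = reduce_fn(region)  # one value represents the region
    return out

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [9, 2, 1, 0],
              [3, 4, 5, 6]])
print(pool2d(x, mode="max"))   # max pooling: values 7, 8, 9, 6
print(pool2d(x, mode="mean"))  # mean pooling: values 4.0, 5.0, 4.5, 3.0
```

Max pooling keeps only the strongest response per region, which is what gives the translation-invariance effect mentioned above; mean pooling instead smooths the region.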
The widening layer can increase the network width while reducing the training parameters, making the network structure sparse. For example, the widening layer may consist of 4 branches: the first branch is a 1 × 1 convolution layer that extracts features and reduces the number of channels; the second branch first performs a 1 × 1 convolution to reduce the number of input feature maps, then a 3 × 3 convolution to extract features; the third branch likewise reduces the number of input feature maps with a 1 × 1 convolution, then extracts features with a 5 × 5 convolution; the fourth branch performs 3 × 3 maximum downsampling to improve translation invariance, then generates a feature map with a 1 × 1 convolution layer. The fully-connected layer integrates the feature vectors extracted by the preceding network layers, computes a loss value for the target through the softmax function, and finally classifies the target according to this value.
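As a rough sketch of the 4-branch widening layer, the NumPy code below builds the branches with naive stride-1, 'same'-padded convolutions and random weights. The branch channel counts (64, 128, 32, 32) follow the example in the text; the 192-channel 8 × 8 input and all helper functions are assumptions made purely for illustration.

```python
import numpy as np

def conv2d_same(x, w):
    """'Same'-padded multi-channel 2-D convolution (cross-correlation).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def maxpool_same_3x3(x):
    """3x3 max pooling with stride 1 and 'same' padding."""
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[:, i, j] = xp[:, i:i + 3, j:j + 3].max(axis=(1, 2))
    return out

def widening_layer(x, rng):
    """Four parallel branches whose outputs share H x W and are
    concatenated along the channel axis."""
    c_in = x.shape[0]
    relu = lambda t: np.maximum(t, 0)
    b1 = relu(conv2d_same(x, rng.standard_normal((64, c_in, 1, 1))))   # 1x1
    b2 = relu(conv2d_same(x, rng.standard_normal((96, c_in, 1, 1))))   # 1x1 reduce
    b2 = conv2d_same(b2, rng.standard_normal((128, 96, 3, 3)))         # then 3x3
    b3 = relu(conv2d_same(x, rng.standard_normal((16, c_in, 1, 1))))   # 1x1 reduce
    b3 = conv2d_same(b3, rng.standard_normal((32, 16, 5, 5)))          # then 5x5
    b4 = conv2d_same(maxpool_same_3x3(x),                              # 3x3 pool
                     rng.standard_normal((32, c_in, 1, 1)))            # then 1x1
    return np.concatenate([b1, b2, b3, b4], axis=0)

rng = np.random.default_rng(0)
y = widening_layer(rng.standard_normal((192, 8, 8)), rng)
print(y.shape)  # (256, 8, 8): 64 + 128 + 32 + 32 channels
```

Because every branch preserves the spatial size, the only effect of the layer on shape is the channel concatenation, which is what "increasing the network width" refers to.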
Further, after the input layer adjusts the image of the acquired traffic sign information to 224 × 224, the first convolution layer (Convolution(1)) performs a convolution operation on it; the kernel size of this layer is 5 × 5 pixels, the image boundary padding is 3 pixels, and the sliding stride is 2. The result then passes through a sampling layer (Sampling(1)) with a sampling kernel of 2 × 2 pixels and boundary padding of 1 pixel; in this specification, sampling means mean sampling, i.e. (pixel1 + pixel2 + pixel3 + pixel4)/4 is taken as the overall pixel value. The subsequent convolution kernels and sampling steps follow the same principle, with the parameters given in fig. 10. In addition, in a deep neural network the training parameters grow exponentially with the number of network layers, producing huge numbers of links and parameters; on the one hand training becomes time-consuming, and on the other hand some of the extracted features are correlated with one another, so the data are redundant and the classifier overfits. This application therefore adds a dropout structure that converts the fully-connected layer into sparse links by setting part of the link weights to zero, optimizing the training of the model and increasing the training speed. Specifically, as shown in fig. 11, the widening layer consists of 4 branches: the first branch applies a 1 × 1 × 64 convolution and a ReLU nonlinear operation to obtain a 28 × 28 × 64 convolution layer; the second branch applies a 1 × 1 × 96 convolution and the ReLU operation to generate a 28 × 28 × 96 parameter-reduction layer, then a 3 × 3 × 128 convolution to generate a 28 × 28 × 128 convolution layer; the third branch applies a 1 × 1 × 16 convolution and the ReLU nonlinear operation to generate a 28 × 28 × 16 convolution layer, then a 5 × 5 × 32 convolution to generate a 28 × 28 × 32 convolution layer; the fourth branch performs a 3 × 3 downsampling operation and then a 1 × 1 × 32 convolution to generate a 28 × 28 × 32 convolution layer.
Further, when the convolution layer performs a discrete convolution operation on the two-dimensional image matrix, the convolution result can be obtained according to the following formula:

S(i, j) = (I ∗ K)(i, j) = ∑_x ∑_y I(x, y) K(i − x, j − y)

where I is the input image and K is the convolution kernel.
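The formula above can be implemented directly. The naive sketch below computes the "full" discrete convolution (every (i, j) where the flipped kernel overlaps the image) and is for illustration only; real convolution layers use optimized routines.

```python
import numpy as np

def conv2d_full(I, K):
    """Discrete 2-D convolution per S(i,j) = sum_x sum_y I(x,y) K(i-x, j-y).
    Returns the 'full' output of size (H_I + H_K - 1, W_I + W_K - 1)."""
    hi, wi = I.shape
    hk, wk = K.shape
    S = np.zeros((hi + hk - 1, wi + wk - 1))
    for i in range(S.shape[0]):
        for j in range(S.shape[1]):
            for x in range(hi):
                for y in range(wi):
                    # only kernel indices (i-x, j-y) inside K contribute
                    if 0 <= i - x < hk and 0 <= j - y < wk:
                        S[i, j] += I[x, y] * K[i - x, j - y]
    return S

I = np.array([[1., 2.], [3., 4.]])
K = np.array([[0., 1.], [2., 0.]])
print(conv2d_full(I, K))  # [[0 1 2], [2 7 4], [6 8 0]]
```

Note that the index reversal K(i − x, j − y) is what distinguishes true convolution from the cross-correlation many deep-learning frameworks actually compute.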
Further, a softmax function is adopted in the output layer to complete the multi-classification task for traffic signs. Specifically, under the action of the parameter θ, the predicted value of the target x has two cases, y = 0 or y = 1, and in the standard logistic form the expression is:

P(y = 1 | x) = 1 / (1 + exp(−θ^T x)),  P(y = 0 | x) = 1 − P(y = 1 | x)
It is extended to the multi-classification problem with i classes (i > 1):

P(y = i | x),  where the class probabilities sum to 1: ∑_i P(y = i | x) = 1
expressed logarithmically:
z_i = log P(y = i | x)
Counting the probability of each category and normalizing it so that it forms a valid probability distribution yields the softmax function:

softmax(z)_i = exp(z_i) / ∑_j exp(z_j)
When maximizing this expression, log-likelihood training effectively cancels the exponential term and reduces the computational complexity:

log(softmax(z)_i) = z_i − log ∑_j exp(z_j)
Then the probability that the sample x is predicted as class i under the action of the parameter θ is:

P(y = i | x; θ) = exp(θ_i^T x) / ∑_j exp(θ_j^T x)
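The softmax and log-softmax relations above can be sketched as follows. The max-subtraction is a standard numerical-stability step added for the sketch, not something stated in the patent; it leaves both results unchanged.

```python
import numpy as np

def softmax(z):
    """softmax(z)_i = exp(z_i) / sum_j exp(z_j); shifting by max(z)
    leaves the result unchanged and avoids overflow in exp."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def log_softmax(z):
    """log(softmax(z)_i) = z_i - log sum_j exp(z_j): the log cancels
    the exponential, as noted for log-likelihood training."""
    z = z - np.max(z)
    return z - np.log(np.exp(z).sum())

z = np.array([1.0, 2.0, 3.0])
p = softmax(z)
print(p.sum())                                 # 1.0: a valid distribution
print(np.allclose(np.log(p), log_softmax(z)))  # True
```

Working in log space via `log_softmax` is exactly the cancellation of the exponential term described above, which is why classification losses are usually computed on log-probabilities.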
after the traffic sign classifier is built, the training sample image is normalized, scaled or cut into a scale received by a network input layer, the creation of an input data set is completed, and the input data set is transmitted to the traffic sign classifier for training. And finally, obtaining the optimal network parameters through multiple iterative training to generate the traffic sign classifier.
After the training of the traffic sign classifier is completed, test analysis can be performed on collected videos; specifically, the test content is divided into two parts, detection and identification. First, the parameters of the detection module are defined: the number of correctly detected positive samples TP (number of True Positives), the number of missed detections FN (number of False Negatives), the number of false detections FP (number of False Positives), and the number of correctly detected negative samples TN (number of True Negatives). Then:
TPR (True Positive Rate), i.e. the detection rate (recall):

TPR = TP / (TP + FN)
Precision:

Precision = TP / (TP + FP)
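The two test metrics can be computed as below; the sample counts used in the example are hypothetical, chosen only to illustrate the formulas.

```python
def detection_metrics(tp, fn, fp):
    """Detection rate (recall) TPR = TP / (TP + FN) and
    precision = TP / (TP + FP), per the test-analysis definitions."""
    tpr = tp / (tp + fn)
    precision = tp / (tp + fp)
    return tpr, precision

# hypothetical counts: 90 correct detections, 10 misses, 5 false alarms
tpr, precision = detection_metrics(tp=90, fn=10, fp=5)
print(tpr)        # 0.9
print(precision)  # 90/95 ≈ 0.947
```

Recall penalizes missed signs while precision penalizes false alarms, so a detector for driver assistance is normally tuned to keep both high rather than maximizing either alone.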
an embodiment of the present invention further provides a traffic sign identification apparatus, as shown in fig. 12, where the identification apparatus 1 includes:
the image to be recognized acquiring module 101 is used for acquiring an image to be recognized;
an image pyramid generation module 102, configured to preprocess the image to be identified and generate an image pyramid;
a feature extraction module 103, configured to extract an edge feature and a texture feature of each layer of image in the image pyramid respectively;
a traffic sign information obtaining module 104, configured to perform feature association fusion on the edge features and the texture features to obtain traffic sign information in the image to be identified;
a traffic sign identification module 105, configured to perform identification processing based on the traffic sign information and a traffic sign classifier, so as to obtain a traffic sign category corresponding to the traffic sign information;
the traffic sign classifier is determined by performing machine learning training based on a traffic sign sample image and a corresponding traffic sign class, and the class corresponding to the traffic sign sample image and the class corresponding to the image to be detected belong to the same type of class.
An embodiment of the present invention further provides an apparatus, which includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the traffic sign recognition method according to any one of the above.
An embodiment of the present invention further provides a storage medium, which includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the traffic sign recognition method according to any one of the above items.
It should be noted that: the precedence order of the above embodiments of the present invention is only for description, and does not represent the merits of the embodiments. And specific embodiments thereof have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus, system and server embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A traffic sign recognition method, characterized in that the recognition method comprises:
acquiring an image to be identified;
preprocessing the image to be identified and generating an image pyramid;
respectively extracting the edge features and the texture features of each layer of image in the image pyramid;
performing feature association fusion on the edge features and the texture features to obtain traffic sign information in the image to be identified;
performing identification processing based on the traffic sign information and a traffic sign classifier to obtain a traffic sign category corresponding to the traffic sign information;
the traffic sign classifier is determined by performing machine learning training based on a traffic sign sample image and a corresponding traffic sign class, and the class corresponding to the traffic sign sample image and the class corresponding to the image to be detected belong to the same type of class.
2. The traffic sign recognition method of claim 1, wherein the obtaining the image to be recognized comprises:
acquiring a video image acquired by a camera;
and extracting the video image in alternate lines to obtain the image to be identified.
3. The method of claim 1, wherein preprocessing the image to be recognized and generating an image pyramid comprises:
carrying out gray level processing on the image to be identified;
and zooming the image to be recognized after the gray processing according to a preset zooming factor to obtain the image pyramid with a preset number of layers.
4. The method of claim 1, wherein the extracting the edge feature and the texture feature of each layer of the image pyramid respectively comprises:
detecting each layer of image in the image pyramid through a detection window;
calculating a feature operator of the current detection window in a sliding manner in the current detection window through a sublayer feature integration unit; the sublayer integration unit comprises a plurality of minimum feature detection units, and each minimum feature detection unit consists of a plurality of pixels;
calculating and generating an edge vector as an edge feature according to the feature operator of each detection window;
acquiring the pixel value of the current pixel point;
acquiring a neighborhood pixel point where the current pixel point is located and a pixel value of the neighborhood pixel point;
acquiring feature information in a neighborhood range according to the pixel value of the current pixel point and the pixel value of the neighborhood pixel point;
and calculating to obtain texture features according to the feature information.
5. The method of claim 4, wherein obtaining feature information in a neighborhood range according to the pixel values of the current pixel and the neighborhood pixels comprises:
acquiring the position information of the current pixel point and the neighborhood pixel point;
and generating a feature histogram of the neighborhood range according to the position information, the pixel value of the current pixel point and the pixel value of the neighborhood pixel point, and taking the feature histogram as feature information.
6. The method of claim 1, wherein before the obtaining the image to be recognized, the method further comprises:
acquiring a training sample image marked with a traffic sign category;
preprocessing the training sample image and generating an image pyramid;
respectively extracting the edge features and the texture features of each layer of image in the image pyramid;
performing feature association fusion on the edge features and the texture features to obtain traffic sign information in the training sample image;
taking the traffic sign information as input information of a traffic sign classifier, and taking the traffic sign category as output information of the traffic sign classifier;
comparing the output information of the traffic sign classifier with the training sample image;
and training and generating the traffic sign classifier according to the comparison result.
7. The traffic sign recognition method of claim 1,
the traffic sign classifier comprises an input layer, a convolution layer, a down-sampling layer, a widening layer, a full-connection layer and an output layer;
the input layer is used for acquiring the traffic sign information and generating a two-dimensional image matrix after adjusting the size of the traffic sign information to a preset size;
the convolution layer is used for performing discrete convolution operation on the two-dimensional image matrix to obtain a convolution result;
the down-sampling layer is used for selecting the pixel value of a certain pixel point in the pooling domain as the whole pixel value of the pooling domain area;
the widening layer is used for increasing the network width of the traffic sign classifier;
the full connection layer is used for extracting output results of all layers;
and the output layer counts the probability of each traffic sign category through a softmax function to obtain the category of the traffic sign.
8. A traffic sign recognition apparatus, comprising:
the image to be recognized acquisition module is used for acquiring an image to be recognized;
the image pyramid generation module is used for preprocessing the image to be identified and generating an image pyramid;
the characteristic extraction module is used for respectively extracting the edge characteristic and the texture characteristic of each layer of image in the image pyramid;
the traffic sign information acquisition module is used for performing feature association fusion on the edge features and the texture features to obtain traffic sign information in the image to be identified;
the traffic sign identification module is used for carrying out identification processing based on the traffic sign information and a traffic sign classifier to obtain a traffic sign category corresponding to the traffic sign information;
the traffic sign classifier is determined by performing machine learning training based on a traffic sign sample image and a corresponding traffic sign class, and the class corresponding to the traffic sign sample image and the class corresponding to the image to be detected belong to the same type of class.
9. An apparatus comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement a traffic sign recognition method according to any one of claims 1-7.
10. A storage medium comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement a traffic sign recognition method according to any one of claims 1-7.
CN202010249474.5A 2020-04-01 2020-04-01 Traffic sign identification method, device, equipment and storage medium Pending CN111597875A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010249474.5A CN111597875A (en) 2020-04-01 2020-04-01 Traffic sign identification method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN111597875A true CN111597875A (en) 2020-08-28

Family

ID=72190481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010249474.5A Pending CN111597875A (en) 2020-04-01 2020-04-01 Traffic sign identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111597875A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573707A (en) * 2014-12-17 2015-04-29 安徽清新互联信息科技有限公司 Vehicle license plate Chinese character recognition method based on multi-feature fusion
WO2016155371A1 (en) * 2015-03-31 2016-10-06 百度在线网络技术(北京)有限公司 Method and device for recognizing traffic signs
CN108875454A (en) * 2017-05-11 2018-11-23 比亚迪股份有限公司 Traffic sign recognition method, device and vehicle
CN110659550A (en) * 2018-06-29 2020-01-07 比亚迪股份有限公司 Traffic sign recognition method, traffic sign recognition device, computer equipment and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QU Shiru; LI Tao: "Pedestrian detection algorithm based on improved CoHOG-LQC", Journal of Northwestern Polytechnical University, no. 02 *
LI Hongdi: "Image smoke detection using pyramid texture and edge features", Journal of Image and Graphics, 30 June 2015 (2015-06-30), pages 772-780 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668608A (en) * 2020-12-04 2021-04-16 北京达佳互联信息技术有限公司 Image identification method and device, electronic equipment and storage medium
CN112668608B (en) * 2020-12-04 2024-03-15 北京达佳互联信息技术有限公司 Image recognition method and device, electronic equipment and storage medium
CN112712066A (en) * 2021-01-19 2021-04-27 腾讯科技(深圳)有限公司 Image recognition method and device, computer equipment and storage medium
CN115147672A (en) * 2021-03-31 2022-10-04 广东高云半导体科技股份有限公司 Artificial intelligence system and method for identifying object types

Similar Documents

Publication Publication Date Title
CN112418117B (en) Small target detection method based on unmanned aerial vehicle image
CN109977943B (en) Image target recognition method, system and storage medium based on YOLO
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN109800824B (en) Pipeline defect identification method based on computer vision and machine learning
CN107609549B (en) Text detection method for certificate image in natural scene
CN110414507B (en) License plate recognition method and device, computer equipment and storage medium
CN107633226B (en) Human body motion tracking feature processing method
CN111310861A (en) License plate recognition and positioning method based on deep neural network
US7983486B2 (en) Method and apparatus for automatic image categorization using image texture
CN111275082A (en) Indoor object target detection method based on improved end-to-end neural network
US20070058856A1 (en) Character recoginition in video data
CN113160192A (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN111597875A (en) Traffic sign identification method, device, equipment and storage medium
CN110659550A (en) Traffic sign recognition method, traffic sign recognition device, computer equipment and storage medium
CN109934216B (en) Image processing method, device and computer readable storage medium
CN111833322B (en) Garbage multi-target detection method based on improved YOLOv3
CN110781882A (en) License plate positioning and identifying method based on YOLO model
CN111898621A (en) Outline shape recognition method
CN110991444A (en) Complex scene-oriented license plate recognition method and device
CN110807362A (en) Image detection method and device and computer readable storage medium
CN113221956B (en) Target identification method and device based on improved multi-scale depth model
CN115131325A (en) Breaker fault operation and maintenance monitoring method and system based on image recognition and analysis
CN112132151A (en) Image character recognition system and method based on recurrent neural network recognition algorithm
CN110689003A (en) Low-illumination imaging license plate recognition method and system, computer equipment and storage medium
CN114299383A (en) Remote sensing image target detection method based on integration of density map and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination