CN111597875A - Traffic sign identification method, device, equipment and storage medium - Google Patents


Info

Publication number
CN111597875A
Authority
CN
China
Prior art keywords
traffic sign
image
layer
feature
information
Prior art date
Legal status
Pending
Application number
CN202010249474.5A
Other languages
Chinese (zh)
Inventor
许成舜
施亮
张骋
Current Assignee
Zhejiang Geely Holding Group Co Ltd
Geely Automobile Research Institute Ningbo Co Ltd
Original Assignee
Zhejiang Geely Holding Group Co Ltd
Geely Automobile Research Institute Ningbo Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Geely Holding Group Co Ltd, Geely Automobile Research Institute Ningbo Co Ltd filed Critical Zhejiang Geely Holding Group Co Ltd
Priority to CN202010249474.5A
Publication of CN111597875A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a traffic sign identification method, which comprises the following steps: acquiring an image to be identified; preprocessing the image to be identified and generating an image pyramid; respectively extracting the edge features and the texture features of each layer of image in the image pyramid; performing feature association fusion on the edge features and the texture features to obtain traffic sign information in the image to be identified; and carrying out identification processing based on the traffic sign information and a traffic sign classifier to obtain the traffic sign category corresponding to the traffic sign information. The invention also discloses a traffic sign identification device, equipment and a storage medium. By adopting the invention, the position and size of a traffic sign can be detected more accurately and the detection capability for traffic sign areas is improved; the identification speed and the identification rate are also increased.

Description

Traffic sign identification method, device, equipment and storage medium
Technical Field
The present invention relates to image recognition technologies, and in particular, to a method, an apparatus, a device, and a storage medium for recognizing a traffic sign.
Background
At present, traffic sign detection methods are based on two kinds of algorithms: traditional image-processing algorithms and deep learning algorithms. The traditional approach is easily affected by factors such as lighting, deformation and occlusion, and generally has a high false-recognition rate. Deep learning algorithms are based on deep neural networks; through layer-by-layer feature extraction and sampling they achieve strong classification performance, but they require massive data and powerful hardware, with high power consumption and high cost.
Disclosure of Invention
In order to solve the above technical problem, in a first aspect, the present invention discloses a traffic sign identification method, where the identification method includes:
acquiring an image to be identified;
preprocessing the image to be identified and generating an image pyramid;
respectively extracting the edge features and the texture features of each layer of image in the image pyramid;
performing feature association fusion on the edge features and the texture features to obtain traffic sign information in the image to be identified;
performing identification processing based on the traffic sign information and a traffic sign classifier to obtain a traffic sign category corresponding to the traffic sign information;
the traffic sign classifier is determined by performing machine learning training based on a traffic sign sample image and a corresponding traffic sign category, and the category corresponding to the traffic sign sample image and the category corresponding to the image to be identified belong to the same type of category.
Further, the acquiring the image to be recognized includes:
acquiring a video image acquired by a camera;
and extracting the video image in alternate lines to obtain the image to be identified.
Further, the preprocessing the image to be recognized and generating the image pyramid includes:
carrying out gray level processing on the image to be identified;
and zooming the image to be recognized after the gray processing according to a preset zooming factor to obtain the image pyramid with a preset number of layers.
Further, the separately extracting the edge feature and the texture feature of each layer of image in the image pyramid includes:
detecting each layer of image in the image pyramid through a detection window;
calculating a feature operator of the current detection window in a sliding manner in the current detection window through a sublayer feature integration unit; the sublayer integration unit comprises a plurality of minimum feature detection units, and each minimum feature detection unit consists of a plurality of pixels;
calculating and generating an edge vector as an edge feature according to the feature operator of each detection window;
acquiring the pixel value of a current pixel point;
acquiring the neighborhood pixel points of the current pixel point and the pixel values of the neighborhood pixel points;
acquiring feature information in a neighborhood range according to the pixel value of the current pixel point and the pixel value of the neighborhood pixel point;
and calculating to obtain texture features according to the feature information.
Further, obtaining feature information in a neighborhood range according to the pixel value of the current pixel point and the pixel value of the neighborhood pixel point includes:
acquiring the position information of the current pixel point and the neighborhood pixel point;
and generating a feature histogram of the neighborhood range according to the position information, the pixel value of the current pixel point and the pixel value of the neighborhood pixel point, and taking the feature histogram as feature information.
Further, before the acquiring the image to be recognized, the method further includes:
acquiring a training sample image marked with a traffic sign category;
preprocessing the training sample image and generating an image pyramid;
respectively extracting the edge features and the texture features of each layer of image in the image pyramid;
performing feature association fusion on the edge features and the texture features to obtain traffic sign information in the training sample image;
taking the traffic sign information as input information of a traffic sign classifier, and taking the traffic sign category as output information of the traffic sign classifier;
comparing the output information of the traffic sign classifier with the training sample image;
and training and generating the traffic sign classifier according to the comparison result.
Further, the traffic sign classifier comprises an input layer, a convolution layer, a down-sampling layer, a widening layer, a full-connection layer and an output layer;
the input layer is used for acquiring the traffic sign information and generating a two-dimensional image matrix after adjusting the size of the traffic sign information to a preset size;
the convolution layer is used for performing discrete convolution operation on the two-dimensional image matrix to obtain a convolution result;
the down-sampling layer is used for selecting the pixel value of a certain pixel point in the pooling domain as the whole pixel value of the pooling domain area;
the widening layer is used for increasing the network width of the traffic sign classifier;
the full connection layer is used for extracting output results of all layers;
and the output layer counts the probability of each traffic sign category through a softmax function to obtain the category of the traffic sign.
In a second aspect, a traffic sign recognition apparatus, the recognition apparatus comprising:
the image to be recognized acquisition module is used for acquiring an image to be recognized;
the image pyramid generation module is used for preprocessing the image to be identified and generating an image pyramid;
the characteristic extraction module is used for respectively extracting the edge characteristic and the texture characteristic of each layer of image in the image pyramid;
the traffic sign information acquisition module is used for performing feature association fusion on the edge features and the texture features to obtain traffic sign information in the image to be identified;
the traffic sign identification module is used for carrying out identification processing based on the traffic sign information and a traffic sign classifier to obtain a traffic sign category corresponding to the traffic sign information;
the traffic sign classifier is determined by performing machine learning training based on a traffic sign sample image and a corresponding traffic sign category, and the category corresponding to the traffic sign sample image and the category corresponding to the image to be identified belong to the same type of category.
In a third aspect, the present invention provides an apparatus comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the traffic sign identification method according to any one of the preceding aspects.
In a fourth aspect, the present invention provides a storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the traffic sign identification method according to any one of the preceding aspects.
By adopting the technical scheme, the invention has the following beneficial effects: after the edge features and the texture features are fused, the position and the size of the traffic sign can be better detected, and the detection capability of a traffic sign area is improved. In addition, the deep network has strong capability of extracting features, has good generalization capability on fuzzy images, adhered characters, incomplete structures and other targets, and has strong recognition capability. Therefore, the recognition speed is increased, and the recognition rate is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a traffic sign identification method according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a process of acquiring an image to be recognized according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of processing an image to be recognized according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of a process for extracting edge features according to an embodiment of the present invention;
fig. 5 is a schematic flow chart of extracting texture features according to an embodiment of the present invention;
fig. 6 is a schematic flowchart of the implementation of step S323 provided in the embodiment of the present invention;
fig. 7 is a diagram of a cell feature structure according to an embodiment of the present invention;
fig. 8 is a cell feature histogram according to an embodiment of the present invention;
FIG. 9 is a flowchart illustrating a method for training a traffic sign classifier according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating a deep neural network according to an embodiment of the present invention;
fig. 11 is a schematic diagram of a module of a widened layer network layer according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of a traffic sign recognition apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The following describes an embodiment of the traffic sign recognition method of the present invention. Fig. 1 is a schematic flow chart of a traffic sign recognition method according to an embodiment of the present invention; the embodiment provides the operation steps described in the flow chart, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiment is merely one of many possible execution orders and does not represent the only order of execution. As shown in fig. 1, the traffic sign recognition method may include:
s100: and acquiring an image to be identified.
In particular, the invention can be applied to the identification of traffic signs in the surrounding environment during the driving of a vehicle. When the method is applied to vehicle driving, the image to be recognized can be acquired by a camera arranged on the vehicle, and specifically, as shown in fig. 2, the method may include the following steps:
s110: acquiring a video image acquired by a camera;
s120: and extracting the video image in alternate lines to obtain the image to be identified.
In specific implementation, the extraction is carried out through alternate lines, so that the interference caused by image shaking can be reduced, and the reliability of the image to be identified is improved.
In addition, the video image may also be scaled to a uniform resolution before or after the interlaced extraction. For example, in the traffic sign detection stage, the 1280 × 720 video image collected by the forward-looking camera is reduced to a resolution of 640 × 360 by extracting every other pixel in each row and column, yielding the image to be identified.
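The interlaced extraction above can be sketched as follows (NumPy arrays as frames and the `decimate` helper name are illustrative assumptions, not part of the patent):

```python
import numpy as np

def decimate(frame, step=2):
    # Keep every `step`-th row and column; with step == 2 this halves
    # each dimension, e.g. 1280 x 720 down to 640 x 360 as in the text.
    return frame[::step, ::step]
```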
S200: and preprocessing the image to be identified and generating an image pyramid.
An image pyramid is a kind of multi-scale representation of an image, which is a structure that interprets an image in multiple resolutions. Typically, a pyramid of an image is a series of image sets of progressively lower resolution arranged in a pyramid shape and derived from the same original image. In particular implementation, as shown in fig. 3, the image to be recognized may be preprocessed and the image pyramid may be generated by:
s210: and carrying out gray processing on the image to be identified.
In an embodiment of the present invention, a histogram of gradient directions may be used for traffic sign identification. The gray level preprocessing is carried out on the image to be recognized, so that unnecessary color information can be removed, the data processing amount is reduced, and the performance is improved. In specific implementation, the image to be identified, which is zoomed to a uniform size, can be subjected to gray scale conversion, specifically, three-channel RGB images can be converted into a gray scale image of a single channel, and the gray scale value range is 0-255.
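The three-channel-to-single-channel conversion can be sketched as follows (the patent only requires a single-channel image in the 0–255 range; the BT.601 luminance weights used here are an assumption):

```python
import numpy as np

def to_grayscale(rgb):
    # Weighted sum of the R, G, B channels; the weights are the common
    # BT.601 luminance coefficients, assumed here for illustration.
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return np.round(gray).astype(np.uint8)  # single channel, 0-255
```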
S220: and zooming the image to be recognized after the gray processing according to a preset zooming factor to obtain the image pyramid with a preset number of layers.
In specific implementation, the preset scaling factor may be one scaling value or a plurality of scaling values; the image to be recognized is scaled according to the preset scaling factor, and the resolution is gradually reduced as the number of layers increases. For example, with a scaling factor of 1.15 and 5 scaled layers, multi-scale scaling of the 640 × 360 grayscale image yields a 5-layer image pyramid with a first layer of 556 × 313, a second layer of 483 × 272, a third layer of 420 × 236, a fourth layer of 365 × 205, and a fifth layer of 317 × 178.
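The per-layer resolutions in this example can be reproduced with a short sketch (the truncation-to-whole-pixels convention is an assumption that happens to match the listed sizes):

```python
def pyramid_sizes(width, height, scale=1.15, layers=5):
    # Divide each dimension by the scale factor layer by layer,
    # truncating to whole pixels, to get the per-layer resolutions.
    sizes = []
    for _ in range(layers):
        width, height = int(width / scale), int(height / scale)
        sizes.append((width, height))
    return sizes
```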
In specific implementation, histogram equalization can be performed on each scale image to enhance contrast, and pixel filling is performed on image boundaries to avoid losing image edge information.
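A minimal NumPy sketch of these two preprocessing steps (the CDF-based equalization variant and replicate padding are common choices assumed here; the patent does not specify the exact variants):

```python
import numpy as np

def equalize_histogram(gray):
    # Remap gray levels so their cumulative distribution is roughly
    # uniform, stretching contrast. Assumes a non-constant image.
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    lut = np.round((cdf - cdf_min) / (gray.size - cdf_min) * 255.0)
    return lut.astype(np.uint8)[gray]

def pad_border(gray, pad=1):
    # Replicate the border pixels so image edge information is not
    # lost when windows later slide over the boundary.
    return np.pad(gray, pad, mode="edge")
```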
S300: and respectively extracting the edge features and the texture features of each layer of image in the image pyramid.
In particular implementation, as shown in fig. 4, the edge feature may be extracted by:
s311: detecting each layer of image in the image pyramid through a detection window;
s312: calculating a feature operator of the current detection window in a sliding manner in the current detection window through a sublayer feature integration unit; the sublayer integration unit comprises a plurality of minimum feature detection units, and each minimum feature detection unit consists of a plurality of pixels;
s313: and calculating and generating an edge vector as an edge feature according to the feature operator of each detection window.
In practical implementation, a minimum feature detection unit cell is designed, wherein the cell is composed of a plurality of pixels, and the shape of the cell is square; and designing a sub-layer feature integration unit block, wherein the block generally comprises a plurality of cells, each block slides in a view finding window win, and feature operators collected by all blocks are integrated to form a group of edge vectors capable of describing target features, so that the edge features are extracted. The edge feature can be calculated by the following method, and if the pixel value of the current pixel point (x, y) is pixel (x, y), then:
Horizontal edge: gradient_x = pixel(x+1, y) - pixel(x-1, y)
Vertical edge: gradient_y = pixel(x, y+1) - pixel(x, y-1)
Modulus of the point edge: sqrt(gradient_x² + gradient_y²)
Orientation of the point edge: tan⁻¹(gradient_y / gradient_x)
The edge vector dimension is calculated as:
N = ((winsize - block) / step + 1)² * (block / cell)² * bin
where winsize is the size of the detection window, block is the size of the feature block, step is the sliding step length, cell is the size of the cell, and bin is the number of angle separation intervals.
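The gradient and dimension formulas above can be sketched as follows (the axis convention and helper names are assumptions; the patent does not fix an implementation):

```python
import numpy as np

def gradients(img):
    # Central differences matching the horizontal/vertical edge
    # formulas; the one-pixel border is left at zero for simplicity.
    # Axis 0 is taken as x and axis 1 as y, an arbitrary choice.
    img = img.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[1:-1, :] = img[2:, :] - img[:-2, :]   # pixel(x+1, y) - pixel(x-1, y)
    gy[:, 1:-1] = img[:, 2:] - img[:, :-2]   # pixel(x, y+1) - pixel(x, y-1)
    magnitude = np.hypot(gx, gy)             # modulus of the point edge
    orientation = np.arctan2(gy, gx)         # orientation of the point edge
    return magnitude, orientation

def edge_vector_dim(winsize, block, step, cell, bins):
    # N = ((winsize - block) / step + 1)^2 * (block / cell)^2 * bin
    return ((winsize - block) // step + 1) ** 2 * (block // cell) ** 2 * bins
```

With a 64-pixel window, 16-pixel blocks, an 8-pixel stride, 8-pixel cells and 9 angle bins, for instance, `edge_vector_dim` gives 7² × 2² × 9 = 1764 (these parameter values are hypothetical).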
In particular implementation, as shown in fig. 5, the texture features may be extracted by:
s321: and acquiring the pixel value of the current pixel point.
S322: and acquiring a neighborhood pixel point where the current pixel point is located and the pixel value of the neighborhood pixel point.
S323: and acquiring feature information in a neighborhood range according to the pixel value of the current pixel point and the pixel value of the neighborhood pixel point.
In a specific implementation, as shown in fig. 6, the step S323 may include the following steps:
s3231: acquiring the position information of the current pixel point and the neighborhood pixel point;
s3232: and generating a feature histogram of the neighborhood range according to the position information, the pixel value of the current pixel point and the pixel value of the neighborhood pixel point, and taking the feature histogram as feature information.
S324: and calculating to obtain texture features according to the feature information.
In actual implementation, when the pixel value of the current pixel point (x, y) is pixel(x, y), the texture feature of the point may be extracted as follows: the pixel value pixel(x, y) is set as a threshold T, and the pixel values of the m feature pixel points at a neighborhood distance of d are compared with the threshold T to obtain weights w_i. If a pixel value is larger than the threshold T, it is kept and the weight is set to 1; if it is smaller than the threshold, it is discarded and the weight is set to 0:

w_i = 1 if pixel_i > T, otherwise w_i = 0  (i = 0, 1, …, m-1)
A table look-up method is then used, in which the value of each neighborhood pixel point carries a power-of-2 weight: from left to right and from top to bottom in the image coordinate system, the weights are 2⁰, 2¹, …, 2ⁿ⁻¹, where n is the total number of pixel points in the cell. Accumulating the binary values from the comparison with T at their corresponding positions yields the texture feature of the point:

texture(x, y) = Σ w_i * 2^i, summing i from 0 to n-1
By the above method, each point in the cell is traversed and counted in turn to obtain the feature statistical histogram of the cell, and the feature statistical histograms of the cells in the view window are then integrated to obtain the overall texture feature. When the neighborhood distance d = 2 and the number of feature pixels m = 8, the cell feature structure is shown in fig. 7.
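The per-point texture code can be sketched as follows (the ordering of `neighbors` is illustrative; the patent specifies strictly greater/smaller, which is followed here):

```python
def lbp_code(center, neighbors):
    # Threshold each neighbor against the center value T (= center)
    # and accumulate the kept bits with weights 2^0, 2^1, ..., 2^(n-1).
    code = 0
    for i, p in enumerate(neighbors):
        if p > center:        # w_i = 1 when the neighbor exceeds T
            code += 1 << i    # table look-up weight 2^i
    return code
```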
As shown in fig. 8, the feature histogram of a cell has 2²⁴ bins, a data volume so large that it is unfavorable for feature expression and training; the feature dimension can be reduced by keeping only codes whose binary digits jump between 0 and 1 no more than 2 times. Specifically, the original binary number in the feature histogram is cyclically shifted by one bit, a bitwise XOR with the original code is performed, and the number of "1" bits in the XOR result is counted. For example, for the 8-bit source code 00000001, the cyclic shift gives 10000000, the XOR result is 10000001, and the number of "1" bits in the result is 2. The classes are then assigned as follows: there are 8 × 7 codes whose XOR result contains exactly two "1" bits, each counted as one feature class; there are 2 codes whose XOR result equals 0; and all other code words are uniformly grouped into a single class. For 8-bit codes this gives 8 × 7 + 2 + 1 classes. For the 24-bit codes of the present invention, there are 24 × 23 + 2 + 1 classes, far fewer than 2²⁴.
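The transition-count dimensionality reduction can be sketched as follows (function names are illustrative; only the 8-bit case is small enough to enumerate quickly here):

```python
def is_kept_class(code, bits=8, max_jumps=2):
    # XOR the code with its one-bit cyclic shift and count the "1"
    # bits; at most `max_jumps` 0/1 transitions means the pattern
    # keeps its own histogram class.
    mask = (1 << bits) - 1
    shifted = ((code << 1) | (code >> (bits - 1))) & mask
    return bin(code ^ shifted).count("1") <= max_jumps

def num_classes(bits=8):
    # One class per kept pattern, plus one shared class for the rest.
    kept = sum(is_kept_class(c, bits) for c in range(1 << bits))
    return kept + 1
```

For 8-bit codes this reproduces the 8 × 7 + 2 + 1 = 59 classes in the text; the same counting for 24-bit codes gives 24 × 23 + 2 + 1 = 555.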
S400: and performing feature association fusion on the edge features and the texture features to obtain the traffic sign information in the image to be identified.
In specific implementation, the edge features and the texture features can be fused by pixel-level feature cascading of the feature maps, thereby highlighting the detection area of the traffic sign. The feature vectors of the detection window are then counted with normalization processing and sent to a traffic sign classifier for classification, so as to judge whether a traffic sign exists in the image to be identified; if so, the specific position coordinates and size of the traffic sign in the image are output. For example, its top-left corner coordinate in the image is tl(x, y), its width is width and its height is height. If no traffic sign exists, the method jumps to the next frame of the video to continue detection.
Detecting the traffic sign in this way extracts, through binary local features, a feature that fuses texture and edge information. The texture feature emphasizes describing the details of the target, in particular of warning signs, while the edge feature emphasizes describing target contour information, such as the gradient characteristics of circles and triangles. When the edge features are disturbed, for example by fading, shadow interference or glare on the traffic sign board, detection based on a single feature performs poorly; after the texture features are fused in, the position and size of the traffic sign are detected better than with a single texture or edge feature, and the detection capability for the traffic sign area is improved.
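One plausible reading of the cascade fusion and normalization, as a sketch (concatenation plus L2 normalization is an assumption; the patent does not name the norm):

```python
import numpy as np

def fuse_features(edge_vec, texture_hist):
    # Cascade (concatenate) the edge vector and texture histogram of
    # a detection window, then normalize so windows are comparable.
    fused = np.concatenate([edge_vec, texture_hist]).astype(np.float64)
    norm = np.linalg.norm(fused)
    return fused / norm if norm > 0 else fused
```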
S500: performing identification processing based on the traffic sign information and a traffic sign classifier to obtain a traffic sign category corresponding to the traffic sign information;
the traffic sign classifier is determined by performing machine learning training based on a traffic sign sample image and a corresponding traffic sign category, and the category corresponding to the traffic sign sample image and the category corresponding to the image to be identified belong to the same type of category.
In some possible embodiments, before acquiring the image to be identified, as shown in fig. 9, the method further includes:
s010: and acquiring a training sample image marked with a traffic sign category.
The training sample images may include training samples, verification samples and test samples: the training samples are used for the network to learn target class characteristics, the verification samples are used to verify the learning effect and optimize the learned network parameters, and the test samples are used to test the classification effect of the generated model. In specific implementation, video samples recorded by the forward-looking camera of a test vehicle can be collected, the recorded video uploaded to a workstation, the video inspected for traffic signs, and images of the traffic signs that appear captured as original sample data. The original samples can be expanded by rotation, tilting, scale transformation, artificial noise and similar methods, improving the richness of the sample set and the generalization of the algorithm. Specifically, for a video with a frame rate of 30, a sample is captured every 10 frames during the appearance of a traffic sign to produce 10000 traffic sign samples of 3 types, of which the training set, verification set and test set may respectively account for 70%, 15% and 15% of the total. The expanded sample set can be used for target detection and classification at the same time: one part is used for gradient direction feature extraction and support vector machine (SVM) classifier training, and the other part is used for training the deep learning network model in the classification method.
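The 70%/15%/15% partition of the 10000 captured samples can be sketched as follows (the shuffling step and helper name are illustrative assumptions):

```python
import random

def split_samples(samples, train_pct=70, val_pct=15, seed=0):
    # Shuffle deterministically, then cut into training, verification
    # and test sets using integer percentages of the total.
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```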
S020: and preprocessing the training sample image and generating an image pyramid.
S030: and respectively extracting the edge features and the texture features of each layer of image in the image pyramid.
S040: and performing feature association fusion on the edge features and the texture features to obtain the traffic sign information in the training sample image.
S050: and taking the traffic sign information as input information of a traffic sign classifier, and taking the traffic sign category as output information of the traffic sign classifier.
S060: and comparing the output information of the traffic sign classifier with the training sample image.
S070: and training and generating the traffic sign classifier according to the comparison result.
In some possible embodiments, the traffic sign classifier includes an input layer, a convolution layer, a down-sampling layer, a widening layer, a fully connected layer, and an output layer;
the input layer is used for acquiring the traffic sign information and generating a two-dimensional image matrix after adjusting the size of the traffic sign information to a preset size;
the convolution layer is used for performing discrete convolution operation on the two-dimensional image matrix to obtain a convolution result;
the down-sampling layer is used for selecting the pixel value of a certain pixel point in the pooling domain as the whole pixel value of the pooling domain area;
the widening layer is used for increasing the network width of the traffic sign classifier;
the full connection layer is used for extracting output results of all layers;
and the output layer counts the probability of each traffic sign category through a softmax function to obtain the category of the traffic sign.
The traffic sign classifier is obtained by constructing a deep classification neural network and training, the deep classification neural network can extract the characteristics of the content of the detected traffic sign, the deep network has strong characteristic extraction capability, has good generalization capability on fuzzy images, adhesive characters, incomplete structures and other targets, and has strong recognition capability.
In some possible embodiments, a deep neural network with 3 convolutional layers (Convolution), 5 downsampling layers (Down-sampling), 9 widening layers (Inception), and 1 fully-connected layer (Fully Connected) may be constructed. With a layer depth of 2 for each widening layer, this yields a deep neural network with 21 network layers. The input layer scales the acquired traffic sign information to a uniform size and feeds it into the network for layer-by-layer feature extraction: the convolutional layers extract features from the previous layer's image; the downsampling layers reduce the weights and simplify the network parameters to prevent overfitting; the fully-connected layer integrates the feature vectors and passes them to an activation function for processing, finally achieving classification. Specifically, the convolutional layer extracts image features by convolving a convolution kernel with the input image; as the network depth increases, the extracted features become higher-level and express the target characteristics more abstractly. The downsampling layer represents the overall pixel value of a region by the value of one pixel within it. For example, sampling can be performed by maximum downsampling (max pooling), in which the value of the largest pixel in the region represents the region's feature: points with small pixel values are ignored, so their contribution to the subsequent linked layer is omitted, the number of link weights is reduced, computational efficiency is improved, and the translation invariance of the feature representation is increased. Sampling may also be performed by averaging (mean pooling).
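The maximum and mean downsampling described above can be sketched as follows. This is an illustrative NumPy implementation, not code from the patent; the 2 × 2 window and stride of 2 are chosen for the example.

```python
import numpy as np

def pool2d(image, size=2, stride=2, mode="max"):
    """Downsample a 2-D image by taking the max or mean of each
    size x size pooling region, as described for the downsampling layer."""
    h, w = image.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=image.dtype if mode == "max" else float)
    reduce_fn = np.max if mode == "max" else np.mean
    for i in range(out_h):
        for j in range(out_w):
            region = image[i * stride:i * stride + size,
                           j * stride:j * stride + size]
            out[i, j] = reduce_fn(region)  # one value represents the region
    return out

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [9, 2, 1, 0],
              [3, 4, 5, 6]])
print(pool2d(x, mode="max"))   # max pooling: values 7, 8, 9, 6
print(pool2d(x, mode="mean"))  # mean pooling: values 4.0, 5.0, 4.5, 3.0
```

Max pooling keeps only the strongest response per region, which is what gives the translation-invariance effect mentioned above; mean pooling instead smooths the region.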
The widening layer can increase the network width while reducing the training parameters, making the network structure sparse. For example, the widening layer may consist of 4 branches: the first branch is a 1 × 1 convolution layer that extracts features and reduces the number of channels; the second branch first performs a 1 × 1 convolution to reduce the number of input feature maps, then a 3 × 3 convolution to extract features; the third branch likewise reduces the number of input feature maps with a 1 × 1 convolution, then extracts features with a 5 × 5 convolution; the fourth branch performs 3 × 3 maximum downsampling to improve translation invariance, then generates a feature map with a 1 × 1 convolution layer. The fully-connected layer integrates the feature vectors extracted by the preceding network layers, computes a loss value for the target through the softmax function, and finally classifies the target according to this value.
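As a rough sketch of the 4-branch widening layer, the NumPy code below builds the branches with naive stride-1, 'same'-padded convolutions and random weights. The branch channel counts (64, 128, 32, 32) follow the example in the text; the 192-channel 8 × 8 input and all helper functions are assumptions made purely for illustration.

```python
import numpy as np

def conv2d_same(x, w):
    """'Same'-padded multi-channel 2-D convolution (cross-correlation).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def maxpool_same_3x3(x):
    """3x3 max pooling with stride 1 and 'same' padding."""
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[:, i, j] = xp[:, i:i + 3, j:j + 3].max(axis=(1, 2))
    return out

def widening_layer(x, rng):
    """Four parallel branches whose outputs share H x W and are
    concatenated along the channel axis."""
    c_in = x.shape[0]
    relu = lambda t: np.maximum(t, 0)
    b1 = relu(conv2d_same(x, rng.standard_normal((64, c_in, 1, 1))))   # 1x1
    b2 = relu(conv2d_same(x, rng.standard_normal((96, c_in, 1, 1))))   # 1x1 reduce
    b2 = conv2d_same(b2, rng.standard_normal((128, 96, 3, 3)))         # then 3x3
    b3 = relu(conv2d_same(x, rng.standard_normal((16, c_in, 1, 1))))   # 1x1 reduce
    b3 = conv2d_same(b3, rng.standard_normal((32, 16, 5, 5)))          # then 5x5
    b4 = conv2d_same(maxpool_same_3x3(x),                              # 3x3 pool
                     rng.standard_normal((32, c_in, 1, 1)))            # then 1x1
    return np.concatenate([b1, b2, b3, b4], axis=0)

rng = np.random.default_rng(0)
y = widening_layer(rng.standard_normal((192, 8, 8)), rng)
print(y.shape)  # (256, 8, 8): 64 + 128 + 32 + 32 channels
```

Because every branch preserves the spatial size, the only effect of the layer on shape is the channel concatenation, which is what "increasing the network width" refers to.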
Further, after the input layer adjusts the image of the acquired traffic sign information to 224 × 224, the first convolution layer (Convolution(1)) performs a convolution operation on it; the kernel size of this layer is 5 × 5 pixels, the image boundary padding is 3 pixels, and the sliding stride is 2. The result then passes through a sampling layer (Sampling(1)) with a sampling kernel of 2 × 2 pixels and boundary padding of 1 pixel; in this specification, sampling means mean sampling, i.e. (pixel1 + pixel2 + pixel3 + pixel4)/4 is taken as the overall pixel value. The subsequent convolution kernels and sampling steps follow the same principle, with the parameters given in fig. 10. In addition, in a deep neural network the training parameters grow exponentially with the number of network layers, producing huge numbers of links and parameters; on the one hand training becomes time-consuming, and on the other hand some of the extracted features are correlated with one another, so the data are redundant and the classifier overfits. This application therefore adds a dropout structure that converts the fully-connected layer into sparse links by setting part of the link weights to zero, optimizing the training of the model and increasing the training speed. Specifically, as shown in fig. 11, the widening layer consists of 4 branches: the first branch applies a 1 × 1 × 64 convolution and a ReLU nonlinear operation to obtain a 28 × 28 × 64 convolution layer; the second branch applies a 1 × 1 × 96 convolution and the ReLU operation to generate a 28 × 28 × 96 parameter-reduction layer, then a 3 × 3 × 128 convolution to generate a 28 × 28 × 128 convolution layer; the third branch applies a 1 × 1 × 16 convolution and the ReLU nonlinear operation to generate a 28 × 28 × 16 convolution layer, then a 5 × 5 × 32 convolution to generate a 28 × 28 × 32 convolution layer; the fourth branch performs a 3 × 3 downsampling operation and then a 1 × 1 × 32 convolution to generate a 28 × 28 × 32 convolution layer.
Further, when the convolution layer performs a discrete convolution operation on the two-dimensional image matrix, the convolution result can be obtained according to the following formula:

S(i, j) = (I ∗ K)(i, j) = ∑_x ∑_y I(x, y) K(i − x, j − y)

where I is the input image and K is the convolution kernel.
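The formula above can be implemented directly. The naive sketch below computes the "full" discrete convolution (every (i, j) where the flipped kernel overlaps the image) and is for illustration only; real convolution layers use optimized routines.

```python
import numpy as np

def conv2d_full(I, K):
    """Discrete 2-D convolution per S(i,j) = sum_x sum_y I(x,y) K(i-x, j-y).
    Returns the 'full' output of size (H_I + H_K - 1, W_I + W_K - 1)."""
    hi, wi = I.shape
    hk, wk = K.shape
    S = np.zeros((hi + hk - 1, wi + wk - 1))
    for i in range(S.shape[0]):
        for j in range(S.shape[1]):
            for x in range(hi):
                for y in range(wi):
                    # only kernel indices (i-x, j-y) inside K contribute
                    if 0 <= i - x < hk and 0 <= j - y < wk:
                        S[i, j] += I[x, y] * K[i - x, j - y]
    return S

I = np.array([[1., 2.], [3., 4.]])
K = np.array([[0., 1.], [2., 0.]])
print(conv2d_full(I, K))  # [[0 1 2], [2 7 4], [6 8 0]]
```

Note that the index reversal K(i − x, j − y) is what distinguishes true convolution from the cross-correlation many deep-learning frameworks actually compute.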
Further, a softmax function is adopted in the output layer to complete the multi-classification task for traffic signs. Specifically, under the action of the parameter θ, the predicted value of the target x has two cases, y = 0 or y = 1, and in the standard logistic form the expression is:

P(y = 1 | x) = 1 / (1 + exp(−θ^T x)),  P(y = 0 | x) = 1 − P(y = 1 | x)
It is extended to the multi-classification problem with i classes (i > 1):

P(y = i | x),  where the class probabilities sum to 1: ∑_i P(y = i | x) = 1
expressed logarithmically:
z_i = log P(y = i | x)
Counting the probability of each category and normalizing it so that it forms a valid probability distribution yields the softmax function:

softmax(z)_i = exp(z_i) / ∑_j exp(z_j)
When maximizing this expression, log-likelihood training effectively cancels the exponential term and reduces the computational complexity:

log(softmax(z)_i) = z_i − log ∑_j exp(z_j)
Then the probability that the sample x is predicted as class i under the action of the parameter θ is:

P(y = i | x; θ) = exp(θ_i^T x) / ∑_j exp(θ_j^T x)
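The softmax and log-softmax relations above can be sketched as follows. The max-subtraction is a standard numerical-stability step added for the sketch, not something stated in the patent; it leaves both results unchanged.

```python
import numpy as np

def softmax(z):
    """softmax(z)_i = exp(z_i) / sum_j exp(z_j); shifting by max(z)
    leaves the result unchanged and avoids overflow in exp."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def log_softmax(z):
    """log(softmax(z)_i) = z_i - log sum_j exp(z_j): the log cancels
    the exponential, as noted for log-likelihood training."""
    z = z - np.max(z)
    return z - np.log(np.exp(z).sum())

z = np.array([1.0, 2.0, 3.0])
p = softmax(z)
print(p.sum())                                 # 1.0: a valid distribution
print(np.allclose(np.log(p), log_softmax(z)))  # True
```

Working in log space via `log_softmax` is exactly the cancellation of the exponential term described above, which is why classification losses are usually computed on log-probabilities.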
after the traffic sign classifier is built, the training sample image is normalized, scaled or cut into a scale received by a network input layer, the creation of an input data set is completed, and the input data set is transmitted to the traffic sign classifier for training. And finally, obtaining the optimal network parameters through multiple iterative training to generate the traffic sign classifier.
After the training of the traffic sign classifier is completed, test analysis can be performed on collected videos; specifically, the test content is divided into two parts, detection and identification. First, the parameters of the detection module are defined: the number of correctly detected positive samples TP (number of True Positives), the number of missed detections FN (number of False Negatives), the number of false detections FP (number of False Positives), and the number of correctly detected negative samples TN (number of True Negatives). Then:
TPR (True Positive Rate), i.e. the detection rate (recall):

TPR = TP / (TP + FN)
Precision:

Precision = TP / (TP + FP)
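The two test metrics can be computed as below; the sample counts used in the example are hypothetical, chosen only to illustrate the formulas.

```python
def detection_metrics(tp, fn, fp):
    """Detection rate (recall) TPR = TP / (TP + FN) and
    precision = TP / (TP + FP), per the test-analysis definitions."""
    tpr = tp / (tp + fn)
    precision = tp / (tp + fp)
    return tpr, precision

# hypothetical counts: 90 correct detections, 10 misses, 5 false alarms
tpr, precision = detection_metrics(tp=90, fn=10, fp=5)
print(tpr)        # 0.9
print(precision)  # 90/95 ≈ 0.947
```

Recall penalizes missed signs while precision penalizes false alarms, so a detector for driver assistance is normally tuned to keep both high rather than maximizing either alone.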
an embodiment of the present invention further provides a traffic sign identification apparatus, as shown in fig. 12, where the identification apparatus 1 includes:
the image to be recognized acquiring module 101 is used for acquiring an image to be recognized;
an image pyramid generation module 102, configured to preprocess the image to be identified and generate an image pyramid;
a feature extraction module 103, configured to extract an edge feature and a texture feature of each layer of image in the image pyramid respectively;
a traffic sign information obtaining module 104, configured to perform feature association fusion on the edge features and the texture features to obtain traffic sign information in the image to be identified;
a traffic sign identification module 105, configured to perform identification processing based on the traffic sign information and a traffic sign classifier, so as to obtain a traffic sign category corresponding to the traffic sign information;
the traffic sign classifier is determined by performing machine learning training based on a traffic sign sample image and a corresponding traffic sign class, and the class corresponding to the traffic sign sample image and the class corresponding to the image to be detected belong to the same type of class.
An embodiment of the present invention further provides an apparatus, which includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the traffic sign recognition method according to any one of the above.
An embodiment of the present invention further provides a storage medium, which includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the traffic sign recognition method according to any one of the above items.
It should be noted that: the precedence order of the above embodiments of the present invention is only for description, and does not represent the merits of the embodiments. And specific embodiments thereof have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus, system and server embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A traffic sign recognition method, characterized in that the recognition method comprises:
acquiring an image to be identified;
preprocessing the image to be identified and generating an image pyramid;
respectively extracting the edge features and the texture features of each layer of image in the image pyramid;
performing feature association fusion on the edge features and the texture features to obtain traffic sign information in the image to be identified;
performing identification processing based on the traffic sign information and a traffic sign classifier to obtain a traffic sign category corresponding to the traffic sign information;
the traffic sign classifier is determined by performing machine learning training based on a traffic sign sample image and a corresponding traffic sign class, and the class corresponding to the traffic sign sample image and the class corresponding to the image to be detected belong to the same type of class.
2. The traffic sign recognition method of claim 1, wherein the obtaining the image to be recognized comprises:
acquiring a video image acquired by a camera;
and extracting the video image in alternate lines to obtain the image to be identified.
3. The method of claim 1, wherein preprocessing the image to be recognized and generating an image pyramid comprises:
carrying out gray level processing on the image to be identified;
and zooming the image to be recognized after the gray processing according to a preset zooming factor to obtain the image pyramid with a preset number of layers.
4. The method of claim 1, wherein the extracting the edge feature and the texture feature of each layer of the image pyramid respectively comprises:
detecting each layer of image in the image pyramid through a detection window;
calculating a feature operator of the current detection window in a sliding manner in the current detection window through a sublayer feature integration unit; the sublayer integration unit comprises a plurality of minimum feature detection units, and each minimum feature detection unit consists of a plurality of pixels;
calculating and generating an edge vector as an edge feature according to the feature operator of each detection window;
acquiring the pixel value of the current pixel point;
acquiring a neighborhood pixel point where the current pixel point is located and a pixel value of the neighborhood pixel point;
acquiring feature information in a neighborhood range according to the pixel value of the current pixel point and the pixel value of the neighborhood pixel point;
and calculating to obtain texture features according to the feature information.
5. The method of claim 4, wherein obtaining feature information in a neighborhood range according to the pixel values of the current pixel and the neighborhood pixels comprises:
acquiring the position information of the current pixel point and the neighborhood pixel point;
and generating a feature histogram of the neighborhood range according to the position information, the pixel value of the current pixel point and the pixel value of the neighborhood pixel point, and taking the feature histogram as feature information.
6. The method of claim 1, wherein before the obtaining the image to be recognized, the method further comprises:
acquiring a training sample image marked with a traffic sign category;
preprocessing the training sample image and generating an image pyramid;
respectively extracting the edge features and the texture features of each layer of image in the image pyramid;
performing feature association fusion on the edge features and the texture features to obtain traffic sign information in the training sample image;
taking the traffic sign information as input information of a traffic sign classifier, and taking the traffic sign category as output information of the traffic sign classifier;
comparing the output information of the traffic sign classifier with the training sample image;
and training and generating the traffic sign classifier according to the comparison result.
7. The traffic sign recognition method of claim 1,
the traffic sign classifier comprises an input layer, a convolution layer, a down-sampling layer, a widening layer, a full-connection layer and an output layer;
the input layer is used for acquiring the traffic sign information and generating a two-dimensional image matrix after adjusting the size of the traffic sign information to a preset size;
the convolution layer is used for performing discrete convolution operation on the two-dimensional image matrix to obtain a convolution result;
the down-sampling layer is used for selecting the pixel value of a certain pixel point in the pooling domain as the whole pixel value of the pooling domain area;
the widening layer is used for increasing the network width of the traffic sign classifier;
the full connection layer is used for extracting output results of all layers;
and the output layer counts the probability of each traffic sign category through a softmax function to obtain the category of the traffic sign.
8. A traffic sign recognition apparatus, comprising:
the image to be recognized acquisition module is used for acquiring an image to be recognized;
the image pyramid generation module is used for preprocessing the image to be identified and generating an image pyramid;
the characteristic extraction module is used for respectively extracting the edge characteristic and the texture characteristic of each layer of image in the image pyramid;
the traffic sign information acquisition module is used for performing feature association fusion on the edge features and the texture features to obtain traffic sign information in the image to be identified;
the traffic sign identification module is used for carrying out identification processing based on the traffic sign information and a traffic sign classifier to obtain a traffic sign category corresponding to the traffic sign information;
the traffic sign classifier is determined by performing machine learning training based on a traffic sign sample image and a corresponding traffic sign class, and the class corresponding to the traffic sign sample image and the class corresponding to the image to be detected belong to the same type of class.
9. An apparatus comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement a traffic sign recognition method according to any one of claims 1-7.
10. A storage medium comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement a traffic sign recognition method according to any one of claims 1-7.
CN202010249474.5A 2020-04-01 2020-04-01 Traffic sign identification method, device, equipment and storage medium Pending CN111597875A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010249474.5A CN111597875A (en) 2020-04-01 2020-04-01 Traffic sign identification method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN111597875A true CN111597875A (en) 2020-08-28

Family

ID=72190481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010249474.5A Pending CN111597875A (en) 2020-04-01 2020-04-01 Traffic sign identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111597875A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573707A (en) * 2014-12-17 2015-04-29 安徽清新互联信息科技有限公司 Vehicle license plate Chinese character recognition method based on multi-feature fusion
WO2016155371A1 (en) * 2015-03-31 2016-10-06 百度在线网络技术(北京)有限公司 Method and device for recognizing traffic signs
CN108875454A (en) * 2017-05-11 2018-11-23 比亚迪股份有限公司 Traffic sign recognition method, device and vehicle
CN110659550A (en) * 2018-06-29 2020-01-07 比亚迪股份有限公司 Traffic sign recognition method, traffic sign recognition device, computer equipment and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QU Shiru; LI Tao: "Pedestrian detection algorithm based on improved CoHOG-LQC", Journal of Northwestern Polytechnical University, no. 02 *
LI Hongdi: "Image smoke detection using pyramid texture and edge features", Journal of Image and Graphics, 30 June 2015 (2015-06-30), pages 772-780 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668608A (en) * 2020-12-04 2021-04-16 北京达佳互联信息技术有限公司 Image identification method and device, electronic equipment and storage medium
CN112668608B (en) * 2020-12-04 2024-03-15 北京达佳互联信息技术有限公司 Image recognition method and device, electronic equipment and storage medium
CN112712066A (en) * 2021-01-19 2021-04-27 腾讯科技(深圳)有限公司 Image recognition method and device, computer equipment and storage medium
CN115147672A (en) * 2021-03-31 2022-10-04 广东高云半导体科技股份有限公司 Artificial intelligence system and method for identifying object types

Similar Documents

Publication Publication Date Title
CN112418117B (en) Small target detection method based on unmanned aerial vehicle image
CN109977943B (en) Image target recognition method, system and storage medium based on YOLO
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN109800824B (en) Pipeline defect identification method based on computer vision and machine learning
CN107609549B (en) Text detection method for certificate image in natural scene
CN110414507B (en) License plate recognition method and device, computer equipment and storage medium
CN107633226B (en) Human body motion tracking feature processing method
CN111310861A (en) License plate recognition and positioning method based on deep neural network
US7983486B2 (en) Method and apparatus for automatic image categorization using image texture
CN111275082A (en) Indoor object target detection method based on improved end-to-end neural network
US20070058856A1 (en) Character recoginition in video data
CN113160192A (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN111597875A (en) Traffic sign identification method, device, equipment and storage medium
CN110659550A (en) Traffic sign recognition method, traffic sign recognition device, computer equipment and storage medium
CN109934216B (en) Image processing method, device and computer readable storage medium
CN111833322B (en) Garbage multi-target detection method based on improved YOLOv3
CN110781882A (en) License plate positioning and identifying method based on YOLO model
CN111898621A (en) Outline shape recognition method
CN110991444A (en) Complex scene-oriented license plate recognition method and device
CN110807362A (en) Image detection method and device and computer readable storage medium
CN113221956B (en) Target identification method and device based on improved multi-scale depth model
CN115131325A (en) Breaker fault operation and maintenance monitoring method and system based on image recognition and analysis
CN112132151A (en) Image character recognition system and method based on recurrent neural network recognition algorithm
CN110689003A (en) Low-illumination imaging license plate recognition method and system, computer equipment and storage medium
CN114299383A (en) Remote sensing image target detection method based on integration of density map and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination