CN111209858B - Real-time license plate detection method based on deep convolutional neural network - Google Patents

Real-time license plate detection method based on deep convolutional neural network

Info

Publication number
CN111209858B
CN111209858B (Application CN202010009981.1A)
Authority
CN
China
Prior art keywords
neural network
license plate
convolutional neural
training
deep convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010009981.1A
Other languages
Chinese (zh)
Other versions
CN111209858A (en)
Inventor
张裕星
殷光强
候少麒
刘春辉
刘学婷
李慧萍
李超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202010009981.1A priority Critical patent/CN111209858B/en
Publication of CN111209858A publication Critical patent/CN111209858A/en
Application granted granted Critical
Publication of CN111209858B publication Critical patent/CN111209858B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/625License plates
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the fields of computer vision and deep learning, and in particular to a real-time license plate detection method based on a deep convolutional neural network, comprising the following steps: obtain road surface monitoring images and select a number of target images meeting the requirements; mark the license plate positions in the target images and divide the marked images into a training set, a test set and a verification set according to preset proportions; perform data enhancement on the marked license plates; put the data-enhanced training set into the deep convolutional neural network for training, optimizing the loss function of the whole improved SSD model with Adam; input a license plate image into the trained deep convolutional neural network structure, perform the network calculation, output an n×1×4 matrix and an n×1×2 matrix, and apply non-maximum suppression to obtain the final prediction result. The method thus balances speed and accuracy, and effectively improves the robustness of license plate detection and the generalization capability of the network.

Description

Real-time license plate detection method based on deep convolutional neural network
Technical Field
The invention relates to the field of computer vision and deep learning, in particular to a real-time license plate detection method based on a deep convolutional neural network.
Background
In recent years, with the rapid growth of China's economy, the number of motor vehicles in China has increased year by year. This continuous increase places higher demands on motor vehicle management in production and social life. For effective management, motor vehicle license plates must be registered with traffic management departments; in this context, an efficient and accurate license plate recognition system is needed to rapidly acquire license plate information such as the plate number, and license plate recognition systems have therefore emerged.
License plate detection is an important component of a vehicle recognition system, and its accuracy directly influences the accuracy of license plate recognition. Early license plate detection methods relied on manually extracted features and a trained classifier, but their detection performance is not ideal when the environment changes strongly.
In recent years, detectors such as Faster R-CNN, SSD and YOLO have been used for license plate detection and can partly cope with environmental changes, incomplete license plates and similar problems, but their consumption of time, computing power and device storage is very high.
Recently, CN106709486A, "Automatic license plate recognition method based on deep convolutional neural network", extracted features with a conventional convolutional neural network and located the license plate with a sliding-window method. An overly deep convolutional neural network leads to oversized network parameters and reduces the model speed, and the sliding-window method generates a large number of candidate boxes to be predicted, which greatly affects the model speed.
Disclosure of Invention
The invention provides a real-time license plate detection method based on a deep convolutional neural network. It adopts an end-to-end, non-cascaded structure, balances speed and accuracy, adapts well to environmental changes such as illumination, effectively improves the robustness of license plate detection and the generalization capability of the network, greatly reduces false detections and missed detections, and achieves real-time detection in complex scenes.
The invention provides a real-time license plate detection method based on a deep convolutional neural network, which comprises the following steps:
s1, obtaining a road surface monitoring image, selecting a plurality of target images meeting requirements from the road surface monitoring image, marking the positions of license plates in the target images to obtain marked images marked with the positions of the license plates, and dividing the marked images into a training set, a test set and a verification set according to a preset proportion;
s2, carrying out data enhancement on the marked license plate;
s3, constructing a deep convolutional neural network structure, wherein the deep convolutional neural network structure comprises a trunk and a feature extraction branch which together comprise 13 convolutional layers and 4 pooling layers;
s4, putting the training set after data enhancement processing into a deep convolutional neural network structure for training, optimizing a loss function of the whole improved SSD model by combining with Adam, and checking the speed and accuracy of the improved SSD model by using a verification set after training;
s5, inputting a license plate image into the trained deep convolutional neural network structure, performing the network calculation, outputting an n×1×4 matrix and an n×1×2 matrix, and obtaining the final prediction result by applying non-maximum suppression to the two matrices.
Optionally, the preset proportions of the training set, the test set and the verification set are: 50%:40%:10%.
Optionally, the data enhancement method includes:
s21, color enhancement, covering saturation, brightness, exposure, hue and contrast;
s22, scale transformation, randomly changing the size of each picture sent into the improved SSD model for training to an integer multiple of 32;
s23, angle transformation, randomly rotating each picture by 0-10 degrees, or flipping it horizontally or vertically;
s24, random noise interference, randomly superimposing some Gaussian noise on the original picture;
s25, random blur interference, reducing the differences between pixel values on the basis of the original picture to blur the picture and smooth the pixels.
Optionally, the process of training the data after the enhancement processing is put into the deep convolutional neural network structure includes:
the input pictures are resized to 300×300, which is equivalent to an input matrix of dimension b×3×300×300, where b is batch_size; the b×3×300×300 matrix is passed through the base network VggNet for partial calculation, and the result is recorded as x;
x is passed through basicRFB_s, and the result is recorded as s;
x is passed through basicRFB, and the result is recorded as x1;
x1 is passed through basicRFB, and the result is recorded as x2;
x2 is passed through basicRFB, and the result is recorded as x3;
x3 is passed through two basicConv layers, and the result is recorded as x4;
x4 is passed through two basicConv layers, and the result is recorded as x5.
Optionally, the process of training the data after the data enhancement processing in the deep convolutional neural network structure further comprises:
s, x1, x2, x3, x4 and x5 are each passed through loc_layer to obtain six vectors; a cat operation is performed on these vectors, and view reshapes the result into an n×1×4 matrix;
s, x1, x2, x3, x4 and x5 are each passed through conf_layer to obtain six vectors; a cat operation is performed on these vectors, and view reshapes the result into an n×1×2 matrix.
Optionally, the loss function is:
L(x, c, l, g) = (1/N) · (L_conf(x, c) + α · L_loc(x, l, g)), where N is the number of matched default boxes, L_conf is the classification confidence loss, L_loc is the smooth L1 localization loss, and α weights the two terms.
compared with the prior art, the invention has the following beneficial effects:
The invention provides a real-time license plate detection method based on a deep convolutional neural network. It adopts an end-to-end, non-cascaded structure, balances speed and accuracy, adapts well to environmental changes such as illumination, effectively improves the robustness of license plate detection and the generalization capability of the network, greatly reduces false detections and missed detections, and achieves real-time detection in complex scenes.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a real-time license plate detection method based on a deep convolutional neural network according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a deep convolutional neural network model according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
Examples:
fig. 1 is a flowchart of a real-time license plate detection method based on a deep convolutional neural network according to an embodiment of the present invention, and a specific flow shown in fig. 1 will be described in detail below:
s1, obtaining a road surface monitoring image, selecting a plurality of target images meeting requirements from the road surface monitoring image, marking the positions of license plates in the target images to obtain marked images marked with the positions of the license plates, and dividing the marked images into a training set, a test set and a verification set according to a preset proportion.
S2, carrying out data enhancement on the marked license plate.
S3, constructing a deep convolutional neural network structure, wherein the deep convolutional neural network structure is an end-to-end, non-cascaded structure comprising a trunk and a feature extraction branch which together comprise 13 convolutional layers and 4 pooling layers.
S4, putting the training set after data enhancement processing into a deep convolutional neural network structure for training, optimizing a loss function of the whole improved SSD model by combining with Adam, and checking the speed and accuracy of the improved SSD model by using a verification set after training. The SSD model is a classical target detection algorithm, and the embodiment is completed on the basis of the SSD model.
S5, inputting a license plate image into the trained deep convolutional neural network structure, performing the network calculation, outputting an n×1×4 matrix and an n×1×2 matrix, and applying non-maximum suppression to the two matrices to obtain the final prediction result.
Through S1-S5, the end-to-end, non-cascaded structure balances speed and accuracy, adapts well to environmental changes such as illumination, effectively improves the robustness of license plate detection and the generalization capability of the network, greatly reduces false detections and missed detections, and achieves real-time detection in complex scenes.
In a specific embodiment, dividing the marked images into a training set, a test set and a verification set according to a preset proportion in S1 may further include the following: collect road monitoring photos from various provinces and cities and screen 10000 target images that meet the requirements, namely images without large numbers of overlapping vehicles and with a license plate occlusion ratio below 10%; obtain marked images in which the license plate positions are annotated; and divide the marked images into a training set, a test set and a verification set according to a preset proportion. In this embodiment the training set may be 50%, the test set 40% and the verification set 10%. The training set is used to train the model, the test set is used to test the performance of the trained model, and the verification set is used to tune the parameters of the model during training so as to prevent overfitting.
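The 50/40/10 division described above can be sketched as follows; the shuffling seed and the use of index lists are illustrative assumptions, not details from the patent:

```python
# Sketch of the 50%/40%/10% train/test/verification split described above.
import random

def split_dataset(items, ratios=(0.5, 0.4, 0.1), seed=0):
    """Shuffle the annotated images and split them by the preset proportions."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * ratios[0])
    n_test = int(n * ratios[1])
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]
    return train, test, val

train, test, val = split_dataset(range(10000))
print(len(train), len(test), len(val))  # 5000 4000 1000
```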
In S2, labelled data is precious in practice, and its quantity may not be sufficient to train a model that meets the requirements; data enhancement is then a very important step. Data enhancement effectively improves the generalization capability of the model, improves its robustness, and makes its performance more stable. Data enhancement of the marked license plates may further include the following steps:
in this embodiment, the data enhancement has a total of 5 types of methods:
S21, color enhancement, covering saturation, brightness, exposure, hue, contrast and so on. Adding color transformations lets the model better adapt to uncontrollable factors such as weather and illumination in real scenes.
S22, scale transformation: the size of each picture fed into the improved SSD model for training is randomly changed to an integer multiple of 32, with 10 size choices: 384, 416, 448, 480, 512, 544, 576, 608, 640 and 672. Adding scale transformations lets the improved SSD model better adapt to videos and pictures of different resolutions and to license plates of different sizes.
S23, angle transformation: each picture is randomly rotated by 0-10 degrees, or flipped horizontally or vertically. Adding angle transformations lets the improved SSD model better adapt to the real environment.
S24, random noise interference: some Gaussian noise is randomly superimposed on the original picture.
S25, random blur interference: on the basis of the original picture, the differences between pixel values are reduced to blur the picture and smooth the pixels. Randomly adding interference helps strengthen the improved SSD model's resistance to interference from the external environment.
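Two of the steps above, the multiple-of-32 size choice (S22) and the Gaussian noise (S24), can be sketched with numpy alone; the noise level sigma is an illustrative assumption:

```python
# Minimal numpy sketch of two augmentations: choosing a random training size
# that is an integer multiple of 32, and overlaying Gaussian noise.
import numpy as np

SIZES = [384, 416, 448, 480, 512, 544, 576, 608, 640, 672]  # multiples of 32

def random_training_size(rng):
    """Pick one of the 10 allowed input sizes at random."""
    return SIZES[rng.integers(len(SIZES))]

def add_gaussian_noise(img, sigma=8.0, rng=None):
    """Superimpose zero-mean Gaussian noise on a uint8 image."""
    rng = rng or np.random.default_rng()
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
size = random_training_size(rng)
img = np.full((64, 64, 3), 128, np.uint8)
noisy = add_gaussian_noise(img, rng=rng)
print(size % 32, noisy.shape)  # 0 (64, 64, 3)
```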
In S4, the process of training the data after the data enhancement processing in the deep convolutional neural network structure includes:
The input pictures are resized to 300×300, which is equivalent to an input matrix of dimension b×3×300×300, where b is batch_size; the b×3×300×300 matrix is passed through the base network VggNet for partial calculation, and the result is recorded as x. VggNet comprises 13 convolutional layers of size 3×3 and 4 pooling layers.
x is passed through basicRFB_s, and the result is recorded as s. x is passed through basicRFB, and the result is recorded as x1. x1 is passed through basicRFB, and the result is recorded as x2. x2 is passed through basicRFB, and the result is recorded as x3. x3 is passed through two basicConv layers, and the result is recorded as x4. x4 is passed through two basicConv layers, and the result is recorded as x5. [s, x1, x2, x3, x4, x5] will be used to calculate the position of the license plate through loc_layer and the likelihood of the license plate through conf_layer. Here resize denotes scaling, for example: an image of size 900×900 may be scaled up to 1800×1800 or scaled down to 300×300. batch_size denotes the number of images fed into the SSD model per training batch. VggNet denotes the base network. basicRFB_s denotes a normalized expansion convolutional layer, an improved convolutional neural network module. basicRFB and basicConv denote basic expansion convolutional layers.
Further, loc_layer consists of 6 convolutional layers with kernel size 1×1. s, x1, x2, x3, x4 and x5 are each passed through loc_layer to obtain six vectors; a cat operation (direct concatenation, for example, the cat of two [ [1,1] ] is [ [1,1], [1,1] ]) is performed on the six vectors, and view reshapes the result into an n×1×4 matrix, corresponding to n prediction boxes; the 4 dimensions carry the x and y offsets of the prediction box center and the width and height of the prediction box. Here view denotes a dimension transformation and cat denotes channel concatenation.
Further, conf_layer consists of 6 convolutional layers with kernel size 1×1. s, x1, x2, x3, x4 and x5 are each passed through a 1×1 convolutional layer to obtain six vectors; a cat operation is performed on the six vectors, and view reshapes the result into an n×1×2 matrix, corresponding to the n prediction boxes above and carrying, for each prediction box, the confidence that there is no target and the confidence that there is a license plate.
In the present embodiment, s, x1, x2, x3, x4, x5 correspond to 6 feature maps, and loc_layer extracts position information from the 6 feature maps, thereby obtaining six vectors, that is, six results.
Similarly, conf_layer extracts confidence information from the 6 feature maps s, x1, x2, x3, x4 and x5, likewise obtaining six vectors, that is, six results.
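The cat and view operations described above can be sketched with numpy shapes alone; the per-feature-map spatial sizes and the single box per location are illustrative assumptions, not values stated in the patent:

```python
# Sketch of the cat + view step: the six per-feature-map outputs of loc_layer
# are concatenated ("cat") and reshaped ("view") into one n x 1 x 4 matrix.
import numpy as np

feature_sizes = [38, 19, 10, 5, 3, 1]  # assumed spatial sizes for s, x1..x5
# one 4-value box prediction per spatial location of each feature map:
loc_outputs = [np.zeros((size * size, 4)) for size in feature_sizes]

loc = np.concatenate(loc_outputs, axis=0)  # "cat": join along the box axis
loc = loc.reshape(-1, 1, 4)                # "view": n x 1 x 4
print(loc.shape)  # (1940, 1, 4)
```

The conf branch is identical except each location yields 2 values (no-target and license-plate confidences), producing an n×1×2 matrix.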
Further, a batch of pictures passes through the above process to obtain an n×1×4 matrix and an n×1×2 matrix; ground-truth information of the same dimensions can be computed from the license plate position annotations, so the loss value of each training step is calculated with the loss function and the network parameters are adjusted by gradient descent, bringing the predicted values ever closer to the true values. Here batch denotes a training batch, and batch_size is the number of images in one batch.
In a specific embodiment, the SSD model is improved without deepening the network: vgg is kept as the base network while special convolutional layers replace the convolutional layers in SSD that have an excessive number of parameters, which improves the accuracy of the model.
In detail, in the deep convolutional neural network through which the 300×300 input pictures pass, convolutional modules with the rfb structure replace part of the convolutional layers in SSD. The rfb module structure is: first a 1×1 convolutional layer reduces the number of feature map channels, forming a bottleneck structure on each branch; then an expansion (dilated) convolutional layer follows; and finally a conventional n×n convolutional layer. Using expansion convolutional layers reduces the number of parameters and increases the nonlinear capability of the SSD model, and a shortcut similar to that of a residual network is designed.
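The expansion convolution at the heart of the rfb module can be illustrated in a few lines of numpy; this is a single-channel "valid" sketch, not the patent's implementation, showing how a 3×3 kernel with dilation d covers a (2d+1)×(2d+1) receptive field with the same 9 parameters:

```python
# Minimal sketch of an expansion (dilated) convolution on one channel.
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """'Valid' 2-D cross-correlation with a dilated kernel."""
    kh, kw = kernel.shape
    eff_h = (kh - 1) * dilation + 1  # effective kernel extent
    eff_w = (kw - 1) * dilation + 1
    H, W = x.shape
    out = np.zeros((H - eff_h + 1, W - eff_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # strided slice picks the dilated sample points of the patch
            patch = x[i:i + eff_h:dilation, j:j + eff_w:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

x = np.arange(49, dtype=float).reshape(7, 7)
k = np.ones((3, 3))
print(dilated_conv2d(x, k, dilation=1).shape)  # (5, 5)
print(dilated_conv2d(x, k, dilation=2).shape)  # (3, 3): wider receptive field
```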
Furthermore, the loss function of the model uses the loss function of the traditional SSD algorithm, comprising a classification loss and a smooth L1 loss for regression, and controls the positive and negative samples, which improves the optimization speed and the stability of the training results. The shortcut mentioned above is a "direct link" connecting adjacent convolutional layers, a very effective structure from CNN model development that is used to alleviate vanishing gradients.
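The two loss terms named above can be written out in numpy; the example inputs, the two-class [no-target, license-plate] logits, and the weighting of 1.0 are illustrative assumptions:

```python
# Numpy sketch of the SSD-style loss terms: smooth L1 for box regression
# plus cross-entropy for the plate / no-plate classification.
import numpy as np

def smooth_l1(pred, target):
    """Smooth L1: quadratic for |d| < 1, linear beyond (robust to outliers)."""
    d = np.abs(pred - target)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

def cross_entropy(logits, labels):
    """Softmax cross-entropy, computed in a numerically stable way."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].sum()

pred_boxes = np.array([[0.1, 0.2, 0.3, 0.4]])
true_boxes = np.array([[0.0, 0.0, 0.0, 0.0]])
logits = np.array([[0.0, 2.0]])  # [no-target, license-plate] scores
labels = np.array([1])           # this box really is a plate

total = cross_entropy(logits, labels) + 1.0 * smooth_l1(pred_boxes, true_boxes)
print(round(float(total), 4))  # 0.2769
```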
In a specific embodiment, the training and verification data sets and the improved SSD model are used, and the loss function of the whole model is optimized with Adam. Adam is a first-order optimization algorithm that can replace the traditional stochastic gradient descent procedure and iteratively updates the neural network weights based on the training data. Stochastic gradient descent maintains a single learning rate for updating all weights, and this learning rate does not change during training. Adam designs independent adaptive learning rates for different parameters by computing first-moment and second-moment estimates of the gradient; the learning step of each parameter per iteration stays within a certain range, a large gradient does not produce a large learning step, and the parameter values remain relatively stable.
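The bounded per-parameter step described above can be seen in a single Adam update; the hyperparameters below are the common defaults, assumed rather than taken from the patent:

```python
# Pure-numpy sketch of one Adam update step.
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad       # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2  # second-moment (uncentered var) estimate
    m_hat = m / (1 - b1 ** t)          # bias correction for the warm-up
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
# even a huge gradient moves w by about lr, not by lr * gradient:
w, m, v = adam_step(w, np.array([100.0]), m, v, t=1)
print(w)  # [0.999]
```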
Two kinds of loss are used together for training, a localization part and a classification part, with different weights assigned to them. After 93,000 iterations (batch_size = 32), the loss of the model hardly decreased any further, and training was stopped.
In S5, obtaining the final test result may include the following: input a target image into the deep convolutional neural network, read the image as a 1×3×300×300 matrix, perform the trained network calculation, and output an n×1×4 matrix and an n×1×2 matrix, where n denotes the n prediction boxes and their confidences; apply non-maximum suppression to the two matrices to obtain the final prediction boxes. For example: input two hundred images with license plates into the deep convolutional neural network and test the positions of the license plate boxes in the images, so as to determine the number of correctly detected images and the time required to test each image. In this way the speed and accuracy of license plate detection on the test images can be determined.
In this embodiment, the prediction boxes are multiple candidate license plate boxes obtained from the network calculation; each prediction box carries the probability that it contains a license plate, and the confidence is obtained from that probability.
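The non-maximum suppression step above can be sketched in numpy as the classic greedy procedure: keep the highest-confidence box, drop boxes overlapping it beyond an IoU threshold, and repeat. The threshold and the example boxes are illustrative assumptions:

```python
# Numpy sketch of greedy non-maximum suppression over prediction boxes.
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """boxes: (n, 4) as [x1, y1, x2, y2]; returns indices of kept boxes."""
    order = scores.argsort()[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # intersection of the kept box with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]  # drop heavy overlaps
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2] -- box 1 overlaps box 0 too much
```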
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (4)

1. The real-time license plate detection method based on the deep convolutional neural network is characterized by comprising the following steps of:
s1, obtaining a road surface monitoring image, selecting a plurality of target images meeting requirements from the road surface monitoring image, marking the positions of license plates in the target images to obtain marked images marked with the positions of the license plates, and dividing the marked images into a training set, a test set and a verification set according to a preset proportion;
s2, carrying out data enhancement on the marked license plate;
s3, constructing a deep convolutional neural network structure, wherein the deep convolutional neural network structure comprises a trunk and a feature extraction branch which together comprise 13 convolutional layers and 4 pooling layers;
s4, putting the training set after data enhancement processing into the deep convolutional neural network structure for training, optimizing the loss function of the whole improved SSD model with Adam, and checking the speed and accuracy of the improved SSD model with the verification set after training; convolutional modules with the rfb structure replace the convolutional layers in the SSD model that have an excessive number of parameters;
the process of training the data in the deep convolutional neural network structure after the data enhancement processing comprises the following steps:
the input pictures are resized to 300×300, which is equivalent to an input matrix of dimension b×3×300×300, where b is batch_size; the b×3×300×300 matrix is passed through the base network VggNet for partial calculation, and the result is recorded as x;
x is passed through basicRFB_s, and the result is recorded as s;
x is passed through basicRFB, and the result is recorded as x1;
x1 is passed through basicRFB, and the result is recorded as x2;
x2 is passed through basicRFB, and the result is recorded as x3;
x3 is passed through two basicConv layers, and the result is recorded as x4;
x4 is passed through two basicConv layers, and the result is recorded as x5;
[ s, x1, x2, x3, x4, x5] is used for calculating the position of the license plate by using loc_layer, namely the s, the x1, the x2, the x3, the x4, the x5 respectively obtain six vectors through loc_layer, and each vector is subjected to cat operation, and view is a matrix of n x 1*4; calculating the possibility of the license plate through conf_layer, namely the s, the x1, the x2, the x3, the x4, and the x5 respectively obtain six vectors through conf_layer, and performing cat operation on each vector, wherein view is a matrix of n x 1*2; wherein, resize represents scaling, batch_size represents the number of images sent to the SSD model for training in each batch; basicffb_s represents a normalized expansion convolutional layer, which is an improved convolutional neural network module; basicRFB, basicConv are all represented as basic expansion convolutional layers; loc layer is a convolution layer with a convolution kernel size of 1*1 of 6; conf_layer is a convolution layer with a convolution kernel size of 1*1 of 6;
s5, inputting a license plate image into the trained deep convolutional neural network structure, performing the network calculation to output an n x 1 x 4 matrix and an n x 1 x 2 matrix, and performing non-maximum suppression on the two matrices to obtain the final prediction result.
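The forward pass described above (backbone, RFB-style pyramid [s, x1..x5], then concatenated 1 x 1 loc/conf heads viewed as n x 1 x 4 and n x 1 x 2 matrices) can be sketched as follows. This is a minimal illustrative stand-in: the layer widths, strides, `TinyRFBDetector` name, and the single prior per location are assumptions, not the patented VggNet/basicRFB configuration.

```python
import torch
import torch.nn as nn

class TinyRFBDetector(nn.Module):
    """Toy sketch of the claimed pipeline: six feature levels -> loc/conf heads."""

    def __init__(self, ch=32):
        super().__init__()
        self.base = nn.Conv2d(3, ch, 3, stride=4, padding=1)               # stand-in for VggNet
        self.rfb_s = nn.Conv2d(ch, ch, 3, padding=2, dilation=2)           # stand-in for basicRFB_s
        self.rfb1 = nn.Conv2d(ch, ch, 3, stride=2, padding=2, dilation=2)  # stand-in for basicRFB
        self.rfb2 = nn.Conv2d(ch, ch, 3, stride=2, padding=2, dilation=2)
        self.rfb3 = nn.Conv2d(ch, ch, 3, stride=2, padding=2, dilation=2)
        self.conv4 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)             # stand-in for two basicConv
        self.conv5 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)
        # six 1x1 heads each: 4 location outputs and 2 confidence outputs per position
        self.loc = nn.ModuleList(nn.Conv2d(ch, 4, 1) for _ in range(6))
        self.conf = nn.ModuleList(nn.Conv2d(ch, 2, 1) for _ in range(6))

    def forward(self, img):  # img: (b, 3, 300, 300), b = batch_size
        x = self.base(img)
        s = self.rfb_s(x)
        x1 = self.rfb1(x)
        x2 = self.rfb2(x1)
        x3 = self.rfb3(x2)
        x4 = self.conv4(x3)
        x5 = self.conv5(x4)
        feats = [s, x1, x2, x3, x4, x5]
        # apply each head, flatten per level, cat along dim 1, then view
        loc = torch.cat([h(f).permute(0, 2, 3, 1).flatten(1)
                         for h, f in zip(self.loc, feats)], dim=1)
        conf = torch.cat([h(f).permute(0, 2, 3, 1).flatten(1)
                          for h, f in zip(self.conf, feats)], dim=1)
        return loc.view(loc.size(0), -1, 4), conf.view(conf.size(0), -1, 2)

loc, conf = TinyRFBDetector()(torch.zeros(2, 3, 300, 300))
```

In step s5 the two outputs would then be scored (e.g. softmax over the 2 confidence channels) and filtered by non-maximum suppression to produce the final license plate boxes.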
2. The real-time license plate detection method based on the deep convolutional neural network according to claim 1, wherein the preset proportions of the training set, the test set and the verification set are 50%:40%:10%.
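A hypothetical helper for the 50%:40%:10% split in claim 2; only the ratios come from the claim, while the shuffling, seeding, and `split_dataset` name are illustrative assumptions.

```python
import random

def split_dataset(samples, seed=0):
    """Split samples into training (50%), test (40%) and verification (10%) sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)       # deterministic shuffle for reproducibility
    n_train = int(len(items) * 0.5)
    n_test = int(len(items) * 0.4)
    return (items[:n_train],                 # training set, 50%
            items[n_train:n_train + n_test], # test set, 40%
            items[n_train + n_test:])        # verification set, 10%

train, test, val = split_dataset(range(100))
```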
3. The real-time license plate detection method based on the deep convolutional neural network according to claim 1, wherein the data enhancement method comprises the following steps:
s21, color data enhancement, covering saturation, brightness, exposure, hue and contrast;
s22, scale transformation: the size of each picture sent to the improved SSD model for training is randomly changed to an integer multiple of 32;
s23, angle transformation: each picture is randomly rotated by 0-10 degrees, horizontally flipped, or vertically flipped;
s24, random noise interference: Gaussian noise is randomly superimposed on the original picture;
s25, random blur interference: on the basis of the original picture, the differences between pixel values are reduced to blur the picture and smooth its pixels.
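Steps s22 and s24 can be sketched as below. The claim fixes only "an integer multiple of 32" and "Gaussian noise"; the size range, noise sigma, and function names here are illustrative assumptions.

```python
import random
import numpy as np

def random_train_size(lo=320, hi=608):
    """s22: pick a training resolution that is an integer multiple of 32."""
    return 32 * random.randint(lo // 32, hi // 32)

def add_gaussian_noise(img, sigma=8.0):
    """s24: superimpose zero-mean Gaussian noise on a uint8 HxWxC picture."""
    noisy = img.astype(np.float32) + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)   # keep valid pixel range

size = random_train_size()
noisy = add_gaussian_noise(np.zeros((8, 8, 3), dtype=np.uint8))
```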
4. The real-time license plate detection method based on the deep convolutional neural network according to claim 1, wherein the loss function is:
[loss function formula provided as equation image FDA0004233236000000031]
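The formula of claim 4 survives only as an image reference. For orientation, the published SSD MultiBox objective, which RFB-based SSD variants typically optimize, has the following form; this is the standard SSD loss, not necessarily the exact claimed formula:

```latex
L(x, c, l, g) = \frac{1}{N}\left( L_{conf}(x, c) + \alpha\, L_{loc}(x, l, g) \right)
```

where $N$ is the number of matched default boxes, $L_{conf}$ is the softmax confidence loss over the class predictions $c$, $L_{loc}$ is the Smooth L1 loss between predicted boxes $l$ and ground-truth boxes $g$, and $\alpha$ balances the two terms.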
CN202010009981.1A 2020-01-06 2020-01-06 Real-time license plate detection method based on deep convolutional neural network Active CN111209858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010009981.1A CN111209858B (en) 2020-01-06 2020-01-06 Real-time license plate detection method based on deep convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010009981.1A CN111209858B (en) 2020-01-06 2020-01-06 Real-time license plate detection method based on deep convolutional neural network

Publications (2)

Publication Number Publication Date
CN111209858A CN111209858A (en) 2020-05-29
CN111209858B true CN111209858B (en) 2023-06-20

Family

ID=70787367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010009981.1A Active CN111209858B (en) 2020-01-06 2020-01-06 Real-time license plate detection method based on deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN111209858B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898497B (en) * 2020-07-16 2024-05-10 济南博观智能科技有限公司 License plate detection method, system, device and readable storage medium
CN111881958B (en) * 2020-07-17 2024-01-19 上海东普信息科技有限公司 License plate classification recognition method, device, equipment and storage medium
CN113762482B (en) * 2021-09-15 2024-04-16 智道网联科技(北京)有限公司 Training method and related device for neural network model for automatic driving
CN113807302A (en) * 2021-09-26 2021-12-17 重庆紫光华山智安科技有限公司 License plate recognition algorithm accuracy rate testing method and device, electronic equipment and readable storage medium
CN115019297B (en) * 2022-08-04 2022-12-09 之江实验室 Real-time license plate detection and identification method and device based on color augmentation
CN115862021B (en) * 2022-11-08 2024-02-13 中国长江电力股份有限公司 Automatic hydropower station gate identification method based on YOLO

Citations (2)

Publication number Priority date Publication date Assignee Title
CN108108746A (en) * 2017-09-13 2018-06-01 湖南理工学院 License plate character recognition method based on the Caffe deep learning framework
CN109919216A (en) * 2019-02-28 2019-06-21 合肥工业大学 An adversarial learning method for computer-aided diagnosis of prostate cancer

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN106709486A (en) * 2016-11-11 2017-05-24 南京理工大学 Automatic license plate identification method based on deep convolutional neural network
CN108009526A (en) * 2017-12-25 2018-05-08 西北工业大学 A vehicle identification and detection method based on convolutional neural networks
CN109886147A (en) * 2019-01-29 2019-06-14 电子科技大学 A multi-attribute vehicle detection method based on single-network multi-task learning
CN109886153B (en) * 2019-01-30 2021-11-02 四川电科维云信息技术有限公司 Real-time face detection method based on deep convolutional neural network
CN110020650B (en) * 2019-03-26 2021-08-03 武汉大学 Inclined license plate recognition method and device based on deep learning recognition model
CN110097044B (en) * 2019-05-13 2020-12-01 苏州大学 One-stage license plate detection and identification method based on deep learning
CN110399800B (en) * 2019-06-28 2021-05-07 智慧眼科技股份有限公司 License plate detection method and system based on deep learning VGG16 framework and storage medium

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN108108746A (en) * 2017-09-13 2018-06-01 湖南理工学院 License plate character recognition method based on the Caffe deep learning framework
CN109919216A (en) * 2019-02-28 2019-06-21 合肥工业大学 An adversarial learning method for computer-aided diagnosis of prostate cancer

Non-Patent Citations (1)

Title
Pedestrian detection method in vehicle video based on AMSSD; Chunli Wang et al.; IEEE Xplore; pp. 1-4 *

Also Published As

Publication number Publication date
CN111209858A (en) 2020-05-29

Similar Documents

Publication Publication Date Title
CN111209858B (en) Real-time license plate detection method based on deep convolutional neural network
CN111080620B (en) Road disease detection method based on deep learning
CN112232349B (en) Model training method, image segmentation method and device
CN106845478B (en) A kind of secondary licence plate recognition method and device of character confidence level
CN109840556B (en) Image classification and identification method based on twin network
CN111753828B (en) Natural scene horizontal character detection method based on deep convolutional neural network
CN113160192A (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN107506765B (en) License plate inclination correction method based on neural network
CN111680690B (en) Character recognition method and device
CN107038416B (en) Pedestrian detection method based on binary image improved HOG characteristics
CN110751195B (en) Fine-grained image classification method based on improved YOLOv3
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN111738054B (en) Behavior anomaly detection method based on space-time self-encoder network and space-time CNN
CN111242026B (en) Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN110826429A (en) Scenic spot video-based method and system for automatically monitoring travel emergency
CN106780727B (en) Vehicle head detection model reconstruction method and device
CN110991444A (en) Complex scene-oriented license plate recognition method and device
CN113239865B (en) Deep learning-based lane line detection method
CN118212572A (en) Road damage detection method based on improvement YOLOv7
CN116129280B (en) Method for detecting snow in remote sensing image
CN117115616A (en) Real-time low-illumination image target detection method based on convolutional neural network
CN117115177A (en) Lightning channel segmentation method based on dynamic channel diagram convolution and multi-scale attention
CN115294392B (en) Visible light remote sensing image cloud removal method and system based on network model generation
CN105354833A (en) Shadow detection method and apparatus
CN115063679A (en) Pavement quality assessment method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant