CN111368830B - License plate detection and recognition method based on multi-video frame information and kernel correlation filtering algorithm - Google Patents

License plate detection and recognition method based on multi-video frame information and kernel correlation filtering algorithm

Info

Publication number
CN111368830B
CN111368830B CN202010138492.6A CN202010138492A CN111368830B CN 111368830 B CN111368830 B CN 111368830B CN 202010138492 A CN202010138492 A CN 202010138492A CN 111368830 B CN111368830 B CN 111368830B
Authority
CN
China
Prior art keywords
license plate
image
frame
data set
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010138492.6A
Other languages
Chinese (zh)
Other versions
CN111368830A (en
Inventor
王琦
袁媛
芦肖城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202010138492.6A priority Critical patent/CN111368830B/en
Publication of CN111368830A publication Critical patent/CN111368830A/en
Application granted granted Critical
Publication of CN111368830B publication Critical patent/CN111368830B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/625License plates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a license plate detection and recognition method based on multi-video-frame information and a kernel correlation filtering algorithm. Separate deep learning models are constructed to detect the license plate region in a single-frame image and to recognize the license plate characters, and a kernel correlation filtering algorithm is used to track the plate and fuse information across frames, so that the accuracy of license plate localization and recognition is improved, the computation is efficient, and the method can be used for real-time processing.

Description

License plate detection and recognition method based on multi-video frame information and kernel correlation filtering algorithm
Technical Field
The invention belongs to the field of intelligent transportation systems, and particularly relates to a license plate detection and recognition method based on multi-video-frame information and a kernel correlation filtering algorithm.
Background
In recent years, with the rapid development of the social economy, intelligent transportation systems (Intelligent Transportation System, ITS) have gradually become a hot research topic in the field of traffic control and management worldwide. The license plate serves as the "identity card" of a vehicle; like a human fingerprint, it can uniquely determine the vehicle's identity. License plate recognition (License Plate Recognition, LPR) systems are an important element of vehicle detection systems and work as follows: first, the position of the license plate in the picture is obtained by license plate detection; then the plate number is read by optical character recognition (OCR) to obtain the license plate number of the vehicle in the picture. Conventional license plate character recognition methods are generally divided into two parts: character segmentation and character recognition.
If a complex environmental background is not considered, traditional license plate recognition methods, such as techniques based on template matching and character features, can detect the license plate. However, real images often suffer from complex backgrounds, blurred plates and inclined viewing angles, so the current mainstream approach uses deep neural networks to learn image features. The learning capacity of convolutional neural networks is improved by optimizing the parameters of the deep learning model, such as the convolution kernels, network structure, network depth, activation function and optimization function, and by using high-performance servers with GPU acceleration. License plate recognition methods can be divided into two parts, license plate detection and license plate recognition: license plate detection treats the plate as a target and applies object detection methods such as Faster R-CNN, YOLO and DenseBox; license plate recognition reads the characters of the detected plate, for example with CNN+RNN+CTC or CNN+RNN+attention models.
License plate recognition based on a static image works on a single frame, so the recognition result is largely determined by the sharpness of the image, the capture angle and so on; a single frame provides little information, and the recognition accuracy is low. Therefore, more recognition methods based on dynamic video are being studied: every frame of the video can be recognized, so the result depends little on any single frame, and the approach offers adaptable recognition, high speed and good detection quality. If the video information can be fully exploited, the processing speed and accuracy of a license plate recognition system can be greatly improved. Because the target may suffer from motion blur, occlusion and low resolution in some video frames, context information in the video can help to solve these problems; representative methods include motion-guided propagation (Motion-guided Propagation, MGP) and multi-context suppression (Multi-context Suppression, MCS) in T-CNN, target tracking for targets missed between consecutive video frames, and video object detection models based on deep learning. However, these methods are not ideal in real-time performance and have poor practicality.
In practical applications, the existing methods have the following shortcomings: the inter-frame context information in driving videos is not fully exploited; in natural scenes the background is complex, and blurring, occlusion and bad weather further reduce robustness; and the license plate detection and recognition results are therefore not ideal.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a license plate detection and recognition method based on multi-video-frame information and a kernel correlation filtering algorithm. Different deep learning models are used for license plate region detection and for license plate character recognition on single-frame images, and a kernel correlation filtering algorithm is used for multi-frame tracking and information fusion, so that the accuracy of license plate recognition and localization is improved while the computation remains efficient enough for real-time processing.
A license plate detection and recognition method based on multi-video frame information and a kernel correlation filtering algorithm is characterized by comprising the following steps:
step 1: scaling each image in the license plate data set CCPD to a size of 512 x 512, all scaled images and their annotation information forming the pre-training data set;
step 2: applying enhancement processing to each image of the pre-training data set, all images before and after enhancement together with their annotation information forming the final training data set; the enhancement processing comprises flipping by an arbitrary angle, cropping to an arbitrary size and color transformation of an arbitrary degree;
step 3: inputting the training data set obtained in step 2 into a license plate detection network model for training to obtain a trained license plate detection model; the license plate detection network extracts feature maps of different sizes from a VgNet backbone, up-samples them into feature maps of the same size but different numbers of channels, merges them, feeds the merged feature map into a fully connected layer, and outputs the confidence and license plate region coordinates (cx, cy, w, h, score), where cx and cy are the abscissa and ordinate of the center point of the license plate region, w and h are the length and width of the region, and score is the confidence;
step 4: cropping the license plate region from each image of the training data set, all cropped images and their license plate annotations forming the training data set of the license plate recognition model; inputting this data set into a character recognition network based on a visual attention mechanism for training, the trained network serving as the final license plate recognition model; the character recognition network based on the visual attention mechanism is formed by connecting a 7-layer CNN, an attention module and an LSTM network, where the 7-layer CNN extracts image features with representation capability, and the attention module combines the previous output with the state of the RNN hidden layer, applies a linear layer and a softmax to obtain the attention weight W, multiplies W by the feature matrix output by the CNN to obtain a feature map with attention, and feeds it to the LSTM layer, which serves as a decoder and outputs the final text sequence;
step 5: inputting the 1st to F-th frame images of the license plate video data set to be detected into the license plate detection model obtained in step 3, obtaining the coordinates of the 4 vertices of the license plate region of each image, and cropping each original image according to its vertex coordinates to obtain its license plate region image; F ranges from 5 to 30;
for the k-th frame image of the license plate video data set to be detected, where F+1 ≤ k ≤ N-1, inputting it into the license plate detection model obtained in step 3 to obtain the coordinates of the 4 vertices of its license plate region, cropping the original image according to these vertex coordinates to obtain its license plate region image, performing tracking prediction with the KCF kernel correlation filtering algorithm on this image and the license plate region images obtained from the previous F frames, and taking the resulting predicted image as the predicted license plate region image of the (k+1)-th frame; inputting the (k+1)-th frame image into the license plate detection model obtained in step 3 to obtain the coordinates of the 4 vertices of its license plate region, and taking the weighted average of the image cropped from the (k+1)-th frame according to these vertex coordinates and the predicted license plate region image of the (k+1)-th frame as the final license plate region image of the (k+1)-th frame; N is the total number of video frames contained in the data set;
step 6: inputting the license plate region image of each frame obtained in step 5 into the license plate recognition model obtained in step 4, taking the result as the initial recognition result; then voting on each character using the initial recognition result of the current frame and the initial recognition results of the previous M frames (if fewer than M frames precede the current frame, voting over all preceding frames), and taking the license plate number formed by the characters with the most votes as the recognition result of that frame; M ranges from 5 to 20.
The beneficial effects of the invention are as follows: because multi-frame information in the video is effectively combined with the classical kernel correlation filtering algorithm to detect and recognize license plates in video, the accuracy of license plate recognition is improved while real-time performance is preserved; because data enhancement preprocessing with flipping, deformation and color transformation is used, the method is applicable to license plate detection and recognition in a variety of scenes; because feature maps from different stages are used in the license plate detection stage, the detection model is more robust for small targets such as license plates; because a visual attention mechanism is used in the license plate recognition stage, the regions containing the license plate characters are emphasized, improving the accuracy and robustness of recognition in a single-frame image; and because the multi-frame recognition results in the video are combined by voting, the accuracy of license plate recognition in video is further improved.
Drawings
Fig. 1 is a flowchart of a license plate detection and recognition method based on multi-video frame information and a kernel correlation filtering algorithm.
Detailed Description
The invention is further described below with reference to the drawings and embodiments; the invention includes but is not limited to the following embodiments.
As shown in FIG. 1, the invention provides a license plate detection and identification method based on multi-video frame information and a kernel correlation filtering algorithm. The basic implementation process is as follows:
1. Selecting and constructing the pre-training data set
In order to be suitable for various scenes, a license plate localization and recognition data set with sufficient sample diversity has to be selected. The invention adopts the license plate data set CCPD, which was proposed in the literature "Zhenbo Xu, Ajin Meng, Nanxue Lu, et al., 'Towards End-to-End License Plate Detection and Recognition: A Large Dataset and Baseline,' 15th European Conference on Computer Vision, Munich, Germany, September 8-14, 2018, Proceedings, Part XIII, Eprint Arxiv, 2018." Because the images in the original data set do not have a uniform size, the invention resizes them so that they all have the same size, i.e., each image is scaled to 512 x 512, and the scaled images together with their annotation information form the pre-training data set.
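As a minimal illustration of this preprocessing (a sketch only: the annotation is assumed to be a pixel-coordinate bounding box, and the function and field layout are hypothetical rather than the actual CCPD format), the scaling of one image and its label could look like:

```python
from PIL import Image

TARGET = 512  # side length used for the pre-training data set

def scale_sample(image_path, box):
    """Scale one image to 512 x 512 and rescale its plate bounding box.

    box is assumed to be (x_min, y_min, x_max, y_max) in pixels of the
    original image; the real CCPD annotation layout may differ.
    """
    img = Image.open(image_path)
    w, h = img.size
    img = img.resize((TARGET, TARGET), Image.BILINEAR)
    sx, sy = TARGET / w, TARGET / h
    x0, y0, x1, y1 = box
    return img, (x0 * sx, y0 * sy, x1 * sx, y1 * sy)
```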
2. Data set enhancement
The sample data set is enhanced in a targeted way according to its distribution. The CCPD data set contains complex natural conditions such as blurred license plates, occlusion, rainy and snowy weather and strong backlighting; for these conditions, data enhancement such as flipping by a random angle, cropping to a random size and color transformation of a random degree is applied to the images of the pre-training data set, and all images before and after enhancement, together with their annotation information, form the final training data set.
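A sketch of such an augmentation pipeline using torchvision transforms is shown below; the rotation angle, crop scale and color-jitter strengths are illustrative assumptions rather than values prescribed by the invention, and for detection training the same geometric transforms would also have to be applied to the annotated plate coordinates.

```python
import torchvision.transforms as T

# Illustrative augmentation pipeline; the parameter values are assumptions.
augment = T.Compose([
    T.RandomRotation(degrees=15),                     # flip/rotate by a random angle
    T.RandomResizedCrop(size=512, scale=(0.7, 1.0)),  # crop a random-sized region, resize back to 512 x 512
    T.ColorJitter(brightness=0.4, contrast=0.4,       # color transformation of a random degree
                  saturation=0.4, hue=0.1),
    T.ToTensor(),
])
```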
3. License plate detection model training
The training data set obtained after enhancement is input into the license plate detection network model for training, and the trained network parameters are saved to obtain the license plate detection model used in the following steps.
A license plate has the text characteristics of a fixed set of character categories, an elongated rectangular shape and a clearly closed edge contour. Based on these features, the license plate can be detected as text. The invention feeds the enhanced data set into the license plate detection model for supervised training, continuously adjusting the network parameters and modifying the model structure and loss function; the influence of the two classical networks VgNet and ResNet as the backbone on the model performance is compared, and VgNet is finally chosen as the backbone in consideration of time performance.
Therefore, the invention takes the classical network VgNet as the backbone, extracts feature maps of different sizes from different stages of the backbone and aggregates them, so that severe changes of license plate scale can be handled; the overall network resembles a pyramid structure. Concretely, the network uses VgNet as the backbone and extracts feature maps of different sizes, up-samples them into feature maps of the same size but different numbers of channels, and merges them; the merged feature map is fed into a fully connected layer that outputs the confidence and the license plate region coordinates (cx, cy, w, h, score), where cx is the abscissa of the center point of the license plate region, cy is its ordinate, w is the length of the license plate region, h is its width, and score is the confidence.
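The following PyTorch sketch illustrates this pyramid-style detection head; it uses the torchvision VGG-16 feature extractor as a stand-in for the VgNet backbone, and the stage split points, pooling head and channel counts are illustrative assumptions rather than the exact architecture of the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class PlateDetector(nn.Module):
    """Multi-stage features are up-sampled to a common size, merged and
    mapped by a fully connected head to (cx, cy, w, h, score)."""

    def __init__(self):
        super().__init__()
        features = vgg16(weights=None).features
        # Split the backbone into three stages (indices are illustrative).
        self.stage1 = features[:16]    # 256 channels, 1/4 resolution
        self.stage2 = features[16:23]  # 512 channels, 1/8 resolution
        self.stage3 = features[23:30]  # 512 channels, 1/16 resolution
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(256 + 512 + 512, 5),  # cx, cy, w, h, score
        )

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        # Up-sample the deeper maps to the size of the shallowest one and merge.
        f2u = F.interpolate(f2, size=f1.shape[-2:], mode="bilinear", align_corners=False)
        f3u = F.interpolate(f3, size=f1.shape[-2:], mode="bilinear", align_corners=False)
        merged = torch.cat([f1, f2u, f3u], dim=1)
        out = self.head(merged)
        return out[:, :4], torch.sigmoid(out[:, 4])  # (cx, cy, w, h), score
```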
4. License plate recognition model training
The CCPD data set provides both license plate localization annotations and license plate number annotations, and the recognition model and the detection model are two independent models. Therefore, in the training stage of the recognition part, the license plate region of each image in the training data set is first cropped according to its annotation; the cropped license plate region images and the license plate number annotations form the training data set of the license plate recognition model, which is input into the license plate recognition network model for training to obtain the trained license plate recognition model.
Because a Chinese license plate number is composed of Chinese characters, capital letters and digits arranged in the plate region according to fixed rules, the invention adopts a character recognition network based on a visual attention mechanism as the license plate recognition network, namely a network obtained by connecting a 7-layer convolutional neural network (CNN), an attention module and a long short-term memory network (LSTM). The seven-layer CNN extracts image features with representation capability: the feature map is split into slices along the horizontal direction, each slice corresponding to one feature vector, and the convolutional receptive fields overlap so that the features carry context. An attention module stacked on top of the CNN then learns the regions where the license plate characters are located, and finally the LSTM serves as a decoder and outputs the final text sequence. The attention module works as follows: the previous output and the state of the RNN hidden layer are combined, a linear layer and a softmax are applied to obtain the attention weight W, W is multiplied by the feature matrix output by the CNN to obtain a feature map with attention, and this feature map is fed to the next LSTM layer.
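The sketch below illustrates one decoding step of such an attention module: the previous output and the hidden state form a query, a linear layer followed by a softmax yields the attention weights W over the horizontal feature slices, and the attended feature is fed to an LSTM cell. The feature dimensions, the additive scoring form and the class count are illustrative assumptions, not values fixed by the invention.

```python
import torch
import torch.nn as nn

class AttentionStep(nn.Module):
    """One attention-decoding step over horizontal CNN feature slices."""

    def __init__(self, feat_dim=256, emb_dim=64, hid_dim=256, num_classes=68):
        super().__init__()
        self.proj_feat = nn.Linear(feat_dim, hid_dim)
        self.proj_query = nn.Linear(emb_dim + hid_dim, hid_dim)
        self.score = nn.Linear(hid_dim, 1)          # linear layer before the softmax
        self.cell = nn.LSTMCell(feat_dim, hid_dim)  # LSTM decoder step
        self.classifier = nn.Linear(hid_dim, num_classes)

    def forward(self, cnn_feats, prev_out, hidden, cell_state):
        # cnn_feats: (B, T, C) slices; prev_out: (B, E); hidden/cell_state: (B, H).
        # Combine the previous output with the RNN hidden state into a query.
        query = self.proj_query(torch.cat([prev_out, hidden], dim=1))       # (B, H)
        # Linear + softmax over the T slices gives the attention weights W.
        energy = self.score(torch.tanh(self.proj_feat(cnn_feats) + query.unsqueeze(1)))
        w = torch.softmax(energy.squeeze(-1), dim=1)                        # (B, T)
        # Multiply W by the CNN feature matrix to get the attended feature.
        attended = torch.bmm(w.unsqueeze(1), cnn_feats).squeeze(1)          # (B, C)
        hidden, cell_state = self.cell(attended, (hidden, cell_state))
        return self.classifier(hidden), w, hidden, cell_state
```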
5. License plate positioning and tracking
After license plate region detection is performed on the video frames, the displacement and scale of the plate between adjacent frames do not change much. In order to fully exploit the context between different frames, the invention uses the KCF kernel correlation filtering algorithm to locate and track the plate between adjacent frames: the detection position in the next frame is predicted from the localization results of the previous frames, and, to strengthen the detection part, the KCF tracking result and the detection result of the next frame are weighted-averaged as the final detection result. The KCF algorithm achieves a good balance between time performance and tracking quality; it is described in the literature "Henriques, Joao F., Caseiro, Rui, Martins, Pedro, & Batista, Jorge, High-speed tracking with kernelized correlation filters, IEEE Transactions on Pattern Analysis & Machine Intelligence, 37(3), 583-596." Concretely, a correlation filter is trained from the information of the current frame and the previous frames and is then correlated with the newly input frame; the resulting confidence map is the predicted tracking result. Since the localization information provided by frames closer to the current frame is more relevant, the previous frames are given different information weights.
Specifically, three cases are handled separately (an illustrative sketch of this detection-tracking fusion is given after the list):
(1) The 1st to F-th frame images of the license plate video data set to be detected are input into the license plate detection model obtained in step 3, the coordinates of the 4 vertices of the license plate region of each image are obtained, and each original image is cropped according to its vertex coordinates to obtain its license plate region image; F ranges from 5 to 30.
(2) For the k-th frame image of the license plate video data set to be detected, where F+1 ≤ k ≤ N-1: the k-th frame and its previous F frames are each input into the license plate detection model obtained in step 3, the coordinates of the 4 vertices of the license plate region of each image are obtained, and the original images are cropped according to these vertex coordinates to obtain the license plate region image of each frame; tracking prediction is then performed for the (k+1)-th frame with the KCF algorithm using these images, giving the predicted license plate region image of the (k+1)-th frame; the (k+1)-th frame is also input into the license plate detection model obtained in step 3 to obtain the coordinates of the 4 vertices of its license plate region, and the image cropped from the (k+1)-th frame according to these vertex coordinates is weighted-averaged with the predicted license plate region image of the (k+1)-th frame to obtain the final license plate region image of the (k+1)-th frame; N is the total number of video frames contained in the data set.
(3) The N-th frame image of the license plate video data set to be detected and its previous F frames are each input into the license plate detection model obtained in step 3, the coordinates of the 4 vertices of the license plate region of each image are obtained, and the original images are cropped according to these vertex coordinates to obtain the license plate region image of each frame; tracking prediction is then performed for the N-th frame with the KCF algorithm using these images, and the obtained predicted image is taken as the license plate region image of the N-th frame.
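The sketch below illustrates the idea of fusing per-frame detections with KCF tracking predictions. It is a simplification of the procedure above: it averages predicted and detected bounding boxes rather than the cropped region images, it re-initializes a single OpenCV KCF tracker each frame instead of weighting the previous F frames individually, and the fusion weight alpha is an assumption. It requires opencv-contrib-python; in some OpenCV versions the tracker is exposed as cv2.legacy.TrackerKCF_create.

```python
import cv2
import numpy as np

def fuse_detection_and_tracking(frames, detect_plate, alpha=0.6):
    """Fuse per-frame plate detections with KCF tracking predictions.

    frames:       iterable of BGR images (the video frames).
    detect_plate: callable returning an (x, y, w, h) plate box for a frame --
                  a stand-in for the trained detection model.
    alpha:        weight of the detection box in the weighted average.
    Returns one fused (x, y, w, h) box per frame.
    """
    boxes, tracker = [], None
    for frame in frames:
        det = np.array(detect_plate(frame), dtype=float)
        if tracker is None:
            fused = det                                  # early frames: detection only
        else:
            ok, pred = tracker.update(frame)             # KCF prediction from past frames
            fused = alpha * det + (1 - alpha) * np.array(pred, dtype=float) if ok else det
        boxes.append(tuple(fused))
        # Re-initialize the tracker on the fused box so that later frames
        # benefit from the accumulated context.
        tracker = cv2.TrackerKCF_create()
        tracker.init(frame, tuple(int(v) for v in fused))
    return boxes
```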
6. Multi-frame recognition voting
The license plate region image of each frame obtained in step 5 is input into the license plate recognition model obtained in step 4, and the obtained result is taken as the initial recognition result.
In a real scene, because the background of the license plate changes, the recognition results of adjacent frames may differ considerably. The invention therefore votes on each character position using the initial recognition result of the current frame and the initial recognition results of the previous M frames, and takes the license plate number formed by the characters with the most votes as the recognition result of that frame; if fewer than M frames precede the current frame, the vote is taken over all preceding frames. M ranges from 5 to 20. In this way the license plate recognition result for the complete video segment is finally obtained.
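A minimal sketch of this per-character vote is shown below; it assumes all initial recognition results are strings of the same length (results of differing length would first need alignment), and the window size m plays the role of M.

```python
from collections import Counter

def vote_plate(history, current, m=10):
    """Per-character majority vote over the current frame and up to the
    previous m frames' initial recognition results."""
    window = history[-m:] + [current]
    voted = []
    for chars in zip(*window):                       # one character position at a time
        voted.append(Counter(chars).most_common(1)[0][0])
    return "".join(voted)

# Example: three noisy readings of the same plate agree after voting.
print(vote_plate(["沪A12345", "沪A12845"], "沪A12345"))  # -> 沪A12345
```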
To verify the effect of the method of the invention, simulation experiments were performed on a platform with a CPU E5-2697 v2 @ 2.70 GHz, 128 GB of memory and a GeForce 1080Ti graphics processor, running the Red Hat 6.5 operating system and using the PyTorch framework with the Python language. The method in the literature "Gabriel Resende Gonçalves, David Menotti, and William Robson Schwartz, 'License plate recognition based on temporal redundancy,' 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), IEEE, 2016" was selected as comparative method 1; the method in the literature "Silva, Sergio Montazzolli, and Claudio Rosito Jung, 'Real-time Brazilian license plate detection and recognition using deep convolutional neural networks,' 2017 30th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), IEEE, 2017" as comparative method 2; and the method in the literature "Laroca, Rayson, et al., 'A robust real-time automatic license plate recognition based on the YOLO detector,' 2018 International Joint Conference on Neural Networks (IJCNN), IEEE, 2018" as comparative method 3, and their effect was compared with that of the method of the invention. The different methods were tested on the SegPlate license plate video dataset proposed in the literature "Gabriel Resende Gonçalves, et al., 'Real-time automatic license plate recognition through deep multi-task networks,' 2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), IEEE, 2018", with F and M both fixed at 10; the recognition accuracy over all frames and the time performance (FPS) of the different methods were compared, and the test results are shown in Table 1. It can be seen that the recognition accuracy of the method of the invention is higher than that of all comparative methods, while its time performance is better than that of comparative methods 1 and 3.
TABLE 1
Method                      Recognition accuracy    FPS
Comparative method 1        81.8%                   28
Comparative method 2        63.1%                   55
Comparative method 3        85.4%                   36
Method of the invention     87.6%                   38
Although the method is demonstrated in a complex driving scene, it is a robust license plate detection and recognition method suitable for other complex natural scenes and is not limited to driving scenes. In addition, the method fully mines the inter-frame information in the video and strengthens the recognition performance on each frame, so it generalizes well; time performance was also taken into account in the design, so offline video segments can be processed efficiently.

Claims (1)

1. A license plate detection and recognition method based on multi-video frame information and a kernel correlation filtering algorithm is characterized by comprising the following steps:
step 1: scaling each image in the license plate data set CCPD to a size of 512 x 512, all scaled images and their annotation information forming the pre-training data set;
step 2: applying enhancement processing to each image of the pre-training data set, all images before and after enhancement together with their annotation information forming the final training data set; the enhancement processing comprises flipping by an arbitrary angle, cropping to an arbitrary size and color transformation of an arbitrary degree;
step 3: inputting the training data set obtained in step 2 into a license plate detection network model for training to obtain a trained license plate detection model; the license plate detection network extracts feature maps of different sizes from a VgNet backbone, up-samples them into feature maps of the same size but different numbers of channels, merges them, feeds the merged feature map into a fully connected layer, and outputs the confidence and license plate region coordinates (cx, cy, w, h, score), where cx and cy are the abscissa and ordinate of the center point of the license plate region, w and h are the length and width of the region, and score is the confidence;
step 4: cropping the license plate region from each image of the training data set, all cropped images and their license plate annotations forming the training data set of the license plate recognition model; inputting this data set into a character recognition network based on a visual attention mechanism for training, the trained network serving as the final license plate recognition model; the character recognition network based on the visual attention mechanism is formed by connecting a 7-layer CNN, an attention module and an LSTM network, where the 7-layer CNN extracts image features with representation capability, and the attention module combines the previous output with the state of the RNN hidden layer, applies a linear layer and a softmax to obtain the attention weight W, multiplies W by the feature matrix output by the CNN to obtain a feature map with attention, and feeds it to the LSTM layer, which serves as a decoder and outputs the final text sequence;
step 5: inputting the 1st to F-th frame images of the license plate video data set to be detected into the license plate detection model obtained in step 3, obtaining the coordinates of the 4 vertices of the license plate region of each image, and cropping each original image according to its vertex coordinates to obtain its license plate region image; F ranges from 5 to 30;
for the k-th frame image of the license plate video data set to be detected, where F+1 ≤ k ≤ N-1, inputting it into the license plate detection model obtained in step 3 to obtain the coordinates of the 4 vertices of its license plate region, cropping the original image according to these vertex coordinates to obtain its license plate region image, performing tracking prediction with the KCF kernel correlation filtering algorithm on this image and the license plate region images obtained from the previous F frames, and taking the resulting predicted image as the predicted license plate region image of the (k+1)-th frame; inputting the (k+1)-th frame image into the license plate detection model obtained in step 3 to obtain the coordinates of the 4 vertices of its license plate region, and taking the weighted average of the image cropped from the (k+1)-th frame according to these vertex coordinates and the predicted license plate region image of the (k+1)-th frame as the final license plate region image of the (k+1)-th frame; N is the total number of video frames contained in the data set;
step 6: inputting the license plate region image of each frame obtained in step 5 into the license plate recognition model obtained in step 4, taking the result as the initial recognition result; then voting on each character using the initial recognition result of the current frame and the initial recognition results of the previous M frames (if fewer than M frames precede the current frame, voting over all preceding frames), and taking the license plate number formed by the characters with the most votes as the recognition result of that frame;
M ranges from 5 to 20.
CN202010138492.6A 2020-03-03 2020-03-03 License plate detection and recognition method based on multi-video frame information and kernel correlation filtering algorithm Active CN111368830B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010138492.6A CN111368830B (en) 2020-03-03 2020-03-03 License plate detection and recognition method based on multi-video frame information and kernel correlation filtering algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010138492.6A CN111368830B (en) 2020-03-03 2020-03-03 License plate detection and recognition method based on multi-video frame information and kernel correlation filtering algorithm

Publications (2)

Publication Number Publication Date
CN111368830A CN111368830A (en) 2020-07-03
CN111368830B true CN111368830B (en) 2024-02-27

Family

ID=71208415

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010138492.6A Active CN111368830B (en) 2020-03-03 2020-03-03 License plate detection and recognition method based on multi-video frame information and kernel correlation filtering algorithm

Country Status (1)

Country Link
CN (1) CN111368830B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914837A (en) * 2020-07-10 2020-11-10 北京嘉楠捷思信息技术有限公司 License plate detection method, device, equipment and storage medium
CN112597888B (en) * 2020-12-22 2024-03-08 西北工业大学 Online education scene student attention recognition method aiming at CPU operation optimization
CN112997190B (en) * 2020-12-29 2024-01-12 深圳市锐明技术股份有限公司 License plate recognition method and device and electronic equipment
CN112836683A (en) * 2021-03-04 2021-05-25 广东建邦计算机软件股份有限公司 License plate recognition method, device, equipment and medium for portable camera equipment
CN113269105A (en) * 2021-05-28 2021-08-17 西安交通大学 Real-time faint detection method, device, equipment and medium in elevator scene
CN113408549B (en) * 2021-07-14 2023-01-24 西安电子科技大学 Few-sample weak and small target detection method based on template matching and attention mechanism
CN114677500B (en) * 2022-05-25 2022-08-23 松立控股集团股份有限公司 Weak surveillance video license plate recognition method based on eye tracker point annotation information
CN115019297B (en) * 2022-08-04 2022-12-09 之江实验室 Real-time license plate detection and identification method and device based on color augmentation

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2790130A1 (en) * 2013-04-08 2014-10-15 Cogisen SRL Method for object recognition
CN107066953A (en) * 2017-03-22 2017-08-18 北京邮电大学 It is a kind of towards the vehicle cab recognition of monitor video, tracking and antidote and device
CN107240122A (en) * 2017-06-15 2017-10-10 国家新闻出版广电总局广播科学研究院 Video target tracking method based on space and time continuous correlation filtering
CN108447091A (en) * 2018-03-27 2018-08-24 北京颂泽科技有限公司 Object localization method, device, electronic equipment and storage medium
CN108734189A (en) * 2017-04-20 2018-11-02 天津工业大学 Vehicle License Plate Recognition System based on atmospherical scattering model and deep learning under thick fog weather
CN109448027A (en) * 2018-10-19 2019-03-08 成都睿码科技有限责任公司 A kind of adaptive, lasting motion estimate method based on algorithm fusion
CN109544603A (en) * 2018-11-28 2019-03-29 上饶师范学院 Method for tracking target based on depth migration study
CN110427871A (en) * 2019-07-31 2019-11-08 长安大学 A kind of method for detecting fatigue driving based on computer vision
CN110472496A (en) * 2019-07-08 2019-11-19 长安大学 A kind of traffic video intelligent analysis method based on object detecting and tracking
CN110619279A (en) * 2019-08-22 2019-12-27 天津大学 Road traffic sign instance segmentation method based on tracking

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8682036B2 (en) * 2012-04-06 2014-03-25 Xerox Corporation System and method for street-parking-vehicle identification through license plate capturing
US11030466B2 (en) * 2018-02-11 2021-06-08 Nortek Security & Control Llc License plate detection and recognition system

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2790130A1 (en) * 2013-04-08 2014-10-15 Cogisen SRL Method for object recognition
CN107066953A (en) * 2017-03-22 2017-08-18 北京邮电大学 It is a kind of towards the vehicle cab recognition of monitor video, tracking and antidote and device
CN108734189A (en) * 2017-04-20 2018-11-02 天津工业大学 Vehicle License Plate Recognition System based on atmospherical scattering model and deep learning under thick fog weather
CN107240122A (en) * 2017-06-15 2017-10-10 国家新闻出版广电总局广播科学研究院 Video target tracking method based on space and time continuous correlation filtering
CN108447091A (en) * 2018-03-27 2018-08-24 北京颂泽科技有限公司 Object localization method, device, electronic equipment and storage medium
CN109448027A (en) * 2018-10-19 2019-03-08 成都睿码科技有限责任公司 A kind of adaptive, lasting motion estimate method based on algorithm fusion
CN109544603A (en) * 2018-11-28 2019-03-29 上饶师范学院 Method for tracking target based on depth migration study
CN110472496A (en) * 2019-07-08 2019-11-19 长安大学 A kind of traffic video intelligent analysis method based on object detecting and tracking
CN110427871A (en) * 2019-07-31 2019-11-08 长安大学 A kind of method for detecting fatigue driving based on computer vision
CN110619279A (en) * 2019-08-22 2019-12-27 天津大学 Road traffic sign instance segmentation method based on tracking

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
License plate localization and recognition method based on deep learning (基于深度学习的车牌定位和识别方法); Li Xiangpeng et al.; Journal of Computer-Aided Design & Computer Graphics; 20190615; full text *
Design of a license plate video tracking and recognition system (车牌视频跟踪识别系统的设计); Huang Baosheng et al.; Modern Electronics Technique; 20130515; full text *

Also Published As

Publication number Publication date
CN111368830A (en) 2020-07-03

Similar Documents

Publication Publication Date Title
CN111368830B (en) License plate detection and recognition method based on multi-video frame information and kernel correlation filtering algorithm
CN109815867A (en) A kind of crowd density estimation and people flow rate statistical method
CN114202696A (en) SAR target detection method and device based on context vision and storage medium
CN109145872B (en) CFAR and Fast-RCNN fusion-based SAR image ship target detection method
CN112884064A (en) Target detection and identification method based on neural network
CN114220015A (en) Improved YOLOv 5-based satellite image small target detection method
CN110399840B (en) Rapid lawn semantic segmentation and boundary detection method
CN111915583B (en) Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene
CN104408482A (en) Detecting method for high-resolution SAR (Synthetic Aperture Radar) image object
CN110008900B (en) Method for extracting candidate target from visible light remote sensing image from region to target
CN113743505A (en) Improved SSD target detection method based on self-attention and feature fusion
CN111489330A (en) Weak and small target detection method based on multi-source information fusion
Tao et al. An adaptive frame selection network with enhanced dilated convolution for video smoke recognition
CN116363535A (en) Ship detection method in unmanned aerial vehicle aerial image based on convolutional neural network
CN114998815A (en) Traffic vehicle identification tracking method and system based on video analysis
Li et al. Region NMS-based deep network for gigapixel level pedestrian detection with two-step cropping
CN113205494B (en) Infrared small target detection method and system based on adaptive scale image block weighting difference measurement
Wang et al. Multiscale traffic sign detection method in complex environment based on YOLOv4
CN114463800A (en) Multi-scale feature fusion face detection and segmentation method based on generalized intersection-parallel ratio
Yang et al. SAR image target detection and recognition based on deep network
CN111597939A (en) High-speed rail line nest defect detection method based on deep learning
CN111986233B (en) Large-scene minimum target remote sensing video tracking method based on feature self-learning
CN116612398A (en) Unmanned aerial vehicle inspection photo pole tower license plate character recognition method based on CTPN algorithm
CN114821441A (en) Deep learning-based airport scene moving target identification method combined with ADS-B information
CN114926456A (en) Rail foreign matter detection method based on semi-automatic labeling and improved deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant