CN113361467A

CN113361467A - License plate recognition method based on domain adaptation

Info

Publication number: CN113361467A
Application number: CN202110737913.1A
Authority: CN
Inventors: 郑嘉文; 邓金红; 丁建鹏; 段立新; 李文
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2021-06-30
Filing date: 2021-06-30
Publication date: 2021-09-07

Abstract

The invention provides a license plate recognition method based on domain adaptation. First, a Gaussian mixture model is used to model the background, and the computing resources wasted on background frames are reduced by determining when the foreground appears. Second, by using the domain adaptation technique from transfer learning, not all data needs to be labeled, which greatly reduces the labeling cost of license plate data. Third, the proposed license plate detection based on image segmentation locates the license plate precisely at the pixel level, and a cascade correction method operating on the segmentation map is proposed to rectify the license plate image. Finally, the network model trained with domain adaptation generalizes well: when facing a new scene, only the corresponding image data needs to be collected, and a model that performs robustly in the new scene can be trained without laborious labeling.

Description

License plate recognition method based on domain adaptation

Technical Field

The invention relates to machine learning and computer vision application technology, and in particular to license plate detection and recognition technology.

Background

In the field of computer vision, object detection and recognition usually require a large amount of labeled training data. However, acquiring and labeling training data is a very time-consuming and labor-intensive task.

License plate detection and recognition is an important branch of applied computer vision and is widely used in public transportation, autonomous driving, intelligent parking and similar fields.

Prior-art solutions generally work as follows: a camera acquires video stream data, the video stream is decoded in software or hardware, a deep learning model processes the video frames, and a license plate recognition result is returned. When processing video frames, existing schemes generally process only a subset of frames by frame sampling, which reduces the consumption of computing resources; however, in periods with few targets a large number of background frames without targets are still processed, and computing resources are wasted. In addition, deep learning models need a large amount of labeled data, but collecting and labeling license plate data is time-consuming and costly in manpower, material and money. A trained model can also only be deployed in a specific scene: for example, a network model trained on data acquired by a camera at a parking lot gate can only be deployed at similar gates, and its performance drops markedly if it is deployed elsewhere in the parking lot, so the model lacks generality. Furthermore, existing license plate recognition models are generally based on classical object detection models such as the R-CNN series and the YOLO series, which regress a rectangular detection box from anchor priors and therefore cannot locate the license plate accurately at the pixel level.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a license plate recognition method that generalizes well and locates the license plate accurately at the pixel level.

The technical solution adopted by the invention to solve this problem is a license plate recognition method based on domain adaptation, comprising the following steps:

S1, acquiring a video stream;

S2, decoding the video stream into images;

S3, modeling the image background using a Gaussian mixture model, and screening out image frames containing moving objects;

S4, vehicle detection with a deep learning model based on domain adaptation:

During training of the deep learning model based on domain adaptation, the feature extractor of the YOLO model first extracts domain-invariant features from the image region containing the moving object; the extracted features are then fed into the classification and regression module of the YOLO model and into a domain classifier. The domain classifier judges whether the input features come from the source domain or the target domain, and the classification and regression module of the YOLO model locates the vehicle region. The loss function used in training the deep learning model based on domain adaptation consists of the loss function of the YOLO model and the adversarial loss of the domain classifier, the latter being realized through a gradient reversal layer between the feature extractor and the domain classifier. After training is finished, the deep learning model based on domain adaptation directly receives an input image region containing a moving object and outputs the vehicle region;

S5, license plate detection with a segmentation model based on domain adaptation:

inputting the detected vehicle region into the domain-adaptive feature extractor of the domain-adaptive segmentation model, wherein the feature extractor extracts domain-invariant features through adversarial training; the extracted features are then input into the decoding output layer of the domain-adaptive segmentation model, which outputs the license plate segmentation result;

S6, cascade correction of the license plate: performing a first, coarse-grained correction through affine transformation, using the minimum circumscribed rectangle of the license plate segmentation result; then performing a second, fine-grained correction through affine transformation, based on a straight line fitted to the left contour of the license plate region after the first coarse-grained correction;

S7, license plate character recognition: inputting the twice-corrected license plate region into an object detection model, which outputs the detected characters in order;

S8, voting based on the time axis: performing statistical voting over all license plate character recognition results of the current vehicle up to the current time, and outputting the characters with the most votes as the license plate characters.

The advantages of the invention are as follows. First, a Gaussian mixture model is used to model the background, and the computing resources the model wastes on background frames are reduced by determining when the foreground appears. Second, by using the domain adaptation technique from transfer learning, the data does not all need to be labeled, which greatly reduces the labeling cost of license plate data. Third, license plate detection based on image segmentation locates the license plate accurately at the pixel level, and a cascade correction method operating on the segmentation map is provided to correct the license plate image. Finally, the network model trained with domain adaptation generalizes well: when facing a new scene, only the corresponding image data needs to be collected, and a model that performs robustly in the new scene can be trained without laborious labeling.

Drawings

FIG. 1 is a schematic diagram of a system;

FIG. 2 is a schematic process flow diagram;

FIG. 3 is a schematic diagram of a Gaussian mixture model for determining a foreground;

FIG. 4 is a schematic diagram of text detection based on domain adaptation;

FIG. 5 shows the CCPD data set: Chinese license plates photographed with a mobile phone;

FIG. 6 shows license plates at the gate of a foreign parking lot;

FIG. 7 shows the input data and the output license plate segmentation map;

FIG. 8 is the license plate image after the first correction;

FIG. 9 shows the segmentation map after the first correction (top) and its contour (bottom);

FIG. 10 shows the license plate coordinate system established from the contour;

FIG. 11 is a schematic diagram of determining the left boundary of the license plate;

FIG. 12 is a schematic diagram of the license plate after the second, fine correction;

FIG. 13 is a schematic diagram of a detected license plate image;

FIG. 14 is a schematic diagram of a vehicle passing through a gate;

FIG. 15 shows the results generated by time-axis voting.

Detailed Description

As shown in FIG. 1, the license plate recognition system for implementing the method of the present invention includes:

a license plate data acquisition and preprocessing device, used for acquiring the required image and video data and preprocessing the acquired data, for example hardware-decoding the video stream into images and modeling the background to screen out images containing moving objects;

a domain-adaptive license plate detection device, used for obtaining a vehicle detection result from the provided image data with a domain adaptation method, and then performing domain-adaptive license plate detection to segment the license plate;

and a license plate correction and recognition device, used for obtaining the license plate recognition result from the license plate region through license plate correction, license plate recognition and post-processing.

As shown in FIG. 2, the license plate recognition method specifically includes the following steps:

S1, acquiring a video stream

In this embodiment, the video stream to be recognized is obtained in real time from an installed camera; the installation location and angle of the camera are not limited. The video to be recognized may also be read directly from a video file, which is not limited in this embodiment.

S2, hardware decoding of the video stream into images

Hardware is used to decode the video stream into images. The decoding method depends on the hardware used, and the decoding hardware is not limited in this embodiment.

S3, modeling the background and screening out images containing moving objects:

A video stream acquired from a camera generally contains tens of frames per second, so processing every frame places high demands on computing resources. To reduce the consumption of computing resources, existing license plate recognition schemes generally perform frame sampling (for example, when 4 frames per second are sampled uniformly, one input frame is processed every 250 ms and all intermediate frames are discarded). Although uniform frame sampling reduces the amount of computation, it cannot avoid the problem that, when targets are rare, the license plate recognizer still spends a large amount of resources on video frames that contain no target.
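The arithmetic of uniform frame sampling can be illustrated with a minimal sketch. The function names and the 24 frames-per-second source rate are assumptions chosen for illustration; only the 4 frames-per-second sampling rate and the 250 ms interval come from the text.

```python
def sampling_interval_ms(frames_per_second):
    """Time between two processed frames, in milliseconds."""
    return 1000.0 / frames_per_second


def uniform_sample(num_frames, source_fps, target_fps):
    """Indices of the frames kept when target_fps frames are sampled out of source_fps."""
    step = source_fps / target_fps      # e.g. 24 / 4 means every 6th frame is kept
    kept, next_pick = [], 0.0
    for index in range(num_frames):
        if index >= next_pick:
            kept.append(index)          # this frame is processed
            next_pick += step           # all frames before next_pick are discarded
    return kept


# Assumed source rate of 24 fps, sampled at 4 fps as in the text.
print(sampling_interval_ms(4))
print(uniform_sample(24, 24, 4))
```

Note that every sampled frame is processed whether or not it contains a target, which is the waste the background model is meant to remove.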

To reduce the computing resources wasted when the license plate recognizer runs idle, the image background is modeled. Since the license plate target in the use scene is generally in motion, a Gaussian mixture model is used to model the image background, frames containing only static objects are filtered out at low computational cost, and the license plate recognition model only needs to process frames with moving objects, which reduces the waste of computing resources. The specific steps are as follows:

S301, modeling the image background using a Gaussian mixture model;

S302, filtering out background images using the Gaussian mixture model and screening out foreground images.

For convenience of description, a region of the image containing a moving object is called the foreground, and everything else is called the background. Background pixels typically far outnumber foreground pixels, so the background pixels are modeled with a Gaussian mixture model, and pixels that do not fit the model distribution are classified as foreground pixels. As shown in FIG. 3, the white pixel region, a moving vehicle, is determined to be foreground, and the other, stationary parts, represented by black pixels, are determined to be background.
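The foreground screening described above can be sketched in plain Python as a per-pixel mixture of Gaussians over gray values. This is a simplified illustration, not the patented implementation: the class names, the parameter values (number of components, learning rate, matching threshold, background weight ratio, foreground fraction) and the use of one fixed learning rate for weights, means and variances are all assumptions.

```python
import math


class PixelGMM:
    """Mixture of up to k Gaussians over the gray value of one pixel (assumed parameters)."""

    def __init__(self, k=3, alpha=0.05, init_var=225.0, match_sigmas=2.5,
                 background_ratio=0.7):
        self.k, self.alpha = k, alpha
        self.init_var, self.match_sigmas = init_var, match_sigmas
        self.background_ratio = background_ratio
        self.components = []                      # each entry: [weight, mean, variance]

    def _is_background(self, comp):
        # The most stable components, up to the assumed weight ratio, form the background.
        cumulative = 0.0
        for c in self.components:
            cumulative += c[0]
            if c is comp:
                return True
            if cumulative > self.background_ratio:
                return False
        return False

    def update(self, value):
        """Update the model with one observation; return True if it is foreground."""
        # Heavy, narrow components are the most stable ones, so they are checked first.
        self.components.sort(key=lambda c: c[0] / math.sqrt(c[2]), reverse=True)
        matched = next((c for c in self.components
                        if abs(value - c[1]) <= self.match_sigmas * math.sqrt(c[2])),
                       None)
        foreground = matched is None or not self._is_background(matched)
        for c in self.components:
            c[0] = (1 - self.alpha) * c[0] + (self.alpha if c is matched else 0.0)
        if matched is not None:
            diff = value - matched[1]
            matched[1] += self.alpha * diff
            matched[2] = max(4.0, matched[2] + self.alpha * (diff * diff - matched[2]))
        else:
            if len(self.components) == self.k:
                self.components.pop()             # drop the least stable component
            self.components.append([self.alpha, float(value), self.init_var])
        total = sum(c[0] for c in self.components)
        for c in self.components:
            c[0] /= total
        return foreground


class FrameScreener:
    """Keeps a frame only if enough of its pixels are foreground (assumed threshold)."""

    def __init__(self, num_pixels, min_foreground_fraction=0.1):
        self.models = [PixelGMM() for _ in range(num_pixels)]
        self.min_fraction = min_foreground_fraction

    def has_moving_object(self, frame):
        flags = [model.update(value) for model, value in zip(self.models, frame)]
        return sum(flags) / len(flags) >= self.min_fraction
```

In this sketch the very first frame is always reported as foreground because no background has been learned yet. A library implementation, for example OpenCV's MOG2 background subtractor, could play the same role in practice.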

S4, vehicle detection with a deep learning model based on domain adaptation

In this embodiment, a deep learning model based on domain adaptation is proposed to detect vehicles in the image.

As is well known, conventional object detection models typically require the images in the training set and the test set to follow the same distribution, but in practical applications this requirement is difficult to satisfy. Depending on the actual deployment environment, the test images may come from different weather conditions, times of day and image acquisition devices, so in practice there is usually a large domain gap between the test set and the training set. To solve this problem, more data from the test scene could be collected and labeled, but labeling data is time-consuming and expensive. For such scenes, this embodiment proposes a deep learning model based on domain adaptation to detect vehicles in the image, so that when the model encounters a new scene, the data of the new scene no longer needs to be labeled. Specifically, we have a large labeled training set, called the source domain, which contains the images X_i and the corresponding labels (B_i, C_i), i = 1, ..., N_s, where X_i denotes the i-th image, B_i and C_i denote the corresponding annotation information, namely the target coordinates and the class, and N_s is the number of source-domain images. Correspondingly, we have unlabeled data from the new scene, called the target domain, which contains N_t images, where N_t is the number of target-domain images.

General object detection models based on domain adaptation are built on the two-stage strategy of Fast R-CNN; such models have high detection accuracy but are slow, and can hardly meet the real-time requirements of real applications. Considering the real-time performance and good detection accuracy of YOLO, this embodiment uses a YOLO-based deep learning model with domain adaptation to detect vehicle targets in the image. During training, the feature extractor of the YOLO model extracts domain-invariant features from the image region containing the moving object; the extracted features are then fed into the classification and regression module of the YOLO model and into a domain classifier. The domain classifier judges whether the input features come from the source domain or the target domain, and the classification and regression module of the YOLO model locates the vehicle region. By reducing the domain gap between the source domain and the target domain in YOLO, the feature extractor learns to extract domain-invariant features, so that the detection capability learned on the source domain transfers effectively to the target domain.

While improving the detection capability of the model, the method reduces the distribution difference between the source domain and the target domain through adversarial training. Specifically, at the intermediate feature level of the YOLO structure, a domain classifier is added to distinguish whether the input features come from the source domain or the target domain. In adversarial training, the goal of the YOLO feature extractor is to make the source-domain and target-domain features indistinguishable, so that it ends up extracting domain-invariant features. A gradient reversal layer is added between the feature extractor and the domain classifier to reverse the back-propagated gradient, so that the model obtains domain-invariant features while the loss function of the domain classifier is minimized. Domain-invariant features are features extracted by the feature extractor from which the domain classifier cannot tell whether they come from the source domain or the target domain. The final optimization objective combines the loss function of YOLO with the adversarial loss of the domain classifier.
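As a hedged illustration, the two loss terms and the gradient reversal layer described above could be written as follows. The notation, the additive form with a trade-off weight, and the binary cross-entropy form of the adversarial loss are assumptions that follow common practice for gradient-reversal training; they are not taken from the original formula.

```latex
% Assumed notation (not from the original text):
%   F: YOLO feature extractor, D: domain classifier with output in (0, 1),
%   d_i = 0 for source-domain images and d_i = 1 for target-domain images,
%   \lambda > 0: assumed trade-off weight between the two terms.
\mathcal{L} = \mathcal{L}_{\mathrm{det}} + \lambda \, \mathcal{L}_{\mathrm{adv}}

% Detection loss of YOLO, computed on labeled source-domain images only.
% Adversarial loss of the domain classifier, computed on both domains:
\mathcal{L}_{\mathrm{adv}} = - \sum_{i} \Bigl[ d_i \log D\bigl(F(X_i)\bigr)
    + (1 - d_i) \log \bigl( 1 - D(F(X_i)) \bigr) \Bigr]

% Gradient reversal layer R between F and D:
% identity in the forward pass, sign flip in the backward pass.
R(x) = x, \qquad \frac{\partial R}{\partial x} = -I
```

Under these assumptions, minimizing the total loss trains the domain classifier to separate the two domains, while the reversed gradient pushes the feature extractor towards features the classifier cannot separate.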

S5, license plate detection with a segmentation model based on domain adaptation

In this embodiment, a domain-adaptive segmentation model is proposed to locate the license plate accurately. Traditional license plate detection is generally based on classical object detection models, such as the YOLO series and the R-CNN series, which rely on preset anchors and regress a rectangular detection box surrounding the license plate. In real applications, however, shooting scenes are complex, and the size and angle of the license plate vary with the shooting angle of the camera, so the rectangular detection box of a traditional anchor-regression license plate detector can hardly locate the license plate precisely. A pixel-level segmentation model is therefore used to locate the license plate accurately.

On the other hand, as described in S4, depending on the actual deployment environment there is a large domain gap between training data and test data in real scenes, which can greatly reduce model performance. For example, FIG. 5 shows labeled domestic license plates photographed with a mobile phone, and FIG. 6 shows unlabeled vehicles and license plates captured by a camera at the gate of a foreign parking lot. Because the shooting scene, the equipment, the illumination and other environmental conditions differ, a general license plate detection model trained on the data of FIG. 5 performs much worse when tested on the data of FIG. 6. To solve this problem, a segmentation model based on domain adaptation is proposed to locate the license plate accurately. The model contains two parts:

(1) a license plate segmentation model based on the U-Net structure, which segments the license plate region pixel by pixel;

(2) a domain adaptation module based on multi-layer adversarial training and a license plate text attention mechanism. The module extracts domain-invariant features by applying multi-layer adversarial training to the feature extractor of the segmentation model, and then enhances the features of the license plate region with an attention mechanism.

The model can be trained on public or existing data sets and still detects well when tested on unlabeled target-domain data. The segmentation model based on domain adaptation mainly consists of a domain-adaptive feature extractor and a decoding output layer; the domain-adaptive feature extractor extracts domain-invariant features through adversarial training, so that the model can still locate the license plate region accurately in an unlabeled target-domain scene.

Compared with traditional deep-learning-based license plate detection, the segmentation model based on domain adaptation can locate the license plate region accurately in scenes that lack labeled data, and avoids the labor cost of labeling large amounts of data.

S6, correcting the license plate based on the segmentation result: for the license plate region located in S5, a cascade correction scheme is proposed to correct the license plate, with the following specific steps:

S601, correcting the license plate based on affine transformation;

S602, performing a second correction, based on contour segmentation, on the license plate after the first correction.

This embodiment comprises a license plate correction algorithm based on affine transformation and a fine license plate correction algorithm based on contour segmentation. As shown in FIG. 7, the domain-adaptive license plate segmentation model provided in S5 precisely segments the region where the license plate is located; the minimum circumscribed rectangle of this region is then found, and an affine transformation is used to complete the first, coarse-grained correction. The corrected license plate image is shown in FIG. 8.
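The first, coarse-grained correction can be sketched in plain Python: find the minimum-area bounding rectangle of the segmented points, then rotate by the negative of its angle. The function names are hypothetical, the points are assumed to be given in a coordinate system with the y axis pointing up (in image coordinates, where y points down, the sign of the angle flips), and the affine transformation is assumed to be a pure rotation.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(points):
    """Return (angle, width, height) of the minimum-area bounding rectangle.

    angle is the direction of the long side in radians, in [-pi/2, pi/2).
    One side of the minimum-area rectangle is collinear with a hull edge,
    so it is enough to try the direction of every hull edge.
    """
    hull = convex_hull(points)
    best = None
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(-theta), math.sin(-theta)
        xs = [x * c - y * s for x, y in hull]
        ys = [x * s + y * c for x, y in hull]
        w, h = max(xs) - min(xs), max(ys) - min(ys)
        if best is None or w * h < best[1] * best[2]:
            best = (theta, w, h)
    theta, w, h = best
    if h > w:                                   # make "width" the long side of the plate
        theta, w, h = theta + math.pi / 2, h, w
    theta = (theta + math.pi / 2) % math.pi - math.pi / 2
    return theta, w, h


def rotate(points, angle):
    """Rotate points about the origin by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c - y * s, x * s + y * c) for x, y in points]
```

Rotating the plate region by the negative of the returned angle brings the long side of the rectangle to the horizontal. On real images, library routines such as OpenCV's `cv2.minAreaRect` and `cv2.warpAffine` could serve the same purpose.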

The license plate after the first correction is still tilted by a small angle, which can affect the accuracy of license plate recognition, so a second correction based on contour segmentation is applied to it. The steps are as follows. The segmentation map is corrected in the same way as in the first correction step, and its contour is found; the corrected segmentation map and its contour are shown in FIG. 9. The center point of the segmented region is calculated from the contour, and a plane rectangular coordinate system is established with its origin at the center point of the license plate. The tilt direction of the license plate region (leftward or rightward) is judged from the contour. If the region tilts to the right, the point of the contour in the second quadrant that is farthest from the center point is found, which determines a point on the left side of the license plate, and the left boundary of the contour is fitted through this point and the lower-left corner point of the contour. If the region tilts to the left, the point of the contour in the third quadrant that is farthest from the center point is found, which determines a point on the left side of the license plate, and the left boundary of the contour is fitted through this point and the upper-left corner point of the contour. In FIG. 10 the contour tilts to the right; the black point is the contour center point and also the origin of coordinates, and the white point is the point found on the contour in the second quadrant. As shown in FIG. 11, depending on the tilt direction, the upper-left or lower-left corner point of the contour is selected to fit the straight line of the left boundary, which determines the slope of the left boundary of the license plate; the gray line is the fitted left boundary of the license plate contour. A second affine transformation is then performed according to the left boundary to carry out the fine-grained correction. The license plate after the second, fine correction is shown in FIG. 12: the image is not perfectly rectified, but it is corrected to an angle at which the subsequent character recognition model can recognize it accurately. The solid black area on the right side is a region that has no corresponding pixels after the affine transformation and is therefore filled with black.
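The left-boundary fitting of the second correction can be sketched as follows. Several points are assumptions made for illustration: the y axis points up, the center is taken as the mean of the contour points, the corner point is taken as the leftmost contour point, the tilt direction is inferred from whether that corner lies below or above the center, and the second affine transformation is modeled as a horizontal shear. The function names are hypothetical.

```python
def left_boundary_shear(contour):
    """Return the shear factor dx/dy of the left boundary of a plate contour.

    contour: list of (x, y) points with the y axis pointing up (assumption).
    """
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    centered = [(x - cx, y - cy) for x, y in contour]

    # Leftmost contour point: the lower-left corner for a right tilt,
    # the upper-left corner for a left tilt (assumed corner definition).
    corner = min(centered, key=lambda p: p[0])
    tilts_right = corner[1] < 0

    # Right tilt: search the second quadrant (x < 0, y > 0).
    # Left tilt: search the third quadrant (x < 0, y < 0).
    if tilts_right:
        quadrant = [p for p in centered if p[0] < 0 and p[1] > 0]
    else:
        quadrant = [p for p in centered if p[0] < 0 and p[1] < 0]
    far = max(quadrant, key=lambda p: p[0] ** 2 + p[1] ** 2)

    # Slope of the left boundary expressed as dx/dy (0 for a vertical edge).
    return (far[0] - corner[0]) / (far[1] - corner[1])


def unshear(points, k):
    """Apply the affine map x' = x - k * y, y' = y (assumed form of the correction)."""
    return [(x - k * y, y) for x, y in points]
```

Applying `unshear` with the returned factor makes the fitted left boundary vertical. The sketch assumes the searched quadrant contains at least one contour point; a production version would need to handle an empty quadrant.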

S7, segmenting and recognizing license plate characters based on a deep learning model

In this embodiment, the license plate is recognized with a deep learning object detection model. In general, the set of license plate characters in any country is limited; for license plates of English-speaking countries, for example, the plate generally consists of English letters and digits, 36 character classes in total, and even where there are exceptions only a few special characters are added. Traditional models either predict the license plate characters as a sequence or predict a fixed number of characters; the former cannot handle multi-line license plates, and the latter can only handle license plates of fixed length. Instead, a YOLOv3 object detection model, trained starting from a pre-trained model, is used to predict the license plate characters. This method limits neither the number of license plate characters nor the number of lines, and can accurately recognize multi-line license plates.

The specific steps are as follows. First, the collected original license plate images are labeled manually, with the position and class of every license plate character marked on each image. The number of original image samples collected in this step may be, for example, 5000. The labeled images are used for training starting from a pre-trained model, and data augmentation is applied to the input data in the data loading stage of training to improve the generalization ability of the model and avoid overfitting. The augmentation may include, for example, random cropping, scaling, rotation, affine transformation, contrast adjustment and random erasing. The trained model can predict the positions and classes of the license plate characters. As shown in FIG. 13, sorting the detected characters by their relative positions yields the recognition result for the license plate number, namely SKF228Z.
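Sorting detected characters by relative position can be sketched as below. The detection tuple layout, the function name and the row-grouping tolerance are assumptions; image coordinates with y growing downward are assumed, and the coordinates in the example are made up for illustration.

```python
def read_plate(detections, row_tolerance=0.5):
    """Order detected characters into a plate string.

    detections: list of (char, x_center, y_center, height) tuples (assumed layout).
    Characters whose vertical centers differ by at most row_tolerance * height
    from the first character of a row are grouped into that row.
    """
    rows = []
    for det in sorted(detections, key=lambda d: d[2]):      # top to bottom
        _, _, y, h = det
        if rows and abs(y - rows[-1]["y"]) <= row_tolerance * h:
            rows[-1]["items"].append(det)
        else:
            rows.append({"y": y, "items": [det]})
    # Within each row, read the characters from left to right.
    return "".join(
        d[0] for row in rows for d in sorted(row["items"], key=lambda d: d[1])
    )


# Hypothetical detections for the single-line plate of FIG. 13, in arbitrary order.
single_line = [("2", 70, 11, 20), ("S", 10, 10, 20), ("Z", 130, 9, 20),
               ("K", 30, 10, 20), ("8", 110, 10, 20), ("F", 50, 11, 20),
               ("2", 90, 10, 20)]
print(read_plate(single_line))
```

Because rows are formed before sorting left to right, the same function also handles a two-line plate without any assumption about the number of characters.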

S8, voting on the license plate recognition results based on the time axis: this step comprises a time-axis-based voting mechanism, in which the final license plate recognition result is produced by counting the recognition results along the time axis and voting. As shown in FIG. 14, taking license plate recognition at a gate as an example, there is a time interval between a vehicle entering the camera picture and leaving it. Suppose the vehicle needs two seconds from entering to leaving the picture; the license plate recognition model continuously detects and recognizes the vehicle and its license plate during this time. Suppose further that 6 frames are processed per second; then 12 frames are processed in total and 12 license plate recognition results are obtained. As shown in FIG. 15, a few false recognitions may occur in this process (the 3rd, 6th and 10th results, shown in a light-colored font), but because all results of the current vehicle up to the current time are counted and voted on, the correctness of the current recognition result can still be ensured. The real-time recognition result is shown in the first row of FIG. 15 and is the correct PC56 5629Z.
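A running vote over the per-frame results of one vehicle can be sketched with a counter. Voting over whole plate strings is an assumption (a per-character vote would be an alternative reading of the text), the class name is hypothetical, and the plate strings in the example are invented; only the 12 frames and the errors at the 3rd, 6th and 10th results follow the description.

```python
from collections import Counter


class PlateVoter:
    """Running majority vote over the per-frame results of one vehicle."""

    def __init__(self):
        self.votes = Counter()

    def add(self, plate_text):
        """Record one frame's result and return the current winner."""
        self.votes[plate_text] += 1
        return self.votes.most_common(1)[0][0]


# 12 hypothetical per-frame results with misreads at the 3rd, 6th and 10th frame.
frames = ["SKF228Z"] * 12
frames[2], frames[5], frames[9] = "SKF2282", "5KF228Z", "SKF22BZ"

voter = PlateVoter()
live_results = [voter.add(text) for text in frames]
print(live_results[-1])
```

Because every frame updates the tally, the reported result stays stable even on frames where the recognizer misreads a character.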

Claims

1. A license plate recognition method based on domain adaptation, characterized in that it comprises the following steps:

S1, acquiring a video stream;

S2, decoding the video stream into images;

S3, modeling the image background using a Gaussian mixture model, and screening out image frames containing moving objects;

S4, vehicle detection with a deep learning model based on domain adaptation:

when the deep learning model based on domain adaptation is trained, the feature extractor of the YOLO model extracts domain-invariant features from the image region containing the moving object, and the extracted features are then input into the classification and regression module of the YOLO model and into a domain classifier, respectively; the domain classifier determines whether the input features come from the source domain or the target domain; the classification and regression module of the YOLO model locates the vehicle region; wherein the loss function used when training the deep learning model based on domain adaptation consists of the loss function of the YOLO model and the adversarial loss of the domain classifier, the adversarial loss being realized by a gradient reversal layer between the feature extractor and the domain classifier; after training is completed, the deep learning model based on domain adaptation directly receives the input image region containing the moving object and outputs the vehicle region;

S5, license plate detection with a segmentation model based on domain adaptation:

inputting the detected vehicle region into the domain-adaptive feature extractor of the domain-adaptive segmentation model, wherein the domain-adaptive feature extractor extracts domain-invariant features through adversarial training; the extracted features are then input into the decoding output layer of the domain-adaptive segmentation model, and the decoding output layer outputs the license plate segmentation result;

S6, cascade correction of the license plate: performing a first, coarse-grained correction through affine transformation, using the minimum circumscribed rectangle of the license plate segmentation result; then performing a second, fine-grained correction through affine transformation, based on a straight line fitted to the left contour of the license plate region after the first coarse-grained correction;

S7, license plate character recognition: inputting the twice-corrected license plate region into an object detection model, which outputs the detected characters in order;

S8, voting based on the time axis: performing statistical voting over all license plate character recognition results of the current vehicle up to the current time, and outputting the characters with the most votes as the license plate characters.