Disclosure of Invention
The technical problem to be solved by the invention is to provide a license plate recognition method that generalizes well and locates the license plate accurately at the pixel level.
The technical solution adopted by the invention is a license plate recognition method based on domain adaptation, comprising the following steps:
S1, acquiring a video stream;
S2, decoding the video stream into images;
S3, modeling the image background with a Gaussian mixture model, and screening out image frames containing moving objects;
S4, detecting vehicles with a deep learning model based on domain adaptation:
when training the deep learning model based on domain adaptation, the feature extractor of a YOLO model first extracts domain-invariant features from the image region containing the moving object; the extracted features are then input into both the classification-regression module of the YOLO model and a domain classifier; the domain classifier judges whether the input features come from the source domain or the target domain, and the classification-regression module of the YOLO model locates the vehicle region; the loss function $L$ used in training the deep learning model based on domain adaptation is:
$$L = L_{yolo} + L_{adv}$$
where $L_{yolo}$ is the loss function of the YOLO model and $L_{adv}$ is the adversarial loss of the domain classifier, which is realized through a gradient reversal layer between the feature extractor and the domain classifier; after training is finished, the deep learning model based on domain adaptation directly receives an input image region containing a moving object and outputs the vehicle region;
S5, detecting the license plate with a segmentation model based on domain adaptation:
inputting the detected vehicle region into the domain-adaptive feature extractor of the domain-adaptive segmentation model, wherein the domain-adaptive feature extractor extracts domain-invariant features through adversarial training; then inputting the extracted features into the decoding output layer of the domain-adaptive segmentation model, which outputs the license plate segmentation result;
S6, correcting the license plate in cascade: based on the license plate segmentation result, performing a first, coarse-grained correction by applying an affine transformation derived from the minimum circumscribed rectangle; then performing a second, fine-grained correction by applying an affine transformation based on a straight line fitted to the left contour of the license plate region after the first, coarse-grained correction;
S7, recognizing the license plate characters: inputting the twice-corrected license plate region into a target detection model, which outputs the detected characters in sequence;
S8, voting along the time axis: performing statistical voting over all license plate characters recognized for the current vehicle up to the current point on the time axis, and outputting the characters with the most votes as the license plate characters.
The advantages of the invention are as follows. First, a Gaussian mixture model is used to model the background, and by determining when a foreground appears, the computing resources the models would waste on background frames are reduced. Second, by using domain adaptation, a transfer learning technique, the data does not need to be fully labeled, which greatly reduces the cost of labeling license plate data. Third, license plate detection based on image segmentation locates the license plate accurately at the pixel level, and a cascade correction method is provided to rectify the license plate image from the segmentation map. Finally, the network models trained with domain adaptation generalize well: for a new scene, only the corresponding image data needs to be collected, and a model that is robust in the new scene can be trained without laborious labeling.
Detailed Description
As shown in fig. 1, the license plate recognition system for implementing the method of the present invention includes:
a license plate data acquisition and preprocessing device, which acquires the required image and video data and preprocesses it, for example by hardware-decoding the video stream into images and modeling the background to screen out images containing moving objects;
a domain-adaptive license plate detection device, which obtains a vehicle detection result from the provided image data using a domain adaptation method, then performs domain-adaptive license plate detection and segments the license plate;
and a license plate correction and recognition device, which obtains the license plate recognition result from the license plate region through license plate correction, license plate recognition, and post-processing.
As shown in fig. 2, the license plate recognition method specifically includes the following steps:
S1, acquiring the video stream
In this embodiment, the video stream to be recognized is obtained in real time from an installed camera; the installation location and angle of the camera are not limited. The video to be recognized may also be read directly from a video file, which is not limited in this embodiment.
S2, hardware-decoding the video stream into images
Hardware is used to decode the video stream into images. The decoding method depends on the hardware used, and this embodiment does not limit the decoding hardware.
S3, modeling the background and screening out images containing moving objects
A video stream acquired from a camera generally contains tens of frames per second, so processing every frame places high demands on computing resources. To reduce this consumption, existing license plate recognition schemes generally sample frames (for example, when 4 frames are sampled uniformly per second, one input frame is processed every 250 ms and all intermediate frames are discarded). Although uniform frame sampling reduces the amount of computation, it cannot avoid the problem that, when targets are rare, the license plate recognizer still spends a large amount of resources processing video frames that contain no target.
To keep the license plate recognizer from idling and wasting computing resources, the image background is modeled. Since license plate targets in the usage scenario are generally in motion, a Gaussian mixture model is used to model the image background; frames containing only static objects are filtered out at low computational cost, and the license plate recognition model only needs to process frames containing moving objects, which reduces wasted computation. The specific steps are as follows:
S301, modeling the image background with a Gaussian mixture model;
S302, filtering out background images with the Gaussian mixture model and screening out foreground images.
For convenience of description, a region of an image that contains a moving object is called the foreground, and the rest is called the background. Background pixels typically far outnumber foreground pixels. The background pixels are modeled with a Gaussian mixture model, and pixels that do not fit the model distribution are classified as foreground pixels. As shown in fig. 3, the white pixel region is a moving vehicle and is determined to be foreground, while the other, stationary parts, represented by black pixels, are determined to be background.
S4, detecting vehicles with a deep learning model based on domain adaptation
In this embodiment, we propose a deep learning model based on domain adaptation to detect vehicles in an image.
Conventional object detection models typically require the images in the training set and the test set to follow the same distribution, but in practical applications this requirement is difficult to satisfy. Depending on the actual deployment environment, test images may come from different weather conditions, times of day, and image acquisition devices, so the test set and the training set usually exhibit a large domain gap in practice. One remedy is to collect and label more data from the test scene, but labeling is time-consuming and expensive. For such cases, this embodiment proposes a deep learning model based on domain adaptation to detect vehicles in an image, so that when the model encounters a new scene, the data of the new scene no longer needs to be labeled. Specifically, we have a large labeled training set, called the source domain $D_s$, containing image data $\{X_i\}_{i=1}^{N_s}$ and corresponding labels $\{(B_i, C_i)\}_{i=1}^{N_s}$, where $X_i$ represents the i-th image, $B_i$ and $C_i$ represent the corresponding annotation information, namely the target coordinates and class, and $N_s$ is the number of source domain images. Correspondingly, we have unlabeled data from the new scene, called the target domain $D_t$, containing image data $\{X_j\}_{j=1}^{N_t}$, where $N_t$ is the number of target domain images.
Typical domain-adaptive object detection models are built on the two-stage strategy of Fast R-CNN; they achieve high detection accuracy but are slow, which makes it difficult to meet real-time requirements in real applications. Because YOLO offers real-time speed together with good detection accuracy, we build the domain-adaptive deep learning model for detecting vehicle targets on YOLO. During training, the feature extractor of the YOLO model extracts domain-invariant features from the image region containing the moving object, and the extracted features are input into both the classification-regression module of the YOLO model and a domain classifier. The domain classifier judges whether the input features come from the source domain or the target domain, and the classification-regression module of the YOLO model locates the vehicle region. By reducing the domain gap between the source domain and the target domain in YOLO, the feature extractor learns to extract domain-invariant features, so that the detection capability learned on the source domain transfers effectively to the target domain. While improving the detection capability of the model, the method reduces the distribution difference between the source domain and the target domain through adversarial training. Specifically, at the intermediate feature level of the YOLO structure, we add a domain classifier that distinguishes whether the input features come from the source domain or the target domain. In adversarial training, the goal of the YOLO feature extractor is to make the features of the source domain and the target domain indistinguishable, so that it ultimately extracts domain-invariant features.
A gradient reversal layer (GRL) is added between the feature extractor and the domain classifier to reverse the back-propagated gradient, so that minimizing the loss function of the domain classifier drives the feature extractor toward domain-invariant features. Domain-invariant features are features extracted by the feature extractor from which the domain classifier cannot tell whether they come from the source domain or the target domain. The final optimization objective is:
$$L = L_{yolo} + L_{adv}$$
where $L_{yolo}$ is the loss function of YOLO and $L_{adv}$ is the adversarial loss of the domain classifier.
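The adversarial training described above can be sketched in PyTorch as follows. This is a minimal sketch under stated assumptions, not the embodiment's network: the total objective is assumed to be the plain sum of the YOLO loss and the adversarial loss, the convolution `extractor` and the squared-mean `loss_yolo` are toy placeholders for the YOLO feature extractor and detection loss, and `DomainClassifier` and the reversal weight `lam` are hypothetical names.

```python
import torch
from torch import nn

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by -lam backward."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # One gradient per forward input; lam itself needs no gradient.
        return -ctx.lam * grad_output, None

class DomainClassifier(nn.Module):
    """Hypothetical small head predicting source (0) versus target (1)."""

    def __init__(self, channels, lam=1.0):
        super().__init__()
        self.lam = lam
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, 1))

    def forward(self, features):
        # The reversal layer sits between the extractor and the classifier.
        return self.head(GradientReversal.apply(features, self.lam))

# Toy stand-ins for the YOLO feature extractor and detection loss.
torch.manual_seed(0)
extractor = nn.Conv2d(3, 8, kernel_size=3, padding=1)
domain_clf = DomainClassifier(channels=8)
bce = nn.BCEWithLogitsLoss()

source = torch.randn(2, 3, 32, 32)    # labeled source-domain images
target = torch.randn(2, 3, 32, 32)    # unlabeled target-domain images
feats = extractor(torch.cat([source, target]))
domain_labels = torch.tensor([[0.0], [0.0], [1.0], [1.0]])

loss_yolo = feats[:2].pow(2).mean()   # placeholder for the real YOLO loss
loss_adv = bce(domain_clf(feats), domain_labels)
loss = loss_yolo + loss_adv           # assumed total objective (plain sum)
loss.backward()
```

Because of the reversal layer, a single backward pass trains the domain classifier to separate the two domains while pushing the feature extractor in the opposite direction, which is what makes the extracted features hard to tell apart.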
S5, detecting the license plate with a segmentation model based on domain adaptation
In this embodiment, a domain-adaptive segmentation model is proposed to locate the license plate accurately. Traditional license plate detection is generally based on classical object detection models, such as the YOLO series and the R-CNN series, which rely on preset prior anchors and regress a rectangular detection box enclosing the license plate. In real applications, however, shooting scenes are complex, and the size and angle of the license plate vary with the shooting angle of the camera, so the rectangular detection box of a traditional anchor-regression license plate detector can hardly locate the license plate accurately. A pixel-level segmentation model is therefore used to locate the license plate accurately.
In addition, as described in S4, the actual deployment environment creates a large domain gap between the training data and the test data in real scenes, which can greatly reduce model performance. For example, fig. 5 shows labeled domestic license plates photographed with a mobile phone, and fig. 6 shows unlabeled vehicles and license plates captured by a camera at the gate of a foreign parking lot. Because the shooting scene, equipment, illumination, and other conditions differ, a general license plate detection model trained on the data of fig. 5 performs much worse when tested on the data of fig. 6. To solve this problem, a segmentation model based on domain adaptation is proposed to locate the license plate accurately. Our model contains two parts:
(1) a license plate segmentation model based on the U-Net structure, which segments the license plate region pixel by pixel;
(2) a domain adaptation module based on multi-layer adversarial training and a license plate text attention mechanism. The module extracts domain-invariant features by applying multi-layer adversarial training to the feature extractor of the segmentation model, and then enhances the features of the license plate region with the attention mechanism.
Our model can be trained on public or existing datasets and still detects well when tested on unlabeled target domain data. The domain-adaptive segmentation model mainly comprises a domain-adaptive feature extractor and a decoding output layer. The domain-adaptive feature extractor extracts domain-invariant features through adversarial training, so that the model can still locate the license plate region accurately in an unlabeled target domain scene.
Compared with traditional deep-learning-based license plate detection, the domain-adaptive segmentation model can locate the license plate region accurately in scenes that lack labeled data, avoiding the labor cost of labeling large amounts of data.
S6, correcting the license plate based on the segmentation result: for the license plate region located in S5, a cascade correction scheme is proposed to rectify the license plate, with the following specific steps:
S601, correcting the license plate based on affine transformation;
S602, correcting the license plate a second time, after the first correction, based on contour segmentation.
This step comprises a license plate correction algorithm based on affine transformation and a fine license plate correction algorithm based on contour segmentation. As shown in fig. 7, the domain-adaptive license plate segmentation model provided in S5 precisely segments the region where the license plate is located. The minimum circumscribed rectangle of this region is then found, and an affine transformation completes the first, coarse-grained correction; the corrected license plate image is shown in fig. 8.
After the first correction, the license plate may still be tilted by a small angle, which can affect recognition accuracy, so the license plate is corrected a second time based on contour segmentation. The procedure is as follows. The segmentation map is rectified with the same transformation as in the first correction step, and its contour is found; the rectified segmentation map and its contour are shown in fig. 9. The center point of the segmented region is computed from the contour, and a plane rectangular coordinate system is established with its origin at this center point. The tilt direction of the license plate region (leftward or rightward) is judged from the contour. If the region tilts to the right, the contour point in the second quadrant farthest from the center point is taken as a point on the left side of the license plate, and the left boundary is fitted through this point and the lower-left corner point of the contour. If the region tilts to the left, the contour point in the third quadrant farthest from the center point is taken as a point on the left side of the license plate, and the left boundary is fitted through this point and the upper-left corner point of the contour. As shown in fig. 10, the contour tilts to the right; the black point is the contour center point, which is also the coordinate origin, and the white point is the point found on the contour in the second quadrant. As shown in fig. 11, according to the tilt direction, the upper-left or lower-left corner point of the contour is selected to fit the straight line of the left boundary and determine its slope; the gray line is the fitted left boundary of the license plate contour. A second affine transformation is then applied according to the left boundary to perform the fine-grained correction. The license plate after the second, fine correction is shown in fig. 12. The image is not perfectly rectified after the second correction, but it is rectified to an angle at which the subsequent character recognition model can recognize it accurately. The solid black area on the right side corresponds to a region that has no source pixels after the affine transformation and is therefore filled with black.
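The left-boundary fit of the second correction can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the tilt direction is passed in rather than inferred, the center is taken as the mean of the contour points, the corner point is taken as the leftmost contour point, and the second affine transformation is assumed to be a horizontal shear that makes the fitted left boundary vertical. The names `left_boundary_shear` and `apply_affine` are hypothetical.

```python
import math

def left_boundary_shear(contour, tilt):
    """Estimate the shear that makes the plate's left boundary vertical.

    contour: list of (x, y) points in image coordinates (y grows downward).
    tilt: "right" or "left", the inclination judged from the contour.
    Returns (k, matrix) where matrix is the 2x3 map x' = x + k * (y - cy).
    """
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)

    def in_quadrant(point):
        # Cartesian frame at the center: u to the right, v upward.
        u, v = point[0] - cx, cy - point[1]
        return u < 0 and (v > 0 if tilt == "right" else v < 0)

    # Farthest point in quadrant II (right tilt) or III (left tilt).
    candidates = [p for p in contour if in_quadrant(p)]
    far = max(candidates, key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
    corner = min(contour, key=lambda p: p[0])   # leftmost corner point

    dx, dy = far[0] - corner[0], far[1] - corner[1]
    if dy == 0:
        raise ValueError("left boundary points share the same row")
    k = -dx / dy                                # cancels the boundary's slant
    return k, [[1.0, k, -k * cy], [0.0, 1.0, 0.0]]

def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to one (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])

# Right-tilted parallelogram: the top edge is shifted right of the bottom edge.
contour = [(20, 0), (120, 0), (100, 40), (0, 40)]
k, matrix = left_boundary_shear(contour, "right")
print(k, apply_affine(matrix, (20, 0)), apply_affine(matrix, (0, 40)))
```

After the shear, the upper-left and lower-left corners share the same x coordinate, so the left boundary is vertical; a real contour has many more points, but the same selection rule applies.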
S7, recognizing license plate characters based on a deep learning model
In this embodiment, the license plate is recognized with a deep learning object detection model. The character set of license plates in any country is limited. In English-speaking countries, for example, license plates generally consist of English letters and digits, 36 character classes in total; exceptions add only a few special characters. Traditional models either predict the license plate characters as a sequence or predict a fixed number of characters: the former cannot handle multi-line license plates, and the latter can only handle license plates of fixed length. In contrast, we predict the license plate characters with a YOLOv3 object detection model trained from a pre-trained model. This approach limits neither the number of characters nor the number of lines of the license plate, and it can accurately recognize multi-line license plates.
First, the collected original license plate images are labeled manually, marking the position and class of every license plate character in each image. The number of original image samples collected in this step may be, for example, 5000. The labeled images are used for training starting from a pre-trained model, and data augmentation is applied to the input data as it is read during training, to improve the generalization capability of the model and avoid overfitting. The augmentation may include, for example, random cropping, scaling, rotation, affine transformation, contrast adjustment, and random erasing. The trained model predicts the positions and classes of the license plate characters. As shown in fig. 13, sorting the detected characters by their relative positions yields the recognition result for the license plate number, namely SKF228Z.
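Sorting the detected characters by relative position can be sketched as follows. This is a minimal sketch, assuming each detection is a tuple of character, box center x, box center y, and box height; the function name `order_characters` and the rule that starts a new line when the vertical gap reaches half a character height are illustrative assumptions, not details from the embodiment.

```python
def order_characters(detections):
    """Join detected characters into a plate string, line by line.

    detections: list of (char, x_center, y_center, height) tuples.
    Assumption: a detection starts a new line when its center lies at
    least half a character height below the previous detection.
    """
    rows = []
    for det in sorted(detections, key=lambda d: d[2]):      # top to bottom
        if rows and abs(det[2] - rows[-1][-1][2]) < 0.5 * det[3]:
            rows[-1].append(det)
        else:
            rows.append([det])
    # Within each line, read the characters from left to right.
    return "".join(d[0] for row in rows
                   for d in sorted(row, key=lambda d: d[1]))

# Single-line plate with detections in arbitrary order and slight jitter.
single_line = [("2", 50, 20, 20), ("S", 10, 21, 20), ("Z", 70, 19, 20),
               ("F", 30, 22, 20), ("8", 60, 20, 20), ("K", 20, 21, 20),
               ("2", 40, 20, 20)]
# Two-line plate: "AB" on top, "123" below.
two_lines = [("2", 30, 41, 20), ("A", 15, 10, 20), ("3", 45, 40, 20),
             ("B", 35, 11, 20), ("1", 15, 40, 20)]
print(order_characters(single_line), order_characters(two_lines))
```

Because the characters are grouped into lines before being sorted horizontally, the same routine handles single-line and multi-line plates of any length.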
S8, voting on the license plate recognition result along the time axis: this step comprises a time-axis-based voting mechanism, which produces the final license plate recognition result by counting and voting over the recognition results obtained along the time axis. As shown in fig. 14, taking license plate recognition at a gate as an example, there is a time interval between a vehicle entering and leaving the camera view. Suppose a vehicle takes two seconds from entering the view to leaving it; during this time the license plate recognition model continuously detects and recognizes the vehicle and its license plate. If 6 frames are processed per second, 12 frames are processed in total, yielding 12 license plate recognition results. As shown in fig. 15, a few of these results may be wrong (the 3rd, 6th, and 10th results, shown in a light-colored font), but because all results obtained for the current vehicle up to the current point on the time axis are counted in the vote, the current recognition result can still be correct. The real-time recognition result is shown in the first row of fig. 15 and is the correct result, PC56 5629Z.
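The time-axis voting can be sketched as follows. This is a minimal sketch that assumes the vote is taken over whole plate strings (voting per character position would be an alternative reading); the class name `PlateVoter` and the sample results, which reuse the plate of fig. 13 with errors in the 3rd, 6th, and 10th frames, are illustrative.

```python
from collections import Counter

class PlateVoter:
    """Running majority vote over the recognition results of one vehicle."""

    def __init__(self):
        self.votes = Counter()

    def update(self, plate_text):
        # Count the new result, then report the leader over all results so far.
        self.votes[plate_text] += 1
        return self.votes.most_common(1)[0][0]

# Twelve per-frame results (2 seconds at 6 frames per second), three of them wrong.
results = ["SKF228Z", "SKF228Z", "SKF2282", "SKF228Z", "SKF228Z", "5KF228Z",
           "SKF228Z", "SKF228Z", "SKF228Z", "SKF22BZ", "SKF228Z", "SKF228Z"]
voter = PlateVoter()
running = [voter.update(text) for text in results]
print(running[-1])
```

The running output stays on the majority result even in the frames where the recognizer errs, because each isolated error collects only a single vote.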