CN113822278A

CN113822278A - License plate recognition method for unlimited scene

Info

Publication number: CN113822278A
Application number: CN202111384274.1A
Authority: CN
Inventors: 刘寒松; 王永; 王国强; 刘瑞; 曲妍
Original assignee: Sonli Holdings Group Co Ltd
Current assignee: Sonli Holdings Group Co Ltd
Priority date: 2021-11-22
Filing date: 2021-11-22
Publication date: 2021-12-21
Anticipated expiration: 2041-11-22
Also published as: CN113822278B

Abstract

The invention belongs to the technical field of license plate recognition and relates to a license plate recognition method for unconstrained scenes.

Description

License plate recognition method for unlimited scene

Technical Field

The invention belongs to the technical field of license plate recognition and relates to a method for recognizing license plates in unconstrained scenes, in particular to an unconstrained-scene license plate recognition method based on deep feature alignment with an improved deformable convolution.

Background

With the rapid development of technologies such as artificial intelligence, the Internet of Things and 5G, intelligent transportation plays an important role in smart cities, and license plate detection and recognition plays an important role in intelligent transportation systems. Traditional Chinese license plate detection methods have the drawback that their detection accuracy is strongly affected by the environment: they show little robustness in complex scenes involving distorted or rotated license plates, frequently suffer from low detection accuracy, and fall far short of application requirements.

Most early license plate recognition algorithms were based on machine learning and located and recognized license plates using manually selected features. In recent years, with the arrival of the big data era and the growth of computing power, deep learning has made major breakthroughs in license plate recognition, and deep learning algorithms such as Faster R-CNN and YOLO have brought new progress in license plate localization and recognition. Existing license plate recognition technology is mainly applied in specific environments such as the entrances and exits of toll parking lots and highway ETC lanes. When the viewing angle is frontal and the detection area is fixed, its accuracy can reach a very high level, but its recognition performance is poor in complex scenes.

In complex scenes, an oblique viewing angle may make the license plate appear rotated or distorted. Conventional methods extract features with deep-learning-based convolutional neural networks and fall mainly into two categories: (1) methods based on horizontal rectangular boxes, which introduce a large amount of background information when detecting a tilted or deformed license plate, so that subsequent license plate localization and recognition are inaccurate; (2) methods based on affine transformation, which split license plate detection into two steps, detection and correction: the horizontal box of the license plate is detected first, affine parameters are then learned from the license plate image cropped by the horizontal box, and finally the image is corrected. Therefore, in unconstrained scenes, existing license plate detection and recognition technology suffers from low detection accuracy, and a more effective method that models feature alignment to achieve accurate and effective license plate recognition is urgently needed.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides an unconstrained-scene license plate detection method based on deep feature alignment with an improved deformable convolution. The method solves the misalignment of deep features for tilted and distorted license plates, can be used for license plate detection and recognition tasks in unconstrained scenes, and efficiently realizes license plate detection and correction.

To achieve the above object, the invention works as follows. The convolution features extracted by the backbone are axis-aligned features. Two parallel branches compute the classification scores and the regressed positions of the anchor boxes, respectively, to obtain candidate boxes. The improved deformable convolution then adaptively aligns the convolution features with the license plate features according to each candidate box; that is, the sampling points are shifted so that they concentrate on the features of the license plate region. Finally, the aligned features are used for localization. To let the deformable convolution align the convolution features, an offset is computed from the position of the candidate box and the corresponding convolution feature, and the offset and the axis-aligned feature are fed together into the alignment convolution to extract the aligned features. The method specifically comprises the following steps:

(1) Data set construction: collecting images containing regular, tilted and distorted license plates from scenes such as traffic monitoring and roadside parking lots, constructing a data set of no fewer than 20,000 license plates, labeling the positions of the four vertices of each license plate, computing the coordinates of the corresponding horizontal rectangular box from the four vertices, and dividing the data set into a training set (60%), a validation set (20%) and a test set (20%);
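As a sketch of the label bookkeeping in step (1), the Python snippet below derives the horizontal rectangle from the four labeled vertices and performs the 60/20/20 split. The function names, the fixed shuffling seed and the shuffle-then-cut strategy are assumptions for illustration; the text fixes only the vertex-to-box rule and the split ratios.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def quad_to_hbox(vertices: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Horizontal box (x_min, y_min, x_max, y_max) enclosing the four labeled vertices."""
    if len(vertices) != 4:
        raise ValueError("each license plate label needs exactly four vertices")
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def split_dataset(samples: Sequence, seed: int = 0,
                  train_pct: int = 60, val_pct: int = 20) -> Tuple[List, List, List]:
    """Shuffle once (hypothetical fixed seed), then cut into train/validation/test."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = len(items) * train_pct // 100  # integer arithmetic avoids float rounding
    n_val = len(items) * val_pct // 100
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])  # the remainder (about 20%) is the test set
```

For example, `quad_to_hbox([(10, 20), (110, 30), (108, 62), (8, 52)])` returns `(8, 20, 110, 62)`.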

(2) Deep convolutional feature extraction: first, normalizing the images of the training set from step (1) to a size of 512 × 512 and a value range of 0 to 1; then inputting the processed image into the backbone of the deep convolutional network for convolutional feature extraction, where the backbone uses VGG16 as the feature extraction network followed by a feature pyramid network, which strengthens and exploits the multi-scale features formed in VGG16 to obtain a more expressive set of multi-scale license plate convolution feature maps;
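A minimal sketch of the preprocessing in step (2), assuming 8-bit input images and a plain resize to 512 × 512 (the text gives only the target size and value range). The pixel resampling itself would be done by an image library and is omitted; the hypothetical helpers below show the value scaling and how the vertex labels must be rescaled together with the image.

```python
from typing import List, Sequence, Tuple

INPUT_SIZE = 512  # network input size stated in step (2)


def normalize_pixels(image: List[List[List[int]]]) -> List[List[List[float]]]:
    """Map 8-bit channel values (0..255) into the range [0, 1] (8-bit input assumed)."""
    return [[[c / 255.0 for c in pixel] for pixel in row] for row in image]


def scale_vertices(vertices: Sequence[Tuple[float, float]], src_w: int, src_h: int,
                   dst: int = INPUT_SIZE) -> List[Tuple[float, float]]:
    """Rescale vertex labels when a src_w x src_h image is resized to dst x dst."""
    sx, sy = dst / src_w, dst / src_h
    return [(x * sx, y * sy) for x, y in vertices]
```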

(3) High-quality candidate box generation: on the set of multi-scale license plate convolution feature maps obtained from the backbone of the deep convolutional network in step (2), two fully connected sub-networks with the same structure but unshared parameters learn the classification and position information, respectively, completing the classification and position regression tasks for the target boxes; the classification indicates whether the target is a license plate, and the position information is the coordinates of the four vertices of the license plate. When the network is trained and tested in the subsequent steps, each feature point in the multi-scale license plate convolution feature maps is assigned only one anchor box for learning the target position, and different classification score thresholds are set for training and testing to obtain 100 high-quality candidate boxes: the threshold is 0.01 during training for a better training effect and 0.1 during testing for faster inference;
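The threshold logic of step (3) can be sketched as follows. The helper name and the use of `>=` for the threshold test are assumptions, and keeping the 100 highest-scoring survivors is one plausible reading of "100 high-quality candidate boxes"; the thresholds 0.01 and 0.1 come from the text.

```python
from typing import List, Sequence, Tuple

TRAIN_SCORE_THRESHOLD = 0.01  # step (3): lower threshold during training
TEST_SCORE_THRESHOLD = 0.1    # step (3): higher threshold during testing
NUM_CANDIDATES = 100          # step (3): number of high-quality candidate boxes


def select_candidates(scores: Sequence[float], quads: Sequence[Sequence[float]],
                      training: bool, top_k: int = NUM_CANDIDATES
                      ) -> List[Tuple[float, Sequence[float]]]:
    """Keep boxes above the phase-specific threshold, best first, at most top_k.

    There is one score/quad pair per feature point, since each point has one anchor.
    """
    threshold = TRAIN_SCORE_THRESHOLD if training else TEST_SCORE_THRESHOLD
    kept = [(s, q) for s, q in zip(scores, quads) if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return kept[:top_k]
```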

(4) Deep feature alignment: for each position $p_0$ on a feature map in the set of multi-scale license plate convolution feature maps output in step (2), the deformable convolution operates on the regular grid $R$ of a conventional convolution, augmented with offsets $\Delta p_n$. The output at position $p_0$ is therefore computed as:

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

where $p_n$ enumerates the positions listed in $R$ (for a 3 × 3 kernel, $R = \{(-1,-1), (-1,0), \dots, (0,1), (1,1)\}$), $w$ is the convolution weight, $x$ is the input feature, and $\Delta p_n$ is the offset obtained by a convolutional layer;

The coordinates of the high-quality candidate box obtained in step (3) are recorded as $Poly1 = \{(x_i, y_i) \mid i = 1, \dots, 4\}$, where $Poly1$ denotes the high-quality candidate box at position $p$. The minimum and maximum abscissas and ordinates of $Poly1$ give its minimum enclosing horizontal rectangle $Poly2 = [x_{\min}, y_{\min}, x_{\max}, y_{\max}]$.

The high-quality candidate box gives the coordinates of the license plate feature region, and the minimum enclosing horizontal rectangle gives the feature region to be aligned. Since the coordinates of both regions are known, their affine transformation matrix $M$ can be computed. For each feature point $p$ of the feature map, the anchor-based sampling position $L_p^r$ of each grid point $r \in R$ is then computed from $M$, the convolution kernel size $k$ and the feature-map stride $S$. The offset $O$ of the improved deformable convolution at position $p$ is expressed as:

$$O = \{\, L_p^r - p - r \mid r \in R \,\}$$

The obtained offset $O$ and the convolution feature map from step (2) are then fed into the convolution to extract the aligned features, forming the improved deformable convolution. For each high-quality candidate box there are 9 sampling points, giving an 18-dimensional offset (one horizontal and one vertical offset per point). In this way, the axis-aligned convolution feature at position $p$ is converted into a convolution feature that follows the arbitrary orientation and pose of the corresponding candidate box;
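To make the offset computation of step (4) concrete, the sketch below fits the affine matrix $M$ from $Poly2$ to $Poly1$ by least squares and returns the 9 (dx, dy) pairs (18 values) for a 3 × 3 kernel. Because the text states the sampling position $L_p^r$ only in terms of $M$, $k$ and $S$, the code ASSUMES an S2A-Net-style rule: a regular $k \times k$ grid of cell centres is laid over $Poly2$, mapped into $Poly1$ with $M$, and divided by the stride $S$. Only the final step, $O = L_p^r - p - r$, follows directly from the deformable convolution formula. The vertex order (top-left, top-right, bottom-right, bottom-left) and all function names are likewise assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Least-squares 2x3 affine matrix M such that M @ (x, y, 1) ~ dst for each src."""
    ata = [[0.0] * 3 for _ in range(3)]
    atu, atv = [0.0] * 3, [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return [_solve3(ata, atu), _solve3(ata, atv)]


def alignment_offsets(poly1: Sequence[Point], p: Point, stride: int,
                      k: int = 3) -> List[Tuple[float, float]]:
    """Offsets (dx, dy) for the k*k sampling points of feature point p.

    ASSUMPTION: k x k grid over Poly2 -> mapped by M -> divided by stride = L_p^r.
    """
    xs, ys = [x for x, _ in poly1], [y for _, y in poly1]
    x0, y0, x1, y1 = min(xs), min(ys), max(xs), max(ys)   # Poly2
    poly2 = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]      # same vertex order as Poly1
    m = fit_affine(poly2, poly1)                           # M: Poly2 -> Poly1
    cx, cy, w, h = (x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0
    half = k // 2
    offsets = []
    for ry in range(-half, half + 1):
        for rx in range(-half, half + 1):
            gx, gy = cx + w / k * rx, cy + h / k * ry      # grid point in Poly2 (image coords)
            lx = (m[0][0] * gx + m[0][1] * gy + m[0][2]) / stride  # L_p^r in feature units
            ly = (m[1][0] * gx + m[1][1] * gy + m[1][2]) / stride
            offsets.append((lx - p[0] - rx, ly - p[1] - ry))        # O = L_p^r - p - r
    return offsets
```

For an axis-aligned candidate box that spans three strides and is centred at $S \cdot p$, $M$ is the identity and every offset is zero, so the alignment reduces to an ordinary 3 × 3 convolution; a sheared box, such as a tilted plate, yields non-zero vertical offsets that follow its slant.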

(5) Feature-aligned fine license plate recognition: refining and relocating the license plate position using the aligned features obtained in step (4): the aligned features are input into a 3 × 3 convolutional layer followed by two branches for classification and regression, where the classification branch judges whether the region is a license plate and the regression branch predicts the coordinates of the four license plate vertices $(x_i, y_i), i = 1, \dots, 4$, thus giving the accurate coordinate position of the license plate;

(6) License plate correction: setting up an affine transformation between the license plate coordinate position obtained in step (5) and the license plate coordinates of a preset size, computing the affine matrix from these two sets of coordinate positions, and then applying this matrix to the license plate cropped from the original image according to its coordinates, to obtain the corrected license plate image;
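Step (6) needs no learned parameters, only an affine fit between the detected vertices and the corners of a plate of preset size. The plain-Python sketch below works on a grayscale image stored as nested lists and assumes a hypothetical preset size of 144 × 48 pixels, vertices ordered top-left, top-right, bottom-right, bottom-left, and a least-squares fit over all four vertex pairs (the text does not say how the four pairs are combined). It samples directly from the original image with nearest-neighbour interpolation, which gives the same result as cropping the plate first as long as the crop contains the whole plate.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
PLATE_W, PLATE_H = 144, 48  # HYPOTHETICAL preset plate size in pixels


def _det3(m: List[List[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Least-squares 2x3 affine fit over all point pairs (normal equations, Cramer's rule)."""
    ata = [[0.0] * 3 for _ in range(3)]
    rhs = [[0.0] * 3, [0.0] * 3]  # A^T u and A^T v
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            rhs[0][i] += row[i] * u
            rhs[1][i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    d = _det3(ata)
    out = []
    for b in rhs:
        coeffs = []
        for col in range(3):
            mc = [r[:] for r in ata]
            for r in range(3):
                mc[r][col] = b[r]  # replace one column with the right-hand side
            coeffs.append(_det3(mc) / d)
        out.append(coeffs)
    return out


def invert_affine(m: List[List[float]]) -> List[List[float]]:
    (a, b, c), (d, e, f) = m
    det = a * e - b * d
    return [[e / det, -b / det, (b * f - e * c) / det],
            [-d / det, a / det, (d * c - a * f) / det]]


def rectify_plate(image: List[List[int]], vertices: Sequence[Point],
                  out_w: int = PLATE_W, out_h: int = PLATE_H) -> List[List[int]]:
    target = [(0, 0), (out_w - 1, 0), (out_w - 1, out_h - 1), (0, out_h - 1)]
    m = fit_affine(vertices, target)  # detected vertices -> preset plate corners
    inv = invert_affine(m)            # preset pixel -> source pixel (backward warp)
    h, w = len(image), len(image[0])
    out = []
    for v in range(out_h):
        row = []
        for u in range(out_w):
            x = inv[0][0] * u + inv[0][1] * v + inv[0][2]
            y = inv[1][0] * u + inv[1][1] * v + inv[1][2]
            xi, yi = int(round(x)), int(round(y))  # nearest-neighbour sampling
            row.append(image[yi][xi] if 0 <= xi < w and 0 <= yi < h else 0)
        out.append(row)
    return out
```

In a production pipeline the same backward warp would normally be delegated to an image library, for example OpenCV's `cv2.warpAffine`.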

(7) Network training to obtain trained model parameters: inputting the training set images of the data set, of size 512 × 512 × 3, sequentially into the deep convolutional network in batches of size B. The whole network uses an IoU threshold as the criterion of its sample assignment strategy and outputs the license plate classification confidence, of size $N \times Class$, and the regressed coordinate positions, of size $N \times 8$, where $Class = 2$ (license plate or not), $N$ is the number of predicted license plate targets, and 8 corresponds to the horizontal and vertical coordinates of the four license plate vertices. The Focal loss is used to compute the error between the predicted and ground-truth classes, and the Smooth L1 loss the error between the predicted and ground-truth license plate positions; the parameters are updated by back-propagation. After a set number of training iterations over the complete training set (50 epochs), the model parameters with the best result on the training set are saved as the final trained parameters, giving the trained license plate recognition network parameters used for testing on the test set;
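The two training losses of step (7) can be written per prediction as below. This is a plain-Python sketch for a single box: `alpha = 0.25` and `gamma = 2` are the common RetinaNet defaults and `beta = 1` is the usual Smooth L1 transition point, all assumed because the text does not state them; the IoU-based sample assignment and the batching over B images are omitted.

```python
import math
from typing import Sequence


def focal_loss(p_plate: float, is_plate: bool,
               alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss for one prediction: -alpha_t * (1 - p_t)^gamma * log(p_t).

    p_plate is the predicted probability of the "license plate" class;
    alpha and gamma are ASSUMED defaults, not values from the text.
    """
    p = min(max(p_plate, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
    p_t = p if is_plate else 1.0 - p
    alpha_t = alpha if is_plate else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Smooth L1 summed over the 8 vertex coordinates (x1, y1, ..., x4, y4) of one plate."""
    total = 0.0
    for a, b in zip(pred, target):
        diff = abs(a - b)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total
```

The `(1 - p_t)^gamma` factor down-weights easy, confidently classified boxes, which matters here because each feature point carries an anchor and most of them are background.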

(8) Network testing and license plate correction: testing the trained license plate recognition network on the test set: scaling (resizing) the long edge of each image to 512 while keeping the aspect ratio unchanged and padding the short edge so that the image size is 512 × 512; inputting the images sequentially into the deep convolutional network, which outputs the license plate classification confidence and coordinate positions; setting a threshold to filter out license plates with low confidence; removing redundant boxes output by the network with non-maximum suppression (NMS); and finally correcting the license plates using step (6).
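The test-time post-processing of step (8) is sketched below. The padding side (right and bottom), the confidence and IoU thresholds, and running NMS on each quadrilateral's enclosing horizontal box are all assumptions; an IoU computed on the quadrilaterals themselves would match tilted plates more closely. `to_original` maps predicted coordinates back to the input image, which holds only when padding is added on the right and bottom.

```python
from typing import List, Sequence, Tuple

TARGET = 512  # step (8): long edge scaled to 512, short edge padded


def letterbox_params(w: int, h: int, target: int = TARGET) -> Tuple[float, int, int, int, int]:
    """Return (scale, new_w, new_h, pad_x, pad_y) for an aspect-preserving resize."""
    scale = target / max(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    return scale, new_w, new_h, target - new_w, target - new_h  # pad right/bottom (assumed)


def to_original(quad: Sequence[float], scale: float) -> List[float]:
    """Map letterboxed coordinates back to the original image (right/bottom padding)."""
    return [c / scale for c in quad]


def enclosing_box(quad: Sequence[float]) -> Tuple[float, float, float, float]:
    xs, ys = quad[0::2], quad[1::2]  # quad = (x1, y1, ..., x4, y4)
    return min(xs), min(ys), max(xs), max(ys)


def iou(a, b) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(scores: Sequence[float], quads: Sequence[Sequence[float]],
                score_thr: float = 0.5, iou_thr: float = 0.5) -> List[int]:
    """Indices kept after confidence filtering and greedy NMS (thresholds hypothetical)."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: scores[i], reverse=True)
    boxes = {i: enclosing_box(quads[i]) for i in order}
    keep: List[int] = []
    for i in order:  # highest confidence first
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return keep
```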

Techniques not described in detail in the invention use the prior art.

Compared with the prior art, the invention provides an unconstrained-scene license plate recognition method based on deep feature alignment with an improved deformable convolution. A convolutional neural network generates high-quality candidate boxes, and the improved deformable convolution adaptively aligns the convolution features with the license plate features according to each candidate box, which solves the misalignment of deep features for tilted and distorted license plates. The corrected image is obtained by applying an affine transformation directly to the four detected vertex coordinates, so no affine parameters need to be learned and the cost of repeated feature extraction is avoided. With only a few additional operations, the method greatly improves the performance of convolutional-neural-network-based detection on rotated and distorted targets. Unlike the ordinary deformable convolution, whose offsets are obtained by a convolutional layer, the offsets of the improved deformable convolution are derived directly from the candidate box. When the improved deformable convolution is added to an existing method, the recognition accuracy on the rotation (Rotate) subset of the CCPD license plate dataset rises from 94.7% to 98.2%, at the cost of only a small amount of additional computation.

Drawings

FIG. 1 is a diagram of an improved deformable convolution module according to the present invention.

Fig. 2 is a diagram illustrating the overall network structure according to the present invention.

Fig. 3 is a flow chart of the license plate detection method provided by the invention.

FIG. 4 is a comparison of the license plate detection results provided by the present invention with other methods.

FIG. 5 is a comparison of another license plate detection result provided by the present invention with other methods.

Detailed Description

The invention will be further described by way of examples, without in any way limiting the scope of the invention, with reference to the accompanying drawings.

Example:

In this embodiment, high-quality candidate boxes are generated with a convolutional neural network, and an improved deformable convolution adaptively aligns the convolution features with the license plate features according to the candidate boxes, so as to solve the misalignment of deep features of tilted or distorted license plates. As shown in Figs. 1 to 3, the specific implementation comprises the following steps:

(1) Data set construction: collecting images containing regular, tilted and distorted license plates from scenes such as traffic monitoring and roadside parking lots, constructing a license plate data set of 20,000 images, labeling the positions of the four vertices of each license plate, computing the coordinates of the corresponding horizontal rectangular box from the four vertices, and dividing the data set into a training set (60%), a validation set (20%) and a test set (20%);

(3) High-quality candidate box generation: on the set of multi-scale license plate convolution feature maps obtained from the backbone of the deep convolutional network in step (2), two fully connected sub-networks with the same structure but unshared parameters learn the classification and position information, respectively, completing the classification and position regression tasks for the target boxes; the classification indicates whether the target is a license plate, and the position information is the coordinates of the four vertices of the license plate. During training (step (7)) and testing (step (8)), each feature point in the feature map is assigned only one anchor box for learning the target position, and different classification score thresholds are set for training and testing to obtain 100 high-quality candidate boxes: the threshold is 0.01 during training for a better training effect and 0.1 during testing for faster inference;

(4) Deep feature alignment: for each position $p_0$ on the output feature map (a feature map in the set of multi-scale license plate convolution feature maps output in step (2)), the deformable convolution operates on the regular grid $R$ of a conventional convolution, augmented with offsets $\Delta p_n$. The output at position $p_0$ is therefore computed as:

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

where $p_n$ enumerates the positions listed in $R$, $w$ is the convolution weight, $x$ is the input feature, and $\Delta p_n$ is the offset obtained by a convolutional layer.

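The coordinates of the high-quality candidate box obtained in step (3) are recorded as $Poly1$, where $Poly1$ denotes the high-quality candidate box at position $p$. The minimum and maximum abscissas and ordinates of $Poly1$ give its minimum enclosing horizontal rectangle $Poly2 = [x_{\min}, y_{\min}, x_{\max}, y_{\max}]$. Since the coordinates of both regions are known, their affine transformation matrix $M$ can be computed. For each feature point $p$ of the feature map, the anchor-based sampling position $L_p^r$ of each grid point $r \in R$ is then computed from $M$, the convolution kernel size $k$ and the feature-map stride $S$. The offset $O$ of the improved deformable convolution at position $p$ is expressed as:

$$O = \{\, L_p^r - p - r \mid r \in R \,\}$$

The obtained offset $O$ and the convolution feature map from step (2) are then fed into the convolution to extract the aligned features, forming the improved deformable convolution;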

(5) Feature-aligned fine license plate recognition: refining and relocating the license plate position using the aligned features obtained in step (4): the aligned features are input into a 3 × 3 convolutional layer followed by two branches for classification and regression, where the classification branch judges whether the region is a license plate and the regression branch predicts the coordinates of the four license plate vertices, thus giving the accurate coordinate position of the license plate;

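(6) License plate correction: setting up an affine transformation between the license plate coordinate position obtained in step (5) and the license plate coordinates of a preset size, computing the affine matrix from these two sets of coordinate positions, and applying this matrix to the license plate cropped from the original image according to its coordinates, to obtain the corrected license plate image;

(7) Network training to obtain trained model parameters: inputting the training set images, of size 512 × 512 × 3, sequentially into the deep convolutional network (steps (2) to (5)) in batches of size B. The whole network uses an IoU threshold as the criterion of its sample assignment strategy and outputs the license plate classification confidence, of size $N \times Class$, and the regressed coordinate positions, of size $N \times 8$, where $Class = 2$ (license plate or not), $N$ is the number of predicted license plate targets, and 8 corresponds to the horizontal and vertical coordinates of the four license plate vertices. The Focal loss is used to compute the error between the predicted and ground-truth classes, and the Smooth L1 loss the error between the predicted and ground-truth license plate positions; the parameters are updated by back-propagation. After a set number of training iterations over the complete training set (50 epochs), the model parameters with the best result on the training set are saved as the final trained parameters, giving the trained license plate recognition network parameters used for testing on the test set;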

In this embodiment, the license plate recognition results are compared with those of a prior-art method based on horizontal-box detection, as shown in Fig. 4 and Fig. 5, where the left image is the prior-art result and the right image is the result of the method of this embodiment. The comparison shows that the unconstrained-scene license plate detection method based on deep feature alignment with an improved deformable convolution, which generates high-quality candidate boxes with a convolutional neural network and uses the improved deformable convolution to adaptively align the convolution features with the license plate features according to the candidate boxes, solves the misalignment of deep features of tilted or distorted license plates and efficiently realizes license plate detection and correction.

Techniques not described in detail in this embodiment are prior art.

It should be noted that the disclosed embodiments are intended to aid further understanding of the invention, but those skilled in the art will appreciate that various substitutions and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention should not be limited to the disclosed embodiments; the scope of the invention is defined by the appended claims.

Claims

1. A license plate recognition method for unconstrained scenes, characterized by comprising the following steps:

1) Data set construction: collecting images containing regular, tilted and distorted license plates from scenes such as traffic monitoring and roadside parking lots, constructing a data set, labeling the positions of the four vertices of each license plate, computing the coordinates of the corresponding horizontal rectangular box from the four vertices, and dividing the data set into a training set, a validation set and a test set;

2) Deep convolutional feature extraction: first, normalizing the images of the training set from step 1) to a size of 512 × 512 and a value range of 0 to 1; then inputting the processed image into the backbone of the deep convolutional network for convolutional feature extraction, where the backbone uses VGG16 as the feature extraction network followed by a feature pyramid network, which strengthens and exploits the multi-scale features formed in VGG16 to obtain a more expressive set of multi-scale license plate convolution feature maps;

3) High-quality candidate box generation: on the set of multi-scale license plate convolution feature maps obtained from the backbone of the deep convolutional network in step 2), two fully connected sub-networks with the same structure but unshared parameters learn the classification and position information, respectively, completing the classification and position regression tasks for the target boxes, where the classification indicates whether the target is a license plate and the position information is the coordinates of the four vertices of the license plate;

4) Deep feature alignment: for each position $p_0$ on a feature map in the set of multi-scale license plate convolution feature maps obtained in step 2), the deformable convolution operates on the regular grid $R$ of a conventional convolution, augmented with offsets $\Delta p_n$. The output at position $p_0$ is therefore computed as:

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

where $p_n$ enumerates the positions listed in $R$, $w$ is the convolution weight, $x$ is the input feature, and $\Delta p_n$ is the offset obtained by a convolutional layer;

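the coordinates of the high-quality candidate box obtained in step 3) are recorded as $Poly1$, where $Poly1$ denotes the high-quality candidate box at position $p$; the minimum and maximum abscissas and ordinates of $Poly1$ give its minimum enclosing horizontal rectangle $Poly2 = [x_{\min}, y_{\min}, x_{\max}, y_{\max}]$, and the affine transformation matrix $M$ between the two regions is computed from their coordinates. For each feature point $p$ of the feature map, the anchor-based sampling position $L_p^r$ of each grid point $r \in R$ is computed from $M$, the convolution kernel size $k$ and the feature-map stride $S$, and the offset $O$ of the improved deformable convolution at position $p$ is expressed as:

$$O = \{\, L_p^r - p - r \mid r \in R \,\}$$

The obtained offset $O$ and the convolution feature map from step 2) are fed into the convolution to extract the aligned features, forming an improved deformable convolution; for each high-quality candidate box there are 9 sampling points, giving an 18-dimensional offset, so that the axis-aligned convolution feature at position $p$ is converted into a convolution feature that follows the arbitrary orientation and pose of the corresponding candidate box;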

5) Feature-aligned fine license plate recognition: refining and relocating the license plate position using the aligned features obtained in step 4): the aligned features are input into a 3 × 3 convolutional layer followed by two branches for classification and regression, where the classification branch judges whether the region is a license plate and the regression branch predicts the coordinates of the four license plate vertices, thus giving the accurate coordinate position of the license plate;

6) License plate correction: setting up an affine transformation between the license plate coordinate position obtained in step 5) and the license plate coordinates of a preset size, computing the affine matrix from these two sets of coordinate positions, and then applying this matrix to the license plate cropped from the original image according to its coordinates, to obtain the corrected license plate image;

7) Network training to obtain trained model parameters: inputting the training set images of the data set, of size 512 × 512 × 3, sequentially into the deep convolutional network in batches of size B; the whole network uses an IoU threshold as the criterion of its sample assignment strategy and outputs the license plate classification confidence, of size $N \times Class$, and the regressed coordinate positions, of size $N \times 8$, where $Class = 2$ (license plate or not), $N$ is the number of predicted license plate targets, and 8 corresponds to the horizontal and vertical coordinates of the four license plate vertices; computing the error between the predicted and ground-truth classes with the Focal loss and the error between the predicted and ground-truth license plate positions with the Smooth L1 loss, and updating the parameters by back-propagation; after 50 training iterations over the complete training set, saving the model parameters with the best result on the training set as the final trained parameters, giving the trained license plate recognition network parameters used for testing on the test set;

8) Network testing and license plate correction: testing the trained license plate recognition network on the test set: scaling the long edge of each image to 512 while keeping the aspect ratio unchanged and padding the short edge so that the image size is 512 × 512; inputting the images sequentially into the deep convolutional network, which outputs the license plate classification confidence and coordinate positions; setting a threshold to filter out license plates with low confidence; removing redundant boxes output by the network with non-maximum suppression; and finally correcting the license plates using step 6).

2. The license plate recognition method for unconstrained scenes according to claim 1, wherein the data set in step 1) contains no fewer than 20,000 license plate images, of which the training set accounts for 60%, the validation set for 20%, and the test set for 20%.