CN111325203A - American license plate recognition method and system based on image correction - Google Patents


Publication number
CN111325203A
CN111325203A
Authority
CN
China
Prior art keywords
text
image
license plate
information
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010069950.5A
Other languages
Chinese (zh)
Other versions
CN111325203B (en)
Inventor
林立雄
何洪钦
黄国辉
赖嘉弘
陈睿函
何炳蔚
陈彦杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN202010069950.5A
Publication of CN111325203A
Application granted
Publication of CN111325203B
Active legal status: Current
Anticipated expiration: legal-status pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 20/625 License plates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Abstract

The invention relates to an American license plate recognition method and system based on image correction, comprising a text detection module, an image correction module, a text recognition module and a text classification module, and the following steps: preprocess the image files of the data set, perform data augmentation, and generate a training set and a test set; design a text detection module that detects the text information in the image, separates the text from the background, and obtains text images containing only text information; rectify the text images with the image correction module, transforming originally distorted or tilted text images to the horizontal direction; recognize the rectified text images to obtain the letters, digits and other information they contain; and build a text classification module that screens the license plate number, state name and annual inspection date out of all the text information to complete license plate recognition. The method solves the problems of complex background patterns, blurred and deformed target text images, complex text information, and the large computation cost of offline neural-network training in American license plate recognition.

Description

American license plate recognition method and system based on image correction
Technical Field
The invention relates to the field of image recognition and artificial intelligence, in particular to an American license plate recognition method and system based on image correction.
Background
License plate recognition is an important technology in smart-city traffic construction, and accuracy and real-time performance are its two key metrics. License plate images are natural-scene images: they are easily affected by lighting at capture time, different shooting angles cause tilt and motion blur, and the plates themselves fade or become stained; such uncertain interference directly affects the accuracy of license plate information recognition. Unlike Chinese license plates, which follow uniform patterns such as white characters on a blue background, black on yellow, or black on white, American license plates contain complex background patterns and varied text fonts and styles, the plate designs of different states are completely different, and each plate may even be individually customized. Common license plate recognition techniques are therefore infeasible for American plates, which makes their recognition significantly more difficult and challenging.
Text in natural-scene images carries high-level semantic information that is important for analyzing and understanding the scene. With the rapid spread of cameras and computers, images with text information can be acquired in large quantities in daily life. Scene text recognition has therefore long been an active research topic in computer vision, with applications in image retrieval, video security, intelligent transportation, human-computer interaction, and so on. License plate recognition mainly involves two tasks: text detection, which determines the position of text in an input image, usually represented by a bounding box; and text recognition, which converts image regions containing text into machine-readable character strings.
Traditional license plate recognition methods locate the plate region by texture-feature analysis, color features, and the intensity of image edge changes. Recent neural-network-based schemes include OpenALPR, EasyPR, HyperLPR, and the like. Although license plate recognition has developed greatly, current Chinese techniques are designed mainly for domestic standard plates; no related technique handles the more complex American plates, whose recognition remains difficult in the following respects. First, American plate backgrounds are complex, and weather conditions such as strong light, fog, or sandstorms lower the quality of the acquired images; plates also stay in service for a long time, so stains or rust may leave parts of the target text covered by the background, and background patterns and text are easily confused during text detection. Second, the English fonts on American plates include both printed and handwritten styles with no unified design, and the captured plate images suffer from tilt, curvature, and perspective deformation, which lowers character recognition accuracy. Finally, character recognition must handle mixed letters and digits, and the text on a plate is complex, including the plate number, state name, annual inspection date, slogans, and so on; the plate number, state name, and inspection date must be screened out of all this text, and because the year and month of the inspection date are affixed to the plate separately, they must be distinguished when the date is determined.
Disclosure of Invention
In view of the above, the present invention provides an American license plate recognition method and system based on image correction, to solve the problems of complex background patterns, blurred and deformed target text images, complex text information, and the large computation cost of offline neural-network training in American license plate recognition.
The invention is realized by adopting the following scheme: an American license plate recognition method based on image correction comprises the following steps:
step S1: preprocessing a data set: cleaning and screening an original American license plate image data set, labeling the image, wherein the labeled content comprises a text box and text information in the text box, and dividing the labeled image into a training set and a test set for neural network training and experimental testing;
step S2: constructing a text detection module based on a convolutional neural network and a recurrent neural network: extract features of the training-set images with the convolutional network, convert the feature map into a feature sequence by sliding convolution, and design an anchor mechanism to detect the result of each sliding convolution; obtain several groups of consecutive text boxes by comparing box lengths in the x-axis and y-axis directions, and order the text boxes with a recurrent neural network to obtain text box sequence information, each group of text boxes being one predicted text image;
step S3: because the predicted text images may be curved or tilted, an image correction module is designed to rectify them; the module includes a line-fitting transform that models the centre line of the scene text with a polynomial together with a group of line segments perpendicular to that centre line, estimating the direction and boundary of the text; an iterative rectification network learns the segment-equation parameters and adjusts the text image to the horizontal direction;
step S4: loading the text image corrected in the step S3, inputting the text image into a text recognition module, and recognizing the text image by the text recognition module through a feature extraction layer, a sequence regression layer and a transcription layer to obtain text information;
step S5: the text classification module classifies the recognized text; because each predicted text corresponds one-to-one with a text box, the license plate number, state name and annual inspection date can be identified from the length, width and position information of the text box.
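The data flow of steps S1-S5 can be sketched as a single pipeline. The four neural modules are stand-in stubs here (all function names, box values and returned strings are hypothetical illustrations, not the patent's implementation); only the hand-off between detection, rectification, recognition and classification follows the description above.

```python
# Minimal pipeline sketch for steps S2-S5; the neural modules are stubs.

def detect_text(image):
    """Step S2 stub: return (box, crop) pairs, box = (x1, y1, x2, y2)."""
    return [((0, 0, 10, 5), "crop0"), ((0, 6, 10, 10), "crop1")]

def rectify(crop):
    """Step S3 stub: line-fitting rectification, here the identity."""
    return crop

def recognize(crop):
    """Step S4 stub: CNN + BLSTM + CTC recognition."""
    return {"crop0": "ABC1234", "crop1": "CALIFORNIA"}[crop]

def classify(results):
    """Step S5 stub: pick fields by box geometry; here, the tallest
    box is taken as the plate number, as in the description of S5."""
    tallest = max(results, key=lambda r: r[0][3] - r[0][1])
    return {"plate": tallest[1]}

def read_plate(image):
    results = [(box, recognize(rectify(crop))) for box, crop in detect_text(image)]
    return classify(results)
```

The point of the sketch is the module boundaries: each stage consumes exactly what the previous stage produces, so any one module can be retrained or replaced independently.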
Further, the step S1 specifically includes the following steps:
step S11: carrying out data screening on an original data set, removing fuzzy images with text information missing, carrying out frame selection marking on the screened images by using LabelImg marking software, wherein marking contents comprise 4 endpoint coordinates of the frame selection text images and text information in a text box, storing an image sequence number, the endpoint coordinates and the text information into a txt file, and generating a training set and a testing set, wherein the training set accounts for 2/3, the testing set accounts for 1/3, and the training set and the testing set both comprise license plate images and txt files;
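A parser for the txt annotations of step S11 might look as follows. The exact file layout is not given in the text; this sketch assumes one comma-separated line per box holding the image sequence number, the 4 endpoint coordinates (x1,y1,…,x4,y4) and the text, which is an assumption.

```python
# Hypothetical parser for the step-S11 annotation format (layout assumed).

def parse_annotation_line(line):
    parts = line.strip().split(",")
    index = int(parts[0])                 # image sequence number
    coords = [(int(parts[i]), int(parts[i + 1])) for i in range(1, 9, 2)]
    text = ",".join(parts[9:])            # the text itself may contain commas
    return index, coords, text
```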
step S12: scaling original image pixels in a training set to a specified size by a bilinear interpolation method;
step S13: randomly and horizontally turning the training set images;
step S14: carrying out random angle rotation on the training set images;
step S15: carrying out image brightness, contrast and saturation transformation on monocular original images in the training set;
step S16: the training set and test set files are converted into LMDB (Lightning Memory-mapped database) data to improve the file reading speed.
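The augmentations of steps S12-S15 can be sketched in plain NumPy (a real pipeline would typically use torchvision or similar; the target size and parameter values here are illustrative, not from the patent).

```python
import numpy as np

def hflip(img):
    """Step S13: horizontal flip = reverse the width axis."""
    return img[:, ::-1, :]

def adjust_brightness_contrast(img, brightness=1.0, contrast=1.0):
    """Step S15 (partial): contrast scales about the mean, brightness scales overall."""
    mean = img.mean()
    out = (img - mean) * contrast + mean
    return np.clip(out * brightness, 0.0, 255.0)

def bilinear_resize(img, out_h, out_w):
    """Step S12: bilinear interpolation to a fixed size."""
    h, w, _ = img.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None, None], (xs - x0)[None, :, None]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```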
Further, the step S2 specifically includes the following steps:
step S21: when building the text detection module based on the convolutional and recurrent neural networks, MobileNetV2 is used as the convolutional network to extract image features. Conventional convolutions are replaced with depthwise-separable convolutions, applying in order a 1 × 1 pointwise convolution, a ReLU6 activation, a 3 × 3 depthwise convolution, a ReLU6 activation, a 1 × 1 pointwise convolution, and a linear activation. The loss function L(e_i, g_j, r_k) of the convolutional network is:

L(e_i, g_j, r_k) = (1/N_e)·Σ_i L_e^cls(e_i, e_i*) + (λ_1/N_g)·Σ_j L_g^reg(g_j, g_j*) + (λ_2/N_r)·Σ_k L_r^reg(r_k, r_k*)

where L_e^cls, L_g^reg and L_r^reg are the 3 loss terms for the text/non-text score, the coordinates, and the boundary refinement, respectively; e_i is the predicted probability that the i-th anchor contains text, and e_i* ∈ {0, 1} is its ground truth; j is the index of anchor i on the y-axis and k its index on the x-axis; g_j and g_j* are the predicted and true values of the i-th anchor in the y-axis direction, and r_k and r_k* those in the x-axis direction; λ_1 and λ_2 are weights, and N_e, N_g and N_r are normalization parameters;
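A numeric sketch of this CTPN-style three-term loss follows. The patent does not specify the individual loss forms; cross-entropy for the classification term and smooth-L1 for the two regression terms are assumptions (they are the usual choices for this loss shape), and the weights λ_1, λ_2 are set to illustrative values.

```python
import numpy as np

# Sketch of L = (1/N_e)·ΣL_cls + (λ1/N_g)·ΣL_reg(g) + (λ2/N_r)·ΣL_reg(r).
# Loss forms assumed: cross-entropy (cls) and smooth-L1 (reg).

def smooth_l1(pred, gt):
    d = np.abs(pred - gt)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)

def detection_loss(e, e_gt, g, g_gt, r, r_gt, lam1=1.0, lam2=2.0):
    eps = 1e-9
    l_cls = -(e_gt * np.log(e + eps) + (1 - e_gt) * np.log(1 - e + eps))
    loss = l_cls.sum() / len(e)                       # text/non-text term
    loss += lam1 * smooth_l1(g, g_gt).sum() / max(len(g), 1)  # y-coordinate term
    loss += lam2 * smooth_l1(r, r_gt).sum() / max(len(r), 1)  # x-boundary term
    return loss
```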
step S22: performing convolution on the obtained feature map by adopting 3-by-3 sliding convolution; step length is 1, channel number is 512, the feature map is converted into a feature sequence through sliding convolution, and the result of each convolution is sequentially input into an anchor point mechanism for text box matching;
step S23: the anchor mechanism is a text detection box of fixed length in the x-axis direction and variable length in the y-axis direction; k anchors are designed, and the length in the y-axis direction is transformed as:

y = c^k,  y ∈ (11, 273)

where c is a constant; text box matching is carried out on each text box sequence by varying k, forming a plurality of text boxes in sequence;
the anchor point mechanism firstly uses the matching of text detection boxes to obtain the text boxes of all texts in the image in sequence for the result of each sliding convolution, then judges whether the adjacent text boxes belong to the same text line from the x-axis direction by comparing the distance b between the adjacent text boxes, and if b is less than 50, the adjacent text boxes are considered to belong to the same text line; in the y-axis direction, setting the height of the first text box as an initial value, matching the text boxes to obtain a k value and a height h, and if the height of the subsequently detected text box is within the range of (0.9h,1.1h), determining that the text boxes belong to the same text line; and combining a plurality of text boxes of the same type into a group of text boxes through simultaneous positioning in the x-axis direction and the y-axis direction to obtain a plurality of groups of text patterns only containing text information.
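The anchor heights and the two grouping rules above can be sketched as follows. The number of anchors (10) and the box layout (x, y, w, h) are assumptions for illustration; the geometric height series spanning (11, 273), the x-gap threshold b < 50, and the height window (0.9h, 1.1h) are the values stated in the text.

```python
# Sketch of the variable-height anchor series and the text-line grouping rules.

def anchor_heights(k=10, y_min=11.0, y_max=273.0):
    """Geometric series of k anchor heights from y_min to y_max (y = c**k)."""
    c = (y_max / y_min) ** (1.0 / (k - 1))
    return [y_min * c ** i for i in range(k)]

def same_text_line(box_a, box_b, h):
    """box = (x, y, w, h). Apply the x-gap rule (b < 50) and the
    height-ratio rule (height within (0.9h, 1.1h))."""
    gap = box_b[0] - (box_a[0] + box_a[2])
    return gap < 50 and 0.9 * h < box_b[3] < 1.1 * h
```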
Step S24: the recurrent neural network is a deep-stacked bidirectional LSTM (Bi-directional Long Short-Term Memory, BLSTM) network; the text boxes in the text patterns are treated as a sequence to obtain the sequence information of each text box. The network is a 256-dimensional bidirectional LSTM comprising 2 128-dimensional LSTMs; after the features are input into the BLSTM network, several groups of text box patterns with sequence information are obtained through a fully connected layer.
Further, the step S3 specifically includes the following steps:
step S31: the line-fitting transform models the position of the image text to correct perspective and curvature deformation. An x-y coordinate system is established with the image centre point as the origin. The fitted segments comprise two parts. The first is a polynomial fitting the text centre line in the horizontal direction, expressed as a K-order polynomial with K + 1 parameters:

f_1(x) = a_K·x^K + a_{K−1}·x^{K−1} + … + a_1·x + a_0

The second part consists of L line segments normal (perpendicular) to the horizontal centre line, contributing 3L parameters:

f_2(x) = b_{1,l}·x + b_{0,l} | r_l,  l = 1, 2, …, L

where r_l is the length of the segment on either side of the centre line; the modelled centre line gives the text direction, and the segments perpendicular to it give the boundary;
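The first part of the transform, fitting a K-order polynomial with K + 1 parameters to sampled centre-line points, can be sketched with NumPy's polynomial routines. The sample points and the choice K = 3 are synthetic illustrations.

```python
import numpy as np

# Fit a K-order polynomial f1(x) = a_K x^K + ... + a_1 x + a_0 (K+1 params)
# to synthetic centre-line samples of a curved text instance.

K = 3
xs = np.linspace(-1.0, 1.0, 20)
ys = 0.5 * xs ** 2 - 0.1 * xs + 0.2      # a curved "centre line"
coeffs = np.polyfit(xs, ys, K)           # K+1 coefficients, highest power first
fitted = np.polyval(coeffs, xs)
```

Because the synthetic centre line is itself polynomial, the fit is exact up to floating-point error; for real, noisy centre-line samples the fit is a least-squares approximation.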
step S32: the iterative rectification network comprises 5 layers: the first layer, Block1, uses 3 × 3 convolution kernels with 32 output channels and stride 2; the second layer, Block2, uses 3 × 3 kernels with 64 output channels and stride 2; the third layer, Block3, uses 3 × 3 kernels with 128 output channels and stride 2; the fourth layer, FC4, is a fully connected layer with 512 output channels; the fifth layer, FC5, is a fully connected layer whose output dimension equals the number of parameters of the line-fitting transform, namely 3L + K + 1;
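The shapes through this 5-layer network can be checked with the standard convolution output-size formula. The input size (32, 100) and padding of 1 are assumptions (the patent does not state them); the 3L + K + 1 output dimension is from the text.

```python
# Shape sketch of the rectification network: three stride-2 3x3 conv blocks,
# then FC-512, then an FC head of dimension 3L + K + 1.

def conv_out(size, kernel=3, stride=2, pad=1):
    """Standard conv output size: floor((n + 2p - k)/s) + 1. Padding assumed."""
    return (size + 2 * pad - kernel) // stride + 1

def head_dim(L, K):
    """Parameter count of the line-fitting transform."""
    return 3 * L + K + 1

h, w = 32, 100                    # assumed input size
for _ in range(3):                # Block1..Block3, each stride 2
    h, w = conv_out(h), conv_out(w)
# (h, w) is the spatial size flattened into FC4 (512 units)
```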
step S33: with the fitted-segment parameters obtained in step S32, the endpoint coordinates P = [t_1, t_2, …, t_{2L}]^T of the L segments perpendicular to the text centre line can be determined. The corrected endpoint coordinates P' = [t'_1, t'_2, …, t'_{2L}]^T are computed by thin-plate spline (TPS) interpolation, whose transform is expressed as

t' = C · [1, t, S^T]^T

where S = [U(t − t_1), U(t − t_2), …, U(t − t_{2L})]^T and U(r) = r² log r²; for each pixel point t in the original image, the corrected pixel point t' is obtained through this TPS transform;
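A minimal TPS fit for one output coordinate can be written directly from the kernel U(r) = r² log r²: solve the standard (2L + 3) × (2L + 3) linear system for the kernel weights plus the affine part, then evaluate at new points. The control points in the test are synthetic; this is an illustrative solver, not the patent's iterative, back-propagated version.

```python
import numpy as np

# One-output thin-plate-spline fit: kernel weights + affine part [1, x, y].

def U(r):
    """TPS radial kernel U(r) = r^2 * log(r^2), with U(0) = 0."""
    r2 = r ** 2
    return np.where(r2 == 0, 0.0, r2 * np.log(np.maximum(r2, 1e-12)))

def tps_fit(src, dst):
    """src: (n, 2) control points; dst: (n,) target coordinate values."""
    n = len(src)
    K = U(np.linalg.norm(src[:, None, :] - src[None, :, :], axis=2))
    P = np.hstack([np.ones((n, 1)), src])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.concatenate([dst, np.zeros(3)])
    return np.linalg.solve(A, b)

def tps_eval(coef, src, pts):
    n = len(src)
    K = U(np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=2))
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return K @ coef[:n] + P @ coef[n:]
```

TPS interpolation is exact at the control points, which is what makes it suitable for warping the 2L segment endpoints onto their corrected positions.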
step S34: obtaining corrected segment end points through the step S33, learning a mapping relation from the corrected end points to the original image by using a sampler, and continuously iterating the step S33 in the training process; the sampler is completely differentiable, the fitting line segment does not need manual marking, and the sampler is trained through the image gradient of the back propagation text recognition module to finish the correction of the text image.
Further, the specific content of step S4 is:
the feature extraction layer uses a ResNet-50 encoder whose last convolutional layer has 512 channels, a 3 × 3 kernel, and stride 1, followed by a max-pooling operation, converting an image of size (32, 100, 3) into a feature map of size (1, 25, 512); a further convolution with 512 channels, a 2 × 2 kernel, and stride 1 is applied, and the feature map is then cut column-wise into feature sequences and input to the next layer;
the sequence regression layer uses a deep stacking bidirectional LSTM network, the obtained feature sequences are input into the deep stacking bidirectional LSTM network, the BLSTM network comprises 2 groups of 256-dimensional LSTM networks, and each group of feature sequences are subjected to forward and reverse sequencing to obtain sequence information of each group of features;
the transcription layer adopts Connectionist Temporal Classification (CTC) as the conditional probability and uses negative log-likelihood as the training loss, so that the sequence information from the sequence regression layer corresponds one-to-one with the text pixels of each image frame, finally yielding the predicted text.
Further, the specific content of step S5 is: the text detection and image correction modules predict multiple groups of detection boxes with sequence information, each denoted box. The height of the largest detection box in the y-axis direction is denoted box_max, and boxes with heights in (box_max − 20, box_max) are taken as license plate candidate boxes; these are then sorted by x-axis coordinate from small to large to determine the order of the plate's digits and letters, smaller coordinates first, and the text recognition module recognizes the text in these boxes as the license plate number. For the texts in the boxes outside the plate candidates, edit distances to the names of the 50 states are computed, and the state with the smallest matching distance is taken as the state name. Digits in the remaining boxes are detected and automatically padded to four digits; if such a number lies in the range 1950-2019, the largest one is selected as the annual inspection date, otherwise the date is set to 0. Finally the license plate number, state name, and annual inspection date are output.
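The screening rules of step S5 can be sketched as follows. The state list is truncated to a few entries for illustration, and the four-digit padding detail is simplified away; the edit-distance matching and the 1950-2019 year window follow the description.

```python
# Step-S5 sketch: edit distance picks the state name; four-digit numbers
# in 1950-2019 are candidate inspection years (largest wins, else 0).

STATES = ["CALIFORNIA", "TEXAS", "FLORIDA", "NEWYORK"]  # truncated example list

def edit_distance(a, b):
    """Classic single-row Levenshtein DP."""
    d = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, cb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (ca != cb))
    return d[len(b)]

def match_state(text):
    return min(STATES, key=lambda s: edit_distance(text.upper(), s))

def inspection_year(texts):
    years = [int(t) for t in texts if t.isdigit() and 1950 <= int(t) <= 2019]
    return max(years) if years else 0
```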
Furthermore, the invention provides an American license plate recognition system based on image correction, comprising a text detection module, an image correction module, a text recognition module and a text classification module. The text detection module detects the text information in the preprocessed image, segmenting text from background to obtain text images containing only text information; the image correction module rectifies the text images, transforming originally distorted or tilted text images to the horizontal direction; the text recognition module recognizes the rectified text images to obtain the letters and digits they contain; and the text classification module screens the license plate number, state name and annual inspection date out of all the text information to complete license plate recognition.
Compared with the prior art, the invention has the following beneficial effects:
(1) Because the background patterns of American license plates are complex, the target text images are deformed and blurred, and the text information is complex and hard to recognize, the invention combines deep learning with image recognition techniques to realize end-to-end American license plate recognition.
(2) The invention adopts the lightweight network structure design, combines the convolutional neural network and the cyclic neural network, converts the text detection and the text recognition into sequence processing, improves the American license plate recognition speed and precision, and can be used for real-time license plate recognition.
(3) The invention uses the image correction module to correct the license plate with inclination and curvature transformation, and iteratively fits the text central line through line fitting transformation, thereby realizing text correction and improving the identification precision of the inclined deformation text.
(4) The text classification module of the invention classifies text by the position and size characteristics on the plate, screens the useful information, and completes identification of the American plate's license number, state name and annual inspection date, making recognition more accurate and complete.
Drawings
Fig. 1 is a schematic overall flow chart of an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a neural network for recognizing american license plates according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a text detection neural network according to an embodiment of the present invention.
FIG. 4 is a diagram of an image-corrected neural network according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a text recognition neural network according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of the American license plate recognition results according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present embodiment provides an american license plate recognition method based on image correction, including the following steps:
step S1: preprocessing a data set: cleaning and screening an original American license plate image data set, labeling the image, wherein the labeled content comprises a text box and text information in the text box, and dividing the labeled image into a training set and a test set for neural network training and experimental testing;
step S2: constructing a text detection module based on a convolutional neural network and a recurrent neural network: extract features of the training-set images with the convolutional network, convert the feature map into a feature sequence by sliding convolution, and design an anchor mechanism to detect the result of each sliding convolution; obtain several groups of consecutive text boxes by comparing box lengths in the x-axis and y-axis directions, and order the text boxes with a recurrent neural network to obtain text box sequence information, each group of text boxes being one predicted text image;
step S3: because the predicted text images may be curved or tilted, an image correction module is designed to rectify them; the module includes a line-fitting transform that models the centre line of the scene text with a polynomial together with a group of line segments perpendicular to that centre line, estimating the direction and boundary of the text; an iterative rectification network learns the segment-equation parameters and adjusts the text image to the horizontal direction;
step S4: loading the text image corrected in the step S3, inputting the text image into a text recognition module, and recognizing the text image by the text recognition module through a feature extraction layer, a sequence regression layer and a transcription layer to obtain text information;
step S5: the text classification module classifies the recognized text; because each predicted text corresponds one-to-one with a text box, the license plate number, state name and annual inspection date can be identified from the length, width and position information of the text box.
In this embodiment, the step S1 specifically includes the following steps:
step S11: carrying out data screening on an original data set, removing fuzzy images with text information missing, carrying out frame selection marking on the screened images by using LabelImg marking software, wherein marking contents comprise 4 endpoint coordinates of the frame selection text images and text information in a text box, storing an image sequence number, the endpoint coordinates and the text information into a txt file, and generating a training set and a testing set, wherein the training set accounts for 2/3, the testing set accounts for 1/3, and the training set and the testing set both comprise license plate images and txt files;
step S12: scaling original image pixels in a training set to a specified size by a bilinear interpolation method;
step S13: randomly and horizontally turning the training set images;
step S14: carrying out random angle rotation on the training set images;
step S15: carrying out image brightness, contrast and saturation transformation on monocular original images in the training set;
step S16: the training set and test set files are converted into LMDB (Lightning Memory-mapped database) data to improve the file reading speed.
In this embodiment, the step S2 specifically includes the following steps:
step S21: when building the text detection module based on the convolutional and recurrent neural networks, MobileNetV2 is used as the convolutional network to extract image features. Conventional convolutions are replaced with depthwise-separable convolutions, applying in order a 1 × 1 pointwise convolution, a ReLU6 activation, a 3 × 3 depthwise convolution, a ReLU6 activation, a 1 × 1 pointwise convolution, and a linear activation. The loss function L(e_i, g_j, r_k) of the convolutional network is:

L(e_i, g_j, r_k) = (1/N_e)·Σ_i L_e^cls(e_i, e_i*) + (λ_1/N_g)·Σ_j L_g^reg(g_j, g_j*) + (λ_2/N_r)·Σ_k L_r^reg(r_k, r_k*)

where L_e^cls, L_g^reg and L_r^reg are the 3 loss terms for the text/non-text score, the coordinates, and the boundary refinement, respectively; e_i is the predicted probability that the i-th anchor contains text, and e_i* ∈ {0, 1} is its ground truth; j is the index of anchor i on the y-axis and k its index on the x-axis; g_j and g_j* are the predicted and true values of the i-th anchor in the y-axis direction, and r_k and r_k* those in the x-axis direction; λ_1 and λ_2 are weights, and N_e, N_g and N_r are normalization parameters;
step S22: performing convolution on the obtained feature map by adopting 3-by-3 sliding convolution; step length is 1, channel number is 512, the feature map is converted into a feature sequence through sliding convolution, and the result of each convolution is sequentially input into an anchor point mechanism for text box matching;
step S23: the anchor mechanism uses text detection boxes of fixed length in the x-axis direction and variable length in the y-axis direction; k anchors are designed, and the length in the y-axis direction is taken as:

y = c·k, y ∈ (11, 273)

wherein c is a constant; text-box matching is performed on each text-box sequence by varying the value of k, forming a plurality of text boxes in order;
for the result of each sliding convolution, the anchor mechanism first uses text-detection-box matching to obtain, in order, the text boxes of all texts in the image; it then judges, in the x-axis direction, whether adjacent text boxes belong to the same text line by comparing the distance b between them, and if b is less than 50, the adjacent text boxes are considered to belong to the same text line; in the y-axis direction, the height of the first text box is taken as the initial value, text-box matching yields the k value and the height h, and a subsequently detected text box is considered to belong to the same text line if its height lies within the range (0.9h, 1.1h); by locating simultaneously in the x-axis and y-axis directions, a plurality of text boxes of the same type are combined into one group, giving a plurality of groups of text images containing only text information.
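The grouping rules just described (x-gap b below 50, height within (0.9h, 1.1h) of the line's first box) can be sketched as follows; the (x, y, w, h) box representation and the function name are illustrative choices:

```python
def group_text_lines(boxes, max_gap=50, h_low=0.9, h_high=1.1):
    """boxes: iterable of (x, y, w, h). Returns a list of text lines,
    each a list of boxes. A box joins a line when the x-gap to the line's
    last box is below max_gap and its height stays within
    (h_low*h0, h_high*h0) of the line's first box height h0."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[0]):
        x, y, w, h = box
        placed = False
        for line in lines:
            lx, ly, lw, lh = line[-1]
            h0 = line[0][3]              # height of the line's first box
            gap = x - (lx + lw)          # distance b between adjacent boxes
            if gap < max_gap and h_low * h0 < h < h_high * h0:
                line.append(box)
                placed = True
                break
        if not placed:
            lines.append([box])
    return lines

# Two nearby boxes of equal height form one line; a distant box starts a new one.
lines = group_text_lines([(0, 0, 10, 20), (15, 0, 10, 20), (200, 0, 10, 20)])
```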
Step S24: the recurrent neural network adopts a deep-stacked bidirectional LSTM (Bi-directional Long Short-Term Memory, BLSTM) network, and the text boxes in the text images are treated as sequences to obtain the sequence information of each text box; the recurrent neural network is a 256-dimensional bidirectional LSTM network comprising two 128-dimensional LSTM networks, and after the features are input into the BLSTM network, a plurality of groups of text-box patterns with sequence information are obtained through a fully-connected layer.
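A minimal PyTorch sketch of the step S24 sequence model, assuming PyTorch is available: the two 128-dimensional directions concatenate into the 256-dimensional BLSTM output, followed by the fully-connected layer. The class name, the 512-dimensional input/output and the depth of stacking are illustrative.

```python
import torch
import torch.nn as nn

class SequenceBLSTM(nn.Module):
    """Deep-stacked bidirectional LSTM: each direction is 128-dimensional,
    so the concatenated per-step output is 256-dimensional; a fully-connected
    layer then emits the per-step sequence information."""
    def __init__(self, in_dim=512, hidden=128, out_dim=512):
        super().__init__()
        self.blstm = nn.LSTM(in_dim, hidden, num_layers=2,
                             bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, out_dim)

    def forward(self, x):          # x: (batch, steps, in_dim)
        y, _ = self.blstm(x)       # y: (batch, steps, 256)
        return self.fc(y)          # (batch, steps, out_dim)

feats = torch.randn(1, 25, 512)    # one feature sequence of 25 steps
out = SequenceBLSTM()(feats)
```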
In this embodiment, the step S3 specifically includes the following steps:
step S31: the line-fitting transformation models the position of the image text to correct perspective and curvature deformation; an x-y coordinate system is established with the image center point as the origin, and the fitted line segments comprise two parts: the first part is a polynomial fitting the text centerline in the horizontal direction, expressed as a K-order polynomial with K+1 parameters:

f1(x) = a_K·x^K + a_{K-1}·x^{K-1} + … + a_1·x + a_0

and the second part consists of boundary line segments normal to the horizontal centerline, comprising L line segments with 3L parameters:

f2(x) = b_{1,l}·x + b_{0,l} | r_l, l = 1, 2, …, L

wherein r_l represents the lengths of the line segment on the two sides of the text centerline; in this model, the centerline gives the text direction, and the line segments perpendicular to the centerline give the boundary;
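A small NumPy sketch of evaluating the two parts of this model: the centerline polynomial f1, and the end points of a boundary segment of half-length r perpendicular to it. The function names and the finite-difference slope are illustrative choices, not taken from the source.

```python
import numpy as np

def centerline(x, coeffs):
    """f1(x) = a_K x^K + ... + a_1 x + a_0 (coeffs highest order first)."""
    return np.polyval(coeffs, x)

def boundary_endpoints(x, r, coeffs, dx=1e-3):
    """End points of the boundary segment of half-length r perpendicular
    to the fitted centerline at horizontal position x."""
    slope = (centerline(x + dx, coeffs) - centerline(x - dx, coeffs)) / (2 * dx)
    n = np.array([-slope, 1.0]) / np.hypot(slope, 1.0)  # unit normal to the centerline
    p = np.array([x, centerline(x, coeffs)])
    return p + r * n, p - r * n
```

For a flat centerline the boundary segment is vertical, so the end points sit directly above and below the evaluation point.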
step S32: the iterative correction network comprises 5 layers: the first layer Block1 uses 3×3 convolution kernels with 32 output channels and a stride of 2; the second layer Block2 uses 3×3 convolution kernels with 64 output channels and a stride of 2; the third layer Block3 uses 3×3 convolution kernels with 128 output channels and a stride of 2; the fourth layer FC4 is a fully-connected layer with 512 output channels; and the number of output channels of the fifth fully-connected layer FC5 equals the number of parameters of the line-fitting transformation, namely 3L+K+1;
step S33: with the parameters of the fitted line segments obtained in step S32, the end-point coordinates of the L line segments perpendicular to the text centerline are determined as P = [t_1, t_2, …, t_2L]^T, and the corrected end-point coordinates P' = [t'_1, t'_2, …, t'_2L]^T are calculated using Thin-Plate Spline interpolation (TPS); the parameter matrix C of the thin-plate spline interpolation is then obtained by solving the linear system

Δ·C = [P'; 0; 0], Δ = [S̄, 1, P; 1^T, 0, 0; P^T, 0, 0]

wherein S̄ is the 2L×2L kernel matrix with entries S̄_ij = U(‖t_i − t_j‖), S = [U(t−t_1), U(t−t_2), …, U(t−t_2L)]^T and U(r) = r²·log r²; for each pixel point t in the original image, the corrected pixel point t' is obtained through the TPS transformation, namely t' = C^T·[S; 1; t];
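The thin-plate-spline step can be sketched in NumPy: solve a linear system for the TPS parameters from the end points and their corrected positions, then transform any point with the kernel U(r) = r² log r². The function names and the exact matrix layout are illustrative; a solved TPS reproduces each control point exactly, which is what the test checks.

```python
import numpy as np

def tps_params(P, P_prime):
    """Solve for the TPS parameters C mapping end points P (n x 2)
    onto corrected end points P_prime (n x 2)."""
    n = len(P)
    d2 = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
    Kmat = np.where(d2 > 0, d2 * np.log(d2 + 1e-12), 0.0)   # U(r) = r^2 log r^2
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = Kmat
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    A[:n, n + 1:] = P
    A[n + 1:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = P_prime
    return np.linalg.solve(A, b)                             # C: (n+3) x 2

def tps_apply(C, P, t):
    """Transform point t (2,) with the solved parameters C."""
    d2 = np.sum((P - t) ** 2, axis=-1)
    S = np.where(d2 > 0, d2 * np.log(d2 + 1e-12), 0.0)
    return S @ C[:-3] + C[-3] + t @ C[-2:]
```

Because the system enforces f(t_i) = t'_i row by row, applying the transform to a control point returns its corrected position.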
step S34: the corrected segment end points are obtained through step S33, a sampler is used to learn the mapping from the corrected end points back to the original image, and step S33 is iterated continuously during training; because the sampler is fully differentiable, the fitted line segments need no manual annotation, and the sampler is trained by back-propagating the image gradient of the text recognition module, completing the correction of the text image.
In this embodiment, the specific content of step S4 is:
the feature extraction layer uses a ResNet-50 encoder; the last convolutional layer of the encoder has 512 channels, a 3×3 convolution kernel and a stride of 1, and a max-pooling operation converts an image of size (32, 100, 3) into a feature map of size (1, 25, 512); a further convolution with 512 channels, a 2×2 kernel and a stride of 1 is applied, and the feature map is cut column-wise into feature sequences and input into the next layer;
the sequence regression layer uses a deep-stacked bidirectional LSTM network; the obtained feature sequences are input into it, the BLSTM network comprises 2 groups of 256-dimensional LSTM networks, and each group of feature sequences is processed in the forward and reverse directions to obtain the sequence information of each group of features;
the transcription layer adopts Connectionist Temporal Classification (CTC) as the conditional probability and uses the negative log-likelihood as the training loss, so that the sequence information obtained by the sequence regression layer corresponds one-to-one to the text information of each image frame, finally giving the predicted text information.
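The CTC conditional probability can be sketched with the standard forward (alpha) recursion over the blank-extended label sequence; this is a didactic NumPy version of the negative log-likelihood, not the production implementation, and works in plain probabilities rather than log-space.

```python
import numpy as np

def ctc_neg_log_likelihood(probs, target, blank=0):
    """probs: (T, C) per-frame label distributions; target: label-id sequence.
    Forward algorithm over the blank-extended target; returns -log p(target)."""
    ext = [blank]
    for c in target:                 # interleave blanks: b, c1, b, c2, b, ...
        ext += [c, blank]
    S = len(ext)
    alpha = np.zeros((len(probs), S))
    alpha[0][0] = probs[0][ext[0]]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        for s in range(S):
            a = alpha[t - 1][s]                      # stay
            if s > 0:
                a += alpha[t - 1][s - 1]             # advance one position
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]             # skip a blank
            alpha[t][s] = a * probs[t][ext[s]]
    p = alpha[-1][S - 1] + (alpha[-1][S - 2] if S > 1 else 0.0)
    return -np.log(p)
```

With two frames of uniform probabilities over {blank, '1'}, the paths (blank,1), (1,blank) and (1,1) all collapse to "1", so p = 3/4.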
In this embodiment, the specific content of step S5 is: the text detection and image correction modules predict multiple groups of detection boxes with sequence information, each denoted box; the height of the largest detection box in the y-axis direction is denoted box_max, and the detection boxes with height values in the range (box_max−20, box_max) are taken as license plate candidate boxes; these are then sorted by their x-axis coordinates from small to large, determining the order of the numbers and letters on the license plate, with smaller coordinates in front; the text recognition module recognizes the text in these boxes as the license plate number; the edit distances between the texts in the boxes other than the license plate candidate boxes and the names of the 50 states are calculated, and the state with the smallest matching distance gives the state name; the numbers detected in the remaining boxes are padded to four digits, and if a number lies in the range 1950-2019 the largest such number is selected as the annual inspection date, otherwise the annual inspection date is set to 0; finally the license plate number, state name and annual inspection date are output.
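The classification logic of step S5, matching recognized text against the 50 state names by edit distance and applying the year rule, can be sketched as follows. The two-row Levenshtein implementation and the four-name list standing in for all 50 states are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein edit distance by dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

STATES = ["CALIFORNIA", "TEXAS", "FLORIDA", "NEW YORK"]  # stand-in for all 50

def classify_texts(texts):
    """Pick the state name nearest to any recognized text, and the largest
    number in 1950-2019 as the annual inspection date (0 if none)."""
    state = min(STATES,
                key=lambda s: min(edit_distance(t, s) for t in texts))
    years = [int(t) for t in texts if t.isdigit() and 1950 <= int(t) <= 2019]
    return state, (max(years) if years else 0)
```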
Preferably, this embodiment further provides an American license plate recognition system based on image correction, which comprises a text detection module, an image correction module, a text recognition module and a text classification module; the text detection module detects the text information in the preprocessed image, segmenting the text from the background to obtain a text image containing only text information; the image correction module corrects the text image, converting an originally distorted or inclined text image into the horizontal direction; the text recognition module recognizes the corrected text image to obtain the letter and number information it contains; and the text classification module screens the license plate number, state name and annual inspection date out of all the text information to complete license plate recognition.
Preferably, the present embodiment comprises the following design points: 1) preprocessing the image files of the data set to generate a training set and a test set, and performing data enhancement; 2) designing a text detection module to detect the text information in the image, segmenting the text from the background and obtaining a text image containing only text information; 3) adopting an image correction module to correct the text image, converting an originally distorted or inclined text image into the horizontal direction; 4) recognizing the corrected text image to obtain the letter and number information it contains; 5) constructing a text classification module to screen the license plate number, state name and annual inspection date out of all the text information, completing license plate recognition.
Preferably, in this embodiment, the original image is characterized using the lightweight convolutional neural network MobileNetV2, the feature sequence is input into a recurrent neural network, and an anchor mechanism connects the text boxes to form the final text lines and complete text detection; for text with inclination and curvature, an image correction module corrects the text image, converting the originally distorted or inclined text image into the horizontal direction; the corrected text image is recognized by a text recognition module to obtain the letter and number information it contains; finally, a text classification module screens the license plate number, state name and annual inspection date out of all the text information to complete license plate recognition. The method, combined with deep learning, inputs an American license plate image into the trained network model and completes the recognition of the license plate number, state name and annual inspection date with high accuracy and strong robustness; it can detect and recognize multi-directional, multi-size text objects in license plate images with complex backgrounds, has small model parameters, and can be used for real-time license plate detection.
Preferably, the specific application example of the present embodiment is as follows:
1) preprocessing the data set: cleaning and screening the original American license plate images, annotating the images, and dividing the annotated images into a training set and a test set for neural network training and experimental testing, specifically comprising the following steps:
1-1) performing data screening on the original data set, removing blurred pictures and pictures with missing text information; the images are box-annotated with the LabelImg annotation software, the annotated content comprising the 4 end-point coordinates of each selected text region and the text information in each text box; the image serial number, end-point coordinates and text information are stored in a txt file, and a training set and a test set are generated, the training set comprising 4000 pictures and the test set 2000 pictures, 6000 pictures in total;
1-2) adjusting original images in a training set to be uniform in size, wherein the resolution is 800 x 400;
1-3) randomly and horizontally turning over the images in the training set, wherein the turning over probability is 0.5;
1-4) carrying out random angle rotation on the training set image, wherein the value range of the rotation angle is (-5 degrees and 5 degrees);
1-5) transforming the brightness, contrast and saturation of the original images in the training set, with the values 0.4, 0.4 and 0.4 respectively;
1-6) converting the training set and test set files into LMDB (Lightning Memory-Mapped Database) data to improve the file reading speed.
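The size normalization of step 1-2) relies on bilinear interpolation; a self-contained NumPy sketch for a single-channel image follows. The align-corners sampling convention is an implementation choice made here, not specified by the source.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Bilinear interpolation resize (align-corners) for a 2-D grayscale array."""
    in_h, in_w = img.shape[:2]
    ys = np.linspace(0, in_h - 1, out_h)          # source row coordinates
    xs = np.linspace(0, in_w - 1, out_w)          # source column coordinates
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    tl = img[np.ix_(y0, x0)]; tr = img[np.ix_(y0, x1)]
    bl = img[np.ix_(y1, x0)]; br = img[np.ix_(y1, x1)]
    top = tl * (1 - wx) + tr * wx                 # blend along x
    bot = bl * (1 - wx) + br * wx
    return top * (1 - wy) + bot * wy              # blend along y
```

Upsampling a 2×2 linear ramp to 3×3 recovers the midpoints exactly, which makes the behavior easy to verify.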
2) As shown in FIG. 2, an unsupervised convolutional neural network structure for American license plate recognition is designed; the network comprises four units, namely text detection, image correction, text recognition and text classification, and the whole neural network completes the feature extraction, text detection and character recognition of the image in an end-to-end unsupervised learning process.
As shown in fig. 3, a text detection module based on a convolutional neural network and a recurrent neural network is constructed; continuous text lines, i.e., predicted text images, are obtained from the input American license plate image and selected with boxes, specifically comprising the following steps:
2-1) the text detection module uses MobileNetV2 as the convolutional neural network to extract the features of the image and obtain a feature map. As a lightweight network, MobileNetV2 replaces the conventional convolution with a depthwise-separable convolution that applies, in order, a 1×1 pointwise convolution, a ReLU6 activation function, a 3×3 depthwise convolution, a ReLU6 activation function, a 1×1 pointwise convolution and a linear activation function; the loss function L(e_i, g_j, r_k) of the convolutional neural network is:

L(e_i, g_j, r_k) = (1/N_e)·Σ_i L_cls(e_i, e_i*) + (λ_1/N_g)·Σ_j L_reg(g_j, g_j*) + (λ_2/N_r)·Σ_k L_reg(r_k, r_k*)

wherein L_cls and the two L_reg terms are the 3 loss functions for calculating the text/non-text score, the coordinates and the boundary refinement, respectively; e_i represents the probability that the ith anchor is correctly predicted, and e_i* is the true value, taking 0 or 1; j is the index of anchor i on the y-axis and k is the index of anchor i on the x-axis; g_j and g_j* are the predicted value and true value of the ith anchor in the y-axis direction, and r_k and r_k* are the predicted value and true value in the x-axis direction; λ_1 and λ_2 are weights, taking 1.0 and 2.0 respectively; N_e, N_g and N_r are normalization parameters, taking 128, 20 and 32 respectively;
2-2) performing a 3×3 sliding convolution on the obtained feature map, with a stride of 1 and 512 channels; the sliding convolution converts the feature map into a feature sequence, and the result of each convolution is input in turn into the anchor mechanism for text-box matching;
2-3) the anchor mechanism uses text detection boxes of fixed length in the x-axis direction and variable length in the y-axis direction. k anchors are designed, and the length in the y-axis direction is taken as:

y = c·k, y ∈ (11, 273)

wherein c is a constant; text-box matching is performed on each text-box sequence by varying the value of k, forming a plurality of text boxes in order. In this embodiment, k takes the value 20 and c takes the value 13.1.
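A quick check of the example values: with c = 13.1 and k = 1 … 20, the anchor heights y = c·k indeed fall inside the stated (11, 273) interval.

```python
# Anchor heights for the example constants c = 13.1, k = 1..20.
c, k_max = 13.1, 20
heights = [c * k for k in range(1, k_max + 1)]   # 13.1, 26.2, ..., 262.0
assert 11 < min(heights) and max(heights) < 273  # all heights inside (11, 273)
```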
For the result of each sliding convolution, the anchor mechanism first uses text-detection-box matching to obtain, in order, the text boxes of all texts in the image; it then judges, in the x-axis direction, whether adjacent text boxes belong to the same text line by comparing the distance b between them, and if b is less than 50, the adjacent text boxes are considered to belong to the same text line; in the y-axis direction, the height of the first text box is taken as the initial value, text-box matching yields the k value and the height h, and a subsequently detected text box is considered to belong to the same text line if its height lies within the range (0.9h, 1.1h). By locating simultaneously in the x-axis and y-axis directions, a plurality of text boxes of the same type are combined into one group, giving a plurality of groups of text images containing only text information;
2-4) the recurrent neural network adopts a deep-stacked Bidirectional Long Short-Term Memory (BLSTM) network, and the text boxes in the text images are treated as sequences to obtain the sequence information of each text box. The recurrent neural network is a 256-dimensional bidirectional LSTM network comprising two 128-dimensional LSTM networks, and after the features are input into the BLSTM network, a plurality of groups of text-box patterns with sequence information are obtained through a 512-channel fully-connected layer;
3) as shown in fig. 4, the predicted text image may be curved and inclined, so an image correction module is designed to correct the image; the module comprises a line-fitting transformation, which models the centerline of the scene text with a polynomial and estimates the direction and boundary of the text using a group of line segments perpendicular to the text centerline, and an iterative correction network learns the line-segment equation parameters to adjust the text image to the horizontal direction; the specific steps are as follows:
3-1) the line-fitting transformation is first used to model the position of the image text to correct perspective and curvature deformation. An x-y coordinate system is established with the image center point as the origin, and the fitted line segments comprise two parts: the first part is a polynomial fitting the text centerline in the horizontal direction, expressed as a K-order polynomial with K+1 parameters:

f1(x) = a_K·x^K + a_{K-1}·x^{K-1} + … + a_1·x + a_0

and the second part consists of boundary line segments normal to the horizontal centerline, comprising L line segments with 3L parameters:

f2(x) = b_{1,l}·x + b_{0,l} | r_l, l = 1, 2, …, L

wherein r_l represents the lengths of the line segment on the two sides of the text centerline; in this embodiment, K takes the value 4 and L takes the value 20;
3-2) after the position modeling of the image text is completed, the fitted line segments are adjusted and optimized using an iterative correction network. The first layer Block1 uses 3×3 convolution kernels with 32 output channels and a stride of 2; the second layer Block2 uses 3×3 convolution kernels with 64 output channels and a stride of 2; the third layer Block3 uses 3×3 convolution kernels with 128 output channels and a stride of 2; the fourth layer FC4 is a fully-connected layer with 512 output channels; the number of output channels of the fifth fully-connected layer FC5 equals the number of parameters of the line-fitting transformation, namely 3L+K+1; the number of iterative corrections n is 5;
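A PyTorch sketch of this five-layer correction network, assuming PyTorch is available; with K = 4 and L = 20 the final layer emits 3L + K + 1 = 65 parameters. The ReLU activations, the padding of 1 and the adaptive pooling before FC4 are assumptions made to keep the sketch input-size agnostic.

```python
import torch
import torch.nn as nn

K_ORDER, L_SEG = 4, 20          # polynomial order and segment count from the text

class CorrectionNet(nn.Module):
    """Blocks 1-3: 3x3 convs with 32/64/128 channels, stride 2;
    FC4: 512 channels; FC5: the 3L + K + 1 line-fitting parameters."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),   # assumption: fixed-size pooling before FC4
        )
        self.fc4 = nn.Linear(128 * 4 * 4, 512)
        self.fc5 = nn.Linear(512, 3 * L_SEG + K_ORDER + 1)   # 65 parameters

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.fc5(torch.relu(self.fc4(h)))

params = CorrectionNet()(torch.randn(1, 3, 32, 100))
```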
3-3) with the parameters of the fitted line segments obtained through the above steps, the end-point coordinates of the L line segments perpendicular to the text centerline are determined as P = [t_1, t_2, …, t_2L]^T, and the corrected end-point coordinates P' = [t'_1, t'_2, …, t'_2L]^T are calculated using Thin-Plate Spline interpolation (TPS); the parameter matrix C of the thin-plate spline interpolation is then obtained by solving the linear system

Δ·C = [P'; 0; 0], Δ = [S̄, 1, P; 1^T, 0, 0; P^T, 0, 0]

wherein S̄ is the 2L×2L kernel matrix with entries S̄_ij = U(‖t_i − t_j‖), S = [U(t−t_1), U(t−t_2), …, U(t−t_2L)]^T and U(r) = r²·log r²; for each pixel point t in the original image, the corrected pixel point t' can be obtained through the TPS transformation, namely t' = C^T·[S; 1; t];
3-4) the corrected segment end points are obtained through step 3-3), a sampler is used to learn the mapping from the corrected end points back to the original image, and step 3-3) is iterated continuously during training. Because the sampler is fully differentiable, the fitted line segments need no manual annotation, and the sampler is trained by back-propagating the image gradient of the text recognition module, completing the correction of the text image;
4) loading the text image corrected in step 3) and inputting it into the text recognition module. As shown in fig. 5, the text recognition module recognizes the text image through the feature extraction layer, the sequence regression layer and the transcription layer to obtain the text information.
4-1) the feature extraction layer of the text recognition module uses a ResNet-50 encoder; the last convolutional layer of the encoder has 512 channels, a 3×3 convolution kernel and a stride of 1, and a max-pooling operation converts an image of size (32, 100, 3) into a feature map of size (1, 25, 512); a further convolution with 512 channels, a 2×2 kernel and a stride of 1 is applied, and the feature map is cut column-wise into feature sequences and input into the next layer;
4-2) the obtained feature sequences are input into a deep-stacked bidirectional LSTM network; the BLSTM network comprises 2 groups of 256-dimensional LSTM networks, and each group of feature sequences is processed in the forward and reverse directions to obtain the sequence information of each group of features;
4-3) the transcription layer adopts Connectionist Temporal Classification (CTC) as the conditional probability and uses the negative log-likelihood as the training loss, so that the sequence information obtained by the sequence regression layer corresponds one-to-one to the text information of each image frame, finally giving the predicted text information;
5) the text classification module is adopted to classify the text information; because each piece of predicted text information corresponds one-to-one to a text box, the license plate number, state name and annual inspection date in the text information can be judged from the length, width, position and other attributes of the text box.
The text recognition module predicts multiple groups of detection boxes, each denoted box; the height of the largest detection box in the y-axis direction is denoted box_max, and the detection boxes with height values in the range (box_max−20, box_max) are taken as license plate candidate boxes; the order of the characters on the license plate is then determined from the coordinates, and the text recognition module recognizes the text in these boxes as the license plate number; the edit distances between the texts in the boxes other than the license plate candidate boxes and the names of the 50 states are calculated, and the state with the smallest matching distance gives the state name; the numbers detected in the remaining boxes are padded to four digits, and if a number lies in the range 1950-2019 the largest such number is selected as the annual inspection date, otherwise the annual inspection date is set to 0; finally the license plate number, state name and annual inspection date are output.
The experimental results are shown in fig. 6. Through algorithm optimization and model improvement, the American license plate recognition method based on image correction provided in this embodiment recognizes American license plates quickly and accurately; verification on the test set shows that the overall recognition rate of the license plate number and state name in this embodiment exceeds 90%, while the recognition rate of the annual inspection date is only 10% owing to large differences in its format and position and confusion between year and month. The parameter model of the neural network provided by this embodiment does not exceed 100M in size, supports online recognition, and can better meet practical application requirements.
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.

Claims (7)

1. An American license plate recognition method based on image correction is characterized in that: the method comprises the following steps:
step S1: preprocessing a data set: cleaning and screening an original American license plate image data set, labeling the image, wherein the labeled content comprises a text box and text information in the text box, and dividing the labeled image into a training set and a test set for neural network training and experimental testing;
step S2: constructing a text detection module based on a convolutional neural network and a recurrent neural network: extracting the features of the training set images using the convolutional neural network, converting the feature map into a feature sequence by sliding convolution, and designing an anchor mechanism to detect the result of each sliding convolution; a plurality of groups of continuous text boxes are obtained by comparing the lengths of the text boxes in the x-axis and y-axis directions, and the recurrent neural network is adopted to order the text boxes and obtain the text-box sequence information, each group of text boxes being a predicted text image;
step S3: the predicted text image may be curved and inclined, so an image correction module is designed to correct the image; the module comprises a line-fitting transformation, which models the centerline of the scene text with a polynomial and a group of line segments perpendicular to the text centerline, estimating the direction and boundary of the text; an iterative correction network is used to learn the line-segment equation parameters and adjust the text image to the horizontal direction;
step S4: loading the text image corrected in the step S3, inputting the text image into a text recognition module, and recognizing the text image by the text recognition module through a feature extraction layer, a sequence regression layer and a transcription layer to obtain text information;
step S5: a text classification module is adopted to classify the text information; because each piece of predicted text information corresponds one-to-one to a text box, the license plate number, state name and annual inspection date in the text information can be judged from the length, width and position information of the text box.
2. The American license plate recognition method based on image correction as claimed in claim 1, wherein: the step S1 specifically includes the following steps:
step S11: performing data screening on the original data set, removing blurred images and images with missing text information; the screened images are box-annotated with the LabelImg annotation software, the annotated content comprising the 4 end-point coordinates of each selected text region and the text information in each text box; the image serial number, end-point coordinates and text information are stored in a txt file, and a training set and a test set are generated, the training set accounting for 2/3 and the test set for 1/3, both comprising license plate images and txt files;
step S12: scaling original image pixels in a training set to a specified size by a bilinear interpolation method;
step S13: randomly flipping the training set images horizontally;
step S14: rotating the training set images by a random angle;
step S15: transforming the brightness, contrast and saturation of the original images in the training set;
step S16: and converting the training set files and the test set files into LMDB data so as to improve the file reading speed.
3. The American license plate recognition method based on image correction as claimed in claim 1, wherein: the step S2 specifically includes the following steps:
step S21: when the text detection module based on the convolutional neural network and the recurrent neural network is constructed, MobileNetV2 is used as the convolutional neural network to extract the features of the image; the conventional convolution is replaced with a depthwise-separable convolution that applies, in order, a 1×1 pointwise convolution, a ReLU6 activation function, a 3×3 depthwise convolution, a ReLU6 activation function, a 1×1 pointwise convolution and a linear activation function; the loss function L(e_i, g_j, r_k) of the convolutional neural network is:

L(e_i, g_j, r_k) = (1/N_e)·Σ_i L_cls(e_i, e_i*) + (λ_1/N_g)·Σ_j L_reg(g_j, g_j*) + (λ_2/N_r)·Σ_k L_reg(r_k, r_k*)

wherein L_cls and the two L_reg terms are the 3 loss functions for calculating the text/non-text score, the coordinates and the boundary refinement, respectively; e_i represents the probability that the ith anchor is correctly predicted, and e_i* is the true value, taking 0 or 1; j is the index of anchor i on the y-axis and k is the index of anchor i on the x-axis; g_j and g_j* are the predicted value and true value of the ith anchor in the y-axis direction, and r_k and r_k* are the predicted value and true value in the x-axis direction; λ_1 and λ_2 are weights, and N_e, N_g and N_r are normalization parameters;
step S22: performing a 3×3 sliding convolution on the obtained feature map, with a stride of 1 and 512 channels; the sliding convolution converts the feature map into a feature sequence, and the result of each convolution is input in turn into the anchor mechanism for text-box matching;
step S23: the anchor mechanism uses text detection boxes of fixed length in the x-axis direction and variable length in the y-axis direction; k anchors are designed, and the length in the y-axis direction is taken as:

y = c·k, y ∈ (11, 273)

wherein c is a constant; text-box matching is performed on each text-box sequence by varying the value of k, forming a plurality of text boxes in order;
for the result of each sliding convolution, the anchor mechanism first uses text-detection-box matching to obtain, in order, the text boxes of all texts in the image; it then judges, in the x-axis direction, whether adjacent text boxes belong to the same text line by comparing the distance b between them, and if b is less than 50, the adjacent text boxes are considered to belong to the same text line; in the y-axis direction, the height of the first text box is taken as the initial value, text-box matching yields the k value and the height h, and a subsequently detected text box is considered to belong to the same text line if its height lies within the range (0.9h, 1.1h); by locating simultaneously in the x-axis and y-axis directions, a plurality of text boxes of the same type are combined into one group, giving a plurality of groups of text images containing only text information.
Step S24: the recurrent neural network adopts a deep-stacked bidirectional LSTM network, and the text boxes in the text images are treated as sequences to obtain the sequence information of each text box; the recurrent neural network is a 256-dimensional bidirectional LSTM network comprising two 128-dimensional LSTM networks, and after the features are input into the BLSTM network, a plurality of groups of text-box patterns with sequence information are obtained through a fully-connected layer.
4. The American license plate recognition method based on image correction as claimed in claim 1, wherein: the step S3 specifically includes the following steps:
step S31: the line fitting transformation models the position of the image text to correct image perspective and curvature deformation; the fitting line segment takes the image central point as an origin, an x-y coordinate system is established, the fitting line segment comprises two parts, the first part is a polynomial fitting the text centerline in the horizontal direction, the polynomial is expressed by using a K-order polynomial, and the fitting line segment comprises K +1 parameters:
f1(x) = a_K*x^K + a_(K-1)*x^(K-1) + … + a_1*x + a_0
the second part consists of boundary line segments normal (perpendicular) to the horizontal centerline; this part comprises L line segments with 3L parameters, expressed as:
f2(x) = b_(1,l)*x + b_(0,l) | r_l,  l = 1, 2, …, L
wherein r_l represents the length of the line segments on both sides of the text centerline; the centerline of the model gives the text direction, and the line segments perpendicular to the centerline give the boundary;
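The centerline model above is an ordinary K-order polynomial with K + 1 coefficients; a minimal sketch of evaluating it follows. The coefficient ordering (highest degree first) and the function name are assumptions for illustration:

```python
# Evaluate the centerline polynomial f1(x) = a_K*x^K + ... + a_1*x + a_0
# from its K+1 parameters, using Horner's method.
def centerline(coeffs, x):
    """coeffs = [a_K, ..., a_1, a_0]; returns f1(x)."""
    y = 0.0
    for a in coeffs:
        y = y * x + a
    return y
```

For example, coeffs = [1, 2, 3] describes f1(x) = x^2 + 2x + 3, so f1(2) = 11.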
step S32: the iterative correction network comprises 5 layers: the first layer, Block1, uses 3 × 3 convolution kernels with 32 output channels and stride 2; the second layer, Block2, uses 3 × 3 convolution kernels with 64 output channels and stride 2; the third layer, Block3, uses 3 × 3 convolution kernels with 128 output channels and stride 2; the fourth layer, fully-connected layer FC4, has 512 output channels; the fifth layer, fully-connected layer FC5, has as many output channels as there are line fitting transformation parameters, namely 3L + K + 1;
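The layer dimensions above can be traced with a short sketch. Since the padding scheme is not stated, 'same' padding (ceil division on each stride-2 convolution) is an assumption here, as are the function and variable names:

```python
# Trace feature-map sizes through the three stride-2 convolutions (Block1-3),
# then report the FC4 and FC5 output widths described in step S32.
import math

def trace_shapes(h, w, K, L):
    shapes = []
    for _ in range(3):                       # Block1..Block3, stride 2 each
        h, w = math.ceil(h / 2), math.ceil(w / 2)
        shapes.append((h, w))
    fc4_out = 512                            # FC4 output channels
    fc5_out = 3 * L + K + 1                  # FC5: line-fitting parameter count
    return shapes, fc4_out, fc5_out
```

For a 32 × 100 input with K = 4 and L = 10, the spatial sizes shrink to 16 × 50, 8 × 25 and 4 × 13, and FC5 regresses 3·10 + 4 + 1 = 35 parameters.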
step S33: by obtaining the parameters of the fitted line segment in step S32, the coordinates P of the end points of L line segments perpendicular to the center line of the text may be determined as [ t ═ t [ ([ t ])1,t2,…,t2L]TThe corrected endpoint coordinate P 'is calculated by interpolation using a thin-plate spline as [ t'1,t'2,…,t'2L]TThen, the parameters of the thin-plate spline interpolation are expressed as:
[TPS parameter equation, given in the original document as image FDA0002377050410000051]
wherein S ═ U (t-t)1),U(t-t2),…,U(t-t2L)]T,U(r)=r2logr2For each pixel point t in the original image, obtaining a corrected pixel point t 'through TPS transformation, namely t is C.t';
step S34: the corrected segment endpoints are obtained through step S33, a sampler learns the mapping from the corrected endpoints to the original image, and step S33 is iterated continuously during training; the sampler is fully differentiable and the fitted line segments require no manual annotation; the sampler is trained by back-propagating the image gradients of the text recognition module, completing the correction of the text image.
5. The American license plate recognition method based on image correction as claimed in claim 1, wherein: the specific content of step S4 is:
the feature extraction layer uses a ResNet-50 encoder; the last convolutional layer of the encoder has 512 channels, a 3 × 3 kernel and stride 1; after a max pooling operation, an image of size (32, 100, 3) is converted into a feature map of size (1, 25, 512); a further convolution with 512 channels, a 2 × 2 kernel and stride 1 is then applied, and the feature map is cut column-wise into feature sequences that are input into the next layer;
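Cutting a (1, 25, 512) feature map column-wise into a 25-step sequence of 512-dimensional features can be sketched with plain nested lists standing in for tensors; the function name and list representation are illustrative assumptions:

```python
# Convert a height-1 feature map of shape (1, W, C) into a sequence of
# W feature vectors of dimension C, one per column, as step S4 describes.
def to_feature_sequence(feature_map):
    """feature_map: nested list of shape (1, W, C) -> list of W C-dim vectors."""
    assert len(feature_map) == 1, "expects a height-1 feature map"
    row = feature_map[0]
    return [list(col) for col in row]        # one C-dim feature per column
```

A (1, 25, C) map thus becomes a 25-element sequence, matching the 25 time steps fed to the bidirectional LSTM.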
the sequence regression layer uses a deeply stacked bidirectional LSTM network; the obtained feature sequences are input into this network, the BLSTM comprising 2 groups of 256-dimensional LSTM networks; each feature sequence is processed in forward and reverse order to obtain the sequence information of each group of features;
the transcription layer adopts connectionist temporal classification (CTC) as the conditional probability model and uses the negative log-likelihood as the training loss, so that the sequence information obtained by the sequence regression layer corresponds one-to-one with the pixels in the text information of each frame of the image, finally yielding the predicted text information.
6. The American license plate recognition method based on image correction as claimed in claim 1, wherein: the specific content of step S5 is: the text detection and image correction modules predict multiple groups of detection boxes with sequence information, each denoted box; the height of the largest detection box in the y-axis direction is denoted box_max, and the detection boxes whose height falls within (box_max - 20, box_max) are taken as license plate candidate boxes; these candidate boxes are then sorted by their x-axis coordinates from small to large to determine the order of the license plate numbers and letters, smaller coordinates coming first; the text recognition module recognizes the text in these boxes as the license plate number; for the text in each box other than the license plate candidate boxes, the edit distances to the names of the 50 states are calculated, and the state with the minimum edit distance is taken as the state name; the numbers detected in the remaining boxes are automatically padded to four digits; if a number falls within the range 1950 to 2019, the largest such number is selected as the annual inspection date, otherwise the annual inspection date is set to 0; finally the license plate number, state name and annual inspection date are output.
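The classification step above can be sketched as follows: minimum edit distance selects the state name, and the four-digit padding plus 1950-2019 range check validates the inspection year. The truncated state list and all function names are illustrative assumptions:

```python
# Sketch of step S5's text classification: state name by minimum edit
# distance, inspection year by zero-padding and range validation.
def edit_distance(a, b):
    """Classic Levenshtein distance with a single rolling DP row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def classify_state(text, states):
    """Return the state name with minimum edit distance to the OCR text."""
    return min(states, key=lambda s: edit_distance(text.upper(), s.upper()))

def inspection_year(digits):
    """Pad to four digits; accept only years in 1950..2019, else 0."""
    year = int(digits.zfill(4))
    return year if 1950 <= year <= 2019 else 0
```

For instance, the OCR output "TEXA5" is one substitution away from "TEXAS", so it matches that state, and a stray "19" pads to 0019 and is rejected as a year.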
7. An American license plate recognition system based on image correction, characterized in that: the system comprises a text detection module, an image correction module, a text recognition module and a text classification module; the text detection module is used for detecting the text information in the preprocessed image, segmenting the text from the background to obtain a text image containing only text information; the image correction module is used for correcting the text image, converting an originally distorted or inclined text image into the horizontal direction; the text recognition module is used for recognizing the corrected text image to obtain the letter and number information it contains; the text classification module is used for screening the license plate number, state name and annual inspection date out of all the text information to complete license plate recognition.
CN202010069950.5A 2020-01-21 2020-01-21 American license plate recognition method and system based on image correction Active CN111325203B (en)

Publications (2)

Publication Number Publication Date
CN111325203A true CN111325203A (en) 2020-06-23
CN111325203B CN111325203B (en) 2022-07-05


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100073735A1 (en) * 2008-05-06 2010-03-25 Compulink Management Center, Inc. Camera-based document imaging
WO2016197381A1 (en) * 2015-06-12 2016-12-15 Sensetime Group Limited Methods and apparatus for recognizing text in an image
CN108985137A (en) * 2017-06-02 2018-12-11 杭州海康威视数字技术股份有限公司 A kind of licence plate recognition method, apparatus and system
CN109034152A (en) * 2018-07-17 2018-12-18 广东工业大学 License plate locating method and device based on LSTM-CNN built-up pattern

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG, Jianming et al.: "A Real-Time Chinese Traffic Sign Detection Algorithm Based on Modified YOLOv2", Algorithms *
XIE, Guangjun: "Research on License Plate Location and Character Segmentation Algorithms in a License Plate Recognition System", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898597A (en) * 2020-06-24 2020-11-06 泰康保险集团股份有限公司 Method, device, equipment and computer readable medium for processing text image
CN111783763A (en) * 2020-07-07 2020-10-16 厦门商集网络科技有限责任公司 Text positioning box correction method and system based on convolutional neural network
CN111814736A (en) * 2020-07-23 2020-10-23 上海东普信息科技有限公司 Express bill information identification method, device, equipment and storage medium
CN111814736B (en) * 2020-07-23 2023-12-29 上海东普信息科技有限公司 Express delivery face list information identification method, device, equipment and storage medium
CN111914838A (en) * 2020-07-28 2020-11-10 同济大学 License plate recognition method based on text line recognition
CN111985465A (en) * 2020-08-17 2020-11-24 中移(杭州)信息技术有限公司 Text recognition method, device, equipment and storage medium
CN112070048A (en) * 2020-09-16 2020-12-11 福州大学 Vehicle attribute identification method based on RDSNet
CN112070048B (en) * 2020-09-16 2022-08-09 福州大学 Vehicle attribute identification method based on RDSNet
CN112364883A (en) * 2020-09-17 2021-02-12 福州大学 American license plate recognition method based on single-stage target detection and deptext recognition network
CN112364883B (en) * 2020-09-17 2022-06-10 福州大学 American license plate recognition method based on single-stage target detection and deptext recognition network
CN112183307A (en) * 2020-09-25 2021-01-05 上海眼控科技股份有限公司 Text recognition method, computer device, and storage medium
CN111882004A (en) * 2020-09-28 2020-11-03 北京易真学思教育科技有限公司 Model training method, question judging method, device, equipment and storage medium
CN112016315B (en) * 2020-10-19 2021-02-02 北京易真学思教育科技有限公司 Model training method, text recognition method, model training device, text recognition device, electronic equipment and storage medium
CN112308092A (en) * 2020-11-20 2021-02-02 福州大学 Light-weight license plate detection and identification method based on multi-scale attention mechanism
CN112308092B (en) * 2020-11-20 2023-02-28 福州大学 Light-weight license plate detection and identification method based on multi-scale attention mechanism
CN112528994A (en) * 2020-12-18 2021-03-19 南京师范大学 Free-angle license plate detection method, license plate identification method and identification system
CN112528994B (en) * 2020-12-18 2024-03-29 南京师范大学 Free angle license plate detection method, license plate recognition method and recognition system
CN112784836A (en) * 2021-01-22 2021-05-11 浙江康旭科技有限公司 Text and graphic offset angle prediction and correction method thereof
CN112818823A (en) * 2021-01-28 2021-05-18 建信览智科技(北京)有限公司 Text extraction method based on bill content and position information
CN112818823B (en) * 2021-01-28 2024-04-12 金科览智科技(北京)有限公司 Text extraction method based on bill content and position information
CN112801095A (en) * 2021-02-05 2021-05-14 广东工业大学 Attention mechanism-based graph neural network container text recognition method
CN112990197A (en) * 2021-03-17 2021-06-18 浙江商汤科技开发有限公司 License plate recognition method and device, electronic equipment and storage medium
CN112883973A (en) * 2021-03-17 2021-06-01 北京市商汤科技开发有限公司 License plate recognition method and device, electronic equipment and computer storage medium
CN113343903A (en) * 2021-06-28 2021-09-03 成都恒创新星科技有限公司 License plate recognition method and system in natural scene
CN113343903B (en) * 2021-06-28 2024-03-26 成都恒创新星科技有限公司 License plate recognition method and system in natural scene
CN113240058A (en) * 2021-07-13 2021-08-10 北京文安智能技术股份有限公司 License plate image training set construction method and license plate character detection model training method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant