CN111325203A - American license plate recognition method and system based on image correction - Google Patents


Publication number
CN111325203A
CN111325203A
Authority
CN
China
Prior art keywords
text
image
license plate
information
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010069950.5A
Other languages
Chinese (zh)
Other versions
CN111325203B (en)
Inventor
林立雄
何洪钦
黄国辉
赖嘉弘
陈睿函
何炳蔚
陈彦杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN202010069950.5A
Publication of CN111325203A
Application granted
Publication of CN111325203B
Active legal status: Current
Anticipated expiration: legal-status pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 20/625 License plates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Abstract

The invention relates to an American license plate recognition method and system based on image correction, comprising a text detection module, an image correction module, a text recognition module and a text classification module, and the following steps: preprocess the image files of the data set, perform data augmentation, and generate a training set and a test set; design a text detection module that detects the text information in the image, separates the text from the background, and obtains text images containing only text information; rectify the text images with the image correction module, transforming originally distorted or tilted text images to the horizontal direction; recognize the rectified text images to obtain the letters, digits and other information they contain; and build a text classification module that screens the license plate number, state name and annual inspection date out of all the text information to complete license plate recognition. The method solves the problems of complex background patterns, blurred and deformed target text images, complex text information, and the large computation cost of offline neural-network training in American license plate recognition.

Description

American license plate recognition method and system based on image correction
Technical Field
The invention relates to the field of image recognition and artificial intelligence, in particular to an American license plate recognition method and system based on image correction.
Background
License plate recognition is an important technology in smart-city traffic construction, and accuracy and real-time performance are its two key metrics. License plate images are natural-scene images: they are easily affected by lighting at capture time, different shooting angles cause tilt and motion blur, and the plates themselves fade or become stained; such uncertain interference directly affects the accuracy of license plate information recognition. Unlike Chinese license plates, which follow uniform patterns such as white characters on a blue background, black on yellow, or black on white, American license plates contain complex background patterns and varied text fonts and styles, the plate designs of different states are completely different, and each plate may even be individually customized. Common license plate recognition techniques are therefore infeasible for American plates, which makes their recognition significantly more difficult and challenging.
Text in natural-scene images carries high-level semantic information that is important for analyzing and understanding the scene. With the rapid spread of cameras and computers, images with text information can be acquired in large quantities in daily life. Scene text recognition has therefore long been an active research topic in computer vision, with applications in image retrieval, video security, intelligent transportation, human-computer interaction, and so on. License plate recognition mainly involves two tasks: text detection, which determines the position of text in an input image, usually represented by a bounding box; and text recognition, which converts image regions containing text into machine-readable character strings.
Traditional license plate recognition methods locate the plate region by texture-feature analysis, color features, and the intensity of image edge changes. Recent neural-network-based schemes include OpenALPR, EasyPR, HyperLPR, and the like. Although license plate recognition has developed greatly, current Chinese techniques are designed mainly for domestic standard plates; no related technique handles the more complex American plates, whose recognition remains difficult in the following respects. First, American plate backgrounds are complex, and weather conditions such as strong light, fog, or sandstorms lower the quality of the acquired images; plates also stay in service for a long time, so stains or rust may leave parts of the target text covered by the background, and background patterns and text are easily confused during text detection. Second, the English fonts on American plates include both printed and handwritten styles with no unified design, and the captured plate images suffer from tilt, curvature, and perspective deformation, which lowers character recognition accuracy. Finally, character recognition must handle mixed letters and digits, and the text on a plate is complex, including the plate number, state name, annual inspection date, slogans, and so on; the plate number, state name, and inspection date must be screened out of all this text, and because the year and month of the inspection date are affixed to the plate separately, they must be distinguished when the date is determined.
Disclosure of Invention
In view of the above, the present invention provides an American license plate recognition method and system based on image correction, to solve the problems of complex background patterns, blurred and deformed target text images, complex text information, and the large computation cost of offline neural-network training in American license plate recognition.
The invention is realized by adopting the following scheme: an American license plate recognition method based on image correction comprises the following steps:
step S1: preprocessing a data set: cleaning and screening an original American license plate image data set, labeling the image, wherein the labeled content comprises a text box and text information in the text box, and dividing the labeled image into a training set and a test set for neural network training and experimental testing;
step S2: constructing a text detection module based on a convolutional neural network and a recurrent neural network: extract features of the training-set images with the convolutional network, convert the feature map into a feature sequence by sliding convolution, and design an anchor mechanism to detect the result of each sliding convolution; obtain several groups of consecutive text boxes by comparing box lengths in the x-axis and y-axis directions, and order the text boxes with a recurrent neural network to obtain text box sequence information, each group of text boxes being one predicted text image;
step S3: because the predicted text images may be curved or tilted, an image correction module is designed to rectify them; the module includes a line-fitting transform that models the centre line of the scene text with a polynomial together with a group of line segments perpendicular to that centre line, estimating the direction and boundary of the text; an iterative rectification network learns the segment-equation parameters and adjusts the text image to the horizontal direction;
step S4: loading the text image corrected in the step S3, inputting the text image into a text recognition module, and recognizing the text image by the text recognition module through a feature extraction layer, a sequence regression layer and a transcription layer to obtain text information;
step S5: the text classification module classifies the recognized text; because each predicted text corresponds one-to-one with a text box, the license plate number, state name and annual inspection date can be identified from the length, width and position information of the text box.
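The data flow of steps S1-S5 can be sketched as a single pipeline. The four neural modules are stand-in stubs here (all function names, box values and returned strings are hypothetical illustrations, not the patent's implementation); only the hand-off between detection, rectification, recognition and classification follows the description above.

```python
# Minimal pipeline sketch for steps S2-S5; the neural modules are stubs.

def detect_text(image):
    """Step S2 stub: return (box, crop) pairs, box = (x1, y1, x2, y2)."""
    return [((0, 0, 10, 5), "crop0"), ((0, 6, 10, 10), "crop1")]

def rectify(crop):
    """Step S3 stub: line-fitting rectification, here the identity."""
    return crop

def recognize(crop):
    """Step S4 stub: CNN + BLSTM + CTC recognition."""
    return {"crop0": "ABC1234", "crop1": "CALIFORNIA"}[crop]

def classify(results):
    """Step S5 stub: pick fields by box geometry; here, the tallest
    box is taken as the plate number, as in the description of S5."""
    tallest = max(results, key=lambda r: r[0][3] - r[0][1])
    return {"plate": tallest[1]}

def read_plate(image):
    results = [(box, recognize(rectify(crop))) for box, crop in detect_text(image)]
    return classify(results)
```

The point of the sketch is the module boundaries: each stage consumes exactly what the previous stage produces, so any one module can be retrained or replaced independently.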
Further, the step S1 specifically includes the following steps:
step S11: carrying out data screening on an original data set, removing fuzzy images with text information missing, carrying out frame selection marking on the screened images by using LabelImg marking software, wherein marking contents comprise 4 endpoint coordinates of the frame selection text images and text information in a text box, storing an image sequence number, the endpoint coordinates and the text information into a txt file, and generating a training set and a testing set, wherein the training set accounts for 2/3, the testing set accounts for 1/3, and the training set and the testing set both comprise license plate images and txt files;
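A parser for the txt annotations of step S11 might look as follows. The exact file layout is not given in the text; this sketch assumes one comma-separated line per box holding the image sequence number, the 4 endpoint coordinates (x1,y1,…,x4,y4) and the text, which is an assumption.

```python
# Hypothetical parser for the step-S11 annotation format (layout assumed).

def parse_annotation_line(line):
    parts = line.strip().split(",")
    index = int(parts[0])                 # image sequence number
    coords = [(int(parts[i]), int(parts[i + 1])) for i in range(1, 9, 2)]
    text = ",".join(parts[9:])            # the text itself may contain commas
    return index, coords, text
```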
step S12: scaling original image pixels in a training set to a specified size by a bilinear interpolation method;
step S13: randomly and horizontally turning the training set images;
step S14: carrying out random angle rotation on the training set images;
step S15: carrying out image brightness, contrast and saturation transformation on monocular original images in the training set;
step S16: the training set and test set files are converted into LMDB (Lightning Memory-mapped database) data to improve the file reading speed.
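The augmentations of steps S12-S15 can be sketched in plain NumPy (a real pipeline would typically use torchvision or similar; the target size and parameter values here are illustrative, not from the patent).

```python
import numpy as np

def hflip(img):
    """Step S13: horizontal flip = reverse the width axis."""
    return img[:, ::-1, :]

def adjust_brightness_contrast(img, brightness=1.0, contrast=1.0):
    """Step S15 (partial): contrast scales about the mean, brightness scales overall."""
    mean = img.mean()
    out = (img - mean) * contrast + mean
    return np.clip(out * brightness, 0.0, 255.0)

def bilinear_resize(img, out_h, out_w):
    """Step S12: bilinear interpolation to a fixed size."""
    h, w, _ = img.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None, None], (xs - x0)[None, :, None]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```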
Further, the step S2 specifically includes the following steps:
step S21: when building the text detection module based on the convolutional and recurrent neural networks, MobileNetV2 is used as the convolutional network to extract image features. Conventional convolutions are replaced with depthwise-separable convolutions, applying in order a 1 × 1 pointwise convolution, a ReLU6 activation, a 3 × 3 depthwise convolution, a ReLU6 activation, a 1 × 1 pointwise convolution, and a linear activation. The loss function L(e_i, g_j, r_k) of the convolutional network is:

L(e_i, g_j, r_k) = (1/N_e)·Σ_i L_e^cls(e_i, e_i*) + (λ_1/N_g)·Σ_j L_g^reg(g_j, g_j*) + (λ_2/N_r)·Σ_k L_r^reg(r_k, r_k*)

where L_e^cls, L_g^reg and L_r^reg are the 3 loss terms for the text/non-text score, the coordinates, and the boundary refinement, respectively; e_i is the predicted probability that the i-th anchor contains text, and e_i* ∈ {0, 1} is its ground truth; j is the index of anchor i on the y-axis and k its index on the x-axis; g_j and g_j* are the predicted and true values of the i-th anchor in the y-axis direction, and r_k and r_k* those in the x-axis direction; λ_1 and λ_2 are weights, and N_e, N_g and N_r are normalization parameters;
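A numeric sketch of this CTPN-style three-term loss follows. The patent does not specify the individual loss forms; cross-entropy for the classification term and smooth-L1 for the two regression terms are assumptions (they are the usual choices for this loss shape), and the weights λ_1, λ_2 are set to illustrative values.

```python
import numpy as np

# Sketch of L = (1/N_e)·ΣL_cls + (λ1/N_g)·ΣL_reg(g) + (λ2/N_r)·ΣL_reg(r).
# Loss forms assumed: cross-entropy (cls) and smooth-L1 (reg).

def smooth_l1(pred, gt):
    d = np.abs(pred - gt)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)

def detection_loss(e, e_gt, g, g_gt, r, r_gt, lam1=1.0, lam2=2.0):
    eps = 1e-9
    l_cls = -(e_gt * np.log(e + eps) + (1 - e_gt) * np.log(1 - e + eps))
    loss = l_cls.sum() / len(e)                       # text/non-text term
    loss += lam1 * smooth_l1(g, g_gt).sum() / max(len(g), 1)  # y-coordinate term
    loss += lam2 * smooth_l1(r, r_gt).sum() / max(len(r), 1)  # x-boundary term
    return loss
```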
step S22: performing convolution on the obtained feature map by adopting 3-by-3 sliding convolution; step length is 1, channel number is 512, the feature map is converted into a feature sequence through sliding convolution, and the result of each convolution is sequentially input into an anchor point mechanism for text box matching;
step S23: the anchor mechanism is a text detection box of fixed length in the x-axis direction and variable length in the y-axis direction; k anchors are designed, and the length in the y-axis direction is transformed as:

y = c^k,  y ∈ (11, 273)

where c is a constant; text box matching is carried out on each text box sequence by varying k, forming a plurality of text boxes in sequence;
the anchor point mechanism firstly uses the matching of text detection boxes to obtain the text boxes of all texts in the image in sequence for the result of each sliding convolution, then judges whether the adjacent text boxes belong to the same text line from the x-axis direction by comparing the distance b between the adjacent text boxes, and if b is less than 50, the adjacent text boxes are considered to belong to the same text line; in the y-axis direction, setting the height of the first text box as an initial value, matching the text boxes to obtain a k value and a height h, and if the height of the subsequently detected text box is within the range of (0.9h,1.1h), determining that the text boxes belong to the same text line; and combining a plurality of text boxes of the same type into a group of text boxes through simultaneous positioning in the x-axis direction and the y-axis direction to obtain a plurality of groups of text patterns only containing text information.
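The anchor heights and the two grouping rules above can be sketched as follows. The number of anchors (10) and the box layout (x, y, w, h) are assumptions for illustration; the geometric height series spanning (11, 273), the x-gap threshold b < 50, and the height window (0.9h, 1.1h) are the values stated in the text.

```python
# Sketch of the variable-height anchor series and the text-line grouping rules.

def anchor_heights(k=10, y_min=11.0, y_max=273.0):
    """Geometric series of k anchor heights from y_min to y_max (y = c**k)."""
    c = (y_max / y_min) ** (1.0 / (k - 1))
    return [y_min * c ** i for i in range(k)]

def same_text_line(box_a, box_b, h):
    """box = (x, y, w, h). Apply the x-gap rule (b < 50) and the
    height-ratio rule (height within (0.9h, 1.1h))."""
    gap = box_b[0] - (box_a[0] + box_a[2])
    return gap < 50 and 0.9 * h < box_b[3] < 1.1 * h
```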
Step S24: the recurrent neural network is a deep-stacked bidirectional LSTM (Bi-directional Long Short-Term Memory, BLSTM) network; the text boxes in the text patterns are treated as a sequence to obtain the sequence information of each text box. The network is a 256-dimensional bidirectional LSTM comprising 2 128-dimensional LSTMs; after the features are input into the BLSTM network, several groups of text box patterns with sequence information are obtained through a fully connected layer.
Further, the step S3 specifically includes the following steps:
step S31: the line-fitting transform models the position of the image text to correct perspective and curvature deformation. An x-y coordinate system is established with the image centre point as the origin. The fitted segments comprise two parts. The first is a polynomial fitting the text centre line in the horizontal direction, expressed as a K-order polynomial with K + 1 parameters:

f_1(x) = a_K·x^K + a_{K−1}·x^{K−1} + … + a_1·x + a_0

The second part consists of L line segments normal (perpendicular) to the horizontal centre line, contributing 3L parameters:

f_2(x) = b_{1,l}·x + b_{0,l} | r_l,  l = 1, 2, …, L

where r_l is the length of the segment on either side of the centre line; the modelled centre line gives the text direction, and the segments perpendicular to it give the boundary;
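The first part of the transform, fitting a K-order polynomial with K + 1 parameters to sampled centre-line points, can be sketched with NumPy's polynomial routines. The sample points and the choice K = 3 are synthetic illustrations.

```python
import numpy as np

# Fit a K-order polynomial f1(x) = a_K x^K + ... + a_1 x + a_0 (K+1 params)
# to synthetic centre-line samples of a curved text instance.

K = 3
xs = np.linspace(-1.0, 1.0, 20)
ys = 0.5 * xs ** 2 - 0.1 * xs + 0.2      # a curved "centre line"
coeffs = np.polyfit(xs, ys, K)           # K+1 coefficients, highest power first
fitted = np.polyval(coeffs, xs)
```

Because the synthetic centre line is itself polynomial, the fit is exact up to floating-point error; for real, noisy centre-line samples the fit is a least-squares approximation.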
step S32: the iterative rectification network comprises 5 layers: the first layer, Block1, uses 3 × 3 convolution kernels with 32 output channels and stride 2; the second layer, Block2, uses 3 × 3 kernels with 64 output channels and stride 2; the third layer, Block3, uses 3 × 3 kernels with 128 output channels and stride 2; the fourth layer, FC4, is a fully connected layer with 512 output channels; the fifth layer, FC5, is a fully connected layer whose output dimension equals the number of parameters of the line-fitting transform, namely 3L + K + 1;
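The shapes through this 5-layer network can be checked with the standard convolution output-size formula. The input size (32, 100) and padding of 1 are assumptions (the patent does not state them); the 3L + K + 1 output dimension is from the text.

```python
# Shape sketch of the rectification network: three stride-2 3x3 conv blocks,
# then FC-512, then an FC head of dimension 3L + K + 1.

def conv_out(size, kernel=3, stride=2, pad=1):
    """Standard conv output size: floor((n + 2p - k)/s) + 1. Padding assumed."""
    return (size + 2 * pad - kernel) // stride + 1

def head_dim(L, K):
    """Parameter count of the line-fitting transform."""
    return 3 * L + K + 1

h, w = 32, 100                    # assumed input size
for _ in range(3):                # Block1..Block3, each stride 2
    h, w = conv_out(h), conv_out(w)
# (h, w) is the spatial size flattened into FC4 (512 units)
```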
step S33: with the fitted-segment parameters obtained in step S32, the endpoint coordinates P = [t_1, t_2, …, t_{2L}]^T of the L segments perpendicular to the text centre line can be determined. The corrected endpoint coordinates P' = [t'_1, t'_2, …, t'_{2L}]^T are computed by thin-plate spline (TPS) interpolation, whose transform is expressed as

t' = C · [1, t, S^T]^T

where S = [U(t − t_1), U(t − t_2), …, U(t − t_{2L})]^T and U(r) = r² log r²; for each pixel point t in the original image, the corrected pixel point t' is obtained through this TPS transform;
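A minimal TPS fit for one output coordinate can be written directly from the kernel U(r) = r² log r²: solve the standard (2L + 3) × (2L + 3) linear system for the kernel weights plus the affine part, then evaluate at new points. The control points in the test are synthetic; this is an illustrative solver, not the patent's iterative, back-propagated version.

```python
import numpy as np

# One-output thin-plate-spline fit: kernel weights + affine part [1, x, y].

def U(r):
    """TPS radial kernel U(r) = r^2 * log(r^2), with U(0) = 0."""
    r2 = r ** 2
    return np.where(r2 == 0, 0.0, r2 * np.log(np.maximum(r2, 1e-12)))

def tps_fit(src, dst):
    """src: (n, 2) control points; dst: (n,) target coordinate values."""
    n = len(src)
    K = U(np.linalg.norm(src[:, None, :] - src[None, :, :], axis=2))
    P = np.hstack([np.ones((n, 1)), src])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.concatenate([dst, np.zeros(3)])
    return np.linalg.solve(A, b)

def tps_eval(coef, src, pts):
    n = len(src)
    K = U(np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=2))
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return K @ coef[:n] + P @ coef[n:]
```

TPS interpolation is exact at the control points, which is what makes it suitable for warping the 2L segment endpoints onto their corrected positions.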
step S34: obtaining corrected segment end points through the step S33, learning a mapping relation from the corrected end points to the original image by using a sampler, and continuously iterating the step S33 in the training process; the sampler is completely differentiable, the fitting line segment does not need manual marking, and the sampler is trained through the image gradient of the back propagation text recognition module to finish the correction of the text image.
Further, the specific content of step S4 is:
the feature extraction layer uses a ResNet-50 encoder whose last convolutional layer has 512 channels, a 3 × 3 kernel, and stride 1, followed by a max-pooling operation, converting an image of size (32, 100, 3) into a feature map of size (1, 25, 512); a further convolution with 512 channels, a 2 × 2 kernel, and stride 1 is applied, and the feature map is then cut column-wise into feature sequences and input to the next layer;
the sequence regression layer uses a deep stacking bidirectional LSTM network, the obtained feature sequences are input into the deep stacking bidirectional LSTM network, the BLSTM network comprises 2 groups of 256-dimensional LSTM networks, and each group of feature sequences are subjected to forward and reverse sequencing to obtain sequence information of each group of features;
the transcription layer adopts Connectionist Temporal Classification (CTC) as the conditional probability and uses negative log-likelihood as the training loss, so that the sequence information from the sequence regression layer corresponds one-to-one with the text pixels of each image frame, finally yielding the predicted text.
Further, the specific content of step S5 is: the text detection and image correction modules predict multiple groups of detection boxes with sequence information, each denoted box. The height of the largest detection box in the y-axis direction is denoted box_max, and boxes with heights in (box_max − 20, box_max) are taken as license plate candidate boxes; these are then sorted by x-axis coordinate from small to large to determine the order of the plate's digits and letters, smaller coordinates first, and the text recognition module recognizes the text in these boxes as the license plate number. For the texts in the boxes outside the plate candidates, edit distances to the names of the 50 states are computed, and the state with the smallest matching distance is taken as the state name. Digits in the remaining boxes are detected and automatically padded to four digits; if such a number lies in the range 1950-2019, the largest one is selected as the annual inspection date, otherwise the date is set to 0. Finally the license plate number, state name, and annual inspection date are output.
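The screening rules of step S5 can be sketched as follows. The state list is truncated to a few entries for illustration, and the four-digit padding detail is simplified away; the edit-distance matching and the 1950-2019 year window follow the description.

```python
# Step-S5 sketch: edit distance picks the state name; four-digit numbers
# in 1950-2019 are candidate inspection years (largest wins, else 0).

STATES = ["CALIFORNIA", "TEXAS", "FLORIDA", "NEWYORK"]  # truncated example list

def edit_distance(a, b):
    """Classic single-row Levenshtein DP."""
    d = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, cb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (ca != cb))
    return d[len(b)]

def match_state(text):
    return min(STATES, key=lambda s: edit_distance(text.upper(), s))

def inspection_year(texts):
    years = [int(t) for t in texts if t.isdigit() and 1950 <= int(t) <= 2019]
    return max(years) if years else 0
```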
Furthermore, the invention provides an American license plate recognition system based on image correction, comprising a text detection module, an image correction module, a text recognition module and a text classification module. The text detection module detects the text information in the preprocessed image, segmenting text from background to obtain text images containing only text information; the image correction module rectifies the text images, transforming originally distorted or tilted text images to the horizontal direction; the text recognition module recognizes the rectified text images to obtain the letters and digits they contain; and the text classification module screens the license plate number, state name and annual inspection date out of all the text information to complete license plate recognition.
Compared with the prior art, the invention has the following beneficial effects:
(1) Because the background patterns of American license plates are complex, the target text images are deformed and blurred, and the text information is complex and hard to recognize, the invention combines deep learning with image recognition techniques to realize end-to-end American license plate recognition.
(2) The invention adopts the lightweight network structure design, combines the convolutional neural network and the cyclic neural network, converts the text detection and the text recognition into sequence processing, improves the American license plate recognition speed and precision, and can be used for real-time license plate recognition.
(3) The invention uses the image correction module to correct the license plate with inclination and curvature transformation, and iteratively fits the text central line through line fitting transformation, thereby realizing text correction and improving the identification precision of the inclined deformation text.
(4) The text classification module of the invention classifies text by the position and size characteristics on the plate, screens the useful information, and completes identification of the American plate's license number, state name and annual inspection date, making recognition more accurate and complete.
Drawings
Fig. 1 is a schematic overall flow chart of an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a neural network for recognizing american license plates according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a text detection neural network according to an embodiment of the present invention.
FIG. 4 is a diagram of an image-corrected neural network according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a text recognition neural network according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of the American license plate recognition results according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present embodiment provides an american license plate recognition method based on image correction, including the following steps:
step S1: preprocessing a data set: cleaning and screening an original American license plate image data set, labeling the image, wherein the labeled content comprises a text box and text information in the text box, and dividing the labeled image into a training set and a test set for neural network training and experimental testing;
step S2: constructing a text detection module based on a convolutional neural network and a recurrent neural network: extract features of the training-set images with the convolutional network, convert the feature map into a feature sequence by sliding convolution, and design an anchor mechanism to detect the result of each sliding convolution; obtain several groups of consecutive text boxes by comparing box lengths in the x-axis and y-axis directions, and order the text boxes with a recurrent neural network to obtain text box sequence information, each group of text boxes being one predicted text image;
step S3: because the predicted text images may be curved or tilted, an image correction module is designed to rectify them; the module includes a line-fitting transform that models the centre line of the scene text with a polynomial together with a group of line segments perpendicular to that centre line, estimating the direction and boundary of the text; an iterative rectification network learns the segment-equation parameters and adjusts the text image to the horizontal direction;
step S4: loading the text image corrected in the step S3, inputting the text image into a text recognition module, and recognizing the text image by the text recognition module through a feature extraction layer, a sequence regression layer and a transcription layer to obtain text information;
step S5: the text classification module classifies the recognized text; because each predicted text corresponds one-to-one with a text box, the license plate number, state name and annual inspection date can be identified from the length, width and position information of the text box.
In this embodiment, the step S1 specifically includes the following steps:
step S11: carrying out data screening on an original data set, removing fuzzy images with text information missing, carrying out frame selection marking on the screened images by using LabelImg marking software, wherein marking contents comprise 4 endpoint coordinates of the frame selection text images and text information in a text box, storing an image sequence number, the endpoint coordinates and the text information into a txt file, and generating a training set and a testing set, wherein the training set accounts for 2/3, the testing set accounts for 1/3, and the training set and the testing set both comprise license plate images and txt files;
step S12: scaling original image pixels in a training set to a specified size by a bilinear interpolation method;
step S13: randomly and horizontally turning the training set images;
step S14: carrying out random angle rotation on the training set images;
step S15: carrying out image brightness, contrast and saturation transformation on monocular original images in the training set;
step S16: the training set and test set files are converted into LMDB (Lightning Memory-mapped database) data to improve the file reading speed.
In this embodiment, the step S2 specifically includes the following steps:
step S21: when building the text detection module based on the convolutional and recurrent neural networks, MobileNetV2 is used as the convolutional network to extract image features. Conventional convolutions are replaced with depthwise-separable convolutions, applying in order a 1 × 1 pointwise convolution, a ReLU6 activation, a 3 × 3 depthwise convolution, a ReLU6 activation, a 1 × 1 pointwise convolution, and a linear activation. The loss function L(e_i, g_j, r_k) of the convolutional network is:

L(e_i, g_j, r_k) = (1/N_e)·Σ_i L_e^cls(e_i, e_i*) + (λ_1/N_g)·Σ_j L_g^reg(g_j, g_j*) + (λ_2/N_r)·Σ_k L_r^reg(r_k, r_k*)

where L_e^cls, L_g^reg and L_r^reg are the 3 loss terms for the text/non-text score, the coordinates, and the boundary refinement, respectively; e_i is the predicted probability that the i-th anchor contains text, and e_i* ∈ {0, 1} is its ground truth; j is the index of anchor i on the y-axis and k its index on the x-axis; g_j and g_j* are the predicted and true values of the i-th anchor in the y-axis direction, and r_k and r_k* those in the x-axis direction; λ_1 and λ_2 are weights, and N_e, N_g and N_r are normalization parameters;
step S22: performing convolution on the obtained feature map by adopting 3-by-3 sliding convolution; step length is 1, channel number is 512, the feature map is converted into a feature sequence through sliding convolution, and the result of each convolution is sequentially input into an anchor point mechanism for text box matching;
step S23: the anchor mechanism uses text detection boxes of fixed length in the x-axis direction and variable length in the y-axis direction; k anchors are designed, and the length in the y-axis direction is taken as:

y = c·k, y ∈ (11, 273)

wherein c is a constant; text-box matching is performed on each text-box sequence by varying the value of k, forming a plurality of text boxes in order;
for the result of each sliding convolution, the anchor mechanism first uses text-detection-box matching to obtain, in order, the text boxes of all texts in the image; it then judges, in the x-axis direction, whether adjacent text boxes belong to the same text line by comparing the distance b between them, and if b is less than 50, the adjacent text boxes are considered to belong to the same text line; in the y-axis direction, the height of the first text box is taken as the initial value, text-box matching yields the k value and the height h, and a subsequently detected text box is considered to belong to the same text line if its height lies within the range (0.9h, 1.1h); by locating simultaneously in the x-axis and y-axis directions, a plurality of text boxes of the same type are combined into one group, giving a plurality of groups of text images containing only text information.
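The grouping rules just described (x-gap b below 50, height within (0.9h, 1.1h) of the line's first box) can be sketched as follows; the (x, y, w, h) box representation and the function name are illustrative choices:

```python
def group_text_lines(boxes, max_gap=50, h_low=0.9, h_high=1.1):
    """boxes: iterable of (x, y, w, h). Returns a list of text lines,
    each a list of boxes. A box joins a line when the x-gap to the line's
    last box is below max_gap and its height stays within
    (h_low*h0, h_high*h0) of the line's first box height h0."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[0]):
        x, y, w, h = box
        placed = False
        for line in lines:
            lx, ly, lw, lh = line[-1]
            h0 = line[0][3]              # height of the line's first box
            gap = x - (lx + lw)          # distance b between adjacent boxes
            if gap < max_gap and h_low * h0 < h < h_high * h0:
                line.append(box)
                placed = True
                break
        if not placed:
            lines.append([box])
    return lines

# Two nearby boxes of equal height form one line; a distant box starts a new one.
lines = group_text_lines([(0, 0, 10, 20), (15, 0, 10, 20), (200, 0, 10, 20)])
```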
Step S24: the recurrent neural network adopts a deep-stacked bidirectional LSTM (Bi-directional Long Short-Term Memory, BLSTM) network, and the text boxes in the text images are treated as sequences to obtain the sequence information of each text box; the recurrent neural network is a 256-dimensional bidirectional LSTM network comprising two 128-dimensional LSTM networks, and after the features are input into the BLSTM network, a plurality of groups of text-box patterns with sequence information are obtained through a fully-connected layer.
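A minimal PyTorch sketch of the step S24 sequence model, assuming PyTorch is available: the two 128-dimensional directions concatenate into the 256-dimensional BLSTM output, followed by the fully-connected layer. The class name, the 512-dimensional input/output and the depth of stacking are illustrative.

```python
import torch
import torch.nn as nn

class SequenceBLSTM(nn.Module):
    """Deep-stacked bidirectional LSTM: each direction is 128-dimensional,
    so the concatenated per-step output is 256-dimensional; a fully-connected
    layer then emits the per-step sequence information."""
    def __init__(self, in_dim=512, hidden=128, out_dim=512):
        super().__init__()
        self.blstm = nn.LSTM(in_dim, hidden, num_layers=2,
                             bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, out_dim)

    def forward(self, x):          # x: (batch, steps, in_dim)
        y, _ = self.blstm(x)       # y: (batch, steps, 256)
        return self.fc(y)          # (batch, steps, out_dim)

feats = torch.randn(1, 25, 512)    # one feature sequence of 25 steps
out = SequenceBLSTM()(feats)
```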
In this embodiment, the step S3 specifically includes the following steps:
step S31: the line-fitting transformation models the position of the image text to correct perspective and curvature deformation; an x-y coordinate system is established with the image center point as the origin, and the fitted line segments comprise two parts: the first part is a polynomial fitting the text centerline in the horizontal direction, expressed as a K-order polynomial with K+1 parameters:

f1(x) = a_K·x^K + a_{K-1}·x^{K-1} + … + a_1·x + a_0

and the second part consists of boundary line segments normal to the horizontal centerline, comprising L line segments with 3L parameters:

f2(x) = b_{1,l}·x + b_{0,l} | r_l, l = 1, 2, …, L

wherein r_l represents the lengths of the line segment on the two sides of the text centerline; in this model, the centerline gives the text direction, and the line segments perpendicular to the centerline give the boundary;
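A small NumPy sketch of evaluating the two parts of this model: the centerline polynomial f1, and the end points of a boundary segment of half-length r perpendicular to it. The function names and the finite-difference slope are illustrative choices, not taken from the source.

```python
import numpy as np

def centerline(x, coeffs):
    """f1(x) = a_K x^K + ... + a_1 x + a_0 (coeffs highest order first)."""
    return np.polyval(coeffs, x)

def boundary_endpoints(x, r, coeffs, dx=1e-3):
    """End points of the boundary segment of half-length r perpendicular
    to the fitted centerline at horizontal position x."""
    slope = (centerline(x + dx, coeffs) - centerline(x - dx, coeffs)) / (2 * dx)
    n = np.array([-slope, 1.0]) / np.hypot(slope, 1.0)  # unit normal to the centerline
    p = np.array([x, centerline(x, coeffs)])
    return p + r * n, p - r * n
```

For a flat centerline the boundary segment is vertical, so the end points sit directly above and below the evaluation point.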
step S32: the iterative correction network comprises 5 layers: the first layer Block1 uses 3×3 convolution kernels with 32 output channels and a stride of 2; the second layer Block2 uses 3×3 convolution kernels with 64 output channels and a stride of 2; the third layer Block3 uses 3×3 convolution kernels with 128 output channels and a stride of 2; the fourth layer FC4 is a fully-connected layer with 512 output channels; and the number of output channels of the fifth fully-connected layer FC5 equals the number of parameters of the line-fitting transformation, namely 3L+K+1;
step S33: with the parameters of the fitted line segments obtained in step S32, the end-point coordinates of the L line segments perpendicular to the text centerline are determined as P = [t_1, t_2, …, t_2L]^T, and the corrected end-point coordinates P' = [t'_1, t'_2, …, t'_2L]^T are calculated using Thin-Plate Spline interpolation (TPS); the parameter matrix C of the thin-plate spline interpolation is then obtained by solving the linear system

Δ·C = [P'; 0; 0], Δ = [S̄, 1, P; 1^T, 0, 0; P^T, 0, 0]

wherein S̄ is the 2L×2L kernel matrix with entries S̄_ij = U(‖t_i − t_j‖), S = [U(t−t_1), U(t−t_2), …, U(t−t_2L)]^T and U(r) = r²·log r²; for each pixel point t in the original image, the corrected pixel point t' is obtained through the TPS transformation, namely t' = C^T·[S; 1; t];
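The thin-plate-spline step can be sketched in NumPy: solve a linear system for the TPS parameters from the end points and their corrected positions, then transform any point with the kernel U(r) = r² log r². The function names and the exact matrix layout are illustrative; a solved TPS reproduces each control point exactly, which is what the test checks.

```python
import numpy as np

def tps_params(P, P_prime):
    """Solve for the TPS parameters C mapping end points P (n x 2)
    onto corrected end points P_prime (n x 2)."""
    n = len(P)
    d2 = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
    Kmat = np.where(d2 > 0, d2 * np.log(d2 + 1e-12), 0.0)   # U(r) = r^2 log r^2
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = Kmat
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    A[:n, n + 1:] = P
    A[n + 1:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = P_prime
    return np.linalg.solve(A, b)                             # C: (n+3) x 2

def tps_apply(C, P, t):
    """Transform point t (2,) with the solved parameters C."""
    d2 = np.sum((P - t) ** 2, axis=-1)
    S = np.where(d2 > 0, d2 * np.log(d2 + 1e-12), 0.0)
    return S @ C[:-3] + C[-3] + t @ C[-2:]
```

Because the system enforces f(t_i) = t'_i row by row, applying the transform to a control point returns its corrected position.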
step S34: the corrected segment end points are obtained through step S33, a sampler is used to learn the mapping from the corrected end points back to the original image, and step S33 is iterated continuously during training; because the sampler is fully differentiable, the fitted line segments need no manual annotation, and the sampler is trained by back-propagating the image gradient of the text recognition module, completing the correction of the text image.
In this embodiment, the specific content of step S4 is:
the feature extraction layer uses a ResNet-50 encoder; the last convolutional layer of the encoder has 512 channels, a 3×3 convolution kernel and a stride of 1, and a max-pooling operation converts an image of size (32, 100, 3) into a feature map of size (1, 25, 512); a further convolution with 512 channels, a 2×2 kernel and a stride of 1 is applied, and the feature map is cut column-wise into feature sequences and input into the next layer;
the sequence regression layer uses a deep-stacked bidirectional LSTM network; the obtained feature sequences are input into it, the BLSTM network comprises 2 groups of 256-dimensional LSTM networks, and each group of feature sequences is processed in the forward and reverse directions to obtain the sequence information of each group of features;
the transcription layer adopts Connectionist Temporal Classification (CTC) as the conditional probability and uses the negative log-likelihood as the training loss, so that the sequence information obtained by the sequence regression layer corresponds one-to-one to the text information of each image frame, finally giving the predicted text information.
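The CTC conditional probability can be sketched with the standard forward (alpha) recursion over the blank-extended label sequence; this is a didactic NumPy version of the negative log-likelihood, not the production implementation, and works in plain probabilities rather than log-space.

```python
import numpy as np

def ctc_neg_log_likelihood(probs, target, blank=0):
    """probs: (T, C) per-frame label distributions; target: label-id sequence.
    Forward algorithm over the blank-extended target; returns -log p(target)."""
    ext = [blank]
    for c in target:                 # interleave blanks: b, c1, b, c2, b, ...
        ext += [c, blank]
    S = len(ext)
    alpha = np.zeros((len(probs), S))
    alpha[0][0] = probs[0][ext[0]]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        for s in range(S):
            a = alpha[t - 1][s]                      # stay
            if s > 0:
                a += alpha[t - 1][s - 1]             # advance one position
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]             # skip a blank
            alpha[t][s] = a * probs[t][ext[s]]
    p = alpha[-1][S - 1] + (alpha[-1][S - 2] if S > 1 else 0.0)
    return -np.log(p)
```

With two frames of uniform probabilities over {blank, '1'}, the paths (blank,1), (1,blank) and (1,1) all collapse to "1", so p = 3/4.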
In this embodiment, the specific content of step S5 is: the text detection and image correction modules predict multiple groups of detection boxes with sequence information, each denoted box; the height of the largest detection box in the y-axis direction is denoted box_max, and the detection boxes with height values in the range (box_max−20, box_max) are taken as license plate candidate boxes; these are then sorted by their x-axis coordinates from small to large, determining the order of the numbers and letters on the license plate, with smaller coordinates in front; the text recognition module recognizes the text in these boxes as the license plate number; the edit distances between the texts in the boxes other than the license plate candidate boxes and the names of the 50 states are calculated, and the state with the smallest matching distance gives the state name; the numbers detected in the remaining boxes are padded to four digits, and if a number lies in the range 1950-2019 the largest such number is selected as the annual inspection date, otherwise the annual inspection date is set to 0; finally the license plate number, state name and annual inspection date are output.
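The classification logic of step S5, matching recognized text against the 50 state names by edit distance and applying the year rule, can be sketched as follows. The two-row Levenshtein implementation and the four-name list standing in for all 50 states are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein edit distance by dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

STATES = ["CALIFORNIA", "TEXAS", "FLORIDA", "NEW YORK"]  # stand-in for all 50

def classify_texts(texts):
    """Pick the state name nearest to any recognized text, and the largest
    number in 1950-2019 as the annual inspection date (0 if none)."""
    state = min(STATES,
                key=lambda s: min(edit_distance(t, s) for t in texts))
    years = [int(t) for t in texts if t.isdigit() and 1950 <= int(t) <= 2019]
    return state, (max(years) if years else 0)
```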
Preferably, this embodiment further provides an American license plate recognition system based on image correction, which comprises a text detection module, an image correction module, a text recognition module and a text classification module; the text detection module detects the text information in the preprocessed image, segmenting the text from the background to obtain a text image containing only text information; the image correction module corrects the text image, converting an originally distorted or inclined text image into the horizontal direction; the text recognition module recognizes the corrected text image to obtain the letter and number information it contains; and the text classification module screens the license plate number, state name and annual inspection date out of all the text information to complete license plate recognition.
Preferably, the present embodiment comprises the following design points: 1) preprocessing the image files of the data set to generate a training set and a test set, and performing data enhancement; 2) designing a text detection module to detect the text information in the image, segmenting the text from the background and obtaining a text image containing only text information; 3) adopting an image correction module to correct the text image, converting an originally distorted or inclined text image into the horizontal direction; 4) recognizing the corrected text image to obtain the letter and number information it contains; 5) constructing a text classification module to screen the license plate number, state name and annual inspection date out of all the text information, completing license plate recognition.
Preferably, in this embodiment, the original image is characterized using the lightweight convolutional neural network MobileNetV2, the feature sequence is input into a recurrent neural network, and an anchor mechanism connects the text boxes to form the final text lines and complete text detection; for text with inclination and curvature, an image correction module corrects the text image, converting the originally distorted or inclined text image into the horizontal direction; the corrected text image is recognized by a text recognition module to obtain the letter and number information it contains; finally, a text classification module screens the license plate number, state name and annual inspection date out of all the text information to complete license plate recognition. The method, combined with deep learning, inputs an American license plate image into the trained network model and completes the recognition of the license plate number, state name and annual inspection date with high accuracy and strong robustness; it can detect and recognize multi-directional, multi-size text objects in license plate images with complex backgrounds, has small model parameters, and can be used for real-time license plate detection.
Preferably, the specific application example of the present embodiment is as follows:
1) preprocessing the data set: cleaning and screening the original American license plate images, annotating the images, and dividing the annotated images into a training set and a test set for neural network training and experimental testing, specifically comprising the following steps:
1-1) performing data screening on the original data set, removing blurred pictures and pictures with missing text information; the images are box-annotated with the LabelImg annotation software, the annotated content comprising the 4 end-point coordinates of each selected text region and the text information in each text box; the image serial number, end-point coordinates and text information are stored in a txt file, and a training set and a test set are generated, the training set comprising 4000 pictures and the test set 2000 pictures, 6000 pictures in total;
1-2) adjusting original images in a training set to be uniform in size, wherein the resolution is 800 x 400;
1-3) randomly and horizontally turning over the images in the training set, wherein the turning over probability is 0.5;
1-4) carrying out random angle rotation on the training set image, wherein the value range of the rotation angle is (-5 degrees and 5 degrees);
1-5) transforming the brightness, contrast and saturation of the original images in the training set, with the values 0.4, 0.4 and 0.4 respectively;
1-6) converting the training set and test set files into LMDB (Lightning Memory-Mapped Database) data to improve the file reading speed.
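The size normalization of step 1-2) relies on bilinear interpolation; a self-contained NumPy sketch for a single-channel image follows. The align-corners sampling convention is an implementation choice made here, not specified by the source.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Bilinear interpolation resize (align-corners) for a 2-D grayscale array."""
    in_h, in_w = img.shape[:2]
    ys = np.linspace(0, in_h - 1, out_h)          # source row coordinates
    xs = np.linspace(0, in_w - 1, out_w)          # source column coordinates
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    tl = img[np.ix_(y0, x0)]; tr = img[np.ix_(y0, x1)]
    bl = img[np.ix_(y1, x0)]; br = img[np.ix_(y1, x1)]
    top = tl * (1 - wx) + tr * wx                 # blend along x
    bot = bl * (1 - wx) + br * wx
    return top * (1 - wy) + bot * wy              # blend along y
```

Upsampling a 2×2 linear ramp to 3×3 recovers the midpoints exactly, which makes the behavior easy to verify.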
2) As shown in FIG. 2, an unsupervised convolutional neural network structure for American license plate recognition is designed; the network comprises four units, namely text detection, image correction, text recognition and text classification, and the whole neural network completes the feature extraction, text detection and character recognition of the image in an end-to-end unsupervised learning process.
As shown in fig. 3, a text detection module based on a convolutional neural network and a recurrent neural network is constructed; continuous text lines, i.e., predicted text images, are obtained from the input American license plate image and selected with boxes, specifically comprising the following steps:
2-1) the text detection module uses MobileNetV2 as the convolutional neural network to extract the features of the image and obtain a feature map. As a lightweight network, MobileNetV2 replaces the conventional convolution with a depthwise-separable convolution that applies, in order, a 1×1 pointwise convolution, a ReLU6 activation function, a 3×3 depthwise convolution, a ReLU6 activation function, a 1×1 pointwise convolution and a linear activation function; the loss function L(e_i, g_j, r_k) of the convolutional neural network is:

L(e_i, g_j, r_k) = (1/N_e)·Σ_i L_cls(e_i, e_i*) + (λ_1/N_g)·Σ_j L_reg(g_j, g_j*) + (λ_2/N_r)·Σ_k L_reg(r_k, r_k*)

wherein L_cls and the two L_reg terms are the 3 loss functions for calculating the text/non-text score, the coordinates and the boundary refinement, respectively; e_i represents the probability that the ith anchor is correctly predicted, and e_i* is the true value, taking 0 or 1; j is the index of anchor i on the y-axis and k is the index of anchor i on the x-axis; g_j and g_j* are the predicted value and true value of the ith anchor in the y-axis direction, and r_k and r_k* are the predicted value and true value in the x-axis direction; λ_1 and λ_2 are weights, taking 1.0 and 2.0 respectively; N_e, N_g and N_r are normalization parameters, taking 128, 20 and 32 respectively;
2-2) performing a 3×3 sliding convolution on the obtained feature map, with a stride of 1 and 512 channels; the sliding convolution converts the feature map into a feature sequence, and the result of each convolution is input in turn into the anchor mechanism for text-box matching;
2-3) the anchor mechanism uses text detection boxes of fixed length in the x-axis direction and variable length in the y-axis direction. k anchors are designed, and the length in the y-axis direction is taken as:

y = c·k, y ∈ (11, 273)

wherein c is a constant; text-box matching is performed on each text-box sequence by varying the value of k, forming a plurality of text boxes in order. In this embodiment, k takes the value 20 and c takes the value 13.1.
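A quick check of the example values: with c = 13.1 and k = 1 … 20, the anchor heights y = c·k indeed fall inside the stated (11, 273) interval.

```python
# Anchor heights for the example constants c = 13.1, k = 1..20.
c, k_max = 13.1, 20
heights = [c * k for k in range(1, k_max + 1)]   # 13.1, 26.2, ..., 262.0
assert 11 < min(heights) and max(heights) < 273  # all heights inside (11, 273)
```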
For the result of each sliding convolution, the anchor mechanism first uses text-detection-box matching to obtain, in order, the text boxes of all texts in the image; it then judges, in the x-axis direction, whether adjacent text boxes belong to the same text line by comparing the distance b between them, and if b is less than 50, the adjacent text boxes are considered to belong to the same text line; in the y-axis direction, the height of the first text box is taken as the initial value, text-box matching yields the k value and the height h, and a subsequently detected text box is considered to belong to the same text line if its height lies within the range (0.9h, 1.1h). By locating simultaneously in the x-axis and y-axis directions, a plurality of text boxes of the same type are combined into one group, giving a plurality of groups of text images containing only text information;
2-4) the recurrent neural network adopts a deep-stacked Bidirectional Long Short-Term Memory (BLSTM) network, and the text boxes in the text images are treated as sequences to obtain the sequence information of each text box. The recurrent neural network is a 256-dimensional bidirectional LSTM network comprising two 128-dimensional LSTM networks, and after the features are input into the BLSTM network, a plurality of groups of text-box patterns with sequence information are obtained through a 512-channel fully-connected layer;
3) as shown in fig. 4, the predicted text image may be curved and inclined, so an image correction module is designed to correct the image; the module comprises a line-fitting transformation, which models the centerline of the scene text with a polynomial and estimates the direction and boundary of the text using a group of line segments perpendicular to the text centerline, and an iterative correction network learns the line-segment equation parameters to adjust the text image to the horizontal direction; the specific steps are as follows:
3-1) the line-fitting transformation is first used to model the position of the image text to correct perspective and curvature deformation. An x-y coordinate system is established with the image center point as the origin, and the fitted line segments comprise two parts: the first part is a polynomial fitting the text centerline in the horizontal direction, expressed as a K-order polynomial with K+1 parameters:

f1(x) = a_K·x^K + a_{K-1}·x^{K-1} + … + a_1·x + a_0

and the second part consists of boundary line segments normal to the horizontal centerline, comprising L line segments with 3L parameters:

f2(x) = b_{1,l}·x + b_{0,l} | r_l, l = 1, 2, …, L

wherein r_l represents the lengths of the line segment on the two sides of the text centerline; in this embodiment, K takes the value 4 and L takes the value 20;
3-2) after the position modeling of the image text is completed, the fitted line segments are adjusted and optimized using an iterative correction network. The first layer Block1 uses 3×3 convolution kernels with 32 output channels and a stride of 2; the second layer Block2 uses 3×3 convolution kernels with 64 output channels and a stride of 2; the third layer Block3 uses 3×3 convolution kernels with 128 output channels and a stride of 2; the fourth layer FC4 is a fully-connected layer with 512 output channels; the number of output channels of the fifth fully-connected layer FC5 equals the number of parameters of the line-fitting transformation, namely 3L+K+1; the number of iterative corrections n is 5;
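A PyTorch sketch of this five-layer correction network, assuming PyTorch is available; with K = 4 and L = 20 the final layer emits 3L + K + 1 = 65 parameters. The ReLU activations, the padding of 1 and the adaptive pooling before FC4 are assumptions made to keep the sketch input-size agnostic.

```python
import torch
import torch.nn as nn

K_ORDER, L_SEG = 4, 20          # polynomial order and segment count from the text

class CorrectionNet(nn.Module):
    """Blocks 1-3: 3x3 convs with 32/64/128 channels, stride 2;
    FC4: 512 channels; FC5: the 3L + K + 1 line-fitting parameters."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),   # assumption: fixed-size pooling before FC4
        )
        self.fc4 = nn.Linear(128 * 4 * 4, 512)
        self.fc5 = nn.Linear(512, 3 * L_SEG + K_ORDER + 1)   # 65 parameters

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.fc5(torch.relu(self.fc4(h)))

params = CorrectionNet()(torch.randn(1, 3, 32, 100))
```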
3-3) with the parameters of the fitted line segments obtained through the above steps, the end-point coordinates of the L line segments perpendicular to the text centerline are determined as P = [t_1, t_2, …, t_2L]^T, and the corrected end-point coordinates P' = [t'_1, t'_2, …, t'_2L]^T are calculated using Thin-Plate Spline interpolation (TPS); the parameter matrix C of the thin-plate spline interpolation is then obtained by solving the linear system

Δ·C = [P'; 0; 0], Δ = [S̄, 1, P; 1^T, 0, 0; P^T, 0, 0]

wherein S̄ is the 2L×2L kernel matrix with entries S̄_ij = U(‖t_i − t_j‖), S = [U(t−t_1), U(t−t_2), …, U(t−t_2L)]^T and U(r) = r²·log r²; for each pixel point t in the original image, the corrected pixel point t' can be obtained through the TPS transformation, namely t' = C^T·[S; 1; t];
3-4) the corrected segment end points are obtained through step 3-3), a sampler is used to learn the mapping from the corrected end points back to the original image, and step 3-3) is iterated continuously during training. Because the sampler is fully differentiable, the fitted line segments need no manual annotation, and the sampler is trained by back-propagating the image gradient of the text recognition module, completing the correction of the text image;
4) loading the text image corrected in step 3) and inputting it into the text recognition module. As shown in fig. 5, the text recognition module recognizes the text image through the feature extraction layer, the sequence regression layer and the transcription layer to obtain the text information.
4-1) the feature extraction layer of the text recognition module uses a ResNet-50 encoder; the last convolutional layer of the encoder has 512 channels, a 3×3 convolution kernel and a stride of 1, and a max-pooling operation converts an image of size (32, 100, 3) into a feature map of size (1, 25, 512); a further convolution with 512 channels, a 2×2 kernel and a stride of 1 is applied, and the feature map is cut column-wise into feature sequences and input into the next layer;
4-2) the obtained feature sequences are input into a deep-stacked bidirectional LSTM network; the BLSTM network comprises 2 groups of 256-dimensional LSTM networks, and each group of feature sequences is processed in the forward and reverse directions to obtain the sequence information of each group of features;
4-3) the transcription layer adopts Connectionist Temporal Classification (CTC) as the conditional probability and uses the negative log-likelihood as the training loss, so that the sequence information obtained by the sequence regression layer corresponds one-to-one to the text information of each image frame, finally giving the predicted text information;
5) the text classification module is adopted to classify the text information; because each piece of predicted text information corresponds one-to-one to a text box, the license plate number, state name and annual inspection date in the text information can be judged from the length, width, position and other attributes of the text box.
The text recognition module predicts multiple groups of detection boxes, each denoted box; the height of the largest detection box in the y-axis direction is denoted box_max, and the detection boxes with height values in the range (box_max−20, box_max) are taken as license plate candidate boxes; the order of the characters on the license plate is then determined from the coordinates, and the text recognition module recognizes the text in these boxes as the license plate number; the edit distances between the texts in the boxes other than the license plate candidate boxes and the names of the 50 states are calculated, and the state with the smallest matching distance gives the state name; the numbers detected in the remaining boxes are padded to four digits, and if a number lies in the range 1950-2019 the largest such number is selected as the annual inspection date, otherwise the annual inspection date is set to 0; finally the license plate number, state name and annual inspection date are output.
The experimental results are shown in fig. 6. Through algorithm optimization and model improvement, the American license plate recognition method based on image correction provided in this embodiment recognizes American license plates quickly and accurately; verification on the test set shows that the overall recognition rate of the license plate number and state name in this embodiment exceeds 90%, while the recognition rate of the annual inspection date is only 10% owing to large differences in its format and position and confusion between year and month. The parameter model of the neural network provided by this embodiment does not exceed 100M in size, supports online recognition, and can better meet practical application requirements.
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.

Claims (7)

1. An American license plate recognition method based on image correction is characterized in that: the method comprises the following steps:
step S1: preprocessing a data set: cleaning and screening an original American license plate image data set, labeling the image, wherein the labeled content comprises a text box and text information in the text box, and dividing the labeled image into a training set and a test set for neural network training and experimental testing;
step S2: constructing a text detection module based on a convolutional neural network and a recurrent neural network: extracting the features of the training set images using the convolutional neural network, converting the feature map into a feature sequence by sliding convolution, and designing an anchor mechanism to detect the result of each sliding convolution; a plurality of groups of continuous text boxes are obtained by comparing the lengths of the text boxes in the x-axis and y-axis directions, and the recurrent neural network is adopted to order the text boxes and obtain the text-box sequence information, each group of text boxes being a predicted text image;
step S3: the predicted text image may be curved and inclined, so an image correction module is designed to correct the image; the module comprises a line-fitting transformation, which models the centerline of the scene text with a polynomial and a group of line segments perpendicular to the text centerline, estimating the direction and boundary of the text; an iterative correction network is used to learn the line-segment equation parameters and adjust the text image to the horizontal direction;
step S4: loading the text image corrected in the step S3, inputting the text image into a text recognition module, and recognizing the text image by the text recognition module through a feature extraction layer, a sequence regression layer and a transcription layer to obtain text information;
step S5: a text classification module is adopted to classify the text information; because each piece of predicted text information corresponds one-to-one to a text box, the license plate number, state name and annual inspection date in the text information can be judged from the length, width and position information of the text box.
2. The American license plate recognition method based on image correction as claimed in claim 1, wherein: the step S1 specifically includes the following steps:
step S11: performing data screening on the original data set, removing blurred images and images with missing text information; the screened images are box-annotated with the LabelImg annotation software, the annotated content comprising the 4 end-point coordinates of each selected text region and the text information in each text box; the image serial number, end-point coordinates and text information are stored in a txt file, and a training set and a test set are generated, the training set accounting for 2/3 and the test set for 1/3, both comprising license plate images and txt files;
step S12: scaling original image pixels in a training set to a specified size by a bilinear interpolation method;
step S13: randomly flipping the training set images horizontally;
step S14: rotating the training set images by a random angle;
step S15: transforming the brightness, contrast and saturation of the original images in the training set;
step S16: and converting the training set files and the test set files into LMDB data so as to improve the file reading speed.
3. The American license plate recognition method based on image correction as claimed in claim 1, wherein: the step S2 specifically includes the following steps:
step S21: when the text detection module based on the convolutional neural network and the recurrent neural network is constructed, MobileNetV2 is used as the convolutional neural network to extract the features of the image; the conventional convolution is replaced with a depthwise-separable convolution that applies, in order, a 1×1 pointwise convolution, a ReLU6 activation function, a 3×3 depthwise convolution, a ReLU6 activation function, a 1×1 pointwise convolution and a linear activation function; the loss function L(e_i, g_j, r_k) of the convolutional neural network is:

L(e_i, g_j, r_k) = (1/N_e)·Σ_i L_cls(e_i, e_i*) + (λ_1/N_g)·Σ_j L_reg(g_j, g_j*) + (λ_2/N_r)·Σ_k L_reg(r_k, r_k*)

wherein L_cls and the two L_reg terms are the 3 loss functions for calculating the text/non-text score, the coordinates and the boundary refinement, respectively; e_i represents the probability that the ith anchor is correctly predicted, and e_i* is the true value, taking 0 or 1; j is the index of anchor i on the y-axis and k is the index of anchor i on the x-axis; g_j and g_j* are the predicted value and true value of the ith anchor in the y-axis direction, and r_k and r_k* are the predicted value and true value in the x-axis direction; λ_1 and λ_2 are weights, and N_e, N_g and N_r are normalization parameters;
step S22: performing a 3×3 sliding convolution on the obtained feature map, with a stride of 1 and 512 channels; the sliding convolution converts the feature map into a feature sequence, and the result of each convolution is input in turn into the anchor mechanism for text-box matching;
step S23: the anchor mechanism uses text detection boxes of fixed length in the x-axis direction and variable length in the y-axis direction; k anchors are designed, and the length in the y-axis direction is taken as:

y = c·k, y ∈ (11, 273)

wherein c is a constant; text-box matching is performed on each text-box sequence by varying the value of k, forming a plurality of text boxes in order;
for the result of each sliding convolution, the anchor mechanism first uses text-detection-box matching to obtain, in order, the text boxes of all texts in the image; it then judges, in the x-axis direction, whether adjacent text boxes belong to the same text line by comparing the distance b between them, and if b is less than 50, the adjacent text boxes are considered to belong to the same text line; in the y-axis direction, the height of the first text box is taken as the initial value, text-box matching yields the k value and the height h, and a subsequently detected text box is considered to belong to the same text line if its height lies within the range (0.9h, 1.1h); by locating simultaneously in the x-axis and y-axis directions, a plurality of text boxes of the same type are combined into one group, giving a plurality of groups of text images containing only text information.
Step S24: the recurrent neural network adopts a deep-stacked bidirectional LSTM network, and the text boxes in the text images are treated as sequences to obtain the sequence information of each text box; the recurrent neural network is a 256-dimensional bidirectional LSTM network comprising two 128-dimensional LSTM networks, and after the features are input into the BLSTM network, a plurality of groups of text-box patterns with sequence information are obtained through a fully-connected layer.
4. The American license plate recognition method based on image correction as claimed in claim 1, wherein: the step S3 specifically includes the following steps:
step S31: the line fitting transformation models the position of the image text to correct image perspective and curvature deformation; the fitting line segment takes the image central point as an origin, an x-y coordinate system is established, the fitting line segment comprises two parts, the first part is a polynomial fitting the text centerline in the horizontal direction, the polynomial is expressed by using a K-order polynomial, and the fitting line segment comprises K +1 parameters:
f1(x) = a_K*x^K + a_(K-1)*x^(K-1) + … + a_1*x + a_0
the second part consists of boundary line segments normal (perpendicular) to the horizontal centerline; this part comprises L line segments with 3L parameters, expressed as:
f2(x) = b_(1,l)*x + b_(0,l) | r_l,  l = 1, 2, …, L
wherein r_l represents the length of the line segments on both sides of the text centerline; the centerline of the model gives the text direction, and the line segments perpendicular to the centerline give the boundary;
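The centerline model above is an ordinary K-order polynomial with K + 1 coefficients; a minimal sketch of evaluating it follows. The coefficient ordering (highest degree first) and the function name are assumptions for illustration:

```python
# Evaluate the centerline polynomial f1(x) = a_K*x^K + ... + a_1*x + a_0
# from its K+1 parameters, using Horner's method.
def centerline(coeffs, x):
    """coeffs = [a_K, ..., a_1, a_0]; returns f1(x)."""
    y = 0.0
    for a in coeffs:
        y = y * x + a
    return y
```

For example, coeffs = [1, 2, 3] describes f1(x) = x^2 + 2x + 3, so f1(2) = 11.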
step S32: the iterative correction network comprises 5 layers: the first layer, Block1, uses 3 × 3 convolution kernels with 32 output channels and stride 2; the second layer, Block2, uses 3 × 3 convolution kernels with 64 output channels and stride 2; the third layer, Block3, uses 3 × 3 convolution kernels with 128 output channels and stride 2; the fourth layer, fully-connected layer FC4, has 512 output channels; the fifth layer, fully-connected layer FC5, has as many output channels as there are line fitting transformation parameters, namely 3L + K + 1;
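The layer dimensions above can be traced with a short sketch. Since the padding scheme is not stated, 'same' padding (ceil division on each stride-2 convolution) is an assumption here, as are the function and variable names:

```python
# Trace feature-map sizes through the three stride-2 convolutions (Block1-3),
# then report the FC4 and FC5 output widths described in step S32.
import math

def trace_shapes(h, w, K, L):
    shapes = []
    for _ in range(3):                       # Block1..Block3, stride 2 each
        h, w = math.ceil(h / 2), math.ceil(w / 2)
        shapes.append((h, w))
    fc4_out = 512                            # FC4 output channels
    fc5_out = 3 * L + K + 1                  # FC5: line-fitting parameter count
    return shapes, fc4_out, fc5_out
```

For a 32 × 100 input with K = 4 and L = 10, the spatial sizes shrink to 16 × 50, 8 × 25 and 4 × 13, and FC5 regresses 3·10 + 4 + 1 = 35 parameters.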
step S33: by obtaining the parameters of the fitted line segment in step S32, the coordinates P of the end points of L line segments perpendicular to the center line of the text may be determined as [ t ═ t [ ([ t ])1,t2,…,t2L]TThe corrected endpoint coordinate P 'is calculated by interpolation using a thin-plate spline as [ t'1,t'2,…,t'2L]TThen, the parameters of the thin-plate spline interpolation are expressed as:
[TPS parameter equation, given in the original document as image FDA0002377050410000051]
wherein S ═ U (t-t)1),U(t-t2),…,U(t-t2L)]T,U(r)=r2logr2For each pixel point t in the original image, obtaining a corrected pixel point t 'through TPS transformation, namely t is C.t';
step S34: the corrected segment endpoints are obtained through step S33, a sampler learns the mapping from the corrected endpoints to the original image, and step S33 is iterated continuously during training; the sampler is fully differentiable and the fitted line segments require no manual annotation; the sampler is trained by back-propagating the image gradients of the text recognition module, completing the correction of the text image.
5. The American license plate recognition method based on image correction as claimed in claim 1, wherein: the specific content of step S4 is:
the feature extraction layer uses a ResNet-50 encoder; the last convolutional layer of the encoder has 512 channels, a 3 × 3 kernel and stride 1; after a max pooling operation, an image of size (32, 100, 3) is converted into a feature map of size (1, 25, 512); a further convolution with 512 channels, a 2 × 2 kernel and stride 1 is then applied, and the feature map is cut column-wise into feature sequences that are input into the next layer;
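Cutting a (1, 25, 512) feature map column-wise into a 25-step sequence of 512-dimensional features can be sketched with plain nested lists standing in for tensors; the function name and list representation are illustrative assumptions:

```python
# Convert a height-1 feature map of shape (1, W, C) into a sequence of
# W feature vectors of dimension C, one per column, as step S4 describes.
def to_feature_sequence(feature_map):
    """feature_map: nested list of shape (1, W, C) -> list of W C-dim vectors."""
    assert len(feature_map) == 1, "expects a height-1 feature map"
    row = feature_map[0]
    return [list(col) for col in row]        # one C-dim feature per column
```

A (1, 25, C) map thus becomes a 25-element sequence, matching the 25 time steps fed to the bidirectional LSTM.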
the sequence regression layer uses a deeply stacked bidirectional LSTM network; the obtained feature sequences are input into this network, the BLSTM comprising 2 groups of 256-dimensional LSTM networks; each feature sequence is processed in forward and reverse order to obtain the sequence information of each group of features;
the transcription layer adopts connectionist temporal classification (CTC) as the conditional probability model and uses the negative log-likelihood as the training loss, so that the sequence information obtained by the sequence regression layer corresponds one-to-one with the pixels in the text information of each frame of the image, finally yielding the predicted text information.
6. The American license plate recognition method based on image correction as claimed in claim 1, wherein: the specific content of step S5 is: the text detection and image correction modules predict multiple groups of detection boxes with sequence information, each denoted box; the height of the largest detection box in the y-axis direction is denoted box_max, and the detection boxes whose height falls within (box_max - 20, box_max) are taken as license plate candidate boxes; these candidate boxes are then sorted by their x-axis coordinates from small to large to determine the order of the license plate numbers and letters, smaller coordinates coming first; the text recognition module recognizes the text in these boxes as the license plate number; for the text in each box other than the license plate candidate boxes, the edit distances to the names of the 50 states are calculated, and the state with the minimum edit distance is taken as the state name; the numbers detected in the remaining boxes are automatically padded to four digits; if a number falls within the range 1950 to 2019, the largest such number is selected as the annual inspection date, otherwise the annual inspection date is set to 0; finally the license plate number, state name and annual inspection date are output.
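The classification step above can be sketched as follows: minimum edit distance selects the state name, and the four-digit padding plus 1950-2019 range check validates the inspection year. The truncated state list and all function names are illustrative assumptions:

```python
# Sketch of step S5's text classification: state name by minimum edit
# distance, inspection year by zero-padding and range validation.
def edit_distance(a, b):
    """Classic Levenshtein distance with a single rolling DP row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def classify_state(text, states):
    """Return the state name with minimum edit distance to the OCR text."""
    return min(states, key=lambda s: edit_distance(text.upper(), s.upper()))

def inspection_year(digits):
    """Pad to four digits; accept only years in 1950..2019, else 0."""
    year = int(digits.zfill(4))
    return year if 1950 <= year <= 2019 else 0
```

For instance, the OCR output "TEXA5" is one substitution away from "TEXAS", so it matches that state, and a stray "19" pads to 0019 and is rejected as a year.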
7. An American license plate recognition system based on image correction, characterized in that: the system comprises a text detection module, an image correction module, a text recognition module and a text classification module; the text detection module is used for detecting the text information in the preprocessed image, segmenting the text from the background to obtain a text image containing only text information; the image correction module is used for correcting the text image, converting an originally distorted or inclined text image into the horizontal direction; the text recognition module is used for recognizing the corrected text image to obtain the letter and number information it contains; the text classification module is used for screening the license plate number, state name and annual inspection date out of all the text information to complete license plate recognition.
CN202010069950.5A 2020-01-21 2020-01-21 American license plate recognition method and system based on image correction Active CN111325203B (en)

Publications (2)

Publication Number Publication Date
CN111325203A true CN111325203A (en) 2020-06-23
CN111325203B CN111325203B (en) 2022-07-05


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100073735A1 (en) * 2008-05-06 2010-03-25 Compulink Management Center, Inc. Camera-based document imaging
WO2016197381A1 (en) * 2015-06-12 2016-12-15 Sensetime Group Limited Methods and apparatus for recognizing text in an image
CN108985137A (en) * 2017-06-02 2018-12-11 杭州海康威视数字技术股份有限公司 A kind of licence plate recognition method, apparatus and system
CN109034152A (en) * 2018-07-17 2018-12-18 广东工业大学 License plate locating method and device based on LSTM-CNN built-up pattern

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG, Jianming et al.: "A Real-Time Chinese Traffic Sign Detection Algorithm Based on Modified YOLOv2", Algorithms *
XIE, Guangjun: "Research on License Plate Location and Character Segmentation Algorithms in a License Plate Recognition System", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898597A (en) * 2020-06-24 2020-11-06 泰康保险集团股份有限公司 Method, device, equipment and computer readable medium for processing text image
CN111783763A (en) * 2020-07-07 2020-10-16 厦门商集网络科技有限责任公司 Text positioning box correction method and system based on convolutional neural network
CN111814736A (en) * 2020-07-23 2020-10-23 上海东普信息科技有限公司 Express bill information identification method, device, equipment and storage medium
CN111814736B (en) * 2020-07-23 2023-12-29 上海东普信息科技有限公司 Express delivery face list information identification method, device, equipment and storage medium
CN111914838A (en) * 2020-07-28 2020-11-10 同济大学 License plate recognition method based on text line recognition
CN111985465A (en) * 2020-08-17 2020-11-24 中移(杭州)信息技术有限公司 Text recognition method, device, equipment and storage medium
CN112070048A (en) * 2020-09-16 2020-12-11 福州大学 Vehicle attribute identification method based on RDSNet
CN112070048B (en) * 2020-09-16 2022-08-09 福州大学 Vehicle attribute identification method based on RDSNet
CN112364883A (en) * 2020-09-17 2021-02-12 福州大学 American license plate recognition method based on single-stage target detection and deptext recognition network
CN112364883B (en) * 2020-09-17 2022-06-10 福州大学 American license plate recognition method based on single-stage target detection and deptext recognition network
CN112183307A (en) * 2020-09-25 2021-01-05 上海眼控科技股份有限公司 Text recognition method, computer device, and storage medium
CN111882004A (en) * 2020-09-28 2020-11-03 北京易真学思教育科技有限公司 Model training method, question judging method, device, equipment and storage medium
CN112016315B (en) * 2020-10-19 2021-02-02 北京易真学思教育科技有限公司 Model training method, text recognition method, model training device, text recognition device, electronic equipment and storage medium
CN112308092A (en) * 2020-11-20 2021-02-02 福州大学 Light-weight license plate detection and identification method based on multi-scale attention mechanism
CN112308092B (en) * 2020-11-20 2023-02-28 福州大学 Light-weight license plate detection and identification method based on multi-scale attention mechanism
CN112528994A (en) * 2020-12-18 2021-03-19 南京师范大学 Free-angle license plate detection method, license plate identification method and identification system
CN112528994B (en) * 2020-12-18 2024-03-29 南京师范大学 Free angle license plate detection method, license plate recognition method and recognition system
CN112784836A (en) * 2021-01-22 2021-05-11 浙江康旭科技有限公司 Text and graphic offset angle prediction and correction method thereof
CN112818823A (en) * 2021-01-28 2021-05-18 建信览智科技(北京)有限公司 Text extraction method based on bill content and position information
CN112818823B (en) * 2021-01-28 2024-04-12 金科览智科技(北京)有限公司 Text extraction method based on bill content and position information
CN112801095A (en) * 2021-02-05 2021-05-14 广东工业大学 Attention mechanism-based graph neural network container text recognition method
CN112990197A (en) * 2021-03-17 2021-06-18 浙江商汤科技开发有限公司 License plate recognition method and device, electronic equipment and storage medium
CN112883973A (en) * 2021-03-17 2021-06-01 北京市商汤科技开发有限公司 License plate recognition method and device, electronic equipment and computer storage medium
CN113343903A (en) * 2021-06-28 2021-09-03 成都恒创新星科技有限公司 License plate recognition method and system in natural scene
CN113343903B (en) * 2021-06-28 2024-03-26 成都恒创新星科技有限公司 License plate recognition method and system in natural scene
CN113240058A (en) * 2021-07-13 2021-08-10 北京文安智能技术股份有限公司 License plate image training set construction method and license plate character detection model training method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant