CN110717493B - License plate recognition method containing stacked characters based on deep learning - Google Patents

Info

Publication number
CN110717493B
CN110717493B (application CN201910870894.2A)
Authority
CN
China
Prior art keywords
character
stacked
layer
recognition
convolution
Prior art date
Legal status
Active
Application number
CN201910870894.2A
Other languages
Chinese (zh)
Other versions
CN110717493A (en)
Inventor
张三元
祁忠琪
涂凯
吴书楷
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201910870894.2A priority Critical patent/CN110717493B/en
Publication of CN110717493A publication Critical patent/CN110717493A/en
Application granted granted Critical
Publication of CN110717493B publication Critical patent/CN110717493B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/273 Segmentation of patterns in the image field; removing elements interfering with the pattern to be recognised
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/625 License plates

Abstract

The invention discloses a deep-learning-based method for recognizing license plates that contain stacked characters. A stacked character recognition network is constructed and trained with stacked character sample pictures and their corresponding stacked character sample labels; a license plate region picture is acquired, and character detection is performed on it with an object detection technique; regions whose detected label is a stacked character are cropped and fed into the stacked character recognition network for recognition; results whose recognized length is not the designated character length are post-processed; non-maximum suppression deletes pseudo character boxes to obtain the real character boxes; and the recognition results of the real character boxes are spliced from left to right to obtain the final license plate recognition result. The invention makes up for the failure of current license plate recognition technology to support license plates containing stacked characters, and the complete license plate recognition system it constructs is extremely robust on complex license plates, giving the invention great practical value for license plate recognition applications.

Description

License plate recognition method containing stacked characters based on deep learning
Technical Field
The invention relates to the field of license plate image recognition, in particular to a license plate recognition method containing stacked characters based on deep learning.
Background
Intelligent transportation technology not only brings great convenience to people's travel but also improves safety. License plate recognition, an important application in building intelligent transportation and smart cities, plays an important role in fields such as road vehicle tracking, automatic highway toll collection, traffic law enforcement assistance and unattended parking lots. Current license plate recognition applications achieve high accuracy on license plates containing only a single line of characters, but cannot handle more complex, special license plates; in particular, license plates with stacked characters are unsupported by most current license plate recognition technology. A more universal license plate recognition application is therefore a key problem in the license plate recognition field.
Disclosure of Invention
In order to solve the problems in the background art, the invention provides a deep-learning-based method for recognizing license plates containing stacked characters, which not only supports license plates containing only a single line of characters but is also extremely robust on license plates with stacked characters.
The technical scheme adopted by the invention comprises the following steps:
1) Shoot and collect stacked character sample pictures from license plates, where each stacked character sample picture carries a stacked character sample number; map each character of the stacked character sample number to its corresponding integer identifier according to a character dictionary, and concatenate the identifiers in the original character order to obtain the stacked character sample label;
2) constructing a stacked character recognition network for recognizing the stacked character sample pictures, training the stacked character recognition network by using the stacked character sample pictures and the stacked character sample labels in the step 1), and storing weight parameters of the stacked character recognition network after the training is finished so as to obtain the trained stacked character recognition network;
3) Acquire a license plate picture and detect the single-row characters and stacked characters in it using an object detection method (MobileNet-SSD or Cascade), obtaining single-row character boxes and stacked character boxes together with their corresponding recognition results and confidences;
A single-row character occupies only one line in the vertical direction, while a stacked character occupies at least two lines; the upper and lower "L" and "G" characters in fig. 7, for example, constitute a stacked character.
4) Process the result of step 3) with the non-maximum suppression algorithm to delete pseudo character boxes and overlapping character boxes, obtaining the final real single-row character boxes and real stacked character boxes; a pseudo character box is a box that does not actually contain a character, and an overlapping character box is one that has a large overlap ratio with other character boxes but a relatively low confidence.
5) Crop the stacked character from the original license plate picture according to the real stacked character box to obtain a stacked character picture, and input it into the stacked character recognition network constructed in step 2) to obtain the stacked character recognition result;
6) Post-process the stacked character recognition result of step 5): if the length of the recognition result is not the designated character length, divide the stacked character picture of step 5) directly from top to bottom into the designated number of small character pictures of equal height and recognize them in turn with a character classification algorithm;
finally, splice the recognition results of all the segmented small character pictures from top to bottom as the re-recognition result;
7) Splice the recognition results of the real single-row character boxes and the real stacked character boxes in order from left to right to obtain the final recognition result of the license plate containing stacked characters.
In step 1), a stacked character sample picture contains two or more characters in the vertical direction; each character is mapped, in order from top to bottom, to an integer identifier according to the character dictionary, and the identifiers are concatenated in order to obtain the stacked character sample label. The character dictionary is a mapping between 37 character classes and integer identifiers: the 37 classes are the 10 Arabic numerals, the 26 capital letters and one blank class, and the integer identifiers are the integer labels 0 to 36, where the blank class represents the non-character class.
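The character dictionary just described can be sketched as follows; the constant and function names (CHAR_DICT, encode_label, decode_label) are illustrative rather than from the patent, and the exact ordering of the 37 classes is an assumption:

```python
# Hypothetical sketch of the 37-class character dictionary: the 10 Arabic
# numerals and 26 capital letters map to integer identifiers 0-35, and
# identifier 36 is the blank (non-character) class.
DIGITS = "0123456789"
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHAR_DICT = {c: i for i, c in enumerate(DIGITS + LETTERS)}
BLANK_ID = 36  # the blank class representing "no character"

def encode_label(sample_number: str) -> list:
    """Map a stacked character sample number to integer identifiers,
    in the original top-to-bottom character order."""
    return [CHAR_DICT[c] for c in sample_number]

def decode_label(identifiers) -> str:
    """Inverse mapping back to characters, skipping blank identifiers."""
    inverse = {i: c for c, i in CHAR_DICT.items()}
    return "".join(inverse[i] for i in identifiers if i != BLANK_ID)
```

For the "LG" stacked character of fig. 7, `encode_label("LG")` yields the two identifiers for L and G in top-to-bottom order.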
The stacked character recognition network in the step 2) is as follows:
2.1) the stacked character recognition network comprises three convolution layers, two network branches, a multiple convolution module, a maximum outer pooling layer, an average outer pooling layer, a merging layer, a dimension compression layer and a dimension transposition layer;
The input picture passes in turn through the first convolution layer, the max outer pooling layer and a multiple convolution module, and is then fed into the two network branches separately; the outputs of the two branches are merged in the channel dimension by the merging layer and then pass in turn through the second convolution layer, the third convolution layer and the average outer pooling layer; the dimension compression layer then compresses the third dimension, i.e. the width dimension of the input picture, and finally the dimension transposition layer transposes the first and second dimensions (the batch-size and height dimensions of the input picture, respectively) to obtain the final network output;
the first network branch comprises a first maximum internal pooling layer, a multiple convolution module, a second maximum internal pooling layer, a first Dropout layer, a separable convolution residual module and a second Dropout layer which are connected in sequence;
the second network branch comprises a first inner convolution layer, a third maximum inner pooling layer, a separable convolution residual module, a fourth maximum inner pooling layer, a separable convolution residual module, a fifth maximum inner pooling layer, a third Dropout layer, a sixth maximum inner pooling layer and a fourth Dropout layer which are connected in sequence;
The multiple convolution module takes an input feature map and a number of output channels as its parameters. The module comprises two branches: the first contains, connected in sequence, a first pair of convolution layers, a compression-expansion module and a second pair of convolution layers, while the second branch contains only one convolution layer. The input feature map passes through both branches, the outputs of the two branches are connected to an element-wise addition layer for bitwise summation, and the sum finally passes through a separate convolution layer to give the module output with the specified number of channels;
The separable convolution residual module takes an input feature map, a number of output channels and the height and width convolution strides as its parameters. The module comprises two branches: the first contains, connected in sequence, a separable convolution layer, a first batch normalization layer, a convolution layer and a second batch normalization layer, while the second branch contains only one convolution layer. After the input feature map passes through both branches, the outputs of the two branches are connected to a merging layer, merged in the channel dimension and output with the specified number of channels;
The compression-expansion module takes an input feature map as its parameter; the input feature map passes in turn through a channel-number acquisition layer, a global average pooling layer and two convolution layers to obtain the channel weights, and the channel weights are multiplied with the original input feature map to give the output of the compression-expansion module;
all convolution layers except the first network branch and module and the maximum outer pooling layer adopt convolution kernels with the size of 3x3, and the convolution or pooling step length in the height direction and the width direction is (1, 1). The number of convolution kernels of the convolution layer in the first convolution layer of the network, the convolution layer in the second network branch and the two convolution layers after the two branches are combined are respectively 64,64,76 and 38. The number of output channels of the first multiple convolution module in the network structure is 128, the number of output channels of the two separable convolution residual modules in the second network branch is 128 and 256 respectively, and the height step and the width step are (2,2) and (1,1) respectively. The average outer pooling layer employed convolution kernel size of 1x15 with height and width steps of (2, 2). The convolution kernels of the two largest inner pooling layers in the second network branch are both 3x3, and the height step length and the width step length are (2,1) and (1,2) respectively; the number of output channels of the multiple convolution module is 256, the number of output channels of the separable convolution residual module is 256, and the height step and the width step are (1, 1).
With C1 output channels for the multiple convolution module, the four convolution layers in its first branch all have height and width strides of (1,1); their kernel sizes are 1x1, 3x3, 3x1 and 1x3, and their kernel counts are C1/2, C1/4, C1/2 and C1/2, respectively. The convolution layer in the second branch and the convolution layer after the element-wise addition both have height and width strides of (1,1) and 1x1 kernels, with C1/2 and C1 kernels, respectively.
With C2 output channels and height and width convolution strides of (S1, S2) for the separable convolution residual module, the separable convolution layer in its first branch has a 3x3 kernel with strides of (S1, S2); the ordinary convolution layers in the first and second branches both have 3x3 kernels with C2/2 kernels each, and height and width strides of (S1, S2) and (1,1), respectively.
The compression-expansion module obtains the number of channels of the input feature map, C3, through the channel-number acquisition layer; the two convolution layers in its first branch both use 1x1 kernels with height and width strides of (1,1) and have C3 kernels each.
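The channel-weighting idea behind the compression-expansion module (in the spirit of squeeze-and-excitation) can be illustrated with a minimal pure-Python sketch; the module's two 1x1 convolution layers are collapsed here into a single sigmoid gate, so this shows only the principle of reweighting channels by their global average, not the module itself:

```python
import math

def channel_reweight(feature_map):
    """Illustrative channel weighting: global-average-pool each channel,
    squash the average through a sigmoid (the gating activation), and
    multiply the channel by the resulting weight."""
    out = {}
    for ch, plane in feature_map.items():  # plane: 2-D list of activations
        avg = sum(sum(row) for row in plane) / (len(plane) * len(plane[0]))
        weight = 1.0 / (1.0 + math.exp(-avg))  # sigmoid gate in (0, 1)
        out[ch] = [[v * weight for v in row] for row in plane]
    return out
```

Channels with a large average activation receive a weight near 1 and pass through almost unchanged, while weakly activated channels are attenuated, which is the effect the module's learned weights aim for.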
Among all the convolution layers, the second convolution layer of the compression-expansion module uses the sigmoid as its activation function, while all the other convolution layers use the ReLU function as their activation function;
2.2) Input the stacked character sample pictures obtained in step 1) and the corresponding stacked character sample labels into the stacked character recognition network, and train it with the Adam optimization algorithm until the network's error reaches its minimum and remains stable; then save the weight parameter data of the stacked character recognition network at that point, thereby obtaining the trained stacked character recognition network. The initial learning rate is set to 0.001, with a decay interval of 3000 steps and a decay rate of 0.9; the loss function is the CTC loss; the Dropout layers use a node retention rate of 0.5 during training; the mean and variance of the batch normalization layers use a moving-average coefficient of 0.8; and each training iteration uses a batch of 32 pictures.
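The stated schedule (initial rate 0.001, decay rate 0.9 every 3000 steps) matches an exponential learning-rate decay; a small sketch, assuming the staircase variant of the schedule:

```python
def learning_rate(step, base=0.001, decay_steps=3000, decay_rate=0.9,
                  staircase=True):
    """Exponential learning-rate decay with the hyperparameters stated
    above; whether the staircase variant is used is an assumption."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return base * decay_rate ** exponent
```

Under this schedule the rate stays at 0.001 for the first 3000 steps, then drops to 0.0009, then 0.00081, and so on.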
In the network structure constructed by the invention, the pooling height and width strides of the first and second max inner pooling layers in the first branch of the whole network are (2,1) and (1,2) in turn; through this design of unequal height and width strides, the network can learn the cross-row and cross-column pixel information in the feature map successively and independently, achieving a better recognition effect. The second branches of the multiple convolution module and of the separable convolution residual module are skip connections in the deep learning network; like the batch normalization layers, they help the training loss converge quickly to its minimum when the network is very deep or very complex, improving training efficiency. The separable convolution layer in the separable convolution residual module greatly reduces the parameter count without reducing network performance, improving recognition efficiency. The first branch of the compression-expansion module lets the network independently learn the importance of the different channels of the feature map, increasing the weights of important channels and decreasing those of unimportant channels, helping the whole stacked character recognition network achieve the best recognition effect.
Step 3) is specifically as follows: detect all the letters, digits and stacked characters in the license plate picture with an object detection algorithm (Cascade or MobileNet-SSD) configured with 37 classes, obtaining single-row character boxes, stacked character boxes, and the corresponding recognition results and confidences. The 37 classes of the object detection algorithm are the background class, the 10 digits, 25 capital letters (all capital letters except O) and one stacked character class.
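The 37 detection classes can be enumerated as follows; the ordering of the list is an illustrative assumption:

```python
# Illustrative enumeration of the 37 detection classes: background,
# the 10 digits, the 25 capital letters excluding O (easily confused
# with the digit 0), and one class covering all stacked characters.
DETECTION_CLASSES = (
    ["background"]
    + list("0123456789")
    + [c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c != "O"]
    + ["stacked"]
)
```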
The step 4) is specifically as follows:
4.1) Merge the single-row character boxes and stacked character boxes obtained in step 3) into a set of candidate character boxes, compute the intersection-over-union between every pair of candidate boxes, and group boxes whose intersection-over-union exceeds the threshold into one cluster;
4.2) From each cluster of character boxes take the box with the highest confidence; these form the final real character boxes, which consist of real single-row character boxes and real stacked character boxes and are divided back into the two kinds according to each box's original class.
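Steps 4.1) and 4.2) amount to standard greedy non-maximum suppression; a minimal sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2) and an illustrative threshold of 0.5:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def non_max_suppression(boxes, confidences, iou_threshold=0.5):
    """Within every cluster of mutually overlapping boxes, keep only the
    box with the highest confidence; returns the indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: confidences[i],
                   reverse=True)
    kept = []
    while order:
        best = order.pop(0)
        kept.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return kept
```

The kept indices can then be partitioned back into single-row and stacked boxes according to each detection's original class, as step 4.2) describes.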
The step 5) is specifically as follows:
Input the stacked character picture into the stacked character recognition network to obtain its label prediction probability distribution matrix Adj, and process Adj in turn with the tf.nn.ctc_beam_search_decoder and tf.sparse_to_dense functions of TensorFlow to obtain an intermediate prediction result, each element of which is the integer identifier of a predicted stacked character; inversely map the integer identifiers to the corresponding characters according to the character dictionary to obtain the stacked character recognition result, completing the recognition of the stacked character picture.
The size of the label prediction probability distribution matrix Adj is W_text x 37; each element Adj(m, n) represents the probability that the m-th row of pixels in the input picture to be tested belongs to the n-th character class, where W_text is the height of the input picture, m is the matrix row index (m = 0, 1, 2, ..., W_text - 1) and n is the character class index (n = 0, 1, 2, ..., 36).
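The decoding performed by tf.nn.ctc_beam_search_decoder can be approximated with a greedy best-path CTC decode, shown here as a dependency-free sketch (beam search additionally scores alternative paths, so its results can differ):

```python
def greedy_ctc_decode(adj, blank=36):
    """Best-path CTC decode of a W_text x C probability matrix: take the
    argmax class of every row, collapse consecutive repeats, then drop
    blanks, yielding the predicted integer identifiers."""
    path = [max(range(len(row)), key=row.__getitem__) for row in adj]
    decoded, previous = [], None
    for k in path:
        if k != previous and k != blank:
            decoded.append(k)
        previous = k
    return decoded
```

A tiny 3-class example (blank = 2) whose per-row argmaxes are 0, 0, 2, 1, 1 collapses to the identifier sequence [0, 1].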
Step 6) is specifically as follows: designate the stacked character length N and compare the length of the stacked character recognition result of step 5) with it. If the lengths are unequal, divide the stacked character picture from top to bottom into N equal parts to obtain N small character pictures, recognize the N small character pictures in turn with a character recognition algorithm (such as a neural network or template matching), and splice the recognition results into a string of length N, which is used as the final recognition result of the real stacked character box. If no stacked character length is designated, the original recognition result of the real stacked character box is kept unchanged.
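The top-to-bottom equal split of step 6) can be sketched as follows, assuming the picture height divides evenly by N (a real implementation would round):

```python
def split_stacked_picture(picture, n):
    """Split a picture (represented as a list of pixel rows) into n
    equal-height sub-pictures, ordered top to bottom."""
    height = len(picture) // n
    return [picture[i * height:(i + 1) * height] for i in range(n)]
```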
The invention has the beneficial effects that:
Aimed at the difficult problem of complex license plate recognition in the current license plate recognition field, the invention discloses a license plate recognition method capable of recognizing stacked characters. It fills the gap left by current license plate recognition technology, which does not support license plates containing stacked characters, greatly improving the universality of license plate recognition; it is extremely robust in recognizing complex license plates and greatly advances license plate recognition applications.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a block diagram of a stacked character recognition network;
FIG. 3 is a block diagram of a multiple convolution module;
FIG. 4 is a block diagram of a separable convolution residual module;
FIG. 5 is a block diagram of a compression expansion module;
FIG. 6 is a sample picture of stacked characters;
fig. 7 is a schematic diagram of the recognition result.
Detailed Description
The invention is described in further detail below with reference to the figures and the embodiments.
As shown in fig. 1, the implementation of the present invention is as follows:
Step one: acquire stacked character sample pictures from license plates, as shown in fig. 6, where each stacked character sample picture has a stacked character sample number; map each character of the stacked character sample number to its corresponding integer identifier according to the character dictionary and concatenate the identifiers in the original character order to obtain the stacked character sample label. A stacked character sample picture contains two or more characters in the vertical direction; each character is mapped, in order from top to bottom, to an integer identifier according to the character dictionary, and the identifiers are concatenated in order to obtain the stacked character sample label. The character dictionary is a mapping between 37 character classes and integer identifiers: the 37 classes are the 10 Arabic numerals, the 26 capital letters and one blank class, and the integer identifiers are the integer labels 0 to 36, where the blank class represents the non-character class.
Step two: constructing a stacked character recognition network for recognizing the stacked character sample pictures, training the stacked character recognition network by using the stacked character sample pictures and the stacked character sample labels in the step 1), and storing weight parameters of the stacked character recognition network after the training is finished, thereby obtaining the trained stacked character recognition network.
2.1) As shown in fig. 2, the stacked character recognition network includes three convolution layers, two network branches, one multiple convolution module, a max outer pooling layer, an average outer pooling layer, a merging layer, a dimension compression layer and a dimension transposition layer. The input picture passes in turn through the first convolution layer, the max outer pooling layer and the multiple convolution module and then enters the two network branches separately; the outputs of the two branches are merged in the channel dimension by the merging layer and pass in turn through the second and third convolution layers and the average outer pooling layer; the dimension compression layer then compresses the third dimension (the width dimension of the input picture), and finally the dimension transposition layer transposes the first and second dimensions (the batch-size and height dimensions of the input picture, respectively) to obtain the final network output.
The first network branch comprises a first maximum inner pooling layer, a multiple convolution module, a second maximum inner pooling layer, a Dropout layer, a separable convolution residual module and a Dropout layer which are connected in sequence.
The second network branch comprises a convolution layer, a third maximum inner pooling layer, a separable convolution residual module, a fourth maximum inner pooling layer, a separable convolution residual module, a fifth maximum inner pooling layer, a Dropout layer, a sixth maximum inner pooling layer and a Dropout layer which are connected in sequence.
As shown in fig. 3, the multiple convolution module has two input parameters: the input feature map and the number of output channels. The module comprises two branches: the first contains two convolution layers, a compression-expansion module and two further convolution layers connected in sequence, while the second branch contains only one convolution layer. The input feature map passes through both branches, the outputs of the two branches are connected to an element-wise addition layer for bitwise summation, and the sum passes through one more convolution layer to give the final module output.
As shown in fig. 4, the separable convolution residual module has four input parameters: the input feature map, the number of output channels, and the height and width convolution strides. The module comprises two branches: the first contains a separable convolution layer, a first batch normalization layer, a convolution layer and a second batch normalization layer connected in sequence, while the second branch contains only one convolution layer. After the input feature map passes through both branches, the outputs of the two branches are connected to a merging layer and merged in the channel dimension to give the final module output.
As shown in fig. 5, the compression-expansion module has one input parameter, the input feature map. The input feature map passes in turn through a channel-number acquisition layer, a global average pooling layer and two convolution layers to obtain the channel weights, and the channel weights are multiplied with the original input feature map to give the final module output.
Unless otherwise specified for the branches and modules below, all convolution layers and the max outer pooling layer use 3x3 kernels, with convolution or pooling strides of (1,1) in the height and width directions. The numbers of convolution kernels of the network's first convolution layer, the inner convolution layer of the second network branch, and the two convolution layers after the branches are merged are 64, 64, 76 and 38, respectively. The first multiple convolution module in the network has 128 output channels; the two separable convolution residual modules in the second network branch have 128 and 256 output channels, with height and width strides of (2,2) and (1,1), respectively. The average outer pooling layer uses a 1x15 pooling kernel with height and width strides of (2,2). The two max inner pooling layers in the first network branch both use 3x3 kernels, with height and width strides of (2,1) and (1,2), respectively; the multiple convolution module in that branch has 256 output channels, and its separable convolution residual module has 256 output channels with height and width strides of (1,1).
The number of output channels of the multiple convolution module is C1, the height and width step lengths of the four convolution layers in the first branch are (1,1), the sizes of convolution kernels are 1x1, 3x3, 3x1 and 1x3 respectively, and the numbers of the convolution kernels are C1/2, C1/4, C1/2 and C1/2 respectively; the height and width step length of the convolutional layer in the second branch and the convolutional layer after element-by-element addition are both (1,1), the size of the convolutional kernel is 1x1, and the number of the convolutional kernels is C1/2 and C1 respectively.
The number of output channels of the separable convolution residual module is C2, the step length of high and wide convolution is (S1, S2), the convolution kernel size of the separable convolution layer in the first branch is 3x3, and the step length of high and wide convolution is (S1, S2); the convolution kernels in the convolution layers in the first branch and the second branch are (3,3), the number of the convolution kernels is C2/2, and the height step and the width step are (S1, S2) and (1,1) respectively.
The compression-expansion module obtains the channel count C3 of the input feature map through the channel number acquisition layer; the two convolution layers in its first branch both use 1x1 kernels with height and width strides of (1, 1), and each has C3 kernels.
Among the above convolution layers, only the second convolution layer in the compression-expansion module uses sigmoid as its activation function; all other convolution layers use the ReLU activation function.
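The compression-expansion module described above follows the squeeze-and-excitation pattern: pool channel statistics, pass them through two layers (ReLU then sigmoid), and rescale the input channels. The following is a minimal NumPy sketch of that idea; the weight shapes and the use of plain matrix multiplies in place of 1x1 convolutions are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def compression_expansion(feature_map, w1, w2):
    """Sketch of the compression-expansion (squeeze-and-excitation style)
    module: global average pooling over H and W, two 1x1 convolutions
    (here plain matrix multiplies), ReLU then sigmoid, and a per-channel
    rescaling of the input feature map.

    feature_map: array of shape (H, W, C3); w1, w2: assumed (C3, C3) weights.
    """
    squeezed = feature_map.mean(axis=(0, 1))          # global average pooling -> (C3,)
    hidden = np.maximum(squeezed @ w1, 0.0)           # first 1x1 conv, ReLU activation
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))    # second 1x1 conv, sigmoid gate
    return feature_map * weights                      # broadcast channel weights
```

The sigmoid on the second layer keeps each channel weight in (0, 1), which is why that one layer is singled out for a sigmoid activation while the rest of the network uses ReLU.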
2.2) Divide the obtained stacked character sample pictures (shown in figure 6) and the corresponding stacked character sample labels into a training set and a test set at a ratio of 7:3, input the training set into the stacked character recognition network, train it with the Adam optimization algorithm until the error of the deep learning classification network reaches a minimum and remains stable, and save the weight parameter data of the variable-length license plate number recognition network at that point. The initial learning rate is set to 0.001, with a decay step of 3000 and a decay rate of 0.9; the loss function adopts CTC loss, the Dropout layers adopt a node keep probability of 0.5 during training, and the mean and variance of the batch normalization layers adopt a moving-average coefficient of 0.8. Each training iteration uses a batch of 32 pictures.
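The learning-rate schedule above (initial value 0.001, decay step 3000, decay rate 0.9) corresponds to standard exponential decay. A small sketch of that formula, assuming the continuous (non-staircase) form since the patent does not say which variant is used:

```python
def decayed_learning_rate(step, initial_lr=0.001, decay_steps=3000, decay_rate=0.9):
    """Exponential learning-rate decay: the rate starts at initial_lr and is
    multiplied by decay_rate once per decay_steps training steps."""
    return initial_lr * decay_rate ** (step / decay_steps)
```

For example, the rate falls to 0.0009 after 3000 steps and to 0.00081 after 6000 steps.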
In the network structure constructed by the invention, the height and width strides of the first and second maximum inner pooling layers in the first branch of the whole network are (2, 1) and (1, 2) in sequence. Through this design of unequal height and width strides, the network learns cross-row and cross-column pixel information in the feature map sequentially and independently, achieving a better recognition effect. The second branches of the multiple convolution module and of the separable convolution residual module are skip-connection structures in the deep learning network; like batch normalization layers, they help the training loss converge quickly to a minimum when the network is very deep or very complex, improving training efficiency. The separable convolution layer in the separable convolution residual module greatly reduces the parameter count without reducing network performance, improving recognition efficiency. The first branch of the compression-expansion module lets the network independently learn the importance of different channels in the feature map, increasing the weight of important channels and reducing that of unimportant channels, helping the whole stacked character recognition network achieve the best recognition effect.
Step three: acquiring a license plate picture, and detecting all letters, numbers, stacked characters and the like in the license plate picture by using a target detection algorithm (Cascade or Mobile-SSD algorithm) with 37 types of types to obtain a single-row character frame, a stacked character frame, a corresponding recognition result and a corresponding confidence coefficient. Wherein 37 classes of the target detection algorithm are background class +10 digits +25 capital letters (except for capital letter O) + a stacked character class.
Step four: and deleting the false character boxes or the overlapped character boxes in the step three by using a non-maximum suppression algorithm to obtain the final true single-row character box and the true stacked character box. The character box is a character box which does not actually contain characters, and the overlapped character box refers to a character box which has a larger overlapping proportion with other character boxes and relatively smaller confidence coefficient. The method comprises the following specific steps:
4.1) Set the intersection-over-union (IoU) threshold to 0.7.
4.2) Merge the single-row character boxes and stacked character boxes obtained in step 3) into candidate character boxes. Compute the IoU between every pair of candidate character boxes and group boxes whose IoU exceeds the threshold into a cluster.
4.3) From each cluster, take the character box with the highest confidence to form the final set of true character boxes, which consists of true single-row character boxes and true stacked character boxes.
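Steps 4.1-4.3 amount to non-maximum suppression with an IoU threshold of 0.7. A minimal pure-Python sketch of the idea, using the common greedy formulation (keep the highest-confidence box, suppress its heavy overlaps, repeat) rather than the patent's explicit clustering, which yields the same surviving boxes in the simple case:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.7):
    """Greedy NMS: repeatedly keep the highest-confidence box and drop every
    remaining box whose IoU with it exceeds the threshold; returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

With two heavily overlapping boxes and one distant box, only the higher-confidence overlap survives along with the distant box.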
Step five: and intercepting the original license plate picture according to the true stacking character frame to obtain a stacking character as a stacking character picture, and inputting the stacking character recognition network constructed in the second step to obtain a stacking character recognition result. The specific process is as follows: inputting the stacked character pictures into a stacked character recognition network to obtain a tag prediction probability distribution matrix Adj of the stacked character pictures, processing the tag prediction probability distribution matrix Adj by using a tf.nn.ctc _ beam _ search _ decoder function and a tf.sparse _ to _ dense function carried by tensoflow in sequence to obtain an intermediate prediction result, wherein each element value in the intermediate prediction result is an integer identifier of each predicted stacked character, and the integer identifiers are inversely mapped to corresponding characters according to a character dictionary to obtain a stacked character recognition result so as to finish the recognition of the stacked character pictures;
The label prediction probability distribution matrix Adj has size Wtext x 37; each element Adj(m, n) represents the probability that the m-th row of pixels in the input picture belongs to the n-th character class, where Wtext is the height of the license plate number area sample picture, m is the row index (m = 0, 1, 2, ..., Wtext - 1) and n is the character class index (n = 0, 1, 2, ..., 36).
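The decoding of Adj can be illustrated with a greedy CTC decode: take the argmax class per pixel row, collapse consecutive repeats, drop blanks, and map the remaining integer identifiers back through the character dictionary. This is a simplified stand-in for the beam-search decoder the patent actually uses, and the dictionary ordering (digits first, then letters, blank last) is an assumption for illustration:

```python
import string
import numpy as np

# Assumed character dictionary: indices 0-9 -> digits, 10-35 -> A-Z,
# index 36 -> the CTC blank class.
CHARS = list(string.digits) + list(string.ascii_uppercase)
BLANK = 36

def greedy_ctc_decode(prob_matrix):
    """Greedy CTC decode of the (Wtext x 37) matrix Adj: argmax per pixel
    row, collapse consecutive repeats, drop blanks, map ids to characters."""
    best = np.argmax(prob_matrix, axis=1)
    collapsed = [int(k) for i, k in enumerate(best) if i == 0 or k != best[i - 1]]
    return "".join(CHARS[k] for k in collapsed if k != BLANK)
```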
Step six: and D, post-processing the stacked character result in the step five. And B, designating the length N of the stacked character, comparing the recognition result length of the stacked character in the step five with the designated character length, if the lengths are not equal, sequentially dividing the stacked character picture in equal proportion from top to bottom to obtain N small character pictures, sequentially recognizing the N small character pictures by using a character recognition algorithm (such as neural network or template matching), splicing the recognition results to obtain a recognition result character string with the length N, and using the recognition result character string as the final recognition result of the true stacked character frame. If the length of the stack character is not specified, the original real stack character frame recognition result is kept unchanged.
Step seven: and splicing the recognition results of the real single-row character frames and the real stacked character frames from left to right in sequence to obtain the final recognition result of the license plate containing the stacked characters.
The specific embodiment is as follows:
The invention trains the stacked character recognition network using the stacked character sample pictures and the corresponding stacked character sample labels shown in fig. 6; a stacked character sample picture may contain two or more characters stacked in the vertical direction.
In this embodiment, 13692 stacked character pictures were collected and divided into training and test sets at a ratio of 7:3, and the stacked character recognition network was trained on the training set. 14802 license plate pictures were then tested, including 3303 license plates containing stacked characters, achieving a recognition accuracy of 96.45% on the full set of 14802 license plate pictures and 95.03% on the 3303 license plates containing stacked characters. Fig. 7 shows the recognition results of two example license plates with stacked characters.
The detection results of the embodiment show that the method of the invention achieves high recognition accuracy both for ordinary license plates containing only single-row characters and for complex license plates containing stacked characters, and has great application potential in the field of license plate recognition.
The foregoing detailed description is intended to illustrate rather than limit the invention; any changes and modifications that fall within the spirit and scope of the invention are intended to be covered by the appended claims.

Claims (6)

1. A license plate recognition method containing stacked characters based on deep learning is characterized in that: the method comprises the following steps:
1) shooting, collecting and obtaining stacked character sample pictures in a license plate, wherein each stacked character sample picture is provided with a stacked character sample number, mapping each character in the stacked character sample numbers to a corresponding integer identifier according to a character dictionary, and connecting the stacked character sample numbers according to the original character sequence to obtain a stacked character sample label;
2) constructing a stacked character recognition network for recognizing the stacked character sample pictures, training the stacked character recognition network by using the stacked character sample pictures and the stacked character sample labels in the step 1), and storing weight parameters of the stacked character recognition network after the training is finished so as to obtain the trained stacked character recognition network;
3) acquiring a license plate picture, and detecting and acquiring single-row characters and stacked characters in the license plate picture by using a target detection method to obtain single-row character frames and stacked character frames and corresponding recognition results and confidence degrees;
4) processing the result obtained in the step 3) by using a non-maximum suppression algorithm to delete the pseudo character frame or the overlapped character frame to obtain a final true single-row character frame and a true stacked character frame;
5) intercepting a stack character from an original license plate picture according to the true stack character frame to obtain the stack character as a stack character picture, and inputting the stack character picture into the stack character recognition network constructed in the step 2) to obtain a stack character recognition result;
6) post-processing the stacked character recognition result in the step 5): if the length of the stacked character recognition result is not the designated character length, directly dividing the stacked character picture in the dividing step 4) into small character pictures with the designated character lengths of the same height from top to bottom, and sequentially performing character recognition by using a character classification algorithm;
finally, splicing the recognition results of all the segmented small character pictures from top to bottom to serve as re-recognition results;
7) splicing the re-recognition results of the real single-row character frames and the real stacked character frames from left to right in sequence to obtain the final recognition result of the license plate containing the stacked characters;
the stacked character recognition network in the step 2) is as follows:
2.1) the stacked character recognition network comprises three convolution layers, two network branches, a multiple convolution module, a maximum outer pooling layer, an average outer pooling layer, a merging layer, a dimension compression layer and a dimension transposition layer;
the input picture sequentially passes through a first convolution layer, a maximum outer pooling layer and a multiple convolution module and then is respectively input into two network branches, the outputs of the two network branches are merged in the channel dimension through a merging layer, then sequentially passes through a second convolution layer, a third convolution layer and an average outer pooling layer, an input dimension compression layer compresses the third dimension, and finally the first dimension and the second dimension are transposed through a dimension transposition layer to obtain the final network output;
the first network branch comprises a first maximum internal pooling layer, a multiple convolution module, a second maximum internal pooling layer, a first Dropout layer, a separable convolution residual module and a second Dropout layer which are connected in sequence;
the second network branch comprises a first inner convolution layer, a third maximum inner pooling layer, a separable convolution residual module, a fourth maximum inner pooling layer, a separable convolution residual module, a fifth maximum inner pooling layer, a third Dropout layer, a sixth maximum inner pooling layer and a fourth Dropout layer which are connected in sequence;
the multiple convolution module takes a feature map as input and a channel number as its output parameter; it comprises two branches, the first branch comprising two front convolution layers, a compression-expansion module and two rear convolution layers connected in sequence, and the second branch comprising only one convolution layer; the input feature map passes through the two branches respectively, the outputs of the two branches are connected to an element-wise addition layer for bitwise summation, and the result finally passes through a separate convolution layer to produce the module output;
the separable convolution residual module takes a feature map as input and the channel number and the height and width convolution strides as its parameters; the separable convolution residual module comprises two branches, the first branch comprising a separable convolution layer, a first batch normalization layer, a convolution layer and a second batch normalization layer connected in sequence, and the second branch comprising only one convolution layer; after the input feature map passes through the two branches respectively, the outputs of the two branches are connected to a merging layer, merged in the channel dimension and output;
the compression expansion module inputs a characteristic diagram, the input characteristic diagram sequentially passes through a channel number acquisition layer, a global average pooling layer and two convolution layers to obtain a channel weight, and the channel weight is multiplied by the original input characteristic diagram to obtain the output of the compression expansion module;
among all the convolution layers, only the second convolution layer of the compression-expansion module uses sigmoid as its activation function, and all other convolution layers use the ReLU activation function;
2.2) inputting the stacked character sample pictures obtained in the step 1) and the corresponding stacked character sample labels into the stacked character recognition network, training the stacked character recognition network by using an Adam optimization algorithm until the error of the deep learning classification network reaches a minimum and remains stable, and saving the weight parameter data of the variable-length license plate number recognition network at that point to obtain the trained stacked character recognition network; wherein the initial learning rate is set to 0.001, the decay step is 3000 and the decay rate is 0.9; the loss function adopts CTC loss, the Dropout layers adopt a node keep probability of 0.5 during training, and the mean and variance of the batch normalization layers adopt a moving-average coefficient of 0.8.
2. The deep learning-based license plate recognition method containing stacked characters according to claim 1, characterized in that: in the step 1), the stacked character sample pictures contain two or more characters in the vertical direction, each character is mapped to an integer identifier from top to bottom in sequence according to the character dictionary, and the integer identifiers are connected in sequence to obtain the stacked character sample labels; the character dictionary is a mapping between 37 characters and integer identifiers, the 37 characters being 10 Arabic numerals, 26 capital letters and one blank class, and the integer identifiers being integer labels from 0 to 36.
3. The deep learning-based license plate recognition method containing stacked characters according to claim 1, characterized in that: the step 3) is specifically: detecting all letters, digits and stacked characters in the license plate picture by using a target detection algorithm (Cascade or Mobile-SSD) with 37 classes to obtain single-row character boxes, stacked character boxes, and the corresponding recognition results and confidence values.
4. The deep learning-based license plate recognition method containing stacked characters according to claim 1, characterized in that: the step 4) is specifically as follows:
4.1) combining the single-row character frames and the stacked character frames obtained in the step 3) into candidate character frames, calculating the intersection ratio between every two character frames in the candidate character frames, and classifying the character frames with the intersection ratio larger than the intersection ratio threshold value into a cluster;
and 4.2) taking the character frame with the highest confidence coefficient from each cluster of character frames to form a final true character frame, wherein the true character frame consists of a true single-row character frame and a true stacking character frame, and then dividing the true character frame into the true single-row character frame and the true stacking character frame according to the original classification of each character frame.
5. The deep learning-based license plate recognition method containing stacked characters according to claim 1, characterized in that: the step 5) is specifically as follows: inputting the stacked character pictures into a stacked character recognition network to obtain a label prediction probability distribution matrix Adj of the stacked character pictures, processing the label prediction probability distribution matrix Adj to obtain an intermediate prediction result, wherein each element value in the intermediate prediction result is an integer identifier of each predicted stacked character, and mapping the integer identifiers to corresponding characters according to a character dictionary to obtain a stacked character recognition result and finish the recognition of the stacked character pictures.
6. The deep learning-based license plate recognition method containing stacked characters according to claim 1, characterized in that: the step 6) is specifically: specifying the stacked character length N, comparing the length of the stacked character recognition result from step 5) with the specified character length, and if the lengths are not equal, dividing the stacked character picture in equal proportion from top to bottom into N small character pictures, recognizing the N small character pictures in sequence with a character recognition algorithm, and splicing the recognition results into a recognition result string of length N, which is used as the final recognition result of the true stacked character box; if the stacked character length is not specified, the original recognition result of the true stacked character box is kept unchanged.
CN201910870894.2A 2019-09-16 2019-09-16 License plate recognition method containing stacked characters based on deep learning Active CN110717493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910870894.2A CN110717493B (en) 2019-09-16 2019-09-16 License plate recognition method containing stacked characters based on deep learning


Publications (2)

Publication Number Publication Date
CN110717493A CN110717493A (en) 2020-01-21
CN110717493B true CN110717493B (en) 2022-04-01

Family

ID=69210471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910870894.2A Active CN110717493B (en) 2019-09-16 2019-09-16 License plate recognition method containing stacked characters based on deep learning

Country Status (1)

Country Link
CN (1) CN110717493B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414911A (en) * 2020-03-23 2020-07-14 湖南信息学院 Card number identification method and system based on deep learning
CN111881914B (en) * 2020-06-23 2024-02-13 安徽清新互联信息科技有限公司 License plate character segmentation method and system based on self-learning threshold
CN113239854B (en) * 2021-05-27 2023-12-19 北京环境特性研究所 Ship identity recognition method and system based on deep learning
CN114332843B (en) * 2022-03-14 2022-07-08 浙商银行股份有限公司 Click verification code identification method and device based on double-current twin convolutional network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009543A (en) * 2017-11-29 2018-05-08 深圳市华尊科技股份有限公司 A kind of licence plate recognition method and device
CN109165643A (en) * 2018-08-21 2019-01-08 浙江工业大学 A kind of licence plate recognition method based on deep learning
CN110210475A (en) * 2019-05-06 2019-09-06 浙江大学 A kind of characters on license plate image partition method of non-binaryzation and edge detection

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9384423B2 (en) * 2013-05-28 2016-07-05 Xerox Corporation System and method for OCR output verification


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Effects of License Plate Attributes on Automatic License Plate Recognition;Findley D J,et al;《Transportation research record》;20131231;第34-44页 *
Toward end-to-end car license plate detection and recognition with deep neural networks;Li H,et al;《 IEEE Transactions on Intelligent Transportation Systems》;20180802;第1-11页 *
Research on Ship Name Detection and Recognition Based on Image Analysis and Deep Learning; Liu Baolong; China Doctoral Dissertations Full-text Database, Engineering Science and Technology II; 20190115; C034-57 *

Also Published As

Publication number Publication date
CN110717493A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
CN110717493B (en) License plate recognition method containing stacked characters based on deep learning
CN113850825B (en) Remote sensing image road segmentation method based on context information and multi-scale feature fusion
CN106557579B (en) Vehicle model retrieval system and method based on convolutional neural network
CN111310773A (en) Efficient license plate positioning method of convolutional neural network
CN113688836A (en) Real-time road image semantic segmentation method and system based on deep learning
CN114359130A (en) Road crack detection method based on unmanned aerial vehicle image
CN111178451A (en) License plate detection method based on YOLOv3 network
CN113177560A (en) Universal lightweight deep learning vehicle detection method
CN112990065A (en) Optimized YOLOv5 model-based vehicle classification detection method
CN109325407B (en) Optical remote sensing video target detection method based on F-SSD network filtering
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN112949633A (en) Improved YOLOv 3-based infrared target detection method
CN111353396A (en) Concrete crack segmentation method based on SCSEOCUnet
Zang et al. Traffic lane detection using fully convolutional neural network
CN116206112A (en) Remote sensing image semantic segmentation method based on multi-scale feature fusion and SAM
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN114639067A (en) Multi-scale full-scene monitoring target detection method based on attention mechanism
CN112668662B (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN109284752A (en) A kind of rapid detection method of vehicle
CN115205568B (en) Road traffic multi-element detection method based on multi-scale feature fusion
CN116630702A (en) Pavement adhesion coefficient prediction method based on semantic segmentation network
CN116363072A (en) Light aerial image detection method and system
CN113160291B (en) Change detection method based on image registration
CN115294548A (en) Lane line detection method based on position selection and classification method in row direction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant