CN110717493B - License plate recognition method containing stacked characters based on deep learning - Google Patents

Info

Publication number
CN110717493B
CN110717493B (application CN201910870894.2A)
Authority
CN
China
Prior art keywords
character
stacked
layer
recognition
convolution
Prior art date
Legal status
Active
Application number
CN201910870894.2A
Other languages
Chinese (zh)
Other versions
CN110717493A (en)
Inventor
张三元
祁忠琪
涂凯
吴书楷
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201910870894.2A priority Critical patent/CN110717493B/en
Publication of CN110717493A publication Critical patent/CN110717493A/en
Application granted granted Critical
Publication of CN110717493B publication Critical patent/CN110717493B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/273 Segmentation of patterns in the image field; removing elements interfering with the pattern to be recognised
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/625 License plates

Abstract

The invention discloses a deep-learning-based method for recognizing license plates that contain stacked characters. A stacked character recognition network is constructed and trained with stacked character sample pictures and their corresponding stacked character sample labels; a license plate region picture is acquired, and character detection is performed on it with an object detection technique; regions whose detected label is a stacked character are cropped and fed into the stacked character recognition network for recognition; results whose recognized length is not the designated character length are post-processed; non-maximum suppression deletes pseudo character boxes to obtain the real character boxes; and the recognition results of the real character boxes are spliced from left to right to obtain the final license plate recognition result. The invention makes up for the failure of current license plate recognition technology to support license plates containing stacked characters, and the complete license plate recognition system it constructs is extremely robust on complex license plates, giving the invention great practical value for license plate recognition applications.

Description

License plate recognition method containing stacked characters based on deep learning
Technical Field
The invention relates to the field of license plate image recognition, in particular to a license plate recognition method containing stacked characters based on deep learning.
Background
Intelligent transportation technology not only brings great convenience to people's travel but also improves safety. License plate recognition, an important application in building intelligent transportation and smart cities, plays an important role in fields such as road vehicle tracking, automatic highway toll collection, traffic law enforcement assistance and unattended parking lots. Current license plate recognition applications achieve high accuracy on license plates containing only a single line of characters, but cannot handle more complex, special license plates; in particular, license plates with stacked characters are unsupported by most current license plate recognition technology. A more universal license plate recognition application is therefore a key problem in the license plate recognition field.
Disclosure of Invention
In order to solve the problems in the background art, the invention provides a deep-learning-based method for recognizing license plates containing stacked characters, which not only supports license plates containing only a single line of characters but is also extremely robust on license plates with stacked characters.
The technical scheme adopted by the invention comprises the following steps:
1) Shoot and collect stacked character sample pictures from license plates, where each stacked character sample picture carries a stacked character sample number; map each character of the stacked character sample number to its corresponding integer identifier according to a character dictionary, and concatenate the identifiers in the original character order to obtain the stacked character sample label;
2) constructing a stacked character recognition network for recognizing the stacked character sample pictures, training the stacked character recognition network by using the stacked character sample pictures and the stacked character sample labels in the step 1), and storing weight parameters of the stacked character recognition network after the training is finished so as to obtain the trained stacked character recognition network;
3) Acquire a license plate picture and detect the single-row characters and stacked characters in it using an object detection method (MobileNet-SSD or Cascade), obtaining single-row character boxes and stacked character boxes together with their corresponding recognition results and confidences;
A single-row character occupies only one line in the vertical direction, while a stacked character occupies at least two lines; the upper and lower "L" and "G" characters in fig. 7, for example, constitute a stacked character.
4) Process the result of step 3) with the non-maximum suppression algorithm to delete pseudo character boxes and overlapping character boxes, obtaining the final real single-row character boxes and real stacked character boxes; a pseudo character box is a box that does not actually contain a character, and an overlapping character box is one that has a large overlap ratio with other character boxes but a relatively low confidence.
5) Crop the stacked character from the original license plate picture according to the real stacked character box to obtain a stacked character picture, and input it into the stacked character recognition network constructed in step 2) to obtain the stacked character recognition result;
6) Post-process the stacked character recognition result of step 5): if the length of the recognition result is not the designated character length, divide the stacked character picture of step 5) directly from top to bottom into the designated number of small character pictures of equal height and recognize them in turn with a character classification algorithm;
finally, splice the recognition results of all the segmented small character pictures from top to bottom as the re-recognition result;
7) Splice the recognition results of the real single-row character boxes and the real stacked character boxes in order from left to right to obtain the final recognition result of the license plate containing stacked characters.
In step 1), a stacked character sample picture contains two or more characters in the vertical direction; each character is mapped, in order from top to bottom, to an integer identifier according to the character dictionary, and the identifiers are concatenated in order to obtain the stacked character sample label. The character dictionary is a mapping between 37 character classes and integer identifiers: the 37 classes are the 10 Arabic numerals, the 26 capital letters and one blank class, and the integer identifiers are the integer labels 0 to 36, where the blank class represents the non-character class.
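The character dictionary just described can be sketched as follows; the constant and function names (CHAR_DICT, encode_label, decode_label) are illustrative rather than from the patent, and the exact ordering of the 37 classes is an assumption:

```python
# Hypothetical sketch of the 37-class character dictionary: the 10 Arabic
# numerals and 26 capital letters map to integer identifiers 0-35, and
# identifier 36 is the blank (non-character) class.
DIGITS = "0123456789"
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHAR_DICT = {c: i for i, c in enumerate(DIGITS + LETTERS)}
BLANK_ID = 36  # the blank class representing "no character"

def encode_label(sample_number: str) -> list:
    """Map a stacked character sample number to integer identifiers,
    in the original top-to-bottom character order."""
    return [CHAR_DICT[c] for c in sample_number]

def decode_label(identifiers) -> str:
    """Inverse mapping back to characters, skipping blank identifiers."""
    inverse = {i: c for c, i in CHAR_DICT.items()}
    return "".join(inverse[i] for i in identifiers if i != BLANK_ID)
```

For the "LG" stacked character of fig. 7, `encode_label("LG")` yields the two identifiers for L and G in top-to-bottom order.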
The stacked character recognition network in the step 2) is as follows:
2.1) the stacked character recognition network comprises three convolution layers, two network branches, a multiple convolution module, a maximum outer pooling layer, an average outer pooling layer, a merging layer, a dimension compression layer and a dimension transposition layer;
The input picture passes in turn through the first convolution layer, the max outer pooling layer and a multiple convolution module, and is then fed into the two network branches separately; the outputs of the two branches are merged in the channel dimension by the merging layer and then pass in turn through the second convolution layer, the third convolution layer and the average outer pooling layer; the dimension compression layer then compresses the third dimension, i.e. the width dimension of the input picture, and finally the dimension transposition layer transposes the first and second dimensions (the batch-size and height dimensions of the input picture, respectively) to obtain the final network output;
the first network branch comprises a first maximum internal pooling layer, a multiple convolution module, a second maximum internal pooling layer, a first Dropout layer, a separable convolution residual module and a second Dropout layer which are connected in sequence;
the second network branch comprises a first inner convolution layer, a third maximum inner pooling layer, a separable convolution residual module, a fourth maximum inner pooling layer, a separable convolution residual module, a fifth maximum inner pooling layer, a third Dropout layer, a sixth maximum inner pooling layer and a fourth Dropout layer which are connected in sequence;
The multiple convolution module takes an input feature map and a number of output channels as its parameters. The module comprises two branches: the first contains, connected in sequence, a first pair of convolution layers, a compression-expansion module and a second pair of convolution layers, while the second branch contains only one convolution layer. The input feature map passes through both branches, the outputs of the two branches are connected to an element-wise addition layer for bitwise summation, and the sum finally passes through a separate convolution layer to give the module output with the specified number of channels;
The separable convolution residual module takes an input feature map, a number of output channels and the height and width convolution strides as its parameters. The module comprises two branches: the first contains, connected in sequence, a separable convolution layer, a first batch normalization layer, a convolution layer and a second batch normalization layer, while the second branch contains only one convolution layer. After the input feature map passes through both branches, the outputs of the two branches are connected to a merging layer, merged in the channel dimension and output with the specified number of channels;
The compression-expansion module takes an input feature map as its parameter; the input feature map passes in turn through a channel-number acquisition layer, a global average pooling layer and two convolution layers to obtain the channel weights, and the channel weights are multiplied with the original input feature map to give the output of the compression-expansion module;
all convolution layers except the first network branch and module and the maximum outer pooling layer adopt convolution kernels with the size of 3x3, and the convolution or pooling step length in the height direction and the width direction is (1, 1). The number of convolution kernels of the convolution layer in the first convolution layer of the network, the convolution layer in the second network branch and the two convolution layers after the two branches are combined are respectively 64,64,76 and 38. The number of output channels of the first multiple convolution module in the network structure is 128, the number of output channels of the two separable convolution residual modules in the second network branch is 128 and 256 respectively, and the height step and the width step are (2,2) and (1,1) respectively. The average outer pooling layer employed convolution kernel size of 1x15 with height and width steps of (2, 2). The convolution kernels of the two largest inner pooling layers in the second network branch are both 3x3, and the height step length and the width step length are (2,1) and (1,2) respectively; the number of output channels of the multiple convolution module is 256, the number of output channels of the separable convolution residual module is 256, and the height step and the width step are (1, 1).
With C1 output channels for the multiple convolution module, the four convolution layers in its first branch all have height and width strides of (1,1); their kernel sizes are 1x1, 3x3, 3x1 and 1x3, and their kernel counts are C1/2, C1/4, C1/2 and C1/2, respectively. The convolution layer in the second branch and the convolution layer after the element-wise addition both have height and width strides of (1,1) and 1x1 kernels, with C1/2 and C1 kernels, respectively.
With C2 output channels and height and width convolution strides of (S1, S2) for the separable convolution residual module, the separable convolution layer in its first branch has a 3x3 kernel with strides of (S1, S2); the ordinary convolution layers in the first and second branches both have 3x3 kernels with C2/2 kernels each, and height and width strides of (S1, S2) and (1,1), respectively.
The compression-expansion module obtains the number of channels of the input feature map, C3, through the channel-number acquisition layer; the two convolution layers in its first branch both use 1x1 kernels with height and width strides of (1,1) and have C3 kernels each.
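The channel-weighting idea behind the compression-expansion module (in the spirit of squeeze-and-excitation) can be illustrated with a minimal pure-Python sketch; the module's two 1x1 convolution layers are collapsed here into a single sigmoid gate, so this shows only the principle of reweighting channels by their global average, not the module itself:

```python
import math

def channel_reweight(feature_map):
    """Illustrative channel weighting: global-average-pool each channel,
    squash the average through a sigmoid (the gating activation), and
    multiply the channel by the resulting weight."""
    out = {}
    for ch, plane in feature_map.items():  # plane: 2-D list of activations
        avg = sum(sum(row) for row in plane) / (len(plane) * len(plane[0]))
        weight = 1.0 / (1.0 + math.exp(-avg))  # sigmoid gate in (0, 1)
        out[ch] = [[v * weight for v in row] for row in plane]
    return out
```

Channels with a large average activation receive a weight near 1 and pass through almost unchanged, while weakly activated channels are attenuated, which is the effect the module's learned weights aim for.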
Among all the convolution layers, the second convolution layer of the compression-expansion module uses the sigmoid as its activation function, while all the other convolution layers use the ReLU function as their activation function;
2.2) Input the stacked character sample pictures obtained in step 1) and the corresponding stacked character sample labels into the stacked character recognition network, and train it with the Adam optimization algorithm until the network's error reaches its minimum and remains stable; then save the weight parameter data of the stacked character recognition network at that point, thereby obtaining the trained stacked character recognition network. The initial learning rate is set to 0.001, with a decay interval of 3000 steps and a decay rate of 0.9; the loss function is the CTC loss; the Dropout layers use a node retention rate of 0.5 during training; the mean and variance of the batch normalization layers use a moving-average coefficient of 0.8; and each training iteration uses a batch of 32 pictures.
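The stated schedule (initial rate 0.001, decay rate 0.9 every 3000 steps) matches an exponential learning-rate decay; a small sketch, assuming the staircase variant of the schedule:

```python
def learning_rate(step, base=0.001, decay_steps=3000, decay_rate=0.9,
                  staircase=True):
    """Exponential learning-rate decay with the hyperparameters stated
    above; whether the staircase variant is used is an assumption."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return base * decay_rate ** exponent
```

Under this schedule the rate stays at 0.001 for the first 3000 steps, then drops to 0.0009, then 0.00081, and so on.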
In the network structure constructed by the invention, the pooling height and width strides of the first and second max inner pooling layers in the first branch of the whole network are (2,1) and (1,2) in turn; through this design of unequal height and width strides, the network can learn the cross-row and cross-column pixel information in the feature map successively and independently, achieving a better recognition effect. The second branches of the multiple convolution module and of the separable convolution residual module are skip connections in the deep learning network; like the batch normalization layers, they help the training loss converge quickly to its minimum when the network is very deep or very complex, improving training efficiency. The separable convolution layer in the separable convolution residual module greatly reduces the parameter count without reducing network performance, improving recognition efficiency. The first branch of the compression-expansion module lets the network independently learn the importance of the different channels of the feature map, increasing the weights of important channels and decreasing those of unimportant channels, helping the whole stacked character recognition network achieve the best recognition effect.
Step 3) is specifically as follows: detect all the letters, digits and stacked characters in the license plate picture with an object detection algorithm (Cascade or MobileNet-SSD) configured with 37 classes, obtaining single-row character boxes, stacked character boxes, and the corresponding recognition results and confidences. The 37 classes of the object detection algorithm are the background class, the 10 digits, 25 capital letters (all capital letters except O) and one stacked character class.
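The 37 detection classes can be enumerated as follows; the ordering of the list is an illustrative assumption:

```python
# Illustrative enumeration of the 37 detection classes: background,
# the 10 digits, the 25 capital letters excluding O (easily confused
# with the digit 0), and one class covering all stacked characters.
DETECTION_CLASSES = (
    ["background"]
    + list("0123456789")
    + [c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c != "O"]
    + ["stacked"]
)
```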
The step 4) is specifically as follows:
4.1) Merge the single-row character boxes and stacked character boxes obtained in step 3) into a set of candidate character boxes, compute the intersection-over-union between every pair of candidate boxes, and group boxes whose intersection-over-union exceeds the threshold into one cluster;
4.2) From each cluster of character boxes take the box with the highest confidence; these form the final real character boxes, which consist of real single-row character boxes and real stacked character boxes and are divided back into the two kinds according to each box's original class.
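Steps 4.1) and 4.2) amount to standard greedy non-maximum suppression; a minimal sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2) and an illustrative threshold of 0.5:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def non_max_suppression(boxes, confidences, iou_threshold=0.5):
    """Within every cluster of mutually overlapping boxes, keep only the
    box with the highest confidence; returns the indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: confidences[i],
                   reverse=True)
    kept = []
    while order:
        best = order.pop(0)
        kept.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return kept
```

The kept indices can then be partitioned back into single-row and stacked boxes according to each detection's original class, as step 4.2) describes.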
The step 5) is specifically as follows:
Input the stacked character picture into the stacked character recognition network to obtain its label prediction probability distribution matrix Adj, and process Adj in turn with the tf.nn.ctc_beam_search_decoder and tf.sparse_to_dense functions of TensorFlow to obtain an intermediate prediction result, each element of which is the integer identifier of a predicted stacked character; inversely map the integer identifiers to the corresponding characters according to the character dictionary to obtain the stacked character recognition result, completing the recognition of the stacked character picture.
The size of the label prediction probability distribution matrix Adj is W_text x 37; each element Adj(m, n) represents the probability that the m-th row of pixels in the input picture to be tested belongs to the n-th character class, where W_text is the height of the input picture, m is the matrix row index (m = 0, 1, 2, ..., W_text - 1) and n is the character class index (n = 0, 1, 2, ..., 36).
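The decoding performed by tf.nn.ctc_beam_search_decoder can be approximated with a greedy best-path CTC decode, shown here as a dependency-free sketch (beam search additionally scores alternative paths, so its results can differ):

```python
def greedy_ctc_decode(adj, blank=36):
    """Best-path CTC decode of a W_text x C probability matrix: take the
    argmax class of every row, collapse consecutive repeats, then drop
    blanks, yielding the predicted integer identifiers."""
    path = [max(range(len(row)), key=row.__getitem__) for row in adj]
    decoded, previous = [], None
    for k in path:
        if k != previous and k != blank:
            decoded.append(k)
        previous = k
    return decoded
```

A tiny 3-class example (blank = 2) whose per-row argmaxes are 0, 0, 2, 1, 1 collapses to the identifier sequence [0, 1].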
Step 6) is specifically as follows: designate the stacked character length N and compare the length of the stacked character recognition result of step 5) with it. If the lengths are unequal, divide the stacked character picture from top to bottom into N equal parts to obtain N small character pictures, recognize the N small character pictures in turn with a character recognition algorithm (such as a neural network or template matching), and splice the recognition results into a string of length N, which is used as the final recognition result of the real stacked character box. If no stacked character length is designated, the original recognition result of the real stacked character box is kept unchanged.
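The top-to-bottom equal split of step 6) can be sketched as follows, assuming the picture height divides evenly by N (a real implementation would round):

```python
def split_stacked_picture(picture, n):
    """Split a picture (represented as a list of pixel rows) into n
    equal-height sub-pictures, ordered top to bottom."""
    height = len(picture) // n
    return [picture[i * height:(i + 1) * height] for i in range(n)]
```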
The invention has the beneficial effects that:
Aimed at the difficult problem of complex license plate recognition in the current license plate recognition field, the invention discloses a license plate recognition method capable of recognizing stacked characters. It fills the gap left by current license plate recognition technology, which does not support license plates containing stacked characters, greatly improving the universality of license plate recognition; it is extremely robust in recognizing complex license plates and greatly advances license plate recognition applications.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a block diagram of a stacked character recognition network;
FIG. 3 is a block diagram of a multiple convolution module;
FIG. 4 is a block diagram of a separable convolution residual module;
FIG. 5 is a block diagram of a compression expansion module;
FIG. 6 is a sample picture of stacked characters;
fig. 7 is a schematic diagram of the recognition result.
Detailed Description
The invention is described in further detail below with reference to the figures and the embodiments.
As shown in fig. 1, the implementation of the present invention is as follows:
Step one: acquire stacked character sample pictures from license plates, as shown in fig. 6, where each stacked character sample picture has a stacked character sample number; map each character of the stacked character sample number to its corresponding integer identifier according to the character dictionary and concatenate the identifiers in the original character order to obtain the stacked character sample label. A stacked character sample picture contains two or more characters in the vertical direction; each character is mapped, in order from top to bottom, to an integer identifier according to the character dictionary, and the identifiers are concatenated in order to obtain the stacked character sample label. The character dictionary is a mapping between 37 character classes and integer identifiers: the 37 classes are the 10 Arabic numerals, the 26 capital letters and one blank class, and the integer identifiers are the integer labels 0 to 36, where the blank class represents the non-character class.
Step two: constructing a stacked character recognition network for recognizing the stacked character sample pictures, training the stacked character recognition network by using the stacked character sample pictures and the stacked character sample labels in the step 1), and storing weight parameters of the stacked character recognition network after the training is finished, thereby obtaining the trained stacked character recognition network.
2.1) As shown in fig. 2, the stacked character recognition network includes three convolution layers, two network branches, one multiple convolution module, a max outer pooling layer, an average outer pooling layer, a merging layer, a dimension compression layer and a dimension transposition layer. The input picture passes in turn through the first convolution layer, the max outer pooling layer and the multiple convolution module and then enters the two network branches separately; the outputs of the two branches are merged in the channel dimension by the merging layer and pass in turn through the second and third convolution layers and the average outer pooling layer; the dimension compression layer then compresses the third dimension (the width dimension of the input picture), and finally the dimension transposition layer transposes the first and second dimensions (the batch-size and height dimensions of the input picture, respectively) to obtain the final network output.
The first network branch comprises a first maximum inner pooling layer, a multiple convolution module, a second maximum inner pooling layer, a Dropout layer, a separable convolution residual module and a Dropout layer which are connected in sequence.
The second network branch comprises a convolution layer, a third maximum inner pooling layer, a separable convolution residual module, a fourth maximum inner pooling layer, a separable convolution residual module, a fifth maximum inner pooling layer, a Dropout layer, a sixth maximum inner pooling layer and a Dropout layer which are connected in sequence.
As shown in fig. 3, the multiple convolution module has two input parameters: the input feature map and the number of output channels. The module comprises two branches: the first contains two convolution layers, a compression-expansion module and two further convolution layers connected in sequence, while the second branch contains only one convolution layer. The input feature map passes through both branches, the outputs of the two branches are connected to an element-wise addition layer for bitwise summation, and the sum passes through one more convolution layer to give the final module output.
As shown in fig. 4, the separable convolution residual module has four input parameters: the input feature map, the number of output channels, and the height and width convolution strides. The module comprises two branches: the first contains a separable convolution layer, a first batch normalization layer, a convolution layer and a second batch normalization layer connected in sequence, while the second branch contains only one convolution layer. After the input feature map passes through both branches, the outputs of the two branches are connected to a merging layer and merged in the channel dimension to give the final module output.
As shown in fig. 5, the compression-expansion module has one input parameter, the input feature map. The input feature map passes in turn through a channel-number acquisition layer, a global average pooling layer and two convolution layers to obtain the channel weights, and the channel weights are multiplied with the original input feature map to give the final module output.
Unless otherwise specified for the branches and modules below, all convolution layers and the max outer pooling layer use 3x3 kernels, with convolution or pooling strides of (1,1) in the height and width directions. The numbers of convolution kernels of the network's first convolution layer, the inner convolution layer of the second network branch, and the two convolution layers after the branches are merged are 64, 64, 76 and 38, respectively. The first multiple convolution module in the network has 128 output channels; the two separable convolution residual modules in the second network branch have 128 and 256 output channels, with height and width strides of (2,2) and (1,1), respectively. The average outer pooling layer uses a 1x15 pooling kernel with height and width strides of (2,2). The two max inner pooling layers in the first network branch both use 3x3 kernels, with height and width strides of (2,1) and (1,2), respectively; the multiple convolution module in that branch has 256 output channels, and its separable convolution residual module has 256 output channels with height and width strides of (1,1).
The number of output channels of the multiple convolution module is C1, the height and width step lengths of the four convolution layers in the first branch are (1,1), the sizes of convolution kernels are 1x1, 3x3, 3x1 and 1x3 respectively, and the numbers of the convolution kernels are C1/2, C1/4, C1/2 and C1/2 respectively; the height and width step length of the convolutional layer in the second branch and the convolutional layer after element-by-element addition are both (1,1), the size of the convolutional kernel is 1x1, and the number of the convolutional kernels is C1/2 and C1 respectively.
The number of output channels of the separable convolution residual module is C2, the step length of high and wide convolution is (S1, S2), the convolution kernel size of the separable convolution layer in the first branch is 3x3, and the step length of high and wide convolution is (S1, S2); the convolution kernels in the convolution layers in the first branch and the second branch are (3,3), the number of the convolution kernels is C2/2, and the height step and the width step are (S1, S2) and (1,1) respectively.
The compression-expansion module obtains the channel count C3 of the input feature map through the channel number acquisition layer; the two convolution layers in its first branch both use 1x1 kernels with height and width strides of (1, 1), and each has C3 kernels.
Among the above convolution layers, only the second convolution layer in the compression-expansion module uses sigmoid as its activation function; all other convolution layers use the ReLU activation function.
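The compression-expansion module described above follows the squeeze-and-excitation pattern: pool channel statistics, pass them through two layers (ReLU then sigmoid), and rescale the input channels. The following is a minimal NumPy sketch of that idea; the weight shapes and the use of plain matrix multiplies in place of 1x1 convolutions are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def compression_expansion(feature_map, w1, w2):
    """Sketch of the compression-expansion (squeeze-and-excitation style)
    module: global average pooling over H and W, two 1x1 convolutions
    (here plain matrix multiplies), ReLU then sigmoid, and a per-channel
    rescaling of the input feature map.

    feature_map: array of shape (H, W, C3); w1, w2: assumed (C3, C3) weights.
    """
    squeezed = feature_map.mean(axis=(0, 1))          # global average pooling -> (C3,)
    hidden = np.maximum(squeezed @ w1, 0.0)           # first 1x1 conv, ReLU activation
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))    # second 1x1 conv, sigmoid gate
    return feature_map * weights                      # broadcast channel weights
```

The sigmoid on the second layer keeps each channel weight in (0, 1), which is why that one layer is singled out for a sigmoid activation while the rest of the network uses ReLU.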
2.2) Divide the obtained stacked character sample pictures (shown in figure 6) and the corresponding stacked character sample labels into a training set and a test set at a ratio of 7:3, input the training set into the stacked character recognition network, train it with the Adam optimization algorithm until the error of the deep learning classification network reaches a minimum and remains stable, and save the weight parameter data of the variable-length license plate number recognition network at that point. The initial learning rate is set to 0.001, with a decay step of 3000 and a decay rate of 0.9; the loss function adopts CTC loss, the Dropout layers adopt a node keep probability of 0.5 during training, and the mean and variance of the batch normalization layers adopt a moving-average coefficient of 0.8. Each training iteration uses a batch of 32 pictures.
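The learning-rate schedule above (initial value 0.001, decay step 3000, decay rate 0.9) corresponds to standard exponential decay. A small sketch of that formula, assuming the continuous (non-staircase) form since the patent does not say which variant is used:

```python
def decayed_learning_rate(step, initial_lr=0.001, decay_steps=3000, decay_rate=0.9):
    """Exponential learning-rate decay: the rate starts at initial_lr and is
    multiplied by decay_rate once per decay_steps training steps."""
    return initial_lr * decay_rate ** (step / decay_steps)
```

For example, the rate falls to 0.0009 after 3000 steps and to 0.00081 after 6000 steps.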
In the network structure constructed by the invention, the height and width strides of the first and second maximum inner pooling layers in the first branch of the whole network are (2, 1) and (1, 2) in sequence. Through this design of unequal height and width strides, the network learns cross-row and cross-column pixel information in the feature map sequentially and independently, achieving a better recognition effect. The second branches of the multiple convolution module and of the separable convolution residual module are skip-connection structures in the deep learning network; like batch normalization layers, they help the training loss converge quickly to a minimum when the network is very deep or very complex, improving training efficiency. The separable convolution layer in the separable convolution residual module greatly reduces the parameter count without reducing network performance, improving recognition efficiency. The first branch of the compression-expansion module lets the network independently learn the importance of different channels in the feature map, increasing the weight of important channels and reducing that of unimportant channels, helping the whole stacked character recognition network achieve the best recognition effect.
Step three: acquiring a license plate picture, and detecting all letters, numbers, stacked characters and the like in the license plate picture by using a target detection algorithm (Cascade or Mobile-SSD algorithm) with 37 types of types to obtain a single-row character frame, a stacked character frame, a corresponding recognition result and a corresponding confidence coefficient. Wherein 37 classes of the target detection algorithm are background class +10 digits +25 capital letters (except for capital letter O) + a stacked character class.
Step four: and deleting the false character boxes or the overlapped character boxes in the step three by using a non-maximum suppression algorithm to obtain the final true single-row character box and the true stacked character box. The character box is a character box which does not actually contain characters, and the overlapped character box refers to a character box which has a larger overlapping proportion with other character boxes and relatively smaller confidence coefficient. The method comprises the following specific steps:
4.1) Set the intersection-over-union (IoU) threshold to 0.7.
4.2) Merge the single-row character boxes and stacked character boxes obtained in step 3) into candidate character boxes. Compute the IoU between every pair of candidate character boxes and group boxes whose IoU exceeds the threshold into a cluster.
4.3) From each cluster, take the character box with the highest confidence to form the final set of true character boxes, which consists of true single-row character boxes and true stacked character boxes.
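Steps 4.1-4.3 amount to non-maximum suppression with an IoU threshold of 0.7. A minimal pure-Python sketch of the idea, using the common greedy formulation (keep the highest-confidence box, suppress its heavy overlaps, repeat) rather than the patent's explicit clustering, which yields the same surviving boxes in the simple case:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.7):
    """Greedy NMS: repeatedly keep the highest-confidence box and drop every
    remaining box whose IoU with it exceeds the threshold; returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

With two heavily overlapping boxes and one distant box, only the higher-confidence overlap survives along with the distant box.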
Step five: and intercepting the original license plate picture according to the true stacking character frame to obtain a stacking character as a stacking character picture, and inputting the stacking character recognition network constructed in the second step to obtain a stacking character recognition result. The specific process is as follows: inputting the stacked character pictures into a stacked character recognition network to obtain a tag prediction probability distribution matrix Adj of the stacked character pictures, processing the tag prediction probability distribution matrix Adj by using a tf.nn.ctc _ beam _ search _ decoder function and a tf.sparse _ to _ dense function carried by tensoflow in sequence to obtain an intermediate prediction result, wherein each element value in the intermediate prediction result is an integer identifier of each predicted stacked character, and the integer identifiers are inversely mapped to corresponding characters according to a character dictionary to obtain a stacked character recognition result so as to finish the recognition of the stacked character pictures;
The label prediction probability distribution matrix Adj has size Wtext x 37; each element Adj(m, n) represents the probability that the m-th row of pixels in the input picture belongs to the n-th character class, where Wtext is the height of the license plate number area sample picture, m is the row index (m = 0, 1, 2, ..., Wtext - 1) and n is the character class index (n = 0, 1, 2, ..., 36).
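The decoding of Adj can be illustrated with a greedy CTC decode: take the argmax class per pixel row, collapse consecutive repeats, drop blanks, and map the remaining integer identifiers back through the character dictionary. This is a simplified stand-in for the beam-search decoder the patent actually uses, and the dictionary ordering (digits first, then letters, blank last) is an assumption for illustration:

```python
import string
import numpy as np

# Assumed character dictionary: indices 0-9 -> digits, 10-35 -> A-Z,
# index 36 -> the CTC blank class.
CHARS = list(string.digits) + list(string.ascii_uppercase)
BLANK = 36

def greedy_ctc_decode(prob_matrix):
    """Greedy CTC decode of the (Wtext x 37) matrix Adj: argmax per pixel
    row, collapse consecutive repeats, drop blanks, map ids to characters."""
    best = np.argmax(prob_matrix, axis=1)
    collapsed = [int(k) for i, k in enumerate(best) if i == 0 or k != best[i - 1]]
    return "".join(CHARS[k] for k in collapsed if k != BLANK)
```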
Step six: and D, post-processing the stacked character result in the step five. And B, designating the length N of the stacked character, comparing the recognition result length of the stacked character in the step five with the designated character length, if the lengths are not equal, sequentially dividing the stacked character picture in equal proportion from top to bottom to obtain N small character pictures, sequentially recognizing the N small character pictures by using a character recognition algorithm (such as neural network or template matching), splicing the recognition results to obtain a recognition result character string with the length N, and using the recognition result character string as the final recognition result of the true stacked character frame. If the length of the stack character is not specified, the original real stack character frame recognition result is kept unchanged.
Step seven: and splicing the recognition results of the real single-row character frames and the real stacked character frames from left to right in sequence to obtain the final recognition result of the license plate containing the stacked characters.
The specific embodiment is as follows:
The invention trains the stacked character recognition network using the stacked character sample pictures and the corresponding stacked character sample labels shown in fig. 6; a stacked character sample picture may contain two or more characters stacked in the vertical direction.
In this embodiment, 13692 stacked character pictures were collected and divided into training and test sets at a ratio of 7:3, and the stacked character recognition network was trained on the training set. 14802 license plate pictures were then tested, including 3303 license plates containing stacked characters, achieving a recognition accuracy of 96.45% on the full set of 14802 license plate pictures and 95.03% on the 3303 license plates containing stacked characters. Fig. 7 shows the recognition results of two example license plates with stacked characters.
The detection results of the embodiment show that the method of the invention achieves high recognition accuracy both for ordinary license plates containing only single-row characters and for complex license plates containing stacked characters, and has great application potential in the field of license plate recognition.
The foregoing detailed description is intended to illustrate rather than limit the invention; any changes and modifications that fall within the spirit and scope of the invention are intended to be covered by the appended claims.

Claims (6)

1. A license plate recognition method containing stacked characters based on deep learning is characterized in that: the method comprises the following steps:
1) shooting, collecting and obtaining stacked character sample pictures in a license plate, wherein each stacked character sample picture is provided with a stacked character sample number, mapping each character in the stacked character sample numbers to a corresponding integer identifier according to a character dictionary, and connecting the stacked character sample numbers according to the original character sequence to obtain a stacked character sample label;
2) constructing a stacked character recognition network for recognizing the stacked character sample pictures, training the stacked character recognition network by using the stacked character sample pictures and the stacked character sample labels in the step 1), and storing weight parameters of the stacked character recognition network after the training is finished so as to obtain the trained stacked character recognition network;
3) acquiring a license plate picture, and detecting and acquiring single-row characters and stacked characters in the license plate picture by using a target detection method to obtain single-row character frames and stacked character frames and corresponding recognition results and confidence degrees;
4) processing the result obtained in the step 3) by using a non-maximum suppression algorithm to delete the pseudo character frame or the overlapped character frame to obtain a final true single-row character frame and a true stacked character frame;
5) intercepting a stack character from an original license plate picture according to the true stack character frame to obtain the stack character as a stack character picture, and inputting the stack character picture into the stack character recognition network constructed in the step 2) to obtain a stack character recognition result;
6) post-processing the stacked character recognition result in the step 5): if the length of the stacked character recognition result is not the designated character length, directly dividing the stacked character picture in the dividing step 4) into small character pictures with the designated character lengths of the same height from top to bottom, and sequentially performing character recognition by using a character classification algorithm;
finally, splicing the recognition results of all the segmented small character pictures from top to bottom to serve as re-recognition results;
7) splicing the re-recognition results of the real single-row character frames and the real stacked character frames from left to right in sequence to obtain the final recognition result of the license plate containing the stacked characters;
the stacked character recognition network in the step 2) is as follows:
2.1) the stacked character recognition network comprises three convolution layers, two network branches, a multiple convolution module, a maximum outer pooling layer, an average outer pooling layer, a merging layer, a dimension compression layer and a dimension transposition layer;
the input picture sequentially passes through a first convolution layer, a maximum outer pooling layer and a multiple convolution module and then is respectively input into two network branches, the outputs of the two network branches are merged in the channel dimension through a merging layer, then sequentially passes through a second convolution layer, a third convolution layer and an average outer pooling layer, an input dimension compression layer compresses the third dimension, and finally the first dimension and the second dimension are transposed through a dimension transposition layer to obtain the final network output;
the first network branch comprises a first maximum internal pooling layer, a multiple convolution module, a second maximum internal pooling layer, a first Dropout layer, a separable convolution residual module and a second Dropout layer which are connected in sequence;
the second network branch comprises a first inner convolution layer, a third maximum inner pooling layer, a separable convolution residual module, a fourth maximum inner pooling layer, a separable convolution residual module, a fifth maximum inner pooling layer, a third Dropout layer, a sixth maximum inner pooling layer and a fourth Dropout layer which are connected in sequence;
the multiple convolution module takes a feature map as input and a channel number as its output parameter; it comprises two branches, the first branch comprising two front convolution layers, a compression-expansion module and two rear convolution layers connected in sequence, and the second branch comprising only one convolution layer; the input feature map passes through the two branches respectively, the outputs of the two branches are connected to an element-wise addition layer for bitwise summation, and the result finally passes through a separate convolution layer to produce the module output;
the separable convolution residual module takes a feature map as input and the channel number and the height and width convolution strides as its parameters; the separable convolution residual module comprises two branches, the first branch comprising a separable convolution layer, a first batch normalization layer, a convolution layer and a second batch normalization layer connected in sequence, and the second branch comprising only one convolution layer; after the input feature map passes through the two branches respectively, the outputs of the two branches are connected to a merging layer, merged in the channel dimension and output;
the compression expansion module inputs a characteristic diagram, the input characteristic diagram sequentially passes through a channel number acquisition layer, a global average pooling layer and two convolution layers to obtain a channel weight, and the channel weight is multiplied by the original input characteristic diagram to obtain the output of the compression expansion module;
among all the convolution layers, only the second convolution layer of the compression-expansion module uses sigmoid as its activation function, and all other convolution layers use the ReLU activation function;
2.2) inputting the stacked character sample pictures obtained in the step 1) and the corresponding stacked character sample labels into the stacked character recognition network, training the stacked character recognition network by using an Adam optimization algorithm until the error of the deep learning classification network reaches a minimum and remains stable, and saving the weight parameter data of the variable-length license plate number recognition network at that point to obtain the trained stacked character recognition network; wherein the initial learning rate is set to 0.001, the decay step is 3000 and the decay rate is 0.9; the loss function adopts CTC loss, the Dropout layers adopt a node keep probability of 0.5 during training, and the mean and variance of the batch normalization layers adopt a moving-average coefficient of 0.8.
2. The deep learning-based license plate recognition method containing stacked characters according to claim 1, characterized in that: in the step 1), the stacked character sample pictures contain two or more characters in the vertical direction, each character is mapped to an integer identifier from top to bottom in sequence according to the character dictionary, and the integer identifiers are connected in sequence to obtain the stacked character sample labels; the character dictionary is a mapping between 37 characters and integer identifiers, the 37 characters being 10 Arabic numerals, 26 capital letters and one blank class, and the integer identifiers being integer labels from 0 to 36.
3. The deep learning-based license plate recognition method containing stacked characters according to claim 1, characterized in that: the step 3) is specifically: detecting all letters, digits and stacked characters in the license plate picture by using a target detection algorithm (Cascade or Mobile-SSD) with 37 classes to obtain single-row character boxes, stacked character boxes, and the corresponding recognition results and confidence values.
4. The deep learning-based license plate recognition method containing stacked characters according to claim 1, characterized in that: the step 4) is specifically as follows:
4.1) combining the single-row character frames and the stacked character frames obtained in the step 3) into candidate character frames, calculating the intersection ratio between every two character frames in the candidate character frames, and classifying the character frames with the intersection ratio larger than the intersection ratio threshold value into a cluster;
and 4.2) taking the character frame with the highest confidence coefficient from each cluster of character frames to form a final true character frame, wherein the true character frame consists of a true single-row character frame and a true stacking character frame, and then dividing the true character frame into the true single-row character frame and the true stacking character frame according to the original classification of each character frame.
5. The deep learning-based license plate recognition method containing stacked characters according to claim 1, characterized in that: the step 5) is specifically as follows: inputting the stacked character pictures into a stacked character recognition network to obtain a label prediction probability distribution matrix Adj of the stacked character pictures, processing the label prediction probability distribution matrix Adj to obtain an intermediate prediction result, wherein each element value in the intermediate prediction result is an integer identifier of each predicted stacked character, and mapping the integer identifiers to corresponding characters according to a character dictionary to obtain a stacked character recognition result and finish the recognition of the stacked character pictures.
6. The deep learning-based license plate recognition method containing stacked characters according to claim 1, characterized in that: the step 6) is specifically: specifying the stacked character length N, comparing the length of the stacked character recognition result from step 5) with the specified character length, and if the lengths are not equal, dividing the stacked character picture in equal proportion from top to bottom into N small character pictures, recognizing the N small character pictures in sequence with a character recognition algorithm, and splicing the recognition results into a recognition result string of length N, which is used as the final recognition result of the true stacked character box; if the stacked character length is not specified, the original recognition result of the true stacked character box is kept unchanged.
CN201910870894.2A 2019-09-16 2019-09-16 License plate recognition method containing stacked characters based on deep learning Active CN110717493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910870894.2A CN110717493B (en) 2019-09-16 2019-09-16 License plate recognition method containing stacked characters based on deep learning


Publications (2)

Publication Number Publication Date
CN110717493A CN110717493A (en) 2020-01-21
CN110717493B true CN110717493B (en) 2022-04-01

Family

ID=69210471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910870894.2A Active CN110717493B (en) 2019-09-16 2019-09-16 License plate recognition method containing stacked characters based on deep learning

Country Status (1)

Country Link
CN (1) CN110717493B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414911A (en) * 2020-03-23 2020-07-14 湖南信息学院 Card number identification method and system based on deep learning
CN111881914B (en) * 2020-06-23 2024-02-13 安徽清新互联信息科技有限公司 License plate character segmentation method and system based on self-learning threshold
CN113239854B (en) * 2021-05-27 2023-12-19 北京环境特性研究所 Ship identity recognition method and system based on deep learning
CN114332843B (en) * 2022-03-14 2022-07-08 浙商银行股份有限公司 Click verification code identification method and device based on double-current twin convolutional network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009543A (en) * 2017-11-29 2018-05-08 深圳市华尊科技股份有限公司 A kind of licence plate recognition method and device
CN109165643A (en) * 2018-08-21 2019-01-08 浙江工业大学 A kind of licence plate recognition method based on deep learning
CN110210475A (en) * 2019-05-06 2019-09-06 浙江大学 A kind of characters on license plate image partition method of non-binaryzation and edge detection

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9384423B2 (en) * 2013-05-28 2016-07-05 Xerox Corporation System and method for OCR output verification


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Effects of License Plate Attributes on Automatic License Plate Recognition;Findley D J,et al;《Transportation research record》;20131231;第34-44页 *
Toward end-to-end car license plate detection and recognition with deep neural networks;Li H,et al;《 IEEE Transactions on Intelligent Transportation Systems》;20180802;第1-11页 *
Research on Ship Name Detection and Recognition Based on Image Analysis and Deep Learning; Liu Baolong; China Doctoral Dissertations Full-text Database, Engineering Science and Technology II; 20190115; C034-57 *

Also Published As

Publication number Publication date
CN110717493A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
CN110717493B (en) License plate recognition method containing stacked characters based on deep learning
CN113850825B (en) Remote sensing image road segmentation method based on context information and multi-scale feature fusion
CN106557579B (en) Vehicle model retrieval system and method based on convolutional neural network
CN111310773A (en) Efficient license plate positioning method of convolutional neural network
CN113688836A (en) Real-time road image semantic segmentation method and system based on deep learning
CN114359130A (en) Road crack detection method based on unmanned aerial vehicle image
CN111178451A (en) License plate detection method based on YOLOv3 network
CN113177560A (en) Universal lightweight deep learning vehicle detection method
CN112990065A (en) Optimized YOLOv5 model-based vehicle classification detection method
CN109325407B (en) Optical remote sensing video target detection method based on F-SSD network filtering
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN112949633A (en) Improved YOLOv 3-based infrared target detection method
CN111353396A (en) Concrete crack segmentation method based on SCSEOCUnet
Zang et al. Traffic lane detection using fully convolutional neural network
CN116206112A (en) Remote sensing image semantic segmentation method based on multi-scale feature fusion and SAM
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN114639067A (en) Multi-scale full-scene monitoring target detection method based on attention mechanism
CN112668662B (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN109284752A (en) A kind of rapid detection method of vehicle
CN115205568B (en) Road traffic multi-element detection method based on multi-scale feature fusion
CN116630702A (en) Pavement adhesion coefficient prediction method based on semantic segmentation network
CN116363072A (en) Light aerial image detection method and system
CN113160291B (en) Change detection method based on image registration
CN115294548A (en) Lane line detection method based on position selection and classification method in row direction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant