WO2021135254A1

WO2021135254A1 - License plate number recognition method and apparatus, electronic device, and storage medium

Info

Publication number: WO2021135254A1
Application number: PCT/CN2020/108989
Authority: WO
Inventors: 曾卓熙
Original assignee: 深圳云天励飞技术股份有限公司
Priority date: 2019-12-31
Filing date: 2020-08-13
Publication date: 2021-07-08
Also published as: CN111191663B; CN111191663A

Abstract

A vehicle license plate number recognition method and apparatus, an electronic device, and a storage medium. The method comprises: inputting an image to be recognized into a preset feature encoding space to perform correction and encoding to obtain a feature image having a plurality of channels (101), the image to be recognized comprising license plate information, the feature image comprising a plurality of feature regions corresponding to the plurality of channels, and the channels having time sequence attributes; inputting the feature image into a preset feature decoding space according to the time sequence attributes, and decoding the feature regions corresponding to the channels by means of an attention mechanism according to the time sequence attributes (102); and outputting the decoding results according to the time sequence attributes to obtain the recognition result of the image to be recognized (103), such that the accumulation of errors in multiple steps can be avoided, and the robustness of license plate number recognition can be improved, and in addition, end-to-end license plate number recognition can be achieved using only the encoding space and the decoding space in the entire recognition process.

Description

License plate number recognition method, device, electronic equipment and storage medium

Technical field

The present invention relates to the field of artificial intelligence technology, and in particular to a license plate number recognition method and apparatus, an electronic device, and a storage medium.

Background art

Image recognition is currently one of the commonly used technologies in traffic, community and parking lot management. One example is image-based license plate number recognition, which identifies a vehicle's license plate number. At present, traditional license plate number recognition is generally divided into multiple independent steps:

1. Image normalization: the license plate image is transformed into a "standard image" through computer vision methods (such as a homography matrix).

2. Image preprocessing: occlusion, dirt, lighting and similar defects in the image are handled here (for example, by binarization).

3. Character segmentation: the characters are segmented through computer vision methods (such as edge detection).

4. Character recognition: the segmented characters are recognized (using machine learning or deep learning methods such as random forests, support vector machines (SVM) or logistic regression).

As a result, the errors of each step may accumulate, leading to a poor final recognition result, and it is not easy to locate where a problem occurs. Moreover, traditional license plate recognition places relatively high demands on the input image, with strict requirements on angle and clarity. These limitations impose strict requirements on camera installation and on the monitored scene, and the recognition rate is easily affected by weather and light. Traditional license plate recognition is therefore susceptible to many factors, which leads to poor recognition results and poor robustness.

Summary of the invention

The embodiment of the present invention provides a method for recognizing a license plate number, which can improve the robustness of the recognition of a license plate number.

In the first aspect, an embodiment of the present invention provides a method for recognizing a license plate number, including:

Inputting an image to be recognized into a preset feature encoding space for correction and encoding to obtain a feature image with multiple channels, wherein the image to be recognized includes license plate information, the feature image includes multiple feature regions corresponding to the multiple channels, and the channels have time sequence attributes;

Inputting the feature image into a preset feature decoding space according to the time series attribute, and in the feature decoding space, using an attention mechanism to decode feature regions in the feature image according to the time series attribute;

The decoding result is output according to the time sequence attribute, and the recognition result of the image to be recognized is obtained.

Optionally, the preset feature encoding space includes a pre-trained spatial transformation network and a pre-trained encoding network, and inputting the image to be recognized into the preset feature encoding space for correction and encoding to obtain a feature image with multiple channels includes:

Performing correction prediction on the image to be recognized in the pre-trained spatial transformation network, and correcting the image to be recognized according to the prediction result to obtain a corrected image;

Inputting the corrected image into the pre-trained encoding network, and performing convolution calculation on the corrected image through multiple convolution kernels in the encoding network to obtain a feature image with multiple channels, wherein the number of channels is the same as the number of convolution kernels, and the time sequence attributes of the channels are associated with the calculation order of the convolution kernels.

Optionally, the preset feature decoding space includes a pre-trained attention mechanism and a pre-trained long short-term memory network, and inputting the feature image into the preset feature decoding space according to the time sequence attributes and decoding the feature regions corresponding to the channels according to the time sequence attributes through the attention mechanism includes:

When the feature image is input into the feature decoding space according to the time sequence attribute, reporting the time sequence attribute of each channel to the pre-trained attention mechanism;

Sorting, through the pre-trained attention mechanism, the feature regions corresponding to the channels according to the time sequence attributes, and notifying the pre-trained long short-term memory network, according to the sorting, to decode the feature regions sequentially in the sorted order.

Optionally, notifying the pre-trained long short-term memory network according to the sorting to sequentially decode the feature regions corresponding to the sorting includes:

Outputting a first attention parameter according to the sorting through the pre-trained attention mechanism, and notifying the pre-trained long short-term memory network, through the first attention parameter, to decode the first feature region;

When decoding the first feature region, the pre-trained attention mechanism outputs a second attention parameter according to the order, and the second attention parameter includes the position of the second feature region;

After the decoding of the first characteristic region is completed, the pre-trained long and short-term memory network decodes the second characteristic region;

And so on, until all feature regions have been decoded in turn.

Optionally, the decoding of the second characteristic region by the pre-trained long-short-term memory network after the decoding of the first characteristic region is completed includes:

After the first feature region is decoded, the decoded features of the first feature region and the second feature region are used as inputs, and input into the pre-trained long-short-term memory network for decoding.

Optionally, after inputting the image to be recognized into a preset feature encoding space for correction and encoding to obtain a feature image with multiple channels, the method further includes:

Up-sampling the characteristic image so that the size of the characteristic image is the same as the size of the image to be recognized;

Performing pixel point prediction on the up-sampled feature image according to the channel of the feature image, and predicting the feature area to which each pixel point in the up-sampled feature image belongs;

Labeling, according to the time sequence attributes of the channels, the feature region to which each pixel in the up-sampled feature image belongs, so that each such feature region has a time sequence attribute, to obtain a labeled feature image;

Inputting the feature image into the preset feature decoding space according to the time sequence attributes, and decoding the feature regions corresponding to the channels according to the time sequence attributes through the attention mechanism, includes:

The labeled feature image is input into a preset feature decoding space according to the time series attribute, and the feature area is decoded in the feature decoding space according to the time series attribute through an attention mechanism.

In a second aspect, an embodiment of the present invention provides a license plate number recognition device, including:

The encoding module is used to input an image to be recognized into a preset feature encoding space for correction and encoding to obtain a feature image with multiple channels, wherein the image to be recognized includes license plate information, the feature image includes multiple feature regions corresponding to the multiple channels, and the channels have time sequence attributes;

The decoding module is used to input the feature image into a preset feature decoding space according to the time sequence attributes, and to decode the feature regions in the feature image sequentially in the feature decoding space according to the time sequence attributes through an attention mechanism;

The output module is used to output the decoding result according to the time sequence attribute to obtain the recognition result of the image to be recognized.

In a third aspect, an embodiment of the present invention provides an electronic device including a memory, a processor, and a computer program stored on the memory and capable of running on the processor, wherein the steps in the license plate number recognition method provided by the embodiment of the present invention are implemented when the processor executes the computer program.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium having a computer program stored thereon, wherein the steps in the license plate number recognition method provided by the embodiment of the present invention are implemented when the computer program is executed by a processor.

In the embodiments of the present invention, an image to be recognized is input into a preset feature encoding space for correction and encoding to obtain a feature image with multiple channels, where the image to be recognized includes license plate information, the feature image includes multiple feature regions corresponding to the multiple channels, and the channels have time sequence attributes; the feature image is input into a preset feature decoding space according to the time sequence attributes, and the feature regions corresponding to the channels are decoded according to the time sequence attributes through an attention mechanism; and the decoding results are output according to the time sequence attributes to obtain the recognition result of the image to be recognized. Because the image to be recognized is corrected and feature-encoded in the feature encoding space and the feature regions are decoded in time sequence in the feature decoding space, the accumulation of errors over multiple steps is avoided and the robustness of license plate number recognition is improved. In addition, the entire recognition process passes only through the encoding space and the decoding space, so end-to-end license plate number recognition can be achieved.

Description of the drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.

FIG. 1 is a flowchart of a method for recognizing a license plate number provided by an embodiment of the present invention;

FIG. 2 is a flowchart of another method for recognizing a license plate number provided by an embodiment of the present invention;

FIG. 3 is a flowchart of another method for recognizing a license plate number provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of the structure of a license plate number recognition device provided by an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of another vehicle license plate number recognition device provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of another vehicle license plate number recognition device provided by an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of another vehicle license plate number recognition device provided by an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of another vehicle license plate number recognition device provided by an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

Please refer to FIG. 1. FIG. 1 is a flowchart of a method for recognizing a license plate number according to an embodiment of the present invention. As shown in FIG. 1, it includes the following steps:

101. Input the image to be recognized into a preset feature encoding space for correction and encoding, to obtain a feature image with multiple channels.

Wherein, the above-mentioned image to be recognized includes license plate information, the above-mentioned characteristic image includes a plurality of characteristic regions corresponding to a plurality of channels, and the above-mentioned channels have time series attributes.

The above-mentioned image to be recognized may be a static image of a vehicle license plate or an image frame of a dynamic video uploaded by a user, or a static image or an image frame of a dynamic video of a vehicle license plate captured by a camera deployed on a traffic road, at the entrance or exit of a community, or at the entrance or exit of a parking lot.

The image to be recognized may contain one or more pieces of license plate information; that is, there may be one or more license plate numbers to be recognized in one image to be recognized.

The aforementioned feature encoding space may be a fully convolutional network space. Through convolution calculation, the fully convolutional network space can predict the correction parameters of the image to be recognized and correct the image to be recognized according to the predicted correction parameters. Through convolution calculation, it can also predict the feature region corresponding to each character in the license plate information.

The above correction can be understood as performing spatial transformation and alignment of the image to be recognized, and may include translation, scaling, and rotation of the image to be recognized.

The above-mentioned feature region is determined by the channels in the fully convolutional network, where a channel is an output channel obtained after convolution calculation; specifically, the feature region is determined by the channel values of the channels. In a fully convolutional network, convolution kernels perform convolution calculations on the image to be recognized to extract the corresponding features, and one convolution kernel corresponds to one channel. For example, suppose the license plate image to be recognized has dimensions (3, W, H), where W and H are the width and height of the license plate image and 3 is the number of RGB color channels. When the three RGB channels are convolved with one convolution kernel, the output is a single channel obtained by summing the corresponding channel values of the three RGB channels. For example, if convolving the R, G and B channels with one convolution kernel yields (R1, R2, R3, ..., Rn), (G1, G2, G3, ..., Gn) and (B1, B2, B3, ..., Bn) respectively, then the summed channel is (R1+G1+B1, R2+G2+B2, R3+G3+B3, ..., Rn+Gn+Bn), so one convolution kernel can be considered to correspond to one channel. Different feature regions are determined according to the different channel values at the same feature point: at a given feature point, the channel with the largest channel value indicates that the feature point belongs to the feature region corresponding to that channel. Taking a license plate as a further example, an ordinary car license plate consists of 7 characters, and these 7 characters need to be segmented during convolution. Each character becomes a feature region, also called a character region, which corresponds to one channel. After the license plate image is convolved, each character region is represented by one channel. Different channels represent different character regions, and the character region to which a feature point belongs is the character region corresponding to the channel with the largest channel value at that feature point.
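The channel summation described above can be illustrated with a minimal sketch. The function names `conv2d_valid` and `output_channel`, the 1x1 kernel slices and the toy pixel values are assumptions made for illustration only; they are not taken from the embodiment, which does not specify kernel sizes or values.

```python
def conv2d_valid(plane, kernel):
    """Naive 'valid' 2-D cross-correlation of one color plane with one kernel slice."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(plane) - kh + 1
    out_w = len(plane[0]) - kw + 1
    return [
        [
            sum(plane[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def output_channel(rgb_planes, kernel_slices):
    """Convolve R, G and B separately, then add the three responses element-wise.

    The result is the single output channel produced by one convolution kernel,
    i.e. (R1+G1+B1, R2+G2+B2, ...).
    """
    responses = [conv2d_valid(p, k) for p, k in zip(rgb_planes, kernel_slices)]
    return [[sum(values) for values in zip(*rows)] for rows in zip(*responses)]


# Toy 2x2 image with three color planes (hypothetical values).
rgb = [
    [[1, 2], [3, 4]],  # R plane
    [[1, 1], [1, 1]],  # G plane
    [[0, 1], [0, 1]],  # B plane
]
kernel = [[[1]], [[2]], [[3]]]  # one 1x1 slice per input plane (assumed)
summed = output_channel(rgb, kernel)
print(summed)  # [[3, 7], [5, 9]]
```

With N such kernels, N output channels would be obtained, one per kernel.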

Therefore, the feature region corresponding to each feature point can be determined by traversing the feature points and finding the maximum channel value at each one. Since the license plate number is composed of multiple characters, the feature image output after feature encoding in the feature encoding space needs to cover the feature regions where the multiple characters are located, so the output of the feature encoding space is a feature image whose number of channels corresponds to the number of characters. The above-mentioned multiple channels have time sequence attributes, which are determined by the order in which the convolution kernels are calculated during encoding. For example, the first channel is obtained after the first convolution kernel performs its convolution calculation, and the second channel is obtained after the second convolution kernel performs its calculation. It can be seen that the channels have time sequence attributes because of the calculation order of the convolution kernels.
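As a minimal sketch of this traversal, the following assigns each feature point to the channel with the largest channel value. The function name `assign_regions`, the nested-list layout `channels[c][y][x]` and the toy values are assumptions for illustration; the channel index is used here as a stand-in for the time sequence attribute.

```python
def assign_regions(channels):
    """Return a map in which each feature point holds the index of the
    channel with the largest channel value at that point."""
    height = len(channels[0])
    width = len(channels[0][0])
    region_map = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [channel[y][x] for channel in channels]
            row.append(values.index(max(values)))
        region_map.append(row)
    return region_map


# Three channels over a 1x3 feature image (hypothetical values).
# Channel 0 is assumed to be computed first, channel 2 last.
toy_channels = [
    [[0.9, 0.2, 0.1]],
    [[0.1, 0.7, 0.3]],
    [[0.0, 0.1, 0.6]],
]
region_map = assign_regions(toy_channels)
print(region_map)  # [[0, 1, 2]]
```

Each feature point in the toy example falls into a different region, so reading the regions in channel order reproduces the left-to-right order of the characters.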

It should be understood that, in the feature encoding space, feature encoding of the image to be recognized is a feature extraction process; the correction of the image to be recognized is a predictive correction, and the effect of the correction is positively correlated with the completeness of the training data. No complicated image preprocessing steps are required before the license plate number is recognized, and the image to be recognized can be input directly.

102. Input the feature image into a preset feature decoding space according to the time series attributes, and decode the feature regions in the feature image according to the time series attributes in the feature decoding space through an attention mechanism.

In this step, the feature image is the feature image obtained by encoding in the feature encoding space in step 101. The feature image includes a number of channels corresponding to the number of license plate characters. Each channel corresponds to a different feature region, which can also be understood as each channel corresponding to a different character region. Inputting the feature image into the preset feature decoding space according to the time sequence attributes means inputting the multiple channels of the feature image into the feature decoding space in the order given by their time sequence attributes.

The feature decoding space sequentially decodes the feature regions corresponding to each channel to decode the characters represented by the corresponding feature regions.

The above-mentioned time sequence attributes of the channels are maintained by the attention mechanism. After the feature image is input into the feature decoding space, since the channels of the feature image have time sequence attributes at input, the attention mechanism sorts the channels of the feature image by time sequence and outputs the attention parameters corresponding to the sorting, so that each channel is decoded according to its time sequence attribute.
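The control flow of this sequential decoding can be sketched as follows. This is only an illustration under stated assumptions: the trained attention mechanism is replaced by a plain sort on the time step, and the trained decoding network is replaced by a hypothetical callable `decode_region(region, previous_char)`; neither stands for the actual networks of the embodiment.

```python
def decode_in_time_order(channel_regions, decode_region):
    """Decode feature regions one by one in time sequence order.

    channel_regions: dict mapping a time step to a feature region (any object).
    decode_region:   hypothetical callable (region, previous_char) -> char,
                     standing in for the trained decoder.
    """
    decoded = []
    previous_char = None
    # Stand-in for the attention mechanism: order regions by time step.
    for step in sorted(channel_regions):
        char = decode_region(channel_regions[step], previous_char)
        decoded.append(char)
        previous_char = char  # the next step can depend on this result
    return "".join(decoded)


def toy_decoder(region, previous_char):
    """Toy decoder: the 'region' is already a lowercase letter."""
    return region.upper()


plate = decode_in_time_order({2: "c", 0: "a", 1: "b"}, toy_decoder)
print(plate)  # ABC
```

Passing the previous result into each decoding step mirrors the idea that the decoding of a later character can depend on the character decoded before it.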

The above-mentioned feature decoding space may be a time-sequence-based neural network, such as a recurrent neural network (RNN) or a long short-term memory network (LSTM). Such a network can make predictions based on the relationship between the previous character and the next character. For example, according to the relevant license plate number specifications, in the car license plate "Zhe J·L9098", when the previous character is the Chinese character "Zhe", the probability that the next character is a letter is 100%. That is, when the previous character is a Chinese character, there is no need to consider whether the next character is a Chinese character or a digit when decoding it; it only needs to be decoded from among the 24 letter characters. In other words, the decoding of a later character depends on the preceding character.

It should be noted that the characters to be decoded are related to the structure of the license plate number. Taking the common domestic civilian license plate as an example, the license plate number includes three parts: the first part is the abbreviation of the province, autonomous region or municipality; the second part is the code of the issuing authority; and the third part is the serial number. In the car license plate "Zhe J·L9098", the first part is "Zhe", the second part is "J", and the third part is "L9098". According to the domestic administrative divisions, the first part is the abbreviation of a province, autonomous region or municipality directly under the Central Government, for which there are 31 Chinese characters. The second part is the code of the issuing authority, which is an uppercase letter, for which there are 24 characters (because the uppercase letters I and O are easily confused with the digits 1 and 0, they are not used in license plate numbers, leaving 24). The digits 0-9 give 10 further characters, so a total of 65 characters are available for decoding. In traditional decoding, which is not based on time sequence, the previous decoding result is not considered, so every character on the license plate must be decoded from among all 65 characters. With decoding based on time sequence, the first character only needs to be decoded from among the 31 Chinese characters, the second character only from among the 24 letter characters, and each remaining character only from among the letter and digit characters, a total of 34 characters.
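The reduction of the candidate set at each position can be made concrete with a minimal sketch. The names `candidates` and `constrained_pick`, the placeholder symbols `P00` to `P30` used in place of the 31 Chinese abbreviation characters, and the example scores are assumptions for illustration only.

```python
import string

# Placeholders stand in for the 31 province-level abbreviation characters,
# which are not enumerated in the text.
PROVINCES = [f"P{i:02d}" for i in range(31)]
LETTERS = [c for c in string.ascii_uppercase if c not in "IO"]  # 24 letters
DIGITS = list(string.digits)                                    # 10 digits


def candidates(position):
    """Return the symbols allowed at a given position of a civilian plate."""
    if position == 0:
        return PROVINCES           # 31 candidates
    if position == 1:
        return LETTERS             # 24 candidates
    return LETTERS + DIGITS        # 34 candidates


def constrained_pick(scores, position):
    """Pick the best-scoring symbol among those allowed at this position.

    scores: dict symbol -> score, a hypothetical decoder output.
    """
    allowed = candidates(position)
    return max(allowed, key=lambda s: scores.get(s, float("-inf")))


print(len(PROVINCES) + len(LETTERS) + len(DIGITS))  # 65
print([len(candidates(p)) for p in range(3)])       # [31, 24, 34]

scores = {"7": 0.9, "J": 0.6, "A": 0.2}
print(constrained_pick(scores, 1))  # J  (digits are not allowed here)
print(constrained_pick(scores, 2))  # 7
```

In the second position the digit "7" is excluded even though it has the highest score, which is the effect that decoding based on time sequence is described as providing.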

Of course, the above is just an example of a common civilian license plate and should not be regarded as a limitation of the present invention. Other license plates for different purposes have different license plate number structures, such as police car license plates, driving school vehicle license plates, entry-exit license plates, embassy license plates, military vehicle license plates, armed police license plates, civil aviation license plates, trailer license plates, agricultural license plates, and personalized license plates.

The aforementioned attention mechanism may be a channel attention module (Attention Refinement Module, abbreviated as ARM). The channel attention module can assign a corresponding attention parameter to the feature region corresponding to each channel, where the attention parameter is the position of the corresponding feature region within the channel. For example, the channel value of each feature point in the feature region corresponding to the character "Zhe" is greater on that channel than on the other channels. In this case, the position of the feature region corresponding to the character "Zhe" is used as the attention parameter, and during decoding the feature decoding space is notified to decode that position according to the attention parameter.

The above-mentioned attention mechanism can also be an attention mechanism directly aimed at the two-dimensional spatial position of the feature region in the feature image. According to the height and width of the feature image, the two-dimensional spatial position of the feature region corresponding to each character in the feature image is calculated. The attention mechanism assigns corresponding attention parameters to the two-dimensional spatial position of the feature region corresponding to each character in the order from top to bottom and from left to right. At the beginning of decoding, the feature decoding space is notified to decode the feature regions in sequence according to the attention parameter.
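The top-to-bottom, left-to-right ordering can be sketched as a simple sort over region centers. The function name `reading_order`, the tuple layout `(label, x_center, y_center)` and the fixed `row_height` used to group regions into rows are assumptions for illustration; the embodiment does not state how rows are delimited.

```python
def reading_order(regions, row_height):
    """Sort regions top to bottom, then left to right.

    regions:    list of (label, x_center, y_center) tuples (hypothetical layout).
    row_height: assumed height of one text row, used to group regions whose
                centers differ only slightly in the vertical direction.
    """
    return sorted(regions, key=lambda r: (int(r[2] // row_height), r[1]))


regions = [("B", 30, 5), ("A", 10, 6), ("C", 12, 25)]
ordered = [label for label, _, _ in reading_order(regions, row_height=20)]
print(ordered)  # ['A', 'B', 'C']
```

The position of each region in the sorted list can then serve as the order in which the regions are handed to the decoder.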

103. Output the decoding result according to the time sequence attribute, and obtain the recognition result of the image to be recognized.

The above decoding result is the characters corresponding to the license plate information in the image to be recognized. Since decoding is performed in the feature decoding space according to the time sequence attributes, the decoded characters also have time sequence attributes, and they are output according to those attributes so that they follow the character order of the license plate number.

It should be noted that the license plate number recognition method provided by the embodiment of the present invention can be applied to mobile phones, monitors, computers, servers and other devices that need to recognize license plate numbers.

Optionally, please refer to FIG. 2. FIG. 2 is a flowchart of another method for recognizing a license plate number according to an embodiment of the present invention. The difference from the embodiment in FIG. 1 is that the preset feature encoding space includes a pre-trained spatial transformation network and a pre-trained coding network, and the preset feature decoding space includes a pre-trained attention mechanism and a pre-trained long short-term memory network. As shown in FIG. 2, the method includes the following steps:

201. Perform correction prediction on the image to be recognized in a pre-trained spatial transformation network, and correct the image to be recognized according to the prediction result to obtain a corrected image.

In this step, the aforementioned pre-trained spatial transformation network may be a spatial transformer network (STN). The spatial transformation network and the coding network can together form a fully convolutional neural network, so that the feature encoding space is a fully convolutional neural network.

The above-mentioned spatial transformation network may be set before the coding network. In this way, the image to be recognized can be transformed to meet the image requirements of the coding network through the spatial transformation network, which can be understood as transforming any input image into a desired input image of the coding network.

In the spatial transformation network, the parameters of the spatial transformation are calculated; the parameters differ according to the form of transformation applied to the image to be recognized. For example, when a 2D affine transformation is implemented, the parameters are output as a 6-dimensional (2x3) vector. After the parameters of the spatial transformation are calculated, the corresponding spatial transformation function is generated from them, and the image to be recognized is transformed into the image expected by the coding network according to that function.

Specifically, in the STN, the image to be recognized is processed by three parts: the localisation network (Localisation net), the grid generator (Grid generator) and the sampler (Sampler). The localisation network determines the transformation parameters θ required for the input. The grid generator uses θ and the defined transformation to find the mapping T(θ) between output and input feature positions. The sampler combines the position mapping and the transformation parameters to select input features and produces the output by bilinear interpolation, so that the image to be recognized is transformed into the image expected by the coding network. In the localisation network, a regression layer is connected after several convolution or fully connected operations to regress the transformation parameters θ. Since θ is a parameter predicted by regression, the STN is a trainable spatial transformation, and through training it can adaptively learn spatial transformations suited to different data. Moreover, the STN can not only spatially transform the input, but can also be inserted as a network module at any layer of the coding network to spatially transform different feature images. As a result, the coding network can learn invariance to translation, scaling, rotation and more general distortions, which improves the robustness of its feature encoding.
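The grid generation and bilinear sampling stages can be sketched in plain Python as follows. This is a simplified illustration, not the trained network: θ is supplied by hand rather than regressed by a localisation network, coordinates are in pixel units rather than normalized (an assumption), and the function names `affine_grid`, `bilinear_sample` and `spatial_transform` are hypothetical.

```python
import math


def affine_grid(theta, height, width):
    """For every output pixel (x, y), compute the source coordinates
    (xs, ys) from a 2x3 affine matrix theta (pixel units, assumed)."""
    return [
        [
            (theta[0][0] * x + theta[0][1] * y + theta[0][2],
             theta[1][0] * x + theta[1][1] * y + theta[1][2])
            for x in range(width)
        ]
        for y in range(height)
    ]


def bilinear_sample(image, xs, ys):
    """Bilinearly interpolate image at (xs, ys); points outside contribute 0."""
    h, w = len(image), len(image[0])
    x0, y0 = math.floor(xs), math.floor(ys)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(xs - xx)) * (1 - abs(ys - yy))
                value += weight * image[yy][xx]
    return value


def spatial_transform(image, theta):
    """Apply the affine transform theta to a single-channel image."""
    grid = affine_grid(theta, len(image), len(image[0]))
    return [[bilinear_sample(image, xs, ys) for xs, ys in row] for row in grid]


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
shift_right = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]  # output(x) reads input(x - 1)

print(spatial_transform(image, identity))     # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(spatial_transform(image, shift_right))  # [[0.0, 1.0, 2.0], [0.0, 4.0, 5.0]]
```

The six entries of `theta` correspond to the 6-dimensional (2x3) parameter vector of a 2D affine transformation; in a trained network they would be predicted per image instead of fixed.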

202. Input the corrected image to a pre-trained coding network, and perform convolution calculation on the corrected image through multiple convolution kernels in the coding network to obtain a feature image with multiple channels.

Wherein, the number of the aforementioned channels is the same as the number of convolution kernels, and the timing attributes of the aforementioned channels are related to the order of calculation of the convolution kernels.

The above-mentioned corrected image is a to-be-identified image corrected by the spatial transformation network in the feature coding space.

The above-mentioned pre-trained coding network may be a convolutional neural network, which is used to extract the characteristic region where each character in the license plate information is located.

In a possible embodiment, the aforementioned coding network has multiple calculation layers, and a spatial transformation network can be set between every two calculation layers to perform spatial transformation on the channels calculated by the previous calculation layer so as to satisfy the input expectation of the next calculation layer. Correcting the input of each calculation layer in this way reduces the degree of error accumulation, thereby improving the recognition accuracy.

The aforementioned coding network is obtained by training with character images as the data set. The data set can be composed of 31 Chinese characters, 24 alphabetic characters and 10 numeric characters, a total of 65 characters, and each character corresponds to multiple images in different situations. The coding network is trained on the data set so that it learns to encode the feature region where each character is located. Specifically, the weight parameters of the convolution kernels in the coding network are trained so that, when performing convolution calculation on the image to be recognized, the coding network obtains a channel through each convolution kernel, and each channel corresponds to the feature region to which a character belongs.

Optionally, in a possible embodiment, the encoding network may be a fully convolutional neural network. A fully convolutional network can accept input images of any size, that is, the size of the image to be recognized does not need to be processed. The fully convolutional network uses a deconvolution calculation layer to up-sample the feature image of the last convolution layer so that the size of the feature image is the same as the size of the input image, so that a prediction can be generated for each pixel while the spatial information of the original input image is retained. Therefore, when the coding network is a fully convolutional neural network, the output feature image has the same spatial information as the image to be recognized, that is, the position of each extracted feature region can be characterized by the distribution of pixels in the spatial information of the image to be recognized, and the pixels of the up-sampled feature image can be traversed and classified. The traversal classification is based on the channels of the feature image: each pixel is assigned to the channel with the highest channel value, and thus to the feature region corresponding to that channel.

In addition, when the feature image of the last convolutional layer is up-sampled, the feature region corresponding to each channel can be labeled so that each pixel on the up-sampled feature image corresponds to a channel label. This is equivalent to labeling the feature regions in the up-sampled feature image, so that the feature regions have time sequence attributes. In this case, the corresponding channels need not be retained. The aforementioned attention mechanism prompts for each feature region and its label in the up-sampled feature image, so that the feature regions are decoded in the feature decoding space according to the time sequence attribute.
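The per-pixel classification and labeling described in the two preceding paragraphs can be sketched as follows. As a simplifying assumption, nearest-neighbour repetition stands in for the learned deconvolution layer; the function names `upsample_nearest` and `label_pixels` are hypothetical.

```python
def upsample_nearest(channel, factor):
    """Enlarge a 2-D map by repeating each value `factor` times per axis."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def label_pixels(feature_image, factor):
    """Return a map whose entry is the index of the strongest channel at that pixel."""
    up = [upsample_nearest(ch, factor) for ch in feature_image]
    h, w = len(up[0]), len(up[0][0])
    return [
        [max(range(len(up)), key=lambda t: up[t][r][c]) for c in range(w)]
        for r in range(h)
    ]

# Two channels over a 1x2 feature map: channel 0 responds on the left,
# channel 1 on the right.
feature_image = [
    [[0.9, 0.1]],
    [[0.2, 0.8]],
]
labels = label_pixels(feature_image, 2)
```

Each entry of `labels` is a channel index, so the label map carries the channel order (the time sequence attribute) on its own, which is why the text notes that the channels themselves need not be retained.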

203. When the feature image is input into the feature decoding space according to the time sequence attribute, the time sequence attribute of each channel is reported to the pre-trained attention mechanism.

204. Sort the feature regions corresponding to the channels according to the time sequence attributes through the pre-trained attention mechanism, and notify the pre-trained long short-term memory network according to the sorting to sequentially decode the feature regions corresponding to the sorting.

In the above steps 203 and 204, the attention mechanism obtains the channel sequence a according to the time sequence attributes of the channels, and calculates the weight $\alpha_{t,i}$ of each channel $a_i$ at the current time t, which can be calculated by the formula:

$$e_{ti} = f_{att}(a_i, h_{t-1})$$

where $f_{att}$ is the attention perception function, $a_i$ is the current input vector, $h_{t-1}$ is the decoding state at the previous moment, and L is the number of channels.

According to the time sequence of the input channels and the corresponding weights, a channel is output and input to the long short-term memory network for decoding.
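The weighting in steps 203 and 204 can be sketched as follows. The source gives only the score e_ti; the softmax normalization over the L channels and the weighted sum used here are the common soft-attention formulation and are an assumption, as is the dot product standing in for the attention perception function. The names `attention_step`, `channels` and `h_prev` are hypothetical.

```python
import math

def attention_step(channels, h_prev):
    """One attention step over L channel vectors given the previous decoder state."""
    # e_ti = f_att(a_i, h_{t-1}); a dot product stands in for f_att (assumption).
    scores = [sum(a * h for a, h in zip(a_i, h_prev)) for a_i in channels]
    # Softmax turns the scores into weights that sum to 1 over the L channels.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the channels gives the vector passed to the decoder.
    context = [
        sum(w * a_i[d] for w, a_i in zip(weights, channels))
        for d in range(len(channels[0]))
    ]
    return weights, context

channels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_step(channels, h_prev=[2.0, 0.0])
```

Channels that align with the previous decoding state receive larger weights, so the vector handed to the decoder is dominated by the region the attention mechanism currently points at.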

When the aforementioned long short-term memory network decodes the feature region corresponding to the current channel, it obtains the location of the next feature region to be decoded according to the output of the attention mechanism.

Specifically, the pre-trained attention mechanism outputs the first attention parameter according to the order, and the pre-trained long short-term memory network is notified through the first attention parameter to decode the first feature region. While the first feature region is being decoded, the pre-trained attention mechanism outputs the second attention parameter according to the order, and the second attention parameter includes the position of the second feature region. After the first feature region is decoded, the second attention parameter informs the pre-trained long short-term memory network to pay attention to the location of the second feature region, so that the pre-trained long short-term memory network decodes the second feature region. More specifically, after the first feature region is decoded, the decoded features of the first feature region and the second feature region are taken as input according to the second attention parameter and input into the pre-trained long short-term memory network for decoding. Decoding loops in this way until all feature regions have been decoded in turn.
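The loop described above, in which each attention parameter combines the previous decoding state with the position of the next feature region, can be sketched as follows. The decoder here is a toy stand-in that simply picks the best-scoring character; it is not a long short-term memory network, and all names (`decode_regions`, `toy_decode_step`, `regions`) are hypothetical.

```python
def decode_regions(regions, decode_step):
    """Decode ordered feature regions one at a time, carrying state forward."""
    state = "<start>"
    results = []
    for position, region in enumerate(regions):
        # The attention parameter pairs the previous state with the next position.
        attention_param = (state, position)
        char, state = decode_step(region, attention_param)
        results.append(char)
    return "".join(results)

def toy_decode_step(region, attention_param):
    """Stand-in for the decoder network: pick the best-scoring character."""
    prev_state, position = attention_param
    char = max(region, key=region.get)
    new_state = (position, char)  # saved and fed into the next attention parameter
    return char, new_state

# Each region is represented by hypothetical per-character scores.
regions = [{"A": 0.1, "B": 0.9}, {"1": 0.7, "2": 0.3}, {"X": 0.6, "Y": 0.4}]
decoded = decode_regions(regions, toy_decode_step)
```

The point of the sketch is the data flow: the state saved after one region is decoded becomes part of the attention parameter for the next region, and the loop ends when the ordered regions are exhausted.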

It should be noted that since the spatial transformation network and the coding network are deployed in the feature coding space, and the attention mechanism and the long short-term memory network are deployed in the feature decoding space, end-to-end training can be realized, that is, the feature coding space and the feature decoding space are trained together. Therefore, before the image to be recognized is input into the feature encoding space, there is no need to preprocess the image.

205. Output the decoding result according to the time sequence attribute, and obtain the recognition result of the image to be recognized.

In the embodiment of the present invention, the image to be recognized containing the license plate number is corrected by the spatial transformation network in the feature encoding space and then encoded by the encoding network, and the feature regions are decoded in time sequence in the feature decoding space. This is an end-to-end form of recognition, which avoids the accumulation of errors across the multiple steps of image preprocessing and improves the robustness of license plate number recognition. Moreover, the entire training process and recognition process pass only through the encoding space and the decoding space, realizing end-to-end license plate number recognition.

As shown in FIG. 3, FIG. 3 is a flowchart of another method for recognizing a license plate number provided by an embodiment of the present invention. The model is composed of an encoder and a decoder: an STN layer is deployed in the encoder to rectify the image to be recognized and a convolutional neural network is used for feature extraction, and the decoder is a combination of a long short-term memory network and an attention mechanism. As shown in FIG. 3, the license plate information in the image to be recognized is "Zhe J·L9098", and the input includes image parameters such as the color channels (3, RGB), width (W) and height (H). After feature encoding is performed in the encoding space, the feature image is obtained. According to the time sequence attributes of the channels, the feature regions corresponding to the channels in the feature image are the first feature region corresponding to a Chinese character, the second feature region corresponding to an alphabetic character, and the third to seventh feature regions corresponding to letters/numbers. When the feature image is input into the decoding space, the attention mechanism sorts the feature regions corresponding to the channels according to the time sequence attributes and prompts for the feature region corresponding to each channel.

At h0, the attention mechanism outputs the first attention parameter, which is composed of the start instruction <start> + the location of the first feature region. At h1, the first feature region is input to the long short-term memory network in the decoder. When decoding, the long short-term memory network determines which of the 31 Chinese characters the first feature region belongs to, and the decoding result is "Zhe". At this time, the current decoding state is saved, and the attention mechanism outputs the second attention parameter, which is composed of the previous decoding state + the location of the second feature region.

At h2, the decoding state of the first feature region and the second feature region are input to the long short-term memory network in the decoder. When decoding, because the previous decoding state is a Chinese character decoding state and, under the rules for ordinary car license plates, the probability of a Chinese character being followed by a letter is 100%, the long short-term memory network determines which of the 24 letters the second feature region belongs to, and the decoding result is "J". At this time, the current decoding state is saved, and the attention mechanism outputs the third attention parameter, which is composed of the previous decoding state + the location of the third feature region.

At h3, the decoding state of the second feature region and the third feature region are input to the long short-term memory network in the decoder. When decoding, because the previous decoding state is a letter state and, under the rules for ordinary car license plates, the probability of a letter being followed by a Chinese character is 0%, the long short-term memory network determines which of the 24 letters and 10 numbers the third feature region belongs to, and the decoding result is "L". At this time, the current decoding state is saved, and the attention mechanism outputs the fourth attention parameter. This continues until the long short-term memory network outputs <end> to end the recognition, at which point the recognition is considered complete and the decoding result is output.
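The way the previous decoding state narrows the candidate characters in this example can be sketched as follows. The concrete character lists are assumptions (31 province abbreviations, 24 letters with I and O omitted, 10 digits) chosen to match the counts in the text, and the scores are made-up values; `candidates`, `kind_of` and `constrained_decode` are hypothetical names, not part of the described decoder.

```python
# Assumed character sets matching the 31 + 24 + 10 = 65 characters in the text.
PROVINCES = list("京津沪渝冀豫云辽黑湘皖鲁新苏浙赣鄂桂甘晋蒙陕吉闽贵粤青藏川宁琼")
LETTERS = list("ABCDEFGHJKLMNPQRSTUVWXYZ")  # I and O omitted (assumption)
DIGITS = list("0123456789")

def candidates(prev_kind):
    """Characters the decoder may emit given the kind of the previous output."""
    if prev_kind == "start":
        return PROVINCES            # first position: a Chinese character
    if prev_kind == "province":
        return LETTERS              # a Chinese character is always followed by a letter
    return LETTERS + DIGITS         # afterwards: letters or digits, never Chinese

def kind_of(char):
    """Classify an emitted character so the next step can pick its candidate set."""
    if char in PROVINCES:
        return "province"
    return "letter" if char in LETTERS else "digit"

def constrained_decode(score_maps):
    """Pick, per position, the best-scoring character among the allowed ones."""
    prev_kind, plate = "start", []
    for scores in score_maps:
        allowed = candidates(prev_kind)
        char = max(allowed, key=lambda ch: scores.get(ch, 0.0))
        plate.append(char)
        prev_kind = kind_of(char)
    return "".join(plate)

score_maps = [
    {"浙": 0.8, "J": 0.9},   # "J" scores higher but is not allowed first
    {"J": 0.7, "浙": 0.9},   # a Chinese character is not allowed second
    {"L": 0.6, "1": 0.3},
]
plate = constrained_decode(score_maps)
```

In the described method this restriction is learned by the long short-term memory network from the saved decoding state rather than hard-coded; the explicit candidate sets here only make the 100% and 0% transition probabilities from the example visible.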

In the embodiment of the present invention, the STN layer and the convolutional neural network are deployed in the encoder to correct the image to be recognized and to perform feature extraction, and the decoder is an architecture that combines a long short-term memory network and an attention mechanism, so that the encoder + decoder has the characteristics of a deep neural network and deep learning methods can be used to train the entire encoder + decoder model with data. The more complete the training data, the more scenes can be recognized, which improves the robustness of the model. In addition, since the encoder + decoder is an end-to-end model, there is no need to preprocess the image, which improves the speed of license plate number recognition. Since there are no multiple preprocessing steps, errors do not accumulate, which improves the recognition accuracy of the license plate number.

It should be noted that the license plate number recognition method provided in the embodiments of the present invention can be applied to mobile phones, monitors, computers, servers and other devices that need to perform license plate number recognition.

Please refer to FIG. 4. FIG. 4 is a schematic structural diagram of a license plate number recognition device provided by an embodiment of the present invention. As shown in FIG. 4, it includes:

The encoding module 401 is used to input the image to be recognized into a preset feature encoding space for correction and encoding to obtain a feature image with multiple channels, the image to be recognized including license plate information, the feature image including multiple feature regions corresponding to the multiple channels, and the channels having time sequence attributes;

The decoding module 402 is configured to input the feature image into a preset feature decoding space according to the time sequence attribute, and in the feature decoding space, use an attention mechanism to sequentially decode the feature regions in the feature image according to the time sequence attribute;

The output module 403 is configured to output the decoding result according to the time sequence attribute to obtain the recognition result of the image to be recognized.

Optionally, as shown in Figure 5, the preset feature coding space includes a pre-trained space transformation network and a pre-trained coding network. The coding module 401 includes:

The correction unit 4011 is configured to perform correction prediction on the image to be recognized in the pre-trained spatial transformation network, and correct the image to be recognized according to the prediction result to obtain a corrected image;

The encoding unit 4012 is configured to input the corrected image into the pre-trained encoding network, and perform convolution calculation on the corrected image through multiple convolution kernels in the encoding network to obtain a feature image with multiple channels, wherein the number of the channels is the same as the number of the convolution kernels, and the time sequence attributes of the channels are associated with the order of calculation of the convolution kernels.

Optionally, as shown in FIG. 6, the preset feature decoding space includes a pre-trained attention mechanism and a pre-trained long short-term memory network, and the decoding module 402 includes:

The attention unit 4021 is configured to report the time series attribute of each channel to the pre-trained attention mechanism when the characteristic image is input into the characteristic decoding space according to the time series attribute;

The decoding unit 4022 is configured to sort the feature regions corresponding to the channels according to the time sequence attribute through the pre-trained attention mechanism, and notify the pre-trained long short-term memory network according to the sorting to sequentially decode the feature regions corresponding to the sorting.

Optionally, as shown in FIG. 7, the decoding unit 4022 includes:

The first decoding subunit 40221 is configured to output a first attention parameter according to the order through the pre-trained attention mechanism, and notify the pre-trained long short-term memory network through the first attention parameter to decode the first feature region;

The output subunit 40222 is configured to output, when the first feature region is being decoded, a second attention parameter according to the order through the pre-trained attention mechanism, the second attention parameter including the position of the second feature region;

The second decoding subunit 40223 is configured to notify the pre-trained long short-term memory network through the second attention parameter to decode the second feature region after the decoding of the first feature region is completed;

The loop sub-unit 40224 is used for loop decoding until the decoding of all characteristic regions is completed in sequence.

Optionally, as shown in FIG. 7, the second decoding subunit 40223 is further configured to, after the first feature region is decoded, take the decoded features of the first feature region and the second feature region as input according to the second attention parameter, and input them into the pre-trained long short-term memory network for decoding.

Optionally, as shown in FIG. 8, the device further includes:

The up-sampling module 404 is configured to up-sample the characteristic image so that the size of the characteristic image is the same as the size of the image to be recognized;

The prediction module 405 is configured to perform pixel prediction on the up-sampled feature image according to the channel of the feature image, and predict the feature area to which each pixel in the up-sampled feature image belongs;

The labeling module 406 is configured to label the feature region to which each pixel in the up-sampled feature image belongs according to the time sequence attribute of the channel, so that the feature region to which each pixel in the up-sampled feature image belongs has a time sequence attribute, and the labeled feature image is obtained;

The decoding module 402 is further configured to input the labeled feature image into a preset feature decoding space according to the time sequence attribute, and use an attention mechanism to sequentially decode the feature regions in the labeled feature image in the feature decoding space according to the time sequence attribute.

It should be noted that the license plate number recognition device provided in the embodiment of the present invention can be applied to mobile phones, monitors, computers, servers and other devices that need to perform license plate number recognition.

The license plate number recognition device provided by the embodiment of the present invention can realize each process realized by the license plate number recognition method in the foregoing method embodiment, and can achieve the same beneficial effects. To avoid repetition, details are not described herein again.

Referring to FIG. 9, FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. As shown in FIG. 9, it includes: a memory 902, a processor 901, and a computer program stored in the memory 902 and executable on the processor 901, wherein:

The processor 901 is configured to call a computer program stored in the memory 902, and execute the following steps:

Optionally, the preset feature coding space includes a pre-trained spatial transformation network and a pre-trained coding network, and the step, executed by the processor 901, of inputting the image to be recognized into the preset feature coding space for correction and encoding to obtain a feature image with multiple channels includes:

Optionally, the preset feature decoding space includes a pre-trained attention mechanism and a pre-trained long short-term memory network, and the step, executed by the processor 901, of inputting the feature image into the preset feature decoding space according to the time sequence attribute and decoding the feature regions corresponding to the channels according to the time sequence attribute through the attention mechanism includes:

Optionally, the step, executed by the processor 901, of notifying the pre-trained long short-term memory network according to the sorting to sequentially decode the feature regions corresponding to the sorting includes:

After the decoding of the first feature region is completed, notify the pre-trained long short-term memory network through the second attention parameter to decode the second feature region;

Loop decoding until the decoding of all feature regions is completed in sequence.

Optionally, the step, executed by the processor 901, of notifying, after the decoding of the first feature region is completed, the pre-trained long short-term memory network through the second attention parameter to decode the second feature region includes:

After the first feature region is decoded, according to the second attention parameter, the decoded features of the first feature region and the second feature region are used as input and input into the pre-trained long short-term memory network for decoding.

Optionally, after the image to be recognized is input into a preset feature encoding space for correction and encoding to obtain a feature image with multiple channels, the processor 901 further executes the following steps:

The step, executed by the processor 901, of inputting the feature image into a preset feature decoding space according to the time sequence attribute and decoding the feature regions corresponding to the channels according to the time sequence attribute through an attention mechanism includes:

It should be noted that the above-mentioned electronic device may be applied to devices such as mobile phones, monitors, computers, servers, etc., that require license plate number recognition.

The electronic device provided in the embodiment of the present invention can implement each process implemented by the license plate number recognition method in the foregoing method embodiment, and can achieve the same beneficial effects. To avoid repetition, details are not described herein again.

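The embodiment of the present invention also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements each process of the license plate number recognition method provided in the embodiments of the present invention and can achieve the same technical effect. To avoid repetition, details are not described herein again.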

A person of ordinary skill in the art can understand that all or part of the processes in the above-mentioned embodiment methods can be implemented by a computer program instructing relevant hardware. The program can be stored in a computer-readable storage medium, and when executed, the program may include the processes of the above-mentioned method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The above-disclosed are only the preferred embodiments of the present invention, which of course cannot be used to limit the scope of rights of the present invention. Therefore, equivalent changes made according to the claims of the present invention still fall within the scope of the present invention.

Claims

1. A method for recognizing a license plate number, characterized in that it comprises the following steps:

Inputting the image to be recognized into a preset feature encoding space for correction and encoding to obtain a feature image with multiple channels, the image to be recognized including license plate information, the feature image including multiple feature regions corresponding to the multiple channels, and the channels having time sequence attributes;

Inputting the feature image into a preset feature decoding space according to the time series attribute, and in the feature decoding space, using an attention mechanism to decode feature regions in the feature image according to the time series attribute;

The decoding result is output according to the time sequence attribute, and the recognition result of the image to be recognized is obtained.
2. The method of claim 1, wherein the preset feature coding space includes a pre-trained spatial transformation network and a pre-trained coding network, and the inputting of the image to be recognized into the preset feature coding space for correction and encoding to obtain a feature image with multiple channels includes:

Performing correction prediction on the image to be recognized in the pre-trained spatial transformation network, and correcting the image to be recognized according to the prediction result to obtain a corrected image;

Performing convolution calculation on the corrected image through multiple convolution kernels in the coding network to obtain a feature image with multiple channels, where the number of channels is the same as the number of convolution kernels, and the time sequence attribute of the channels is associated with the order of calculation of the convolution kernels.
3. The method of claim 1, wherein the preset feature decoding space includes a pre-trained attention mechanism and a pre-trained long short-term memory network, and the inputting of the feature image into the preset feature decoding space according to the time sequence attribute and decoding of the feature regions corresponding to the channels according to the time sequence attribute through the attention mechanism includes:

When the feature image is input into the feature decoding space according to the time sequence attribute, reporting the time sequence attribute of each channel to the pre-trained attention mechanism;

Through the pre-trained attention mechanism, the feature regions corresponding to the channels are sorted according to the time sequence attributes, and the pre-trained long short-term memory network is notified according to the sorting to sequentially decode the feature regions corresponding to the sorting.
4. The method according to claim 3, wherein the notifying of the pre-trained long short-term memory network according to the sorting to sequentially decode the feature regions corresponding to the sorting comprises:

Outputting a first attention parameter according to the order through the pre-trained attention mechanism, and notifying the pre-trained long short-term memory network through the first attention parameter to decode the first feature region;

When decoding the first feature region, the pre-trained attention mechanism outputs a second attention parameter according to the order, and the second attention parameter includes the position of the second feature region;

After the decoding of the first feature region is completed, notifying the pre-trained long short-term memory network through the second attention parameter to decode the second feature region;

Loop decoding until the decoding of all feature regions is completed in sequence.
5. The method according to claim 4, characterized in that the notifying, after the decoding of the first feature region is completed, of the pre-trained long short-term memory network through the second attention parameter to decode the second feature region includes:

After the first feature region is decoded, according to the second attention parameter, the decoded features of the first feature region and the second feature region are used as input and input into the pre-trained long short-term memory network for decoding.
6. The method according to claim 1, characterized in that, after inputting the image to be recognized into a preset feature encoding space for correction and encoding to obtain a feature image with multiple channels, the method further comprises:

Up-sampling the characteristic image so that the size of the characteristic image is the same as the size of the image to be recognized;

Performing pixel point prediction on the up-sampled feature image according to the channel of the feature image, and predicting the feature area to which each pixel point in the up-sampled feature image belongs;

According to the time series attribute of the channel, labeling the feature region to which each pixel in the up-sampled feature image belongs, so that the feature region to which each pixel in the up-sampled feature image belongs has a time sequence attribute, and obtaining the labeled feature image;

The inputting the characteristic image into a preset characteristic decoding space according to the time series attribute, and decoding the characteristic region corresponding to the channel according to the time series attribute through an attention mechanism, includes:

The labeled feature image is input into a preset feature decoding space according to the time sequence attribute, and the feature regions in the labeled feature image are sequentially decoded according to the time sequence attribute in the feature decoding space through an attention mechanism.
7. A vehicle license plate number recognition device, characterized in that the device includes:

The encoding module is used to input the image to be recognized into a preset feature encoding space for correction and encoding to obtain a feature image with multiple channels, the image to be recognized including license plate information, the feature image including multiple feature regions corresponding to the multiple channels, and the channels having time sequence attributes;

The decoding module is used to input the feature image into a preset feature decoding space according to the time sequence attribute, and in the feature decoding space, sequentially decode the feature regions in the feature image according to the time sequence attribute through the attention mechanism;

The output module is used to output the decoding result according to the time sequence attribute to obtain the recognition result of the image to be recognized.
8. The device of claim 7, wherein the preset feature coding space includes a pre-trained spatial transformation network and a pre-trained coding network, and the coding module includes:

The correction unit is configured to perform correction prediction on the image to be recognized in the pre-trained spatial transformation network, and correct the image to be recognized according to the prediction result to obtain a corrected image;

The encoding unit is configured to input the corrected image into the pre-trained encoding network, and perform convolution calculation on the corrected image through multiple convolution kernels in the encoding network to obtain a feature image with multiple channels, wherein the number of the channels is the same as the number of the convolution kernels, and the time sequence attributes of the channels are associated with the order of calculation of the convolution kernels.
9. An electronic device, characterized by comprising: a memory, a processor, and a computer program stored on the memory and capable of running on the processor, wherein when the processor executes the computer program, the steps in the license plate number recognition method according to any one of claims 1 to 6 are implemented.
10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps in the license plate number recognition method according to any one of claims 1 to 6 are implemented.