CN112861840A - Complex scene character recognition method and system based on multi-feature fusion convolutional network - Google Patents

Complex scene character recognition method and system based on multi-feature fusion convolutional network

Info

Publication number
CN112861840A
CN112861840A
Authority
CN
China
Prior art keywords
sequence
network
character recognition
lstm
convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110260333.8A
Other languages
Chinese (zh)
Inventor
孙锬锋
蒋兴浩
许可
舒常思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202110260333.8A priority Critical patent/CN112861840A/en
Publication of CN112861840A publication Critical patent/CN112861840A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 Scene text, e.g. street names
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/625 License plates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a complex scene character recognition method and system based on a multi-feature fusion convolutional network. The method comprises: a feature extraction step: constructing a convolutional neural network based on a multi-feature fusion method, and extracting features of image characters to obtain a feature map containing relative position information and time sequence information; a confidence estimation step: constructing a bidirectional LSTM network, and inputting all the feature maps into the bidirectional LSTM network to obtain an image character confidence estimation sequence; a mapping step: constructing a transcription layer, and mapping the image character confidence estimation sequence to obtain an output sequence as the character recognition result. The invention addresses the low license plate character recognition accuracy of existing methods in complex scenes such as blurred images, excessive plate tilt, rain, snow and fog, and overexposed or underexposed images, and improves the generality of license plate character recognition in practical applications.

Description

Complex scene character recognition method and system based on multi-feature fusion convolutional network
Technical Field
The invention relates to the field of computer vision, and in particular to a method and system for recognizing characters in complex scenes based on a multi-feature fusion convolutional network.
Background
With the rapid economic development of China in recent years, the demand for character recognition has grown steadily. Automatic recognition of characters in complex scenes improves management efficiency and reduces labor cost, so character recognition has become a research hotspot. Existing character recognition techniques can be divided into two-stage and one-stage approaches.
In two-stage character recognition, the first stage segments the characters and the second stage recognizes each segmented single-character image. Character segmentation methods include edge extraction, horizontal and vertical projection, and feature projection; character recognition methods include template matching, hidden Markov models, support vector machines and artificial neural networks. Because errors tend to accumulate at the junction between the two stages and continuous semantic information is broken, overall recognition robustness is poor. The approach is also difficult to parallelize, which leads to high average processing latency.
In one-stage character recognition, the recognition system takes a complete character sequence image as input and produces the recognized character sequence in a single step using its character recognition model. The common current approach uses a convolutional neural network model. This keeps the complete semantic information of the character sequence and offers better robustness and higher recognition accuracy. It also allows a degree of parallel computation, improving processing efficiency.
Among existing character recognition techniques, the paper "A Light CNN for End-to-End Car License Plates Detection and Recognition", published by W. Wang, J. Yang, M. Chen and P. Wang in IEEE Access, vol. 7, on 28 November 2019, proposes an end-to-end character recognition network model that performs feature extraction with a CNN and then trains an RNN on the extracted features. The method recognizes character sequences without segmentation, but compared with the multi-feature fusion convolutional network designed in the present invention, its plain CNN cannot extract effective, high-quality features from license plates in complex scenes, so its license plate character recognition accuracy in such scenes is lower. A patent document of Hunan Tumo Communication Technology Co., Ltd., published on 27 December 2019, "Real-time license plate recognition method based on deep learning in complex scenes" (CN110619327A), provides an end-to-end license plate character recognition model that uses a lightweight MobileNet as the feature extraction network within the SSD object detection algorithm and obtains character categories via fully-connected mapping. However, compared with the present invention, which uses a recurrent neural network and connectionist temporal classification to recognize character sequences of indefinite length, that method predicts the seven license plate characters with seven parallel fully-connected layers, so it cannot recognize new energy license plates with 8 characters, and its accuracy on license plate characters in complex scenes is lower.
Complex scenes here include scenes where character recognition accuracy is low due to image blurring, excessive tilt of the character sequence, weather conditions such as rain, snow and fog, and overexposure or underexposure.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a complex scene character recognition method and system based on a multi-feature fusion convolutional network.
The invention provides a complex scene character recognition method based on a multi-feature fusion convolutional network, which comprises the following steps:
a feature extraction step: constructing a convolutional neural network based on a multi-feature fusion method, and extracting features of image characters to obtain a feature map containing relative position information and time sequence information;
a confidence estimation step: constructing a bidirectional LSTM network, and inputting all the feature maps into the bidirectional LSTM network to obtain an image character confidence estimation sequence;
a mapping step: constructing a transcription layer, and mapping the image character confidence estimation sequence to obtain an output sequence as the character recognition result.
Preferably, the method further comprises the following steps:
a model training step: training the convolutional neural network through sample pictures;
a model testing step: fixing the parameters of the trained convolutional neural network and testing the accuracy of the convolutional neural network.
Preferably, the feature extraction step includes:
and constructing a convolutional neural network, and adding a multilayer feature fusion structure into a second layer of the convolutional neural network, wherein the multilayer feature fusion structure comprises two branches added on convolutional layers of the convolutional neural network, one branch is connected with one 1 × 1 convolutional layer, and the other branch is connected with one 5 × 5 convolutional layer.
Preferably, the bidirectional LSTM network comprises: forward LSTM and backward LSTM;
and the forward LSTM and the backward LSTM are formed by connecting a plurality of LSTM units in a chain manner, each LSTM unit comprises an input gate and an output gate, the characteristic diagram is correspondingly input into the input gate of the corresponding LSTM unit, and the output value of the output gate is converted by applying an activation function to obtain the image character confidence coefficient estimation sequence.
Preferably, the image character confidence estimation sequence is set to $y = (y_1, y_2, \ldots, y_T)$; then the conditional probability of a target sequence $\pi$ is

$$p(\pi \mid y) = \prod_{t=1}^{T} y_{\pi_t}^{t}$$

where $T$ is the number of LSTM units and $y_{\pi_t}^{t}$ is the confidence of character $\pi_t$ at step $t$. A shorter sequence obtained through a many-to-one mapping is used as the final prediction result; since different target sequences $\pi$ can map to the same result, the probability of the final output result is the sum of the conditional probabilities of all target sequences $\pi$ that map to it:

$$p(l \mid y) = \sum_{\pi \in \beta^{-1}(l)} p(\pi \mid y)$$

where $\beta$ is the sequence-to-sequence mapping function and $l$ is the mapped sequence.
The invention also provides a complex scene character recognition system based on a multi-feature fusion convolutional network, which comprises:
a feature extraction module: constructing a convolutional neural network based on a multi-feature fusion method, and extracting features of image characters to obtain a feature map containing relative position information and time sequence information;
a confidence estimation module: constructing a bidirectional LSTM network, and inputting all the feature maps into the bidirectional LSTM network to obtain an image character confidence estimation sequence;
a mapping module: constructing a transcription layer, and mapping the image character confidence estimation sequence to obtain an output sequence as the character recognition result.
Preferably, the system further comprises:
a model training module: training the convolutional neural network through sample pictures;
a model testing module: fixing the parameters of the trained convolutional neural network and testing the accuracy of the convolutional neural network.
Preferably, the feature extraction module includes:
and constructing a convolutional neural network, and adding a multilayer feature fusion structure into a second layer of the convolutional neural network, wherein the multilayer feature fusion structure comprises two branches added on convolutional layers of the convolutional neural network, one branch is connected with one 1 × 1 convolutional layer, and the other branch is connected with one 5 × 5 convolutional layer.
Preferably, the bidirectional LSTM network comprises: forward LSTM and backward LSTM;
and the forward LSTM and the backward LSTM are formed by connecting a plurality of LSTM units in a chain manner, each LSTM unit comprises an input gate and an output gate, the characteristic diagram is correspondingly input into the input gate of the corresponding LSTM unit, and the output value of the output gate is converted by applying an activation function to obtain the image character confidence coefficient estimation sequence.
Preferably, the image character confidence estimation sequence is set to $y = (y_1, y_2, \ldots, y_T)$; then the conditional probability of a target sequence $\pi$ is

$$p(\pi \mid y) = \prod_{t=1}^{T} y_{\pi_t}^{t}$$

where $T$ is the number of LSTM units and $y_{\pi_t}^{t}$ is the confidence of character $\pi_t$ at step $t$. A shorter sequence obtained through a many-to-one mapping is used as the final prediction result; since different target sequences $\pi$ can map to the same result, the probability of the final output result is the sum of the conditional probabilities of all target sequences $\pi$ that map to it:

$$p(l \mid y) = \sum_{\pi \in \beta^{-1}(l)} p(\pi \mid y)$$

where $\beta$ is the sequence-to-sequence mapping function and $l$ is the mapped sequence.
Compared with the prior art, the invention has the following beneficial effects:
1. The license plate character recognition method of the invention is suitable for license plate recognition in different scenes and supports plates of different types and character lengths. It addresses the low recognition accuracy of existing methods under complex conditions such as blurred images, excessive plate tilt, rain, snow and fog, and overexposure or underexposure, improving the generality of license plate character recognition in practical applications.
2. Compared with traditional character recognition methods, the deep learning network model of the invention needs no character segmentation step, retains the complete semantic information of the license plate, and achieves better robustness and higher recognition accuracy.
3. During feature extraction, a multi-feature fusion method is adopted. Compared with conventional convolution, it learns both low-level and high-level features of the license plate better and effectively prevents the drop in recognition accuracy caused by feature loss, improving the recognition accuracy of the method in complex scenes.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is an overall framework diagram of the license plate character recognition method in complex scenes based on deep learning according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a multi-feature fused feature extraction network according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
This embodiment takes license plate recognition as an example, covering common license plates as well as special types such as new energy, police and military plates. Those skilled in the art will appreciate that the invention also applies to character recognition in other fields, such as characters on displays, paper and other media.
As shown in fig. 1, the complex scene character recognition method based on the multi-feature fusion convolutional network provided by the present invention includes:
A feature extraction step: constructing a convolutional neural network based on a multi-feature fusion method, and extracting features of image characters to obtain a feature map containing relative position information and time sequence information. The convolutional neural network is built mainly from convolutional layers, max pooling layers and ReLU activations.
A confidence estimation step: constructing a bidirectional LSTM network, and inputting all the feature maps into the bidirectional LSTM network to obtain an image character confidence estimation sequence.
A mapping step: constructing a transcription layer, and mapping the image character confidence estimation sequence to obtain an output sequence as the character recognition result.
A model training step: training the convolutional neural network through sample pictures.
A model testing step: fixing the parameters of the trained convolutional neural network and testing the accuracy of the convolutional neural network.
The feature extraction step comprises: as shown in Fig. 2, a convolutional neural network is constructed, and a multilayer feature fusion structure is added to its second layer. The structure comprises two branches added to the convolutional layers of the network: one branch connects a 1 × 1 convolutional layer and the other branch connects a 5 × 5 convolutional layer.
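For illustration, a minimal Keras sketch of such a fusion structure follows. The input size, filter counts and the layers around the fusion block are assumptions for the sketch, not the patent's exact architecture:

from tensorflow.keras import layers, Input, Model

def fusion_block(x, filters):
    """Multi-feature fusion: a 3x3 trunk plus a 1x1 branch and a 5x5 branch,
    concatenated so that low-level and high-level features are both kept.
    Filter counts here are illustrative assumptions."""
    trunk = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)  # 1x1 branch
    b5 = layers.Conv2D(filters, 5, padding="same", activation="relu")(x)  # 5x5 branch
    return layers.Concatenate()([trunk, b1, b5])

inp = Input(shape=(32, 128, 3))  # assumed plate image size (H, W, channels)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)  # first layer
x = layers.MaxPooling2D(2)(x)
x = fusion_block(x, 64)          # fusion structure added at the second layer
x = layers.MaxPooling2D(2)(x)
features = layers.Conv2D(256, 3, padding="same", activation="relu")(x)
cnn = Model(inp, features)

Concatenating the 1 × 1 and 5 × 5 branches with the trunk is what lets the network keep fine local detail alongside wider-context features in a single feature map.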
The bidirectional LSTM network includes a forward LSTM and a backward LSTM, each formed by connecting a plurality of LSTM units in a chain. Each LSTM unit comprises an input gate and an output gate; each feature map is input to the input gate of the corresponding LSTM unit, and an activation function is applied to the output value of the output gate to obtain the image character confidence estimation sequence.
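A hedged sketch of this confidence estimation stage in the same style, where the feature-map shape, LSTM width and character-set size are assumptions:

from tensorflow.keras import layers, Input, Model

H, W, C = 4, 32, 256   # assumed shape of the CNN output feature map
num_classes = 68       # assumed character set size (digits, letters, province codes)

feat = Input(shape=(H, W, C))
# Treat each of the W feature-map columns as one time step of the sequence.
seq = layers.Permute((2, 1, 3))(feat)        # (H, W, C) -> (W, H, C)
seq = layers.Reshape((W, H * C))(seq)        # one feature vector per time step
seq = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(seq)
# Softmax over the characters plus one CTC blank gives the confidence sequence y.
probs = layers.Dense(num_classes + 1, activation="softmax")(seq)
rnn = Model(feat, probs)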
The image character confidence estimation sequence is set as $y = (y_1, y_2, \ldots, y_T)$; then the conditional probability of a target sequence $\pi$ is

$$p(\pi \mid y) = \prod_{t=1}^{T} y_{\pi_t}^{t}$$

where $T$ is the number of LSTM units and $y_{\pi_t}^{t}$ is the confidence of character $\pi_t$ at step $t$. A shorter sequence obtained through a many-to-one mapping is used as the final prediction result; since different target sequences $\pi$ can map to the same result, the probability of the final output result is the sum of the conditional probabilities of all target sequences $\pi$ that map to it:

$$p(l \mid y) = \sum_{\pi \in \beta^{-1}(l)} p(\pi \mid y)$$

where $\beta$ is the sequence-to-sequence mapping function and $l$ is the mapped sequence.
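The many-to-one mapping β (merge consecutive repeats, then drop blanks) and a simple greedy transcription can be sketched in plain Python; greedy argmax decoding is an assumption here, since the patent only specifies a transcription layer:

import numpy as np

BLANK = 0  # assumed index of the CTC blank class

def beta(pi):
    """Many-to-one mapping: merge consecutive repeated labels, then drop blanks,
    e.g. [a, a, blank, b, b] and [a, blank, b] both map to [a, b]."""
    out, prev = [], None
    for s in pi:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return out

def greedy_transcribe(y):
    """Greedy transcription: argmax over each of the T steps of the confidence
    sequence y (shape (T, num_classes + 1)), then apply the mapping beta."""
    return beta(np.argmax(y, axis=1).tolist())

Because β collapses many target sequences π onto the same label sequence l, summing p(π | y) over β⁻¹(l) is exactly what makes variable-length recognition possible without character segmentation.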
Data set
The training and testing data sets comprise a real data set collected from real environments and a synthetic data set generated by computer.
The license plate images in the real data set come from real-world photography and the open-source Chinese license plate data set CCPD, together containing 7561 license plate images. The CCPD data set was introduced by the paper "Towards End-to-End License Plate Detection and Recognition: A Large Dataset and Baseline" (Xu Z, Yang W, Meng A, et al., Computer Vision - ECCV 2018, Springer, Cham, 2018); this publicly downloadable Chinese license plate data set (https://github.com/detectRecog/CCPD) contains over 300,000 plates in total. In the invention, 3400 small license plates, 2700 large license plates, 825 new energy license plates and 390 other special license plates were selected from the CCPD data set. Real-world photography consisted of photographing the license plate regions of different vehicles with handheld mobile phones, mainly capturing small and large license plates under real conditions and specially collecting rare new energy and other special license plates; during shooting, complex scenes with different angles, different backgrounds and different illumination conditions were captured in certain proportions. From this collection, 116 small license plates, 89 large license plates, 32 new energy license plates and 9 other special license plates were selected.
The synthetic license plate data set was produced by generating 100,000 simulated license plates with an OpenCV-based method and then applying style transfer to them with a generative adversarial network, yielding synthetic plates that match the real style of complex scenes. Of these, 40,000 are small license plates, 30,000 large license plates, 20,000 new energy license plates and 10,000 other license plates.
For the normal-condition test set, 1000 license plate images with no overlap with the training data set were randomly selected; for the complex-scene test set, 400 complex license plate images were selected.
Description of the test
In the testing process of this embodiment, the deep learning-based complex scene license plate character recognition model is built and trained using Keras.
First, the recognition model is pre-trained with the generated data set so that it learns some prior knowledge and obtains suitable initial weights. The model is then fine-tuned on the real data set to obtain better network weights. During training, the EarlyStopping function of Keras is used to prevent overfitting.
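As a usage sketch of this training setup: model, train_ds and val_ds stand for the network and data pipelines built earlier, and the patience and epoch values are assumptions, not values stated in the patent:

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Stop when validation loss stops improving, keeping the best weights.
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    # Save intermediate results so the best weights can be picked afterwards.
    ModelCheckpoint("weights_{epoch:02d}.h5"),
]
model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=callbacks)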
After training, the weights with the lowest loss on the test set among all saved intermediate results are selected for use. Testing is performed on both the normal and the complex test sets. Test accuracy is defined as the number of license plates with all characters recognized correctly divided by the total number of license plates tested.
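The metric reduces to exact string match at the plate level; a minimal sketch:

def plate_accuracy(predictions, labels):
    """Accuracy = plates whose whole character string is recognized correctly,
    divided by the total number of plates tested."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# e.g. plate_accuracy(["沪A12345", "京B67890"], ["沪A12345", "京B67891"]) == 0.5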
Test results
On the normal-condition test set, the license plate character recognition accuracy of the invention is 92.7%; on the complex-scene test set it is 87.2%. The commonly used CRNN-CTC license plate recognition network achieves 91.2% on the normal-condition test set and 80.0% on the complex-scene test set. The proposed method therefore has higher recognition accuracy on both normal-condition and complex-scene test sets, with a clear advantage on complex license plates, demonstrating its effectiveness.
Those skilled in the art will appreciate that, besides implementing the system and its various devices, modules and units as pure computer-readable program code, the method steps can be logically programmed so that the system and its devices, modules and units are realized as logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and its devices, modules and units may be regarded as a hardware component; the devices, modules and units it includes for realizing various functions may be regarded as structures within the hardware component, and means for performing the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the embodiments described above, and those skilled in the art may make various changes or modifications within the scope of the claims without departing from the spirit of the invention. The embodiments of the application and the features of the embodiments may be combined with one another arbitrarily in the absence of conflict.

Claims (10)

1. A complex scene character recognition method based on a multi-feature fusion convolutional network is characterized by comprising the following steps:
a feature extraction step: constructing a convolutional neural network based on a multi-feature fusion method, and extracting features of image characters to obtain a feature map containing relative position information and time sequence information;
a confidence estimation step: constructing a bidirectional LSTM network, and inputting all the feature maps into the bidirectional LSTM network to obtain an image character confidence estimation sequence;
a mapping step: constructing a transcription layer, and mapping the image character confidence estimation sequence to obtain an output sequence as the character recognition result.
2. The complex scene character recognition method based on the multi-feature fusion convolutional network of claim 1, further comprising:
a model training step: training the convolutional neural network through sample pictures;
a model testing step: fixing the parameters of the trained convolutional neural network and testing the accuracy of the convolutional neural network.
3. The complex scene character recognition method based on the multi-feature fusion convolutional network of claim 1, wherein the feature extraction step comprises:
and constructing a convolutional neural network, and adding a multilayer feature fusion structure into a second layer of the convolutional neural network, wherein the multilayer feature fusion structure comprises two branches added on convolutional layers of the convolutional neural network, one branch is connected with one 1 × 1 convolutional layer, and the other branch is connected with one 5 × 5 convolutional layer.
4. The complex scene character recognition method based on the multi-feature fusion convolutional network of claim 1, wherein the bidirectional LSTM network comprises: forward LSTM and backward LSTM;
and the forward LSTM and the backward LSTM are formed by connecting a plurality of LSTM units in a chain manner, each LSTM unit comprises an input gate and an output gate, the characteristic diagram is correspondingly input into the input gate of the corresponding LSTM unit, and the output value of the output gate is converted by applying an activation function to obtain the image character confidence coefficient estimation sequence.
5. The method of claim 1, wherein the image character confidence estimation sequence is set to $y = (y_1, y_2, \ldots, y_T)$ and the conditional probability of a target sequence $\pi$ is

$$p(\pi \mid y) = \prod_{t=1}^{T} y_{\pi_t}^{t}$$

where $T$ is the number of LSTM units; a shorter sequence obtained through a many-to-one mapping is used as the final prediction result, and since different target sequences $\pi$ can map to the same result, the probability of the final output result is the sum of the conditional probabilities of all target sequences $\pi$ that map to it:

$$p(l \mid y) = \sum_{\pi \in \beta^{-1}(l)} p(\pi \mid y)$$

where $\beta$ is the sequence-to-sequence mapping function and $l$ is the mapped sequence.
6. A complex scene character recognition system based on a multi-feature fusion convolutional network is characterized by comprising:
a feature extraction module: constructing a convolutional neural network based on a multi-feature fusion method, and extracting features of image characters to obtain a feature map containing relative position information and time sequence information;
a confidence estimation module: constructing a bidirectional LSTM network, and inputting all the feature maps into the bidirectional LSTM network to obtain an image character confidence estimation sequence;
a mapping module: constructing a transcription layer, and mapping the image character confidence estimation sequence to obtain an output sequence as the character recognition result.
7. The complex scene character recognition system based on the multi-feature fusion convolutional network of claim 6, further comprising:
a model training module: training the convolutional neural network through sample pictures;
a model testing module: fixing the parameters of the trained convolutional neural network and testing the accuracy of the convolutional neural network.
8. The complex scene character recognition system based on the multi-feature fusion convolutional network of claim 6, wherein the feature extraction module comprises:
and constructing a convolutional neural network, and adding a multilayer feature fusion structure into a second layer of the convolutional neural network, wherein the multilayer feature fusion structure comprises two branches added on convolutional layers of the convolutional neural network, one branch is connected with one 1 × 1 convolutional layer, and the other branch is connected with one 5 × 5 convolutional layer.
9. The complex scene character recognition system based on multi-feature fusion convolutional network of claim 6, wherein the bidirectional LSTM network comprises: forward LSTM and backward LSTM;
and the forward LSTM and the backward LSTM are formed by connecting a plurality of LSTM units in a chain manner, each LSTM unit comprises an input gate and an output gate, the characteristic diagram is correspondingly input into the input gate of the corresponding LSTM unit, and the output value of the output gate is converted by applying an activation function to obtain the image character confidence coefficient estimation sequence.
10. The complex scene character recognition system based on the multi-feature fusion convolutional network of claim 6, wherein the image character confidence estimation sequence is set to $y = (y_1, y_2, \ldots, y_T)$ and the conditional probability of a target sequence $\pi$ is

$$p(\pi \mid y) = \prod_{t=1}^{T} y_{\pi_t}^{t}$$

where $T$ is the number of LSTM units; a shorter sequence obtained through a many-to-one mapping is used as the final prediction result, and since different target sequences $\pi$ can map to the same result, the probability of the final output result is the sum of the conditional probabilities of all target sequences $\pi$ that map to it:

$$p(l \mid y) = \sum_{\pi \in \beta^{-1}(l)} p(\pi \mid y)$$

where $\beta$ is the sequence-to-sequence mapping function and $l$ is the mapped sequence.
CN202110260333.8A 2021-03-10 2021-03-10 Complex scene character recognition method and system based on multi-feature fusion convolutional network Pending CN112861840A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110260333.8A CN112861840A (en) 2021-03-10 2021-03-10 Complex scene character recognition method and system based on multi-feature fusion convolutional network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110260333.8A CN112861840A (en) 2021-03-10 2021-03-10 Complex scene character recognition method and system based on multi-feature fusion convolutional network

Publications (1)

Publication Number Publication Date
CN112861840A true CN112861840A (en) 2021-05-28

Family

ID=75993879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110260333.8A Pending CN112861840A (en) 2021-03-10 2021-03-10 Complex scene character recognition method and system based on multi-feature fusion convolutional network

Country Status (1)

Country Link
CN (1) CN112861840A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203606A (en) * 2017-05-17 2017-09-26 西北工业大学 Text detection and recognition methods under natural scene based on convolutional neural networks
CN110659648A (en) * 2019-09-27 2020-01-07 北京猎户星空科技有限公司 Character recognition method and device
CN111461112A (en) * 2020-03-03 2020-07-28 华南理工大学 License plate character recognition method based on double-cycle transcription network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
商俊蓓: "Online handwritten digit and formula character recognition based on bidirectional long short-term memory recurrent neural networks", China Master's Theses Full-text Database (Information Science and Technology) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486885A (en) * 2021-06-17 2021-10-08 杭州鸿泉物联网技术股份有限公司 License plate recognition method and device, electronic equipment and storage medium
CN115457561A (en) * 2022-08-30 2022-12-09 东南大学 Tire embossed character recognition general algorithm based on integrated deep learning
CN115457561B (en) * 2022-08-30 2023-09-22 东南大学 Tire embossing character recognition universal method based on integrated deep learning

Similar Documents

Publication Publication Date Title
CN110348445B (en) Instance segmentation method fusing void convolution and edge information
CN107563372B (en) License plate positioning method based on deep learning SSD frame
CN108921875B (en) Real-time traffic flow detection and tracking method based on aerial photography data
CN109241982B (en) Target detection method based on deep and shallow layer convolutional neural network
CN108197326B (en) Vehicle retrieval method and device, electronic equipment and storage medium
CN109508663B (en) Pedestrian re-identification method based on multi-level supervision network
CN111259786A (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN110969166A (en) Small target identification method and system in inspection scene
CN111767927A (en) Lightweight license plate recognition method and system based on full convolution network
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
Cao et al. A survey on image semantic segmentation methods with convolutional neural network
CN110717863B (en) Single image snow removing method based on generation countermeasure network
CN108491828B (en) Parking space detection system and method based on level pairwise similarity PVAnet
CN113723377A (en) Traffic sign detection method based on LD-SSD network
CN112861840A (en) Complex scene character recognition method and system based on multi-feature fusion convolutional network
CN114049572A (en) Detection method for identifying small target
CN115620393A (en) Fine-grained pedestrian behavior recognition method and system oriented to automatic driving
CN110598540B (en) Method and system for extracting gait contour map in monitoring video
CN111723852A (en) Robust training method for target detection network
CN112785610B (en) Lane line semantic segmentation method integrating low-level features
CN111612803B (en) Vehicle image semantic segmentation method based on image definition
CN111160282B (en) Traffic light detection method based on binary Yolov3 network
CN116994164A (en) Multi-mode aerial image fusion and target detection combined learning method
CN113128461B (en) Pedestrian re-recognition performance improving method based on human body key point mining full-scale features
CN115359487A (en) Rapid railcar number identification method, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210528