CN110889385A

CN110889385A - Handwritten text recognition method based on local adjacent attention

Info

Publication number: CN110889385A
Application number: CN201911211051.8A
Authority: CN
Inventors: 吴烨; 李锐; 于治楼
Original assignee: Shandong Inspur Artificial Intelligence Research Institute Co Ltd
Current assignee: Shandong Inspur Artificial Intelligence Research Institute Co Ltd
Priority date: 2019-12-02
Filing date: 2019-12-02
Publication date: 2020-03-17

Abstract

The invention relates to a handwritten text recognition method based on local adjacent attention. The method adopts an encoder-decoder framework and uses a local adjacent attention network that, when calculating the attention of a region, also considers the features of the adjacent regions, so that a more accurate attention region is obtained and the attention position does not deviate. Because the local adjacent attention network takes neighboring regions into account, noise is effectively suppressed and attention deviation is avoided. The local adjacent attention network can be trained end to end together with the encoder-decoder framework, which improves the recognition result: images do not need to be preprocessed, models do not need to be pre-trained, and character-level labeling is not needed. A good recognition result is still obtained while the model is simplified and the amount of computation is reduced, so the method is efficient and practical.

Description

Handwritten text recognition method based on local adjacent attention

Technical Field

The invention relates to the technical fields of computer vision and natural language processing in deep learning, and in particular to a handwritten text recognition method based on local adjacent attention.

Background

Handwritten text images in real scenes are very complex: they often contain distorted or overlapping characters, characters of different fonts, sizes and colors, and complex background noise. The text information in an image is nevertheless essential for visual semantic understanding tasks. Handwritten text recognition differs from conventional OCR (Optical Character Recognition) mainly because each person has different writing habits, such as font, size, density, and even writing direction.

With the application of deep neural networks, the accuracy of regular text recognition has improved rapidly, but the accuracy of irregular text recognition is still low and there is considerable room for improvement.

Early approaches to irregular text recognition were mainly based on rectification. This type of method attempts to rectify irregular text blocks into regular ones; however, severe distortion or warping is difficult to rectify, which makes the text hard to recognize and results in a low recognition rate.

In recent years, attention mechanisms have been widely used for text recognition with some success. Text recognition methods based on attention mechanisms typically employ an encoder-decoder framework. In the encoding stage, the image is converted into a sequence of feature vectors by a convolutional neural network or a recurrent neural network; each feature vector corresponds to one region of the input image, which is the region the attention network attends to. In the decoding stage, the attention network first calculates the weight of each feature region at the current time step by referring to the historical information of the target characters and the feature vectors from the encoding stage, and then obtains the image feature at the current time step as the weighted sum of the original feature vectors. A recurrent neural network then generates the target character from this attention information and the historical information of the target characters.

However, because of the complexity of the image, the attention region calculated by the attention model in the decoding stage often deviates from the region indicated by the true label; this is called attention deviation. To address the problem, an existing approach adds a control network that corrects the attention deviation during the decoding stage. Training data usually contains only images and word-level labels, but this approach needs additional character-level labels to supervise the training of the control network, and producing character-level labels is very labor-intensive and time-consuming. In addition, the control network not only requires extra pre-training time but also increases the number of parameters of the whole method, which may make training difficult or reduce generalization capability.

In summary, attention-based methods suffer from attention deviation, which lowers the recognition rate, while the prior-art remedy for attention deviation makes the model more complex and requires character-level labeling, which makes it difficult to apply to practical problems.

In view of the above situation, the present invention provides a handwritten text recognition method based on local adjacent attention.

Disclosure of Invention

In order to make up for the defects of the prior art, the invention provides a simple and efficient handwritten text recognition method based on local adjacent attention.

The invention is realized by the following technical scheme:

A handwritten text recognition method based on local adjacent attention, characterized in that: an encoder-decoder framework is adopted and, in order to solve the problem of attention deviation, a local adjacent attention network is used to calculate the attention degree of each region; the calculation considers not only the features and historical information of the current region but also the features of the adjacent regions, so that a more accurate attention region is obtained and the attention position does not deviate.

A convolutional neural network and a recurrent neural network are configured in the encoder; the handwritten text image is input into the encoder to obtain the overall feature of the encoder.

The encoding stage steps within the encoder are as follows:

First, the handwritten text image is input into the convolutional neural network to obtain a feature map;

Second, the feature map is divided by columns into a sequence of features, which serve as the input of the recurrent neural network at each time step;

Third, the recurrent neural network outputs the overall feature of the encoder.
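The column-wise split in the second step can be sketched as follows. This is a minimal illustration in plain Python, not the patent's implementation: the `[channel][row][column]` layout, the function name `split_into_columns`, and the toy values are assumptions made for the example, since the source does not specify tensor shapes.

```python
from typing import List

# Assumed layout: feature_map[channel][row][column]
FeatureMap = List[List[List[float]]]


def split_into_columns(feature_map: FeatureMap) -> List[List[float]]:
    """Turn a C x H x W feature map into W vectors of length C*H, left to right.

    Each returned vector stands for one vertical slice of the image and would
    be fed to the recurrent neural network at one time step.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    columns = []
    for x in range(width):
        # Flatten all channels and rows that belong to column x.
        column = [feature_map[c][y][x]
                  for c in range(channels)
                  for y in range(height)]
        columns.append(column)
    return columns


# Toy feature map: 2 channels, 2 rows, 3 columns (values are arbitrary).
toy_map = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
]
sequence = split_into_columns(toy_map)
print(len(sequence), len(sequence[0]))  # 3 time steps, 4 values each
```

Reading the columns from left to right preserves the writing order of a horizontal text line, which is what lets the recurrent network treat the image as a sequence.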

The recurrent neural network has sequential memory: because the overall feature at each time step corresponds to a character, the network can memorize the order of the characters without losing feature information.

A recurrent neural network and a local adjacent attention network are configured in the decoder; the output of the recurrent neural network is jointly determined by the hidden-layer state at the current time step and the output of the local adjacent attention network.

The local adjacent attention network calculates the attention degree of each region at the current time step by referring to the features of that region and of its adjacent regions, together with the hidden-layer state of the recurrent neural network, and obtains the attention feature map at the current time step as the weighted sum of the original feature map under these attention degrees.
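The idea of scoring each region together with its neighbors can be sketched as below. The source does not give the scoring formula, so everything specific here is an assumption chosen for illustration: neighbor features are averaged over a window of `radius` regions on each side, the score is a dot product with the decoder hidden state (which therefore must have the same length as a region feature), and the weights come from a softmax. A learned scoring network would take the place of the dot product in a real model.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_adjacent_attention(features: List[Vector], hidden: Vector,
                             radius: int = 1) -> Tuple[List[float], Vector]:
    """Return (attention weights, attended feature) for one decoding step.

    Assumption: the score of region i uses the mean feature of regions
    i-radius .. i+radius, so an isolated noisy region is damped by its
    neighbors before it is compared with the hidden state.
    """
    n = len(features)
    dim = len(hidden)
    scores = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = features[lo:hi]
        pooled = [sum(f[d] for f in window) / len(window) for d in range(dim)]
        scores.append(sum(p * h for p, h in zip(pooled, hidden)))
    weights = softmax(scores)
    # Weighted sum is taken over the ORIGINAL features, as in the text above.
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(dim)]
    return weights, context


# Toy input: five regions with 2-dimensional features (arbitrary values).
regions = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
weights, context = local_adjacent_attention(regions, hidden=[1.0, 0.0])
print(weights.index(max(weights)))  # region with the highest attention
```

Setting `radius=0` reduces the sketch to ordinary per-region attention, which makes it easy to compare the two behaviors on the same input.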

In the decoding stage, the overall feature output by the encoder serves as the input of the recurrent neural network at time step zero, a start marker serves as the input at the first time step, and the character output at each time step serves as the input at the next time step; when the decoder outputs an end marker, the recognition process ends.
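The decoding loop described above can be sketched as follows. The names `greedy_decode`, `init_state`, `step`, the marker strings `<s>` and `</s>`, and the `max_len` safety limit are all hypothetical; the toy decoder simply spells out a stored string so that the control flow can be run without a trained network.

```python
from typing import Any, Callable, Tuple

START, END = "<s>", "</s>"  # hypothetical start and end markers


def greedy_decode(encoder_summary: Any,
                  init_state: Callable[[Any], Any],
                  step: Callable[[Any, str], Tuple[str, Any]],
                  max_len: int = 50) -> str:
    """Run the decoder until it emits END or max_len characters are produced."""
    state = init_state(encoder_summary)  # time step zero: encoder feature
    token = START                        # first time step: start marker
    output = []
    for _ in range(max_len):
        token, state = step(state, token)  # previous output feeds the next step
        if token == END:
            break
        output.append(token)
    return "".join(output)


# Toy stand-ins for a trained decoder (assumptions for illustration only).
def toy_init(summary: str) -> Tuple[str, int]:
    return (summary, 0)


def toy_step(state: Tuple[str, int], token: str) -> Tuple[str, Tuple[str, int]]:
    text, pos = state
    next_token = text[pos] if pos < len(text) else END
    return next_token, (text, pos + 1)


print(greedy_decode("cat", toy_init, toy_step))  # prints "cat"
```

The `max_len` cap is a practical safeguard added in this sketch; the source only states that the end marker terminates recognition.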

The handwritten text recognition method based on local adjacent attention is trained end to end, and the recurrent neural networks in the encoding stage and the decoding stage do not share parameters. During training, the input of the decoder's recurrent neural network is the characters in the label; during recognition, the input is the character predicted at the previous time step. A fully trained network can recognize complex handwritten text images easily and without interference.
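The difference between training-time and recognition-time decoder inputs can be made concrete with a small sketch. During training the label characters themselves are fed in (commonly called teacher forcing, a term the source does not use), shifted by one position relative to the targets. The function name and the marker strings below are hypothetical.

```python
from typing import List, Tuple

START, END = "<s>", "</s>"  # hypothetical start and end markers


def teacher_forcing_pairs(label: str) -> Tuple[List[str], List[str]]:
    """Build decoder inputs and targets from a word-level label.

    inputs[t] is what the decoder receives at step t; targets[t] is the
    character it should output at that step.
    """
    chars = list(label)
    inputs = [START] + chars
    targets = chars + [END]
    return inputs, targets


inputs, targets = teacher_forcing_pairs("ab")
print(inputs)   # ['<s>', 'a', 'b']
print(targets)  # ['a', 'b', '</s>']
```

Only the word-level label is needed to build these pairs, which is consistent with the statement that no character-level position labeling is required.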

The handwritten text recognition method based on local adjacent attention comprises the following implementation steps:

Step one: acquire raw data to serve as a training set and a test set.

The training set and the test set should cover the variety of conditions that may occur in practice, such as neat and cluttered handwriting, clean backgrounds and backgrounds with grid-line noise, and different font sizes.

Step two: shuffle the training set data, optimize the whole network model with a mini-batch gradient descent algorithm, and adjust the learning rate and the optimizer parameters according to the actual situation.

Step three: test the network model with the test set data, and select the model that performs well on the test as the final recognition model.
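The data handling in step two (shuffling the training set and iterating over it in small batches) can be sketched as below. The function name `minibatches`, the batch size, and the use of a per-epoch seed are assumptions for the example; the gradient update itself depends on the model and optimizer and is left out.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def minibatches(samples: Sequence[T], batch_size: int,
                seed: int) -> Iterator[List[T]]:
    """Shuffle a copy of the samples and yield consecutive mini-batches.

    The last batch may be smaller than batch_size.
    """
    order = list(samples)
    random.Random(seed).shuffle(order)  # reproducible shuffle for this epoch
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


# Toy training set: ten sample indices standing in for (image, label) pairs.
training_set = list(range(10))
for epoch in range(2):
    for batch in minibatches(training_set, batch_size=4, seed=epoch):
        pass  # one gradient-descent update on `batch` would go here
```

Using a different seed for each epoch gives a new ordering every pass while keeping runs reproducible.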

The beneficial effects of the invention are as follows: the handwritten text recognition method based on local adjacent attention adopts a local adjacent attention network, so that noise is effectively suppressed and attention deviation is avoided; the local adjacent attention network can be trained end to end together with the encoder-decoder framework, which improves the recognition result, so that images do not need to be preprocessed, models do not need to be pre-trained, and character-level labeling is not needed; a good recognition result is still obtained while the model is simplified and the amount of computation is reduced, so the method is efficient and practical.

Drawings

FIG. 1 is a schematic diagram of the handwritten text recognition method based on local adjacent attention according to the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the embodiment of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The handwritten text recognition method based on local adjacent attention adopts an encoder-decoder framework. In order to solve the problem of attention deviation, a local adjacent attention network is used to calculate the attention degree of each region; the calculation considers not only the features and historical information of the current region but also the features of the adjacent regions, so that a more accurate attention region is obtained and the attention position does not deviate.

The encoding stage steps within the encoder are as follows:

The handwritten text recognition method based on local adjacent attention is trained end to end, and the recurrent neural networks in the encoding stage and the decoding stage do not share parameters. During training, the input of the decoder's recurrent neural network is the characters in the label; during recognition, the input is the character predicted at the previous time step. A fully trained network can recognize complex handwritten text images easily and without interference.

The handwritten text recognition method based on local adjacent attention comprises the following implementation steps:

Step one: acquire raw data to serve as a training set and a test set;

Compared with the prior art, the handwritten text recognition method based on local adjacent attention has the following characteristics:

First, a local adjacent attention network is adopted and the neighbor information of each region is considered, so that noise is effectively suppressed and attention deviation is avoided;

Second, the local adjacent attention network can be trained end to end together with the encoder-decoder framework, which improves the recognition result; images do not need to be preprocessed, models do not need to be pre-trained, and character-level labeling is not needed;

Third, a good recognition result is still obtained while the model is simplified and the amount of computation is reduced, so the method is efficient and practical.

The handwritten text recognition method based on local adjacent attention in the embodiment of the present invention has been described in detail above. The present invention has been described with reference to specific examples, which are provided to assist in understanding its core concepts; all other embodiments that can be obtained by those skilled in the art without departing from the spirit of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A handwritten text recognition method based on local adjacent attention, characterized in that: an encoder-decoder framework is adopted and, in order to solve the problem of attention deviation, a local adjacent attention network is used to calculate the attention degree of each region; the calculation considers not only the features and historical information of the current region but also the features of the adjacent regions, so that a more accurate attention region is obtained and the attention position does not deviate.

2. The handwritten text recognition method based on local adjacent attention according to claim 1, characterized in that: a convolutional neural network and a recurrent neural network are configured in the encoder; the handwritten text image is input into the encoder to obtain the overall feature of the encoder.

3. The method of claim 2, wherein the encoding stage within the encoder comprises the steps of:

4. The handwritten text recognition method based on local adjacent attention according to claim 3, characterized in that: the recurrent neural network has sequential memory; because the overall feature at each time step corresponds to a character, the network can memorize the order of the characters without losing feature information.

5. The handwritten text recognition method based on local adjacent attention according to claim 1, characterized in that: a recurrent neural network and a local adjacent attention network are configured in the decoder, and the output of the recurrent neural network is jointly determined by the hidden-layer state at the current time step and the output of the local adjacent attention network.

6. The handwritten text recognition method based on local adjacent attention according to claim 1 or 5, characterized in that: the local adjacent attention network calculates the attention degree of each region at the current time step by referring to the features of that region and of its adjacent regions, together with the hidden-layer state of the recurrent neural network, and obtains the attention feature map at the current time step as the weighted sum of the original feature map under these attention degrees.

7. The handwritten text recognition method based on local adjacent attention according to claim 6, characterized in that: in the decoding stage, the overall feature output by the encoder serves as the input of the recurrent neural network at time step zero, a start marker serves as the input at the first time step, and the character output at each time step serves as the input at the next time step; when the decoder outputs an end marker, the recognition process ends.

8. The handwritten text recognition method based on local adjacent attention according to claim 3 or 6, characterized in that: the method is trained end to end, and the recurrent neural networks in the encoding stage and the decoding stage do not share parameters; during training, the input of the decoder's recurrent neural network is the characters in the label, and during recognition, the input is the character predicted at the previous time step.

9. The method of claim 8, wherein the method comprises the following implementation steps:

Step one: acquire raw data to serve as a training set and a test set;