CN110889385A - Handwritten text recognition method based on local adjacent attention - Google Patents

Handwritten text recognition method based on local adjacent attention

Info

Publication number
CN110889385A
Authority
CN
China
Prior art keywords
attention
neural network
handwritten text
local
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911211051.8A
Other languages
Chinese (zh)
Inventor
吴烨
李锐
于治楼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Artificial Intelligence Research Institute Co Ltd
Original Assignee
Shandong Inspur Artificial Intelligence Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-12-02
Filing date
2019-12-02
Publication date
2020-03-17
Application filed by Shandong Inspur Artificial Intelligence Research Institute Co Ltd
Priority to CN201911211051.8A
Publication of CN110889385A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink
    • G06V30/333 Preprocessing; Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink
    • G06V30/36 Matching; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The invention relates to a handwritten text recognition method based on local adjacent attention. The method adopts an encoder-decoder framework and uses a local adjacent attention network that, when calculating the attention of a region, also considers the features of its adjacent regions, so that a more accurate attention region is obtained and the attention position does not deviate. Because neighboring regions are taken into account, noise is effectively suppressed and attention deviation is avoided. The local adjacent attention network can be trained end to end together with the encoder-decoder framework, which improves recognition; no image preprocessing, model pre-training, or character-level labeling is required. A good recognition result is still obtained while the model is simplified and the amount of computation is reduced, so the method is efficient and highly practical.

Description

Handwritten text recognition method based on local adjacent attention
Technical Field
The invention relates to the technical field of computer vision and natural language processing in deep learning, in particular to a handwritten text recognition method based on local adjacent attention.
Background
Handwritten text images in real scenes are very complex: they often contain distorted or overlapping characters, characters of different fonts, sizes and colors, and complex background noise. The text in such images is nevertheless essential for visual semantic understanding tasks. Handwritten text recognition differs from conventional OCR (Optical Character Recognition) mainly because each person has different writing habits, such as font, size, density, and even direction.
With the application of deep neural networks, the performance of regular text recognition has improved rapidly, but the performance of irregular text recognition is still low and leaves considerable room for improvement.
Early approaches to irregular text recognition were mainly based on rectification. Methods of this type attempt to rectify irregular text blocks into regular ones; however, severe distortion or warping is difficult to rectify, which makes the text hard to recognize and results in a low recognition rate.
In recent years, attention mechanisms have been widely used for text recognition with some success. Text recognition methods based on attention mechanisms typically employ an encoder-decoder framework. In the encoding stage, the image is converted into a sequence of feature vectors by a convolutional neural network or a recurrent neural network, and each feature vector corresponds to one region of the input image, namely a region that the attention network can attend to. In the decoding stage, the attention network first calculates the weight of each feature region at the current moment by referring to the historical information of the target characters and the encoded feature vectors, and obtains the image feature at the current moment as the weighted sum of the feature vectors. The target character is then generated by a recurrent neural network based on this attention information and the historical information of the target characters.
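The weighting and summing described above can be sketched as follows. This is a minimal illustration rather than the method of any cited work: the dot-product scoring, the function names `softmax` and `attend`, and the toy vectors are all assumptions introduced here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, hidden):
    """Return (weights, context) for one decoding step.

    features: list of N encoded feature vectors (each a list of D floats)
    hidden:   decoder state of length D, standing in for the "historical
              information" of the target characters
    Dot-product scoring is an assumption made for brevity.
    """
    scores = [sum(f_d * h_d for f_d, h_d in zip(f, hidden)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Image feature at the current moment: weighted sum of the feature vectors.
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context

# Three regions with two-dimensional features (toy values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = attend(features, hidden)
```

The weights always sum to one, so the context vector is a convex combination of the region features; regions whose features align with the decoder state receive more weight.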
However, because of the complexity of the image, the attention region calculated by the attention model in the decoding stage often deviates from the true position of the target character; this is known as attention deviation (or attention drift). To address it, prior work adds a control network that corrects the attention during the decoding stage. Training data usually contains only images and word-level labels, but this approach needs additional character-level labels to supervise the control network, and producing character-level labels is very labor-intensive and time-consuming. In addition, the control network not only requires extra pre-training time but also increases the number of parameters of the whole method, which may make training difficult or reduce generalization ability.
In summary, attention-based methods suffer from attention deviation, which lowers the recognition rate, while the prior-art remedy makes the model more complex and requires character-level labeling, which makes it difficult to apply in practice.
In view of the above, the present invention provides a handwritten text recognition method based on local adjacent attention.
Disclosure of Invention
In order to make up for the defects of the prior art, the invention provides a simple and efficient handwritten text recognition method based on local adjacent attention.
The invention is realized by the following technical scheme:
A handwritten text recognition method based on local adjacent attention, characterized in that: an encoder-decoder framework is adopted and, to solve the problem of attention deviation, a local adjacent attention network is used to calculate the attention of each region, taking into account not only the features and historical information of the current region but also the features of the adjacent regions, so that a more accurate attention region is obtained and the attention position does not deviate.
The encoder comprises a convolutional neural network and a recurrent neural network; the handwritten text image is input into the encoder to obtain the overall features output by the encoder.
The encoding stage in the encoder proceeds as follows:
first, the handwritten text image is input into the convolutional neural network to obtain a feature map;
second, the feature map is divided by columns into a sequence of features, which serve as the inputs of the recurrent neural network at successive moments;
third, the recurrent neural network outputs the overall features of the encoder.
Because the recurrent neural network has temporal memory, the overall features at each moment correspond to a character, and the order of the characters can be memorized without losing features.
The decoder comprises a recurrent neural network and a local adjacent attention network, and the output of the recurrent neural network is jointly determined by the hidden layer state at the current moment and the output of the local adjacent attention network.
The local adjacent attention network calculates the attention of each region at the current moment by referring to the feature map of that region and of its adjacent regions together with the hidden layer state of the recurrent neural network, and obtains the attention feature map at the current moment as the weighted sum of the original feature map.
In the decoding stage, the overall features output by the encoder in the encoding stage are used as the input of the recurrent neural network at time zero, a start marker is used as the input at time one, the character output at each moment is used as the input at the next moment, and an end marker output by the decoder indicates the end of the recognition process.
The method is trained end to end, and the parameters of the recurrent neural networks in the encoding stage and the decoding stage are not shared. During training, the input of the decoder's recurrent neural network is the characters in the label; during recognition, it is the character predicted at the previous moment. A fully trained network can recognize complex handwritten text images easily and robustly.
The handwritten text recognition method based on local adjacent attention is implemented by the following steps:
step one, original data are acquired as a training set and a test set;
the training set and the test set cover the situations that may occur in practice, such as neat and cluttered handwriting, clean backgrounds and backgrounds with grid noise, different font sizes, etc.
step two, the training set data are shuffled, the whole network model is optimized with a mini-batch gradient descent algorithm, and the learning rate and the optimizer parameters are adjusted according to the actual situation;
step three, the network model is tested with the test set data, and a model that performs well on the test set is selected as the final recognition model.
The beneficial effects of the invention are as follows: the handwritten text recognition method based on local adjacent attention adopts a local adjacent attention network, so noise is effectively suppressed and attention deviation is avoided. The local adjacent attention network can be trained end to end together with the encoder-decoder framework, which improves recognition; no image preprocessing, model pre-training, or character-level labeling is required. A good recognition result is still obtained while the model is simplified and the amount of computation is reduced, so the method is efficient and highly practical.
Drawings
FIG. 1 is a schematic diagram of the handwritten text recognition method based on local adjacent attention according to the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the embodiment of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The handwritten text recognition method based on local adjacent attention adopts an encoder-decoder framework. To solve the problem of attention deviation, when the local adjacent attention network calculates the attention of a region, it considers not only the features and historical information of the current region but also the features of the adjacent regions, so that a more accurate attention region is obtained and the attention position does not deviate.
The encoder comprises a convolutional neural network and a recurrent neural network; the handwritten text image is input into the encoder to obtain the overall features output by the encoder.
The encoding stage in the encoder proceeds as follows:
first, the handwritten text image is input into the convolutional neural network to obtain a feature map;
second, the feature map is divided by columns into a sequence of features, which serve as the inputs of the recurrent neural network at successive moments;
third, the recurrent neural network outputs the overall features of the encoder.
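The second encoding step, dividing the feature map by columns, can be sketched as follows. The layout of the feature map (channels x height x width as nested lists) and the function name `split_into_columns` are assumptions for illustration; the text does not specify how the column features are arranged.

```python
def split_into_columns(feature_map):
    """Turn a C x H x W feature map (nested lists) into W column vectors.

    Each column vector concatenates all channels and rows at one horizontal
    position, giving a left-to-right sequence that a recurrent network can
    consume one moment at a time.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][h][w] for c in range(channels) for h in range(height)]
        for w in range(width)
    ]

# Toy feature map: 2 channels, 2 rows, 3 columns.
fmap = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
columns = split_into_columns(fmap)
# columns == [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
```

Each of the three resulting vectors would be the input of the recurrent neural network at one moment, in left-to-right order.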
Because the recurrent neural network has temporal memory, the overall features at each moment correspond to a character, and the order of the characters can be memorized without losing features.
The decoder comprises a recurrent neural network and a local adjacent attention network, and the output of the recurrent neural network is jointly determined by the hidden layer state at the current moment and the output of the local adjacent attention network.
The local adjacent attention network calculates the attention of each region at the current moment by referring to the feature map of that region and of its adjacent regions together with the hidden layer state of the recurrent neural network, and obtains the attention feature map at the current moment as the weighted sum of the original feature map.
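One way to read the neighbor-aware scoring above is sketched below. The text does not give a formula, so everything specific here is an assumption: the neighborhood is a window of `radius` regions on each side, the window is mean-pooled before scoring, and the score is a dot product with the hidden state. The function name `local_neighbor_attention` is likewise hypothetical.

```python
import math

def local_neighbor_attention(features, hidden, radius=1):
    """One decoding step of a neighbor-aware attention (illustrative only).

    features: list of N region feature vectors (length D each)
    hidden:   decoder hidden state (length D)
    radius:   how many regions on each side count as adjacent (assumed)
    """
    n, dim = len(features), len(features[0])
    scores = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = features[lo:hi]
        # Average the region with its neighbors before scoring (assumption).
        pooled = [sum(f[d] for f in window) / len(window) for d in range(dim)]
        scores.append(sum(p * h for p, h in zip(pooled, hidden)))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The weights are applied to the original, unpooled features.
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context

# Region 4 is an isolated spike (noise); regions 0-2 form a consistent stroke.
features = [[1.0], [1.0], [1.0], [0.0], [1.5], [0.0]]
hidden = [4.0]
plain, _ = local_neighbor_attention(features, hidden, radius=0)
local, _ = local_neighbor_attention(features, hidden, radius=1)
```

In this toy input, `radius=0` reduces to ordinary attention and puts its largest weight on the isolated spike at index 4, whereas `radius=1` moves the largest weight to the consistent run at the start. This is the noise-suppressing behavior the description attributes to considering adjacent regions, shown under the stated assumptions only.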
In the decoding stage, the overall features output by the encoder in the encoding stage are used as the input of the recurrent neural network at time zero, a start marker is used as the input at time one, the character output at each moment is used as the input at the next moment, and an end marker output by the decoder indicates the end of the recognition process.
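The decoding procedure above can be sketched as a loop. The `step` callable is a placeholder for one recurrent step plus attention, and the marker strings `<s>` and `</s>`, the `max_len` cap, and the toy model are assumptions for illustration. As a simplification, the encoder's overall features are passed in as the initial state rather than fed through a separate time-zero step.

```python
START, END = "<s>", "</s>"

def greedy_decode(step, encoder_summary, max_len=50):
    """Run the decoder until it emits END or max_len characters.

    step(state, token) -> (new_state, predicted_token) stands in for one
    recurrent step plus attention; it is a placeholder, not the real model.
    """
    state = encoder_summary        # time zero: the encoder's overall features
    token = START                  # time one: the start marker
    output = []
    for _ in range(max_len):
        state, token = step(state, token)
        if token == END:           # the end marker terminates recognition
            break
        output.append(token)       # the predicted character feeds the next step
    return "".join(output)

def toy_step(state, token):
    """Toy stand-in: spells out the string stored in `state`, then stops."""
    text, pos = state
    if pos >= len(text):
        return (text, pos), END
    return (text, pos + 1), text[pos]

result = greedy_decode(toy_step, ("hello", 0))
# result == "hello"
```

The `max_len` cap is not part of the description; it is added so that the loop terminates even if a model never emits the end marker.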
The method is trained end to end, and the parameters of the recurrent neural networks in the encoding stage and the decoding stage are not shared. During training, the input of the decoder's recurrent neural network is the characters in the label; during recognition, it is the character predicted at the previous moment. A fully trained network can recognize complex handwritten text images easily and robustly.
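Feeding the label characters to the decoder during training is commonly called teacher forcing. A minimal sketch of how the decoder inputs and targets line up for one label is given below; the marker strings and the function name `teacher_forcing_pairs` are assumptions introduced for illustration.

```python
START, END = "<s>", "</s>"

def teacher_forcing_pairs(label):
    """Build (decoder_input, target) pairs for one ground-truth string.

    At every step the input is the previous ground-truth character, not
    the model's own prediction.
    """
    inputs = [START] + list(label)
    targets = list(label) + [END]
    return list(zip(inputs, targets))

pairs = teacher_forcing_pairs("cat")
# pairs == [("<s>", "c"), ("c", "a"), ("a", "t"), ("t", "</s>")]
```

During recognition no label is available, so each input is replaced by the character the decoder predicted at the previous moment.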
The handwritten text recognition method based on local adjacent attention is implemented by the following steps:
step one, original data are acquired as a training set and a test set;
the training set and the test set cover the situations that may occur in practice, such as neat and cluttered handwriting, clean backgrounds and backgrounds with grid noise, different font sizes, etc.
step two, the training set data are shuffled, the whole network model is optimized with a mini-batch gradient descent algorithm, and the learning rate and the optimizer parameters are adjusted according to the actual situation;
step three, the network model is tested with the test set data, and a model that performs well on the test set is selected as the final recognition model.
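The three implementation steps can be sketched on a toy problem. A one-parameter linear model stands in for the recognition network, and the data, candidate learning rates, batch size, and epoch count are all assumptions chosen so the example runs quickly; none of them comes from the description.

```python
import random

def minibatches(data, batch_size, rng):
    """Shuffle a copy of the data and yield it in fixed-size batches."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    for i in range(0, len(shuffled), batch_size):
        yield shuffled[i:i + batch_size]

def train(train_set, lr, epochs=50, batch_size=4, seed=0):
    """Fit y = w * x with mini-batch gradient descent on squared error."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        for batch in minibatches(train_set, batch_size, rng):
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w

def mse(w, data):
    """Mean squared error of the model y = w * x on a data set."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Step one: toy data standing in for image/label pairs, generated from y = 3x.
train_set = [(x / 10, 3 * x / 10) for x in range(1, 21)]
test_set = [(x / 10, 3 * x / 10) for x in range(21, 26)]

# Step two: train one candidate model per learning rate.
candidates = {lr: train(train_set, lr) for lr in (0.001, 0.01, 0.1)}

# Step three: keep whichever candidate scores best on the test set.
best_lr = min(candidates, key=lambda lr: mse(candidates[lr], test_set))
best_w = candidates[best_lr]
```

Here the data are reshuffled at the start of every epoch, which is one common reading of shuffling the training set; the description does not say whether shuffling happens once or per epoch.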
Compared with the prior art, the handwritten text recognition method based on local adjacent attention has the following characteristics:
first, a local adjacent attention network is adopted that takes the neighborhood information of each region into account, so noise is effectively suppressed and attention deviation is avoided;
second, the local adjacent attention network can be trained end to end together with the encoder-decoder framework, which improves recognition, and no image preprocessing, model pre-training, or character-level labeling is required;
third, a good recognition result is still obtained while the model is simplified and the amount of computation is reduced, so the method is more efficient and highly practical.
The handwritten text recognition method based on local adjacent attention in the embodiments of the present invention has been described in detail above. The specific examples are provided only to assist in understanding the core concepts of the present invention; all other embodiments that can be obtained by those skilled in the art without departing from the spirit of the present invention shall fall within the scope of protection of the present invention.

Claims (9)

1. A handwritten text recognition method based on local adjacent attention, characterized in that: an encoder-decoder framework is adopted and, to solve the problem of attention deviation, a local adjacent attention network is used to calculate the attention of each region, taking into account not only the features and historical information of the current region but also the features of the adjacent regions, so that a more accurate attention region is obtained and the attention position does not deviate.
2. The handwritten text recognition method based on local adjacent attention according to claim 1, characterized in that: the encoder comprises a convolutional neural network and a recurrent neural network; the handwritten text image is input into the encoder to obtain the overall features output by the encoder.
3. The handwritten text recognition method based on local adjacent attention according to claim 2, characterized in that the encoding stage in the encoder comprises the following steps:
first, the handwritten text image is input into the convolutional neural network to obtain a feature map;
second, the feature map is divided by columns into a sequence of features, which serve as the inputs of the recurrent neural network at successive moments;
third, the recurrent neural network outputs the overall features of the encoder.
4. The handwritten text recognition method based on local adjacent attention according to claim 3, characterized in that: because the recurrent neural network has temporal memory, the overall features at each moment correspond to a character, and the order of the characters can be memorized without losing features.
5. The handwritten text recognition method based on local adjacent attention according to claim 1, characterized in that: the decoder comprises a recurrent neural network and a local adjacent attention network, and the output of the recurrent neural network is jointly determined by the hidden layer state at the current moment and the output of the local adjacent attention network.
6. The handwritten text recognition method based on local adjacent attention according to claim 1 or 5, characterized in that: the local adjacent attention network calculates the attention of each region at the current moment by referring to the feature map of that region and of its adjacent regions together with the hidden layer state of the recurrent neural network, and obtains the attention feature map at the current moment as the weighted sum of the original feature map.
7. The handwritten text recognition method based on local adjacent attention according to claim 6, characterized in that: in the decoding stage, the overall features output by the encoder in the encoding stage are used as the input of the recurrent neural network at time zero, a start marker is used as the input at time one, the character output at each moment is used as the input at the next moment, and an end marker output by the decoder indicates the end of the recognition process.
8. The handwritten text recognition method based on local adjacent attention according to claim 3 or 6, characterized in that: the method is trained end to end, the parameters of the recurrent neural networks in the encoding stage and the decoding stage are not shared, the input of the decoder's recurrent neural network during training is the characters in the label, and its input during recognition is the character predicted at the previous moment.
9. The handwritten text recognition method based on local adjacent attention according to claim 8, characterized by comprising the following implementation steps:
step one, original data are acquired as a training set and a test set;
step two, the training set data are shuffled, the whole network model is optimized with a mini-batch gradient descent algorithm, and the learning rate and the optimizer parameters are adjusted according to the actual situation;
step three, the network model is tested with the test set data, and a model that performs well on the test set is selected as the final recognition model.
CN201911211051.8A 2019-12-02 2019-12-02 Handwritten text recognition method based on local adjacent attention Pending CN110889385A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911211051.8A CN110889385A (en) 2019-12-02 2019-12-02 Handwritten text recognition method based on local adjacent attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911211051.8A CN110889385A (en) 2019-12-02 2019-12-02 Handwritten text recognition method based on local adjacent attention

Publications (1)

Publication Number Publication Date
CN110889385A true CN110889385A (en) 2020-03-17

Family

ID=69749847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911211051.8A Pending CN110889385A (en) 2019-12-02 2019-12-02 Handwritten text recognition method based on local adjacent attention

Country Status (1)

Country Link
CN (1) CN110889385A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190279035A1 (en) * 2016-04-11 2019-09-12 A2Ia S.A.S. Systems and methods for recognizing characters in digitized documents
CN109492679A (en) * 2018-10-24 2019-03-19 杭州电子科技大学 Based on attention mechanism and the character recognition method for being coupled chronological classification loss
CN109543667A (en) * 2018-11-14 2019-03-29 北京工业大学 A kind of text recognition method based on attention mechanism
CN109919174A (en) * 2019-01-16 2019-06-21 北京大学 A kind of character recognition method based on gate cascade attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
刘衍平: "基于深度学习的端到端场景文本识别方法研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *
姚义等: "基于深度学习的结构化图像标注研究", 《电脑知识与技术》 *
浦世亮等: "基于注意力矫正的自然场景文字识别", 《中国公共安全》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418197A (en) * 2021-01-22 2021-02-26 北京世纪好未来教育科技有限公司 Simplified image acquisition model training method, image text recognition method and related device
CN113469184A (en) * 2021-04-21 2021-10-01 华东师范大学 Character recognition method for handwritten Chinese based on multi-modal data
CN113469184B (en) * 2021-04-21 2022-08-12 华东师范大学 Character recognition method for handwritten Chinese character based on multi-mode data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200317