CN111967470A - Text recognition method and system based on decoupling attention mechanism - Google Patents

Text recognition method and system based on decoupling attention mechanism

Info

Publication number
CN111967470A
Authority
CN
China
Prior art keywords
text
neural network
layer
image
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010841738.6A
Other languages
Chinese (zh)
Inventor
朱远志
金连文
王天玮
陈晓雪
罗灿杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Original Assignee
South China University of Technology SCUT
Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT, Zhuhai Institute of Modern Industrial Innovation of South China University of Technology filed Critical South China University of Technology SCUT
Priority to CN202010841738.6A priority Critical patent/CN111967470A/en
Publication of CN111967470A publication Critical patent/CN111967470A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63Scene text, e.g. street names
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a text recognition method and system based on a decoupling attention mechanism, comprising a feature encoding module, a convolution alignment module and a text decoding module. The feature encoding module extracts visual features from the input image with a deep convolutional neural network; the convolution alignment module replaces the traditional score-based recurrent alignment module, takes multi-scale visual features from the feature encoding module as input, and generates attention maps channel by channel with a fully convolutional neural network; the text decoding module combines the feature map and the attention maps through a gated recurrent unit to obtain the final prediction result. The method is simple to implement, achieves high recognition accuracy, and is effective, flexible and robust; it performs well in a variety of text recognition tasks such as scene text recognition and handwritten text recognition, and has good practical application value.

Description

Text recognition method and system based on decoupling attention mechanism
Technical Field
The invention belongs to the technical field of pattern recognition and artificial intelligence, and in particular relates to an accurate image recognition method based on deep neural networks.
Background
In recent years, text recognition has attracted broad research interest. Thanks to deep learning and research on sequence problems, many text recognition techniques have achieved significant success. Connectionist temporal classification (CTC) and the attention mechanism are two popular approaches to the sequence problem; of the two, the attention mechanism shows the stronger performance and has been widely studied in recent years.
Attention mechanisms were first proposed to solve the machine translation problem and are increasingly used to address scene text recognition. Since then, attention-based techniques have driven much of the progress in the field of text recognition, where they are used to align and recognize characters. In previous work, the alignment operation of the attention mechanism has always been coupled with the decoding operation. Specifically, the alignment operation of conventional attention-based techniques uses two kinds of information: first, the feature map, i.e. the visual information obtained by encoding the image with an encoder; second, historical decoding information, which may be a hidden state of the recurrent process or the embedding vector of the previous decoding result. The main idea behind the attention mechanism is matching: given a part of the feature map, an attention score is computed by scoring how well that part matches the historical decoding information.
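For illustration only, the following PyTorch-style sketch shows the conventional, coupled attention scoring just described, in which every position of the feature map is matched against the decoder's historical hidden state; the class and parameter names are assumptions made for this sketch and are not taken from the invention.

```python
# Minimal sketch (not the invention) of conventional, coupled attention:
# the score at each spatial position is obtained by matching the visual
# feature against the decoder's historical decoding state.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledAttention(nn.Module):
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.w_feat = nn.Linear(feat_dim, attn_dim)    # projects visual features
        self.w_hist = nn.Linear(hidden_dim, attn_dim)  # projects historical decoding state
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, prev_hidden):
        # feats: (B, N, feat_dim) flattened feature map; prev_hidden: (B, hidden_dim)
        match = torch.tanh(self.w_feat(feats) + self.w_hist(prev_hidden).unsqueeze(1))
        alpha = F.softmax(self.score(match).squeeze(-1), dim=1)    # attention scores
        context = torch.bmm(alpha.unsqueeze(1), feats).squeeze(1)  # glimpse used for decoding
        return context, alpha
```

Because `prev_hidden` enters the score, any decoding error feeds back into the next alignment; the decoupled design removes exactly this dependency.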
Conventional attention-based techniques often face serious alignment problems, because coupling the alignment operation with the decoding operation inevitably leads to the accumulation and propagation of errors. Matching-based alignment is very sensitive to the decoding result: for example, when two similar substrings exist in a string, the attention focus can easily jump from one substring to the other through the historical decoding information. This is also why attention mechanisms have been observed in the literature to have difficulty aligning long sequences, since longer sequences are more likely to contain similar substrings. This motivates us to decouple the alignment operation from the historical decoding information and thereby alleviate this negative effect.
Disclosure of Invention
The invention aims to provide a text recognition method and system based on a decoupling attention mechanism.
In order to achieve the purpose, the invention provides the following scheme:
a text recognition method based on a decoupling attention mechanism comprises the following steps:
S1, extracting image features from the text image and encoding them to obtain a feature map;
S2, aligning the feature map to obtain a target image, constructing a deep convolutional neural network model, processing the target image with the deep convolutional neural network model to obtain attention maps, and training the model;
S3, performing accurate character recognition on the feature map and the attention maps with the deep convolutional neural network recognition model;
preferably, the text image is a scene text image and/or a handwritten text image;
preferably, the scene text images and/or handwritten text images have the following characteristics:
the scene text images comprise a scene text training data set and a real scene text evaluation data set, which together cover a variety of font styles, lighting changes and resolution changes;
the handwritten text images comprise a real handwritten text training data set and a real handwritten text evaluation data set, which contain different writing styles;
preferably, in the scene text training data set the text is complete and occupies more than two thirds of the image area; the set contains a variety of font styles and is allowed to cover lighting and resolution changes;
preferably, the real scene text evaluation data set is captured with mobile phones and dedicated camera hardware; during capture, the text in each normalized scene text image occupies more than two thirds of the image area, some tilt and blur are allowed, and the captured scene text images cover application scenarios with different font styles;
preferably, the real handwritten text training data and evaluation data are written and collected by different people, so the training data and evaluation data are independent.
Preferably, the text image alignment processing method comprises:
stretching the image data of the scene text training data set and the real scene text evaluation data set to a uniform size;
and scaling the images of the real handwritten text training data set and the real handwritten text evaluation data set while keeping the original aspect ratio, then padding the borders until the sizes are unified.
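As an illustration of the two resizing strategies just described, the following sketch stretches scene text images to a uniform size and rescales handwritten text images with the aspect ratio preserved before padding; the 128x32 target size and the white fill are assumptions made for this sketch, not values specified by the invention.

```python
# Minimal preprocessing sketch for the two resizing strategies described above.
# Target size (128x32) and padding colour are illustrative assumptions.
from PIL import Image

def resize_scene_text(img: Image.Image, size=(128, 32)) -> Image.Image:
    # Scene text images are simply stretched to a uniform size.
    return img.resize(size, Image.BILINEAR)

def resize_handwritten(img: Image.Image, size=(128, 32), fill="white") -> Image.Image:
    # Handwritten images keep their original aspect ratio and are padded
    # around the borders up to the uniform size.
    target_w, target_h = size
    scale = min(target_w / img.width, target_h / img.height)
    new_w = max(1, int(img.width * scale))
    new_h = max(1, int(img.height * scale))
    resized = img.resize((new_w, new_h), Image.BILINEAR)
    canvas = Image.new(img.mode, size, color=fill)
    canvas.paste(resized, ((target_w - new_w) // 2, (target_h - new_h) // 2))
    return canvas
```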
Preferably, the deep convolutional neural network is constructed as follows:
extracting multi-scale visual features from the feature encoding;
performing convolution and deconvolution with a fully convolutional neural network to build the deep convolutional neural network model;
in the deconvolution stage, each output feature is added to the corresponding feature map from the convolution stage;
the convolution stages perform down-sampling and the deconvolution stages perform up-sampling; except for the last deconvolution, every convolution and deconvolution is followed by a nonlinear layer using the ReLU function;
preferably, the network structure of the deep convolutional neural network model consists of an input layer, convolutional layers and residual layers;
preferably, each residual layer is composed of a first convolutional layer, a first batch normalization layer, a first nonlinear layer, a second convolutional layer, a second batch normalization layer, a down-sampling layer and a second nonlinear layer.
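A minimal PyTorch-style sketch of such a residual layer is given below, assuming 3x3 convolutions on the main branch and a 1x1 convolution plus batch normalization for the down-sampling branch; these kernel sizes and the stride are illustrative assumptions, not values fixed by the invention.

```python
# Residual layer sketch: conv -> BN -> ReLU -> conv -> BN, a conv+BN
# down-sampling branch on the shortcut, then a final ReLU.
import torch.nn as nn

class ResidualLayer(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.relu1 = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Down-sampling layer realized with a convolutional layer and a batch
        # normalization layer, matching the shortcut to the main branch.
        self.downsample = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.relu2 = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.bn2(self.conv2(self.relu1(self.bn1(self.conv1(x)))))
        return self.relu2(out + self.downsample(x))
```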
Preferably, the deep convolutional neural network model in S2 is trained with a back-propagation algorithm: the transfer gradient is computed starting from the last layer and propagated layer by layer to update all parameters of the network model;
preferably, the training strategy of the deep convolutional neural network model is supervised: a general deep network recognition model is trained with text image data and the corresponding label information;
preferably, the input of the deep convolutional neural network model is a handwritten text image and/or a scene text image, and the output is the character sequence in that handwritten text image and/or scene text image.
Preferably, the parameters of the deep convolutional neural network model training are set as follows:
the number of iterations of the deep convolutional neural network is 1,000,000;
the deep convolutional neural network optimizer is Adadelta;
the deep convolutional neural network learning rate is 1.0;
deep convolutional neural network learning rate update strategy: the learning rate is reduced to one tenth of its value at 50% and at 75% of the total number of iterations.
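A minimal sketch of this training configuration in PyTorch is shown below; the placeholder model is an assumption standing in for the recognition network, and stepping the scheduler once per iteration is one possible reading of the schedule.

```python
# Training setup sketch: Adadelta, learning rate 1.0, decayed to one tenth
# at 50% and 75% of the 1,000,000 iterations.
import torch
import torch.nn as nn

model = nn.Linear(10, 10)  # placeholder standing in for the recognition network
total_iters = 1_000_000

optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[total_iters // 2, total_iters * 3 // 4], gamma=0.1
)

for it in range(total_iters):
    # loss = compute_loss(model, batch)                       # supervised loss on labeled text images
    # optimizer.zero_grad(); loss.backward(); optimizer.step()  # back-propagation update
    scheduler.step()
```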
Preferably, the specific recognition method of S3 is as follows:
let $F_{x,y}$ denote the feature map and $\alpha_{t,x,y}$ denote the attention map at time $t$ obtained by convolution alignment; the semantic vector $c_t$ is computed by equation (1):
$$c_t = \sum_{x=1}^{W} \sum_{y=1}^{H} \alpha_{t,x,y} F_{x,y}, \qquad (1)$$
where $W$ and $H$ are the width and height of the feature map; at time $t$, the output $y_t$ is
$$y_t = W_o h_t + b_o, \qquad (2)$$
where $W_o$ and $b_o$ are learnable parameters and $h_t$ denotes the hidden state of the gated recurrent unit at time $t$;
$h_t$ is computed as
$$h_t = \mathrm{GRU}((e_{t-1}, c_t),\ h_{t-1}), \qquad (3)$$
where $e_{t-1}$ is the embedding vector of the previous output $y_{t-1}$; the final loss function is calculated as
$$\mathrm{Loss} = -\sum_{t=1}^{T} \log P(g_t \mid I, \theta), \qquad (4)$$
where $I$ denotes the input image, $\theta$ represents all learnable parameters of the deep neural network model, and $g_t$ is the ground-truth label at time $t$.
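The following PyTorch-style sketch implements one such decoding step under equations (1)-(3), with the loss of equation (4) noted in a comment; the dimensions, the embedding layer and the class name are assumptions made for illustration.

```python
# One decoding step: the attention map weights the feature map into c_t,
# a GRU cell updates the hidden state from (e_{t-1}, c_t), and a linear
# layer produces y_t.
import torch
import torch.nn as nn

class DecoderStep(nn.Module):
    def __init__(self, feat_dim, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embed = nn.Embedding(num_classes, embed_dim)         # e_{t-1}
        self.gru = nn.GRUCell(embed_dim + feat_dim, hidden_dim)   # h_t = GRU((e_{t-1}, c_t), h_{t-1})
        self.out = nn.Linear(hidden_dim, num_classes)             # y_t = W_o h_t + b_o

    def forward(self, feat_map, attn_map_t, prev_char, prev_hidden):
        # feat_map: (B, C, H, W); attn_map_t: (B, H, W), the t-th attention map
        # Equation (1): c_t = sum over (x, y) of alpha_{t,x,y} * F_{x,y}
        c_t = (feat_map * attn_map_t.unsqueeze(1)).flatten(2).sum(dim=2)   # (B, C)
        e_prev = self.embed(prev_char)                                      # (B, embed_dim)
        h_t = self.gru(torch.cat([e_prev, c_t], dim=1), prev_hidden)        # (B, hidden_dim)
        y_t = self.out(h_t)                                                 # (B, num_classes)
        return y_t, h_t

# Equation (4): the loss is the negative log-likelihood of the ground-truth
# labels g_t, e.g. nn.CrossEntropyLoss applied to y_t and summed over the T steps.
```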
A text recognition system based on a decoupling attention mechanism comprises a feature encoding module, a convolution alignment module and a text decoding module, wherein
the feature encoding module extracts visual features from the text image with a deep convolutional neural network;
the convolution alignment module takes multi-scale visual features from the feature encoding module and generates attention maps channel by channel with a deep convolutional neural network;
and the text decoding module combines the feature map and the attention maps through a gated recurrent unit to obtain the final prediction result.
Preferably, the network structure of the deep convolutional neural network unit consists of an input layer unit, a convolutional layer unit and a residual layer unit;
preferably, the residual layer unit is composed of a first convolutional layer unit, a first batch normalization layer unit, a first nonlinear layer unit, a second convolutional layer unit, a second batch normalization layer unit, a down-sampling layer unit and a second nonlinear layer unit;
preferably, the nonlinear layer units in the residual layer unit all use the ReLU activation function;
preferably, the down-sampling layer unit is realized with the convolutional layer unit and the batch normalization layer unit.
The technical effects of the invention are:
(1) The invention decouples the conventional attention module. Compared with traditional attention mechanisms, the alignment no longer depends on information fed back from the decoding stage, which avoids the accumulation and propagation of decoding errors and therefore yields higher recognition accuracy.
(2) The method is simple to use, can be easily embedded into other models, is very flexible, and can switch freely between one-dimensional and two-dimensional text.
(3) A back-propagation algorithm is adopted and the convolution kernel parameters are adjusted automatically, yielding more robust filters that can adapt to a variety of complex environments.
(4) Compared with manual processing, the invention can automatically recognize scene text and handwritten text, saving manpower and material resources.
(5) Through the decoupling attention algorithm, the invention provides more reliable alignment for the attention mechanism, and is notably more robust than traditional attention mechanisms when facing long text.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a block diagram of the deep convolutional network recognition model structure of the present invention.
FIG. 2 is a flow chart of a text recognition method based on a decoupling attention mechanism according to the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1: as shown in fig. 1, the text recognition system based on the decoupling attention mechanism includes a feature encoding module, a convolution alignment module and a text decoding module;
the feature encoding module extracts visual features from the text image with a deep convolutional neural network;
the convolution alignment module takes multi-scale visual features from the feature encoding module and generates attention maps channel by channel with a deep convolutional neural network;
and the text decoding module combines the feature map and the attention maps through a gated recurrent unit to obtain the final prediction result.
As shown in fig. 2, the text recognition method based on the decoupling attention mechanism specifically comprises the following steps:
First, feature extraction and encoding are performed on the scene text image and/or handwritten text image by the feature encoding module to form a feature map;
the scene text images comprise a scene text training data set and a real scene text evaluation data set, which together cover a variety of font styles, lighting changes and resolution changes;
the handwritten text images comprise a real handwritten text training data set and a real handwritten text evaluation data set, which contain different writing styles;
in the scene text training data set the text is complete and occupies more than two thirds of the image area; the set contains a variety of font styles and is allowed to cover a certain degree of lighting and resolution change;
the real scene text evaluation data set is captured with camera equipment such as mobile phones and dedicated hardware; during capture, the text in each normalized scene text image occupies more than two thirds of the image area, a certain degree of tilt and blur is allowed, and the captured scene text images cover application scenarios with different font styles;
the real handwritten text training data and evaluation data are written and collected by different people, so the training data and evaluation data are independent;
Second, convolution alignment is performed on the scene text image and/or handwritten text image by the convolution alignment module, whose structure is shown in Table 1:
the image data of the scene text training data set and the real scene text evaluation data set are stretched to a uniform size;
the images of the real handwritten text training data set and the real handwritten text evaluation data set are scaled while keeping the original aspect ratio, then padded at the borders until the sizes are unified;
Table 1: structure of the convolution alignment module (published as an image in the original document)
The deep convolutional neural network is constructed and trained as shown in Table 2. The construction method is as follows: based on a convolutional neural network, visual features are extracted from the scene text image and/or handwritten text image; multi-scale visual features from the feature encoding module are taken as input and passed through a fully convolutional neural network of convolutions and deconvolutions; in the deconvolution stage, each output feature is added to the corresponding feature map from the convolution stage; the convolutions perform down-sampling and the deconvolutions perform up-sampling; except for the last deconvolution, every convolution and deconvolution is followed by a nonlinear layer using the ReLU function. The number of output channels of the last deconvolution layer is maxT, whose value depends on the text type (25 for scene text, 150 for handwritten text), and the last nonlinear layer uses a Sigmoid function to keep the output attention maps between 0 and 1 (a sketch of such a module is given after Table 2 below). During training of the deep neural network model, a back-propagation algorithm is adopted: the transfer gradient is computed from the last layer and propagated layer by layer to update all parameters of the network model;
Table 2: configuration of the deep convolutional neural network (published as an image in the original document)
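A minimal PyTorch-style sketch of such a convolutional alignment module follows, assuming the feature map height and width are divisible by four; the channel widths, kernel sizes and number of stages are illustrative assumptions, and only the intermediate deconvolution is shown with a skip addition.

```python
# Fully convolutional alignment sketch: convolutions down-sample, deconvolutions
# up-sample and are added to the matching convolution-stage features; the last
# deconvolution outputs maxT channels squashed to [0, 1] by a Sigmoid.
import torch.nn as nn

class ConvAlignmentModule(nn.Module):
    def __init__(self, in_ch=512, mid_ch=64, max_t=25):  # max_t: 25 for scene text, 150 for handwriting
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, mid_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.down2 = nn.Sequential(nn.Conv2d(mid_ch, mid_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.up1 = nn.ConvTranspose2d(mid_ch, mid_ch, 4, stride=2, padding=1)
        self.relu_up1 = nn.ReLU(inplace=True)
        self.up2 = nn.ConvTranspose2d(mid_ch, max_t, 4, stride=2, padding=1)  # last deconvolution, no ReLU
        self.sigmoid = nn.Sigmoid()

    def forward(self, feat):
        # feat: (B, in_ch, H, W) visual features from the feature encoding module
        d1 = self.down1(feat)                   # convolution stage (down-sampling)
        d2 = self.down2(d1)
        u1 = self.relu_up1(self.up1(d2) + d1)   # deconvolution output added to the matching conv feature
        attn = self.sigmoid(self.up2(u1))       # (B, maxT, H, W): one attention map per decoding step
        return attn
```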
Table 3: structure of the residual layer (published as an image in the original document)
As shown in Table 3, each residual layer is composed of a first convolutional layer, a first batch normalization layer, a first nonlinear layer, a second convolutional layer, a second batch normalization layer, a down-sampling layer and a second nonlinear layer;
the nonlinear layers in the residual layer all use the ReLU activation function;
the down-sampling layer is realized with a convolutional layer and a batch normalization layer;
the deep neural network model is trained in a supervised manner: a general deep network recognition model is trained with text image data and the corresponding label information;
the input of the deep neural network model is a handwritten text image and/or a scene text image, and the output is the character sequence in that image;
the parameters for training the deep neural network model are set as follows:
the number of iterations of the deep neural network is 1,000,000;
the deep neural network optimizer is Adadelta;
the deep neural network learning rate is 1.0;
deep neural network learning rate update strategy: the learning rate is reduced to one tenth of its value at 50% and at 75% of the total number of iterations.
Third, character recognition is performed by the text decoding module: the feature map and the attention maps are taken as input, and the image is accurately recognized by the deep network recognition model based on the decoupling attention mechanism;
the specific character recognition method is as follows:
let $F_{x,y}$ denote the feature map and $\alpha_{t,x,y}$ denote the attention map at time $t$ obtained by convolution alignment; the semantic vector $c_t$ is computed by equation (1):
$$c_t = \sum_{x=1}^{W} \sum_{y=1}^{H} \alpha_{t,x,y} F_{x,y}, \qquad (1)$$
where $W$ and $H$ are the width and height of the feature map; at time $t$, the output $y_t$ is
$$y_t = W_o h_t + b_o, \qquad (2)$$
where $W_o$ and $b_o$ are learnable parameters and $h_t$ denotes the hidden state of the gated recurrent unit at time $t$;
$h_t$ is computed as
$$h_t = \mathrm{GRU}((e_{t-1}, c_t),\ h_{t-1}), \qquad (3)$$
where $e_{t-1}$ is the embedding vector of the previous output $y_{t-1}$; the final loss function is calculated as
$$\mathrm{Loss} = -\sum_{t=1}^{T} \log P(g_t \mid I, \theta), \qquad (4)$$
where $I$ denotes the input image, $\theta$ represents all learnable parameters of the deep neural network model, and $g_t$ is the ground-truth label at time $t$;
a text image is input, and the image is accurately recognized by the deep network recognition model based on the decoupling attention mechanism to obtain the characters in the text image.
The above-described embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solutions of the present invention may be made by those skilled in the art without departing from the spirit of the present invention, which is defined by the claims.

Claims (10)

1. A text recognition method based on a decoupling attention mechanism is characterized by comprising the following steps:
S1, extracting image features from the text image and encoding them to obtain a feature map;
S2, aligning the feature map to obtain a target image, constructing a deep convolutional neural network model, processing the target image with the deep convolutional neural network model to obtain attention maps, and training the model;
and S3, performing accurate character recognition on the feature map and the attention maps with the deep convolutional neural network recognition model.
2. The text recognition method based on the decoupling attention mechanism as claimed in claim 1, wherein:
the text image is a scene text image and/or a handwritten text image;
the scene text images and/or handwritten text images are characterized in that:
the scene text images comprise a scene text training data set and a real scene text evaluation data set, which together cover a variety of font styles, lighting changes and resolution changes;
the handwritten text images comprise a real handwritten text training data set and a real handwritten text evaluation data set, which contain different writing styles.
3. The text recognition method based on the decoupling attention mechanism as claimed in claim 2, wherein:
in the scene text training data set, the text is complete and occupies more than two thirds of the image area; the set contains a variety of font styles and is allowed to cover lighting and resolution changes;
the real scene text evaluation data set is captured with mobile phones and dedicated camera hardware; during capture, the text in each normalized scene text image occupies more than two thirds of the image area, tilt and blur are allowed, and the captured scene text images cover application scenarios with a variety of different font styles;
the real handwritten text training data set and the real handwritten text evaluation data set are written and collected by different people, so the training data and evaluation data are independent.
4. The text recognition method based on the decoupling attention mechanism as claimed in claim 2, wherein:
the text image alignment processing method comprises:
stretching the image data of the scene text training data set and the real scene text evaluation data set to a uniform size;
and scaling the images of the real handwritten text training data set and the real handwritten text evaluation data set while keeping the original aspect ratio, then padding the borders until the sizes are unified.
5. The text recognition method based on the decoupling attention mechanism as claimed in claim 1, wherein:
in S2, the method for constructing the deep convolutional neural network comprises:
extracting multi-scale visual features from the feature encoding;
performing convolution and deconvolution with a fully convolutional neural network to build the deep convolutional neural network model;
in the deconvolution stage, each output feature is added to the corresponding feature map from the convolution stage;
the convolution stages perform down-sampling and the deconvolution stages perform up-sampling; except for the last deconvolution, every convolution and deconvolution is followed by a nonlinear layer using the ReLU function;
the network structure of the deep convolutional neural network model consists of an input layer, convolutional layers and residual layers;
each residual layer is composed of a first convolutional layer, a first batch normalization layer, a first nonlinear layer, a second convolutional layer, a second batch normalization layer, a down-sampling layer and a second nonlinear layer.
6. The text recognition method based on the decoupling attention mechanism as claimed in claim 1, wherein:
in the training of the deep convolutional neural network model in S2, a back-propagation algorithm is adopted: the transfer gradient is computed from the last layer and propagated layer by layer to update all parameters of the network model;
the training strategy of the deep convolutional neural network model is supervised: a general deep network recognition model is trained with text image data and the corresponding label information;
and the input of the deep convolutional neural network model is the handwritten text image and/or the scene text image, and the output is the character sequence in that text image and/or scene text image.
7. The text recognition method based on the decoupling attention mechanism as claimed in claim 6, wherein:
the parameters of the deep convolutional neural network model training are set as follows:
the number of iterations of the deep convolutional neural network is 1,000,000;
the deep convolutional neural network optimizer is Adadelta;
the deep convolutional neural network learning rate is 1.0;
the deep convolutional neural network learning rate update strategy is as follows: the learning rate is reduced to one tenth of its value at 50% and at 75% of the total number of iterations.
8. The text recognition method based on the decoupling attention mechanism as claimed in claim 1, wherein:
the specific character recognition method of S3 is as follows:
let $F_{x,y}$ denote the feature map and $\alpha_{t,x,y}$ denote the attention map at time $t$ obtained by convolution alignment; the semantic vector $c_t$ is computed by equation (1):
$$c_t = \sum_{x=1}^{W} \sum_{y=1}^{H} \alpha_{t,x,y} F_{x,y}, \qquad (1)$$
where $W$ and $H$ are the width and height of the feature map; at time $t$, the output $y_t$ is
$$y_t = W_o h_t + b_o, \qquad (2)$$
where $W_o$ and $b_o$ are learnable parameters and $h_t$ denotes the hidden state of the gated recurrent unit at time $t$;
$h_t$ is computed as
$$h_t = \mathrm{GRU}((e_{t-1}, c_t),\ h_{t-1}), \qquad (3)$$
where $e_{t-1}$ is the embedding vector of the previous output $y_{t-1}$; the final loss function is calculated as
$$\mathrm{Loss} = -\sum_{t=1}^{T} \log P(g_t \mid I, \theta), \qquad (4)$$
where $I$ denotes the input image, $\theta$ represents all learnable parameters of the deep neural network model, and $g_t$ is the ground-truth label at time $t$.
9. A text recognition system based on a decoupling attention mechanism, characterized by comprising a feature encoding module, a convolution alignment module and a text decoding module, wherein
the feature encoding module extracts visual features from the text image with a deep convolutional neural network;
the convolution alignment module takes multi-scale visual features from the feature encoding module and generates attention maps channel by channel with a deep convolutional neural network;
and the text decoding module combines the feature map and the attention maps through a gated recurrent unit to obtain the final prediction result.
10. The text recognition system based on a decoupling attention mechanism of claim 9, wherein
the network structure of the deep convolutional neural network unit comprises an input layer unit, a convolutional layer unit and a residual layer unit;
the residual layer unit is composed of a first convolutional layer unit, a first batch normalization layer unit, a first nonlinear layer unit, a second convolutional layer unit, a second batch normalization layer unit, a down-sampling layer unit and a second nonlinear layer unit;
the nonlinear layer units in the residual layer unit all use the ReLU activation function;
and the down-sampling layer unit is realized with the convolutional layer unit and the batch normalization layer unit.
CN202010841738.6A 2020-08-20 2020-08-20 Text recognition method and system based on decoupling attention mechanism Pending CN111967470A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010841738.6A CN111967470A (en) 2020-08-20 2020-08-20 Text recognition method and system based on decoupling attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010841738.6A CN111967470A (en) 2020-08-20 2020-08-20 Text recognition method and system based on decoupling attention mechanism

Publications (1)

Publication Number Publication Date
CN111967470A true CN111967470A (en) 2020-11-20

Family

ID=73387925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010841738.6A Pending CN111967470A (en) 2020-08-20 2020-08-20 Text recognition method and system based on decoupling attention mechanism

Country Status (1)

Country Link
CN (1) CN111967470A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580738A (en) * 2020-12-25 2021-03-30 特赞(上海)信息科技有限公司 AttentionOCR text recognition method and device based on improvement
CN112597925A (en) * 2020-12-28 2021-04-02 作业帮教育科技(北京)有限公司 Handwritten handwriting recognition/extraction and erasing method, handwritten handwriting erasing system and electronic equipment
CN112686345A (en) * 2020-12-31 2021-04-20 江南大学 Off-line English handwriting recognition method based on attention mechanism
CN112686219A (en) * 2021-03-11 2021-04-20 北京世纪好未来教育科技有限公司 Handwritten text recognition method and computer storage medium
CN112733830A (en) * 2020-12-31 2021-04-30 上海芯翌智能科技有限公司 Shop signboard identification method and device, storage medium and computer equipment
CN113052175A (en) * 2021-03-26 2021-06-29 北京百度网讯科技有限公司 Target detection method and device, electronic equipment and readable storage medium
CN113065550A (en) * 2021-03-12 2021-07-02 国网河北省电力有限公司 Text recognition method based on self-attention mechanism
CN113158776A (en) * 2021-03-08 2021-07-23 国网河北省电力有限公司 Invoice text recognition method and device based on coding and decoding structure
CN113240056A (en) * 2021-07-12 2021-08-10 北京百度网讯科技有限公司 Multi-mode data joint learning model training method and device
CN113705730A (en) * 2021-09-24 2021-11-26 江苏城乡建设职业学院 Handwriting equation image recognition method based on convolution attention and label sampling
CN113807340A (en) * 2021-09-07 2021-12-17 南京信息工程大学 Method for recognizing irregular natural scene text based on attention mechanism
CN114170468A (en) * 2022-02-14 2022-03-11 阿里巴巴达摩院(杭州)科技有限公司 Text recognition method, storage medium and computer terminal
RU2768211C1 (en) * 2020-11-23 2022-03-23 Общество с ограниченной ответственностью "Аби Продакшн" Optical character recognition by means of combination of neural network models
CN114548067A (en) * 2022-01-14 2022-05-27 哈尔滨工业大学(深圳) Multi-modal named entity recognition method based on template and related equipment
CN117934974A (en) * 2024-03-21 2024-04-26 中国科学技术大学 Scene text task processing method, system, equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717336A (en) * 2019-09-23 2020-01-21 华南理工大学 Scene text recognition method based on semantic relevance prediction and attention decoding

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717336A (en) * 2019-09-23 2020-01-21 华南理工大学 Scene text recognition method based on semantic relevance prediction and attention decoding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG TIANWEI ET AL.: "Decoupled Attention Network for Text Recognition", 34TH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, pages 12216-12224 *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2768211C1 (en) * 2020-11-23 2022-03-23 Общество с ограниченной ответственностью "Аби Продакшн" Optical character recognition by means of combination of neural network models
US11568140B2 (en) 2020-11-23 2023-01-31 Abbyy Development Inc. Optical character recognition using a combination of neural network models
CN112580738A (en) * 2020-12-25 2021-03-30 特赞(上海)信息科技有限公司 AttentionOCR text recognition method and device based on improvement
CN112580738B (en) * 2020-12-25 2021-07-23 特赞(上海)信息科技有限公司 AttentionOCR text recognition method and device based on improvement
CN112597925A (en) * 2020-12-28 2021-04-02 作业帮教育科技(北京)有限公司 Handwritten handwriting recognition/extraction and erasing method, handwritten handwriting erasing system and electronic equipment
CN112597925B (en) * 2020-12-28 2023-08-29 北京百舸飞驰科技有限公司 Handwriting recognition/extraction and erasure method, handwriting recognition/extraction and erasure system and electronic equipment
CN112686345A (en) * 2020-12-31 2021-04-20 江南大学 Off-line English handwriting recognition method based on attention mechanism
CN112733830A (en) * 2020-12-31 2021-04-30 上海芯翌智能科技有限公司 Shop signboard identification method and device, storage medium and computer equipment
CN112686345B (en) * 2020-12-31 2024-03-15 江南大学 Offline English handwriting recognition method based on attention mechanism
CN113158776A (en) * 2021-03-08 2021-07-23 国网河北省电力有限公司 Invoice text recognition method and device based on coding and decoding structure
CN113158776B (en) * 2021-03-08 2022-11-11 国网河北省电力有限公司 Invoice text recognition method and device based on coding and decoding structure
CN112686219A (en) * 2021-03-11 2021-04-20 北京世纪好未来教育科技有限公司 Handwritten text recognition method and computer storage medium
CN113065550A (en) * 2021-03-12 2021-07-02 国网河北省电力有限公司 Text recognition method based on self-attention mechanism
CN113052175B (en) * 2021-03-26 2024-03-29 北京百度网讯科技有限公司 Target detection method, target detection device, electronic equipment and readable storage medium
CN113052175A (en) * 2021-03-26 2021-06-29 北京百度网讯科技有限公司 Target detection method and device, electronic equipment and readable storage medium
CN113240056A (en) * 2021-07-12 2021-08-10 北京百度网讯科技有限公司 Multi-mode data joint learning model training method and device
CN113807340A (en) * 2021-09-07 2021-12-17 南京信息工程大学 Method for recognizing irregular natural scene text based on attention mechanism
CN113807340B (en) * 2021-09-07 2024-03-15 南京信息工程大学 Attention mechanism-based irregular natural scene text recognition method
CN113705730A (en) * 2021-09-24 2021-11-26 江苏城乡建设职业学院 Handwriting equation image recognition method based on convolution attention and label sampling
CN114548067B (en) * 2022-01-14 2023-04-18 哈尔滨工业大学(深圳) Template-based multi-modal named entity recognition method and related equipment
CN114548067A (en) * 2022-01-14 2022-05-27 哈尔滨工业大学(深圳) Multi-modal named entity recognition method based on template and related equipment
CN114170468B (en) * 2022-02-14 2022-05-31 阿里巴巴达摩院(杭州)科技有限公司 Text recognition method, storage medium and computer terminal
CN114170468A (en) * 2022-02-14 2022-03-11 阿里巴巴达摩院(杭州)科技有限公司 Text recognition method, storage medium and computer terminal
CN117934974A (en) * 2024-03-21 2024-04-26 中国科学技术大学 Scene text task processing method, system, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111967470A (en) Text recognition method and system based on decoupling attention mechanism
CN109543667B (en) Text recognition method based on attention mechanism
CN109558832B (en) Human body posture detection method, device, equipment and storage medium
CN109726657B (en) Deep learning scene text sequence recognition method
CN112733822B (en) End-to-end text detection and identification method
CN114187450A (en) Remote sensing image semantic segmentation method based on deep learning
CN113780149A (en) Method for efficiently extracting building target of remote sensing image based on attention mechanism
CN114596500B (en) Remote sensing image semantic segmentation method based on channel-space attention and DeeplabV plus
CN108898138A (en) Scene text recognition methods based on deep learning
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN111428727B (en) Natural scene text recognition method based on sequence transformation correction and attention mechanism
CN109934272B (en) Image matching method based on full convolution network
CN111310766A (en) License plate identification method based on coding and decoding and two-dimensional attention mechanism
CN111639564A (en) Video pedestrian re-identification method based on multi-attention heterogeneous network
CN113011288A (en) Mask RCNN algorithm-based remote sensing building detection method
CN110969089A (en) Lightweight face recognition system and recognition method under noise environment
CN111079514A (en) Face recognition method based on CLBP and convolutional neural network
CN111985332A (en) Gait recognition method for improving loss function based on deep learning
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN114581905A (en) Scene text recognition method and system based on semantic enhancement mechanism
CN117079288B (en) Method and model for extracting key information for recognizing Chinese semantics in scene
AU2021104479A4 (en) Text recognition method and system based on decoupled attention mechanism
CN116758621A (en) Self-attention mechanism-based face expression depth convolution identification method for shielding people
CN117058437A (en) Flower classification method, system, equipment and medium based on knowledge distillation
CN116630610A (en) ROI region extraction method based on semantic segmentation model and conditional random field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201120