CN113807340B - Attention mechanism-based irregular natural scene text recognition method - Google Patents


Info

Publication number
CN113807340B
CN113807340B (application CN202111043808.4A)
Authority
CN
China
Prior art keywords
attention
visual
feature
layer
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111043808.4A
Other languages
Chinese (zh)
Other versions
CN113807340A (en)
Inventor
孙亚杰
曹小玲
孙莹莹
董方怡
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202111043808.4A priority Critical patent/CN113807340B/en
Publication of CN113807340A publication Critical patent/CN113807340A/en
Application granted granted Critical
Publication of CN113807340B publication Critical patent/CN113807340B/en
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an irregular natural scene text recognition method based on an attention mechanism. A natural scene text image correction module locates the shape of the text region and rectifies it into a regular text image; a feature extraction module extracts visual feature maps at different scales; an attention alignment module aligns the multi-scale visual feature maps with a fully convolutional neural network to obtain visual attention feature maps; a context feature space module applies context selection to the visual attention feature map to obtain a context feature map of the image, then concatenates the visual attention feature map with the context feature map to form a new feature space; and an attention-based sequence recognition module decodes this context feature space with an LSTM attention decoder to produce the recognition result. The method improves recognition of irregular scene text, keeps recognition accuracy unaffected by nearby text and background noise, and broadens the application scenarios of character recognition.

Description

Attention mechanism-based irregular natural scene text recognition method
Technical Field
The invention relates to a natural scene text recognition method, and in particular to an irregular natural scene text recognition method based on an attention mechanism, belonging to the technical fields of pattern recognition and artificial intelligence.
Background
With the development of information technology, artificial intelligence has become a major research focus, and natural scene text recognition, as one component of it, has attracted wide attention from researchers. Thanks to the rapid development of deep learning, many natural scene text recognition techniques have achieved significant results, particularly for regular scene text. However, scene text images are often affected by shooting conditions, yielding images of uneven quality (curved text, perspective distortion, noise, etc.) that reduce recognition accuracy. To address irregular scene text recognition, several research teams have in recent years proposed rectifying the original text image into an image with regular text using a text correction model. The rectified image, however, tends to introduce new noise that interferes with recognition accuracy. In addition, attention-based methods have had a significant impact on natural scene text recognition, but most of them suffer from alignment problems caused by the repeated use of historical decoding information.
Existing irregular scene text recognition techniques do not solve the problems of newly introduced noise interference and attention misalignment.
disclosure of Invention
To address these shortcomings, the invention provides an irregular scene text recognition method based on an attention mechanism, which aims to solve the problems identified in the background above.
The invention is realized by the following technical scheme:
an irregular natural scene text recognition method based on an attention mechanism is characterized by comprising the following steps of:
(1) Positioning the shape of the text region using a natural scene text image correction module, and rectifying the irregular natural scene text image into a regular text image;
(2) Introducing a space-channel mixed attention mechanism into ResNet to construct a feature extraction module, and extracting visual feature maps of different scales with the feature extraction module;
(3) Aligning the visual feature maps of different scales using a fully convolutional neural network to obtain a visual attention map; multiplying the visual feature map with the visual attention map to obtain a visual attention feature map;
(4) Performing context selection on the obtained visual attention feature map through a two-layer BiLSTM to obtain a context feature map of the image, and then concatenating the visual attention feature map with the context feature map to obtain a new feature space D, which contains both the visual features and the context features of the image;
(5) Decoding the feature space D with an LSTM attention decoder to obtain the recognition result.
Optionally, the specific process of step (1) is as follows:
(11) Constructing a positioning network to obtain the shape of the text region and locate fiducial points C on the upper and lower edges; the positioning network comprises 4 convolutional layers followed by 1 batch normalization layer and 2 max pooling layers, and uses the ReLU activation function;
(12) Computing the TPS transformation parameters from the fiducial points C in a grid generator to obtain a sampling grid over the text image;
(13) Inputting the sampling grid and the original image into a sampler, and sampling the grid points on the original image to obtain the rectified image.
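The sampler stage can be illustrated with a minimal NumPy sketch of bilinear grid sampling; the grid generation itself (solving the TPS system from the fiducial points C) is omitted, and the function and array names are illustrative, not taken from the patent:

```python
import numpy as np

def bilinear_sample(image, grid):
    """image: (H, W) array; grid: (H_out, W_out, 2) of (x, y) source coords."""
    H, W = image.shape
    x = np.clip(grid[..., 0], 0, W - 1)
    y = np.clip(grid[..., 1], 0, H - 1)
    x0 = np.floor(x).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    y0 = np.floor(y).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    # interpolate along x on the two neighbouring rows, then along y
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bot = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bot * wy

# Sanity check: sampling with the identity grid reproduces the image.
img = np.arange(12, dtype=float).reshape(3, 4)
xs, ys = np.meshgrid(np.arange(4.0), np.arange(3.0))
ident = np.stack([xs, ys], axis=-1)
rectified = bilinear_sample(img, ident)
```

A TPS grid would simply replace `ident` with the warped source coordinates computed from the fiducial points.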
Optionally, the positioning network, grid generator and sampler are all differentiable, so the natural scene text image correction module can update its network parameters through backpropagation.
Optionally, the specific process of step (2) is as follows:
(21) Extracting the channel attention map M_c based on a channel attention mechanism; the channel attention mechanism comprises 1 max pooling layer, 1 average pooling layer and a multi-layer perceptron, with a sigmoid activation function. The intermediate feature map F is used as the input of the max pooling layer and the average pooling layer respectively, the outputs of the two pooling layers are each forwarded to the multi-layer perceptron, and finally the channel attention map M_c is extracted:
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))   (1)
where F is the intermediate feature map; AvgPool is average pooling; MaxPool is max pooling; MLP is the multi-layer perceptron; σ is the sigmoid activation function;
(22) Multiplying the channel attention map obtained in step (21) with the intermediate feature map to obtain F':
F' = M_c(F) ⊗ F   (2)
(23) Obtaining the spatial attention map M_s based on a spatial attention mechanism; the spatial attention mechanism comprises 1 max pooling layer, 1 average pooling layer and 1 convolutional layer. With F' obtained in step (22) as input, max-pooled and average-pooled features are computed and integrated through the convolutional layer, and finally the spatial attention map M_s is obtained:
M_s(F') = σ(f^{7×7}([AvgPool(F'); MaxPool(F')]))   (3)
where f^{7×7} is a convolution operation with a 7×7 filter; σ is the ReLU activation function;
(24) Multiplying the output of step (23) with F' to obtain F'':
F'' = M_s(F') ⊗ F'   (4)
(25) Adding the input x of the overall space-channel mixed attention mechanism to F'' and applying the ReLU activation to obtain the output visual feature map F_v:
F_v = σ(F'' + x)   (5)
where σ is the ReLU activation function.
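Equations (1)-(5) describe a channel-spatial attention block in the style of CBAM; under that reading, a minimal NumPy sketch looks like the following, where the MLP weights `W1`, `W2` and the 7×7 kernel are random stand-ins, not the trained parameters of the patent's network:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
relu = lambda z: np.maximum(z, 0.0)

def channel_attention(F, W1, W2):
    # Eq. (1): M_c(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F)))
    avg = F.mean(axis=(1, 2)); mx = F.max(axis=(1, 2))   # pool over H, W
    mlp = lambda v: W2 @ relu(W1 @ v)
    return sigmoid(mlp(avg) + mlp(mx))                   # shape (C,)

def spatial_attention(Fp, kernel):
    # Eq. (3): pool along channels, 7x7 conv (zero padding), then activation
    pooled = np.stack([Fp.mean(axis=0), Fp.max(axis=0)])  # (2, H, W)
    pad = np.pad(pooled, ((0, 0), (3, 3), (3, 3)))
    H, W = Fp.shape[1:]
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = (kernel * pad[:, i:i + 7, j:j + 7]).sum()
    return sigmoid(out)                                   # shape (H, W)

C, H, W = 8, 5, 5
x = rng.standard_normal((C, H, W))        # input of the attention block
F = x                                     # intermediate feature map
Mc = channel_attention(F, rng.standard_normal((4, C)), rng.standard_normal((C, 4)))
Fp = Mc[:, None, None] * F                # Eq. (2): F' = M_c(F) * F
Ms = spatial_attention(Fp, rng.standard_normal((2, 7, 7)))
Fpp = Ms[None] * Fp                       # Eq. (4): F'' = M_s(F') * F'
Fv = relu(Fpp + x)                        # Eq. (5): residual connection + ReLU
```

Note the spatial gate here uses sigmoid; the patent's text names ReLU as σ in equation (3), so the exact activation is an interpretation.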
Optionally, the specific process of step (3) is as follows:
the method comprises the steps of coding feature graphs with different sizes by utilizing a downsampling method in a convolution process, wherein the convolution process comprises convolution layers with the same layer number and deconvolution layers, the sizes of output of each layer of convolution layers are different, and the output of each layer of deconvolution layer is added with the output of the convolution layer with the corresponding size to be used as the input of the next deconvolution layer; finally, activating the Relu function to obtain a visual attention diagram; f (F) v Representing visual feature map, A att Representing a visual attention map obtained by attention alignment, a visual attention profile V is obtained by the following formula:
Optionally, for step (4), a two-layer BiLSTM is applied on the visual feature map to output the context feature map H, and H is combined with the visual attention feature map V to obtain the new feature space D = (V, H);
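The construction of the feature space D can be sketched as a concatenation along the feature dimension; the sequence length and feature widths below are assumed values, not taken from the patent:

```python
import numpy as np

T, dv, dh = 25, 512, 256                # sequence length, feature widths (assumed)
V = np.zeros((T, dv))                   # visual attention features per time step
Hctx = np.zeros((T, dh))                # BiLSTM context features per time step
D = np.concatenate([V, Hctx], axis=1)   # new feature space D = (V, H)
```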
Optionally, step (5) is implemented as follows:
The predicted output of the decoder at time t is y_t:
y_t = softmax(W_o h_t + b_o)   (7)
where W_o and b_o are learnable parameters; h_t is the hidden state of the LSTM at time t; and softmax is the normalized exponential function. h_t is computed as:
h_t = LSTM(y_{t-1}, c_t, h_{t-1})   (8)
where y_{t-1} is the prediction at time t-1; c_t is the semantic vector; h_{t-1} is the hidden state of the LSTM at time t-1; and LSTM is the long short-term memory network.
The final loss function Loss is computed as:
Loss = -Σ_i log P(Y_i | X_i)   (9)
where X_i is a training image and Y_i the predicted label.
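Equation (7) in isolation is a softmax over a linear projection of the hidden state; a minimal NumPy sketch, with illustrative dimensions, random weights, and the LSTM recurrence of equation (8) stubbed out, is:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())            # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(2)
num_classes, hidden = 37, 256          # e.g. 26 letters + 10 digits + EOS (assumed)
W_o = rng.standard_normal((num_classes, hidden))
b_o = rng.standard_normal(num_classes)
h_t = rng.standard_normal(hidden)      # would come from LSTM(y_{t-1}, c_t, h_{t-1})
y_t = softmax(W_o @ h_t + b_o)         # Eq. (7): distribution over characters
```

The predicted character at step t is then `y_t.argmax()`, which is fed back as y_{t-1} at the next step.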
and constructing a deep convolution network model according to the content, and sending the training set into the network model for training until the network model reaches convergence.
Optionally, the training of the deep convolutional network model is set as follows:
the epoch of the deep convolution network model is 10;
the optimizer of the deep convolution network model is Adadelta;
the learning rate of the deep convolution network model is 0.1;
the number of pictures read in each batch of the depth convolution network model is 64;
the parameter initialization mode of the deep convolution network model is Kaiming initialization.
The beneficial effects brought by adopting the technical scheme are as follows:
(1) A text image correction module is introduced to improve the recognition effect of the irregular scene text;
(2) Introducing a channel-space attention mechanism so that recognition accuracy is not affected by nearby text and background noise;
drawings
FIG. 1 is a network block diagram of irregular natural scene text recognition based on an attention mechanism;
FIG. 2 is a flow chart of an irregular natural scene text recognition method based on an attention mechanism of the present invention;
fig. 3 is a network configuration diagram of the feature extractor.
Detailed Description
The following describes embodiments of the invention with reference to specific examples; other advantages and effects of the invention will be readily apparent to those skilled in the art from this disclosure. The invention may also be practiced or applied through other, different embodiments, and the details of this description may be modified or varied without departing from the spirit and scope of the invention. It should be noted that the illustrations in the following embodiments only convey the basic idea of the invention schematically, and the embodiments and their features may be combined with one another provided there is no conflict.
The drawings are for illustration only, are schematic rather than physical, and are not intended to limit the invention; to better illustrate the embodiments, some elements of the drawings may be omitted, enlarged or reduced, and do not reflect the size of the actual product; it will be appreciated by those skilled in the art that certain well-known structures and their descriptions may be omitted from the drawings.
The invention provides an irregular natural scene text recognition method based on an attention mechanism, the network structure of which is shown in FIG. 1; it comprises a natural scene text image correction module, a feature extraction module, an attention alignment module and a text decoding module;
the natural scene text image correction module positions the shape of the text region and corrects the irregular natural scene text image into a regular text image;
the feature extractor extracts visual feature graphs with different scales;
the attention mechanism alignment model uses a full convolution neural network to align the visual feature images with different scales to obtain an attention force map;
the text decoding module decodes the visual feature map and the attention map simultaneously using an LSTM attention decoder to obtain a recognition result.
As shown in FIG. 2, the attention-based irregular natural scene text recognition method comprises the following steps:
step one: a dataset is prepared, and the dataset is divided into a training dataset and a test dataset.
For training, the invention uses the synthetic dataset SynthText to train the network. Network performance is evaluated on standard public test sets, including the regular-text datasets IIIT5K, ICDAR2003 and ICDAR2013 and the irregular-text datasets SVT-Perspective, CUTE80 and ICDAR2015.
Step two: first, the irregular text image I is rectified into an image I' with regular text by the natural scene text image correction module, as follows: the image is fed into the positioning network, the text region is detected, and a set of fiducial points C on the upper and lower edges of the text is obtained; the grid generator then computes the TPS transformation parameters from the fiducial points C, and a sampling grid P = {p_i} on image I is obtained from the TPS transformation. Finally, the sampler generates the rectified image I' by bilinear interpolation at the grid points.
Step three: the rectified image I' is fed into the feature extractor to extract visual feature maps F_v of different sizes. The network structure of the feature extractor is shown in FIG. 3: the network consists of a stem convolutional layer and convolution blocks containing 3, 4, 6 and 3 convolutional layers respectively, and the input of each convolution block, after ReLU activation, is combined with a channel-spatial attention module to obtain the output sequence, so that the channel information and spatial information of the feature maps at different stages are well integrated.
Step four: the visual feature maps are fed into the attention alignment module; the feature maps of different sizes are encoded by convolution blocks, the features of different sizes are then added, via deconvolution blocks, to the matching-size features output by the convolution stage, and the attention map is obtained through ReLU activation. The attention map is multiplied with the visual feature map to obtain the visual attention feature map.
Step five: the visual feature map passes through a context selector, consisting of two BiLSTMs, to extract context information, which is then concatenated with the visual attention feature map to obtain the new feature space D.
Step six: the feature space D is input to the text decoder, which decodes each character in turn.
Given an input scene text image, the attention-based irregular natural scene text recognition model accurately recognizes the characters in the text image.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims (4)

1. An irregular natural scene text recognition method based on an attention mechanism is characterized by comprising the following steps of:
(1) Positioning the shape of the text region by using a natural scene text image correction module, and correcting an irregular natural scene text image into a regular text image;
(2) Introducing a space-channel mixed attention mechanism into ResNet to construct a feature extraction module, and using the feature extraction module to extract visual feature maps of different scales;
(3) Using a full convolution neural network to align visual feature graphs of different scales to obtain a visual attention map; multiplying the visual feature map and the visual attention map to obtain a visual attention feature map;
(4) Selecting the obtained visual attention feature map through a double-layer BiLSTM context to obtain a context feature map of the image, and then connecting the visual attention feature map with the context feature map to obtain a new feature space D, wherein the feature space D comprises visual features and context features of the image;
(5) Decoding the feature space D by using an LSTM attention decoder to obtain a recognition result;
the specific process of the step (1) is as follows:
(11) Constructing a positioning network to obtain the shape of the text region and locate fiducial points C on the upper and lower edges; the positioning network comprises 4 convolutional layers, which are followed by 1 batch normalization layer and 2 max pooling layers; the positioning network uses the ReLU activation function;
(12) Calculating TPS transformation parameters by using the datum point C at a grid generator to obtain a sampling grid on a text image;
(13) Inputting the sampling grid and the original image into a sampler, and sampling the grid points on the original image to obtain a corrected image;
the positioning network, the grid generator and the sampler are all micro, and the natural scene text image correction module updates network parameters by following back propagation;
for step (4), a two-layer BiLSTM is applied on the visual feature map to output the context feature map H, and H is combined with the visual attention feature map V to obtain the new feature space D = (V, H);
the step (5) is specifically implemented as follows:
the predicted output of the decoder at time t is y t
y t =softmax(W o h t +b o ) (7)
Wherein W is o And b o To learn parameters, h t Represents the hidden state of LSTM at time t; softmax is a normalized exponential function;
h t the calculation mode of (a) is expressed as follows:
h t =LSTM(y t-1 ,c t ,h t-1 ) (8)
Wherein y is t-1 Representing a prediction of time t-1; c t Representing a semantic vector; h is a t-1 Represents the hidden state of LSTM at time t-1; LSTM is long-term memory network;
the final Loss function Loss is calculated as follows:
wherein X is i Representing a training picture; y is Y i Representing a predictive label;
and constructing a deep convolution network model according to the content, and sending the training set into the network model for training until the network model reaches convergence.
2. The method for recognizing irregular natural scene text based on an attention mechanism according to claim 1, wherein the specific process of the step (2) is as follows:
(21) Extracting the channel attention map M_c based on a channel attention mechanism; the channel attention mechanism comprises 1 max pooling layer, 1 average pooling layer and a multi-layer perceptron, with a sigmoid activation function; the intermediate feature map F is used as the input of the max pooling layer and the average pooling layer respectively, the outputs of the two pooling layers are each forwarded to the multi-layer perceptron, and finally the channel attention map M_c is extracted:
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))   (1)
where F is the intermediate feature map; AvgPool is average pooling; MaxPool is max pooling; MLP is the multi-layer perceptron; σ is the sigmoid activation function;
(22) Multiplying the channel attention map obtained in step (21) with the intermediate feature map to obtain F':
F' = M_c(F) ⊗ F   (2)
(23) Obtaining the spatial attention map M_s based on a spatial attention mechanism; the spatial attention mechanism comprises 1 max pooling layer, 1 average pooling layer and 1 convolutional layer; with F' obtained in step (22) as input, max-pooled and average-pooled features are computed and integrated through the convolutional layer, and finally the spatial attention map M_s is obtained:
M_s(F') = σ(f^{7×7}([AvgPool(F'); MaxPool(F')]))   (3)
where f^{7×7} is a convolution operation with a 7×7 filter; σ is the ReLU activation function;
(24) Multiplying the output of step (23) with F' to obtain F'':
F'' = M_s(F') ⊗ F'   (4)
(25) Adding the input x of the overall space-channel mixed attention mechanism to F'' and applying the ReLU activation to obtain the output visual feature map F_v:
F_v = σ(F'' + x)   (5)
where σ is the ReLU activation function.
3. The method for recognizing irregular natural scene text based on an attention mechanism according to claim 1, wherein the specific process of the step (3) is as follows:
the method comprises the steps of coding feature graphs with different sizes by utilizing a downsampling method in a convolution process, wherein the convolution process comprises convolution layers with the same layer number and deconvolution layers, the sizes of output of each layer of convolution layers are different, and the output of each layer of deconvolution layer is added with the output of the convolution layer with the corresponding size to be used as the input of the next deconvolution layer; finally, activating the Relu function to obtain a visual attention diagram; f (F) v Representing visual feature map, A att Representing a visual attention map obtained by attention alignment, a visual attention profile V is obtained by the following formula:
4. The method for recognizing irregular natural scene text based on an attention mechanism according to claim 1, wherein the training of the deep convolutional network model is set as follows:
the number of epochs of the deep convolutional network model is 10;
the optimizer of the deep convolutional network model is Adadelta;
the learning rate of the deep convolutional network model is 0.1;
the number of pictures read in each batch by the deep convolutional network model is 64;
the parameters of the deep convolutional network model are initialized with Kaiming initialization.
CN202111043808.4A 2021-09-07 2021-09-07 Attention mechanism-based irregular natural scene text recognition method Active CN113807340B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111043808.4A CN113807340B (en) 2021-09-07 2021-09-07 Attention mechanism-based irregular natural scene text recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111043808.4A CN113807340B (en) 2021-09-07 2021-09-07 Attention mechanism-based irregular natural scene text recognition method

Publications (2)

Publication Number Publication Date
CN113807340A CN113807340A (en) 2021-12-17
CN113807340B (en) 2024-03-15

Family

ID=78940697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111043808.4A Active CN113807340B (en) 2021-09-07 2021-09-07 Attention mechanism-based irregular natural scene text recognition method

Country Status (1)

Country Link
CN (1) CN113807340B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241467A (en) * 2021-12-21 2022-03-25 北京有竹居网络技术有限公司 Text recognition method and related equipment thereof
CN114937277B (en) * 2022-05-18 2023-04-11 北京百度网讯科技有限公司 Image-based text acquisition method and device, electronic equipment and storage medium
CN114863407B (en) * 2022-07-06 2022-10-04 宏龙科技(杭州)有限公司 Multi-task cold start target detection method based on visual language deep fusion

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165697A (en) * 2018-10-12 2019-01-08 福州大学 A kind of natural scene character detecting method based on attention mechanism convolutional neural networks
CN109543667A (en) * 2018-11-14 2019-03-29 北京工业大学 A kind of text recognition method based on attention mechanism
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN110378334A (en) * 2019-06-14 2019-10-25 华南理工大学 A kind of natural scene text recognition method based on two dimensional character attention mechanism
CN110427938A (en) * 2019-07-26 2019-11-08 中科视语(北京)科技有限公司 A kind of irregular character recognition device and method based on deep learning
CN111967470A (en) * 2020-08-20 2020-11-20 华南理工大学 Text recognition method and system based on decoupling attention mechanism
CN111985369A (en) * 2020-08-07 2020-11-24 西北工业大学 Course field multi-modal document classification method based on cross-modal attention convolution neural network
CN112215236A (en) * 2020-10-21 2021-01-12 科大讯飞股份有限公司 Text recognition method and device, electronic equipment and storage medium
CN112329760A (en) * 2020-11-17 2021-02-05 内蒙古工业大学 Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network
WO2021115490A1 (en) * 2020-06-22 2021-06-17 平安科技(深圳)有限公司 Seal character detection and recognition method, device, and medium for complex environments
CN113011304A (en) * 2021-03-12 2021-06-22 山东大学 Human body posture estimation method and system based on attention multi-resolution network
CN113343707A (en) * 2021-06-04 2021-09-03 北京邮电大学 Scene text recognition method based on robustness characterization learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3598339A1 (en) * 2018-07-19 2020-01-22 Tata Consultancy Services Limited Systems and methods for end-to-end handwritten text recognition using neural networks


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Representative Batch Normalization for Scene Text Recognition; Yajie Sun et al.; KSII Transactions on Internet and Information Systems; Vol. 16, No. 07; 2390-2406 *
What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis; Jeonghun Baek et al.; arXiv:1904.01906v4; 20191218; 1-19 *
Scene Chinese Text Recognition Based on a Dual Attention Mechanism; Chen Xuanying; China Master's Theses Full-text Database, Information Science and Technology; 20210215; I138-1782 *
Zhu Qingtang et al. Biofabrication and Clinical Evaluation of Repair Materials for Peripheral Nerve Defects. Sun Yat-sen University Press, 2018, pp. 136-139. *

Also Published As

Publication number Publication date
CN113807340A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN113807340B (en) Attention mechanism-based irregular natural scene text recognition method
CN109948691B (en) Image description generation method and device based on depth residual error network and attention
CN110414498B (en) Natural scene text recognition method based on cross attention mechanism
CN111160533A (en) Neural network acceleration method based on cross-resolution knowledge distillation
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN114782694B (en) Unsupervised anomaly detection method, system, device and storage medium
CN111428727B (en) Natural scene text recognition method based on sequence transformation correction and attention mechanism
CN112070114B (en) Scene character recognition method and system based on Gaussian constraint attention mechanism network
CN112800876A (en) Method and system for embedding hypersphere features for re-identification
CN111598842A (en) Method and system for generating model of insulator defect sample and storage medium
CN111368773A (en) Mathematical formula identification method and device, terminal equipment and readable storage medium
CN115393396B (en) Unmanned aerial vehicle target tracking method based on mask pre-training
CN113160246A (en) Image semantic segmentation method based on depth supervision
CN115620010A (en) Semantic segmentation method for RGB-T bimodal feature fusion
CN113591978A (en) Image classification method, device and storage medium based on confidence penalty regularization self-knowledge distillation
CN116228792A (en) Medical image segmentation method, system and electronic device
CN114565628B (en) Image segmentation method and system based on boundary perception attention
CN116188509A (en) High-efficiency three-dimensional image segmentation method
CN116363149A (en) Medical image segmentation method based on U-Net improvement
CN115240259A (en) Face detection method and face detection system based on YOLO deep network in classroom environment
CN114581918A (en) Text recognition model training method and device
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN111814693A (en) Marine ship identification method based on deep learning
CN116416649A (en) Video pedestrian re-identification method based on multi-scale resolution alignment
CN113256528B (en) Low-illumination video enhancement method based on multi-scale cascade depth residual error network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant