CN113807340B - Attention mechanism-based irregular natural scene text recognition method - Google Patents
- Publication number
- CN113807340B (application CN202111043808.4A)
- Authority
- CN
- China
- Prior art keywords
- attention
- visual
- feature
- layer
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an irregular natural scene text recognition method based on an attention mechanism. A natural scene text image correction module locates the shape of the text region and corrects it into a regular text image; a feature extraction module extracts visual feature maps of different scales; an attention alignment module aligns the multi-scale visual feature maps with a fully convolutional neural network to obtain a visual attention feature map; a context feature space module applies context selection to the visual attention feature map to obtain a context feature map of the image, then concatenates the visual attention feature map with the context feature map to obtain a new feature space; and an attention sequence recognition module decodes the resulting context feature space with an LSTM attention decoder to obtain the recognition result. The method improves recognition of irregular scene text, keeps recognition accuracy unaffected by nearby text and background noise, and broadens the application scenarios of character recognition.
Description
Technical Field
The invention relates to a natural scene text recognition method, in particular to an irregular natural scene text recognition method based on an attention mechanism, and belongs to the technical field of pattern recognition and artificial intelligence.
Background
With the development of information technology, artificial intelligence has become a research hotspot, and natural scene text recognition, as a part of artificial intelligence, has attracted great attention from researchers. Thanks to the rapid development of deep learning, many natural scene text recognition techniques have achieved significant results, particularly for regular scene text. However, scene text images are often affected by shooting conditions, resulting in uneven image quality — curved text, perspective text, noise and the like — which degrades recognition accuracy. To address irregular scene text recognition, many research teams have in recent years proposed text rectification models that correct the original text image into an image with regular text. The corrected image, however, is prone to introducing new noise that interferes with recognition accuracy. In addition, attention-based methods have had a significant influence on natural scene text recognition, but most of them suffer from alignment problems caused by the repeated use of historical decoding information.
The existing irregular scene text recognition techniques do not solve the problems of newly introduced noise interference and attention misalignment.
Disclosure of the Invention
To address the above defects, the invention provides an irregular scene text recognition method based on an attention mechanism, aiming to solve the problems identified in the background art.
The invention is realized by the following technical scheme:
An irregular natural scene text recognition method based on an attention mechanism, characterized by comprising the following steps:
(1) Locating the shape of the text region with a natural scene text image correction module, and correcting the irregular natural scene text image into a regular text image;
(2) Introducing a spatial-channel mixed attention mechanism into ResNet to construct a feature extraction module, and extracting visual feature maps of different scales with the feature extraction module;
(3) Aligning the visual feature maps of different scales with a fully convolutional neural network to obtain a visual attention map, and multiplying the visual feature map with the visual attention map to obtain a visual attention feature map;
(4) Passing the visual attention feature map through a two-layer BiLSTM context selector to obtain a context feature map of the image, and then concatenating the visual attention feature map with the context feature map to obtain a new feature space D, which comprises both the visual and the context features of the image;
(5) Decoding the feature space D with an LSTM attention decoder to obtain the recognition result.
Optionally, the specific process of step (1) is as follows:
(11) Constructing a positioning network to acquire the shape of the text region and locate reference points C on the upper and lower edges; the positioning network comprises 4 convolutional layers followed by 1 batch normalization layer and 2 max-pooling layers, and uses the ReLU activation function;
(12) Calculating the TPS transformation parameters from the reference points C in a grid generator to obtain a sampling grid on the text image;
(13) Inputting the sampling grid and the original image into a sampler, which samples the grid points on the original image to obtain the corrected image.
Optionally, the positioning network, the grid generator and the sampler are all differentiable, so the natural scene text image correction module updates its network parameters through back-propagation.
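As an illustration of the sampling in step (13), bilinear sampling at fractional grid locations can be sketched in plain NumPy. The function name `bilinear_sample` and the flat `(y, x)` grid layout are simplifying assumptions for exposition; the actual sampler operates on full images with TPS-generated grids.

```python
import numpy as np

def bilinear_sample(image, grid):
    """Sample a single-channel image at fractional (y, x) locations via
    bilinear interpolation, as the TPS sampler does with its grid.
    image: (H, W) array; grid: (N, 2) array of (y, x) coordinates."""
    H, W = image.shape
    y, x = grid[:, 0], grid[:, 1]
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    dy, dx = y - y0, x - x0
    # Weighted sum of the four surrounding pixels.
    return (image[y0, x0] * (1 - dy) * (1 - dx)
            + image[y0 + 1, x0] * dy * (1 - dx)
            + image[y0, x0 + 1] * (1 - dy) * dx
            + image[y0 + 1, x0 + 1] * dy * dx)
```

Because every term is a smooth function of the grid coordinates, this operation is differentiable, which is what allows the correction module to be trained end to end with back-propagation.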
Optionally, the specific process of step (2) is as follows:
(21) Extracting the channel attention map M_c based on a channel attention mechanism; the channel attention mechanism comprises 1 max-pooling layer, 1 average-pooling layer and a multi-layer perceptron, with a sigmoid activation function; the intermediate feature map F is fed into the max-pooling layer and the average-pooling layer respectively, the two pooled outputs are each forwarded to the multi-layer perceptron, and finally the channel attention map M_c is extracted:

M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))    formula (1)

wherein F denotes the intermediate feature map; AvgPool is average pooling; MaxPool is max pooling; MLP is the multi-layer perceptron; σ denotes the sigmoid activation function;
(22) Multiplying the channel attention map obtained in step (21) with the intermediate feature map to obtain F′:

F′ = M_c(F) ⊗ F    formula (2)

wherein ⊗ denotes element-wise multiplication;
(23) Obtaining the spatial attention map M_s based on a spatial attention mechanism; the spatial attention mechanism comprises 1 max-pooling layer, 1 average-pooling layer and 1 convolution layer; the F′ obtained in step (22) is taken as input to produce max-pooled and average-pooled features, which are integrated by the convolution layer to obtain the spatial attention map M_s:

M_s(F′) = σ(f_{7×7}([AvgPool(F′); MaxPool(F′)]))    formula (3)

wherein f_{7×7} is a convolution operation with a 7×7 filter; σ denotes the ReLU activation function;
(24) Multiplying the output of step (23) with said F′ to obtain F″:

F″ = M_s(F′) ⊗ F′    formula (4)

(25) Adding the input x of the overall spatial-channel mixed attention mechanism to F″ and applying the ReLU activation function yields the output visual feature map F_v:

F_v = σ(F″ + x)    formula (5)

wherein σ denotes the ReLU activation function.
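The spatial-channel mixed attention process of steps (21)-(25) can be sketched as follows. The function names, the tiny two-layer perceptron weights `W1`/`W2`, and the replacement of the 7×7 convolution in formula (3) by a simple average (with a sigmoid, as in the common CBAM formulation) are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W1, W2):
    """Channel attention M_c (formula 1): a shared two-layer perceptron
    over the average- and max-pooled channel descriptors. F: (C, H, W)."""
    avg = F.mean(axis=(1, 2))                    # AvgPool -> (C,)
    mx = F.max(axis=(1, 2))                      # MaxPool -> (C,)
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0.0)  # shared MLP
    return sigmoid(mlp(avg) + mlp(mx))           # (C,)

def spatial_attention(Fp):
    """Spatial attention M_s (formula 3); the 7x7 convolution over the
    stacked pooled maps is replaced by their average for brevity."""
    avg = Fp.mean(axis=0)                        # (H, W)
    mx = Fp.max(axis=0)                          # (H, W)
    return sigmoid((avg + mx) / 2.0)             # stand-in for f_{7x7}

def mixed_attention_block(x, F, W1, W2):
    """Residual spatial-channel block (formulas 2, 4, 5): F -> F' -> F'' -> F_v."""
    Fp = channel_attention(F, W1, W2)[:, None, None] * F   # F'  (formula 2)
    Fpp = spatial_attention(Fp)[None, :, :] * Fp           # F'' (formula 4)
    return np.maximum(Fpp + x, 0.0)                        # F_v (formula 5), ReLU
```

The broadcasting (`[:, None, None]` for channels, `[None, :, :]` for spatial positions) mirrors how a per-channel weight and a per-pixel weight each rescale the full feature tensor.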
Optionally, the specific process of step (3) is as follows:
Feature maps of different sizes are encoded by downsampling during convolution; the convolution process comprises an equal number of convolution layers and deconvolution layers, each convolution layer producing an output of a different size, and the output of each deconvolution layer is added to the convolution output of the corresponding size and used as the input of the next deconvolution layer; finally a ReLU activation yields the visual attention map. With F_v denoting the visual feature map and A_att the visual attention map obtained by attention alignment, the visual attention feature map V is obtained by the following formula:

V = A_att ⊗ F_v    formula (6)
Optionally, for step (4), a two-layer BiLSTM is applied to the visual feature map to output a context feature map H, and H is combined with the visual attention feature map V to obtain the new feature space D = (V, H);
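The construction of the feature space D amounts to concatenating the two feature maps along the feature axis. A minimal sketch with made-up dimensions (`T` positions, `d_v`-dimensional visual attention features, `d_h`-dimensional BiLSTM context features — all hypothetical):

```python
import numpy as np

T, d_v, d_h = 8, 32, 64
rng = np.random.default_rng(0)
V = rng.standard_normal((T, d_v))   # visual attention features (stand-in)
H = rng.standard_normal((T, d_h))   # BiLSTM context features (stand-in)
D = np.concatenate([V, H], axis=1)  # feature space D = (V, H)
```

Each of the T positions in D now carries both its visual evidence and its sequential context, which is what the decoder in step (5) consumes.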
Optionally, step (5) is implemented as follows:
The predicted output of the decoder at time t is y_t:

y_t = softmax(W_o h_t + b_o)    formula (7)

wherein W_o and b_o are learnable parameters; h_t denotes the hidden state of the LSTM at time t; softmax is the normalized exponential function. h_t is computed as follows:

h_t = LSTM(y_{t-1}, c_t, h_{t-1})    formula (8)

wherein y_{t-1} denotes the prediction at time t-1; c_t denotes the semantic vector; h_{t-1} denotes the hidden state of the LSTM at time t-1; LSTM is the long short-term memory network.
The final loss function Loss is calculated as follows:

Loss = -Σ_i log P(Y_i | X_i)    formula (9)

wherein X_i denotes a training picture and Y_i its corresponding label;
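Formulas (7)-(9) can be sketched numerically as follows; `decode_step` and `sequence_loss` are illustrative names, and the LSTM recurrence of formula (8) together with the attention context c_t is left abstract:

```python
import numpy as np

def softmax(z):
    """Normalized exponential function, stabilized by subtracting the max."""
    e = np.exp(z - z.max())
    return e / e.sum()

def decode_step(h_t, W_o, b_o):
    """One decoding step (formula 7): y_t = softmax(W_o h_t + b_o)."""
    return W_o @ h_t + b_o, None  # placeholder; replaced below

def decode_step(h_t, W_o, b_o):
    """One decoding step (formula 7): y_t = softmax(W_o h_t + b_o)."""
    return softmax(W_o @ h_t + b_o)

def sequence_loss(step_probs, labels):
    """Formula (9) read as the negative log-likelihood of the
    ground-truth characters over the decoded sequence."""
    return -sum(np.log(p[l]) for p, l in zip(step_probs, labels))
```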
and constructing a deep convolution network model according to the content, and sending the training set into the network model for training until the network model reaches convergence.
Optionally, the training of the deep convolutional network model is set as follows:
the epoch of the deep convolution network model is 10;
the optimizer of the deep convolution network model is Adadelta;
the learning rate of the deep convolution network model is 0.1;
the number of pictures read in each batch of the depth convolution network model is 64;
the parameter initialization mode of the deep convolution network model is Kaiming initialization.
The beneficial effects brought by adopting the technical scheme are as follows:
(1) A text image correction module is introduced to improve the recognition effect of the irregular scene text;
(2) Introducing a channel-space attention mechanism so that recognition accuracy is not affected by nearby text and background noise;
drawings
FIG. 1 is a network block diagram of irregular natural scene text recognition based on an attention mechanism;
FIG. 2 is a flow chart of an irregular natural scene text recognition method based on an attention mechanism of the present invention;
fig. 3 is a network configuration diagram of the feature extractor.
Detailed Description
Other advantages and effects of the present invention will become readily apparent to those skilled in the art from the disclosure herein, which describes embodiments of the invention by way of specific examples. The invention may also be practiced or applied through other, different embodiments, and the details of this description may be modified or varied in various respects without departing from the spirit and scope of the invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the invention schematically, and the following embodiments and the features therein may be combined with one another in the absence of conflict.
The drawings are for illustrative purposes only, are schematic rather than physical, and are not intended to limit the invention; for better illustration of the embodiments, certain elements of the drawings may be omitted, enlarged or reduced, and do not represent the size of the actual product; it will be appreciated by those skilled in the art that certain well-known structures and their descriptions may be omitted from the drawings.
The invention provides an irregular natural scene text recognition method based on an attention mechanism, wherein the network structure of the irregular natural scene text recognition method is shown in figure 1, and the irregular natural scene text recognition method comprises a natural scene text image correction module, a feature extraction module, an attention alignment module and a text decoding module;
the natural scene text image correction module positions the shape of the text region and corrects the irregular natural scene text image into a regular text image;
the feature extractor extracts visual feature graphs with different scales;
the attention mechanism alignment model uses a full convolution neural network to align the visual feature images with different scales to obtain an attention force map;
the text decoding module decodes the visual feature map and the attention map simultaneously using an LSTM attention decoder to obtain a recognition result.
As shown in fig. 2, the irregular natural scene text recognition system based on the attention mechanism includes the steps of:
step one: a dataset is prepared, and the dataset is divided into a training dataset and a test dataset.
For the training data set, the invention uses the synthetic dataset SynthText to train the network. Network performance is evaluated on common benchmark test sets, including the regular-text datasets IIIT5K, ICDAR2003 and ICDAR2013 and the irregular-text datasets SVT-Perspective, CUTE80 and ICDAR2015.
Step two: firstly, an irregular text image I is corrected into an image I' with regular text by using a natural scene text image correction module, and the implementation process is as follows: inputting the image into a positioning network, detecting a text region of the image, and acquiring a group of datum points C of the upper edge and the lower edge of the text; then, the grid generator calculates TPS transformation parameters by using the reference point C, and a grid sampler P= { P on the image I is obtained according to TPS transformation i }. And finally, generating a corrected image I' on the sampler by performing bilinear sample insertion on the pixel points on the grid generator.
Step three: inputting the corrected image I' into a feature extractor to extract visual feature images F with different sizes v . The network structure of the feature extractor is shown in fig. 3, the network is composed of a basic convolution layer and 5 convolution blocks, each convolution block respectively comprises 3, 4, 6 and 3 layers of convolutions, and the input of each convolution block is spliced with a channel-space attention module after being activated by a Relu function to obtain an output sequence, so that the channel information and the space information of feature diagrams at different stages can be well combined.
Step four: the visual feature map is input into an attention mechanism alignment model, the feature maps with different sizes are encoded by using a convolution block, then the features with different sizes are added with the corresponding size features output by the convolution stage by using a deconvolution block, and then the attention map is obtained through the activation of a Relu function. The visual attention profile is multiplied by the visual attention profile.
Step five: the visual feature map extracts the context information via a context selector, which consists of two bilstms, and then connects the context information and the visual attention feature map to obtain a new feature space D.
Step six: the feature space D is input to a text decoder for decoding each character in turn.
Given an input scene text image, the attention-based irregular natural scene text recognition model accurately recognizes it and outputs the characters in the text image.
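The overall flow of steps two through six can be summarized as one function composition; every module here is an abstract callable passed in by the caller, and all names are placeholders rather than the patented implementation:

```python
def recognize(image, rectify, extract, align, context, decode):
    """End-to-end flow of the described pipeline (steps two to six)."""
    rectified = rectify(image)      # step two: TPS rectification
    F_v = extract(rectified)        # step three: visual feature maps
    A_att = align(F_v)              # step four: attention alignment
    V = A_att * F_v                 # visual attention feature map
    H = context(F_v)                # step five: BiLSTM context features
    return decode((V, H))           # step six: decode the feature space D
```

With stub callables (identity functions and a constant attention weight) the composition can be traced end to end before any real modules are trained.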
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.
Claims (4)
1. An irregular natural scene text recognition method based on an attention mechanism, characterized by comprising the following steps:
(1) Locating the shape of the text region with a natural scene text image correction module, and correcting the irregular natural scene text image into a regular text image;
(2) Introducing a spatial-channel mixed attention mechanism into ResNet to construct a feature extraction module, and extracting visual feature maps of different scales with the feature extraction module;
(3) Aligning the visual feature maps of different scales with a fully convolutional neural network to obtain a visual attention map, and multiplying the visual feature map with the visual attention map to obtain a visual attention feature map;
(4) Passing the visual attention feature map through a two-layer BiLSTM context selector to obtain a context feature map of the image, and then concatenating the visual attention feature map with the context feature map to obtain a new feature space D, which comprises both the visual and the context features of the image;
(5) Decoding the feature space D with an LSTM attention decoder to obtain the recognition result;
the specific process of the step (1) is as follows:
(11) Constructing a positioning network to acquire the shape of the text region and locate reference points C on the upper and lower edges; the positioning network comprises 4 convolutional layers followed by 1 batch normalization layer and 2 max-pooling layers, and uses the ReLU activation function;
(12) Calculating the TPS transformation parameters from the reference points C in a grid generator to obtain a sampling grid on the text image;
(13) Inputting the sampling grid and the original image into a sampler, and sampling the grid points on the original image to obtain a corrected image;
the positioning network, the grid generator and the sampler are all micro, and the natural scene text image correction module updates network parameters by following back propagation;
for the step (4), a two-layer BiLSTM is adopted on the visual feature map to output a context feature map H, and the context feature map H and the visual attention feature map V are combined to obtain a new feature space D= (V, H);
the step (5) is specifically implemented as follows:
the predicted output of the decoder at time t is y t :
y t =softmax(W o h t +b o ) (7)
Wherein W is o And b o To learn parameters, h t Represents the hidden state of LSTM at time t; softmax is a normalized exponential function;
h t the calculation mode of (a) is expressed as follows:
h t =LSTM(y t-1 ,c t ,h t-1 ) (8)
Wherein y is t-1 Representing a prediction of time t-1; c t Representing a semantic vector; h is a t-1 Represents the hidden state of LSTM at time t-1; LSTM is long-term memory network;
the final loss function Loss is calculated as follows:

Loss = -Σ_i log P(Y_i | X_i)    formula (9)

wherein X_i denotes a training picture and Y_i its corresponding label;
and constructing a deep convolution network model according to the content, and sending the training set into the network model for training until the network model reaches convergence.
2. The method for recognizing irregular natural scene text based on an attention mechanism according to claim 1, wherein the specific process of the step (2) is as follows:
(21) Extracting the channel attention map M_c based on a channel attention mechanism; the channel attention mechanism comprises 1 max-pooling layer, 1 average-pooling layer and a multi-layer perceptron, with a sigmoid activation function; the intermediate feature map F is fed into the max-pooling layer and the average-pooling layer respectively, the two pooled outputs are each forwarded to the multi-layer perceptron, and finally the channel attention map M_c is extracted:

M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))    formula (1)

wherein F denotes the intermediate feature map; AvgPool is average pooling; MaxPool is max pooling; MLP is the multi-layer perceptron; σ denotes the sigmoid activation function;
(22) Multiplying the channel attention map obtained in step (21) with the intermediate feature map to obtain F′:

F′ = M_c(F) ⊗ F    formula (2)

wherein ⊗ denotes element-wise multiplication;
(23) Obtaining the spatial attention map M_s based on a spatial attention mechanism; the spatial attention mechanism comprises 1 max-pooling layer, 1 average-pooling layer and 1 convolution layer; the F′ obtained in step (22) is taken as input to produce max-pooled and average-pooled features, which are integrated by the convolution layer to obtain the spatial attention map M_s:

M_s(F′) = σ(f_{7×7}([AvgPool(F′); MaxPool(F′)]))    formula (3)

wherein f_{7×7} is a convolution operation with a 7×7 filter; σ denotes the ReLU activation function;
(24) Multiplying the output of step (23) with said F′ to obtain F″:

F″ = M_s(F′) ⊗ F′    formula (4)

(25) Adding the input x of the overall spatial-channel mixed attention mechanism to F″ and applying the ReLU activation function yields the output visual feature map F_v:

F_v = σ(F″ + x)    formula (5)

wherein σ denotes the ReLU activation function.
3. The method for recognizing irregular natural scene text based on an attention mechanism according to claim 1, wherein the specific process of the step (3) is as follows:
the method comprises the steps of coding feature graphs with different sizes by utilizing a downsampling method in a convolution process, wherein the convolution process comprises convolution layers with the same layer number and deconvolution layers, the sizes of output of each layer of convolution layers are different, and the output of each layer of deconvolution layer is added with the output of the convolution layer with the corresponding size to be used as the input of the next deconvolution layer; finally, activating the Relu function to obtain a visual attention diagram; f (F) v Representing visual feature map, A att Representing a visual attention map obtained by attention alignment, a visual attention profile V is obtained by the following formula:
4. the method for recognizing irregular natural scene text based on an attention mechanism according to claim 1, wherein the training of the deep convolutional network model is set as follows:
the epoch of the deep convolution network model is 10;
the optimizer of the deep convolution network model is Adadelta;
the learning rate of the deep convolution network model is 0.1;
the number of pictures read in each batch by the deep convolutional network model is 64;
the parameter initialization mode of the deep convolution network model is Kaiming initialization.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111043808.4A CN113807340B (en) | 2021-09-07 | 2021-09-07 | Attention mechanism-based irregular natural scene text recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113807340A CN113807340A (en) | 2021-12-17 |
CN113807340B (en) | 2024-03-15
Family
ID=78940697
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111043808.4A Active CN113807340B (en) | 2021-09-07 | 2021-09-07 | Attention mechanism-based irregular natural scene text recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113807340B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114241467A (en) * | 2021-12-21 | 2022-03-25 | 北京有竹居网络技术有限公司 | Text recognition method and related equipment thereof |
CN114937277B (en) * | 2022-05-18 | 2023-04-11 | 北京百度网讯科技有限公司 | Image-based text acquisition method and device, electronic equipment and storage medium |
CN114863407B (en) * | 2022-07-06 | 2022-10-04 | 宏龙科技(杭州)有限公司 | Multi-task cold start target detection method based on visual language deep fusion |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109165697A (en) * | 2018-10-12 | 2019-01-08 | 福州大学 | A kind of natural scene character detecting method based on attention mechanism convolutional neural networks |
CN109543667A (en) * | 2018-11-14 | 2019-03-29 | 北京工业大学 | A kind of text recognition method based on attention mechanism |
WO2019192397A1 (en) * | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape |
CN110378334A (en) * | 2019-06-14 | 2019-10-25 | 华南理工大学 | A kind of natural scene text recognition method based on two dimensional character attention mechanism |
CN110427938A (en) * | 2019-07-26 | 2019-11-08 | 中科视语(北京)科技有限公司 | A kind of irregular character recognition device and method based on deep learning |
CN111967470A (en) * | 2020-08-20 | 2020-11-20 | 华南理工大学 | Text recognition method and system based on decoupling attention mechanism |
CN111985369A (en) * | 2020-08-07 | 2020-11-24 | 西北工业大学 | Course field multi-modal document classification method based on cross-modal attention convolution neural network |
CN112215236A (en) * | 2020-10-21 | 2021-01-12 | 科大讯飞股份有限公司 | Text recognition method and device, electronic equipment and storage medium |
CN112329760A (en) * | 2020-11-17 | 2021-02-05 | 内蒙古工业大学 | Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network |
WO2021115490A1 (en) * | 2020-06-22 | 2021-06-17 | 平安科技(深圳)有限公司 | Seal character detection and recognition method, device, and medium for complex environments |
CN113011304A (en) * | 2021-03-12 | 2021-06-22 | 山东大学 | Human body posture estimation method and system based on attention multi-resolution network |
CN113343707A (en) * | 2021-06-04 | 2021-09-03 | 北京邮电大学 | Scene text recognition method based on robustness characterization learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3598339A1 (en) * | 2018-07-19 | 2020-01-22 | Tata Consultancy Services Limited | Systems and methods for end-to-end handwritten text recognition using neural networks |
- 2021-09-07 CN CN202111043808.4A patent/CN113807340B/en Active
Non-Patent Citations (4)
Title |
---|
Representative Batch Normalization for Scene Text Recognition; Yajie Sun et al.; KSII Transactions on Internet and Information Systems; Vol. 16, No. 07; pp. 2390-2406 *
What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis; Jeonghun Baek et al.; arXiv:1904.01906v4; 2019-12-18; pp. 1-19 *
Scene Chinese Text Recognition Based on a Dual Attention Mechanism; Chen Xuanying; China Master's Theses Full-text Database, Information Science and Technology; 2021-02-15; I138-1782 *
Zhu Qingtang et al.; Biofabrication and Clinical Evaluation of Materials for Peripheral Nerve Defect Repair; Sun Yat-sen University Press; 2018; pp. 136-139 *
Also Published As
Publication number | Publication date |
---|---|
CN113807340A (en) | 2021-12-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113807340B (en) | Attention mechanism-based irregular natural scene text recognition method | |
CN109948691B (en) | Image description generation method and device based on depth residual error network and attention | |
CN110414498B (en) | Natural scene text recognition method based on cross attention mechanism | |
CN111160533A (en) | Neural network acceleration method based on cross-resolution knowledge distillation | |
CN111950453A (en) | Optional-shape text recognition method based on selective attention mechanism | |
CN114782694B (en) | Unsupervised anomaly detection method, system, device and storage medium | |
CN111428727B (en) | Natural scene text recognition method based on sequence transformation correction and attention mechanism | |
CN112070114B (en) | Scene character recognition method and system based on Gaussian constraint attention mechanism network | |
CN112800876A (en) | Method and system for embedding hypersphere features for re-identification | |
CN111598842A (en) | Method and system for generating model of insulator defect sample and storage medium | |
CN111368773A (en) | Mathematical formula identification method and device, terminal equipment and readable storage medium | |
CN115393396B (en) | Unmanned aerial vehicle target tracking method based on mask pre-training | |
CN113160246A (en) | Image semantic segmentation method based on depth supervision | |
CN115620010A (en) | Semantic segmentation method for RGB-T bimodal feature fusion | |
CN113591978A (en) | Image classification method, device and storage medium based on confidence penalty regularization self-knowledge distillation | |
CN116228792A (en) | Medical image segmentation method, system and electronic device | |
CN114565628B (en) | Image segmentation method and system based on boundary perception attention | |
CN116188509A (en) | High-efficiency three-dimensional image segmentation method | |
CN116363149A (en) | Medical image segmentation method based on U-Net improvement | |
CN115240259A (en) | Face detection method and face detection system based on YOLO deep network in classroom environment | |
CN114581918A (en) | Text recognition model training method and device | |
CN111310820A (en) | Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration | |
CN111814693A (en) | Marine ship identification method based on deep learning | |
CN116416649A (en) | Video pedestrian re-identification method based on multi-scale resolution alignment | |
CN113256528B (en) | Low-illumination video enhancement method based on multi-scale cascade depth residual error network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |