CN110414498B - Natural scene text recognition method based on cross attention mechanism - Google Patents


Info

Publication number
CN110414498B
Authority
CN
China
Prior art keywords
training
network
text
feature
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910517855.4A
Other languages
Chinese (zh)
Other versions
CN110414498A (en
Inventor
Luo Canjie (罗灿杰)
Jin Lianwen (金连文)
Huang Yunlong (黄云龙)
Lin Qingxiang (林庆祥)
Zhou Weiying (周伟英)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201910517855.4A priority Critical patent/CN110414498B/en
Publication of CN110414498A publication Critical patent/CN110414498A/en
Application granted granted Critical
Publication of CN110414498B publication Critical patent/CN110414498B/en
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention discloses a natural scene text recognition method based on a cross attention mechanism, which comprises the following steps. Data acquisition: download sample pictures of natural scenes and synthesize a training set from them using publicly available code. Data processing: stretch all training sample pictures so that each processed picture is 32 × 100 pixels, keeping the aspect ratio consistent with the original picture and filling the insufficient part with black borders. Label making: train the recognition model with a supervised method, so that every line-of-text picture has corresponding text annotation. Network training: input the prepared training picture data and labels into a cross attention network for training, the cross attention network consisting of a vertical attention network and a horizontal attention network. Finally, input test data into the trained network to obtain the recognition result and the predicted confidence of each character. The method has high recognition accuracy, strong robustness and good recognition performance for irregular text.

Description

Natural scene text recognition method based on cross attention mechanism
Technical Field
The invention belongs to the technical field of pattern recognition and artificial intelligence, and particularly relates to a natural scene text recognition method based on a cross attention mechanism.
Background
With the rapid development of computer technology, artificial intelligence is gradually changing our lives, making them more convenient and efficient. The rapid progress of hardware such as GPUs in recent years has also made practical application of deep neural networks possible.
In daily life we are surrounded by text, and a large share of human visual information is carried by it. Whether in the past or the future, people depend heavily on text to obtain information, so correctly recognizing text is a crucial step. Recognizing text in a picture is simple for humans, but it is far from easy for computers. If a computer is to help humans understand the information in an image, it must first recognize the text in that image correctly. Text in natural scenes appears against rich and varied backgrounds, and artistic effects often make the arrangement of characters irregular, for example curved, which greatly increases the difficulty of recognition. These factors make natural scene text recognition a hard problem, so a more effective recognition method is urgently needed.
Progress in deep neural networks has provided the necessary tools, and researchers have recently proposed a variety of methods that apply deep neural networks to natural scene text recognition. Among them, attention-based methods have greatly improved recognition rates through their special decoding scheme and their ability to infer semantics, and attention-based recognition networks are now used in many text recognition systems. However, traditional attention-based scene text recognition methods directly compress the original picture into a feature map of height 1, which introduces noise into the feature map used for decoding and thereby harms the recognition result.
Disclosure of Invention
The invention aims to provide a natural scene text recognition method based on a cross attention mechanism, which solves the problems in the prior art and enables text with irregular arrangement shapes to be correctly recognized.
In order to achieve the above object, the present invention provides a natural scene text recognition method based on a cross attention mechanism, comprising the following steps:
S1, data acquisition: downloading sample pictures of natural scenes and synthesizing a training set from them using publicly available code, the training set covering multiple fonts and backgrounds;
S2, data processing: stretching all training sample pictures so that each processed picture is 32 × 100 pixels, keeping the aspect ratio consistent with the original picture and filling the insufficient part with black borders;
S3, label making: training the recognition model with a supervised method, so that every line-of-text picture has corresponding text annotation;
S4, network training: inputting the training picture data and labels from step S2 into a cross attention network for training, the cross attention network consisting of a vertical attention network and a horizontal attention network connected in series;
S5, inputting test data into the trained network to obtain the recognition result and predict the confidence of each character.
Preferably, the text of the training set in the step S1 covers multiple fonts and multiple backgrounds.
Preferably, in the step S2, the training sample picture is stretched such that the height becomes 32 pixels, the width is scaled according to the original aspect ratio, and any remaining width is filled with black borders.
Preferably, the step S3 includes the steps of:
S3.1, synthesizing line-of-text pictures using publicly available code and a text corpus;
S3.2, storing the text content of each text picture in a corresponding text file;
S3.3, randomly dividing the synthesized line-of-text pictures into a training set and a validation set.
Preferably, the step S4 includes:
S4.1, constructing a convolutional neural network with the convolution module as its basic unit;
S4.2, constructing a cross attention network, wherein the cross attention network uses asymmetric convolutions to compute, for each column sub-feature map of the two-dimensional feature map, the weight vector $\alpha_j = \{\alpha_{1,j}, \alpha_{2,j}, \ldots, \alpha_{n,j}\}$ occupied by its feature vectors:

$$\alpha_{i,j} = \frac{\exp(g_{i,j})}{\sum_{i'=1}^{H} \exp(g_{i',j})}$$
where $H$ denotes the height of the feature map and $g_{i,j}$ is the weight of the feature vector in row $i$ of column $j$ within that column, computed by the vertical attention network:

$$g_{i,j} = \mathrm{conv}_{1\times 1}\big(\mathrm{conv}_{3\times 1}(X_j) + X_j\big)$$
where $\mathrm{conv}_{h\times w}(X)$ denotes convolving the feature map $X$ with a kernel of height $h$ and width $w$, and $X_j$ denotes the $j$-th column sub-feature map of the two-dimensional feature map generated by the convolutional neural network;
after the weights of the feature vectors in each column-wise sub-feature map have been computed, each sub-feature map is summed, weighted by the feature-vector weights at the corresponding positions, to obtain a feature vector $f_{v,j}$:

$$f_{v,j} = \sum_{i=1}^{H} \alpha_{i,j} X_{i,j}$$
The feature vectors $f_{v,j}$ produced by all sub-feature maps are concatenated into a feature sequence $f_v$. After $f_v$ passes through a BLSTM network composed of bidirectional long short-term memory models, a feature sequence $f'_v$ with contextual features is obtained. The sequence $f'_v$ is fed into a horizontal attention network, which at each time step extracts a confidence probability distribution $y_t$ over the character currently to be recognized:

$$y_t = \mathrm{softmax}(\psi(h_t))$$
where $\psi$ denotes a fully connected layer that reduces the dimension of the decoding vector $h_t$ to the number of target characters; $h_t$ is obtained by feeding the context vector $ctx_t$ from the horizontal attention network, together with the word embedding $emb_{t-1}$ of the character decoded at the previous time step, into a gated recurrent unit:

$$h_t = \mathrm{GRU}([ctx_t; emb_{t-1}], h_{t-1})$$
where GRU denotes the GRU network operation and $h_{t-1}$ is the hidden-state vector output by the GRU at the previous time step; the context vector $ctx_t$ is obtained as:

$$ctx_t = \sum_{j} \beta_{t,j} f'_{v,j}$$
where $\beta_{t,j}$ denotes the weight of the $j$-th feature vector in the feature sequence, obtained by applying fully connected layers to the feature sequence $f'_v$ and the vector $h_{t-1}$ used to decode the previous character:

$$\beta_{t,j} = \frac{\exp(e_{t,j})}{\sum_{j'} \exp(e_{t,j'})}$$

$$e_{t,j} = W^{T}\tanh(Q f'_{v,j} + V h_{t-1} + b)$$
where $W$, $Q$, $V$ and $b$ are trainable weights; the character corresponding to the highest-confidence value in $y_t$ is selected as the currently decoded output character $c_t$.
S4.3, training parameter setting: the training data are fed into the network, which traverses the training data set 10 times in total; the optimizer is an adaptive gradient descent method with an initial learning rate of 1; after the network has traversed the training set 5 times, the learning rate is manually reduced to 0.1 and training continues until the 10 traversals are complete;
the Loss value Loss of network output can be calculated by the following formula:
$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j}\log p\big(c_{i,j} \mid I^{(i)}; \theta\big)$$

where $N$ denotes the amount of data used in the current batch optimization and $p(c_{i,j} \mid I^{(i)}; \theta)$ denotes the probability of outputting character $c_{i,j}$ for the $i$-th sample picture at the $j$-th time step;
S4.4, weight initialization: at the start of training, the weight parameters of all networks are randomly initialized with Gaussian noise;
S4.5, training the convolutional neural network: the probability that each character of the target character string is output at its corresponding time step is used to compute the cross entropy, which is minimized by gradient descent.
Preferably, the step S5 includes:
S5.1, inputting the pictures and labels of the test set into the trained network for a recognition test;
S5.2, after recognition is completed, the program computes the recognition accuracy;
S5.3, randomly displaying the visualized recognition process of 20 photos, where the character features of each photo are selected crosswise by the horizontal and vertical attention networks.
The invention discloses the following technical effects:
1. due to the adoption of the automatic learning recognition algorithm of the deep network structure, effective expression can be well learned from data, and the recognition accuracy is improved.
2. The invention adopts end-to-end training, does not need to mark the position of each character, and saves the marking cost.
3. The method has high recognition accuracy, strong robustness and good recognition performance for irregular text.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of data acquisition and processing in accordance with the present invention;
FIG. 3 is a natural scene text recognition flow chart of the present invention;
FIG. 4 is an example of the result of an attention heat map in the identification process of the present invention;
fig. 5 is a table of the deep convolutional neural network structure and parameter configuration of the present invention.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Referring to fig. 1-5, the invention provides a natural scene text recognition method based on a cross attention mechanism, which comprises the following steps:
S1, data acquisition: downloading sample pictures of natural scenes and synthesizing a training set from them using publicly available code, the training set covering multiple fonts and backgrounds;
S2, data processing: stretching all training sample pictures so that each processed picture is 32 × 100 pixels, keeping the aspect ratio consistent with the original picture and filling the insufficient part with black borders;
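As a concrete illustration of step S2, a minimal preprocessing sketch follows; it assumes OpenCV, NumPy and a 3-channel input picture, and the helper name resize_and_pad is an illustrative assumption rather than part of the patent:

```python
import cv2
import numpy as np

def resize_and_pad(image, target_h=32, target_w=100):
    """Stretch the height to 32 px, scale the width by the same ratio
    (capped at 100 px), and fill the remaining width with black borders."""
    h, w = image.shape[:2]
    scale = target_h / h
    new_w = min(int(round(w * scale)), target_w)   # keep the original aspect ratio
    resized = cv2.resize(image, (new_w, target_h))
    canvas = np.zeros((target_h, target_w, 3), dtype=np.uint8)  # black canvas
    canvas[:, :new_w] = resized
    return canvas
```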
S3, label making: training the recognition model with a supervised method, so that every line-of-text picture has corresponding text annotation;
S3.1, synthesizing line-of-text pictures using publicly available code and a text corpus;
S3.2, storing the text content of each text picture in a corresponding text file;
S3.3, randomly dividing the synthesized line-of-text pictures into a training set and a validation set.
S4, network training: inputting the training picture data and labels from step S2 into a cross attention network for training, the cross attention network consisting of a vertical attention network and a horizontal attention network connected in series;
S4.1, constructing a convolutional neural network with the convolution module as its basic unit: input picture → first convolution block → first convolutional layer → second convolution block → second convolutional layer → third convolution block → third convolutional layer → fourth convolutional layer; the convolutional neural network downsamples the features through its convolutional layers, each of which downsamples by a factor of 2;
each convolution module carries out the following computation involving its convolutional layers: input feature map → first convolutional layer → first feature map → second convolutional layer → second feature map → third convolutional layer;
the feature map output by the first convolutional layer and the feature map output by the third convolutional layer are added element-wise to give the module's output feature map; no convolution module downsamples its feature map;
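A minimal PyTorch sketch of one such convolution module follows; the channel count and 3 × 3 kernels are illustrative assumptions (the patent's exact configuration is tabulated in fig. 5):

```python
import torch.nn as nn

class ConvModule(nn.Module):
    """Three convolutional layers; the output of the first layer is added
    element-wise to the output of the third. Stride 1 throughout, so the
    module itself performs no downsampling."""
    def __init__(self, channels=128):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        f1 = self.relu(self.conv1(x))   # first feature map
        f2 = self.relu(self.conv2(f1))  # second feature map
        f3 = self.conv3(f2)             # third convolutional layer
        return self.relu(f1 + f3)       # residual addition of first and third outputs
```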
S4.2, constructing a cross attention network, wherein the cross attention network uses asymmetric convolutions to compute, for each column sub-feature map of the two-dimensional feature map, the weight vector $\alpha_j = \{\alpha_{1,j}, \alpha_{2,j}, \ldots, \alpha_{n,j}\}$ occupied by its feature vectors:

$$\alpha_{i,j} = \frac{\exp(g_{i,j})}{\sum_{i'=1}^{H} \exp(g_{i',j})}$$
where $H$ denotes the height of the feature map and $g_{i,j}$ is the weight of the feature vector in row $i$ of column $j$ within that column, computed by the vertical attention network:

$$g_{i,j} = \mathrm{conv}_{1\times 1}\big(\mathrm{conv}_{3\times 1}(X_j) + X_j\big)$$
where $\mathrm{conv}_{h\times w}(X)$ denotes convolving the feature map $X$ with a kernel of height $h$ and width $w$, and $X_j$ denotes the $j$-th column sub-feature map of the two-dimensional feature map generated by the convolutional neural network;
after the weights of the feature vectors in each column-wise sub-feature map have been computed, each sub-feature map is summed, weighted by the feature-vector weights at the corresponding positions, to obtain a feature vector $f_{v,j}$:

$$f_{v,j} = \sum_{i=1}^{H} \alpha_{i,j} X_{i,j}$$
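The vertical attention computation above can be sketched in PyTorch as follows; this is a non-authoritative sketch assuming an N×C×H×W input tensor, with the channel count chosen for illustration:

```python
import torch.nn as nn
import torch.nn.functional as F

class VerticalAttention(nn.Module):
    """Collapses the height of the 2-D feature map column by column:
    g = conv1x1(conv3x1(X) + X), alpha = softmax over the height,
    f_v = sum over rows of alpha * X."""
    def __init__(self, channels=512):
        super().__init__()
        self.conv3x1 = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
        self.conv1x1 = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                      # x: (N, C, H, W)
        g = self.conv1x1(self.conv3x1(x) + x)  # g_{i,j}: (N, 1, H, W)
        alpha = F.softmax(g, dim=2)            # normalize over the height H
        f_v = (alpha * x).sum(dim=2)           # (N, C, W) feature sequence
        return f_v.permute(2, 0, 1)            # (W, N, C), ready for the BLSTM
```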
The feature vectors $f_{v,j}$ produced by all sub-feature maps are concatenated into a feature sequence $f_v$. After $f_v$ passes through a BLSTM network composed of bidirectional long short-term memory models, a feature sequence $f'_v$ with contextual features is obtained. The sequence $f'_v$ is fed into a horizontal attention network, which at each time step extracts a confidence probability distribution $y_t$ over the character currently to be recognized:

$$y_t = \mathrm{softmax}(\psi(h_t))$$
where $\psi$ denotes a fully connected layer that reduces the dimension of the decoding vector $h_t$ to the number of target characters; $h_t$ is obtained by feeding the context vector $ctx_t$ from the horizontal attention network, together with the word embedding $emb_{t-1}$ of the character decoded at the previous time step, into a gated recurrent unit:

$$h_t = \mathrm{GRU}([ctx_t; emb_{t-1}], h_{t-1})$$
where GRU denotes the GRU network operation and $h_{t-1}$ is the hidden-state vector output by the GRU at the previous time step; the context vector $ctx_t$ is obtained as:

$$ctx_t = \sum_{j} \beta_{t,j} f'_{v,j}$$
where $\beta_{t,j}$ denotes the weight of the $j$-th feature vector in the feature sequence, obtained by applying fully connected layers to the feature sequence $f'_v$ and the vector $h_{t-1}$ used to decode the previous character:

$$\beta_{t,j} = \frac{\exp(e_{t,j})}{\sum_{j'} \exp(e_{t,j'})}$$

$$e_{t,j} = W^{T}\tanh(Q f'_{v,j} + V h_{t-1} + b)$$
where $W$, $Q$, $V$ and $b$ are trainable weights; the character corresponding to the highest-confidence value in $y_t$ is selected as the currently decoded output character $c_t$.
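One decoding step of the horizontal attention network might look like the sketch below; the dimensions, the layer names and the use of nn.GRUCell are assumptions for illustration, not the patent's definitive implementation:

```python
import torch
import torch.nn as nn

class HorizontalAttentionStep(nn.Module):
    """Additive attention over f'_v followed by a GRU cell:
    e = W^T tanh(Q f'_v + V h_{t-1} + b), beta = softmax(e),
    ctx = sum(beta * f'_v), h_t = GRU([ctx; emb_{t-1}], h_{t-1})."""
    def __init__(self, feat_dim=512, hidden=256, emb_dim=128, num_classes=97):
        super().__init__()
        self.Q = nn.Linear(feat_dim, hidden, bias=False)
        self.V = nn.Linear(hidden, hidden, bias=True)   # the bias plays the role of b
        self.w = nn.Linear(hidden, 1, bias=False)       # W^T
        self.gru = nn.GRUCell(feat_dim + emb_dim, hidden)
        self.psi = nn.Linear(hidden, num_classes)       # fully connected layer psi

    def forward(self, f_v, h_prev, emb_prev):           # f_v: (T, N, feat_dim)
        e = self.w(torch.tanh(self.Q(f_v) + self.V(h_prev).unsqueeze(0)))  # (T, N, 1)
        beta = torch.softmax(e, dim=0)                  # weights over sequence positions
        ctx = (beta * f_v).sum(dim=0)                   # context vector: (N, feat_dim)
        h_t = self.gru(torch.cat([ctx, emb_prev], dim=1), h_prev)
        y_t = torch.softmax(self.psi(h_t), dim=1)       # confidence distribution y_t
        return y_t, h_t
```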
S4.3, training parameter setting: the training data are fed into the network, which traverses the training data set 10 times in total; the optimizer is an adaptive gradient descent method with an initial learning rate of 1; after the network has traversed the training set 5 times, the learning rate is manually reduced to 0.1 and training continues until the 10 traversals are complete;
the Loss value Loss of network output can be calculated by the following formula:
$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j}\log p\big(c_{i,j} \mid I^{(i)}; \theta\big)$$

where $N$ denotes the amount of data used in the current batch optimization and $p(c_{i,j} \mid I^{(i)}; \theta)$ denotes the probability of outputting character $c_{i,j}$ for the $i$-th sample picture at the $j$-th time step;
S4.4, weight initialization: at the start of training, the weight parameters of all networks are randomly initialized with Gaussian noise;
S4.5, training the convolutional neural network: the probability that each character of the target character string is output at its corresponding time step is used to compute the cross entropy, which is minimized by gradient descent.
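For S4.5, the loss can be written as the following sketch, assuming the network returns per-step log-softmax outputs; handling of padded targets is omitted for brevity:

```python
import torch

def recognition_loss(log_probs, targets):
    """Cross entropy of the target string:
    Loss = -(1/N) * sum_i sum_j log p(c_{i,j} | I^(i); theta).

    log_probs: (N, T, num_classes) per-time-step log-softmax outputs
    targets:   (N, T) ground-truth character indices
    """
    picked = log_probs.gather(2, targets.unsqueeze(2)).squeeze(2)  # (N, T)
    return -picked.sum(dim=1).mean()  # sum over time steps, mean over the batch
```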
S5, inputting test data into the trained network to obtain the recognition result and predict the confidence of each character;
S5.1, inputting the pictures and labels of the test set into the trained network for a recognition test;
S5.2, after recognition is completed, the program computes the recognition accuracy;
S5.3, randomly displaying the visualized recognition process of 20 photos, where the character features of each photo are selected crosswise by the horizontal and vertical attention networks.
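For S5, greedy decoding with per-character confidences could be sketched as follows; the step function, the start token and the EOS index are assumptions for illustration:

```python
import torch

def greedy_decode(step_fn, embed, f_v, hidden=256, max_len=25, eos_idx=0):
    """At each time step pick the highest-confidence character from y_t
    and record its confidence; stop once every sample has emitted EOS."""
    n = f_v.size(1)
    h = torch.zeros(n, hidden)                     # initial hidden state
    emb = embed(torch.zeros(n, dtype=torch.long))  # assumed start token at index 0
    chars, confs = [], []
    for _ in range(max_len):
        y_t, h = step_fn(f_v, h, emb)              # e.g. HorizontalAttentionStep above
        conf, idx = y_t.max(dim=1)                 # character with highest confidence
        chars.append(idx)
        confs.append(conf)
        emb = embed(idx)
        if (idx == eos_idx).all():
            break
    return torch.stack(chars, dim=1), torch.stack(confs, dim=1)
```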
The example shown in fig. 4 displays the recognition results for 5 strongly curved texts.
In the description of the present invention, it should be understood that the terms "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate or are based on the orientation or positional relationship shown in the drawings, merely to facilitate description of the present invention, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the present invention. The above embodiments are only illustrative of the preferred embodiments of the present invention and are not intended to limit the scope of the present invention, and various modifications and improvements made by those skilled in the art to the technical solutions of the present invention should fall within the protection scope defined by the claims of the present invention without departing from the design spirit of the present invention.

Claims (5)

1. A natural scene text recognition method based on a cross attention mechanism, characterized by comprising the following steps:
S1, data acquisition: downloading sample pictures of natural scenes and synthesizing a training set from them using publicly available code, the training set covering multiple fonts and backgrounds;
S2, data processing: stretching all training sample pictures so that each processed picture is 32 × 100 pixels, keeping the aspect ratio consistent with the original picture and filling the insufficient part with black borders;
S3, label making: training the recognition model with a supervised method, so that every line-of-text picture has corresponding text annotation;
S4, network training: inputting the training picture data and labels from S2 into a cross attention network for training, the cross attention network consisting of a vertical attention network and a horizontal attention network connected in series;
S5, inputting test data into the trained network to obtain the recognition result and predict the confidence of each character;
The step S4 comprises the following steps:
S4.1, constructing a convolutional neural network with the convolution module as its basic unit;
S4.2, constructing a cross attention network, wherein the cross attention network uses asymmetric convolutions to compute, for each column sub-feature map of the two-dimensional feature map, the weight vector $\alpha_j = \{\alpha_{1,j}, \alpha_{2,j}, \ldots, \alpha_{n,j}\}$ occupied by its feature vectors:

$$\alpha_{i,j} = \frac{\exp(g_{i,j})}{\sum_{i'=1}^{H} \exp(g_{i',j})}$$
where $H$ denotes the height of the feature map and $g_{i,j}$ is the weight of the feature vector in row $i$ of column $j$ within that column, computed by the vertical attention network:

$$g_{i,j} = \mathrm{conv}_{1\times 1}\big(\mathrm{conv}_{3\times 1}(X_j) + X_j\big)$$
where $\mathrm{conv}_{h\times w}(X)$ denotes convolving the feature map $X$ with a kernel of height $h$ and width $w$, and $X_j$ denotes the $j$-th column sub-feature map of the two-dimensional feature map generated by the convolutional neural network; after the weights of the feature vectors in each column-wise sub-feature map have been computed, each sub-feature map is summed, weighted by the feature-vector weights at the corresponding positions, to obtain a feature vector $f_{v,j}$:

$$f_{v,j} = \sum_{i=1}^{H} \alpha_{i,j} X_{i,j}$$
The feature vectors $f_{v,j}$ produced by all sub-feature maps are concatenated into a feature sequence $f_v$. After $f_v$ passes through a BLSTM network composed of bidirectional long short-term memory models, a feature sequence $f'_v$ with contextual features is obtained. The sequence $f'_v$ is fed into a horizontal attention network, which at each time step extracts a confidence probability distribution $y_t$ over the character currently to be recognized:

$$y_t = \mathrm{softmax}(\psi(h_t))$$
where $\psi$ denotes a fully connected layer that reduces the dimension of the decoding vector $h_t$ to the number of target characters; $h_t$ is obtained by feeding the context vector $ctx_t$ from the horizontal attention network, together with the word embedding $emb_{t-1}$ of the character decoded at the previous time step, into a gated recurrent unit:

$$h_t = \mathrm{GRU}([ctx_t; emb_{t-1}], h_{t-1})$$
where GRU denotes the GRU network operation and $h_{t-1}$ is the hidden-state vector output by the GRU at the previous time step; the context vector $ctx_t$ is obtained as:

$$ctx_t = \sum_{j} \beta_{t,j} f'_{v,j}$$
where $\beta_{t,j}$ denotes the weight of the $j$-th feature vector in the feature sequence, obtained by applying fully connected layers to the feature sequence $f'_v$ and the vector $h_{t-1}$ used to decode the previous character:

$$\beta_{t,j} = \frac{\exp(e_{t,j})}{\sum_{j'} \exp(e_{t,j'})}$$

$$e_{t,j} = W^{T}\tanh(Q f'_{v,j} + V h_{t-1} + b)$$
where $W$, $Q$, $V$ and $b$ are trainable weights; the character corresponding to the highest-confidence value in $y_t$ is selected as the currently decoded output character $c_t$;
S4.3, training parameter setting: the training data are fed into the network, which traverses the training data set 10 times in total; the optimizer is an adaptive gradient descent method with an initial learning rate of 1; after the network has traversed the training set 5 times, the learning rate is manually reduced to 0.1 and training continues until the 10 traversals are complete;
the Loss value Loss of the network output is calculated by the following formula:
$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j}\log p\big(c_{i,j} \mid I^{(i)}; \theta\big)$$

where $N$ denotes the amount of data used in the current batch optimization and $p(c_{i,j} \mid I^{(i)}; \theta)$ denotes the probability of outputting character $c_{i,j}$ for the $i$-th sample picture at the $j$-th time step;
S4.4, weight initialization: at the start of training, the weight parameters of all networks are randomly initialized with Gaussian noise;
S4.5, training the convolutional neural network: the probability that each character of the target character string is output at its corresponding time step is used to compute the cross entropy, which is minimized by gradient descent.
2. The natural scene text recognition method based on a cross-attention mechanism of claim 1, wherein: the text of the training set in S1 covers a plurality of fonts and a plurality of backgrounds.
3. The method for recognizing natural scene text based on a cross attention mechanism according to claim 1, wherein in S2 the training sample picture is stretched such that the height becomes 32 pixels, the width is scaled according to the original aspect ratio, and any remaining width is filled with black borders.
4. The natural scene text recognition method based on the cross-attention mechanism as recited in claim 1, wherein S3 includes the following:
S3.1, synthesizing line-of-text pictures using publicly available code and a text corpus;
S3.2, storing the text content of each text picture in a corresponding text file;
S3.3, randomly dividing the synthesized line-of-text pictures into a training set and a validation set.
5. The natural scene text recognition method based on the cross-attention mechanism of claim 1, wherein S5 comprises:
S5.1, inputting the pictures and labels of the test set into the trained network for a recognition test;
S5.2, after recognition is completed, the program computes the recognition accuracy;
S5.3, randomly displaying the visualized recognition process of 20 photos, where the character features of each photo are selected crosswise by the horizontal and vertical attention networks.
CN201910517855.4A 2019-06-14 2019-06-14 Natural scene text recognition method based on cross attention mechanism Active CN110414498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910517855.4A CN110414498B (en) 2019-06-14 2019-06-14 Natural scene text recognition method based on cross attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910517855.4A CN110414498B (en) 2019-06-14 2019-06-14 Natural scene text recognition method based on cross attention mechanism

Publications (2)

Publication Number Publication Date
CN110414498A CN110414498A (en) 2019-11-05
CN110414498B true CN110414498B (en) 2023-07-11

Family

ID=68359132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910517855.4A Active CN110414498B (en) 2019-06-14 2019-06-14 Natural scene text recognition method based on cross attention mechanism

Country Status (1)

Country Link
CN (1) CN110414498B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027553A (en) * 2019-12-23 2020-04-17 武汉唯理科技有限公司 Character recognition method for circular seal
CN111160341B (en) * 2019-12-27 2023-04-07 华南理工大学 Scene Chinese text recognition method based on double-attention-machine mechanism
CN111401373B (en) * 2020-03-04 2022-02-15 武汉大学 Efficient semantic segmentation method based on packet asymmetric convolution
CN111652130B (en) * 2020-06-02 2023-09-15 上海语识信息技术有限公司 Method for identifying number, symbol and letter group of non-specific font
CN111899292A (en) * 2020-06-15 2020-11-06 北京三快在线科技有限公司 Character recognition method and device, electronic equipment and storage medium
CN112115834A (en) * 2020-09-11 2020-12-22 昆明理工大学 Standard certificate photo detection method based on small sample matching network
CN112101355B (en) * 2020-09-25 2024-04-02 北京百度网讯科技有限公司 Method and device for detecting text in image, electronic equipment and computer medium
CN112149644A (en) * 2020-11-09 2020-12-29 西北工业大学 Two-dimensional attention mechanism text recognition method based on global feature guidance
CN113065432A (en) * 2021-03-23 2021-07-02 内蒙古工业大学 Handwritten Mongolian recognition method based on data enhancement and ECA-Net
CN113283336A (en) * 2021-05-21 2021-08-20 湖南大学 Text recognition method and system
CN113705713B (en) * 2021-09-03 2023-08-22 华南理工大学 Text recognition method based on global and local attention mechanisms
CN114399757A (en) * 2022-01-13 2022-04-26 福州大学 Natural scene text recognition method and system for multi-path parallel position correlation network
CN115187996B (en) * 2022-09-09 2023-01-06 中电科新型智慧城市研究院有限公司 Semantic recognition method and device, terminal equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111356997B (en) * 2017-08-03 2024-04-09 皇家飞利浦有限公司 Hierarchical neural network with granular attention
CN108829801B (en) * 2018-06-06 2020-11-20 大连理工大学 Event trigger word extraction method based on document level attention mechanism
CN111368565B (en) * 2018-09-05 2022-03-18 腾讯科技(深圳)有限公司 Text translation method, text translation device, storage medium and computer equipment
CN109492679A (en) * 2018-10-24 2019-03-19 杭州电子科技大学 Based on attention mechanism and the character recognition method for being coupled chronological classification loss
CN109753954A (en) * 2018-11-14 2019-05-14 安徽艾睿思智能科技有限公司 The real-time positioning identifying method of text based on deep learning attention mechanism
CN109543681A (en) * 2018-11-20 2019-03-29 中国石油大学(华东) Character recognition method under a kind of natural scene based on attention mechanism
CN109710919A (en) * 2018-11-27 2019-05-03 杭州电子科技大学 A kind of neural network event extraction method merging attention mechanism

Also Published As

Publication number Publication date
CN110414498A (en) 2019-11-05

Similar Documents

Publication Publication Date Title
CN110414498B (en) Natural scene text recognition method based on cross attention mechanism
CN110378334B (en) Natural scene text recognition method based on two-dimensional feature attention mechanism
CN109977918B (en) Target detection positioning optimization method based on unsupervised domain adaptation
US11164059B2 (en) Two-dimensional code image generation method and apparatus, storage medium and electronic device
CN107368831B (en) English words and digit recognition method in a kind of natural scene image
CN107818314B (en) Face image processing method, device and server
CN112215280B (en) Small sample image classification method based on meta-backbone network
CN108764195A (en) Handwriting model training method, hand-written character recognizing method, device, equipment and medium
CN110533737A (en) The method generated based on structure guidance Chinese character style
CN112784763B (en) Expression recognition method and system based on local and overall feature adaptive fusion
CN105205448A (en) Character recognition model training method based on deep learning and recognition method thereof
CN110287777B (en) Golden monkey body segmentation algorithm in natural scene
CN111428727B (en) Natural scene text recognition method based on sequence transformation correction and attention mechanism
CN109086653A (en) Handwriting model training method, hand-written character recognizing method, device, equipment and medium
CN109359608A (en) A kind of face identification method based on deep learning model
RU2665273C2 (en) Trained visual markers and the method of their production
CN107832292A (en) A kind of conversion method based on the image of neural network model to Chinese ancient poetry
CN110363770A (en) A kind of training method and device of the infrared semantic segmentation model of margin guide formula
CN109840512A (en) A kind of Facial action unit recognition methods and identification device
CN109753897A (en) Based on memory unit reinforcing-time-series dynamics study Activity recognition method
CN107038419A (en) A kind of personage's behavior method for recognizing semantics based on video sequence deep learning
CN116597136A (en) Semi-supervised remote sensing image semantic segmentation method and system
CN109508640A (en) A kind of crowd's sentiment analysis method, apparatus and storage medium
Salem et al. Semantic image inpainting using self-learning encoder-decoder and adversarial loss
CN108985442A (en) Handwriting model training method, hand-written character recognizing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Luo Canjie

Inventor after: Jin Lianwen

Inventor after: Huang Yunlong

Inventor after: Lin Qingxiang

Inventor after: Zhou Weiying

Inventor before: Huang Yunlong

Inventor before: Jin Lianwen

Inventor before: Luo Canjie

Inventor before: Lin Qingxiang

Inventor before: Zhou Weiying

GR01 Patent grant