CN110414498B - Natural scene text recognition method based on cross attention mechanism - Google Patents
- Publication number: CN110414498B
- Application number: CN201910517855.4A
- Authority
- CN
- China
- Prior art keywords
- training
- network
- text
- feature
- character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a natural scene text recognition method based on a cross attention mechanism, comprising the following steps. Data acquisition: download sample pictures of natural scenes and synthesize a training set from them using publicly available code. Data processing: stretch all training sample pictures so that each processed picture is 32 × 100 pixels, keeping the aspect ratio of the original picture and filling the remaining width with black borders. Label making: train the recognition model with a supervised method, so that each line-text picture has corresponding ground-truth text. Network training: feed the prepared training picture data and labels into a cross attention network for training, where the cross attention network consists of a vertical attention network and a horizontal attention network. Finally, feed test data into the trained network to obtain the recognition result and the predicted confidence of each character. The method achieves high recognition accuracy, strong robustness, and good recognition performance on irregular text.
Description
Technical Field
The invention belongs to the technical field of pattern recognition and artificial intelligence, and particularly relates to a natural scene text recognition method based on a cross attention mechanism.
Background
With the rapid development of computer technology, artificial intelligence is gradually changing our lives, making them more convenient and efficient. The rapid progress of hardware such as GPUs in recent years has also made practical applications of deep neural networks possible.
In daily life we are surrounded by text, and a large part of human visual information is carried by it. Because people rely so heavily on obtaining information from text, correctly recognizing text is a crucial step. Recognizing text in a picture is simple for humans, but it is not an easy task for computers. If a computer is to help a human understand the information in an image, it must first correctly recognize the text in that image. Text in natural scenes appears against rich and varied backgrounds, and artistic effects often arrange the characters in irregular shapes, such as curves, which greatly increases the difficulty of recognizing the text automatically. These factors make natural scene text recognition a hard problem, so a more effective recognition method is urgently needed.
Advances in deep neural networks provide the necessary tools, and researchers have recently proposed various methods that apply them to natural scene text recognition. Among these, attention-based methods greatly improve the recognition rate through a special decoding scheme and the ability to infer characters from semantic context, and attention-based recognition networks are now used in many text recognition systems. However, conventional attention-based scene text recognition methods directly compress the original picture into a feature map of height 1, which introduces noise into the features used for decoding and thereby degrades the recognition result.
Disclosure of Invention
The invention aims to provide a natural scene text recognition method based on a cross attention mechanism that solves the problems in the prior art and correctly recognizes text with irregular arrangement shapes.
To achieve the above object, the present invention provides the following solution: a natural scene text recognition method based on a cross attention mechanism, comprising the following steps:
S1, data acquisition: download sample pictures of natural scenes and synthesize a training set from them using publicly available code; the training set covers multiple fonts and backgrounds;
S2, data processing: stretch all training sample pictures so that each processed picture is 32 × 100 pixels, keep the aspect ratio of the original picture, and fill the remaining width with black borders;
S3, label making: train the recognition model with a supervised method, so that each line-text picture has corresponding ground-truth text;
S4, network training: feed the training picture data and labels from step S2 into a cross attention network for training; the cross attention network consists of a vertical attention network and a horizontal attention network connected in series;
S5, feed test data into the trained network, obtain the recognition result, and predict the confidence of each character from the recognition result.
Preferably, the text of the training set in the step S1 covers multiple fonts and multiple backgrounds.
Preferably, in step S2 the training sample picture is stretched so that its height becomes 32 pixels, its width is scaled according to the original aspect ratio, and any remaining width is filled with black borders.
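As an illustration of the preprocessing in step S2, the following sketch resizes a picture to height 32, scales the width by the original aspect ratio up to 100 pixels, and fills the rest with black. Nearest-neighbour resampling is an assumption here; the patent does not specify the interpolation method.

```python
import numpy as np

def resize_keep_aspect(img, target_h=32, target_w=100):
    """Nearest-neighbour resize to height 32, preserving aspect ratio;
    pad the remaining width with black (zeros), as in step S2."""
    h, w = img.shape[:2]
    new_w = min(target_w, max(1, round(w * target_h / h)))
    # nearest-neighbour index maps for rows and columns
    rows = (np.arange(target_h) * h / target_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    resized = img[rows][:, cols]
    out = np.zeros((target_h, target_w) + img.shape[2:], dtype=img.dtype)
    out[:, :new_w] = resized  # the unfilled right part stays black
    return out

sample = np.ones((64, 120, 3), dtype=np.uint8) * 255  # white 64x120 picture
padded = resize_keep_aspect(sample)
print(padded.shape)  # (32, 100, 3)
```

Here the scaled width is 60 pixels, so columns 60–99 remain black padding.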
Preferably, the step S3 includes the steps of:
S3.1, synthesize line-text pictures using publicly available code and a text corpus;
S3.2, store the text content of each text picture in a corresponding text file;
S3.3, randomly divide the synthesized line-text pictures into a training set and a validation set.
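The label-making steps above can be sketched as follows; the validation ratio, seed, and file naming are illustrative assumptions, not values stated in the patent.

```python
import random

def split_train_val(samples, val_ratio=0.1, seed=0):
    """Randomly divide synthesized line-text pictures into a training set
    and a validation set (step S3.3)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]

# each sample pairs a picture path with its ground-truth text (step S3.2)
labels = [(f"img_{i}.png", f"text_{i}") for i in range(1000)]
train, val = split_train_val(labels)
print(len(train), len(val))  # 900 100
```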
Preferably, the step S4 includes:
S4.1, construct a convolutional neural network with the convolution module as its basic unit;
S4.2, construct a cross attention network, wherein the cross attention network uses asymmetric convolutions to compute, for each column sub-feature map of the two-dimensional feature map, the weight vector occupied by its feature vectors, α_j = {α_{1,j}, α_{2,j}, ..., α_{H,j}}:

α_{i,j} = exp(g_{i,j}) / Σ_{i'=1}^{H} exp(g_{i',j})

where H represents the height of the feature map and g_{i,j} is the weight of the feature vector in row i of column j, computed by the vertical attention network:

g_{i,j} = conv_{1×1}(conv_{3×1}(X_j) + X_j)

where conv_{h×w}(X) denotes applying a convolution kernel of height h and width w to feature map X, and X_j is the j-th column sub-feature map of the two-dimensional feature map generated by the convolutional neural network.

After the weights of the feature vectors in each column sub-feature map have been computed, each sub-feature map is collapsed by a weighted sum of the feature vectors at its positions into a single feature vector f_{v,j}:

f_{v,j} = Σ_{i=1}^{H} α_{i,j} x_{i,j}

where x_{i,j} is the feature vector in row i of column j. The feature vectors f_{v,j} of all sub-feature maps are concatenated into a feature sequence f_v. After f_v passes through a BLSTM network built from bidirectional long short-term memory models, a feature sequence f'_v with contextual features is obtained. The sequence f'_v is fed into a horizontal attention network, which at each time step extracts a confidence probability distribution y_t over the character currently to be recognized:

y_t = softmax(ψ(h_t))

where ψ is a fully connected layer that reduces the dimension of the decoding vector h_t to the number of target characters. h_t is obtained by feeding the context vector ctx_t of the horizontal attention network, together with the word embedding vector emb_{t-1} of the character decoded at the previous time step, into a gated recurrent unit:

h_t = GRU([ctx_t; emb_{t-1}], h_{t-1})

where GRU denotes the GRU network operation and h_{t-1} is the hidden vector output by the GRU network at the previous time step. The context vector ctx_t is obtained as

ctx_t = Σ_j β_{t,j} f'_{v,j},  β_{t,j} = exp(e_{t,j}) / Σ_k exp(e_{t,k})

where β_{t,j} is the weight of the j-th feature vector in the feature sequence, obtained through a fully connected layer from the feature sequence f'_v and the vector h_{t-1} used to decode the previous character:

e_{t,j} = W^T tanh(Q f'_{v,j} + V h_{t-1} + b)

where W^T, Q, V, and b are trainable weight values. The character corresponding to the highest-confidence value in y_t is selected as the current decoded output character c_t.
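A minimal numerical sketch of the vertical attention step: given precomputed scores g_{i,j}, each column of the feature map is softmax-normalized over its height and collapsed into one vector f_{v,j}. The feature-map shapes here are assumptions for illustration only.

```python
import numpy as np

def vertical_attention(F, G):
    """F: feature map of shape (H, W, C); G: scores g_{i,j} of shape (H, W)
    produced by the asymmetric convolutions. Softmax each column over the
    height, then take the weighted sum, giving one vector f_{v,j} per column."""
    expG = np.exp(G - G.max(axis=0, keepdims=True))  # numerically stable softmax
    alpha = expG / expG.sum(axis=0, keepdims=True)   # alpha_{i,j}, columns sum to 1
    # f_v has shape (W, C): f_{v,j} = sum_i alpha_{i,j} * F[i, j, :]
    f_v = np.einsum("hw,hwc->wc", alpha, F)
    return alpha, f_v

H, W, C = 8, 25, 512
F = np.random.default_rng(1).normal(size=(H, W, C))
G = np.random.default_rng(2).normal(size=(H, W))
alpha, f_v = vertical_attention(F, G)
print(f_v.shape)  # (25, 512)
```

The resulting sequence of 25 vectors plays the role of f_v before the BLSTM.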
S4.3, training parameter setting: the training data is fed into the network and the network traverses the training data set 10 times; the optimization algorithm used is an adaptive gradient descent method with an initial learning rate of 1; after the network has traversed the training set 5 times, the learning rate is manually reduced to 0.1 and training continues until the 10 traversals of the training set are complete;
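The schedule in step S4.3 can be written down directly; whether the drop happens exactly at the fifth traversal boundary is an interpretation of the text.

```python
def learning_rate(epoch, base_lr=1.0, drop_epoch=5, drop_lr=0.1):
    """Schedule from step S4.3: learning rate 1 for the first 5 passes
    over the training set, then manually reduced to 0.1."""
    return base_lr if epoch < drop_epoch else drop_lr

schedule = [learning_rate(e) for e in range(10)]
print(schedule)  # [1.0, 1.0, 1.0, 1.0, 1.0, 0.1, 0.1, 0.1, 0.1, 0.1]
```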
The Loss value of the network output can be calculated by the following formula:

Loss = -(1/N) Σ_{i=1}^{N} Σ_j log p(c_{i,j} | I^{(i)}; θ)

where N represents the amount of data used in this batch optimization and p(c_{i,j} | I^{(i)}; θ) denotes the probability that character c_{i,j} is output from the i-th sample picture at the j-th time step;
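A sketch of the loss computation above, assuming the network emits a softmax distribution per time step and the targets are character indices.

```python
import numpy as np

def recognition_loss(prob_seqs, target_seqs):
    """Loss = -(1/N) * sum_i sum_j log p(c_{i,j} | I^(i); theta):
    average over the batch of the summed negative log-probabilities that
    each target character is emitted at its time step."""
    total = 0.0
    for probs, targets in zip(prob_seqs, target_seqs):
        # probs: (T, K) softmax outputs; targets: length-T character indices
        total += -np.log(probs[np.arange(len(targets)), targets]).sum()
    return total / len(prob_seqs)

# two samples, 3 time steps, uniform distribution over a 4-character alphabet
p = np.full((3, 4), 0.25)
loss = recognition_loss([p, p], [[0, 1, 2], [3, 2, 1]])
print(round(loss, 4))  # 3 * ln(4) = 4.1589
```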
S4.4, weight initialization: at the start of training, the weight parameters of all networks are randomly initialized with Gaussian noise;
S4.5, training the convolutional neural network: the probability that each character of the target character string is output at its corresponding time step is used to form the cross entropy, and gradient descent is used to minimize this cross entropy.
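A toy illustration of step S4.5: driving up the probability of one target character at its time step by gradient descent on the cross entropy. The alphabet size, learning rate, and step count are arbitrary choices, not values from the patent.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.zeros(4)          # logits for a 4-character alphabet at one time step
target = 2               # index of the ground-truth character
lr = 0.5
losses = []
for _ in range(50):
    p = softmax(z)
    losses.append(-np.log(p[target]))   # cross entropy for this step
    grad = p.copy()
    grad[target] -= 1.0  # d(-log p_target)/dz = p - onehot(target)
    z -= lr * grad       # plain gradient descent update
print(losses[0] > losses[-1])  # True: the cross entropy decreases
```

After 50 updates the target character's probability is close to 1.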
Preferably, the step S5 includes:
S5.1, input the pictures and labels of the test set into the trained network for a recognition test;
S5.2, after recognition is complete, the program calculates the accuracy;
S5.3, randomly display the visualized recognition process of 20 photos, in which the character features of each photo are selected jointly by the horizontal attention network and the vertical attention network.
The invention discloses the following technical effects:
1. Because a deep network structure that learns the recognition automatically is adopted, effective representations can be learned directly from data, improving recognition accuracy.
2. The invention uses end-to-end training and does not require annotating the position of each character, which saves labeling cost.
3. The method has high recognition accuracy, strong robustness, and good recognition performance on irregular text.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of data acquisition and processing in accordance with the present invention;
FIG. 3 is a natural scene text recognition flow chart of the present invention;
FIG. 4 is an example of the result of an attention heat map in the identification process of the present invention;
fig. 5 is a table of the deep convolutional neural network structure and parameter configuration of the present invention.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Referring to fig. 1-5, the invention provides a natural scene text recognition method based on a cross attention mechanism, which comprises the following steps:
S1, data acquisition: download sample pictures of natural scenes and synthesize a training set from them using publicly available code; the training set covers multiple fonts and backgrounds.
S2, data processing: stretch all training sample pictures so that each processed picture is 32 × 100 pixels, keep the aspect ratio of the original picture, and fill the remaining width with black borders.
S3, label making: train the recognition model with a supervised method, so that each line-text picture has corresponding ground-truth text:
S3.1, synthesize line-text pictures using publicly available code and a text corpus;
S3.2, store the text content of each text picture in a corresponding text file;
S3.3, randomly divide the synthesized line-text pictures into a training set and a validation set.
S4, network training: input the training picture data and labels from step S2 into a cross attention network for training; the cross attention network consists of a vertical attention network and a horizontal attention network connected in series.
S4.1, construct a convolutional neural network with the convolution module as its basic unit. The input picture passes alternately through convolution modules and convolution layers: first convolution module → first convolution layer → second convolution module → second convolution layer → third convolution module → third convolution layer → fourth convolution module. The network downsamples the features through the convolution layers, each of which downsamples by a factor of 2.
A convolution module performs the following computation involving three convolution layers: input feature map → first convolution layer → first feature map → second convolution layer → second feature map → third convolution layer.
The feature map output by a module's first convolution layer is numerically added to the feature map output by its third convolution layer to obtain the module's output feature map; no convolution module downsamples its feature map.
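A shape-level sketch of the module structure described above, using a naive single-channel "same"-padded convolution in place of real learned layers; the 3 × 3 averaging kernel and the input size are illustrative assumptions.

```python
import numpy as np

def conv2d(x, k, stride=1):
    """Naive 'same'-padded 2-D convolution (single channel), enough to
    illustrate shapes; real layers would use a deep-learning framework."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    h_out = (x.shape[0] + stride - 1) // stride
    w_out = (x.shape[1] + stride - 1) // stride
    out = np.zeros((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            out[i, j] = (xp[i*stride:i*stride+kh, j*stride:j*stride+kw] * k).sum()
    return out

x = np.random.default_rng(0).normal(size=(32, 100))
k3 = np.ones((3, 3)) / 9.0
# Convolution module: three 'same' conv layers; the first layer's output is
# added to the third layer's output; no downsampling inside the module.
f1 = conv2d(x, k3)
f3 = conv2d(conv2d(f1, k3), k3)
module_out = f1 + f3
# Downsampling convolution layer between modules: stride 2 halves each side.
down = conv2d(module_out, k3, stride=2)
print(module_out.shape, down.shape)  # (32, 100) (16, 50)
```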
S4.2, construct a cross attention network, wherein the cross attention network uses asymmetric convolutions to compute, for each column sub-feature map of the two-dimensional feature map, the weight vector occupied by its feature vectors, α_j = {α_{1,j}, α_{2,j}, ..., α_{H,j}}:

α_{i,j} = exp(g_{i,j}) / Σ_{i'=1}^{H} exp(g_{i',j})

where H represents the height of the feature map and g_{i,j} is the weight of the feature vector in row i of column j, computed by the vertical attention network:

g_{i,j} = conv_{1×1}(conv_{3×1}(X_j) + X_j)

where conv_{h×w}(X) denotes applying a convolution kernel of height h and width w to feature map X, and X_j is the j-th column sub-feature map of the two-dimensional feature map generated by the convolutional neural network.

After the weights of the feature vectors in each column sub-feature map have been computed, each sub-feature map is collapsed by a weighted sum of the feature vectors at its positions into a single feature vector f_{v,j}:

f_{v,j} = Σ_{i=1}^{H} α_{i,j} x_{i,j}

where x_{i,j} is the feature vector in row i of column j. The feature vectors f_{v,j} of all sub-feature maps are concatenated into a feature sequence f_v. After f_v passes through a BLSTM network built from bidirectional long short-term memory models, a feature sequence f'_v with contextual features is obtained. The sequence f'_v is fed into a horizontal attention network, which at each time step extracts a confidence probability distribution y_t over the character currently to be recognized:

y_t = softmax(ψ(h_t))

where ψ is a fully connected layer that reduces the dimension of the decoding vector h_t to the number of target characters. h_t is obtained by feeding the context vector ctx_t of the horizontal attention network, together with the word embedding vector emb_{t-1} of the character decoded at the previous time step, into a gated recurrent unit:

h_t = GRU([ctx_t; emb_{t-1}], h_{t-1})

where GRU denotes the GRU network operation and h_{t-1} is the hidden vector output by the GRU network at the previous time step. The context vector ctx_t is obtained as

ctx_t = Σ_j β_{t,j} f'_{v,j},  β_{t,j} = exp(e_{t,j}) / Σ_k exp(e_{t,k})

where β_{t,j} is the weight of the j-th feature vector in the feature sequence, obtained through a fully connected layer from the feature sequence f'_v and the vector h_{t-1} used to decode the previous character:

e_{t,j} = W^T tanh(Q f'_{v,j} + V h_{t-1} + b)

where W^T, Q, V, and b are trainable weight values. The character corresponding to the highest-confidence value in y_t is selected as the current decoded output character c_t.
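One decoding step of the horizontal attention network can be sketched as follows. All dimensions and the GRU parameterization are assumptions (the patent does not give them), and the gate equations follow the common GRU convention rather than any specific framework.

```python
import numpy as np

rng = np.random.default_rng(0)
D, A = 64, 32          # hidden size and attention size (assumed values)
W = rng.normal(size=A)
Q = rng.normal(size=(A, D))
V = rng.normal(size=(A, D))
b = np.zeros(A)

def attend(f_prime, h_prev):
    """e_{t,j} = W^T tanh(Q f'_{v,j} + V h_{t-1} + b); beta = softmax(e);
    ctx_t = sum_j beta_{t,j} f'_{v,j}."""
    e = np.array([W @ np.tanh(Q @ f + V @ h_prev + b) for f in f_prime])
    beta = np.exp(e - e.max())
    beta /= beta.sum()
    return beta, (beta[:, None] * f_prime).sum(axis=0)

def gru_step(x, h_prev, U, R):
    """Standard GRU cell (weights U for input, R for hidden; biases omitted
    for brevity -- a sketch, not the patent's exact parameterization)."""
    z = 1 / (1 + np.exp(-(U[0] @ x + R[0] @ h_prev)))   # update gate
    r = 1 / (1 + np.exp(-(U[1] @ x + R[1] @ h_prev)))   # reset gate
    n = np.tanh(U[2] @ x + R[2] @ (r * h_prev))          # candidate state
    return (1 - z) * h_prev + z * n

f_prime = rng.normal(size=(25, D))     # feature sequence f'_v from the BLSTM
h_prev = np.zeros(D)                   # h_{t-1}
emb_prev = rng.normal(size=D)          # embedding of previously decoded char
U = rng.normal(size=(3, D, 2 * D)) * 0.1
R = rng.normal(size=(3, D, D)) * 0.1
beta, ctx = attend(f_prime, h_prev)
h_t = gru_step(np.concatenate([ctx, emb_prev]), h_prev, U, R)  # h_t = GRU([ctx; emb], h_{t-1})
print(h_t.shape)  # (64,)
```

Feeding h_t through the fully connected layer ψ and a softmax would yield y_t.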
S4.3, training parameter setting: the training data is fed into the network and the network traverses the training data set 10 times; the optimization algorithm used is an adaptive gradient descent method with an initial learning rate of 1; after the network has traversed the training set 5 times, the learning rate is manually reduced to 0.1 and training continues until the 10 traversals of the training set are complete;
The Loss value of the network output can be calculated by the following formula:

Loss = -(1/N) Σ_{i=1}^{N} Σ_j log p(c_{i,j} | I^{(i)}; θ)

where N represents the amount of data used in this batch optimization and p(c_{i,j} | I^{(i)}; θ) denotes the probability that character c_{i,j} is output from the i-th sample picture at the j-th time step;
S4.4, weight initialization: at the start of training, the weight parameters of all networks are randomly initialized with Gaussian noise;
S4.5, training the convolutional neural network: the probability that each character of the target character string is output at its corresponding time step is used to form the cross entropy, and gradient descent is used to minimize this cross entropy.
S5, input test data into the trained network, obtain the recognition result, and predict the confidence of each character from the recognition result:
S5.1, input the pictures and labels of the test set into the trained network for a recognition test;
S5.2, after recognition is complete, the program calculates the accuracy;
S5.3, randomly display the visualized recognition process of 20 photos, in which the character features of each photo are selected jointly by the horizontal attention network and the vertical attention network.
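Step S5's per-character confidence can be sketched as taking, at each time step, the argmax of the distribution y_t; the character set and the toy distributions here are illustrative assumptions.

```python
import numpy as np

CHARSET = list("abcdefghijklmnopqrstuvwxyz")  # illustrative character set

def decode_with_confidence(y_seq):
    """For each time step, pick the character with the highest confidence
    in the distribution y_t and report that confidence (step S5)."""
    chars, confs = [], []
    for y_t in y_seq:
        k = int(np.argmax(y_t))
        chars.append(CHARSET[k])
        confs.append(float(y_t[k]))
    return "".join(chars), confs

# toy softmax distributions over 26 characters for a 3-character prediction
logits = np.random.default_rng(0).normal(size=(3, 26))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
text, confs = decode_with_confidence(probs)
print(len(text))  # 3
```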
In the example shown in fig. 4, recognition results for five curved text samples are displayed.
In the description of the present invention, it should be understood that the terms "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate or are based on the orientation or positional relationship shown in the drawings, merely to facilitate description of the present invention, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the present invention. The above embodiments are only illustrative of the preferred embodiments of the present invention and are not intended to limit the scope of the present invention, and various modifications and improvements made by those skilled in the art to the technical solutions of the present invention should fall within the protection scope defined by the claims of the present invention without departing from the design spirit of the present invention.
Claims (5)
1. A natural scene text recognition method based on a cross attention mechanism, characterized by comprising the following steps:
S1, data acquisition: downloading sample pictures of natural scenes and synthesizing a training set from them using publicly available code, the training set covering multiple fonts and backgrounds;
S2, data processing: stretching all training sample pictures so that each processed picture is 32 × 100 pixels, keeping the aspect ratio of the original picture and filling the remaining width with black borders;
S3, label making: training the recognition model with a supervised method, so that each line-text picture has corresponding ground-truth text;
S4, network training: inputting the training picture data and labels from S2 into a cross attention network for training, the cross attention network consisting of a vertical attention network and a horizontal attention network connected in series;
S5, inputting test data into the trained network, obtaining the recognition result, and predicting the confidence of each character from the recognition result;
the step S4 comprises the following steps:
S4.1, constructing a convolutional neural network with the convolution module as its basic unit;
S4.2, constructing a cross attention network, wherein the cross attention network uses asymmetric convolutions to compute, for each column sub-feature map of the two-dimensional feature map, the weight vector occupied by its feature vectors, α_j = {α_{1,j}, α_{2,j}, ..., α_{H,j}}:

α_{i,j} = exp(g_{i,j}) / Σ_{i'=1}^{H} exp(g_{i',j})

wherein H represents the height of the feature map and g_{i,j} is the weight of the feature vector in row i of column j, computed by the vertical attention network:

g_{i,j} = conv_{1×1}(conv_{3×1}(X_j) + X_j)

wherein conv_{h×w}(X) denotes applying a convolution kernel of height h and width w to feature map X, and X_j represents the j-th column sub-feature map of the two-dimensional feature map generated by the convolutional neural network; after the weights of the feature vectors in each column sub-feature map have been computed, each sub-feature map is collapsed by a weighted sum of the feature vectors at its positions into a feature vector f_{v,j}:

f_{v,j} = Σ_{i=1}^{H} α_{i,j} x_{i,j}

wherein x_{i,j} is the feature vector in row i of column j; the feature vectors f_{v,j} generated by all sub-feature maps are concatenated into a feature sequence f_v; after f_v passes through a BLSTM network formed from bidirectional long short-term memory models, a feature sequence f'_v with contextual features is obtained; the feature sequence f'_v is fed into a horizontal attention network, which at each time step extracts a confidence probability distribution y_t over the character currently to be recognized:

y_t = softmax(ψ(h_t))

wherein ψ represents a fully connected layer that reduces the dimension of the decoding vector h_t to the number of target characters, and h_t is obtained by feeding the context vector ctx_t of the horizontal attention network, together with the word embedding vector emb_{t-1} of the character decoded at the previous time step, into a gated recurrent unit:

h_t = GRU([ctx_t; emb_{t-1}], h_{t-1})

wherein GRU represents the GRU network operation and h_{t-1} represents the hidden vector output by the GRU network at the previous time step; the context vector ctx_t is obtained as

ctx_t = Σ_j β_{t,j} f'_{v,j},  β_{t,j} = exp(e_{t,j}) / Σ_k exp(e_{t,k})

wherein β_{t,j} represents the weight of the j-th feature vector in the feature sequence, obtained through a fully connected layer from the feature sequence f'_v and the vector h_{t-1} used to decode the previous character:

e_{t,j} = W^T tanh(Q f'_{v,j} + V h_{t-1} + b)

wherein W^T, Q, V, and b represent trainable weight values; the character corresponding to the highest-confidence value in y_t is selected to obtain the current decoded output character c_t;
S4.3, training parameter setting: the training data is fed into the network and the network traverses the training data set 10 times; the optimization algorithm used is an adaptive gradient descent method with an initial learning rate of 1; after the network has traversed the training set 5 times, the learning rate is manually reduced to 0.1 and training continues until the 10 traversals of the training set are complete;
The Loss value of the network output is calculated by the following formula:

Loss = -(1/N) Σ_{i=1}^{N} Σ_j log p(c_{i,j} | I^{(i)}; θ)

wherein N represents the amount of data used in this batch optimization and p(c_{i,j} | I^{(i)}; θ) denotes the probability that character c_{i,j} is output from the i-th sample picture at the j-th time step;
S4.4, weight initialization: at the start of training, the weight parameters of all networks are randomly initialized with Gaussian noise;
S4.5, training the convolutional neural network: the probability that each character of the target character string is output at its corresponding time step is used to form the cross entropy, and gradient descent is used to minimize this cross entropy.
2. The natural scene text recognition method based on a cross-attention mechanism of claim 1, wherein: the text of the training set in S1 covers a plurality of fonts and a plurality of backgrounds.
3. The natural scene text recognition method based on a cross-attention mechanism according to claim 1, wherein in step S2 the training sample picture is stretched so that its height becomes 32 pixels, its width is scaled according to the original aspect ratio, and any remaining width is filled with black borders.
4. The natural scene text recognition method based on the cross-attention mechanism according to claim 1, wherein S3 comprises:
S3.1, synthesizing line-text pictures using publicly available code and a text corpus;
S3.2, storing the text content of each text picture in a corresponding text file;
S3.3, randomly dividing the synthesized line-text pictures into a training set and a validation set.
5. The natural scene text recognition method based on the cross-attention mechanism according to claim 1, wherein S5 comprises:
S5.1, inputting the pictures and labels of the test set into the trained network for a recognition test;
S5.2, after recognition is complete, calculating the accuracy;
S5.3, randomly displaying the visualized recognition process of 20 photos, in which the character features of each photo are selected jointly by the horizontal attention network and the vertical attention network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910517855.4A CN110414498B (en) | 2019-06-14 | 2019-06-14 | Natural scene text recognition method based on cross attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910517855.4A CN110414498B (en) | 2019-06-14 | 2019-06-14 | Natural scene text recognition method based on cross attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110414498A CN110414498A (en) | 2019-11-05 |
CN110414498B true CN110414498B (en) | 2023-07-11 |
Family
ID=68359132
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910517855.4A Active CN110414498B (en) | 2019-06-14 | 2019-06-14 | Natural scene text recognition method based on cross attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110414498B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111027553A (en) * | 2019-12-23 | 2020-04-17 | 武汉唯理科技有限公司 | Character recognition method for circular seal |
CN111160341B (en) * | 2019-12-27 | 2023-04-07 | 华南理工大学 | Scene Chinese text recognition method based on double-attention-machine mechanism |
CN111401373B (en) * | 2020-03-04 | 2022-02-15 | 武汉大学 | Efficient semantic segmentation method based on packet asymmetric convolution |
CN111652130B (en) * | 2020-06-02 | 2023-09-15 | 上海语识信息技术有限公司 | Method for identifying number, symbol and letter group of non-specific font |
CN111899292A (en) * | 2020-06-15 | 2020-11-06 | 北京三快在线科技有限公司 | Character recognition method and device, electronic equipment and storage medium |
CN112115834A (en) * | 2020-09-11 | 2020-12-22 | 昆明理工大学 | Standard certificate photo detection method based on small sample matching network |
CN112101355B (en) * | 2020-09-25 | 2024-04-02 | 北京百度网讯科技有限公司 | Method and device for detecting text in image, electronic equipment and computer medium |
CN112149644A (en) * | 2020-11-09 | 2020-12-29 | 西北工业大学 | Two-dimensional attention mechanism text recognition method based on global feature guidance |
CN113065432A (en) * | 2021-03-23 | 2021-07-02 | 内蒙古工业大学 | Handwritten Mongolian recognition method based on data augmentation and ECA-Net |
CN113283336A (en) * | 2021-05-21 | 2021-08-20 | 湖南大学 | Text recognition method and system |
CN113705713B (en) * | 2021-09-03 | 2023-08-22 | 华南理工大学 | Text recognition method based on global and local attention mechanisms |
CN114399757A (en) * | 2022-01-13 | 2022-04-26 | 福州大学 | Natural scene text recognition method and system for multi-path parallel position correlation network |
CN115187996B (en) * | 2022-09-09 | 2023-01-06 | 中电科新型智慧城市研究院有限公司 | Semantic recognition method and device, terminal equipment and storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111356997B (en) * | 2017-08-03 | 2024-04-09 | 皇家飞利浦有限公司 | Hierarchical neural network with granular attention |
CN108829801B (en) * | 2018-06-06 | 2020-11-20 | 大连理工大学 | Event trigger word extraction method based on document level attention mechanism |
CN111368565B (en) * | 2018-09-05 | 2022-03-18 | 腾讯科技(深圳)有限公司 | Text translation method, text translation device, storage medium and computer equipment |
CN109492679A (en) * | 2018-10-24 | 2019-03-19 | 杭州电子科技大学 | Character recognition method based on an attention mechanism coupled with temporal classification loss |
CN109753954A (en) * | 2018-11-14 | 2019-05-14 | 安徽艾睿思智能科技有限公司 | Real-time text localization and recognition method based on a deep learning attention mechanism |
CN109543681A (en) * | 2018-11-20 | 2019-03-29 | 中国石油大学(华东) | Character recognition method for natural scenes based on an attention mechanism |
CN109710919A (en) * | 2018-11-27 | 2019-05-03 | 杭州电子科技大学 | Neural network event extraction method incorporating an attention mechanism |
- 2019-06-14: CN application CN201910517855.4A, granted as CN110414498B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN110414498A (en) | 2019-11-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110414498B (en) | Natural scene text recognition method based on cross attention mechanism | |
CN110378334B (en) | Natural scene text recognition method based on two-dimensional feature attention mechanism | |
CN109977918B (en) | Target detection positioning optimization method based on unsupervised domain adaptation | |
US11164059B2 (en) | Two-dimensional code image generation method and apparatus, storage medium and electronic device | |
CN107368831B (en) | Method for recognizing English words and digits in natural scene images | |
CN107818314B (en) | Face image processing method, device and server | |
CN112215280B (en) | Small sample image classification method based on meta-backbone network | |
CN108764195A (en) | Handwriting model training method, handwritten character recognition method, device, equipment and medium | |
CN110533737A (en) | Structure-guided Chinese character style generation method | |
CN112784763B (en) | Expression recognition method and system based on local and overall feature adaptive fusion | |
CN105205448A (en) | Character recognition model training method based on deep learning and recognition method thereof | |
CN110287777B (en) | Golden monkey body segmentation algorithm in natural scene | |
CN111428727B (en) | Natural scene text recognition method based on sequence transformation correction and attention mechanism | |
CN109086653A (en) | Handwriting model training method, handwritten character recognition method, device, equipment and medium | |
CN109359608A (en) | Face recognition method based on a deep learning model | |
RU2665273C2 (en) | Trained visual markers and the method of their production | |
CN107832292A (en) | Method for converting images into classical Chinese poetry based on a neural network model | |
CN110363770A (en) | Training method and device for an edge-guided infrared semantic segmentation model | |
CN109840512A (en) | Facial action unit recognition method and device | |
CN109753897A (en) | Activity recognition method based on memory-unit reinforcement and temporal-dynamics learning | |
CN107038419A (en) | Human behavior semantic recognition method based on deep learning over video sequences | |
CN116597136A (en) | Semi-supervised remote sensing image semantic segmentation method and system | |
CN109508640A (en) | Crowd sentiment analysis method, apparatus and storage medium | |
Salem et al. | Semantic image inpainting using self-learning encoder-decoder and adversarial loss | |
CN108985442A (en) | Handwriting model training method, handwritten character recognition method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | ||
Inventor after: Luo Canjie; Jin Lianwen; Huang Yunlong; Lin Qingxiang; Zhou Weiying
Inventor before: Huang Yunlong; Jin Lianwen; Luo Canjie; Lin Qingxiang; Zhou Weiying
GR01 | Patent grant | ||