CN110674777A - Optical character recognition method in patent text scene - Google Patents

Optical character recognition method in patent text scene

Info

Publication number
CN110674777A
Authority
CN
China
Prior art keywords
text
lstm
output
network model
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910940612.1A
Other languages
Chinese (zh)
Inventor
饶云波
郭毅
程亦茗
张孟涵
王艺霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201910940612.1A priority Critical patent/CN110674777A/en
Publication of CN110674777A publication Critical patent/CN110674777A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical fields of computer vision, image processing and convolutional neural networks, and particularly relates to an optical character recognition method for patent text scenes. The invention combines a CNN and an LSTM so as to retain the advantages of both, addressing the CNN's weak handling of sequence correlations and the LSTM's limited extraction of image features. The invention further adopts the CTC loss computation method, which removes the need for alignment and thereby solves the problem that sample data are difficult to align during text recognition.

Description

Optical character recognition method in patent text scene
Technical Field
The invention belongs to the technical fields of computer vision, image processing and convolutional neural networks, and particularly relates to an optical character recognition method for patent text scenes.
Background
With the continuous advance of computer hardware and software and the maturing of Artificial Intelligence (AI), applying deep learning to optical character recognition is of great practical significance. Optical character recognition converts the characters of bills, newspapers, books, manuscripts and other printed matter into image information by optical input methods such as scanning, and then converts that image information into computer-readable text using character recognition technology. Its accuracy is affected by factors including the writer's habits, the printing quality of the document, the scanning quality of the scanner, the recognition method, and the training and testing samples. From image to final output, the pipeline comprises image input, image pre-processing, character feature extraction, comparison and recognition, and manual correction of wrongly recognized characters.
OCR technology has broad application prospects: current text recognition algorithms are already deployed in industry, and many optical character recognition products are available on the market, so the field has great application value.
Current OCR technologies can be classified into two categories according to feature extraction methods:
(1) The traditional method: first, connected-component analysis locates the text in the picture; then rows and columns are segmented by binarization, row/column projection analysis and rules; finally the output is obtained after semantic error correction. Its main disadvantages are: (a) feature extraction is time-consuming, and character recognition models are usually trained on hand-designed features (such as histograms of oriented gradients), whose generalization ability drops rapidly when the font changes; (b) over-reliance on the character segmentation result severely degrades accuracy under overlapping characters and noise interference; (c) good results are generally obtained only in simple scenes, with poor performance in complex scenes.
(2) Deep-learning-based optical character recognition: training a character recognition engine is a typical image classification problem. Current deep-learning methods exploit the strength of CNNs in extracting high-level image semantics and of LSTMs in processing temporal sequences, abandon the matching of hand-designed features against design templates, and build an end-to-end recognition network. In simple scenes such networks generally reach recognition accuracy above 90%, and in complex scenes the improvement over traditional methods is even more pronounced. However, these models have too many parameters and too high a computational cost: a deep network structure is often required for accurate feature extraction, and an overly deep structure suffers from vanishing gradients. Because text generally exhibits forward and backward sequence correlation, a CNN is much weaker than an LSTM at extracting such sequential features. An LSTM can extract features from a time series, but the traditional LSTM handles only short-term memory, because an overly long sequence again causes the gradient to vanish.
Disclosure of Invention
Based on deep learning, the invention aims to recognize text in patent scenes efficiently and accurately and to improve the degree of automation of patent entry, by fully exploiting the effectiveness of the LSTM (a kind of RNN) in processing and predicting time-ordered events and the advantage of the CNN in extracting deep semantics.
The technical scheme of the invention mainly comprises two parts: the first is to build and train a network model, where the overall model is divided into a text detection network and a text recognition network; the second is to perform recognition with the trained model. The overall algorithm framework is shown in figure 1. The method is realized by the following steps:
Preparing a sample set: patent text pictures in tif format, containing Chinese, English, digits and punctuation, are taken as the samples, and data enhancement is performed with image processing methods such as stretching, blurring, random cropping, perspective transformation and color inversion to obtain the sample set.
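A minimal Python/OpenCV sketch of this data enhancement step is given below. The augmentation parameters (scale range, blur kernel, crop margin, corner jitter) are illustrative assumptions, not values taken from the patent.

```python
import cv2
import numpy as np

def augment(img):
    """Return augmented variants of one grayscale page image:
    stretching, blurring, random cropping, perspective transformation, color inversion."""
    h, w = img.shape[:2]
    out = []
    # stretching: rescale the width by a random factor
    fx = np.random.uniform(0.8, 1.2)
    out.append(cv2.resize(img, (int(w * fx), h)))
    # blurring
    out.append(cv2.GaussianBlur(img, (5, 5), 0))
    # random cropping: trim a small random margin on each side
    mx = np.random.randint(0, max(1, w // 10))
    my = np.random.randint(0, max(1, h // 10))
    out.append(img[my:h - my, mx:w - mx])
    # perspective transformation: jitter the four corners slightly
    d = 0.05 * min(h, w)
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = src + np.random.uniform(-d, d, src.shape).astype(np.float32)
    out.append(cv2.warpPerspective(img, cv2.getPerspectiveTransform(src, dst),
                                   (w, h), borderValue=255))
    # color inversion
    out.append(255 - img)
    return out
```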
Building a deep neural network model: the whole model is built from a CNN and a Bi-LSTM (bidirectional long short-term memory network); text regions are generated first and the detection result is then produced from them. The network structure is shown in figure 2.
Text detection network model: a new base network is built from 3 convolution layers and 3 squeeze-and-excitation modules (SE blocks). Each SE block has two output branches: one branch is left untouched, while the other passes through a pooling layer, a fully connected layer, a ReLU excitation layer, another fully connected layer and a sigmoid excitation layer; the two branch results are then added and output. During computation this network assigns a different weight to the features of each channel, so that feature extraction better matches the actual application scene. The base network is shown in figure 3.
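The following Keras sketch illustrates one possible form of such a squeeze-and-excitation module, assuming TensorFlow/Keras as in the experiments below. The reduction ratio is an assumption; the text describes adding the two branches, whereas the standard SE block re-weights channels by multiplication, so both fusions are shown.

```python
from tensorflow.keras import layers

def se_block(x, reduction=16, fuse="add"):
    """Two branches: an untouched identity branch and a
    pooling -> FC -> ReLU -> FC -> sigmoid branch, fused at the end."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)                          # pooling layer
    s = layers.Dense(channels // reduction, activation="relu")(s)   # FC + ReLU excitation
    s = layers.Dense(channels, activation="sigmoid")(s)             # FC + sigmoid excitation
    s = layers.Reshape((1, 1, channels))(s)                         # broadcast over H x W
    if fuse == "add":
        return layers.Add()([x, s])        # fusion by addition, as described in the text
    return layers.Multiply()([x, s])       # standard SE channel re-weighting
```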
Text detection is a problem in the field of object detection. Lower network layers perceive small targets better, while higher layers perceive large targets, including context, better. The feature extraction network is therefore designed to draw on several feature outputs, forming a multi-scale feature extraction network. In practical problems the features extracted by different channels should not carry the same weight, so during feature extraction the features of different channels are given different output weights.
Text recognition network model: the model is built from a Bi-LSTM and a CNN, and the CTC algorithm replaces the traditional smoothLoss loss function. The network consists of 4 depthwise separable modules and 1 Bi-LSTM module. The input is a text-line picture output by the text detection network; features are first extracted by the depthwise separable modules, the feature sequence is fed into the Bi-LSTM for per-frame prediction, translation is then performed by CTC, and the result is output. The structure of the text recognition network model is shown in figure 4.
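A compact Keras sketch of this recognition branch is shown below (depthwise separable convolution blocks, map-to-sequence reshaping, Bi-LSTM, per-frame softmax for CTC). The filter counts, pooling sizes and character-set size are assumptions chosen so that the shapes work out, not values from the patent.

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 5000 + 1   # assumed character set size + 1 CTC blank

def sep_block(x, filters, pool):
    """Depthwise 3x3 + BN, pointwise 1x1 + BN, ReLU, max pooling."""
    x = layers.DepthwiseConv2D(3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(filters, 1, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    return layers.MaxPooling2D(pool)(x)

def build_recognizer(width=280):
    inp = layers.Input((32, width, 1))            # text-line picture from the detection network
    x = sep_block(inp, 64, (2, 2))
    x = sep_block(x, 128, (2, 2))
    x = sep_block(x, 256, (2, 1))                 # shrink height only
    x = sep_block(x, 512, (2, 1))
    x = layers.Conv2D(512, (2, 1))(x)             # collapse remaining height to 1
    x = layers.Reshape((-1, 512))(x)              # map-to-sequence: one 512-d vector per column
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
    out = layers.Dense(NUM_CLASSES, activation="softmax")(x)   # per-frame distribution for CTC
    return models.Model(inp, out)
```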
Training the network model with the data set, iteratively updating the network parameters to obtain the optimal model.
The model training comprises two parts, namely training of a text detection network and training of a text recognition network.
Text detection network training:
1. Through forward propagation, the convolution modules fully extract the feature information of the text picture; the feature map produced by the base network module has size W × H × C, where W is the feature map width, H the height and C the number of output channels.
2. After C 3×3 convolution kernels, the data are fed into a Bi-LSTM network to obtain a W×256-dimensional output, which then passes through a 512-dimensional fully connected layer. The output layer is divided into 2 parts. The first part performs coordinate regression with 512×(4+10): 512 means each point has 512 features, 10 means each point has 10 prediction box sizes (10 candidate boxes of different scales are generated), and 4 means each prediction box is described by a quadruple (xmin, xmax, ymin, ymax) giving the coordinates of two corner points. The second part uses 512×(2+10) for class prediction; 512 and 10 have the same meaning as in the first part, and 2 indicates the two classes, background or not.
3. A total of W × H × 10 prediction boxes are generated for each picture, and redundant boxes are removed by non-maximum suppression (NMS) with the threshold set to 0.7 (a code sketch is given after this training procedure).
4. The offset of each candidate box relative to the ground-truth box is calculated for prediction box regression.
5. The final prediction box is obtained from the category score and the coordinates. The overall loss function is the sum of a classification loss L_s^cl and a regression loss L_v^re. The first part, L_s^cl, uses the softmax function to supervise whether an anchor contains text information; s_i is the score of the i-th category and s_i* = 1 indicates a true text anchor. The second part, L_v^re, is an L1-smooth function used to learn the y-direction offset regression of anchors that contain text, where v_j is the size of the j-th text prediction box, β is the task weight, and N_s and N_v are normalization parameters giving the number of samples of the corresponding task. The formulas are as follows:
L(s_i, v_j) = (1/N_s) Σ_i L_s^cl(s_i, s_i*) + (β/N_v) Σ_j L_v^re(v_j, v_j*)
L_s^cl(s_i, s_i*) = −[s_i* log s_i + (1 − s_i*) log(1 − s_i)]   (softmax cross-entropy over text / non-text)
L_v^re(v_j, v_j*) = smoothL1(v_j − v_j*), where smoothL1(x) = 0.5x² for |x| < 1 and |x| − 0.5 otherwise
6. The obtained prediction boxes are combined using a text line construction method: two boxes are recursively merged into a group until no further merge is possible. The merging conditions are: 1) the box is the nearest neighbour of the target box and less than 50 pixels away; 2) their overlap (intersection-over-union) is greater than 0.7.
7. The weight parameters of each network layer are updated by back propagation according to the loss function.
This completes the text detection network training.
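A NumPy sketch of steps 3 and 6 above follows: non-maximum suppression at threshold 0.7, then recursive merging of boxes that are less than 50 pixels apart horizontally and whose vertical extents overlap by more than 0.7 (one reading of the overlap condition in step 6). The box format (xmin, ymin, xmax, ymax) and helper names are illustrative.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes (xmin, ymin, xmax, ymax)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def nms(boxes, scores, thresh=0.7):
    """Step 3: keep the highest-scoring box, drop boxes overlapping it by more than thresh."""
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep

def merge_text_lines(boxes, max_gap=50, min_overlap=0.7):
    """Step 6: recursively merge neighbouring boxes into text lines."""
    boxes = [list(b) for b in boxes]
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                gap = max(a[0], b[0]) - min(a[2], b[2])        # horizontal gap (< 0 means overlap)
                v_inter = min(a[3], b[3]) - max(a[1], b[1])    # vertical intersection
                v_overlap = max(0, v_inter) / (min(a[3] - a[1], b[3] - b[1]) + 1e-9)
                if gap < max_gap and v_overlap > min_overlap:
                    boxes[i] = [min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3])]
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```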
Text recognition network training:
1. Through forward propagation, an input picture of size 1 × W × 32 passes through four depthwise separable convolution modules that extract the feature information of the text picture; the final output is a feature map with 512 channels, height 1 and width W′, where W′ denotes the width W after reduction by the pooling layers.
2. Because the features extracted by the CNN cannot be fed directly to the Bi-LSTM, a sequence of feature vectors must be extracted: the feature vectors are generated column by column from left to right on the feature map, each column contains 512 features, so each feature vector is 512-dimensional and W′ feature vectors are obtained in total.
3. The sequence then passes through 1 Bi-LSTM module with 256 hidden nodes, one feature vector being fed into each of the W′ time steps of the Bi-LSTM; the softmax probability distribution of the characters is finally obtained, forming a W′ × (number of character classes) posterior probability matrix that is used as the input to the CTC algorithm.
4. The label sequence with the highest combined probability is found by the CTC algorithm and output.
5. The loss function O is formulated as follows, where X is the input sequence, l is the output label sequence, S is the training set and p(l|X) represents the probability of the output sequence l given the input X (a code sketch of this step follows the list):
O = −Σ_{(X, l) ∈ S} ln p(l | X)
6. Likewise, back propagation is performed according to the loss function and the network weight parameters are updated.
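The per-frame softmax matrix from the Bi-LSTM feeds the CTC objective and decoder of steps 4 and 5. A minimal TensorFlow/Keras sketch follows; it uses the standard Keras backend CTC helpers rather than any implementation taken from the patent.

```python
import tensorflow as tf
from tensorflow.keras import backend as K

# y_pred: per-frame softmax output of the Bi-LSTM, shape (batch, frames, classes)
# labels: padded integer label sequences, shape (batch, max_label_len)
# frame_len, label_len: (batch, 1) tensors holding the true lengths

def ctc_objective(labels, y_pred, frame_len, label_len):
    """Step 5: O = -sum over the batch of ln p(l | X)."""
    per_sample = K.ctc_batch_cost(labels, y_pred, frame_len, label_len)  # -ln p(l|X) per sample
    return tf.reduce_sum(per_sample)

def ctc_best_path(y_pred, frame_len):
    """Step 4: greedy decoding, returning the most probable label sequence."""
    decoded, _ = K.ctc_decode(y_pred, tf.squeeze(frame_len, axis=-1), greedy=True)
    return decoded[0]
```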
The advantage of the method is that, unlike the traditional approach, feature learning is carried out with a Bi-LSTM and a CNN: combining the CNN and the LSTM yields a new network structure model, the CTC algorithm performs probability prediction at the final character output stage, and traditional image processing is applied at the end, so that optical character recognition in the patent scene is greatly improved. With the development of artificial intelligence, methods such as deep learning are moving from academia into industry, which gives the method strong practical significance. Owing to advances in hardware and algorithms, the demands on recognition accuracy and degree of automation are also rising.
The invention combines the CNN and the LSTM, retaining the advantages of both, and addresses the CNN's weak handling of sequence correlations and the LSTM's limited extraction of image features. The invention further adopts the CTC loss computation method, which solves, without requiring alignment, the problem that sample data are difficult to align during text recognition. For optical character recognition in a patent scene, traditional methods are introduced for preprocessing and splitting the feature regions; most current OCR applications do not perform background detection or character orientation adjustment on irregular pictures, and optimization of optical character recognition for patent pictures is lacking. As the figures show, such targeted processing has a great influence on the final result. The invention therefore has broad application prospects, and the deep-learning-based method has real practical value and research significance for OCR applications and research in this specific scene.
Drawings
FIG. 1 is an algorithmic framework of the present invention;
FIG. 2 is a diagram of a neural network model of the present invention;
FIG. 3 is a diagram of an infrastructure network architecture;
FIG. 4 is a diagram of a text recognition network architecture;
FIG. 5 shows the data set and labels: (a) is the data label diagram and (b) is the data presentation diagram;
FIG. 6 is a flow chart of the Train algorithm;
FIG. 7 is a graph of network operation results;
FIG. 8 is a feature region segmentation map;
FIG. 9 is an Excel class screenshot;
FIG. 10 is a write module effect display diagram;
FIG. 11 is test figure 1: (a) is the raw input image and (b) is the model test result;
FIG. 12 is test figure 2: (a) is the raw input image and (b) is the model test result;
FIG. 13 compares model effects: (a) is the raw input image, (b) is the model test result with preprocessing, and (c) is the model test result without preprocessing.
Detailed Description
The following describes the applicability of the invention in connection with a simulation example.
Training environment:
CPU: Intel i7-8700K; GPU: NVIDIA GeForce 2080 Ti; OS: Ubuntu 16.04.
Data verification environment:
CPU: 2.7 GHz Intel Core i5; GPU: Intel Iris Graphics 6100; OS: macOS 10.14.6.
The development language is Python 3.5, with the open-source framework Keras and TensorFlow as the back end; third-party libraries such as OpenCV and NumPy are also used.
1. Data set preparation
Patent text pictures in tif format are used. The data set contains 500,000 original pictures comprising Chinese, English, digits and punctuation; data enhancement with image processing methods such as stretching, blurring, random cropping, perspective transformation and color inversion yields a final data set of about 3,000,000 pictures. The data set is split into a training set and a validation set in a 99:1 ratio (a minimal split sketch is given below); data labels are produced with the text_render tool, generating a label file train.txt and the picture data, as shown in figure 5.
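A tiny sketch of the 99:1 split follows. The assumption that train.txt holds one "image path, tab, label" entry per line is illustrative; the exact layout produced by the text_render tool is not specified here.

```python
import random

# Assumed layout of train.txt: one "<image_path>\t<label text>" entry per line.
with open("train.txt", encoding="utf-8") as f:
    lines = f.read().splitlines()

random.seed(0)
random.shuffle(lines)
cut = len(lines) * 99 // 100          # 99:1 train/validation split
train_lines, val_lines = lines[:cut], lines[cut:]
```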
2. Begin training
The iteration number epoch is set to 4, the batch size is set to 16, and the picture width and height are limited to 280 and 32 respectively. The learning rate lr changes dynamically with the epoch according to the following formula.
lr = 0.0006 × 0.3^epoch
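In Keras this schedule can be attached as a callback; a short sketch follows, assuming the epoch index is counted from 0.

```python
from tensorflow.keras.callbacks import LearningRateScheduler

# lr = 0.0006 * 0.3 ** epoch, with epoch = 0, 1, 2, 3
lr_schedule = LearningRateScheduler(lambda epoch: 0.0006 * 0.3 ** epoch)
# model.fit(train_data, epochs=4, batch_size=16, callbacks=[lr_schedule])
```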
The training Python script is run: a session is created first, then the network structure and the data set path are loaded. The training algorithm flowchart is shown in figure 6, and a screenshot of the run results is shown in figure 7.
A weight.h5 file is obtained after training finishes, after which the characters of the patent text are recognized and written out.
3. The layout of the patent picture is preprocessed and recognition is then performed.
1) The input picture is first scaled and cropped to a standard 224 × 224 size. This step prevents the loss of precision caused by irregular pictures.
2) Noise in the irregular picture is removed with a filter, and binarization, rotation and similar operations are applied to bring out the features of the optical characters.
3) A coordinate system is established with the upper-left corner as the origin, and the coordinates of the area containing the content to be recognized are extracted. The corresponding area is cropped to generate an intermediate picture, so that the feature region is enlarged, as shown in figure 8, and a large amount of irrelevant information is removed (a sketch of steps 1) to 3) follows this list).
4) The results are written into an Excel document, which is read and written with the Python package openpyxl. The classes of data to be written are shown in figure 9. First, a compare_excel(self, sheet) -> bool function decides whether a new document is created or an existing one is appended to. Because there are many types of patent pictures, the data may need to be written into an existing row or a new row, so the invention performs several checks based on keywords such as the patent number and the data name. Finally the data are written into the Excel document, as shown in figure 10.
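A minimal OpenCV sketch of preprocessing steps 1) to 3) is given below. The denoising and binarization choices (median filter, Otsu threshold) and the region dictionary are assumptions for illustration; the actual patent layout determines the real coordinates.

```python
import cv2

def preprocess(path, regions):
    """Steps 1)-3): normalise the page to 224 x 224, denoise and binarise it,
    then crop the regions holding the content to be recognised.
    `regions` maps a field name to (x, y, w, h) in the normalised
    coordinate system, with the origin at the upper-left corner."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (224, 224))                       # step 1: standard size
    img = cv2.medianBlur(img, 3)                            # step 2: remove noise
    _, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return {name: img[y:y + h, x:x + w]                     # step 3: cut feature regions
            for name, (x, y, w, h) in regions.items()}
```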
After this series of image processing steps, the network model is used for recognition; the test results are good, as shown in figures 11 and 12.
As the figures show, the method clearly improves recognition accuracy; the recognition result can be displayed as an intermediate output stream and modified manually, and the final result is stored automatically in an Excel table. The final recognition precision is very high, and the system is essentially ready for industrial deployment.
The invention provides a novel network structure and algorithm model: Bi-LSTM + CNN + CTC. The text detection network adopts an SE-block structure and builds a new base network for feature extraction; this module fully accounts for the influence of the different channel dimensions on the features, and extracts features better than other feature extraction network models. The text recognition network builds its CNN from a new depthwise convolution module, and the loss function computes character probabilities with the CTC algorithm in place of the smoothLoss function. While maintaining model precision, the number of model parameters and the amount of computation are greatly reduced.
In the recognition stage, the picture is preprocessed to unify its size, the feature regions are then identified and cropped, and the trained network model performs recognition to generate an intermediate result. Since optical character recognition cannot yet reach one hundred percent accuracy, manual review is still necessary. If the image is fed directly into the network model without preprocessing, the results are poor; figure 13 shows the importance of image preprocessing for the final result, and recognition with preprocessing is qualitatively more accurate than recognition without it.

Claims (1)

1. An optical character recognition method under a patent text scene is characterized by comprising the following steps:
S1, obtaining patent text pictures in tif format and preprocessing them to serve as a sample set;
S2, building a deep neural network model comprising a text detection network model and a text recognition network model;
the text detection network model consists of 3 convolutional layers, 3 squeeze-and-excitation modules and 1 Bi-LSTM, where each convolutional layer is connected to a squeeze-and-excitation module; each squeeze-and-excitation module comprises two output branches: one branch is left unprocessed, while the other passes in turn through a pooling layer, a fully connected layer, a ReLU excitation layer, a fully connected layer and a sigmoid excitation layer, and the two branch results are finally added and output; the last squeeze-and-excitation module is connected to the Bi-LSTM through a 3 × 3 convolution kernel, and the final output passes through a fully connected layer;
the text recognition network model is composed of a Bi-LSTM and a CNN; the input first passes through depthwise separable modules built from the CNN, where each module comprises as many 3 × 3 convolution kernels as input channels, batch normalization after their superposition, then a 1 × 1 convolution layer, and finally batch normalization, an activation function and a max pooling layer before output; the last depthwise separable module is connected to the Bi-LSTM module, which is connected to the sequence translation module;
S3, training the deep neural network model of step S2 with the sample set obtained in step S1 to obtain a trained neural network model, specifically comprising the following steps:
training a text detection network model: through forward propagation, the convolution modules extract the text picture feature information, and the feature map produced by the base network module has size W × H × C, where W is the feature map width, H the height and C the number of output channels;
target candidate region features are extracted with C 3 × 3 convolution kernels and preset preselected box sizes, then fed into a Bi-LSTM network to obtain a W × 256-dimensional output, which passes through a 512-dimensional fully connected layer; the output layer is divided into 2 parts: the first part performs coordinate regression with 512 × (4+10), where 512 denotes 512 features per point, 10 denotes 10 groups of preselected box sizes per point, and 4 denotes the composition of a preselected box size (xmin, xmax, ymin, ymax), i.e. the coordinates of two corner points; the second part uses 512 × (2+10) for category prediction, 512 and 10 have the same meaning as in the first part, and 2 denotes the two classes, background and non-background;
generating W × H × 10 different preselected boxes in total for the picture, removing redundant boxes with non-maximum suppression, and setting the threshold to 0.7;
calculating the offset of each candidate box relative to the ground-truth box for prediction box regression;
obtaining a final prediction box according to the category score and the coordinates; the overall loss function is the sum of a classification loss L_s^cl and a regression loss L_v^re; the first part, L_s^cl, uses the softmax function to supervise whether the prediction box contains text information, where s_i is the score of the i-th category and s_i* = 1 indicates a true text box; the second part, L_v^re, is an L1-smooth function for learning the y-direction offset regression of prediction boxes containing text, where v_j is the size of the j-th text prediction box, β is the task weight, and N_s and N_v are normalization parameters giving the number of samples of the corresponding task; the formulas are as follows:
L(s_i, v_j) = (1/N_s) Σ_i L_s^cl(s_i, s_i*) + (β/N_v) Σ_j L_v^re(v_j, v_j*)
L_s^cl(s_i, s_i*) = −[s_i* log s_i + (1 − s_i*) log(1 − s_i)]
L_v^re(v_j, v_j*) = smoothL1(v_j − v_j*), where smoothL1(x) = 0.5x² for |x| < 1 and |x| − 0.5 otherwise;
combining the obtained prediction boxes by a text line construction method, recursively merging two boxes into a group until no further merge is possible, where the merging conditions are: 1) the box is the nearest neighbour of the target box and less than 50 pixels away; 2) the intersection-over-union is greater than 0.7;
updating the weight parameters of each network layer through back propagation according to the loss function;
training a text recognition network model:
by forward propagation, an input picture of size 1 × W × 32 passes through four depthwise separable convolution modules that extract the text picture feature information; the final output is a feature map with 512 channels, height 1 and width W′, where W′ denotes the width W after reduction by the pooling layers;
because the features extracted by the CNN cannot be fed directly to the Bi-LSTM, a sequence of feature vectors must be extracted: the feature vectors are generated column by column from left to right on the feature map, each column contains 512 features, each feature vector is 512-dimensional, and W′ feature vectors are obtained in total;
the sequence then passes through 1 Bi-LSTM module with 256 hidden nodes, one feature vector being fed into each of the W′ time steps of the Bi-LSTM; the softmax probability distribution of the characters is finally obtained, forming a W′ × (number of character classes) posterior probability matrix that is used as the input to the CTC algorithm;
finding the label sequence with the highest combined probability through the CTC algorithm and outputting it;
the loss function O is expressed as follows, where X is the input sequence, l is the output label sequence, S is the training set, and p(l|X) represents the probability of the output sequence l given the input X: O = −Σ_{(X, l) ∈ S} ln p(l | X);
performing backward propagation according to the loss function, and updating the network weight parameter;
S4, inputting the patent text picture to be recognized into the trained neural network model to obtain the optical character recognition result.
CN201910940612.1A 2019-09-30 2019-09-30 Optical character recognition method in patent text scene Pending CN110674777A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910940612.1A CN110674777A (en) 2019-09-30 2019-09-30 Optical character recognition method in patent text scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910940612.1A CN110674777A (en) 2019-09-30 2019-09-30 Optical character recognition method in patent text scene

Publications (1)

Publication Number Publication Date
CN110674777A true CN110674777A (en) 2020-01-10

Family

ID=69080609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910940612.1A Pending CN110674777A (en) 2019-09-30 2019-09-30 Optical character recognition method in patent text scene

Country Status (1)

Country Link
CN (1) CN110674777A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414908A (en) * 2020-03-16 2020-07-14 湖南快乐阳光互动娱乐传媒有限公司 Method and device for recognizing caption characters in video
CN111985484A (en) * 2020-08-11 2020-11-24 云南电网有限责任公司电力科学研究院 CNN-LSTM-based temperature instrument digital identification method and device
CN112052853A (en) * 2020-09-09 2020-12-08 国家气象信息中心 Text positioning method of handwritten meteorological archive data based on deep learning
CN112052852A (en) * 2020-09-09 2020-12-08 国家气象信息中心 Character recognition method of handwritten meteorological archive data based on deep learning
CN112270174A (en) * 2020-11-10 2021-01-26 清华大学深圳国际研究生院 Rumor detection method and computer readable storage medium
CN112287934A (en) * 2020-08-12 2021-01-29 北京京东尚科信息技术有限公司 Method and device for recognizing characters and obtaining character image feature extraction model
CN112348007A (en) * 2020-10-21 2021-02-09 杭州师范大学 Optical character recognition method based on neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447078A (en) * 2018-10-23 2019-03-08 四川大学 A kind of detection recognition method of natural scene image sensitivity text
CN109902622A (en) * 2019-02-26 2019-06-18 中国科学院重庆绿色智能技术研究院 A kind of text detection recognition methods for boarding pass information verifying
CN109977950A (en) * 2019-03-22 2019-07-05 上海电力学院 A kind of character recognition method based on mixing CNN-LSTM network
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN109447078A (en) * 2018-10-23 2019-03-08 四川大学 A kind of detection recognition method of natural scene image sensitivity text
CN109902622A (en) * 2019-02-26 2019-06-18 中国科学院重庆绿色智能技术研究院 A kind of text detection recognition methods for boarding pass information verifying
CN109977950A (en) * 2019-03-22 2019-07-05 上海电力学院 A kind of character recognition method based on mixing CNN-LSTM network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BOLAN SU et al.: "Accurate recognition of words in scenes without character segmentation using recurrent neural network", Pattern Recognition *
YANHUA SHAO et al.: "Using Multi-Scale Infrared Optical Flow-based Crowd motion estimation for Autonomous Monitoring UAV", 2018 Chinese Automation Congress (CAC) *
曾劲松 et al.: "Intelligent classification of massive information based on a conflict game algorithm", 《计算机科学》 (Computer Science) *
谭咏梅 et al.: "Chinese textual entailment recognition method based on CNN and bidirectional LSTM", 《中文信息学报》 (Journal of Chinese Information Processing) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414908A (en) * 2020-03-16 2020-07-14 湖南快乐阳光互动娱乐传媒有限公司 Method and device for recognizing caption characters in video
CN111414908B (en) * 2020-03-16 2023-08-29 湖南快乐阳光互动娱乐传媒有限公司 Method and device for recognizing caption characters in video
CN111985484A (en) * 2020-08-11 2020-11-24 云南电网有限责任公司电力科学研究院 CNN-LSTM-based temperature instrument digital identification method and device
CN112287934A (en) * 2020-08-12 2021-01-29 北京京东尚科信息技术有限公司 Method and device for recognizing characters and obtaining character image feature extraction model
CN112052853A (en) * 2020-09-09 2020-12-08 国家气象信息中心 Text positioning method of handwritten meteorological archive data based on deep learning
CN112052852A (en) * 2020-09-09 2020-12-08 国家气象信息中心 Character recognition method of handwritten meteorological archive data based on deep learning
CN112052852B (en) * 2020-09-09 2023-12-29 国家气象信息中心 Character recognition method of handwriting meteorological archive data based on deep learning
CN112052853B (en) * 2020-09-09 2024-02-02 国家气象信息中心 Text positioning method of handwriting meteorological archive data based on deep learning
CN112348007A (en) * 2020-10-21 2021-02-09 杭州师范大学 Optical character recognition method based on neural network
CN112348007B (en) * 2020-10-21 2023-12-19 杭州师范大学 Optical character recognition method based on neural network
CN112270174A (en) * 2020-11-10 2021-01-26 清华大学深圳国际研究生院 Rumor detection method and computer readable storage medium
CN112270174B (en) * 2020-11-10 2022-04-29 清华大学深圳国际研究生院 Rumor detection method and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN110674777A (en) Optical character recognition method in patent text scene
Zhao et al. Document image binarization with cascaded generators of conditional generative adversarial networks
CN111652332B (en) Deep learning handwritten Chinese character recognition method and system based on two classifications
Wilkinson et al. Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections
CN112070768B (en) Anchor-Free based real-time instance segmentation method
CN109033978B (en) Error correction strategy-based CNN-SVM hybrid model gesture recognition method
CN113537227B (en) Structured text recognition method and system
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
CN111563563B (en) Method for enhancing combined data of handwriting recognition
CN111666937A (en) Method and system for recognizing text in image
CN113673338A (en) Natural scene text image character pixel weak supervision automatic labeling method, system and medium
CN110929746A (en) Electronic file title positioning, extracting and classifying method based on deep neural network
CN114155244A (en) Defect detection method, device, equipment and storage medium
CN114187595A (en) Document layout recognition method and system based on fusion of visual features and semantic features
CN110503090B (en) Character detection network training method based on limited attention model, character detection method and character detector
US8340428B2 (en) Unsupervised writer style adaptation for handwritten word spotting
Vinokurov Using a convolutional neural network to recognize text elements in poor quality scanned images
Dipu et al. Bangla optical character recognition (ocr) using deep learning based image classification algorithms
CN111144469B (en) End-to-end multi-sequence text recognition method based on multi-dimensional associated time sequence classification neural network
CN115640401A (en) Text content extraction method and device
Kasi et al. A deep learning based cross model text to image generation using DC-GAN
Lai et al. Robust text line detection in equipment nameplate images
Zulkarnain et al. Table information extraction using data augmentation on deep learning and image processing
Ahmed et al. Sub-sampling approach for unconstrained Arabic scene text analysis by implicit segmentation based deep learning classifier
Bureš et al. Semantic text segmentation from synthetic images of full-text documents

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200110