CN107463928A - Word sequence error correction algorithm, system and its equipment based on OCR and two-way LSTM - Google Patents
- Publication number: CN107463928A
- Application number: CN201710630581.0A
- Authority
- CN
- China
- Prior art keywords
- ocr
- way lstm
- linguistic context
- error correction
- context vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/98—Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
A word sequence error correction algorithm, system and device based on OCR and a bidirectional (two-way) LSTM. The method comprises: S1, acquiring a character image; S2, preprocessing the character image with OCR to obtain a first sequence set X = {x_0, x_1, …, x_m}; S3, feeding the forward sequence {x_0, x_1, …, x_m} and the reversed sequence {x_m, x_{m-1}, …, x_0} into an encoder built from a bidirectional LSTM to obtain a context vector c; S4, decoding the context vector c with a decoder built from a bidirectional LSTM to obtain a second sequence set Y. The system comprises an image acquisition module, an OCR processing module, an encoder of bidirectional LSTM structure, and a decoder of bidirectional LSTM structure. The device carries an executable program implementing the method.
Description
Technical field
The present invention relates to the field of machine translation within image text recognition, and more particularly to a word sequence error correction algorithm, system and device based on OCR and a bidirectional LSTM.
Background art
In recent years, with the rapid development of machine learning, machine translation algorithms of all kinds have emerged in quick succession; among the most widely applied are OCR text recognition algorithms. OCR (Optical Character Recognition) refers to the process by which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and bright, and then translates the shapes into computer text by a character recognition method. That is, for printed characters, the text on a paper document is optically converted into a black-and-white dot-matrix image file, and recognition software then converts the text in the image into a text format that word-processing software can further edit and process.
However, owing to the influence of illumination, viewing angle and other factors on the image, the accuracy of OCR word recognition rarely reaches expectations.
Summary of the invention
To solve the above technical problem, the present invention proposes a word sequence error correction algorithm, system and device based on OCR and a bidirectional LSTM, which can effectively improve the accuracy of word sequence recognition.
To achieve this goal, the technical scheme of the present invention is as follows:
A word sequence error correction algorithm based on OCR and a bidirectional LSTM, suitable for recognizing words in images, comprising the steps of:
S1, acquiring a character image;
S2, preprocessing the character image with OCR to obtain a first sequence set X = {x_0, x_1, …, x_m};
S3, feeding the forward sequence {x_0, x_1, …, x_m} and the reversed sequence {x_m, x_{m-1}, …, x_0} into an encoder built from a bidirectional LSTM to obtain a context vector c;
S4, decoding the context vector c with a decoder built from a bidirectional LSTM to obtain a second sequence set Y.
The context vector c in step S3 is:
c = Φ({h_1, h_2, …, h_{TS}});
h_t = f(x_t, h_{t-1}).
The second sequence set Y in step S4 is:
Y = (y_0, y_1, …, y_n);
s_t = f(y_{t-1}, s_{t-1}, c);
p(y_t | y_{<t}, X) = g(y_{t-1}, s_t, c).
The character image in step S1 is an image of an express waybill.
The threshold used in the OCR preprocessing of step S2 is the minimum reliability threshold allowed by the system.
A word sequence error correction system based on OCR and a bidirectional LSTM, comprising:
an image acquisition module for acquiring a character image;
an OCR processing module for performing OCR preprocessing on the character image to obtain a first sequence set X = {x_0, x_1, …, x_m};
an encoder of bidirectional LSTM structure for encoding the forward sequence {x_0, x_1, …, x_m} and the reversed sequence {x_m, x_{m-1}, …, x_0} to obtain a context vector c;
a decoder of bidirectional LSTM structure for decoding the context vector c to obtain a second sequence set Y.
A word sequence error correction device based on OCR and a bidirectional LSTM, comprising a computer-readable medium storing a computer program which, when run, performs:
S1, acquiring a character image;
S2, preprocessing the character image with OCR to obtain a first sequence set X = {x_0, x_1, …, x_m};
S3, feeding the forward sequence {x_0, x_1, …, x_m} and the reversed sequence {x_m, x_{m-1}, …, x_0} into an encoder built from a bidirectional LSTM to obtain a context vector c;
S4, decoding the context vector c with a decoder built from a bidirectional LSTM to obtain a second sequence set Y.
The beneficial effect of the invention is that, by the combined use of OCR and bidirectional LSTM algorithms, the accuracy of text recognition is improved.
Brief description of the drawings
Fig. 1 shows a flowchart according to an embodiment of the present application.
Fig. 2 shows an operational flowchart of the bidirectional LSTM according to an embodiment of the present application.
Fig. 3 shows an encoding flowchart of the bidirectional LSTM according to an embodiment of the present application.
Detailed description of the embodiments
For a better understanding of the technical scheme of the present invention, the invention is further described below with reference to Figs. 1-3.
As shown in Fig. 1, the word sequence error correction algorithm based on OCR and a bidirectional LSTM is suitable for recognizing words in images. It combines artificial intelligence with big data to process the input text queue in real time, realizing real-time processing and application of text information. It comprises the following steps.
First, a character image is acquired and OCR preprocessing is performed.
The original input is the image information of an express waybill, which is preprocessed by OCR text recognition to obtain an OCR result queue. The OCR result queue serves as the input to a language model, which, combined with a large text dictionary, yields the desired output sequence.
To remedy the relatively low accuracy with which OCR alone recognizes word sequences (29.65% in exemplary statistics), this algorithm sets a minimum reliability threshold: only values greater than the threshold are taken as the final output character queue of the OCR stage and fed into the language model for computation, as the sketch below illustrates.
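A minimal sketch of this confidence gating, assuming the OCR engine returns (character, confidence) pairs; the threshold value is illustrative, since the patent only says it is the minimum reliability threshold the system allows:

```python
MIN_CONFIDENCE = 0.5  # assumed value; the patent leaves the threshold system-defined

def filter_ocr_queue(ocr_results):
    """Keep only the characters whose OCR confidence exceeds the threshold."""
    return [char for char, conf in ocr_results if conf > MIN_CONFIDENCE]

# The surviving characters form the queue handed to the language model.
print(filter_ocr_queue([("快", 0.92), ("递", 0.87), ("羊", 0.21)]))  # ['快', '递']
```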
Because the range of contextual information accessible to a standard recurrent neural network (RNN) is limited, the influence of a hidden-layer input on the network's output weakens as the recurrence continues. As shown in Fig. 2, to solve this problem a bidirectional LSTM model (long short-term memory network) is used to map one sequence, the input, onto another sequence, the output; this process consists of two stages, encoding the input and decoding the output. For example, an existing sequence "x_0, x_1, …, x_m", passed into the model element by element, is mapped to the output "y_0, y_1, …, y_n".
The core framework of the bidirectional LSTM is the Encoder-Decoder. Put simply, after the input sequence is passed into the model, the encoder first compresses it into a vector of fixed length, the context vector. Once encoding is complete, the context variable enters the decoder to be decoded; using a locally optimal search algorithm, the decoder chooses a candidate and checks it against the dictionary before the device outputs it, thereby obtaining the best choice, as sketched below.
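A hedged sketch of this dictionary check before output, assuming the decoder supplies candidate strings ranked by model score; the greedy re-ranking strategy is an assumption, since the patent does not spell out the retrieval algorithm:

```python
def pick_output(candidates, dictionary):
    """candidates: list of (text, score) pairs sorted by descending score.
    Prefer the best-scoring candidate that also appears in the dictionary."""
    for text, _score in candidates:
        if text in dictionary:
            return text
    return candidates[0][0]  # fall back to the top-scoring candidate

# The dictionary rescues the correct waybill text over a higher-scoring typo.
candidates = [("顺半快递", 0.72), ("顺丰快递", 0.61)]
print(pick_output(candidates, {"顺丰快递"}))  # 顺丰快递
```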
More specifically, for a given input first sequence set X, the Encoder-Decoder framework generates the desired target second sequence set Y. X and Y are each composed of their respective sequences:
X = {x_0, x_1, …, x_m}, whose order is that of the character string itself;
Y = (y_0, y_1, …, y_n).
Here m and n are positive integers: m is the length of the input sequence minus 1, and n is the length of the output sequence minus 1. m and n are not necessarily equal, and output stops when the decoder emits the end-of-output symbol. First, as shown in the equations below, the input sequence {x_0, x_1, …, x_m} and its reverse {x_m, x_{m-1}, …, x_0} pass through the encoder built from the bidirectional LSTM, which recursively produces the hidden nodes h_t one by one; the weighted sum of the hidden nodes h_t is the context vector c. A "hidden node" here means any node of the neural network other than the input and output nodes; more precisely, it should be called "the context vector produced at each moment". Fig. 3 shows the process by which the bidirectional LSTM encodes to obtain c1 and c2, where c1 and c2 are two context vectors representing the forward and backward directions respectively.
h_t = f(x_t, h_{t-1})
c = Φ({h_1, h_2, …, h_{TS}})
Here h_t denotes the context vector output by the encoder at each moment, and TS denotes the last moment. Φ denotes the moment-by-moment stacking and fusion, on the encoder, of the h values of all moments. f denotes the function (process) by which the encoder produces the current moment's context vector from the previous moment's context vector and the current input.
After encoding is complete, the context vectors c1 and c2 generated by the forward and backward encodings are merged (usually by direct splicing) into the final context vector of the encoder, which is input to the decoder to obtain the final sequence set Y, the desired output sequence. The sketch below illustrates the encoding side.
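A minimal PyTorch sketch of the encoding step, under the assumption that f is realized by LSTM cells and that Φ is the direct splicing of the last forward and backward hidden states; the vocabulary size, embedding and hidden dimensions are illustrative, not taken from the patent:

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID = 5000, 128, 256  # assumed sizes

embed = nn.Embedding(VOCAB, EMB)
bilstm = nn.LSTM(EMB, HID, bidirectional=True, batch_first=True)

def encode(x_ids):
    """x_ids: LongTensor of shape (batch, m+1) holding the input sequence X."""
    h_all, (h_n, _) = bilstm(embed(x_ids))
    # h_n has shape (2, batch, HID): h_n[0] is the forward context c1 and
    # h_n[1] the backward context c2; splice them into the final c.
    c = torch.cat([h_n[0], h_n[1]], dim=-1)  # (batch, 2*HID)
    return h_all, c

x = torch.randint(0, VOCAB, (1, 7))  # a toy 7-character input sequence
_, c = encode(x)
print(c.shape)  # torch.Size([1, 512])
```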
s_t = f(y_{t-1}, s_{t-1}, c);
p(y_t | y_{<t}, X) = g(y_{t-1}, s_t, c).
Here s_t denotes the context vector produced by the decoder at each moment. f denotes the function (process) by which the decoder builds the current moment's context vector from the previous moment's decoder context vector, the previous output, and the context vector finally output by the encoder. g denotes the process by which the decoder produces the current output from the current moment's decoder context vector, the previous moment's decoder output, and the context vector finally output by the encoder. p denotes the probability of producing the next output given all previous inputs, and X denotes the input dictionary vector received by the encoder at each moment. The parameter t above denotes the moment; its range is 0 ≤ t ≤ m in the encoder and 0 ≤ t ≤ n in the decoder.
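Continuing the encoder sketch above (and reusing its embed, VOCAB and HID), a hedged sketch of the decoding recurrence and the stop-at-end-symbol behaviour; the LSTMCell, linear projection and greedy argmax choice are assumptions, since the patent names only the functions f and g:

```python
dec_cell = nn.LSTMCell(EMB + 2 * HID, 2 * HID)  # realizes s_t = f(y_{t-1}, s_{t-1}, c)
out_proj = nn.Linear(2 * HID, VOCAB)            # realizes p(y_t | ...) = g(...)

def decode_step(y_prev, state, c):
    """One step: previous output token, previous decoder state, context c."""
    s_t, cell_t = dec_cell(torch.cat([embed(y_prev), c], dim=-1), state)
    p_t = torch.softmax(out_proj(s_t), dim=-1)
    return p_t, (s_t, cell_t)

def greedy_decode(c, max_len=20, bos=1, eos=2):
    """Emit tokens until the end-of-output symbol, as the text describes."""
    state = (torch.zeros(1, 2 * HID), torch.zeros(1, 2 * HID))
    y, out = torch.tensor([bos]), []
    for _ in range(max_len):
        p_t, state = decode_step(y, state, c)
        y = p_t.argmax(dim=-1)  # greedy, locally optimal choice
        if y.item() == eos:
            break
        out.append(y.item())
    return out

print(greedy_decode(c))  # the second sequence set Y, as token ids
```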
The word sequence error correction system based on OCR and a bidirectional LSTM comprises:
an image acquisition module for acquiring a character image;
an OCR processing module for performing OCR preprocessing on the character image to obtain a first sequence set X = {x_0, x_1, …, x_m};
an encoder of bidirectional LSTM structure for encoding the forward sequence {x_0, x_1, …, x_m} and the reversed sequence {x_m, x_{m-1}, …, x_0} to obtain a context vector c;
a decoder of bidirectional LSTM structure for decoding the context vector c to obtain a second sequence set Y. The sketch after this list ties the four modules together.
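A hedged end-to-end sketch of the four modules' data flow, reusing the helpers from the earlier sketches (filter_ocr_queue, encode, greedy_decode); run_ocr and to_ids are hypothetical stubs for the OCR engine and the character-to-index lookup, neither of which the patent specifies:

```python
def run_ocr(image):
    """Hypothetical OCR engine stub returning (character, confidence) pairs."""
    return [("顺", 0.95), ("丰", 0.90), ("快", 0.88), ("递", 0.91)]

def to_ids(chars):
    """Hypothetical character-to-index lookup into the model vocabulary."""
    return torch.tensor([[hash(ch) % VOCAB for ch in chars]])

def correct_word_sequence(image):
    ocr_results = run_ocr(image)              # image acquisition + OCR module
    x_chars = filter_ocr_queue(ocr_results)   # confidence-gated queue (S2)
    _, c = encode(to_ids(x_chars))            # encoder module: context vector c (S3)
    return greedy_decode(c)                   # decoder module: second sequence set Y (S4)
```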
The word sequence error correction device based on OCR and a bidirectional LSTM comprises a computer-readable medium storing a computer program which, when run, performs:
S1, acquiring a character image;
S2, preprocessing the character image with OCR to obtain a first sequence set X = {x_0, x_1, …, x_m};
S3, feeding the forward sequence {x_0, x_1, …, x_m} and the reversed sequence {x_m, x_{m-1}, …, x_0} into an encoder built from a bidirectional LSTM to obtain a context vector c;
S4, decoding the context vector c with a decoder built from a bidirectional LSTM to obtain a second sequence set Y.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should appreciate that the scope of the invention involved in the present application is not limited to technical schemes formed by the particular combination of the above technical features; without departing from the inventive concept, it also covers other technical schemes formed by any combination of the above technical features or their equivalents, for example schemes in which the above features are replaced by (but not limited to) technical features with similar functions disclosed in the present application.
Claims (7)
1. A word sequence error correction algorithm based on OCR and a bidirectional LSTM, suitable for recognizing words in images, characterized by comprising the steps of:
S1, acquiring a character image;
S2, preprocessing the character image with OCR to obtain a first sequence set X = {x_0, x_1, …, x_m};
S3, feeding the forward sequence {x_0, x_1, …, x_m} and the reversed sequence {x_m, x_{m-1}, …, x_0} into an encoder of bidirectional LSTM structure to obtain a context vector c;
S4, decoding the context vector c with a decoder built from a bidirectional LSTM to obtain a second sequence set Y.
2. The word sequence error correction algorithm based on OCR and a bidirectional LSTM according to claim 1, characterized in that the context vector c in step S3 is:
c = Φ({h_1, h_2, …, h_{TS}});
h_t = f(x_t, h_{t-1}).
3. The word sequence error correction algorithm based on OCR and a bidirectional LSTM according to claim 1, characterized in that the second sequence set Y in step S4 is:
Y = (y_0, y_1, …, y_n);
s_t = f(y_{t-1}, s_{t-1}, c);
p(y_t | y_{<t}, X) = g(y_{t-1}, s_t, c).
4. The word sequence error correction algorithm based on OCR and a bidirectional LSTM according to claim 2 or 3, characterized in that the character image in step S1 is an image of an express waybill.
5. The word sequence error correction algorithm based on OCR and a bidirectional LSTM according to claim 2 or 3, characterized in that the threshold used in the OCR preprocessing of step S2 is the minimum reliability threshold allowed by the system.
6. A word sequence error correction system based on OCR and a bidirectional LSTM, characterized by comprising:
an image acquisition module for acquiring a character image;
an OCR processing module for performing OCR preprocessing on the character image to obtain a first sequence set X = {x_0, x_1, …, x_m};
an encoder of bidirectional LSTM structure for encoding the forward sequence {x_0, x_1, …, x_m} and the reversed sequence {x_m, x_{m-1}, …, x_0} to obtain a context vector c;
a decoder of bidirectional LSTM structure for decoding the context vector c to obtain a second sequence set Y.
7. A word sequence error correction device based on OCR and a bidirectional LSTM, comprising a computer-readable medium storing a computer program, characterized in that the program, when run, performs:
S1, acquiring a character image;
S2, preprocessing the character image with OCR to obtain a first sequence set X = {x_0, x_1, …, x_m};
S3, feeding the forward sequence {x_0, x_1, …, x_m} and the reversed sequence {x_m, x_{m-1}, …, x_0} into an encoder of bidirectional LSTM structure to obtain a context vector c;
S4, decoding the context vector c with a decoder built from a bidirectional LSTM to obtain a second sequence set Y.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710630581.0A CN107463928A (en) | 2017-07-28 | 2017-07-28 | Word sequence error correction algorithm, system and its equipment based on OCR and two-way LSTM |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107463928A true CN107463928A (en) | 2017-12-12 |
Family
ID=60547822
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710630581.0A Pending CN107463928A (en) | 2017-07-28 | 2017-07-28 | Word sequence error correction algorithm, system and its equipment based on OCR and two-way LSTM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107463928A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150161991A1 (en) * | 2013-12-10 | 2015-06-11 | Google Inc. | Generating representations of acoustic sequences using projection layers |
CN105046289A (en) * | 2015-08-07 | 2015-11-11 | 北京旷视科技有限公司 | Text field type identification method and text field type identification system |
CN105512692A (en) * | 2015-11-30 | 2016-04-20 | 华南理工大学 | BLSTM-based online handwritten mathematical expression symbol recognition method |
CN105740226A (en) * | 2016-01-15 | 2016-07-06 | 南京大学 | Method for implementing Chinese segmentation by using tree neural network and bilateral neural network |
CN105930314A (en) * | 2016-04-14 | 2016-09-07 | 清华大学 | Text summarization generation system and method based on coding-decoding deep neural networks |
CN106604125A (en) * | 2016-12-29 | 2017-04-26 | 北京奇艺世纪科技有限公司 | Video subtitle determining method and video subtitle determining device |
CN106960206A (en) * | 2017-02-08 | 2017-07-18 | 北京捷通华声科技股份有限公司 | Character identifying method and character recognition system |
Non-Patent Citations (1)
Title |
---|
商俊蓓 (Shang Junbei): "基于双向长短时记忆递归神经网络的联机手写数字公式字符识别" (Online handwritten digit and formula character recognition based on bidirectional long short-term memory recurrent neural networks), China Master's Theses Full-text Database, Information Science and Technology Series. *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108416349A (en) * | 2018-01-30 | 2018-08-17 | 顺丰科技有限公司 | Identify deviation-rectifying system and method |
CN109711412A (en) * | 2018-12-27 | 2019-05-03 | 信雅达系统工程股份有限公司 | A kind of optical character identification error correction method based on dictionary |
CN110377591A (en) * | 2019-06-12 | 2019-10-25 | 北京百度网讯科技有限公司 | Training data cleaning method, device, computer equipment and storage medium |
WO2021164310A1 (en) * | 2020-02-21 | 2021-08-26 | 华为技术有限公司 | Text error correction method and apparatus, and terminal device and computer storage medium |
CN112507080A (en) * | 2020-12-16 | 2021-03-16 | 北京信息科技大学 | Character recognition and correction method |
US11842524B2 (en) | 2021-04-30 | 2023-12-12 | International Business Machines Corporation | Multi-modal learning based intelligent enhancement of post optical character recognition error correction |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107463928A (en) | Word sequence error correction algorithm, system and its equipment based on OCR and two-way LSTM | |
Jiang et al. | Learning to guide decoding for image captioning | |
CN110738090A (en) | System and method for end-to-end handwritten text recognition using neural networks | |
CN112084841B (en) | Cross-mode image multi-style subtitle generating method and system | |
CN112528637B (en) | Text processing model training method, device, computer equipment and storage medium | |
CN112115687B (en) | Method for generating problem by combining triplet and entity type in knowledge base | |
CN111143563A (en) | Text classification method based on integration of BERT, LSTM and CNN | |
CN111914076B (en) | User image construction method, system, terminal and storage medium based on man-machine conversation | |
CN111581970B (en) | Text recognition method, device and storage medium for network context | |
CN114820871B (en) | Font generation method, model training method, device, equipment and medium | |
CN112070114A (en) | Scene character recognition method and system based on Gaussian constraint attention mechanism network | |
CN115082693A (en) | Multi-granularity multi-mode fused artwork image description generation method | |
CN116884391B (en) | Multimode fusion audio generation method and device based on diffusion model | |
CN112560456A (en) | Generation type abstract generation method and system based on improved neural network | |
CN114863539A (en) | Portrait key point detection method and system based on feature fusion | |
CN116206314A (en) | Model training method, formula identification method, device, medium and equipment | |
Agrawal et al. | Image Caption Generator Using Attention Mechanism | |
CN112349288A (en) | Chinese speech recognition method based on pinyin constraint joint learning | |
CN115064154A (en) | Method and device for generating mixed language voice recognition model | |
CN113297374B (en) | Text classification method based on BERT and word feature fusion | |
CN114694255A (en) | Sentence-level lip language identification method based on channel attention and time convolution network | |
CN116702765A (en) | Event extraction method and device and electronic equipment | |
CN115496134A (en) | Traffic scene video description generation method and device based on multi-modal feature fusion | |
CN112434143B (en) | Dialog method, storage medium and system based on hidden state constraint of GRU (generalized regression Unit) | |
CN115719072A (en) | Chapter-level neural machine translation method and system based on mask mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171212 |