SG10201904825XA - Automatic optical character recognition (ocr) correction

Automatic optical character recognition (ocr) correction

Info

Publication number
SG10201904825XA
Authority
SG
Singapore
Prior art keywords
machine
correction
readable text
ocr
device configured
Application number
SG10201904825X
Inventor
Ruoyu Li
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-05-28
Filing date
2019-05-28
Publication date
2019-10-30
Application filed by Alibaba Group Holding Ltd
Priority to SG10201904825X (SG10201904825XA)
Publication of SG10201904825XA
Priority to MYPI2019007147A (MY189247A)
Priority to PH12019000478A (PH12019000478B1)
Priority to US16/791,936 (US11023766B2)
Priority to CN202010314192.9A (CN112016553B)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/26 Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262 Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Character Discrimination (AREA)

Abstract

AUTOMATIC OPTICAL CHARACTER RECOGNITION (OCR) CORRECTION

An Optical Character Recognition (OCR) system, including: an acquisition device configured to obtain a digital image of a physical document; an image conversion device configured to convert the digital image of the physical document into corresponding machine-readable text; a correction device configured to: evaluate the machine-readable text using a trained long short-term memory (LSTM) neural network language model to determine whether correction to the machine-readable text is required; if correction to the machine-readable text is required, determine a most similar text relative to the machine-readable text from a name and address corpus using a modified edit distance technique; and correct the machine-readable text with the determined most similar text; and an output device configured to output the corrected machine-readable text.

Figure 2
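The correction step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the edit distance is modified, so the version below is an assumption: a standard Levenshtein distance in which substitutions between visually confusable characters cost 0.5 instead of 1. The `CONFUSABLE` pairs, the function names, and the `needs_correction` callable (a stand-in for the trained LSTM language model check) are all hypothetical and are not taken from the patent.

```python
from typing import Callable, Sequence

# Hypothetical visually confusable pairs; the abstract does not list any.
CONFUSABLE = {frozenset("0O"), frozenset("1l"), frozenset("5S"), frozenset("8B")}


def sub_cost(a: str, b: str) -> float:
    """Substitution cost: 0 for a match, 0.5 for an assumed OCR look-alike, else 1."""
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in CONFUSABLE else 1.0


def weighted_edit_distance(s: str, t: str) -> float:
    """Levenshtein distance with the weighted substitution cost above."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,               # delete a from s
                curr[j - 1] + 1.0,           # insert b into s
                prev[j - 1] + sub_cost(a, b) # substitute a with b
            ))
        prev = curr
    return prev[-1]


def most_similar(text: str, corpus: Sequence[str]) -> str:
    """Return the corpus entry closest to text (corpus must be non-empty)."""
    return min(corpus, key=lambda entry: weighted_edit_distance(text, entry))


def correct(text: str, corpus: Sequence[str],
            needs_correction: Callable[[str], bool]) -> str:
    """Replace text with its nearest corpus entry only when the check flags it."""
    return most_similar(text, corpus) if needs_correction(text) else text
```

For example, with a two-entry address corpus `["10 MAIN ROAD", "12 MAIN ROAD"]` and a placeholder check that flags any text missing from the corpus, `correct("1O MAIN R0AD", corpus, check)` picks `"10 MAIN ROAD"`, because both of its mismatches are look-alike substitutions. A linear scan over the corpus is used here only for brevity.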

Priority Applications (5)

Application Number | Publication | Priority Date | Filing Date | Title
SG10201904825X | SG10201904825XA (en) | 2019-05-28 | 2019-05-28 | Automatic optical character recognition (ocr) correction
MYPI2019007147A | MY189247A (en) | 2019-05-28 | 2019-12-03 | Automatic optical character recognition (ocr) correction
PH12019000478A | PH12019000478B1 (en) | 2019-05-28 | 2019-12-13 | Automatic optical character recognition (ocr) correction
US16/791,936 | US11023766B2 (en) | 2019-05-28 | 2020-02-14 | Automatic optical character recognition (OCR) correction
CN202010314192.9A | CN112016553B (en) | 2019-05-28 | 2020-04-17 | Optical Character Recognition (OCR) system, automatic OCR correction system, method

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
SG10201904825X | SG10201904825XA (en) | 2019-05-28 | 2019-05-28 | Automatic optical character recognition (ocr) correction

Publications (1)

Publication Number | Publication Date
SG10201904825XA (en) | 2019-10-30

Family

ID=68342310

Family Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
SG10201904825X | SG10201904825XA (en) | 2019-05-28 | 2019-05-28 | Automatic optical character recognition (ocr) correction

Country Status (5)

Country | Link
US (1) | US11023766B2 (en)
CN (1) | CN112016553B (en)
MY (1) | MY189247A (en)
PH (1) | PH12019000478B1 (en)
SG (1) | SG10201904825XA (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112464845B (en) * | 2020-12-04 | 2022-09-16 | 山东产研鲲云人工智能研究院有限公司 | Bill recognition method, equipment and computer storage medium
JP2022095391A (en) * | 2020-12-16 | 2022-06-28 | 富士フイルムビジネスイノベーション株式会社 | Information processing apparatus and information processing program
CN112966681B (en) * | 2021-04-12 | 2022-05-10 | 深圳市秦丝科技有限公司 | Method, equipment and storage medium for intelligent recognition, filing and retrieval of commodity photographing
CN113420546A (en) * | 2021-06-24 | 2021-09-21 | 平安国际智慧城市科技股份有限公司 | Text error correction method and device, electronic equipment and readable storage medium
KR20230006203A (en) | 2021-07-02 | 2023-01-10 | 한국전력공사 | System and Method for recognizing optical character related to power based on deep learning
US11763585B2 (en) | 2021-07-14 | 2023-09-19 | Bank Of America Corporation | Multi-layer neural network and convolutional neural network for context sensitive optical character recognition
CN113704403A (en) * | 2021-08-25 | 2021-11-26 | 深圳市网联安瑞网络科技有限公司 | Word stock-based OCR semantic correction method, system, medium, equipment and terminal

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US8370361B2 (en) * | 2011-01-17 | 2013-02-05 | Lnx Research, Llc | Extracting and normalizing organization names from text
US9390460B2 (en) * | 2011-11-04 | 2016-07-12 | Document Security Systems, Inc. | System and method for dynamic generation of embedded security features in a document
US9519641B2 (en) | 2012-09-18 | 2016-12-13 | Abbyy Development Llc | Photography recognition translation
CN105046289B (en) * | 2015-08-07 | 2019-04-26 | 北京旷视科技有限公司 | A kind of domain of discourse kind identification method and domain of discourse identification system
US9747281B2 (en) * | 2015-12-07 | 2017-08-29 | Linkedin Corporation | Generating multi-language social network user profiles by translation
US10366283B2 (en) * | 2016-03-18 | 2019-07-30 | Siemens Industry, Inc. | Systems and methods of reading and processing change-of-address forms in a cloud-based architecture
US9990544B1 (en) * | 2016-03-31 | 2018-06-05 | Intuit Inc. | Data accuracy in OCR by leveraging user data and business rules to improve data accuracy at field level
EP3358471A1 (en) | 2017-02-04 | 2018-08-08 | Tata Consultancy Services Limited | Systems and methods for assessing quality of input text using recurrent neural networks
CN107220648B (en) * | 2017-04-11 | 2018-06-22 | 平安科技(深圳)有限公司 | The character identifying method and server of Claims Resolution document
CN107491730A (en) | 2017-07-14 | 2017-12-19 | 浙江大学 | A kind of laboratory test report recognition methods based on image procossing
CN107480680A (en) * | 2017-07-28 | 2017-12-15 | 顺丰科技有限公司 | Method, system and the equipment of text information in identification image based on OCR and Bi LSTM
US20190354919A1 (en) * | 2018-08-06 | 2019-11-21 | Farrukh Mahboob | Methods and systems for automating package handling tasks through deep-learning based package label parsing
CN109034147B (en) * | 2018-09-11 | 2020-08-11 | 上海唯识律简信息科技有限公司 | Optical character recognition optimization method and system based on deep learning and natural language
CN109376658B (en) * | 2018-10-26 | 2022-03-08 | 信雅达科技股份有限公司 | OCR method based on deep learning
CN109389124B (en) * | 2018-10-29 | 2019-09-13 | 苏州派维斯信息科技有限公司 | Receipt categories of information recognition methods
CN109271973A (en) * | 2018-11-09 | 2019-01-25 | 天津新开心生活科技有限公司 | Medicine text OCR method and system

Also Published As

Publication number | Publication date
PH12019000478A1 (en) | 2021-01-11
US20200380286A1 (en) | 2020-12-03
CN112016553A (en) | 2020-12-01
MY189247A (en) | 2022-01-31
CN112016553B (en) | 2022-01-25
US11023766B2 (en) | 2021-06-01
PH12019000478B1 (en) | 2021-01-11

Similar Documents

Publication Publication Date Title
MY189247A (en) Automatic optical character recognition (ocr) correction
EA200100901A1 (en) SYSTEM AND METHOD FOR AUTOMATED SPEECH RECORDING USING TWO INSTRUMENTS OF SPEECH TRANSFORMATION AND AUTOMATED CORRECTION
KR20110028123A (en) Automatic translation apparatus by using user interaction in mobile device and its method
CN102883055A (en) Mobile terminal with automatic exposure compensation function and automatic exposure compensation method
ATE440332T1 (en) ADAPTIVE MACHINE TRANSLATION
Dong et al. Data augmentation with adversarial training for cross-lingual nli
CN108062737A (en) A kind of line education systems and its method based on recognition of face
Taylor Carib, caliban, cannibal
Shimunek et al. The Earliest Attested Turkic Language: The Chieh (* Kir) Language of the Fourth Century AD
ATE326754T1 (en) HOMOPHONE CHOICE IN LANGUAGE RECOGNITION
TH1901007678A (en) The patent has not yet been advertised.
Lemmenmeier-Batinić et al. XML-Encoding of a spoken Serbian corpus targeting forms of address
CN104599670B (en) The audio recognition method of talking pen
Mooney et al. Learners change artificial languages to constraint free variation in line with typological principles
KR20120101855A (en) Result corrector for dictation speech recognition and result correction method
Shykyrynska The Image of the Danube Bird “Zеgzitsa” in English translations of the “The Song of Igor's Campaign”
Ruiz Motivation as a tool for learning Chinese as a second language
SANDALCI et al. Overview and Comparison to Words of Antiquity Through Homeric Epics
Povidaychyk et al. Development of international communication culture as an important social and pedagogical problem
Syrko RECEPTION OF THE FIGURE OF MYKHAILO SHCHEPKIN IN TARAS SHEVCHENKOS LINGUAL CONSCIOUSNESS
Jishkariani et al. PECULIARITIES OF TRANSLATING SCIENTIFIC TEXTS
Nemes The Intermediate Zone of Translation Part I.: Questions, Dilemmas, Examples from the Translator's Workshop Based on Canto Jo I La Muntanya Balla by Irene Solà
JPS5585973A (en) Picture processor
Kortlandt Baltic ē‑stems revisited
Aleksandrovic The struggle of Catalonia for independence as a crisis of Spanish Unitarianism