JP5355625B2 - Method and system for preprocessing an image for optical character recognition - Google Patents

Method and system for preprocessing an image for optical character recognition

Info

Publication number
JP5355625B2
Authority
JP
Japan
Prior art keywords
components
height
column
word
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP2011129862A
Other languages
English (en)
Japanese (ja)
Other versions
JP2012003756A (ja)
JP2012003756A5 (en)
Inventor
Hussein Khalid Al-Omari
Mohammad Sulaiman Khorsheed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
King Abdulaziz City for Science and Technology KACST
Original Assignee
King Abdulaziz City for Science and Technology KACST
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by King Abdulaziz City for Science and Technology KACST
Publication of JP2012003756A
Publication of JP2012003756A5
Application granted
Publication of JP5355625B2
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/28 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/293 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of characters other than Kanji, Hiragana or Katakana

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)
JP2011129862A 2010-06-12 2011-06-10 Method and system for preprocessing an image for optical character recognition Expired - Fee Related JP5355625B2 (ja)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/814,448 2010-06-12
US12/814,448 US8218875B2 (en) 2010-06-12 2010-06-12 Method and system for preprocessing an image for optical character recognition

Publications (3)

Publication Number Publication Date
JP2012003756A JP2012003756A (ja) 2012-01-05
JP2012003756A5 JP2012003756A5 (en) 2013-07-18
JP5355625B2 JP5355625B2 (ja) 2013-11-27

Family

ID=44654616

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2011129862A Expired - Fee Related JP5355625B2 (ja) 2010-06-12 2011-06-10 Method and system for preprocessing an image for optical character recognition

Country Status (3)

Country Link
US (2) US8218875B2 (en)
EP (1) EP2395453A3 (en)
JP (1) JP5355625B2 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8218875B2 (en) 2010-06-12 2012-07-10 Hussein Khalid Al-Omari Method and system for preprocessing an image for optical character recognition
US8542926B2 (en) * 2010-11-19 2013-09-24 Microsoft Corporation Script-agnostic text reflow for document images
US9734132B1 (en) * 2011-12-20 2017-08-15 Amazon Technologies, Inc. Alignment and reflow of displayed character images
JP5994251B2 (ja) * 2012-01-06 2016-09-21 Fuji Xerox Co., Ltd. Image processing apparatus and program
EP2836962A4 (en) * 2012-04-12 2016-07-27 Tata Consultancy Services Ltd SYSTEM AND METHOD FOR DETECTION AND SEGMENTATION OF CHARACTERISTIC MATTERS FOR OPTICAL CHARACTER RECOGNITION (OCR)
EP2662802A1 (en) * 2012-05-09 2013-11-13 King Abdulaziz City for Science & Technology (KACST) Method and system for preprocessing an image for optical character recognition
US9785240B2 (en) * 2013-03-18 2017-10-10 Fuji Xerox Co., Ltd. Systems and methods for content-aware selection
JP5986051B2 (ja) * 2013-05-12 2016-09-06 King Abdulaziz City For Science And Technology (KACST) Method for automatically recognizing Arabic text
US20160098597A1 (en) * 2013-06-18 2016-04-07 Abbyy Development Llc Methods and systems that generate feature symbols with associated parameters in order to convert images to electronic documents
US9235755B2 (en) * 2013-08-15 2016-01-12 Konica Minolta Laboratory U.S.A., Inc. Removal of underlines and table lines in document images while preserving intersecting character strokes
US9292739B1 (en) * 2013-12-12 2016-03-22 A9.Com, Inc. Automated recognition of text utilizing multiple images
US9288362B2 (en) 2014-02-03 2016-03-15 King Fahd University Of Petroleum And Minerals Technique for skew detection of printed arabic documents
US9367766B2 (en) * 2014-07-22 2016-06-14 Adobe Systems Incorporated Text line detection in images
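JP2016181111A (ja) * 2015-03-24 2016-10-13 Fuji Xerox Co., Ltd. Image processing apparatus and image processing program
CN106156766B (zh) 2015-03-25 2020-02-18 Alibaba Group Holding Limited Method and apparatus for generating a text line classifier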
US10430649B2 (en) 2017-07-14 2019-10-01 Adobe Inc. Text region detection in digital images using image tag filtering
US11366968B2 (en) * 2019-07-29 2022-06-21 Intuit Inc. Region proposal networks for automated bounding box detection and text segmentation
US11270153B2 (en) 2020-02-19 2022-03-08 Northrop Grumman Systems Corporation System and method for whole word conversion of text in image
JP7528542B2 (ja) * 2020-06-03 2024-08-06 Ricoh Co., Ltd. Image processing apparatus, method, and program
FR3155939A1 (fr) * 2023-11-27 2025-05-30 Orange Method for analyzing at least one image, and corresponding electronic device and computer program product

Family Cites Families (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5058182A (en) * 1988-05-02 1991-10-15 The Research Foundation Of State Univ. Of New York Method and apparatus for handwritten character recognition
US5224179A (en) * 1988-12-20 1993-06-29 At&T Bell Laboratories Image skeletonization method
US5680479A (en) * 1992-04-24 1997-10-21 Canon Kabushiki Kaisha Method and apparatus for character recognition
JP3253356B2 (ja) * 1992-07-06 2002-02-04 Ricoh Co., Ltd. Method for identifying regions of a document image
US5987170A (en) * 1992-09-28 1999-11-16 Matsushita Electric Industrial Co., Ltd. Character recognition machine utilizing language processing
US5410611A (en) * 1993-12-17 1995-04-25 Xerox Corporation Method for identifying word bounding boxes in text
CA2166248C (en) * 1995-12-28 2000-01-04 Abdel Naser Al-Karmi Optical character recognition of handwritten or cursive text
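JPH11232378A (ja) * 1997-12-09 1999-08-27 Canon Inc Digital camera, document processing system using the digital camera, computer-readable storage medium, and program code transmission apparatus
JP4323606B2 (ja) * 1999-03-01 2009-09-02 Riso Kagaku Corporation Document image skew detection apparatus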
US7298903B2 (en) * 2001-06-28 2007-11-20 Microsoft Corporation Method and system for separating text and drawings in digital ink
US7062090B2 (en) * 2002-06-28 2006-06-13 Microsoft Corporation Writing guide for a free-form document editor
US20040096102A1 (en) * 2002-11-18 2004-05-20 Xerox Corporation Methodology for scanned color document segmentation
US7499588B2 (en) * 2004-05-20 2009-03-03 Microsoft Corporation Low resolution OCR for camera acquired documents
US8139828B2 (en) * 2005-10-21 2012-03-20 Carestream Health, Inc. Method for enhanced visualization of medical images
JP4757001B2 (ja) * 2005-11-25 2011-08-24 Canon Inc Image processing apparatus and image processing method
US7668394B2 (en) * 2005-12-21 2010-02-23 Lexmark International, Inc. Background intensity correction of a scan of a document
US7724957B2 (en) * 2006-07-31 2010-05-25 Microsoft Corporation Two tiered text recognition
JP4988842B2 (ja) * 2007-06-28 2012-08-01 Fujitsu Limited Table data generation program, table data generation method, and table data generation apparatus
US20110043869A1 (en) * 2007-12-21 2011-02-24 Nec Corporation Information processing system, its method and program
US8027539B2 (en) * 2008-01-11 2011-09-27 Sharp Laboratories Of America, Inc. Method and apparatus for determining an orientation of a document including Korean characters
US8009928B1 (en) * 2008-01-23 2011-08-30 A9.Com, Inc. Method and system for detecting and recognizing text in images
US8150160B2 (en) * 2009-03-26 2012-04-03 King Fahd University Of Petroleum & Minerals Automatic Arabic text image optical character recognition method
TWI394098B (zh) * 2009-06-03 2013-04-21 Nat Univ Chung Cheng Shredding Method Based on File Image Texture Feature
US8086039B2 (en) * 2010-02-05 2011-12-27 Palo Alto Research Center Incorporated Fine-grained visual document fingerprinting for accurate document comparison and retrieval
US20110280481A1 (en) * 2010-05-17 2011-11-17 Microsoft Corporation User correction of errors arising in a textual document undergoing optical character recognition (ocr) process
US8218875B2 (en) 2010-06-12 2012-07-10 Hussein Khalid Al-Omari Method and system for preprocessing an image for optical character recognition

Also Published As

Publication number Publication date
JP2012003756A (ja) 2012-01-05
US8548246B2 (en) 2013-10-01
US20110305387A1 (en) 2011-12-15
EP2395453A2 (en) 2011-12-14
US20120219220A1 (en) 2012-08-30
EP2395453A3 (en) 2013-08-28
US8218875B2 (en) 2012-07-10

Similar Documents

Publication Publication Date Title
JP5355625B2 (ja) Method and system for preprocessing an image for optical character recognition
JP5355621B2 (ja) Method and system for preprocessing an image for optical character recognition
US8571270B2 (en) Segmentation of a word bitmap into individual characters or glyphs during an OCR process
CN113486828A (zh) Image processing method, apparatus, device, and storage medium
Dongre et al. Devnagari document segmentation using histogram approach
US20030012438A1 (en) Multiple size reductions for image segmentation
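JPH0721319A (ja) Automatic Asian language determination apparatus
CN109598185B (zh) Image recognition and translation method, apparatus, device, and readable storage medium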
Shehu et al. Character recognition using correlation & hamming distance
KR101571681B1 (ko) Method for analyzing document structure using homogeneous regions
Jindal et al. A new method for segmentation of pre-detected Devanagari words from the scene images: Pihu method
JP2013097561A (ja) Inter-word space detection apparatus, inter-word space detection method, and computer program for inter-word space detection
Kshetry Image preprocessing and modified adaptive thresholding for improving OCR
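CN102542269B (zh) Method and apparatus for segmenting Western-language words
JP6082306B2 (ja) Method and system for preprocessing an image for optical character recognition
JP3058489B2 (ja) Character string extraction method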
Roy et al. An approach towards segmentation of real time handwritten text
Siddique et al. An absolute Optical Character Recognition system for Bangla script Utilizing a captured image
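CN117710985B (zh) Optical character recognition method, apparatus, and intelligent terminal
JP2004046528A (ja) Document orientation estimation method and document orientation estimation program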
Zaw et al. Segmentation Method for Myanmar Character Recognition Using Block based Pixel Count and Aspect Ratio
Ajodani et al. Line Segmentation in Persian Texts in Double Columns Using Hierarchical Clustering Algorithms
Kuhl et al. Model-based character recognition in low resolution
Deivalakshmi A simple system for table extraction irrespective of boundary thickness and removal of detected spurious lines
Siddique et al. An absolute Optical Character Recognition system for Bangla script from a captured image

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20130530

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20130530

A871 Explanation of circumstances concerning accelerated examination

Free format text: JAPANESE INTERMEDIATE CODE: A871

Effective date: 20130530

A975 Report on accelerated examination

Free format text: JAPANESE INTERMEDIATE CODE: A971005

Effective date: 20130619

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20130625

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20130722

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20130820

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20130827

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

LAPS Cancellation because of no payment of annual fees