CN104134057B - Selective display of OCR'ed text and corresponding images from publications on a client device - Google Patents

Selective display of OCR'ed text and corresponding images from publications on a client device

Info

Publication number
CN104134057B
Authority
CN
China
Prior art keywords
text
document
described image
text fragments
image fragment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410345954.6A
Other languages
English (en)
Chinese (zh)
Other versions
CN104134057A (zh)
Inventor
V. Ratnakar
A. Popat
F. Haugen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC
Publication of CN104134057A
Application granted
Publication of CN104134057B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/22 Character recognition characterised by the type of writing
    • G06V30/224 Character recognition characterised by the type of writing of printed characters having additional code marks or containing code marks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/12 Detection or correction of errors, e.g. by rescanning the pattern
    • G06V30/127 Detection or correction of errors, e.g. by rescanning the pattern with the intervention of an operator
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K7/00 Methods or arrangements for sensing record carriers, e.g. for reading patterns
    • G06K7/10 Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/20 Drawing from basic elements, e.g. lines or circles
    • G06T11/206 Drawing of charts or graphs
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/14 Display of multiple viewports
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Computer Hardware Design (AREA)
  • Electromagnetism (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Toxicology (AREA)
  • Artificial Intelligence (AREA)
  • Character Discrimination (AREA)
  • Character Input (AREA)
  • Document Processing Apparatus (AREA)
  • Controls And Circuits For Display Device (AREA)
  • User Interface Of Digital Computer (AREA)
CN201410345954.6A 2009-01-28 2010-01-25 Selective display of OCR'ed text and corresponding images from publications on a client device Active CN104134057B (zh)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US14790109P 2009-01-28 2009-01-28
US61/147,901 2009-01-28
US12/366,547 2009-02-05
US12/366,547 US8373724B2 (en) 2009-01-28 2009-02-05 Selective display of OCR'ed text and corresponding images from publications on a client device
CN201080005734.9A CN102301380B (zh) 2009-01-28 2010-01-25 Selective display of OCR'ed text and corresponding images from publications on a client device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201080005734.9A Division CN102301380B (zh) 2009-01-28 2010-01-25 Selective display of OCR'ed text and corresponding images from publications on a client device

Publications (2)

Publication Number Publication Date
CN104134057A (zh) 2014-11-05
CN104134057B (zh) 2018-02-13

Family

ID=42353827

Family Applications (2)

Application Number Title Priority Date Filing Date
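CN201410345954.6A Active CN104134057B (zh) 2009-01-28 2010-01-25 Selective display of OCR'ed text and corresponding images from publications on a client device
CN201080005734.9A Expired - Fee Related CN102301380B (zh) 2009-01-28 2010-01-25 Selective display of OCR'ed text and corresponding images from publications on a client device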

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201080005734.9A Expired - Fee Related CN102301380B (zh) 2009-01-28 2010-01-25 Selective display of OCR'ed text and corresponding images from publications on a client device

Country Status (5)

Country Link
US (4) US8373724B2 (en)
JP (2) JP5324669B2 (en)
KR (1) KR101315472B1 (en)
CN (2) CN104134057B (en)
WO (1) WO2010088182A1 (en)

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8373724B2 (en) 2009-01-28 2013-02-12 Google Inc. Selective display of OCR'ed text and corresponding images from publications on a client device
US8442813B1 (en) 2009-02-05 2013-05-14 Google Inc. Methods and systems for assessing the quality of automatically generated text
US20120050819A1 (en) * 2010-08-30 2012-03-01 Jiang Hong Approach For Processing Scanned Document Data
US9083826B2 (en) * 2010-08-31 2015-07-14 Ricoh Company, Ltd. Tracking the processing of electronic document data by network services using trace
US8515930B2 (en) 2010-08-31 2013-08-20 Ricoh Company, Ltd. Merging a scanned document with an existing document on a server
US20120050818A1 (en) * 2010-08-31 2012-03-01 Kaoru Watanabe Sending scanned document data through a network to a mobile device
US20120159376A1 (en) * 2010-12-15 2012-06-21 Microsoft Corporation Editing data records associated with static images
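TW201310355A (zh) * 2011-08-19 2013-03-01 Newsoft Technology Corp Method for browsing or executing instructions via images associated with information and instructions, and program product thereof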
US9069374B2 (en) 2012-01-04 2015-06-30 International Business Machines Corporation Web video occlusion: a method for rendering the videos watched over multiple windows
US10332213B2 (en) 2012-03-01 2019-06-25 Ricoh Company, Ltd. Expense report system with receipt image processing by delegates
US9659327B2 (en) * 2012-03-01 2017-05-23 Ricoh Company, Ltd. Expense report system with receipt image processing
US9245296B2 (en) 2012-03-01 2016-01-26 Ricoh Company Ltd. Expense report system with receipt image processing
JP5983184B2 (ja) * 2012-08-24 2016-08-31 Brother Industries, Ltd. Image processing system, image processing method, image processing apparatus, and image processing program
US9519641B2 (en) * 2012-09-18 2016-12-13 Abbyy Development Llc Photography recognition translation
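KR20140081470A (ko) * 2012-12-21 2014-07-01 Samsung Electronics Co., Ltd. Method for enlarged display of characters, apparatus to which the method is applied, and computer-readable storage medium storing a program for performing the method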
WO2014154457A1 (en) * 2013-03-29 2014-10-02 Alcatel Lucent Systems and methods for context based scanning
JP6525523B2 (ja) * 2013-07-31 2019-06-05 Canon Inc. Information processing apparatus, control method, and program
US9275554B2 (en) 2013-09-24 2016-03-01 Jimmy M Sauz Device, system, and method for enhanced memorization of a document
US10755590B2 (en) 2015-06-18 2020-08-25 The Joan and Irwin Jacobs Technion-Cornell Institute Method and system for automatically providing graphical user interfaces for computational algorithms described in printed publications
WO2016205628A1 (en) 2015-06-18 2016-12-22 The Joan and Irwin Jacobs Technion-Cornell Institute A method and system for evaluating computational algorithms described in printed publications
US9864734B2 (en) * 2015-08-12 2018-01-09 International Business Machines Corporation Clickable links within live collaborative web meetings
US10044751B2 (en) * 2015-12-28 2018-08-07 Arbor Networks, Inc. Using recurrent neural networks to defeat DNS denial of service attacks
US9501696B1 (en) 2016-02-09 2016-11-22 William Cabán System and method for metadata extraction, mapping and execution
US10607101B1 (en) 2016-12-14 2020-03-31 Revenue Management Solutions, Llc System and method for patterned artifact removal for bitonal images
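CN108628814A (zh) * 2017-03-20 2018-10-09 Zhuhai Kingsoft Office Software Co., Ltd. Method and device for quickly inserting recognized text
JP6946690B2 (ja) * 2017-03-24 2021-10-06 Casio Computer Co., Ltd. Display device, display method, and program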
WO2019022725A1 (en) * 2017-07-25 2019-01-31 Hewlett-Packard Development Company, L.P. DETERMINATIONS OF SHARED CHARACTER RECOGNITION
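JP6891073B2 (ja) * 2017-08-22 2021-06-18 Canon Inc. Apparatus for setting a file name or the like for a scanned image, control method therefor, and program
CN109981421B (zh) * 2017-12-27 2022-02-01 Joyoung Co., Ltd. Network configuration method and device for a smart device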
GB201804383D0 (en) 2018-03-19 2018-05-02 Microsoft Technology Licensing Llc Multi-endpoint mixed reality meetings
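CN110969056B (zh) * 2018-09-29 2023-08-08 Hangzhou Hikvision Digital Technology Co., Ltd. Document layout analysis method and device for document images, and storage medium
CN111475999B (zh) * 2019-01-22 2023-04-14 Alibaba Group Holding Ltd. Method and device for generating error prompts
CN110377885B (zh) * 2019-06-14 2023-09-26 Beijing Baidu Netcom Science and Technology Co., Ltd. Method, apparatus, device, and computer storage medium for converting PDF files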
US11403162B2 (en) * 2019-10-17 2022-08-02 Dell Products L.P. System and method for transferring diagnostic data via a framebuffer
US11205084B2 (en) * 2020-02-17 2021-12-21 Wipro Limited Method and system for evaluating an image quality for optical character recognition (OCR)
US11436713B2 (en) 2020-02-19 2022-09-06 International Business Machines Corporation Application error analysis from screenshot
US11842035B2 (en) * 2020-08-04 2023-12-12 Bentley Systems, Incorporated Techniques for labeling, reviewing and correcting label predictions for PandIDS
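CN112131841A (zh) * 2020-08-27 2020-12-25 Beijing Yundong Zhixiao Network Technology Co., Ltd. Document quality evaluation method and system
CN115016710B (zh) * 2021-11-12 2023-06-16 Honor Device Co., Ltd. Application recommendation method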
US20240095452A1 (en) * 2022-09-16 2024-03-21 Citrix Systems, Inc. Unicode based estimation of text intelligibility
CN117217876B (zh) * 2023-11-08 2024-03-26 Shenzhen Mingxin Shuzhi Technology Co., Ltd. OCR-based order preprocessing method, apparatus, device, and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5889897A (en) * 1997-04-08 1999-03-30 International Patent Holdings Ltd. Methodology for OCR error checking through text image regeneration
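CN1848109A (zh) * 2005-04-13 2006-10-18 Motorola, Inc. Method and system for editing optical character recognition results
CN1916941A (zh) * 2005-08-18 2007-02-21 Peking University Founder Group Co., Ltd. Post-processing method for character recognition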

Family Cites Families (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5675672A (en) * 1990-06-26 1997-10-07 Seiko Epson Corporation Two dimensional linker for character string data
JPH0581467A (ja) * 1991-08-29 1993-04-02 Canon Inc Image processing method and apparatus
US5325297A (en) * 1992-06-25 1994-06-28 System Of Multiple-Colored Images For Internationally Listed Estates, Inc. Computer implemented method and system for storing and retrieving textual data and compressed image data
JPH07249098A (ja) * 1994-03-09 1995-09-26 Toshiba Corp Information processing apparatus and information processing method
US5764799A (en) * 1995-06-26 1998-06-09 Research Foundation Of State Of State Of New York OCR method and apparatus using image equivalents
US6137906A (en) * 1997-06-27 2000-10-24 Kurzweil Educational Systems, Inc. Closest word algorithm
US6023534A (en) * 1997-08-04 2000-02-08 Xerox Corporation Method of extracting image data from an area generated with a halftone pattern
GB9809679D0 (en) * 1998-05-06 1998-07-01 Xerox Corp Portable text capturing method and device therefor
JP2000112955A (ja) * 1998-09-30 2000-04-21 Toshiba Corp Image display method, image filing apparatus, and recording medium
US6278969B1 (en) 1999-08-18 2001-08-21 International Business Machines Corp. Method and system for improving machine translation accuracy using translation memory
US6587583B1 (en) * 1999-09-17 2003-07-01 Kurzweil Educational Systems, Inc. Compression/decompression algorithm for image documents having text, graphical and color content
GB2359953B (en) * 2000-03-03 2004-02-11 Hewlett Packard Co Improvements relating to image capture systems
US6678415B1 (en) 2000-05-12 2004-01-13 Xerox Corporation Document image decoding using an integrated stochastic language model
US6738518B1 (en) 2000-05-12 2004-05-18 Xerox Corporation Document image decoding using text line column-based heuristic scoring
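JP4613397B2 (ja) * 2000-06-28 2011-01-19 Konica Minolta Business Technologies, Inc. Image recognition apparatus, image recognition method, and computer-readable recording medium storing an image recognition program
JP2002049890A (ja) * 2000-08-01 2002-02-15 Minolta Co Ltd Image recognition apparatus, image recognition method, and computer-readable recording medium storing an image recognition program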
US20020102966A1 (en) * 2000-11-06 2002-08-01 Lev Tsvi H. Object identification method for portable devices
US6957384B2 (en) * 2000-12-27 2005-10-18 Tractmanager, Llc Document management system
JP4421134B2 (ja) * 2001-04-18 2010-02-24 Fujitsu Ltd. Document image retrieval apparatus
JP2002358481A (ja) * 2001-06-01 2002-12-13 Ricoh Elemex Corp Image processing apparatus
US7171061B2 (en) 2002-07-12 2007-01-30 Xerox Corporation Systems and methods for triage of passages of text output from an OCR system
US8533270B2 (en) 2003-06-23 2013-09-10 Microsoft Corporation Advanced spam detection techniques
US8301893B2 (en) * 2003-08-13 2012-10-30 Digimarc Corporation Detecting media areas likely of hosting watermarks
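JP2005107684A (ja) * 2003-09-29 2005-04-21 Fuji Photo Film Co Ltd Image processing method and image input/output apparatus
CN1871608A (zh) * 2003-10-27 2006-11-29 Koninklijke Philips Electronics N.V. Screen-wise presentation of search results
JP2005352735A (ja) * 2004-06-10 2005-12-22 Fuji Xerox Co Ltd Document file creation support apparatus, document file creation support method, and program therefor
JP2006031299A (ja) * 2004-07-15 2006-02-02 Hitachi Ltd Character recognition method, and method and system for processing the correction history of character data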
US20060041503A1 (en) * 2004-08-21 2006-02-23 Blair William R Collaborative negotiation methods, systems, and apparatuses for extended commerce
US7669148B2 (en) * 2005-08-23 2010-02-23 Ricoh Co., Ltd. System and methods for portable device for mixed media system
US8156427B2 (en) * 2005-08-23 2012-04-10 Ricoh Co. Ltd. User interface for mixed media reality
US7639387B2 (en) * 2005-08-23 2009-12-29 Ricoh Co., Ltd. Authoring tools using a mixed media environment
US20060083431A1 (en) * 2004-10-20 2006-04-20 Bliss Harry M Electronic device and method for visual text interpretation
US7809722B2 (en) * 2005-05-09 2010-10-05 Like.Com System and method for enabling search and retrieval from image files based on recognized information
US7760917B2 (en) * 2005-05-09 2010-07-20 Like.Com Computer-implemented method for performing similarity searches
KR100714393B1 (ko) * 2005-09-16 2007-05-07 Samsung Electronics Co., Ltd. Host device having a text extraction function and text extraction method thereof
US7796837B2 (en) * 2005-09-22 2010-09-14 Google Inc. Processing an image map for display on computing device
US8849821B2 (en) * 2005-11-04 2014-09-30 Nokia Corporation Scalable visual search system simplifying access to network and device functionality
US7822596B2 (en) * 2005-12-05 2010-10-26 Microsoft Corporation Flexible display translation
KR20080002084A (ko) * 2006-06-30 2008-01-04 Samsung Electronics Co., Ltd. System for optical character reading and optical character reading method
US7912700B2 (en) 2007-02-08 2011-03-22 Microsoft Corporation Context based word prediction
US8763038B2 (en) * 2009-01-26 2014-06-24 Sony Corporation Capture of stylized TV table data via OCR
US20080267504A1 (en) * 2007-04-24 2008-10-30 Nokia Corporation Method, device and computer program product for integrating code-based and optical character recognition technologies into a mobile visual search
CN101419661B (zh) * 2007-10-26 2011-08-24 International Business Machines Corporation Method and system for displaying an image based on text in the image
US8331677B2 (en) * 2009-01-08 2012-12-11 Microsoft Corporation Combined image and text document
US8373724B2 (en) * 2009-01-28 2013-02-12 Google Inc. Selective display of OCR'ed text and corresponding images from publications on a client device
US8442813B1 (en) 2009-02-05 2013-05-14 Google Inc. Methods and systems for assessing the quality of automatically generated text
US8588528B2 (en) * 2009-06-23 2013-11-19 K-Nfb Reading Technology, Inc. Systems and methods for displaying scanned images with overlaid text
US20110128288A1 (en) * 2009-12-02 2011-06-02 David Petrou Region of Interest Selector for Visual Queries

Also Published As

Publication number Publication date
WO2010088182A1 (en) 2010-08-05
KR20110124255A (ko) 2011-11-16
JP5324669B2 (ja) 2013-10-23
US20130265325A1 (en) 2013-10-10
US20140125693A1 (en) 2014-05-08
US9280952B2 (en) 2016-03-08
JP2012516508A (ja) 2012-07-19
JP6254374B2 (ja) 2017-12-27
KR101315472B1 (ko) 2013-10-04
CN102301380A (zh) 2011-12-28
US20130002710A1 (en) 2013-01-03
JP2014032665A (ja) 2014-02-20
US8675012B2 (en) 2014-03-18
US8482581B2 (en) 2013-07-09
US8373724B2 (en) 2013-02-12
US20100188419A1 (en) 2010-07-29
CN104134057A (zh) 2014-11-05
CN102301380B (zh) 2014-08-20

Similar Documents

Publication Publication Date Title
CN104134057B (zh) Selective display of OCR'ed text and corresponding images from publications on a client device
JP4945813B2 (ja) Print structured document
US10902193B2 (en) Automated generation of web forms using fillable electronic documents
US9619440B2 (en) Document conversion apparatus
JP5664174B2 (ja) Apparatus and method for extracting circumscribed rectangles of characters from a portable electronic file
US20140250375A1 (en) Method and system for summarizing documents
US7715625B2 (en) Image processing device, image processing method, and storage medium storing program therefor
US9614984B2 (en) Electronic document generation system and recording medium
US9864750B2 (en) Objectification with deep searchability
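JP4691071B2 (ja) Page action activation device, page action activation control method, and page action activation control program
JP2002169637A (ja) Document display mode conversion device, document display mode conversion method, and recording medium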
US9019552B2 (en) Information processing apparatus, system and method for outputting data to a medium
JP6045393B2 (ja) Information processing system
KR20110074422A (ko) Method and apparatus for generating a detailed-information image file
EP4524914A1 (en) Information processing apparatus, information processing method, and program
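JP2024076693A (ja) Image processing apparatus, image processing method, and program
JP2024115651A (ja) Data processing system and control method therefor
JP2007087197A (ja) Document processing apparatus, document processing method, and program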

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: California, USA

Patentee after: Google LLC

Address before: California, USA

Patentee before: Google Inc.
