CN101661458B - Electronic document processing apparatus and electronic document processing method - Google Patents

Electronic document processing apparatus and electronic document processing method

Info

Publication number
CN101661458B
Authority
CN
China
Prior art keywords
text string
line segment
electronic document
text
document processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009101665726A
Other languages
English (en)
Chinese (zh)
Other versions
CN101661458A (zh)
Inventor
伊丹刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc
Publication of CN101661458A
Application granted
Publication of CN101661458B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/137 Hierarchical processing, e.g. outlines
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/106 Display of layout of documents; Previewing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Multimedia (AREA)
  • Document Processing Apparatus (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Processing Or Creating Images (AREA)
CN2009101665726A 2008-08-29 2009-08-26 Electronic document processing apparatus and electronic document processing method Expired - Fee Related CN101661458B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2008221912 2008-08-29
JP2008221912A JP5247311B2 (ja) 2008-08-29 2008-08-29 Electronic document processing apparatus and electronic document processing method
JP2008-221912 2008-08-29

Publications (2)

Publication Number Publication Date
CN101661458A (zh) 2010-03-03
CN101661458B (zh) 2012-08-29

Family

ID=41727109

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009101665726A Expired - Fee Related CN101661458B (zh) 2008-08-29 2009-08-26 Electronic document processing apparatus and electronic document processing method

Country Status (3)

Country Link
US (1) US8225205B2 (en)
JP (1) JP5247311B2 (en)
CN (1) CN101661458B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
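JP4881048B2 (ja) * 2006-04-03 2012-02-22 キヤノン株式会社 Information processing apparatus, information processing method, and information processing program
JP5247311B2 (ja) * 2008-08-29 2013-07-24 キヤノン株式会社 Electronic document processing apparatus and electronic document processing method
CN101833567A (zh) * 2010-03-31 2010-09-15 北京志腾新诺科技有限公司 Document conversion method, apparatus, and system
CN102376078A (zh) * 2010-08-25 2012-03-14 北京中科亚创科技有限责任公司 Method and apparatus for sorting automatic scene markers
HRP20130700B1 (hr) * 2013-07-23 2016-03-11 Microblink D.O.O. System for adaptive detection and extraction of structures from machine-generated documents
JP2018088116A (ja) * 2016-11-29 2018-06-07 キヤノン株式会社 Information processing apparatus, program, and information processing method
CN117422071B (zh) * 2023-12-19 2024-03-15 中南大学 Method and apparatus for converting multiple segmentation annotations of text terms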

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5832531A (en) * 1994-09-12 1998-11-03 Adobe Systems Incorporated Method and apparatus for identifying words described in a page description language file
CN1744087A (zh) * 2004-09-02 2006-03-08 佳能株式会社 Document processing apparatus for searching documents and control method thereof

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5182709A (en) * 1986-03-31 1993-01-26 Wang Laboratories, Inc. System for parsing multidimensional and multidirectional text into encoded units and storing each encoded unit as a separate data structure
US5748953A (en) * 1989-06-14 1998-05-05 Hitachi, Ltd. Document search method wherein stored documents and search queries comprise segmented text data of spaced, nonconsecutive text elements and words segmented by predetermined symbols
US5757983A (en) * 1990-08-09 1998-05-26 Hitachi, Ltd. Document retrieval method and system
JPH0512398A (ja) * 1990-12-28 1993-01-22 Mutoh Ind Ltd Image editing method and apparatus
US5953735A (en) * 1991-03-20 1999-09-14 Forcier; Mitchell D. Script character processing method and system with bit-mapped document editing
US5623679A (en) * 1993-11-19 1997-04-22 Waverley Holdings, Inc. System and method for creating and manipulating notes each containing multiple sub-notes, and linking the sub-notes to portions of data objects
US5745745A (en) * 1994-06-29 1998-04-28 Hitachi, Ltd. Text search method and apparatus for structured documents
JP3504054B2 (ja) * 1995-07-17 2004-03-08 株式会社東芝 Document processing apparatus and document processing method
JP4153989B2 (ja) * 1996-07-11 2008-09-24 株式会社日立製作所 Document search and delivery method and apparatus
US5973692A (en) * 1997-03-10 1999-10-26 Knowlton; Kenneth Charles System for the capture and indexing of graphical representations of files, information sources and the like
US6029167A (en) * 1997-07-25 2000-02-22 Claritech Corporation Method and apparatus for retrieving text using document signatures
JPH11203494A (ja) * 1998-01-16 1999-07-30 Fuji Xerox Co Ltd Document analysis apparatus
US20050193425A1 (en) * 2000-07-24 2005-09-01 Sanghoon Sull Delivery and presentation of content-relevant information associated with frames of audio-visual programs
US7136082B2 (en) * 2002-01-25 2006-11-14 Xerox Corporation Method and apparatus to convert digital ink images for use in a structured text/graphics editor
US20030237055A1 (en) * 2002-06-20 2003-12-25 Thomas Lange Methods and systems for processing text elements
US7386789B2 (en) * 2004-02-27 2008-06-10 Hewlett-Packard Development Company, L.P. Method for determining logical components of a document
JP2007086955A (ja) * 2005-09-21 2007-04-05 Fuji Xerox Co Ltd Information processing apparatus, information processing method, and computer program
US20080288537A1 (en) * 2007-05-16 2008-11-20 Fuji Xerox Co., Ltd. System and method for slide stream indexing based on multi-dimensional content similarity
JP5247311B2 (ja) * 2008-08-29 2013-07-24 キヤノン株式会社 Electronic document processing apparatus and electronic document processing method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5832531A (en) * 1994-09-12 1998-11-03 Adobe Systems Incorporated Method and apparatus for identifying words described in a page description language file
CN1744087A (zh) * 2004-09-02 2006-03-08 佳能株式会社 Document processing apparatus for searching documents and control method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JP Laid-open (Kokai) Hei 5-258101A 1993.10.08

Also Published As

Publication number Publication date
JP5247311B2 (ja) 2013-07-24
CN101661458A (zh) 2010-03-03
US20100058175A1 (en) 2010-03-04
JP2010055512A (ja) 2010-03-11
US8225205B2 (en) 2012-07-17

Similar Documents

Publication Publication Date Title
CN101661458B (zh) Electronic document processing apparatus and electronic document processing method
US9367605B2 (en) Abstract generating search method and system
US20070180471A1 (en) Presenting digitized content on a network using a cross-linked layer of electronic documents derived from a relational database
US20090049375A1 (en) Selective processing of information from a digital copy of a document for data entry
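CN112395418B (zh) Method, apparatus, and electronic device for extracting target objects from web pages
JP6525641B2 (ja) Information processing system, control method, and computer program
JP2010015554A (ja) Table structure analysis apparatus, table structure analysis method, and table structure analysis program
CN108763244A (zh) Searching and annotating within images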
GB2606474A (en) Logical grouping of exported text blocks
US20130167016A1 (en) Panoptic Visualization Document Layout
US7200811B1 (en) Form processing apparatus, form processing method, recording medium and program
US9152617B2 (en) System and method for processing objects
US9141867B1 (en) Determining word segment boundaries
CN105302626B (zh) Method for parsing XPS structured data
US20120046937A1 (en) Semantic classification of variable data campaign information
US10268761B2 (en) Panoptic visualization document collection
US12265787B2 (en) Document difference viewing and navigation
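TW201416884A (zh) Font distribution system and font distribution method
CN104169918A (zh) Information management apparatus, information management method, and program
JP2012014608A (ja) Information processing apparatus, information processing method, and program
JPH08314966A (ja) Index creation method for document search apparatus, and document search apparatus
JP3114465B2 (ja) Information presentation apparatus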
US20110016380A1 (en) Form editing apparatus, form editing method, and storage medium
CN108874829B (zh) Web page processing method, apparatus, smart device, and computer storage medium
US20130254640A1 (en) Panoptic Visualization Of An Illustrated Parts Catalog

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120829

Termination date: 20200826