ATE375561T1 - Method for identifying redundant text in electronic documents - Google Patents

Method for identifying redundant text in electronic documents

Info

Publication number
ATE375561T1
Authority
AT
Austria
Prior art keywords
text
redundant
text fragments
page
candidates
Application number
AT05012452T
Other languages
English (en)
Inventor
Serge Bronstein
Original Assignee
Pdflib Gmbh
Application filed by Pdflib Gmbh
Application granted
Publication of ATE375561T1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)
AT05012452T 2005-06-09 2005-06-09 Method for identifying redundant text in electronic documents ATE375561T1 (de)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP05012452A EP1732012B1 (de) 2005-06-09 2005-06-09 Method for identifying redundant text in electronic documents

Publications (1)

Publication Number Publication Date
ATE375561T1 (de) 2007-10-15

Family

ID=35149042

Family Applications (1)

Application Number Title Priority Date Filing Date
AT05012452T ATE375561T1 (de) 2005-06-09 2005-06-09 Method for identifying redundant text in electronic documents

Country Status (4)

Country Link
US (1) US7643682B2 (de)
EP (1) EP1732012B1 (de)
AT (1) ATE375561T1 (de)
DE (1) DE602005002835T2 (de)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4345772B2 (ja) * 2006-04-21 2009-10-14 Seiko Epson Corporation Document editing device, program, and storage medium
US20090199087A1 (en) * 2008-02-04 2009-08-06 Microsoft Corporation Applying rich visual effects to arbitrary runs of text
US9063911B2 (en) * 2009-01-02 2015-06-23 Apple Inc. Identification of layout and content flow of an unstructured document
CN101937312B (zh) * 2010-09-15 2014-03-19 ZTE Corporation Method for marking an e-book and mobile terminal
CN101976232B (zh) * 2010-09-19 2012-06-20 深圳市万兴软件有限公司 Method and device for identifying data tables in a document
US9471550B2 (en) * 2012-10-16 2016-10-18 Linkedin Corporation Method and apparatus for document conversion with font metrics adjustment for format compatibility
US9563635B2 (en) 2013-10-28 2017-02-07 International Business Machines Corporation Automated recognition of patterns in a log file having unknown grammar
US10373343B1 (en) 2015-05-28 2019-08-06 Certainteed Corporation System for visualization of a building material
JP6744571B2 (ja) * 2016-06-22 2020-08-19 Fuji Xerox Co., Ltd. Information processing device and program
JP6797610B2 (ja) * 2016-08-31 2020-12-09 Canon Inc. Apparatus, method, and program
US11195324B1 (en) 2018-08-14 2021-12-07 Certainteed Llc Systems and methods for visualization of building structures
CN113298079B (zh) * 2021-06-28 2023-10-27 北京奇艺世纪科技有限公司 Image processing method, apparatus, electronic device, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5168147A (en) * 1990-07-31 1992-12-01 Xerox Corporation Binary image processing for decoding self-clocking glyph shape codes
US5321773A (en) * 1991-12-10 1994-06-14 Xerox Corporation Image recognition method using finite state networks
US6336124B1 (en) * 1998-10-01 2002-01-01 Bcl Computers, Inc. Conversion data representing a document to other formats for manipulation and display
US6641053B1 (en) * 2002-10-16 2003-11-04 Xerox Corp. Foreground/background document processing with dataglyphs

Also Published As

Publication number Publication date
DE602005002835T2 (de) 2008-02-07
DE602005002835D1 (de) 2007-11-22
EP1732012B1 (de) 2007-10-10
US20060282769A1 (en) 2006-12-14
US7643682B2 (en) 2010-01-05
EP1732012A1 (de) 2006-12-13

Similar Documents

Publication Publication Date Title
ATE373274T1 (de) Method for identifying words in an electronic document
DE60336146D1 (de) Font system and method with scalable stroke
ATE375561T1 (de) Method for identifying redundant text in electronic documents
ATE392667T1 (de) Method and computer system for indexing structured documents
US20100199168A1 (en) Document Generation Method and System
Meunier Optimized XY-cut for determining a page reading order
CN108268884B (zh) Document comparison method and device
WO2007038389A3 (en) Method and apparatus for identifying and classifying network documents as spam
EP1079312A3 (de) Printed matter with text data, and method and apparatus for printing the printed matter
US9430451B1 (en) Parsing author name groups in non-standardized format
US10984168B1 (en) System and method for generating a multi-modal abstract
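JP6976524B2 (ja) Method for generating print data and software for generating print data
CN120911402B (zh) Method for assisted assessment of DTP job complexity based on AI and layout analysis
JP6204076B2 (ja) Text region reading order determination device, text region reading order determination method, and text region reading order determination program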
Arrant Standard Tiberian Pronunciation in a Non-Standard Form: TS as 64.206
Doboš The Tale of Two Empires
O'CONNOR Handwritten Text Recognition technology and MS Turin, BNU, L. II. 14 (T). The "Rescapé" case study.
Pournader Proposal to encode four combining Arabic characters for Koranic use
Kumar Publisher’s Information
CN100565513C (zh) File processing method and related pattern display method
Pandey Final proposal to encode Nandinagari in Unicode
Irie et al. Authors’ Instructions for IVCNZ08
de Normalisation Background information
Selamat et al. Export competitiveness of the Malaysia processed food in the middle east market

Legal Events

Date Code Title Description
UEP Publication of translation of European patent specification

Ref document number: 1732012

Country of ref document: EP