DE602005002473D1 - Method for identifying words in an electronic document

Method for identifying words in an electronic document

Info

Publication number
DE602005002473D1
Authority
DE
Germany
Prior art keywords
semantic units
electronic document
glyphs
determining
geometric
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
DE602005002473T
Other languages
English (en)
Other versions
DE602005002473T2 (de)
Inventor
Serge Bronstein
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PDFlib GmbH
Original Assignee
PDFlib GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PDFlib GmbH
Publication of DE602005002473D1
Application granted
Publication of DE602005002473T2
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)
DE602005002473T 2005-07-01 2005-07-01 Method for recognizing semantic units in an electronic document Active DE602005002473T2 (de)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
EP05014369A EP1739574B1 (de) | 2005-07-01 | 2005-07-01 | Method for identifying words in an electronic document

Publications (2)

Publication Number | Publication Date
DE602005002473D1 (de) | 2007-10-25
DE602005002473T2 (de) | 2008-01-10

Family

ID=35407037

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
DE602005002473T (Active; DE602005002473T2 (de)) | Method for recognizing semantic units in an electronic document | 2005-07-01 | 2005-07-01

Country Status (4)

Country Link
US (1) US7705848B2 (de)
EP (1) EP1739574B1 (de)
AT (1) ATE373274T1 (de)
DE (1) DE602005002473T2 (de)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US7786994B2 (en) * | 2006-10-26 | 2010-08-31 | Microsoft Corporation | Determination of unicode points from glyph elements
JP5248845B2 (ja) * | 2006-12-13 | 2013-07-31 | キヤノン株式会社 | Document processing apparatus, document processing method, program, and storage medium
JP4834603B2 (ja) * | 2007-05-09 | 2011-12-14 | 京セラミタ株式会社 | Image processing apparatus and image forming apparatus
US8229232B2 (en) * | 2007-08-24 | 2012-07-24 | CVISION Technologies, Inc. | Computer vision-based methods for enhanced JBIG2 and generic bitonal compression
US8365072B2 (en) * | 2009-01-02 | 2013-01-29 | Apple Inc. | Identification of compound graphic elements in an unstructured document
US8824806B1 (en) * | 2010-03-02 | 2014-09-02 | Amazon Technologies, Inc. | Sequential digital image panning
US8380753B2 (en) | 2011-01-18 | 2013-02-19 | Apple Inc. | Reconstruction of lists in a document
US8549399B2 (en) | 2011-01-18 | 2013-10-01 | Apple Inc. | Identifying a selection of content in a structured document
TWI476761B (zh) | 2011-04-08 | 2015-03-11 | Dolby Lab Licensing Corp | Audio encoding method and system for generating a unified bitstream decodable by decoders implementing different decoding protocols
CA2753508C (en) * | 2011-09-23 | 2013-07-30 | Guy Le Henaff | Tracing a document in an electronic publication
MX2014008560A (es) | 2012-01-23 | 2014-09-26 | Microsoft Corp | Formula detection processor
EP2883210A4 (de) * | 2012-08-10 | 2016-04-20 | Monotype Imaging Inc | Producing glyph distance fields
US10002448B2 (en) | 2012-08-10 | 2018-06-19 | Monotype Imaging Inc. | Producing glyph distance fields
US9330070B2 (en) | 2013-03-11 | 2016-05-03 | Microsoft Technology Licensing, Llc | Detection and reconstruction of east asian layout features in a fixed format document
US20140258852A1 (en) * | 2013-03-11 | 2014-09-11 | Microsoft Corporation | Detection and Reconstruction of Right-to-Left Text Direction, Ligatures and Diacritics in a Fixed Format Document
US9047511B1 (en) * | 2013-05-15 | 2015-06-02 | Amazon Technologies, Inc. | Describing inter-character spacing in a font file
IN2015CH05327A (de) * | 2015-10-05 | 2015-10-16 | Wipro Ltd |
US10930045B2 (en) * | 2017-03-22 | 2021-02-23 | Microsoft Technology Licensing, Llc | Digital ink based visual components
US10740602B2 (en) * | 2018-04-18 | 2020-08-11 | Google Llc | System and methods for assigning word fragments to text lines in optical character recognition-extracted data
US11615244B2 (en) | 2020-01-30 | 2023-03-28 | Oracle International Corporation | Data extraction and ordering based on document layout analysis
US11475686B2 (en) | 2020-01-31 | 2022-10-18 | Oracle International Corporation | Extracting data from tables detected in electronic documents
US11687700B1 (en) * | 2022-02-01 | 2023-06-27 | International Business Machines Corporation | Generating a structure of a PDF-document

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5416898A (en) * | 1992-05-12 | 1995-05-16 | Apple Computer, Inc. | Apparatus and method for generating textual lines layouts
DE69525401T2 (de) * | 1994-09-12 | 2002-11-21 | Adobe Systems Inc | Method and apparatus for identifying words described in a portable electronic document
US5870084A (en) * | 1996-11-12 | 1999-02-09 | Thomson Consumer Electronics, Inc. | System and method for efficiently storing and quickly retrieving glyphs for large character set languages in a set top box
US6327393B1 (en) * | 1998-08-17 | 2001-12-04 | Cognex Corporation | Method and apparatus to transform a region within a digital image using a deformable window
US20040205568A1 (en) * | 2002-03-01 | 2004-10-14 | Breuel Thomas M. | Method and system for document image layout deconstruction and redisplay system
US20040202352A1 (en) * | 2003-04-10 | 2004-10-14 | International Business Machines Corporation | Enhanced readability with flowed bitmaps

Also Published As

Publication number Publication date
DE602005002473T2 (de) 2008-01-10
ATE373274T1 (de) 2007-09-15
EP1739574B1 (de) 2007-09-12
US7705848B2 (en) 2010-04-27
US20070002054A1 (en) 2007-01-04
EP1739574A1 (de) 2007-01-03

Similar Documents

Publication | Publication Date | Title
ATE373274T1 (de) | | Method for identifying words in an electronic document
ATE375561T1 (de) | | Method for identifying redundant text in electronic documents
CN102411587B (zh) | | Web page classification method and device
WO2005109178A3 (en) | | Extracting information from web pages
ATE432515T1 (de) | | Method for determining and suppressing duplicates
WO2006014343A3 (en) | | Automated evaluation systems and methods
WO2009098468A3 (en) | | A method and system of indexing numerical data
WO2006122059A3 (en) | | System and methods for identifying the potential advertising value of terms found on web pages
WO2003057648A3 (fr) | | Methods and systems for searching and associating information resources such as web pages
DE60333631D1 (de) | | Behavior-based adaptation of computer systems
ATE433124T1 (de) | | System and method for analyzing radar information
CN104298665A (zh) | | Method and device for identifying evaluation objects in Chinese text
ATE439638T1 (de) | | Method, apparatus, and computer program products for information retrieval and document classification using a multidimensional subspace
EP1736901A3 (de) | | Method for classifying subtrees in semi-structured documents
WO2005050473A3 (en) | | Clustering of text for structuring of text documents and training of language models
DE602005018429D1 (de) | | Apparatus, method, processor arrangement, and computer-readable storage medium storing a program for document classification
DE602005021581D1 (de) | | Method and apparatus for classifying image pages using summaries
DE602004022406D1 (de) | | Method and apparatus for packet classification and rewriting
CN103309862A (zh) | | Web page type identification method and system
McCollum et al. | | Unbounded harmony is not always myopic: Evidence from Tutrugbu
CN101877062A (zh) | | Method for analyzing the outlines of image layout regions
US20080228724A1 (en) | | Technical classification method for searching patents
DE502004007248D1 (de) | | Identification card and method for producing it
CN116341489A (zh) | | Text information reading method, device, and terminal
DE602004021598D1 (de) | | A method, a network document description language, a network document transition protocol, and a computer software product for retrieving network documents

Legal Events

Date Code Title Description
8364 No opposition during term of opposition