DE602005002473D1 - Method for identifying words in an electronic document
- Publication number
- DE602005002473D1
- Authority
- DE
- Germany
- Prior art keywords
- semantic units
- electronic document
- glyphs
- determining
- geometric
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP05014369A EP1739574B1 (de) | 2005-07-01 | 2005-07-01 | Method for identifying words in an electronic document |
Publications (2)
Publication Number | Publication Date |
---|---|
DE602005002473D1 (de) | 2007-10-25 |
DE602005002473T2 (de) | 2008-01-10 |
Family
ID=35407037
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE602005002473T Active DE602005002473T2 (de) | 2005-07-01 | 2005-07-01 | Method for recognizing semantic units in an electronic document |
Country Status (4)
Country | Link |
---|---|
US (1) | US7705848B2 (de) |
EP (1) | EP1739574B1 (de) |
AT (1) | ATE373274T1 (de) |
DE (1) | DE602005002473T2 (de) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7786994B2 (en) * | 2006-10-26 | 2010-08-31 | Microsoft Corporation | Determination of unicode points from glyph elements |
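JP5248845B2 (ja) * | 2006-12-13 | 2013-07-31 | キヤノン株式会社 (Canon Inc.) | Document processing apparatus, document processing method, program, and storage medium |
JP4834603B2 (ja) * | 2007-05-09 | 2011-12-14 | 京セラミタ株式会社 (Kyocera Mita Corp.) | Image processing apparatus and image forming apparatus |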
US8229232B2 (en) * | 2007-08-24 | 2012-07-24 | CVISION Technologies, Inc. | Computer vision-based methods for enhanced JBIG2 and generic bitonal compression |
US8365072B2 (en) * | 2009-01-02 | 2013-01-29 | Apple Inc. | Identification of compound graphic elements in an unstructured document |
US8824806B1 (en) * | 2010-03-02 | 2014-09-02 | Amazon Technologies, Inc. | Sequential digital image panning |
US8380753B2 (en) | 2011-01-18 | 2013-02-19 | Apple Inc. | Reconstruction of lists in a document |
US8549399B2 (en) | 2011-01-18 | 2013-10-01 | Apple Inc. | Identifying a selection of content in a structured document |
TWI476761B (zh) | 2011-04-08 | 2015-03-11 | Dolby Lab Licensing Corp | Audio encoding method and system for generating a unified bitstream decodable by decoders implementing different decoding protocols |
CA2753508C (en) * | 2011-09-23 | 2013-07-30 | Guy Le Henaff | Tracing a document in an electronic publication |
MX2014008560A (es) | 2012-01-23 | 2014-09-26 | Microsoft Corp | Formula detection processor |
EP2883210A4 (de) * | 2012-08-10 | 2016-04-20 | Monotype Imaging Inc | Producing glyph distance fields |
US10002448B2 (en) | 2012-08-10 | 2018-06-19 | Monotype Imaging Inc. | Producing glyph distance fields |
US9330070B2 (en) | 2013-03-11 | 2016-05-03 | Microsoft Technology Licensing, Llc | Detection and reconstruction of east asian layout features in a fixed format document |
US20140258852A1 (en) * | 2013-03-11 | 2014-09-11 | Microsoft Corporation | Detection and Reconstruction of Right-to-Left Text Direction, Ligatures and Diacritics in a Fixed Format Document |
US9047511B1 (en) * | 2013-05-15 | 2015-06-02 | Amazon Technologies, Inc. | Describing inter-character spacing in a font file |
IN2015CH05327A (de) * | 2015-10-05 | 2015-10-16 | Wipro Ltd | |
US10930045B2 (en) * | 2017-03-22 | 2021-02-23 | Microsoft Technology Licensing, Llc | Digital ink based visual components |
US10740602B2 (en) * | 2018-04-18 | 2020-08-11 | Google Llc | System and methods for assigning word fragments to text lines in optical character recognition-extracted data |
US11615244B2 (en) | 2020-01-30 | 2023-03-28 | Oracle International Corporation | Data extraction and ordering based on document layout analysis |
US11475686B2 (en) | 2020-01-31 | 2022-10-18 | Oracle International Corporation | Extracting data from tables detected in electronic documents |
US11687700B1 (en) * | 2022-02-01 | 2023-06-27 | International Business Machines Corporation | Generating a structure of a PDF-document |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5416898A (en) * | 1992-05-12 | 1995-05-16 | Apple Computer, Inc. | Apparatus and method for generating textual lines layouts |
DE69525401T2 (de) * | 1994-09-12 | 2002-11-21 | Adobe Systems Inc | Method and apparatus for identifying words described in a portable electronic document |
US5870084A (en) * | 1996-11-12 | 1999-02-09 | Thomson Consumer Electronics, Inc. | System and method for efficiently storing and quickly retrieving glyphs for large character set languages in a set top box |
US6327393B1 (en) * | 1998-08-17 | 2001-12-04 | Cognex Corporation | Method and apparatus to transform a region within a digital image using a deformable window |
US20040205568A1 (en) * | 2002-03-01 | 2004-10-14 | Breuel Thomas M. | Method and system for document image layout deconstruction and redisplay system |
US20040202352A1 (en) * | 2003-04-10 | 2004-10-14 | International Business Machines Corporation | Enhanced readability with flowed bitmaps |
2005
- 2005-07-01 DE DE602005002473T, published as DE602005002473T2 (active)
- 2005-07-01 AT AT05014369T, published as ATE373274T1 (active)
- 2005-07-01 EP EP05014369A, published as EP1739574B1 (active)
2006
- 2006-04-18 US US11/405,782, published as US7705848B2 (active)
Also Published As
Publication number | Publication date |
---|---|
DE602005002473T2 (de) | 2008-01-10 |
ATE373274T1 (de) | 2007-09-15 |
EP1739574B1 (de) | 2007-09-12 |
US7705848B2 (en) | 2010-04-27 |
US20070002054A1 (en) | 2007-01-04 |
EP1739574A1 (de) | 2007-01-03 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
ATE373274T1 (de) | Method for identifying words in an electronic document | |
ATE375561T1 (de) | Method for identifying redundant text in electronic documents | |
CN102411587B (zh) | Web page classification method and device | |
WO2005109178A3 (en) | Extracting information from web pages | |
ATE432515T1 (de) | Method for determining and suppressing duplicates | |
WO2006014343A3 (en) | Automated evaluation systems and methods | |
WO2009098468A3 (en) | A method and system of indexing numerical data | |
WO2006122059A3 (en) | System and methods for identifying the potential advertising value of terms found on web pages | |
WO2003057648A3 (fr) | Methods and systems for searching and associating information resources such as web pages | |
DE60333631D1 (de) | Behavior-based adaptation of computer systems | |
ATE433124T1 (de) | System and method for analyzing radar information | |
CN104298665A (zh) | Method and device for identifying evaluation objects in Chinese text | |
ATE439638T1 (de) | Method, apparatus and computer program products for retrieving information and classifying documents using a multidimensional subspace | |
EP1736901A3 (de) | Method for classifying subtrees in semi-structured documents | |
WO2005050473A3 (en) | Clustering of text for structuring of text documents and training of language models | |
DE602005018429D1 (de) | Apparatus, method, processor arrangement and computer-readable storage medium program for document classification | |
DE602005021581D1 (de) | Method and apparatus for classifying image pages by means of summaries | |
DE602004022406D1 (de) | Method and apparatus for packet classification and rewriting | |
CN103309862A (zh) | Web page type identification method and system | |
McCollum et al. | Unbounded harmony is not always myopic: Evidence from Tutrugbu | |
CN101877062A (zh) | Method for analyzing contours of image layout regions | |
US20080228724A1 (en) | Technical classification method for searching patents | |
DE502004007248D1 (de) | Identification card and method for producing the same | |
CN116341489A (zh) | Text information reading method, device and terminal | |
DE602004021598D1 (de) | A method, a network document description language, a network document transition protocol and a computer software product for retrieving network documents | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 8364 | No opposition during term of opposition | |