DE60314806D1 - Extraction of information from structured documents (Extrahierung von Information aus strukturierten Dokumenten)
- Publication number
- DE60314806D1
- Authority
- DE
- Germany
- Prior art keywords
- extraction
- information
- structured documents
- structured
- documents
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
- G06F16/986—Document structures and storage, e.g. HTML extensions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/81—Indexing, e.g. XML tags; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/84—Mapping; Conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/957—Browsing optimisation, e.g. caching or content distillation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/117—Tagging; Marking up; Designating a block; Setting of attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/14—Tree-structured documents
- G06F40/143—Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Document Processing Apparatus (AREA)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
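JP2002190621A JP3937944B2 (ja) | 2002-06-28 | 2002-06-28 | Method and apparatus for extracting information from structured documents, information extraction program, and computer-readable recording medium |
JP2002190621 | 2002-06-28 | | |
JP2002204641 | 2002-07-12 | | |
JP2002204641A JP2004046642A (ja) | 2002-07-12 | 2002-07-12 | Method and apparatus for specifying and extracting parts of a structured document, program for specifying and extracting parts of a structured document, and storage medium storing that program |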
Publications (2)
Publication Number | Publication Date |
---|---|
DE60314806D1 (de) | 2007-08-23 |
DE60314806T2 (de) | 2008-03-13 |
Family
ID=29718460
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE60314806T (Expired - Lifetime), DE60314806T2 (de) | 2002-06-28 | 2003-06-17 | Extraction of information from structured documents |
DE60333238T (Expired - Lifetime), DE60333238D1 (de) | 2002-06-28 | 2003-06-17 | Extraction of information from structured documents |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE60333238T (Expired - Lifetime), DE60333238D1 (de) | 2002-06-28 | 2003-06-17 | Extraction of information from structured documents |
Country Status (5)
Country | Link |
---|---|
US (2) | US7685157B2 (de) |
EP (2) | EP1686499B1 (de) |
KR (1) | KR100572576B1 (de) |
CN (1) | CN1244877C (de) |
DE (2) | DE60314806T2 (de) |
Families Citing this family (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1686499B1 (de) * | 2002-06-28 | 2010-06-30 | Nippon Telegraph and Telephone Corporation | Extraction of information from structured documents |
WO2004068320A2 (en) * | 2003-01-27 | 2004-08-12 | Vincent Wen-Jeng Lue | Method and apparatus for adapting web contents to different display area dimensions |
US20050108630A1 (en) * | 2003-11-19 | 2005-05-19 | Wasson Mark D. | Extraction of facts from text |
GB2411017A (en) * | 2004-02-13 | 2005-08-17 | Satellite Information Services | Updating mark-up language documents from contained instructions |
WO2005114494A1 (en) * | 2004-05-21 | 2005-12-01 | Computer Associates Think, Inc. | Storing multipart xml documents |
CN100432996C (zh) * | 2004-12-07 | 2008-11-12 | 国际商业机器公司 | System and method for extracting core web page content based on web page layout |
EP1681643B1 (de) * | 2005-01-14 | 2010-05-05 | TheFind, Inc. | Method and system for information extraction |
CN100395755C (zh) * | 2006-02-23 | 2008-06-18 | 无锡永中科技有限公司 | Method for building a tree-shaped file structure in a computer |
US20070266309A1 (en) * | 2006-05-12 | 2007-11-15 | Royston Sellman | Document transfer between document editing software applications |
US9460064B2 (en) * | 2006-05-18 | 2016-10-04 | Oracle International Corporation | Efficient piece-wise updates of binary encoded XML data |
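CN101094194B (zh) * | 2006-06-19 | 2010-06-23 | 腾讯科技(深圳)有限公司 | Method for extracting the web information a user needs from a web page |
JP4146479B2 (ja) * | 2006-09-28 | 2008-09-10 | 株式会社東芝 | Structured document search apparatus, structured document search method, and structured document search program |
JP2008108096A (ja) * | 2006-10-26 | 2008-05-08 | Sony Corp | Content sharing system, content management server, client device, content management method, and content acquisition method |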
US8291310B2 (en) * | 2007-08-29 | 2012-10-16 | Oracle International Corporation | Delta-saving in XML-based documents |
KR100902674B1 (ko) * | 2007-10-10 | 2009-06-15 | 엔에이치엔(주) | Method and system for providing a document exploration service |
US20090138500A1 (en) * | 2007-10-12 | 2009-05-28 | Yuan Zhiqiang | Method of compact display combined with property-table-view for a complex relational data structure |
US8515727B2 (en) * | 2008-03-19 | 2013-08-20 | International Business Machines Corporation | Automatic logic model build process with autonomous quality checking |
CN101571859B (zh) * | 2008-04-28 | 2013-01-02 | 国际商业机器公司 | Method and apparatus for annotating documents |
JP2010165272A (ja) * | 2009-01-19 | 2010-07-29 | Sony Corp | Information processing method, information processing apparatus, and program |
WO2011041465A1 (en) * | 2009-09-30 | 2011-04-07 | Tracking.Net | Enhanced website tracking system and method |
US8255372B2 (en) | 2010-01-18 | 2012-08-28 | Oracle International Corporation | Efficient validation of binary XML data |
US9633332B2 (en) | 2010-07-13 | 2017-04-25 | Hewlett Packard Enterprise Development Lp | Generating machine-understandable representations of content |
JP4936413B1 (ja) * | 2011-03-07 | 2012-05-23 | 株式会社ショーケース・ティービー | Web display program conversion system, web display program conversion method, and program for web display program conversion |
US10756759B2 (en) | 2011-09-02 | 2020-08-25 | Oracle International Corporation | Column domain dictionary compression |
US8935267B2 (en) * | 2012-06-19 | 2015-01-13 | Marklogic Corporation | Apparatus and method for executing different query language queries on tree structured data using pre-computed indices of selective document paths |
US10275398B2 (en) | 2012-09-11 | 2019-04-30 | Nippon Telegraph And Telephone Corporation | Content display device, content display method, and content display program |
US8812523B2 (en) | 2012-09-28 | 2014-08-19 | Oracle International Corporation | Predicate result cache |
US9740765B2 (en) | 2012-10-08 | 2017-08-22 | International Business Machines Corporation | Building nomenclature in a set of documents while building associative document trees |
US9208254B2 (en) * | 2012-12-10 | 2015-12-08 | Microsoft Technology Licensing, Llc | Query and index over documents |
US10454752B2 (en) | 2015-11-02 | 2019-10-22 | Servicenow, Inc. | System and method for processing alerts indicative of conditions of a computing infrastructure |
JP2019066917A (ja) * | 2017-09-28 | 2019-04-25 | 京セラドキュメントソリューションズ株式会社 | Electronic device and translation support method |
US10922366B2 (en) * | 2018-03-27 | 2021-02-16 | International Business Machines Corporation | Self-adaptive web crawling and text extraction |
US20220277499A1 (en) * | 2019-08-13 | 2022-09-01 | Arbi, Inc. | Systems and methods for document processing |
US11194833B2 (en) * | 2019-10-28 | 2021-12-07 | Charbel Gerges El Gemayel | Interchange data format system and method |
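CN110956019B (zh) * | 2019-11-27 | 2021-10-26 | 北大方正集团有限公司 | List processing system, method, apparatus, and computer-readable storage medium |
CN111857737A (zh) * | 2020-07-28 | 2020-10-30 | 苏州华望信息科技有限公司 | Method for separating dynamic and static resources in a semantic web system based on SysML models |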
Family Cites Families (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
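JPH0713821B2 (ja) | 1991-03-08 | 1995-02-15 | 日本電気株式会社 | Editing apparatus |
JPH0652161A (ja) * | 1992-08-03 | 1994-02-25 | Fuji Xerox Co Ltd | Document processing method and document processing apparatus |
JP2896634B2 (ja) * | 1995-03-02 | 1999-05-31 | 富士ゼロックス株式会社 | Full-text registered word search apparatus and full-text registered word search method |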
US5848186A (en) * | 1995-08-11 | 1998-12-08 | Canon Kabushiki Kaisha | Feature extraction system for identifying text within a table image |
US6546406B1 (en) * | 1995-11-03 | 2003-04-08 | Enigma Information Systems Ltd. | Client-server computer system for large document retrieval on networked computer system |
US6456308B1 (en) * | 1996-08-08 | 2002-09-24 | Agranat Systems, Inc. | Embedded web server |
US6061697A (en) * | 1996-09-11 | 2000-05-09 | Fujitsu Limited | SGML type document managing apparatus and managing method |
US5974572A (en) * | 1996-10-15 | 1999-10-26 | Mercury Interactive Corporation | Software system and methods for generating a load test using a server access log |
JPH10171800A (ja) | 1996-12-05 | 1998-06-26 | Canon Inc | Document processing method and apparatus therefor |
JP2867986B2 (ja) | 1996-12-25 | 1999-03-10 | 日本電気株式会社 | WWW information extraction system |
JPH1185690A (ja) | 1997-09-08 | 1999-03-30 | Nippon Telegr & Teleph Corp <Ntt> | Effective information providing method and effective information providing system |
US6628304B2 (en) * | 1998-12-09 | 2003-09-30 | Cisco Technology, Inc. | Method and apparatus providing a graphical user interface for representing and navigating hierarchical networks |
US6635089B1 (en) * | 1999-01-13 | 2003-10-21 | International Business Machines Corporation | Method for producing composite XML document object model trees using dynamic data retrievals |
JP4280360B2 (ja) | 1999-06-04 | 2009-06-17 | キヤノン株式会社 | Imaging apparatus, control method therefor, and storage medium |
US6529889B1 (en) * | 1999-07-27 | 2003-03-04 | Acappella Software, Inc. | System and method of knowledge architecture |
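JP2001184344A (ja) * | 1999-12-21 | 2001-07-06 | Internatl Business Mach Corp <Ibm> | Information processing system, proxy server, web page display control method, storage medium, and program transmission apparatus |
JP2001282773A (ja) | 2000-03-29 | 2001-10-12 | Hitachi Software Eng Co Ltd | Structured document editing apparatus, structured document editing method, and recording medium |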
US7702995B2 (en) * | 2000-04-24 | 2010-04-20 | TVWorks, LLC. | Method and system for transforming content for execution on multiple platforms |
DE60111376T2 (de) * | 2000-05-16 | 2006-03-16 | O'carroll, Garrett | System and method for document processing |
JP2002024227A (ja) * | 2000-05-22 | 2002-01-25 | Touuroomu Inc | System and method for generating wireless web pages |
US6732153B1 (en) * | 2000-05-23 | 2004-05-04 | Verizon Laboratories Inc. | Unified message parser apparatus and system for real-time event correlation |
US20020029229A1 (en) * | 2000-06-30 | 2002-03-07 | Jakopac David E. | Systems and methods for data compression |
US6678692B1 (en) * | 2000-07-10 | 2004-01-13 | Northrop Grumman Corporation | Hierarchy statistical analysis system and method |
US6842755B2 (en) * | 2000-09-25 | 2005-01-11 | Divine Technology Ventures | System and method for automatic retrieval of structured online documents |
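JP2002190621A (ja) | 2000-10-12 | 2002-07-05 | Sharp Corp | Semiconductor light-emitting element and method for manufacturing the same |
JP2002123418A (ja) | 2000-10-13 | 2002-04-26 | Nec Corp | Data update method, data update apparatus, and machine-readable recording medium storing a program |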
US6961909B2 (en) * | 2001-01-05 | 2005-11-01 | Hewlett-Packard Development Company, L.P. | System for displaying a hierarchical directory |
JP2002204641A (ja) | 2001-01-10 | 2002-07-23 | Shimano Inc | Drag mechanism for a spinning reel |
US6704723B1 (en) * | 2001-06-20 | 2004-03-09 | Microstrategy, Incorporated | Method and system for providing business intelligence information over a computer network via extensible markup language |
US6799184B2 (en) * | 2001-06-21 | 2004-09-28 | Sybase, Inc. | Relational database system providing XML query support |
US20030220914A1 (en) * | 2002-05-23 | 2003-11-27 | Mindflash Technologies, Inc. | Method for managing data in a network |
EP1686499B1 (de) * | 2002-06-28 | 2010-06-30 | Nippon Telegraph and Telephone Corporation | Extraction of information from structured documents |
US20050125419A1 (en) * | 2002-09-03 | 2005-06-09 | Fujitsu Limited | Search processing system, its search server, client, search processing method, program, and recording medium |
US7644361B2 (en) * | 2002-12-23 | 2010-01-05 | Canon Kabushiki Kaisha | Method of using recommendations to visually create new views of data across heterogeneous sources |
WO2004068320A2 (en) * | 2003-01-27 | 2004-08-12 | Vincent Wen-Jeng Lue | Method and apparatus for adapting web contents to different display area dimensions |
US20050108630A1 (en) * | 2003-11-19 | 2005-05-19 | Wasson Mark D. | Extraction of facts from text |
2003
- 2003-06-17: EP application EP06010490A, patent EP1686499B1 (de); not active, Expired - Lifetime
- 2003-06-17: DE application DE60314806T, patent DE60314806T2 (de); not active, Expired - Lifetime
- 2003-06-17: DE application DE60333238T, patent DE60333238D1 (de); not active, Expired - Lifetime
- 2003-06-17: EP application EP03253818A, patent EP1376408B1 (de); not active, Expired - Lifetime
- 2003-06-18: US application US10/463,521, patent US7685157B2 (en); not active, Expired - Lifetime
- 2003-06-18: CN application CNB031486614A, patent CN1244877C (zh); not active, Expired - Lifetime
- 2003-06-27: KR application KR1020030042628A, patent KR100572576B1 (ko); active, IP Right Grant
2004
- 2004-11-08: US application US10/982,865, patent US7730104B2 (en); active
Also Published As
Publication number | Publication date |
---|---|
US20040044963A1 (en) | 2004-03-04 |
CN1469276A (zh) | 2004-01-21 |
CN1244877C (zh) | 2006-03-08 |
US20050066271A1 (en) | 2005-03-24 |
EP1376408A2 (de) | 2004-01-02 |
EP1376408A3 (de) | 2005-10-12 |
EP1686499B1 (de) | 2010-06-30 |
EP1686499A8 (de) | 2006-11-08 |
US7685157B2 (en) | 2010-03-23 |
US7730104B2 (en) | 2010-06-01 |
EP1376408B1 (de) | 2007-07-11 |
KR100572576B1 (ko) | 2006-04-24 |
DE60333238D1 (de) | 2010-08-12 |
EP1686499A3 (de) | 2007-12-12 |
DE60314806T2 (de) | 2008-03-13 |
KR20040002791A (ko) | 2004-01-07 |
EP1686499A2 (de) | 2006-08-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
DE60314806D1 (de) | | Extraction of information from structured documents |
DE602004024733D1 (de) | | Information terminal |
NO20052047D0 (no) | | Information picker |
DE60213011D1 (de) | | Backward-compatible reduced-size chip card |
ATE516375T1 (de) | | Metal extraction from sulfur-containing materials |
DE60331199D1 (de) | | Separation and recovery of boron |
DE60336251D1 (de) | | Identification of recording media |
DE50309905D1 (de) | | Label for concealing information |
DE60321298D1 (de) | | Card processor |
DE60331481D1 (de) | | IC card |
DE602004006328D1 (de) | | Information providing system |
DE60303869D1 (de) | | Card connector |
DE602004012130D1 (de) | | Contactless chip card |
DE602004004846D1 (de) | | Encoding of audio data |
DE60208254D1 (de) | | Data separation circuit |
DE60304794D1 (de) | | Ballpoint pen |
DE502004008448D1 (de) | | Chip card |
DE60322879D1 (de) | | Information security |
DE602004030205D1 (de) | | Computer card |
DE50301046D1 (de) | | Data carrier card |
UA6223S (uk) | | Card case |
UA8469S (uk) | | Insurance card |
UA10080S (uk) | | Advertising leaflet |
UA6354S (uk) | | Telephone card |
UA7366S (uk) | | Box for business cards |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 8364 | No opposition during term of opposition | |