JP4299963B2 - Apparatus and method for dividing a document based on semantic units - Google Patents

Apparatus and method for dividing a document based on semantic units

Info

Publication number
JP4299963B2
JP4299963B2 JP2000302321A JP2000302321A JP4299963B2 JP 4299963 B2 JP4299963 B2 JP 4299963B2 JP 2000302321 A JP2000302321 A JP 2000302321A JP 2000302321 A JP2000302321 A JP 2000302321A JP 4299963 B2 JP4299963 B2 JP 4299963B2
Authority
JP
Japan
Prior art keywords
document
segment
similarity
dividing
likelihood
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP2000302321A
Other languages
English (en)
Japanese (ja)
Other versions
JP2002117019A (ja)
JP2002117019A5
Inventor
裕之 清水
真也 中川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HP Inc
Original Assignee
Hewlett Packard Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Co
Priority to JP2000302321A (JP4299963B2)
Priority to US10/182,779 (US7113897B2)
Priority to PCT/US2001/030734 (WO2002029547A1)
Priority to EP01975645A (EP1301853B1)
Priority to DE60139323T (DE60139323D1)
Publication of JP2002117019A
Publication of JP2002117019A5
Application granted
Publication of JP4299963B2
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/137 Hierarchical processing, e.g. outlines
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99943 Generating database or data structure, e.g. via user interface
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99944 Object-oriented database structure
    • Y10S707/99945 Object-oriented database structure processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)
  • Facsimiles In General (AREA)
JP2000302321A 2000-10-02 2000-10-02 Apparatus and method for dividing a document based on semantic units Expired - Fee Related JP4299963B2 (ja)

Priority Applications (5)

Application Number Priority Date Filing Date Title
JP2000302321A JP4299963B2 (ja) 2000-10-02 2000-10-02 Apparatus and method for dividing a document based on semantic units
US10/182,779 US7113897B2 (en) 2000-10-02 2001-10-02 Apparatus and method for text segmentation based on coherent units
PCT/US2001/030734 WO2002029547A1 (en) 2000-10-02 2001-10-02 Apparatus and method for text segmentation based on coherent units
EP01975645A EP1301853B1 (en) 2000-10-02 2001-10-02 Apparatus and method for text segmentation based on coherent units
DE60139323T DE60139323D1 (de) 2000-10-02 2001-10-02 Apparatus and method for text segmentation based on coherent units

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2000302321A JP4299963B2 (ja) 2000-10-02 2000-10-02 Apparatus and method for dividing a document based on semantic units

Publications (3)

Publication Number Publication Date
JP2002117019A JP2002117019A (ja) 2002-04-19
JP2002117019A5 JP2002117019A5 2007-12-06
JP4299963B2 JP4299963B2 (ja) 2009-07-22

Family

ID=18783693

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2000302321A Expired - Fee Related JP4299963B2 (ja) 2000-10-02 2000-10-02 Apparatus and method for dividing a document based on semantic units

Country Status (5)

Country Link
US (1) US7113897B2
EP (1) EP1301853B1
JP (1) JP4299963B2
DE (1) DE60139323D1
WO (1) WO2002029547A1

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050120011A1 (en) * 2003-11-26 2005-06-02 Word Data Corp. Code, method, and system for manipulating texts
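JP2007241902A (ja) * 2006-03-10 2007-09-20 Univ Of Tsukuba Text data segmentation system and method for segmenting and hierarchizing text data
JP5084297B2 (ja) * 2007-02-21 2012-11-28 株式会社野村総合研究所 Conversation analysis apparatus and conversation analysis program
JP4646078B2 (ja) * 2007-03-08 2011-03-09 日本電信電話株式会社 Apparatus and method for extracting sets of mutually related named entities
JP5256654B2 (ja) * 2007-06-29 2013-08-07 富士通株式会社 Text segmentation program, text segmentation apparatus, and text segmentation method
KR101472844B1 (ko) 2007-10-23 2014-12-16 삼성전자 주식회사 Adaptive document display apparatus and method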
US8977539B2 (en) 2009-03-30 2015-03-10 Nec Corporation Language analysis apparatus, language analysis method, and language analysis program
US8434001B2 (en) 2010-06-03 2013-04-30 Rhonda Enterprises, Llc Systems and methods for presenting a content summary of a media item to a user based on a position within the media item
US9326116B2 (en) 2010-08-24 2016-04-26 Rhonda Enterprises, Llc Systems and methods for suggesting a pause position within electronic text
US9069754B2 (en) 2010-09-29 2015-06-30 Rhonda Enterprises, Llc Method, system, and computer readable medium for detecting related subgroups of text in an electronic document
CN104468319B (zh) * 2013-09-18 2018-11-16 阿里巴巴集团控股有限公司 Method and system for merging conversation content
CN104090918B (zh) * 2014-06-16 2017-02-22 北京理工大学 Sentence similarity calculation method based on information content
US10402473B2 (en) * 2016-10-16 2019-09-03 Richard Salisbury Comparing, and generating revision markings with respect to, an arbitrary number of text segments
JP6815184B2 (ja) * 2016-12-13 2021-01-20 株式会社東芝 Information processing apparatus, information processing method, and information processing program
EP3616090A1 (en) * 2017-04-26 2020-03-04 Piksel, Inc. Multimedia stream analysis and retrieval
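JP6564811B2 (ja) * 2017-05-18 2019-08-21 日本電信電話株式会社 Passage presentation control apparatus, passage presentation method, and passage presentation program
CN109492659B (zh) * 2018-09-25 2021-10-01 维灵(杭州)信息技术有限公司 Method for calculating curve similarity for comparing ECG and EEG waveforms
JP7148077B2 (ja) * 2019-02-28 2022-10-05 日本電信電話株式会社 Tree structure analysis apparatus, method, and program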
US11748571B1 (en) * 2019-05-21 2023-09-05 Educational Testing Service Text segmentation with two-level transformer and auxiliary coherence modeling
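CN111797634B (zh) * 2020-06-04 2023-09-08 语联网(武汉)信息技术有限公司 Document segmentation method and apparatus
CN112597422A (zh) * 2020-12-30 2021-04-02 深圳市世强元件网络有限公司 PDF file splitting method and method for loading PDF files in a web page
CN118446213B (zh) * 2024-04-29 2025-01-14 北京医二科技有限公司 Text segmentation method and apparatus, computer program product, and electronic device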

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ATE260486T1 (de) * 1992-07-31 2004-03-15 Ibm Retrieval of character strings in a database of character strings
US5778397A (en) * 1995-06-28 1998-07-07 Xerox Corporation Automatic method of generating feature probabilities for automatic extracting summarization
US5761191A (en) * 1995-11-28 1998-06-02 Telecommunications Techniques Corporation Statistics collection for ATM networks
US6052657A (en) * 1997-09-09 2000-04-18 Dragon Systems, Inc. Text segmentation and identification of topic using language models
JPH11235574A (ja) 1998-02-24 1999-08-31 Hitachi Kasei Techno Plant Kk Recycling apparatus and recycling apparatus for waste film cartridges
JP3578618B2 (ja) 1998-02-26 2004-10-20 株式会社リコー Document segmentation apparatus
JP3597697B2 (ja) * 1998-03-20 2004-12-08 富士通株式会社 Document summarization apparatus and method
US6714909B1 (en) * 1998-08-13 2004-03-30 At&T Corp. System and method for automated multimedia content indexing and retrieval
US6185524B1 (en) * 1998-12-31 2001-02-06 Lernout & Hauspie Speech Products N.V. Method and apparatus for automatic identification of word boundaries in continuous text and computation of word boundary scores
US6317708B1 (en) * 1999-01-07 2001-11-13 Justsystem Corporation Method for producing summaries of text document
JP2000235574A (ja) 1999-02-16 2000-08-29 Ricoh Co Ltd Document processing apparatus
US6611825B1 (en) * 1999-06-09 2003-08-26 The Boeing Company Method and system for text mining using multidimensional subspaces
US6411962B1 (en) * 1999-11-29 2002-06-25 Xerox Corporation Systems and methods for organizing text
US6675174B1 (en) * 2000-02-02 2004-01-06 International Business Machines Corp. System and method for measuring similarity between a set of known temporal media segments and a one or more temporal media streams

Also Published As

Publication number Publication date
US7113897B2 (en) 2006-09-26
JP2002117019A (ja) 2002-04-19
WO2002029547A1 (en) 2002-04-11
EP1301853A1 (en) 2003-04-16
DE60139323D1 (de) 2009-09-03
US20030081811A1 (en) 2003-05-01
EP1301853B1 (en) 2009-07-22
WO2002029547A9 (en) 2005-03-17
EP1301853A4 (en) 2007-03-14

Similar Documents

Publication Publication Date Title
JP4299963B2 (ja) Apparatus and method for dividing a document based on semantic units
US12039284B2 (en) Real-time in-context smart summarizer
US20040049374A1 (en) Translation aid for multilingual Web sites
CN100568242C (zh) System and method for extracting new compound words
Song et al. A hybrid approach for content extraction with text density and visual importance of DOM nodes
US20090313536A1 (en) Dynamically Providing Relevant Browser Content
US20160085740A1 (en) Generating training data for disambiguation
US7284006B2 (en) Method and apparatus for browsing document content
US9244891B2 (en) Adjusting search result rankings based on multiple user highlighting of documents
WO2019015133A1 (zh) Lexicon management method and apparatus for an input method
JP2020098596A (ja) Method, apparatus, and storage medium for extracting information from web pages
US20100042915A1 (en) Personalized Document Creation
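CN112380337A (zh) Rich-text-based highlighting method and apparatus
Ohba et al. Toward mining "concept keywords" from identifiers in large software projects
JP4030624B2 (ja) Document processing apparatus, storage medium storing a document processing program, and document processing method
JP2007072646A (ja) Search apparatus, search method, and program
CN113761906B (zh) Method, apparatus, device, and computer-readable medium for parsing documents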
WO2022265744A1 (en) Smart browser history search
Mukherjee et al. Browsing fatigue in handhelds: semantic bookmarking spells relief
KR101909537B1 (ko) System and method for classifying social data
CN116127181B (zh) Method and apparatus for obtaining reviews of features that users like
Brüggemann et al. Topic Detection and Tracking System
Veselovsky et al. Web2Wiki: Characterizing Wikipedia Linking Across the Web
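WO2023162129A1 (ja) Training data generation apparatus, risk detection apparatus, training data generation method, risk detection method, training data generation program, and risk detection program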
Kumar et al. Design of Methodology and a Comparative Analysis of Trigram Technique in Similarity of Textual Data

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20071001

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20071024

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20081212

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20090106

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20090318

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20090414

A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20090420

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120424

Year of fee payment: 3

LAPS Cancellation because of no payment of annual fees