JP2011238221A - Measuring document similarity by inferring evolution of documents through reuse of passage sequences - Google Patents
- Publication number
- JP2011238221A (application JP2011099059A)
- Authority
- JP
- Japan
- Prior art keywords
- document
- passage
- hmm
- state
- passages
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
        - G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
          - G06F16/33—Querying
            - G06F16/332—Query formulation
              - G06F16/3325—Reformulation based on results of preceding query
                - G06F16/3326—Reformulation based on results of preceding query using relevance feedback from the user, e.g. relevance feedback on documents, documents sets, document terms or passages
                  - G06F16/3328—Reformulation based on results of preceding query using relevance feedback from the user, e.g. relevance feedback on documents, documents sets, document terms or passages using graphical result space presentation or visualisation
              - G06F16/3329—Natural language query formulation or dialogue systems
        - G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
          - G06F16/84—Mapping; Conversion
        - G06F16/90—Details of database functions independent of the retrieved data types
          - G06F16/95—Retrieval from the web
            - G06F16/957—Browsing optimisation, e.g. caching or content distillation
              - G06F16/9577—Optimising the visualization of content, e.g. distillation of HTML documents
            - G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
              - G06F16/986—Document structures and storage, e.g. HTML extensions
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/10—Complex mathematical operations
      - G06F40/00—Handling natural language data
        - G06F40/10—Text processing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Software Systems (AREA)
- Computational Mathematics (AREA)
- Human Computer Interaction (AREA)
- Algebra (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
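[Solution] During operation, a collection of documents that includes a first set of passages is selected, a passage sequence model is constructed based on the first set of passages, a new document that includes a second set of passages is received, and a sequence of operations associated with the new document in relation to the collection of documents is determined based on the constructed passage sequence model.
[Selected drawing] FIG. 3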
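The abstract does not say how the sequence is decoded from the model. Assuming the passage sequence model is a hidden Markov model (as the claims state), one common way to recover the most likely state sequence for a new document is Viterbi decoding. The sketch below is a generic Viterbi implementation, not the patent's own procedure; the function name `viterbi` and the dictionary layout of `start_p`, `trans_p`, and `emit_p` are assumptions made for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability) for a sequence of observations.

    Hypothetical layout: start_p[s], trans_p[s][t], and emit_p[s][obs] are plain
    probabilities; missing entries are treated as 0.
    """
    if not observations:
        return [], 1.0
    # best[t][s] = (probability of the best path ending in s at step t, predecessor state)
    best = [{s: (start_p.get(s, 0.0) * emit_p[s].get(observations[0], 0.0), None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            # Pick the predecessor q that maximizes the path probability into s.
            p, back = max(((prev[q][0] * trans_p[q].get(s, 0.0), q) for q in states),
                          key=lambda item: item[0])
            layer[s] = (p * emit_p[s].get(obs, 0.0), back)
        best.append(layer)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for layer in reversed(best[1:]):
        path.append(layer[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]
```

In this reading, each observation would be a passage (or its fingerprint) from the new document, and the decoded path indicates which model states the passages most plausibly came from.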
Description
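Note that it can be set to [bj(k)]. That is, the emission probability of the new state r is set to the remaining probability that the observation is not generated by any of the states of the HMM (excluding state r). Note also that, as with the transition probabilities, setting the emission probabilities must satisfy the constraint that the emission probabilities sum to 1 over all observed states.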
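The exact expression for the new state's emission probability is cut off in this excerpt, so the following is only one possible reading of the surrounding prose: the new state r receives whatever probability mass the other states leave over for an observation k, and the values are then rescaled so that they sum to 1. The function name `add_state_emission` and the residual rule `1 - sum(...)` are assumptions, not the patent's formula.

```python
def add_state_emission(b_k, new_state="r"):
    """Assign the new state the leftover emission probability for one observation k.

    b_k maps each existing state j to b_j(k). ASSUMPTION: the residual is
    1 - sum of the other states' values, floored at 0; the source formula is truncated.
    """
    others = {s: p for s, p in b_k.items() if s != new_state}
    if any(p < 0 for p in others.values()):
        raise ValueError("emission probabilities must be non-negative")
    residual = max(0.0, 1.0 - sum(others.values()))
    result = dict(others)
    result[new_state] = residual
    # Rescale so the constraint (values sum to 1) holds even if the inputs exceeded 1.
    total = sum(result.values())
    return {s: p / total for s, p in result.items()}
```

When the existing values already sum to less than 1, the rescaling step is a no-op and the new state simply takes the remainder.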
Claims (4)
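- A method comprising:
selecting a collection of documents that includes a first set of passages;
constructing a passage sequence model based on the first set of passages;
receiving a new document that includes a second set of passages; and
determining, based on the constructed passage sequence model, a sequence of operations associated with the new document in relation to the collection of documents.
- The method of claim 1, further comprising estimating a similarity between the new document and at least one document in the collection based on the determined sequence of operations.
- The method of claim 1, wherein the passage sequence model is a hidden Markov model (HMM), and wherein the method further comprises generating fingerprints for the first set of passages, at least one fingerprint corresponding to one state of the HMM.
- The method of claim 3, further comprising determining transition probabilities between the states of the HMM.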
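Claims 3 and 4 can be illustrated with a minimal sketch, assuming each passage fingerprint is a truncated SHA-1 hash of the whitespace- and case-normalized passage text and that transition probabilities are estimated by counting consecutive passage pairs in the collection. Both choices, and the names `fingerprint` and `build_transitions`, are hypothetical; the claims do not specify the fingerprinting scheme or the estimator.

```python
import hashlib
from collections import Counter, defaultdict


def fingerprint(passage: str) -> str:
    """Hypothetical fingerprint: short SHA-1 digest of the normalized passage text."""
    normalized = " ".join(passage.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]


def build_transitions(collection):
    """Estimate state-to-state transition probabilities from passage order.

    collection is a list of documents, each a list of passage strings.
    Each distinct fingerprint is treated as one HMM state.
    """
    counts = defaultdict(Counter)
    for document in collection:
        states = [fingerprint(p) for p in document]
        # Count every consecutive pair of passages as one observed transition.
        for current, following in zip(states, states[1:]):
            counts[current][following] += 1
    # Normalize each row so outgoing probabilities sum to 1.
    return {
        state: {nxt: n / sum(following.values()) for nxt, n in following.items()}
        for state, following in counts.items()
    }


# Usage: two documents that share their first two passages.
collection = [["Intro", "Method", "Results"], ["intro", "Method", "Summary"]]
transitions = build_transitions(collection)
```

States that never appear before another passage (for example, the last passage of every document) get no outgoing row in this sketch; a fuller model would add smoothing or an explicit end state.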
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/774,426 | 2010-05-05 | ||
US12/774,426 US8086548B2 (en) | 2010-05-05 | 2010-05-05 | Measuring document similarity by inferring evolution of documents through reuse of passage sequences |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2011238221A (ja) | 2011-11-24 |
JP5819629B2 (ja) | 2015-11-24 |
Family
ID=44262593
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011099059A Active JP5819629B2 (ja) | 2010-05-05 | 2011-04-27 | Measuring document similarity by inferring evolution of documents through reuse of passage sequences |
Country Status (4)
Country | Link |
---|---|
US (1) | US8086548B2 (ja) |
EP (1) | EP2385471A1 (ja) |
JP (1) | JP5819629B2 (ja) |
KR (1) | KR101711839B1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
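JP2019046048A (ja) * | 2017-08-31 | 2019-03-22 | 富士通株式会社 | Identification program, identification method, and information processing apparatus |
JP6777266B1 (ja) * | 2019-09-18 | 2020-10-28 | 三菱電機株式会社 | Work element analysis device and work element analysis method |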
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8527436B2 (en) * | 2010-08-30 | 2013-09-03 | Stratify, Inc. | Automated parsing of e-mail messages |
US9262390B2 (en) | 2010-09-02 | 2016-02-16 | Lexis Nexis, A Division Of Reed Elsevier Inc. | Methods and systems for annotating electronic documents |
US9449024B2 (en) * | 2010-11-19 | 2016-09-20 | Microsoft Technology Licensing, Llc | File kinship for multimedia data tracking |
US9256697B2 (en) * | 2012-05-11 | 2016-02-09 | Microsoft Technology Licensing, Llc | Bidirectional mapping between applications and network content |
KR101429621B1 (ko) * | 2012-10-04 | 2014-08-13 | 한양대학교 에리카산학협력단 | Duplicate news combining system and duplicate news combining method |
CN103530421B (zh) * | 2012-11-02 | 2017-01-04 | 中国人民解放军国防科学技术大学 | Microblog-based event similarity measurement method and system |
US9965521B1 (en) * | 2014-02-05 | 2018-05-08 | Google Llc | Determining a transition probability from one or more past activity indications to one or more subsequent activity indications |
US20160110315A1 (en) * | 2014-10-20 | 2016-04-21 | Xerox Corporation | Methods and systems for digitizing a document |
EP3215943B1 (en) | 2014-11-03 | 2021-04-21 | Vectra AI, Inc. | A system for implementing threat detection using threat and risk assessment of asset-actor interactions |
US10033752B2 (en) | 2014-11-03 | 2018-07-24 | Vectra Networks, Inc. | System for implementing threat detection using daily network traffic community outliers |
CN113268959B (zh) * | 2021-05-25 | 2024-05-03 | 北京北大方正电子有限公司 | Document processing method, apparatus, and electronic device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000003362A (ja) * | 1998-06-16 | 2000-01-07 | Dainippon Printing Co Ltd | Document analysis system and recording medium |
JP2000322450A (ja) * | 1999-03-11 | 2000-11-24 | Fuji Xerox Co Ltd | Video similarity search method, method for presenting video in a video browser, method for presenting video in a web-based interface, computer-readable recording medium, and computer system |
JP2009181170A (ja) * | 2008-01-29 | 2009-08-13 | Fujitsu Ltd | Work procedure manual creation support system |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6363381B1 (en) * | 1998-11-03 | 2002-03-26 | Ricoh Co., Ltd. | Compressed document matching |
US6990628B1 (en) * | 1999-06-14 | 2006-01-24 | Yahoo! Inc. | Method and apparatus for measuring similarity among electronic documents |
US6542635B1 (en) * | 1999-09-08 | 2003-04-01 | Lucent Technologies Inc. | Method for document comparison and classification using document image layout |
US6772120B1 (en) * | 2000-11-21 | 2004-08-03 | Hewlett-Packard Development Company, L.P. | Computer method and apparatus for segmenting text streams |
EP2067102A2 (en) * | 2006-09-15 | 2009-06-10 | Exbiblio B.V. | Capture and display of annotations in paper and electronic documents |
- 2010-05-05: US, application US12/774,426, patent US8086548B2 (not active: Expired - Fee Related)
- 2011-04-21: EP, application EP11163473A, patent EP2385471A1 (not active: Withdrawn)
- 2011-04-27: JP, application JP2011099059A, patent JP5819629B2 (active: Active)
- 2011-05-02: KR, application KR1020110041460A, patent KR101711839B1 (active: IP Right Grant)
Non-Patent Citations (2)
Title |
---|
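CSNG200900204007; 但馬 康宏 and 5 others: 'Dialogue paragraph segmentation using an HMM and a text classifier', IPSJ Journal, Transactions, FY2008 (2), 20090415, pp. 70-79, Information Processing Society of Japan * |
JPN6015006202; 但馬 康宏 and 5 others: 'Dialogue paragraph segmentation using an HMM and a text classifier', IPSJ Journal, Transactions, FY2008 (2), 20090415, pp. 70-79, Information Processing Society of Japan * |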
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
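JP2019046048A (ja) * | 2017-08-31 | 2019-03-22 | 富士通株式会社 | Identification program, identification method, and information processing apparatus |
JP6777266B1 (ja) * | 2019-09-18 | 2020-10-28 | 三菱電機株式会社 | Work element analysis device and work element analysis method |
WO2021053738A1 (ja) * | 2019-09-18 | 2021-03-25 | 三菱電機株式会社 | Work element analysis device and work element analysis method |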
Also Published As
Publication number | Publication date |
---|---|
US20110276523A1 (en) | 2011-11-10 |
KR101711839B1 (ko) | 2017-03-13 |
EP2385471A1 (en) | 2011-11-09 |
US8086548B2 (en) | 2011-12-27 |
KR20110122789A (ko) | 2011-11-11 |
JP5819629B2 (ja) | 2015-11-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
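JP5819629B2 (ja) | Measuring document similarity by inferring evolution of documents through reuse of passage sequences | |
JP6972265B2 (ja) | Pointer sentinel mixture architecture | |
US10146765B2 (en) | System and method for inputting text into electronic devices | |
CN106484777B (zh) | Multimedia data processing method and apparatus | |
CN104574192A (zh) | Method and apparatus for identifying the same user across multiple social networks | |
CN104539514A (zh) | Message filtering method and apparatus | |
CN111161804B (zh) | Query method and system for a species genomics database | |
US20160092597A1 (en) | Method, controller, program and data storage system for performing reconciliation processing | |
KR101852527B1 (ko) | Machine learning-based dynamic simulation parameter calibration method | |
CN108509793A (zh) | Method and apparatus for detecting abnormal user behavior based on user behavior log data | |
CN113268403B (zh) | Time series analysis and prediction method, apparatus, device, and storage medium | |
CN104573031B (zh) | Microblog burst event detection method | |
WO2014020834A1 (ja) | Word latent topic estimation device and word latent topic estimation method | |
JP5591772B2 (ja) | Context dependency estimation device, utterance clustering device, method, and program | |
CN105320525A (zh) | Change impact analysis method for mobile application software | |
CN100541491C (zh) | Document information processing apparatus, document information processing method, and computer-readable medium | |
CN111667018A (zh) | Object clustering method and apparatus, computer-readable medium, and electronic device | |
CN103744830A (zh) | Semantic analysis-based method for identifying identity information in Excel documents | |
CN113297854A (zh) | Method, apparatus, device, and storage medium for mapping text to knowledge graph entities | |
CN113836005A (zh) | Virtual user generation method and apparatus, electronic device, and storage medium | |
CN113935387A (zh) | Text similarity determination method and apparatus, and computer-readable storage medium | |
CN111897618A (zh) | UI determination method, apparatus, and storage medium | |
JP2007011571A (ja) | Information processing apparatus and program | |
CN110929033A (zh) | Long text classification method, apparatus, computer device, and storage medium | |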
Wang et al. | M-estimator for estimating the Burr type III parameters with outliers |
Legal Events
Code | Title | Description |
---|---|---|
RD04 | Notification of resignation of power of attorney | JAPANESE INTERMEDIATE CODE: A7424; Effective date: 2013-05-16 |
A621 | Written request for application examination | JAPANESE INTERMEDIATE CODE: A621; Effective date: 2014-04-21 |
A977 | Report on retrieval | JAPANESE INTERMEDIATE CODE: A971007; Effective date: 2015-02-10 |
A131 | Notification of reasons for refusal | JAPANESE INTERMEDIATE CODE: A131; Effective date: 2015-02-24 |
A521 | Request for written amendment filed | JAPANESE INTERMEDIATE CODE: A523; Effective date: 2015-05-22 |
TRDD | Decision of grant or rejection written | |
A01 | Written decision to grant a patent or to grant a registration (utility model) | JAPANESE INTERMEDIATE CODE: A01; Effective date: 2015-09-08 |
A61 | First payment of annual fees (during grant procedure) | JAPANESE INTERMEDIATE CODE: A61; Effective date: 2015-10-01 |
R150 | Certificate of patent or registration of utility model | Ref document number: 5819629; Country of ref document: JP; JAPANESE INTERMEDIATE CODE: R150 |
R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |