WO2007048607A3 - Automatic, computer-based similarity calculation system for quantifying the similarity of text expressions - Google Patents

Automatic, computer-based similarity calculation system for quantifying the similarity of text expressions

Info

Publication number
WO2007048607A3
Authority
WO
WIPO (PCT)
Prior art keywords
similarity
expressions
automatic
computer
text
Prior art date
2005-10-27
Application number
PCT/EP2006/010332
Other languages
English (en)
French (fr)
Other versions
WO2007048607A2 (de)
Inventor
Libo Chen
Ulrich Thiel
Peter Fankhauser
Thomas Kamps
Original Assignee
Fraunhofer Ges Forschung
Libo Chen
Ulrich Thiel
Peter Fankhauser
Thomas Kamps
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2005-10-27
Filing date
2006-10-26
Publication date
2007-06-21
Application filed by Fraunhofer Ges Forschung, Libo Chen, Ulrich Thiel, Peter Fankhauser, Thomas Kamps
Priority to EP06818299A (published as EP1941404A2, de)
Priority to US12/091,578 (published as US20090157656A1, en)
Priority to JP2008537004A (published as JP2009514076A, ja)
Publication of WO2007048607A2 (de)
Publication of WO2007048607A3 (de)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri

Abstract

The present invention relates to a device and a method for automatic, computer-based similarity weighting of text expressions. The system or method according to the invention comprises a document database unit (1), a candidate-expression storage unit (2) and a similarity-weight-value calculation unit (3). It is characterized in that the similarity weight values agw(t1, t2) for the individual pairs of expressions can be calculated from a similarity measure occ_con(t1, t2). This measure takes into account two quantities. The first is the total frequency with which the two expressions of a pair occur together within one and the same text segment, across a set of several text segments. The second is the total number of distinct context expressions in that set of text segments.
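The abstract names the two inputs to occ_con(t1, t2) but does not say how they are combined. The Python sketch below computes both inputs for one expression pair, under assumptions that are not taken from the application. Each text segment is a list of already-extracted expressions. The co-occurrence count is the number of segments that contain both t1 and t2. The "set of text segments" is taken to be the segments the two expressions share, and the context expressions are all other expressions in those shared segments. The function names `cooccurrence_stats` and `hypothetical_occ_con` are hypothetical. The final ratio is only a placeholder and is not the formula claimed in the application.

```python
from typing import Iterable, List, Set, Tuple


def cooccurrence_stats(segments: Iterable[List[str]], t1: str, t2: str) -> Tuple[int, int]:
    """Return (co-occurrence count, number of distinct context expressions).

    Assumptions (not from the patent): a segment is a list of expressions;
    a segment counts once if it contains both t1 and t2; context expressions
    are all other expressions found in those shared segments.
    """
    cooc = 0
    contexts: Set[str] = set()
    for seg in segments:
        terms = set(seg)
        if t1 in terms and t2 in terms:
            cooc += 1
            # Everything except the pair itself counts as context.
            contexts.update(terms - {t1, t2})
    return cooc, len(contexts)


def hypothetical_occ_con(segments: Iterable[List[str]], t1: str, t2: str) -> float:
    """Hypothetical combination of both quantities; NOT the patent's formula."""
    cooc, n_contexts = cooccurrence_stats(segments, t1, t2)
    if cooc == 0:
        return 0.0
    # Placeholder: co-occurrences per distinct context (+1 avoids division by zero).
    return cooc / (n_contexts + 1)
```

For example, take the segments `[["data", "mining", "text"], ["data", "mining"], ["text", "retrieval"]]`. Here `cooccurrence_stats(segments, "data", "mining")` returns `(2, 1)`, because the pair shares two segments and "text" is the only other expression in them.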

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP06818299A EP1941404A2 (de) 2005-10-27 2006-10-26 Automatic, computer-based similarity calculation system for quantifying the similarity of text expressions
US12/091,578 US20090157656A1 (en) 2005-10-27 2006-10-26 Automatic, computer-based similarity calculation system for quantifying the similarity of text expressions
JP2008537004A JP2009514076A (ja) 2005-10-27 2006-10-26 Automatic computer-based similarity calculation system for quantifying the similarity of text expressions

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102005051617.3 2005-10-27
DE102005051617A DE102005051617B4 (de) 2005-10-27 2005-10-27 Automatic, computer-based similarity calculation system for quantifying the similarity of text expressions

Publications (2)

Publication Number Publication Date
WO2007048607A2 (de) 2007-05-03
WO2007048607A3 (de) 2007-06-21

Family

ID=37820638

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2006/010332 WO2007048607A2 (de) 2005-10-27 2006-10-26 Automatic, computer-based similarity calculation system for quantifying the similarity of text expressions

Country Status (6)

Country Link
US (1) US20090157656A1 (en)
EP (1) EP1941404A2 (de)
JP (1) JP2009514076A (ja)
CN (1) CN101361066A (zh)
DE (1) DE102005051617B4 (de)
WO (1) WO2007048607A2 (de)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100530183C (zh) * 2006-05-19 2009-08-19 华为技术有限公司 System and method for collecting user data
US8156142B2 (en) * 2008-12-22 2012-04-10 Sap Ag Semantically weighted searching in a governed corpus of terms
US8166051B1 (en) * 2009-02-03 2012-04-24 Sandia Corporation Computation of term dominance in text documents
JP5458880B2 (ja) 2009-03-02 2014-04-02 富士通株式会社 Document inspection device, computer-readable recording medium, and document inspection method
JP5382651B2 (ja) * 2009-09-09 2014-01-08 独立行政法人情報通信研究機構 Word pair acquisition device, word pair acquisition method, and program
US8356045B2 (en) * 2009-12-09 2013-01-15 International Business Machines Corporation Method to identify common structures in formatted text documents
CN101908041B (zh) * 2010-05-06 2012-07-04 江苏省现代企业信息化应用支撑软件工程技术研发中心 Multi-word expression extraction system and method based on a multi-agent mechanism
JP2013114383A (ja) * 2011-11-28 2013-06-10 Denso Corp Privacy protection method, vehicle device, vehicle communication system, and portable terminal
JP2013149061A (ja) * 2012-01-19 2013-08-01 Nec Corp Document similarity evaluation system, document similarity evaluation method, and computer program
CN102622411A (zh) * 2012-02-17 2012-08-01 清华大学 Method for generating structured abstracts
CN102595214A (zh) * 2012-03-06 2012-07-18 浪潮(山东)电子信息有限公司 Method for associative recommendation of digital television programs
US10691737B2 (en) * 2013-02-05 2020-06-23 Intel Corporation Content summarization and/or recommendation apparatus and method
US20160179868A1 (en) * 2014-12-18 2016-06-23 GM Global Technology Operations LLC Methodology and apparatus for consistency check by comparison of ontology models
RU2623902C2 (ru) * 2015-07-13 2017-06-29 Федеральное государственное бюджетное учреждение "4 Центральный научно-исследовательский институт" Министерства обороны Российской Федерации Device for identifying the preferred means of information protection
CN106649650B (zh) * 2016-12-10 2020-08-18 宁波财经学院 Bidirectional matching method for demand information
CN108804617B (zh) * 2018-05-30 2021-08-10 广州杰赛科技股份有限公司 Domain term extraction method and apparatus, terminal device, and storage medium
CN111159499B (zh) * 2019-12-31 2022-04-29 南方电网调峰调频发电有限公司 Power system model search and ranking method based on similarity between character strings

Citations (1)

Publication number Priority date Publication date Assignee Title
WO2003060766A1 (en) * 2002-01-16 2003-07-24 Elucidon Ab Information data retrieval, where the data is organized in terms, documents and document corpora

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US7251637B1 (en) * 1993-09-20 2007-07-31 Fair Isaac Corporation Context vector generation and retrieval
US6757646B2 (en) * 2000-03-22 2004-06-29 Insightful Corporation Extended functionality for an inverse inference engine based web search
JP2002169834A (ja) * 2000-11-20 2002-06-14 Hewlett Packard Co <Hp> Computer and method for performing vector analysis of documents
US7552385B2 (en) * 2001-05-04 2009-06-23 International Business Machines Corporation Efficient storage mechanism for representing term occurrence in unstructured text documents
US7243092B2 (en) * 2001-12-28 2007-07-10 Sap Ag Taxonomy generation for electronic documents
US6847966B1 (en) * 2002-04-24 2005-01-25 Engenium Corporation Method and system for optimally searching a document database using a representative semantic space
JP3765801B2 (ja) * 2003-05-28 2006-04-12 沖電気工業株式会社 Bilingual expression extraction device, bilingual expression extraction method, and bilingual expression extraction program

Also Published As

Publication number Publication date
EP1941404A2 (de) 2008-07-09
WO2007048607A2 (de) 2007-05-03
DE102005051617B4 (de) 2009-10-15
US20090157656A1 (en) 2009-06-18
CN101361066A (zh) 2009-02-04
JP2009514076A (ja) 2009-04-02
DE102005051617A1 (de) 2007-05-03

Similar Documents

Publication Publication Date Title
WO2007048607A3 (de) Automatic, computer-based similarity calculation system for quantifying the similarity of text expressions
WO2003102764A3 (en) Behavior-based adaptation of computer systems
WO2006001906A3 (en) Graph-based ranking algorithms for text processing
WO2009036289A3 (en) Database system and method for tracking goods
WO2006115598A3 (en) Method and system for generating spelling suggestions
WO2006033765A3 (en) Real-time data localization
WO2006132759A3 (en) Method and apparatus for candidate evaluation
WO2007078389A3 (en) Heuristic supply chain modeling method and system
AU2003253405A1 (en) Method, data processing device and computer program product for processing data
WO2010080454A3 (en) Identifying comments to show in connection with a document
WO2006125138A3 (en) Searching a database including prioritizing results based on historical data
WO2006132793A3 (en) Learning facts from semi-structured text
WO2005013097A3 (en) Effectiveness of internet advertising
DK1747540T3 (da) Method for recognizing and monitoring fibre-containing media, and use of the method in information technology
WO2003021510A3 (en) Method and system for parsing purchase information from web pages
GB2458842A (en) Method and system for delivering presentations
WO2006118824A3 (en) Transaction transforms
WO2004070626A3 (en) System method and computer program product for obtaining structured data from text
WO2005015345A3 (en) Financial investment advice system and method
WO2004072778A3 (en) Method and apparatus for evaluating and monitoring collateralized debt obligations
WO2006008733A3 (en) A method for determining near duplicate data objects
WO2006113887A3 (en) Method and system for evaluating vocabulary similarity
WO2007075658A3 (en) System and method for processing composite trading orders
WO2004083983A3 (de) Comparison of models of a complex system
WO2007100527A3 (en) Computerized transaction method and system

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 200680048441.2

Country of ref document: CN

121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase

Ref document number: 2006818299

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2008537004

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

WWE WIPO information: entry into national phase

Ref document number: 12091578

Country of ref document: US

WWP WIPO information: published in national office

Ref document number: 2006818299

Country of ref document: EP