WO2007143898A1 - Method for information retrieval and processing based on a ternary model - Google Patents
- Publication number
- WO2007143898A1 (application PCT/CN2007/001661)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- relationship
- keyword
- ternary
- file
- keywords
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
Definitions
- The present invention relates to a method for information retrieval processing, and more particularly to a method for information retrieval processing based on a ternary model.
- The effective retrieval and processing of data and documents is a core concern in the field of database applications. It is widely used in searching electronic data, literature, commercial database resources, and Internet content.
- Data retrieval technology in this field is generally a keyword-based statistical method in which a Boolean expression over keywords serves as the query statement.
- The file database stores a keyword dictionary together with the locations at which each keyword appears in the files, and the corresponding files are found by comparing the keywords of the query statement with the keywords in the file database dictionary.
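The conventional approach described above can be pictured as a keyword dictionary with locations plus Boolean matching. The following is only a minimal sketch under stated assumptions: the function names `build_index`, `search_all`, and `search_any`, the whitespace tokenization, and the sample files are all hypothetical and are not taken from the patent.

```python
from collections import defaultdict

def build_index(files):
    """Build a keyword dictionary: keyword -> {file_id: [word positions]}."""
    index = defaultdict(dict)
    for file_id, text in files.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(file_id, []).append(pos)
    return index

def search_all(index, keywords):
    """Boolean AND: files that contain every query keyword."""
    sets = [set(index.get(k.lower(), {})) for k in keywords]
    return set.intersection(*sets) if sets else set()

def search_any(index, keywords):
    """Boolean OR: files that contain at least one query keyword."""
    result = set()
    for k in keywords:
        result |= set(index.get(k.lower(), {}))
    return result

# Hypothetical sample data, for illustration only.
files = {
    "doc1": "ternary model for retrieval",
    "doc2": "keyword retrieval with boolean query",
    "doc3": "ternary relationship between keywords",
}
index = build_index(files)
ternary_and_retrieval = search_all(index, ["ternary", "retrieval"])
ternary_or_boolean = search_any(index, ["ternary", "boolean"])
```

A file is found here only when the query keyword literally appears in it, which is the limitation the following paragraphs describe.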
- Some improvements use fuzzy logic models, vector space models, and probabilistic retrieval models.
- Current practice identifies an entire document through keyword indexing, individual keyword annotation, and a document summary, and uses these as the retrieval keywords during retrieval. This cannot fully reflect all the knowledge contained in the document. For example, a factual relationship that exists in the document but is not represented by a keyword cannot be retrieved, so the document is missing from the search results.
- The present invention provides a method for information retrieval processing based on a ternary model, which can handle relatively complicated search requests such as "implicit referencing".
- The invention is realized by the following scheme: a method for information retrieval processing based on a ternary model, the steps of which are:
- The above ternary relationships include membership (affiliation) relationships, equivalence (alias) relationships, and background reference relationships.
- The above ternary relationship model can be applied multiple times and in combination to produce further logical results.
- The above method has the following characteristics: 1. The amount of basic data is greatly reduced. Current retrieval systems need complete basic data to meet different retrieval requirements, so every deduced conclusion must be entered into the system as basic data. With this method the basic data can be small, yet a large number of results can be derived from it for retrieval.
- FIG. 1 is a schematic diagram of the ternary relationship model of the present invention.
- FIG. 2 shows relationships between person index keywords in an embodiment of the present invention.
- FIG. 3 shows relationships between relationship keywords in an embodiment of the present invention.
- FIG. 4 is a derivation path of an "inverse relationship" in an embodiment of the present invention.
- FIG. 5 is a derivation path of "secondary transfer" in an embodiment of the present invention.
- FIG. 6 is a diagram showing the "same subject" in an embodiment of the present invention.
- FIG. 7 is a derivation path of "symmetry" in an embodiment of the present invention.
- A self-contained, self-organizing ternary relationship model is established for constructing a highly flexible intelligent indexing mechanism.
- Many natural languages share the basic grammatical structure (subject, predicate, object).
- The present invention simulates this ternary relationship and implements data representation, storage, and retrieval based on the ternary relationship model.
- The ternary relationship model of the present invention takes the form of triples (Ka, Kr, Kb), where:
- Ka represents the keyword a
- Kb represents the keyword b
- Kr represents the relationship between the keyword a and the keyword b.
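One way to picture the (Ka, Kr, Kb) form is as a small in-memory triple store with pattern matching. This is a sketch under assumptions: the `Triple` and `TripleStore` names, the relationship labels `member_of`, `alias_of`, and `refers_to`, and the sample keywords are hypothetical illustrations of the three association types, not identifiers from the patent.

```python
from typing import List, NamedTuple, Optional, Set

class Triple(NamedTuple):
    ka: str  # keyword a
    kr: str  # relationship keyword linking a to b
    kb: str  # keyword b

class TripleStore:
    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, ka: str, kr: str, kb: str) -> None:
        self._triples.add(Triple(ka, kr, kb))

    def match(self, ka: Optional[str] = None, kr: Optional[str] = None,
              kb: Optional[str] = None) -> List[Triple]:
        """Return stored triples that agree with every field that is not None."""
        return sorted(
            t for t in self._triples
            if ka in (None, t.ka) and kr in (None, t.kr) and kb in (None, t.kb)
        )

# Hypothetical examples of the three association types.
store = TripleStore()
store.add("sparrow", "member_of", "bird")        # membership
store.add("NYC", "alias_of", "New York City")    # equivalence (alias)
store.add("report", "refers_to", "budget")       # background reference
aliases = store.match(kr="alias_of")
```

Leaving a field as `None` turns it into a wildcard, so the same `match` call serves queries by keyword, by relationship, or by both.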
- The triple form represents and implements three types of associations between keywords: membership (affiliation) relationships, equivalence (alias) relationships, and background reference relationships.
- Each type can be subdivided further, and the same three types of associations can also hold between relationships.
- Derivation over these relationships allows searching by logical meaning, which differs from simple queries built from keyword combinations.
- A relationship between relationship keywords, such as inverse relationship, secondary transfer, same subject, or symmetry, can also be represented.
- Kr' represents the relationship derived from Kr according to such a relationship between relationship keywords, whereby the keywords Ka' and Kb' have the new relationship Kr'.
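The derivation of new triples from relationships between relationship keywords can be sketched as a fixed-point computation. The code below is an illustration under assumptions, not the patent's procedure: `derive` is a hypothetical function, "secondary transfer" is interpreted here as transitivity, "same subject" is left out because the text does not define it, and the sample names and relationship labels are invented.

```python
def derive(triples, inverse_of=None, symmetric=(), transitive=()):
    """Repeatedly apply relationship rules until no new triple appears."""
    inverse_of = inverse_of or {}
    facts = set(triples)
    changed = True
    while changed:
        new = set()
        for ka, kr, kb in facts:
            if kr in inverse_of:       # inverse relationship: (a, r, b) -> (b, r', a)
                new.add((kb, inverse_of[kr], ka))
            if kr in symmetric:        # symmetry: (a, r, b) -> (b, r, a)
                new.add((kb, kr, ka))
            if kr in transitive:       # assumed reading of "secondary transfer"
                for ka2, kr2, kb2 in facts:
                    if kr2 == kr and ka2 == kb:
                        new.add((ka, kr, kb2))
        changed = not new <= facts
        facts |= new
    return facts

# Hypothetical basic data: four stored triples.
base = {
    ("Ann", "parent_of", "Ben"),
    ("Ann", "spouse_of", "Al"),
    ("Ann", "ancestor_of", "Ben"),
    ("Ben", "ancestor_of", "Cy"),
}
derived = derive(
    base,
    inverse_of={"parent_of": "child_of"},
    symmetric={"spouse_of"},
    transitive={"ancestor_of"},
)
new_facts = derived - base
```

In this sketch the three triples in `new_facts` are never stored as basic data; they exist only as results of derivation, which matches the stated aim of keeping the basic data small.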
- FIG. 2 shows an example of relationships between person index keywords. Suppose the person keywords in the system contain the following three triples:
- The present invention adopts an indexing method that uses a ternary model similar to that of the keywords. The indexing is represented and implemented by (C, R, K) triples and (Ca, R, Cb) triples. In (C, R, K), C represents the content of a file, K represents a keyword, and R represents the relationship between the file and the keyword. In (Ca, R, Cb), Ca represents the content of file a, Cb represents the content of file b, and R represents the relationship between file a and file b.
- This method records the position, length, and relevance of keywords in a file, as well as associated knowledge such as mutual references between files.
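The (C, R, K) indexing triples can be sketched as records whose relationship part carries position, length, and relevance. Everything named here is an assumption made for illustration: the `Occurrence` class, the `index_file` function, the `cites` label, the file names, and in particular the relevance measure (occurrence count divided by word count), which the patent does not specify.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Occurrence:
    """Relationship R between a file and a keyword."""
    position: int     # character offset of the keyword in the file
    length: int       # length of the matched keyword
    relevance: float  # assumed measure: occurrences / words in the file

def index_file(file_id, text, keywords):
    """Return (C, R, K) triples for every occurrence of each keyword."""
    lowered = text.lower()
    words = max(len(lowered.split()), 1)
    entries = []
    for kw in keywords:
        needle = kw.lower()
        if not needle:
            continue
        count = lowered.count(needle)
        start = lowered.find(needle)
        while start != -1:
            entries.append((file_id, Occurrence(start, len(needle), count / words), kw))
            start = lowered.find(needle, start + len(needle))
    return entries

# Hypothetical file content and keywords.
text = "The ternary model stores ternary relationships."
entries = index_file("a.txt", text, ["ternary", "model"])

# (Ca, R, Cb): a relationship between two files, here a hypothetical citation.
file_links = [("a.txt", "cites", "b.txt")]
```

Storing the offset and length with each triple is what would let a system jump to the exact span in the file instead of only reporting that the file matched.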
- On the one hand, the file can be presented in a structured manner to satisfy the user's need for related information; on the other hand, it can also be presented in the original form of the knowledge source.
- The indexing method is a good solution to "referential" relationships in a file. For example, for the pronoun "he" appearing in a file, a triple determines the actual target that the pronoun refers to, so the system can let the user search for the target itself, not just for text that is the same or similar.
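The handling of referential relationships can be sketched as a search that combines literal matches with reference triples. This is a hedged illustration only: `find_mentions`, the `refers_to` label, the file names, and the person "Tom Lee" are hypothetical, and the reference triples are supplied by hand because the text does not describe how they are produced.

```python
def find_mentions(target, texts, reference_triples):
    """Find literal mentions of target plus spans that a triple resolves to it."""
    hits = []
    for file_id, text in texts.items():
        pos = text.find(target)
        while pos != -1:
            hits.append((file_id, pos, target))
            pos = text.find(target, pos + len(target))
    for (file_id, pos, surface), relation, resolved in reference_triples:
        if relation == "refers_to" and resolved == target:
            hits.append((file_id, pos, surface))
    return sorted(hits)

# Hypothetical files; the second never names the person explicitly.
texts = {
    "bio.txt": "Tom Lee was a poet. He wrote many poems.",
    "note.txt": "He travelled widely.",
}
# Hand-written reference triples: (span in file, relationship, actual target).
reference_triples = [
    (("bio.txt", texts["bio.txt"].index("He"), "He"), "refers_to", "Tom Lee"),
    (("note.txt", 0, "He"), "refers_to", "Tom Lee"),
]
mentions = find_mentions("Tom Lee", texts, reference_triples)
files_found = {file_id for file_id, _, _ in mentions}
```

A purely textual search for "Tom Lee" would return only `bio.txt`; with the reference triples, `note.txt` is found as well even though the name never appears in it.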
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
Claims
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/918,639 US20100030761A1 (en) | 2006-05-22 | 2007-05-22 | Method of retrieving and refining information based on tri-gram |
SM200800031T SMP200800031B (it) | 2006-05-22 | 2007-05-22 | Metodo per l'elaborazione di dati di ricerca basato sul modello ternario |
DE112007000051T DE112007000051T5 (de) | 2006-05-22 | 2007-05-22 | Dreiteiliges-Modell-basiertes Verfahren zur Informationsgewinnung und -verarbeitung |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200610081368.0 | 2006-05-22 | ||
CNA2006100813680A CN1845105A (zh) | 2006-05-22 | 2006-05-22 | 基于三元模型的信息检索加工的方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2007143898A1 (fr) | 2007-12-21 |
Family
ID=37064033
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2007/001661 WO2007143898A1 (fr) | Method for information retrieval and processing based on a ternary model | 2006-05-22 | 2007-05-22 |
Country Status (7)
Country | Link |
---|---|
US (1) | US20100030761A1 (zh) |
JP (1) | JP2007317189A (zh) |
KR (1) | KR100911910B1 (zh) |
CN (1) | CN1845105A (zh) |
DE (1) | DE112007000051T5 (zh) |
SM (1) | SMP200800031B (zh) |
WO (1) | WO2007143898A1 (zh) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102622363A (zh) * | 2011-01-28 | 2012-08-01 | 鸿富锦精密工业(深圳)有限公司 | 关联词汇搜索系统及方法 |
CN102693320B (zh) * | 2012-06-01 | 2015-03-25 | 中国科学技术大学 | 一种搜索方法及装置 |
CN103544224A (zh) * | 2013-10-07 | 2014-01-29 | 宁波芝立软件有限公司 | 一种收养关系信息存储表示方法、系统及设备 |
CN103544223A (zh) * | 2013-10-07 | 2014-01-29 | 宁波芝立软件有限公司 | 一种基本亲缘关系信息存储表示方法、系统及设备 |
CN103544233A (zh) * | 2013-10-07 | 2014-01-29 | 宁波芝立软件有限公司 | 一种完全亲缘关系信息库存储组织方法、系统及设备 |
CN103544225A (zh) * | 2013-10-07 | 2014-01-29 | 宁波芝立软件有限公司 | 一种抚养关系信息存储表示方法、系统及设备 |
CN103544236A (zh) * | 2013-10-07 | 2014-01-29 | 宁波芝立软件有限公司 | 一种通过确定未知关系人来推导亲缘关系方法 |
CN103544222A (zh) * | 2013-10-07 | 2014-01-29 | 宁波芝立软件有限公司 | 一种通用亲缘关系信息存储表示方法、系统及设备 |
CN105117115B (zh) * | 2015-08-07 | 2018-05-08 | 小米科技有限责任公司 | 一种显示电子文档的方法和装置 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030110158A1 (en) * | 2001-11-13 | 2003-06-12 | Seals Michael P. | Search engine visibility system |
CN1696933A (zh) * | 2005-05-27 | 2005-11-16 | 清华大学 | 基于动态规划的文本概念关系自动提取方法 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001006997A (ja) * | 1999-06-22 | 2001-01-12 | Nec Kyushu Ltd | 目合わせ露光装置システム及び目合わせ露光方法 |
JP2003040297A (ja) * | 2001-08-06 | 2003-02-13 | Toppan Printing Co Ltd | オーバーキャップ付封緘キャップ |
Date | Country | Application | Publication | Status |
---|---|---|---|---|
2006-05-22 | CN | CNA2006100813680A | CN1845105A (zh) | Active, Pending |
2007-05-17 | JP | JP2007132175A | JP2007317189A (ja) | Not active, Withdrawn |
2007-05-22 | SM | SM200800031T | SMP200800031B (it) | Unknown |
2007-05-22 | US | US11/918,639 | US20100030761A1 (en) | Not active, Abandoned |
2007-05-22 | DE | DE112007000051T | DE112007000051T5 (de) | Not active, Withdrawn |
2007-05-22 | KR | KR1020070049689A | KR100911910B1 (ko) | Not active, IP Right Cessation |
2007-05-22 | WO | PCT/CN2007/001661 | WO2007143898A1 (zh) | Active, Application Filing |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10410123B2 (en) | 2015-11-18 | 2019-09-10 | International Business Machines Corporation | System, method, and recording medium for modeling a correlation and a causation link of hidden evidence |
US11386337B2 (en) | 2015-11-18 | 2022-07-12 | International Business Machines Corporation | Modeling a correlation and a causation link of hidden evidence |
Also Published As
Publication number | Publication date |
---|---|
SMAP200800031A (it) | 2008-05-14 |
JP2007317189A (ja) | 2007-12-06 |
KR100911910B1 (ko) | 2009-08-13 |
KR20070112729A (ko) | 2007-11-27 |
US20100030761A1 (en) | 2010-02-04 |
SMP200800031B (it) | 2008-05-14 |
CN1845105A (zh) | 2006-10-11 |
DE112007000051T5 (de) | 2008-08-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2007143898A1 (fr) | Method for information retrieval and processing based on a ternary model | |
Fu et al. | Privacy-preserving smart semantic search based on conceptual graphs over encrypted outsourced data | |
Li et al. | A co-attention neural network model for emotion cause analysis with emotional context awareness | |
CN102945237B (zh) | 基于原始用户输入建议和细分用户输入的系统和方法 | |
WO2007143899A1 (fr) | Système et procédé pour l'extraction intelligente et le traitement d'informations | |
Bergamaschi et al. | QUEST: A keyword search system for relational data based on semantic and machine learning techniques | |
CN104391908B (zh) | 一种图上基于局部敏感哈希的多关键字索引方法 | |
Liu et al. | Information retrieval and Web search | |
TW202001621A (zh) | 語料庫產生方法及裝置、人機互動處理方法及裝置 | |
Hariharan et al. | Enhanced graph based approach for multi document summarization. | |
Zhou et al. | Enhanced personalized search using social data | |
Brochier et al. | New datasets and a benchmark of document network embedding methods for scientific expert finding | |
Hu et al. | Semantic‐Based Multi‐Keyword Ranked Search Schemes over Encrypted Cloud Data | |
Fatemi et al. | Record linkage to match customer names: A probabilistic approach | |
Xu et al. | Query recommendation based on improved query flow graph | |
Guo et al. | Knowledge discovery from citation networks | |
Nuray-Turan et al. | Exploiting web querying for web people search in weps2 | |
Xie et al. | Personalized query recommendation using semantic factor model | |
Zuluaga Cajiao et al. | Graph-based similarity for document retrieval in the biomedical domain | |
Zhang et al. | Using Tag Clouds to Quickly Discover Patterns in Linked Data Sets. | |
Burgers et al. | An information system organized as stratified hypermedia | |
Wang | Annotation persistence over dynamic documents | |
Navarro Bullock et al. | Tagging data as implicit feedback for learning-to-rank | |
Melzer | Semantic Assets: Latent Structures for Knowledge Management | |
Bendersky | Information retrieval with query hypergraphs |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 07721234; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 1120070000511; Country of ref document: DE |
| | RET | De translation (de og part 6b) | Ref document number: 112007000051; Country of ref document: DE; Date of ref document: 20080828; Kind code of ref document: P |
| | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 20-02-2009) |
| | REG | Reference to national code | Ref country code: DE; Ref legal event code: 8607 |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 07721234; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 11918639; Country of ref document: US |