EP1649399A1 - Method for estimating the relevance of a document in relation to a concept (original title: Procede d'estimation de la pertinence d'un document par rapport a un concept) - Google Patents
- Publication number
- EP1649399A1 (application EP04785988A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- concept
- document
- relevance
- concepts
- semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99934—Query formulation, input preparation, or translation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99935—Query augmenting and refining, e.g. inexact access
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99943—Generating database or data structure, e.g. via user interface
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99944—Object-oriented database structure
- Y10S707/99945—Object-oriented database structure processing
Definitions
- the present invention relates to a method for estimating the relevance of a document in relation to a concept.
- a conventional method for estimating the relevance of a document in relation to a concept comprises the calculation of a function of relevance of the concept in relation to this document based on the knowledge of a predetermined semantic neighborhood of this concept.
- The semantic neighborhood of a concept is the set of concepts linked to this concept by different semantic links in a knowledge base.
- the calculated function takes into account in its estimation the presence in the document of the concept itself, as well as that of all the concepts belonging to its semantic neighborhood.
- the result of a request to estimate a document in relation to a concept may be wrong when this concept is ambiguous, that is to say when it has several distinct meanings.
- the semantic neighborhood of the concept includes neighboring concepts with different meanings from this concept.
- This ambiguity is sometimes taken into account in the calculation of the relevance function, by reducing the result obtained by estimating the presence of the concept taken in a predetermined sense by a result obtained by estimating the presence of concepts taken in a different meaning.
- a document in which the presence of concepts taken in a different meaning is greater than the presence of concepts taken in the predetermined meaning is no longer considered to be relevant to the concept.
- the invention aims to remedy this drawback by providing a method for estimating the relevance of a document in relation to a concept that can take into account the ambiguity of the concept without degrading the estimate of the relevance of the document in relation to the concept.
- the subject of the invention is a method of estimating the relevance of a document in relation to a concept, comprising the calculation of a function of relevance of the concept in relation to this document based on knowledge of a predetermined semantic neighborhood of this concept, characterized in that it also comprises the calculation of an ambiguity function of this concept in this document, distinct from the relevance function, this calculation being estimated from the presence in the document of different meanings of this concept.
- a method according to the invention may also include one or more of the following characteristics: the relevance function measures the presence of the concept and of the concepts of the semantic neighborhood of this concept in the document; the semantic neighborhood of the concept comprises several semantic clouds of distinct meanings, and the ambiguity function compares the presence of concepts belonging to a semantic cloud corresponding to a predetermined sense of the concept with the presence of concepts belonging to different semantic clouds; the presence of each of the concepts belonging to the different semantic clouds is weighted by a predetermined coefficient; the method comprises a preliminary step of detecting ambiguous concepts, that is to say concepts comprising several semantic clouds of different meanings in their same semantic neighborhood; during the preliminary detection step, two concepts are considered ambiguous if they are linked together by at least two different semantic links.
- a concept is considered to be ambiguous if it is linked to at least two semantic clouds of different meanings; the concept belongs to a knowledge base obtained by merging a first knowledge base with a second knowledge base, the prior step of detecting ambiguous concepts being carried out during the fusion.
- a concept of the first knowledge base is considered to be ambiguous if it is linked by a new link to another concept of the first knowledge base.
- a concept of the first knowledge base is considered to be ambiguous if it is linked to at least one semantic cloud of the second knowledge base.
- A semantic cloud of a given concept is the set of concepts related to the same meaning of that concept.
- the concept “Orange” includes in its semantic neighborhood at least two semantic clouds of different meanings, namely a semantic cloud referring to the color orange (including among others the concepts “color”, “yellow”, “red”, etc.) and a semantic cloud relating to the orange fruit (including among others the concepts “fruit”, “citrus”, “lemon”, etc.).
- FIG. 1 schematically represents a knowledge base made up of concepts and semantic links between them;
- FIGS. 2 and 3 schematically represent a method of detecting ambiguous concepts, implemented in a method according to the invention; and
- FIG. 4 schematically represents a method for estimating the relevance of a document in relation to a concept according to the invention.
- FIG. 1 shows a knowledge base, designated by the general reference 10.
- the knowledge base 10 consists of a knowledge base 10A to which we have added a knowledge base 10B, according to a method of merging knowledge bases known per se.
- a concept 12 of the knowledge base 10 is linked to other concepts by semantic links 14.
- the set of concepts thus linked to the concept 12 forms a semantic neighborhood of this concept 12.
- This semantic neighborhood can comprise several semantic clouds 16 of distinct meanings, a semantic cloud 16 of the neighborhood of concept 12 being, as defined previously, the set of concepts related to the same meaning of concept 12.
- When its semantic neighborhood comprises several semantic clouds of different meanings, the concept is said to be “ambiguous”.
- Ambiguous concepts are designated in Figure 1 by the general reference 18, and by the specific references 18A, 18B and 18C, these particular references corresponding to different modes of detection of ambiguous concepts, implemented during a prior step of analysis of the knowledge base 10. This step will be detailed with reference to FIGS. 2 and 3.
- FIG. 2 represents an implementation of this preliminary step, suitable for the detection of ambiguous concepts in a given knowledge base, for example here, the knowledge base 10A.
- Each concept 12 of the knowledge base 10A is analyzed during a step
- FIG. 3 represents an implementation of the preliminary step of detection of ambiguous concepts, more particularly during the fusion of the knowledge base
- each concept 12 existing in the knowledge base 10A is then analyzed during a step 25 during which one searches for at least one new semantic link connecting this concept 12 to another existing concept of the knowledge base 10A, this new link having been created during the merger of the two bases 10A and 10B.
- a step 26 during which the concept is marked as being an ambiguous concept 18C, since the relationship between these two concepts was not foreseen in the initial knowledge base 10A, which implies that they are potentially homonyms.
- each concept 12 existing in the knowledge base 10A is again analyzed, to search for at least one semantic link connecting this concept 12 to a cloud of new concepts of the knowledge base 10B.
- a step 28 during which the concept is marked as being an ambiguous concept 18D, since it is likely that this link to these new concepts relates to a homonym.
- the concept 12 is not considered to be ambiguous, and we pass to a step 29 at the end of the prior step of analysis of the knowledge base.
- a request to estimate the relevance of a document in relation to a concept 12 of the knowledge base 10 is issued, for example by a search engine.
- a step 32 during which a calculation of a function of relevance of the document with respect to concept 12 is carried out in a manner known per se. This relevance function is calculated by taking into account the presence in the document of concept 12 and of concepts belonging to the semantic neighborhood of this concept 12.
- The relevance function has the form $\mathrm{Relevance}(\mathrm{Doc}, 12) = f\left[\mathrm{Presence}(\mathrm{Doc}, 12),\ \mathrm{coef} \times \mathrm{Presence}(\mathrm{Doc}, \mathrm{neighborhood}(12))\right]$, where: Relevance(Doc, 12) is the relevance function of concept 12 in the document considered; Presence(Doc, 12) is a function quantifying the presence of concept 12 in the document considered, for example the number of times concept 12 appears in the document; Presence(Doc, neighborhood(12)) is a function quantifying the presence in the document considered of concepts belonging to the neighborhood of concept 12; coef is a predetermined weighting coefficient, allowing more or less importance to be given to concepts belonging to the semantic neighborhood of concept 12; f is, for example, a “maximum” function or a “sum” function.
- the document can be considered as being relevant to concept 12, for example if the calculation gives a result higher than a predetermined threshold. In this case, we go to a step 34 during which the document is marked as being relevant with respect to concept 12. In the opposite case, where the result of the calculation gives a result below the predetermined threshold, we go to a step 36 during which the document is marked as not being relevant to concept 12. In this case, the irrelevant document is not retained. In the case where the document is marked as being relevant, the method according to the invention then provides for the calculation of an ambiguity function of the concept in the document. During a step 38, it is checked whether the concept 12 to which the request relates is marked as being ambiguous or not in the knowledge base 10.
- If the concept 12 is not marked as being ambiguous, we pass to a step 40 in which the document is marked as relevant and unambiguous. If the concept 12 is marked as being ambiguous, we pass to a step 42 during which a calculation of the ambiguity function is carried out, comparing the presence of concepts belonging to a semantic cloud corresponding to a predetermined sense of the concept 12 (the meaning of the concept in the request) with the presence of concepts belonging to different semantic clouds.
- Once this ambiguity score has been calculated, we go to a step 44 during which the document is marked as relevant with an ambiguity score; it is then up to the user to judge, using this ambiguity score, whether the document is likely to interest them. A method for estimating the relevance of a document in relation to a given concept, as described above, is intended to provide better results than existing methods, by weighting the relevance by a calculation of ambiguity without affecting the estimate of relevance itself.
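The flow described above (relevance calculation, threshold test, ambiguity check, ambiguity score) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Concept` class, the function names, the token-counting `presence` function, the default `coef` and `threshold` values, and in particular the ratio used for the ambiguity score are all assumptions made for illustration. The description only states that the ambiguity function compares the presence of concepts from the intended semantic cloud with the weighted presence of concepts from the other clouds.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical container: a concept and its semantic clouds, keyed by meaning."""
    name: str
    clouds: dict[str, set[str]] = field(default_factory=dict)

    @property
    def neighborhood(self) -> set[str]:
        # The semantic neighborhood is the union of all semantic clouds.
        return set().union(*self.clouds.values())

    @property
    def is_ambiguous(self) -> bool:
        # Ambiguous if linked to at least two clouds of different meanings.
        return len(self.clouds) >= 2


def presence(doc_tokens, terms):
    # Assumed presence measure: number of tokens that belong to `terms`.
    return sum(1 for token in doc_tokens if token in terms)


def relevance(doc_tokens, concept, coef=0.5, f=max):
    # Relevance = f[Presence(Doc, c), coef x Presence(Doc, neighborhood(c))].
    return f(presence(doc_tokens, {concept.name}),
             coef * presence(doc_tokens, concept.neighborhood))


def ambiguity(doc_tokens, concept, sense, weight=1.0):
    # Assumed score: share of weighted cloud hits that fall outside the intended sense.
    intended = presence(doc_tokens, concept.clouds[sense])
    other = weight * sum(presence(doc_tokens, cloud)
                         for meaning, cloud in concept.clouds.items()
                         if meaning != sense)
    total = intended + other
    return other / total if total else 0.0


def estimate(doc_tokens, concept, sense, threshold=1.0):
    if relevance(doc_tokens, concept) <= threshold:
        return ("not relevant", None)            # step 36
    if not concept.is_ambiguous:
        return ("relevant, unambiguous", None)   # step 40
    return ("relevant", ambiguity(doc_tokens, concept, sense))  # steps 42 and 44


orange = Concept("orange", {
    "color": {"color", "yellow", "red"},
    "fruit": {"fruit", "citrus", "lemon"},
})
doc = "the orange citrus fruit pairs well with lemon and a red garnish".split()
print(estimate(doc, orange, "fruit"))  # ('relevant', 0.25)
```

Keeping `relevance` and `ambiguity` as two separate functions mirrors the point of the description: the ambiguity score is reported alongside the relevance decision and never lowers the relevance value itself.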
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0308997A FR2858086A1 (fr) | 2003-07-23 | 2003-07-23 | Procede d'estimation de la pertinence d'un document par rapport a un concept |
PCT/FR2004/001930 WO2005010774A1 (fr) | 2003-07-23 | 2004-07-21 | Procede d’estimation de la pertinence d’un document par rapport a un concept |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1649399A1 (fr) | 2006-04-26 |
Family
ID=33561025
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP04785988A Withdrawn EP1649399A1 (fr) | 2003-07-23 | 2004-07-21 | Procede d'estimation de la pertinence d'un document par rapport a un concept |
Country Status (4)
Country | Link |
---|---|
US (1) | US7480645B2 (fr) |
EP (1) | EP1649399A1 (fr) |
FR (1) | FR2858086A1 (fr) |
WO (1) | WO2005010774A1 (fr) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007032003A2 (fr) * | 2005-09-13 | 2007-03-22 | Yedda, Inc. | Dispositif, systeme et procede de manipulation de demandes utilisateur |
US20140012766A1 (en) * | 2009-04-09 | 2014-01-09 | Sigram Schindler | Inventive concepts enabled semi-automatic tests of patents |
US20140258145A1 (en) * | 2009-04-09 | 2014-09-11 | Sigram Schindler | Semi-automatic generation / customization of (all) confirmative legal argument chains (lacs) in a claimed invention's spl test, as enabled by its "inventive concepts" |
US20130132320A1 (en) * | 2009-04-09 | 2013-05-23 | Sigram Schindler Beteiligungsgesellschaft Mbh | Innovation expert system, ies, and its ptr data structure, ptr-ds |
US20130179386A1 (en) * | 2009-04-09 | 2013-07-11 | Sigram Schindler | Innovation expert system, ies, and its ptr data structure, ptr-ds |
US9727842B2 (en) * | 2009-08-21 | 2017-08-08 | International Business Machines Corporation | Determining entity relevance by relationships to other relevant entities |
FR3049465B1 (fr) * | 2016-03-29 | 2018-04-27 | Oxymo Technologies Inc. | Composition pharmaceutique preventive et curative a base de peroxometallate |
US10756977B2 (en) * | 2018-05-23 | 2020-08-25 | International Business Machines Corporation | Node relevance determination in an evolving network |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6978277B2 (en) * | 1989-10-26 | 2005-12-20 | Encyclopaedia Britannica, Inc. | Multimedia search system |
US5675819A (en) * | 1994-06-16 | 1997-10-07 | Xerox Corporation | Document information retrieval using global word co-occurrence patterns |
US6847960B1 (en) * | 1999-03-29 | 2005-01-25 | Nec Corporation | Document retrieval by information unit |
US6711585B1 (en) * | 1999-06-15 | 2004-03-23 | Kanisa Inc. | System and method for implementing a knowledge management system |
AU764415B2 (en) * | 1999-08-06 | 2003-08-21 | Lexis-Nexis | System and method for classifying legal concepts using legal topic scheme |
GB0018645D0 (en) * | 2000-07-28 | 2000-09-13 | Tenara Limited | Dynamic personalization via semantic networks |
US7013308B1 (en) * | 2000-11-28 | 2006-03-14 | Semscript Ltd. | Knowledge storage and retrieval system and method |
US7051022B1 (en) * | 2000-12-19 | 2006-05-23 | Oracle International Corporation | Automated extension for generation of cross references in a knowledge base |
US7184948B2 (en) * | 2001-06-15 | 2007-02-27 | Sakhr Software Company | Method and system for theme-based word sense ambiguity reduction |
US6996575B2 (en) * | 2002-05-31 | 2006-02-07 | Sas Institute Inc. | Computer-implemented system and method for text-based document processing |
US7251648B2 (en) * | 2002-06-28 | 2007-07-31 | Microsoft Corporation | Automatically ranking answers to database queries |
2003
- 2003-07-23: FR application FR0308997A, patent FR2858086A1 (fr), not active: Withdrawn
2004
- 2004-07-21: EP application EP04785988A, patent EP1649399A1 (fr), not active: Withdrawn
- 2004-07-21: WO application PCT/FR2004/001930, patent WO2005010774A1 (fr), active: Application Filing
- 2004-07-21: US application US10/565,707, patent US7480645B2 (en), not active: Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
See references of WO2005010774A1 * |
Also Published As
Publication number | Publication date |
---|---|
US7480645B2 (en) | 2009-01-20 |
US20060265367A1 (en) | 2006-11-23 |
WO2005010774A1 (fr) | 2005-02-03 |
FR2858086A1 (fr) | 2005-01-28 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
AU2009234120B2 (en) | Search results ranking using editing distance and document information | |
JP5117379B2 (ja) | オンライン会話コンテンツを用いて表示のために広告コンテンツ及び/又は他の関連情報を選択するシステム及び方法 | |
US7249121B1 (en) | Identification of semantic units from within a search query | |
US7660792B2 (en) | System and method for spam identification | |
US8161050B2 (en) | Visualizing hyperlinks in a search results list | |
US20060184500A1 (en) | Using content analysis to detect spam web pages | |
US8122022B1 (en) | Abbreviation detection for common synonym generation | |
JPH11191114A (ja) | メタ検索方法、画像検索方法、メタ検索エンジン及び画像検索エンジン | |
FR2821186A1 (fr) | Dispositif d'extraction d'informations d'un texte a base de connaissances | |
JP6105599B2 (ja) | 情報の検索 | |
EP1649399A1 (fr) | Procede d'estimation de la pertinence d'un document par rapport a un concept | |
US9971782B2 (en) | Document tagging and retrieval using entity specifiers | |
Cheng et al. | Fuzzy matching of web queries to structured data | |
EP1746521A1 (fr) | Procédé de classement d'un ensemble de documents électroniques du type pouvant contenir des liens hypertextes vers d'autres documents électroniques | |
EP1532550B1 (fr) | Detection d'une image de reference robuste a de grandes transformations photometriques | |
EP2227755B1 (fr) | Procede d'analyse d'un contenu multimedia, produit programme d'ordinateur et dispositif d'analyse correspondants | |
FR2845179A1 (fr) | Procede de regroupement d'images d'une sequence video | |
FR2899708A1 (fr) | Procede de de-doublonnage rapide d'un ensemble de documents ou d'un ensemble de donnees contenues dans un fichier | |
EP3774171A1 (fr) | Procédé d'évaluation de la qualité de raccordement de deux composants tubulaires | |
US9208157B1 (en) | Spam detection for user-generated multimedia items based on concept clustering | |
Kumar et al. | A comparative study of BYG search engines | |
WO2008083447A1 (fr) | Procédé et système d'obtention d'informations connexes | |
BE1013153A3 (fr) | Procede et systeme de prelevement d'information. | |
FR2975553A1 (fr) | Aide a la recherche de contenus videos sur un reseau de communication | |
FR3108190A1 (fr) | Procédé et système de détection d’un motif commun dans un ensemble de fichiers texte | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20060216 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PL PT RO SE SI SK TR |
|
DAX | Request for extension of the european patent (deleted) | ||
RIN1 | Information on inventor provided before grant (corrected) |
Inventor name: KIRSNER, DOMINIQUE Inventor name: ALLYS, GUILLAUME Inventor name: DE BOIS, LUC Inventor name: MARTIN, STEPHANE |
|
17Q | First examination report despatched |
Effective date: 20100517 |
|
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: ORANGE |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20160202 |