CA2423476A1 - Extended functionality for an inverse inference engine based web search - Google Patents
- Publication number
- CA2423476A1
- Authority
- CA
- Canada
- Prior art keywords
- document
- term
- matrix
- natural language
- user query
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Algebra (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
An extension of an inverse inference search engine (Fig. 1) provides cross-language document retrieval, in which the information matrix (52) used as input to the inverse inference engine is organized into rows of blocks (58) corresponding to languages within a predetermined set of natural languages. The information matrix (52) is organized into two column-wise partitions (60). The first partition consists of blocks of entries representing fully translated documents, while the second partition is a matrix of blocks of entries representing documents for which translations are not available in all of the predetermined languages.
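The block layout described in the abstract can be illustrated with a small sketch. It assumes two languages, whitespace tokenization, and raw term counts; the function name `build_information_matrix` and its argument shapes are hypothetical and do not come from the patent.

```python
from collections import Counter


def build_information_matrix(reference_pairs, targets):
    """Build a two-language term-document matrix (hypothetical helper).

    reference_pairs: list of (text_in_language_0, text_in_language_1)
        translated pairs; these form the first column partition.
    targets: list of (language_index, text) for documents available in
        one language only; these form the second column partition.
    Returns (terms, matrix): terms is a list of (language_index, term)
    row labels sorted so each language forms one block of rows, and
    matrix[i][j] is the count of term i in document j.
    """
    docs = []  # one Counter per column, keyed by (language_index, term)
    for text0, text1 in reference_pairs:
        counts = Counter((0, t) for t in text0.lower().split())
        counts.update((1, t) for t in text1.lower().split())
        docs.append(counts)
    for lang, text in targets:
        docs.append(Counter((lang, t) for t in text.lower().split()))
    # Sorting on (language_index, term) groups the rows by language.
    terms = sorted({key for d in docs for key in d})
    matrix = [[d[key] for d in docs] for key in terms]
    return terms, matrix


terms, D = build_information_matrix(
    [("the cat sleeps", "le chat dort")],
    [(0, "the cat eats"), (1, "le chien dort")],
)
```

In this toy input the reference column has nonzero entries in both language blocks, while each target column is nonzero in only one block, which is the structure the two partitions are meant to capture.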
Claims (12)
1. An information retrieval method comprising the steps of:
generating a term-document matrix to represent electronic information files stored in a computer system, each element in said term-document matrix indicating a number of occurrences of a term within a respective one of said electronic information files, wherein said term-document matrix includes a first partition, said first partition including entries representing at least a first version and a second version of at least one reference document within said electronic information files, wherein said first version of said reference document is in a first natural language and said second version of said reference document is a translation of said first version of said reference document into a second natural language, and wherein said term-document matrix further includes a second partition, elements in said second partition representing at least one target document within said electronic information files, wherein said target document is in one of the set of natural languages consisting of said first natural language and said second natural language;
generating, responsive to said term-document matrix, a term-spread matrix, wherein said term spread matrix is a weighted autocorrelation of said term-document matrix, said term-spread matrix indicating an amount of variation in term usage in the information files and, also, the extent to which terms are correlated;
receiving a user query from a user, said user query consisting of at least one term;
in response to said user query, generating a user query vector, wherein said user query vector has as many elements as the rows of the term-spread matrix;
generating, responsive to said user query vector, an error-covariance matrix, wherein said error-covariance matrix reflects an expected degree of uncertainty in the initial choice of keywords of said user;
formulating, responsive to said term-spread matrix, error-covariance matrix, and user query vector, a constrained optimization problem, wherein the choice of a lambda value equal to a Lagrange multiplier value in said constrained optimization problem determines the extent of a trade-off between a degree of fit and the stability of all solutions to said constrained optimization problem;
generating, responsive to said constrained optimization problem, a solution vector including a plurality of document weights, each one of said plurality of document weights corresponding to one of each said target documents, wherein each of said document weights reflects a degree of correlation between said user query and the corresponding one of said target documents; and providing an information response to said user reflecting said document weights, wherein at least one of said document weights is positive and at least one of said document weights is negative, wherein said positive document weights represent the relevance of selected ones of said target documents in said first natural language to said user query, and wherein absolute values of said negative document weights represent the relevance of selected ones of said target documents in said second natural language to said user query.
2. The method of claim 1, wherein said providing said information response further comprises organizing display objects representing said target documents associated with said document weights according to the sign of each of said document weights, whereby said documents in said first natural language are displayed in proximity to each other and documents in said second natural language are displayed in proximity to each other.
3. The method of claim 2, wherein said providing said information response further comprises organizing said display objects representing documents associated with said document weights according to the absolute value of each of said document weights, such that said display objects are displayed in decreasing absolute value of associated document weight.
4. The method of claim 1, wherein said step of generating said term-document matrix includes generating elements in said matrix reflecting the number of occurrences of each one of said terms in each one of said information files.
5. The method of claim 1, wherein rows of said term-document matrix are each associated with a respective term, and wherein a first set of said rows are associated with terms in said first natural language, and a second set of said rows are associated with terms in said second natural language.
6. The method of claim 5, wherein said first partition includes entries representing at least a first version and a second version of said at least one reference document, wherein said first version of said reference document is in said first natural language, and said second version of said reference document is a translation of said first version of said reference document into said second natural language.
7. The method of claim 1, wherein said second version of said reference document is another document that is topically related to said first version of said reference document.
8. The method of claim 1, wherein said term-document matrix is one of a plurality of term document matrices, each of said plurality of term document matrices associated with a translation from a source language to a target foreign language, and wherein said first natural language comprises said source language and said second natural language comprises said target natural language.
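The steps recited in claims 1 to 3 can be sketched roughly as follows. The claims do not give the exact form of the constrained optimization problem, so this sketch substitutes a damped least-squares (ridge) solve, w = Dᵀ (S + λC)⁻¹ q, as an assumed stand-in; it is not the patent's formulation. The function names, the identity default for the error-covariance matrix C, and the default value of `lam` are all assumptions for illustration. The sketch also does not reproduce how the claimed method ties the sign of a weight to a language; `organize_results` only shows the grouping and ordering once signed weights exist.

```python
def term_spread(D, doc_weights=None):
    """Weighted autocorrelation S = D W D^T of a term-document matrix D
    (rows are terms, columns are documents). W is diagonal and defaults
    to the identity (an assumption; the claims do not fix the weighting)."""
    n_docs = len(D[0])
    w = doc_weights or [1.0] * n_docs
    return [[sum(ri[k] * w[k] * rj[k] for k in range(n_docs)) for rj in D]
            for ri in D]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def document_weights(D, q, lam=0.1, C=None):
    """Assumed stand-in for the claimed optimization:
    w = D^T (S + lam * C)^-1 q, with S the term-spread matrix and C an
    error-covariance matrix (identity by default). A larger lam favors
    stability of the solution over closeness of fit to the query."""
    n_terms = len(D)
    S = term_spread(D)
    C = C or [[1.0 if i == j else 0.0 for j in range(n_terms)]
              for i in range(n_terms)]
    A = [[S[i][j] + lam * C[i][j] for j in range(n_terms)]
         for i in range(n_terms)]
    z = solve(A, q)  # q has one element per row of S
    return [sum(D[i][k] * z[i] for i in range(n_terms))
            for k in range(len(D[0]))]


def organize_results(doc_ids, weights):
    """Group documents by the sign of their weight, then order each group
    by decreasing absolute weight (claims 2 and 3)."""
    positive = [(d, w) for d, w in zip(doc_ids, weights) if w > 0]
    negative = [(d, abs(w)) for d, w in zip(doc_ids, weights) if w < 0]
    by_relevance = lambda pair: -pair[1]
    return sorted(positive, key=by_relevance), sorted(negative, key=by_relevance)
```

With `lam=0.0` the solve reduces to an exact fit of the query whenever the term-spread matrix is invertible; a positive `lam` keeps the system well conditioned when terms are strongly correlated.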
9. An information retrieval method comprising the steps of:
generating a term-document matrix to represent electronic information files stored in a computer system, each element in said term-document matrix indicating a number of occurrences of a term within a respective one of said electronic information files, wherein said term-document matrix includes a first partition, said first partition including entries representing at least one reference document within said electronic information files, wherein said reference document is predetermined to contain reliable information, and wherein said term-document matrix further includes a second partition, elements in said second partition representing a plurality of search documents within said electronic information files, wherein said search documents are predetermined to contain insufficient information for establishing semantic links;
generating, responsive to said term-document matrix, a term-spread matrix, wherein said term spread matrix is a weighted autocorrelation of said term-document matrix, said term-spread matrix indicating an amount of variation in term usage in the information files and, also, the extent to which terms are correlated;
receiving a user query from a user, said user query consisting of at least one term;
in response to said user query, generating a user query vector, wherein said user query vector has as many elements as the rows of the term-spread matrix;
generating, responsive to said user query vector, an error-covariance matrix, wherein said error-covariance matrix reflects an expected degree of uncertainty in the initial choice of keywords of said user;
formulating, responsive to said term-spread matrix, error-covariance matrix, and user query vector, a constrained optimization problem, wherein the choice of a lambda value equal to a Lagrange multiplier value in said constrained optimization problem determines the extent of a trade-off between a degree of fit and the stability of all solutions to said constrained optimization problem;
generating, responsive to said constrained optimization problem, a solution vector including a plurality of document weights, each one of said plurality of document weights corresponding to one of said plurality of search documents, wherein each of said document weights reflects a degree of correlation between said user query and the corresponding one of said plurality of search documents; and providing an information response to said user reflecting said document weights.
10. The method of claim 9, further comprising periodically accumulating information from multiple sources, and adding said information to said search documents.
11. The method of claim 8, wherein said reference document comprises an encyclopedia.
12. The method of claim 8, wherein said reference document comprises a collection of news reports.
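The two partitions of claim 9 and the periodic accumulation of claim 10 can be pictured with a minimal sketch: reference documents occupy a fixed first block of columns, and newly gathered search documents are appended as further columns. The class name `PartitionedCorpus`, its methods, and the whitespace tokenization are hypothetical, and treating "adding said information to said search documents" as appending new columns is one possible reading of the claim.

```python
from collections import Counter


class PartitionedCorpus:
    """Term-document matrix with a fixed reference partition and a
    growing search partition (hypothetical illustration)."""

    def __init__(self, reference_texts):
        # First partition: documents assumed to contain reliable information.
        self.reference = [Counter(t.lower().split()) for t in reference_texts]
        # Second partition: search documents, accumulated over time.
        self.search = []

    def accumulate(self, texts):
        """Append newly gathered texts as additional search documents."""
        self.search.extend(Counter(t.lower().split()) for t in texts)

    def matrix(self):
        """Return (terms, matrix) with reference columns first."""
        docs = self.reference + self.search
        terms = sorted({t for d in docs for t in d})
        return terms, [[d[t] for d in docs] for t in terms]


corpus = PartitionedCorpus(["rain falls"])
corpus.accumulate(["rain today"])
corpus.accumulate(["storm today"])
terms, D = corpus.matrix()
```

Rebuilding the matrix on demand keeps the sketch short; a production system would more likely update the matrix incrementally.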
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US23525500P | 2000-09-25 | 2000-09-25 | |
US60/235,255 | 2000-09-25 | ||
PCT/US2001/029943 WO2002027536A1 (en) | 2000-09-25 | 2001-09-25 | Extended functionality for an inverse inference engine based web search |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2423476A1 (en) | 2002-04-04 |
CA2423476C (en) | 2010-07-20 |
Family
ID=22884742
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA2423476A Expired - Fee Related CA2423476C (en) | 2000-09-25 | 2001-09-25 | Extended functionality for an inverse inference engine based web search |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1323067A4 (en) |
AU (1) | AU2001296304A1 (en) |
CA (1) | CA2423476C (en) |
WO (1) | WO2002027536A1 (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7526425B2 (en) | 2001-08-14 | 2009-04-28 | Evri Inc. | Method and system for extending keyword searching to syntactically and semantically annotated data |
US7398201B2 (en) | 2001-08-14 | 2008-07-08 | Evri Inc. | Method and system for enhanced data searching |
WO2006074324A1 (en) * | 2005-01-04 | 2006-07-13 | Thomson Global Resources | Systems, methods, software, and interfaces for multilingual information retrieval |
EP1949273A1 (en) | 2005-11-16 | 2008-07-30 | Evri Inc. | Extending keyword searching to syntactically and semantically annotated data |
US8954469B2 (en) | 2007-03-14 | 2015-02-10 | Vcvciii Llc | Query templates and labeled search tip system, methods, and techniques |
US8594996B2 (en) | 2007-10-17 | 2013-11-26 | Evri Inc. | NLP-based entity recognition and disambiguation |
AU2008312423B2 (en) | 2007-10-17 | 2013-12-19 | Vcvc Iii Llc | NLP-based content recommender |
US9710556B2 (en) | 2010-03-01 | 2017-07-18 | Vcvc Iii Llc | Content recommendation based on collections of entities |
US8645125B2 (en) | 2010-03-30 | 2014-02-04 | Evri, Inc. | NLP-based systems and methods for providing quotations |
US8838633B2 (en) | 2010-08-11 | 2014-09-16 | Vcvc Iii Llc | NLP-based sentiment analysis |
US9405848B2 (en) | 2010-09-15 | 2016-08-02 | Vcvc Iii Llc | Recommending mobile device activities |
US8725739B2 (en) | 2010-11-01 | 2014-05-13 | Evri, Inc. | Category-based content recommendation |
US9116995B2 (en) | 2011-03-30 | 2015-08-25 | Vcvc Iii Llc | Cluster-based identification of news stories |
CN108984647A (en) * | 2018-06-26 | 2018-12-11 | 北京工业大学 | A kind of water utilities domain knowledge map construction method based on Chinese text |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4839853A (en) * | 1988-09-15 | 1989-06-13 | Bell Communications Research, Inc. | Computer information retrieval using latent semantic structure |
US5301109A (en) * | 1990-06-11 | 1994-04-05 | Bell Communications Research, Inc. | Computerized cross-language document retrieval using latent semantic indexing |
US5619709A (en) * | 1993-09-20 | 1997-04-08 | Hnc, Inc. | System and method of context vector generation and retrieval |
EP0856175A4 (en) * | 1995-08-16 | 2000-05-24 | Univ Syracuse | Multilingual document retrieval system and method using semantic vector matching |
KR980004126A (en) * | 1997-12-16 | 1998-03-30 | 양승택 | Query Language Conversion Apparatus and Method for Searching Multilingual Web Documents |
- 2001
- 2001-09-25 EP EP01977165.8A patent/EP1323067A4/en not_active Ceased
- 2001-09-25 WO PCT/US2001/029943 patent/WO2002027536A1/en active Application Filing
- 2001-09-25 AU AU2001296304A patent/AU2001296304A1/en not_active Abandoned
- 2001-09-25 CA CA2423476A patent/CA2423476C/en not_active Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
EP1323067A1 (en) | 2003-07-02 |
EP1323067A4 (en) | 2013-11-20 |
CA2423476C (en) | 2010-07-20 |
WO2002027536A1 (en) | 2002-04-04 |
AU2001296304A1 (en) | 2002-04-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5293552A (en) | Method for storing bibliometric information on items from a finite source of text, and in particular document postings for use in a full-text document retrieval system | |
Zeimpekis et al. | TMG: A MATLAB toolbox for generating term-document matrices from text collections | |
US8620900B2 (en) | Method for using dual indices to support query expansion, relevance/non-relevance models, blind/relevance feedback and an intelligent search interface | |
RU2398272C2 (en) | Method and system for indexing and searching in databases | |
US8862565B1 (en) | Techniques for web site integration | |
CA2423476A1 (en) | Extended functionality for an inverse inference engine based web search | |
US8060516B2 (en) | Methods and systems for compressing indices | |
EP1596315A1 (en) | Method and system for ranking objects based on intra-type and inter-type relationships | |
DK0730765T3 (en) | Associative text search and retrieval system | |
CN101576929B (en) | Fast vocabulary entry prompting realization method | |
EP1618467A2 (en) | Information retrieval and text mining using distributed latent semantic indexing | |
Cacheda et al. | A case study of distributed information retrieval architectures to index one terabyte of text | |
WO1999064965A3 (en) | Electronic file retrieval method and system | |
Stata et al. | The term vector database: fast access to indexing terms for web pages | |
US20050027678A1 (en) | Computer executable dimension reduction and retrieval engine | |
CN101393551B (en) | Index establishing system and method for patent full text search | |
CN102915312B (en) | Information issuing method in website and system | |
Mendelzon et al. | What do the neighbours think? Computing Web page reputations | |
Duhan et al. | A novel approach for organizing web search results using ranking and clustering | |
EP0508519B1 (en) | A method for storing bibliometric information on items from a finite source of text, and in particular document postings for use in a full-text document retrieval system | |
Bueno et al. | Enrichment of text documents using information retrieval techniques in a distributed environment | |
Zhu et al. | Exploiting Semantic Association To Answer'Vague Queries'. | |
Lobo et al. | Acquiring the best page using query term synonym combination | |
Nørvåg et al. | Creating synthetic temporal document collections | |
Aktug et al. | Analysis of signature generation schemes for multiterm queries in partitioned signature file environments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
EEER | Examination request | ||
MKLA | Lapsed | Effective date: 20190925 |