CA2423476A1 - Extended functionality for an inverse inference engine based web search - Google Patents

Info

Publication number
CA2423476A1
Authority
CA
Canada
Prior art keywords
document
term
matrix
natural language
user query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002423476A
Other languages
French (fr)
Other versions
CA2423476C (en)
Inventor
Giovanni B. Marchisio
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
VCVC III LLC
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2000-09-25
Filing date
2001-09-25
Publication date
2002-04-04
Application filed by Individual
Publication of CA2423476A1
Application granted
Publication of CA2423476C
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Algebra (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

An extension of an inverse inference search engine (Fig. 1) provides cross language document retrieval, in which the information matrix (52) used as input to the inverse inference engine is organized into rows of blocks (58) corresponding to languages within a predetermined set of natural languages. The information matrix (52) is organized into two column-wise partitions (60). The first partition consists of blocks of entries representing fully translated documents, while the second partition is a matrix of blocks of entries representing documents for which translations are not available in all of the predetermined languages.
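The block layout described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not taken from the patent: the function name `build_term_document_matrix`, whitespace tokenization, the two-letter language tags, and the choice to let each reference column stack the counts of all its translations. Rows are grouped by language, and columns are split into a reference partition followed by a target partition.

```python
from collections import Counter

def build_term_document_matrix(reference_docs, target_docs):
    """Build a term-document count matrix with language row blocks.

    reference_docs: list of dicts {language: text}; each dict is one document
        available in every language (first column partition).
    target_docs: list of (language, text) pairs; documents without a full set
        of translations (second column partition).
    Returns (terms, matrix): terms is a list of (language, word) row labels,
    and matrix[i][j] is the count of term i in column j.
    """
    columns = []
    # First partition: one column per reference document, all versions stacked.
    for versions in reference_docs:
        counts = Counter()
        for lang, text in versions.items():
            counts.update((lang, word) for word in text.lower().split())
        columns.append(counts)
    # Second partition: one column per single-language target document.
    for lang, text in target_docs:
        columns.append(Counter((lang, word) for word in text.lower().split()))
    # Sorting (language, word) labels groups the rows into one block per language.
    terms = sorted({term for counts in columns for term in counts})
    matrix = [[counts[term] for counts in columns] for term in terms]
    return terms, matrix

# Hypothetical toy corpus: one translated reference document, two targets.
refs = [{"en": "the cat sleeps", "fr": "le chat dort"}]
targets = [("en", "the cat eats"), ("fr", "le chien dort")]
terms, A = build_term_document_matrix(refs, targets)
for term, row in zip(terms, A):
    print(term, row)
```

In this toy layout the reference column has nonzero entries in both language blocks, while each target column is nonzero in only one block, which is what lets the reference partition link terms across languages.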

Claims (12)

1. An information retrieval method comprising the steps of:
generating a term-document matrix to represent electronic information files stored in a computer system, each element in said term-document matrix indicating a number of occurrences of a term within a respective one of said electronic information files, wherein said term-document matrix includes a first partition, said first partition including entries representing at least a first version and a second version of at least one reference document within said electronic information files, wherein said first version of said reference document is in a first natural language and said second version of said reference document is a translation of said first version of said reference document into a second natural language, and wherein said term-document matrix further includes a second partition, elements in said second partition representing at least one target document within said electronic information files, wherein said target document is in one of the set of natural languages consisting of said first natural language and said second natural language;
generating, responsive to said term-document matrix, a term-spread matrix, wherein said term-spread matrix is a weighted autocorrelation of said term-document matrix, said term-spread matrix indicating an amount of variation in term usage in the information files and, also, the extent to which terms are correlated;
receiving a user query from a user, said user query consisting of at least one term;
in response to said user query, generating a user query vector, wherein said user query vector has as many elements as the rows of the term-spread matrix;
generating, responsive to said user query vector, an error-covariance matrix, wherein said error-covariance matrix reflects an expected degree of uncertainty in the initial choice of keywords of said user;
formulating, responsive to said term-spread matrix, error-covariance matrix, and user query vector, a constrained optimization problem, wherein the choice of a lambda value equal to a Lagrange multiplier value in said constrained optimization problem determines the extent of a trade-off between a degree of fit and the stability of all solutions to said constrained optimization problem;
generating, responsive to said constrained optimization problem, a solution vector including a plurality of document weights, each one of said plurality of document weights corresponding to one of each said target documents, wherein each of said document weights reflects a degree of correlation between said user query and the corresponding one of said target documents; and
providing an information response to said user reflecting said document weights, wherein at least one of said document weights is positive and at least one of said document weights is negative, wherein said positive document weights represent the relevance of selected ones of said target documents in said first natural language to said user query, and wherein absolute values of said negative document weights represent the relevance of selected ones of said target documents in said second natural language to said user query.
2. The method of claim 1, wherein said providing said information response further comprises organizing display objects representing said target documents associated with said document weights according to the sign of each of said document weights, whereby said documents in said first natural language are displayed in proximity to each other and documents in said second natural language are displayed in proximity to each other.
3. The method of claim 2, wherein said providing said information response further comprises organizing said display objects representing documents associated with said document weights according to the absolute value of each of said document weights, such that said display objects are displayed in decreasing absolute value of associated document weight.
4. The method of claim 1, wherein said step of generating said term-document matrix includes generating elements in said matrix reflecting the number of occurrences of each one of said terms in each one of said information files.
5. The method of claim 1, wherein rows of said term-document matrix are each associated with a respective term, and wherein a first set of said rows are associated with terms in said first natural language, and a second set of said rows are associated with terms in said second natural language.
6. The method of claim 5, wherein said first partition includes entries representing at least a first version and a second version of said at least one reference document, wherein said first version of said reference document is in said first natural language, and said second version of said reference document is a translation of said first version of said reference document into said second natural language.
7. The method of claim 1, wherein said second version of said reference document is another document that is topically related to said first version of said reference document.
8. The method of claim 1, wherein said term-document matrix is one of a plurality of term document matrices, each of said plurality of term document matrices associated with a translation from a source language to a target foreign language, and wherein said first natural language comprises said source language and said second natural language comprises said target natural language.
9. An information retrieval method comprising the steps of:
generating a term-document matrix to represent electronic information files stored in a computer system, each element in said term-document matrix indicating a number of occurrences of a term within a respective one of said electronic information files, wherein said term-document matrix includes a first partition, said first partition including entries representing at least one reference document within said electronic information files, wherein said reference document is predetermined to contain reliable information, and wherein said term-document matrix further includes a second partition, elements in said second partition representing a plurality of search documents within said electronic information files, wherein said search documents are predetermined to contain insufficient information for establishing semantic links;
generating, responsive to said term-document matrix, a term-spread matrix, wherein said term-spread matrix is a weighted autocorrelation of said term-document matrix, said term-spread matrix indicating an amount of variation in term usage in the information files and, also, the extent to which terms are correlated;
receiving a user query from a user, said user query consisting of at least one term;
in response to said user query, generating a user query vector, wherein said user query vector has as many elements as the rows of the term-spread matrix;
generating, responsive to said user query vector, an error-covariance matrix, wherein said error-covariance matrix reflects an expected degree of uncertainty in the initial choice of keywords of said user;
formulating, responsive to said term-spread matrix, error-covariance matrix, and user query vector, a constrained optimization problem, wherein the choice of a lambda value equal to a Lagrange multiplier value in said constrained optimization problem determines the extent of a trade-off between a degree of fit and the stability of all solutions to said constrained optimization problem;
generating, responsive to said constrained optimization problem, a solution vector including a plurality of document weights, each one of said plurality of document weights corresponding to one of said plurality of search documents, wherein each of said document weights reflects a degree of correlation between said user query and the corresponding one of said plurality of search documents; and
providing an information response to said user reflecting said document weights.
10. The method of claim 9, further comprising periodically accumulating information from multiple sources, and adding said information to said search documents.
11. The method of claim 8, wherein said reference document comprises an encyclopedia.
12. The method of claim 8, wherein said reference document comprises a collection of news reports.
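The retrieval steps recited in claims 1 and 9, and the display ordering of claims 2 and 3, can be sketched roughly as follows. This is not the patent's formulation: it substitutes an ordinary ridge (Tikhonov-regularized) least-squares problem, in which the term-spread matrix is S = A W Aᵀ, the error covariance is assumed diagonal, and the document weights are x = Aᵀ (S + λC)⁻¹ q. The function names, the default λ, and the convention that non-negative weights are listed first are all assumptions made for illustration.

```python
def term_spread(A, doc_weights=None):
    """Weighted autocorrelation S = A W A^T, with W a diagonal weight matrix."""
    n_terms, n_docs = len(A), len(A[0])
    w = doc_weights or [1.0] * n_docs
    return [[sum(A[i][k] * w[k] * A[j][k] for k in range(n_docs))
             for j in range(n_terms)] for i in range(n_terms)]

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in M[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z

def document_weights(A, q, lam=0.1, error_cov_diag=None):
    """Ridge stand-in for the constrained problem: x = A^T (S + lam*C)^-1 q.

    A: term-document matrix (rows = terms, columns = documents).
    q: query vector with one element per term (per row of S).
    lam: trade-off between fit to the query and stability of the solution.
    error_cov_diag: assumed diagonal of the error-covariance matrix C.
    """
    n_terms, n_docs = len(A), len(A[0])
    S = term_spread(A)
    c = error_cov_diag or [1.0] * n_terms
    M = [[S[i][j] + (lam * c[i] if i == j else 0.0) for j in range(n_terms)]
         for i in range(n_terms)]
    z = solve(M, q)
    return [sum(A[i][k] * z[i] for i in range(n_terms)) for k in range(n_docs)]

def order_for_display(weights):
    """Group document indices by sign, then by decreasing absolute weight."""
    return sorted(range(len(weights)),
                  key=lambda k: (weights[k] < 0, -abs(weights[k])))

# Hypothetical weights: non-negative group first, each group by magnitude.
print(order_for_display([0.2, -0.9, 0.7, -0.1]))  # [2, 0, 1, 3]
```

A larger `lam` pulls all weights toward zero and makes them less sensitive to small changes in the query, while a smaller `lam` fits the query terms more closely; this mirrors the fit-versus-stability trade-off that the claims attribute to the lambda value.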
CA2423476A 2000-09-25 2001-09-25 Extended functionality for an inverse inference engine based web search Expired - Fee Related CA2423476C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US23525500P 2000-09-25 2000-09-25
US60/235,255 2000-09-25
PCT/US2001/029943 WO2002027536A1 (en) 2000-09-25 2001-09-25 Extended functionality for an inverse inference engine based web search

Publications (2)

Publication Number Publication Date
CA2423476A1 (en) 2002-04-04
CA2423476C (en) 2010-07-20

Family

ID=22884742

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2423476A Expired - Fee Related CA2423476C (en) 2000-09-25 2001-09-25 Extended functionality for an inverse inference engine based web search

Country Status (4)

Country Link
EP (1) EP1323067A4 (en)
AU (1) AU2001296304A1 (en)
CA (1) CA2423476C (en)
WO (1) WO2002027536A1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7526425B2 (en) 2001-08-14 2009-04-28 Evri Inc. Method and system for extending keyword searching to syntactically and semantically annotated data
US7398201B2 (en) 2001-08-14 2008-07-08 Evri Inc. Method and system for enhanced data searching
WO2006074324A1 (en) * 2005-01-04 2006-07-13 Thomson Global Resources Systems, methods, software, and interfaces for multilingual information retrieval
EP1949273A1 (en) 2005-11-16 2008-07-30 Evri Inc. Extending keyword searching to syntactically and semantically annotated data
US8954469B2 (en) 2007-03-14 2015-02-10 Vcvciii Llc Query templates and labeled search tip system, methods, and techniques
US8594996B2 (en) 2007-10-17 2013-11-26 Evri Inc. NLP-based entity recognition and disambiguation
AU2008312423B2 (en) 2007-10-17 2013-12-19 Vcvc Iii Llc NLP-based content recommender
US9710556B2 (en) 2010-03-01 2017-07-18 Vcvc Iii Llc Content recommendation based on collections of entities
US8645125B2 (en) 2010-03-30 2014-02-04 Evri, Inc. NLP-based systems and methods for providing quotations
US8838633B2 (en) 2010-08-11 2014-09-16 Vcvc Iii Llc NLP-based sentiment analysis
US9405848B2 (en) 2010-09-15 2016-08-02 Vcvc Iii Llc Recommending mobile device activities
US8725739B2 (en) 2010-11-01 2014-05-13 Evri, Inc. Category-based content recommendation
US9116995B2 (en) 2011-03-30 2015-08-25 Vcvc Iii Llc Cluster-based identification of news stories
CN108984647A (en) * 2018-06-26 2018-12-11 北京工业大学 A kind of water utilities domain knowledge map construction method based on Chinese text

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4839853A (en) * 1988-09-15 1989-06-13 Bell Communications Research, Inc. Computer information retrieval using latent semantic structure
US5301109A (en) * 1990-06-11 1994-04-05 Bell Communications Research, Inc. Computerized cross-language document retrieval using latent semantic indexing
US5619709A (en) * 1993-09-20 1997-04-08 Hnc, Inc. System and method of context vector generation and retrieval
EP0856175A4 (en) * 1995-08-16 2000-05-24 Univ Syracuse Multilingual document retrieval system and method using semantic vector matching
KR980004126A (en) * 1997-12-16 1998-03-30 양승택 Query Language Conversion Apparatus and Method for Searching Multilingual Web Documents

Also Published As

Publication number Publication date
EP1323067A1 (en) 2003-07-02
EP1323067A4 (en) 2013-11-20
CA2423476C (en) 2010-07-20
WO2002027536A1 (en) 2002-04-04
AU2001296304A1 (en) 2002-04-08

Similar Documents

Publication Publication Date Title
US5293552A (en) Method for storing bibliometric information on items from a finite source of text, and in particular document postings for use in a full-text document retrieval system
Zeimpekis et al. TMG: A MATLAB toolbox for generating term-document matrices from text collections
US8620900B2 (en) Method for using dual indices to support query expansion, relevance/non-relevance models, blind/relevance feedback and an intelligent search interface
RU2398272C2 (en) Method and system for indexing and searching in databases
US8862565B1 (en) Techniques for web site integration
CA2423476A1 (en) Extended functionality for an inverse inference engine based web search
US8060516B2 (en) Methods and systems for compressing indices
EP1596315A1 (en) Method and system for ranking objects based on intra-type and inter-type relationships
DK0730765T3 (en) Associative text search and retrieval system
CN101576929B (en) Fast vocabulary entry prompting realization method
EP1618467A2 (en) Information retrieval and text mining using distributed latent semantic indexing
Cacheda et al. A case study of distributed information retrieval architectures to index one terabyte of text
WO1999064965A3 (en) Electronic file retrieval method and system
Stata et al. The term vector database: fast access to indexing terms for web pages
US20050027678A1 (en) Computer executable dimension reduction and retrieval engine
CN101393551B (en) Index establishing system and method for patent full text search
CN102915312B (en) Information issuing method in website and system
Mendelzon et al. What do the neighbours think? Computing Web page reputations
Duhan et al. A novel approach for organizing web search results using ranking and clustering
EP0508519B1 (en) A method for storing bibliometric information on items from a finite source of text, and in particular document postings for use in a full-text document retrieval system
Bueno et al. Enrichment of text documents using information retrieval techniques in a distributed environment
Zhu et al. Exploiting Semantic Association To Answer'Vague Queries'.
Lobo et al. Acquiring the best page using query term synonym combination
Nørvåg et al. Creating synthetic temporal document collections
Aktug et al. Analysis of signature generation schemes for multiterm queries in partitioned signature file environments

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed

Effective date: 20190925