WO2002069203A3 - Method for identifying term importance to a sample text using reference text - Google Patents

Info

Publication number
WO2002069203A3
Authority
WO
WIPO (PCT)
Prior art keywords
sample
text
terms
frequency
sample text
Application number
PCT/US2002/006036
Other languages
French (fr)
Other versions
WO2002069203A2 (en)
Inventor
James C Mayfield
J Paul Mcnamee
Original Assignee
Univ Johns Hopkins
James C Mayfield
J Paul Mcnamee
Application filed by Univ Johns Hopkins, James C Mayfield, and J Paul Mcnamee
Priority to US10/469,445 (published as US20040098385A1)
Publication of WO2002069203A2
Publication of WO2002069203A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing

Abstract

A method and apparatus for identifying important terms in a sample text. A frequency of occurrence of terms in the sample text (sample frequency) is compared to a frequency of occurrence of those terms in a reference text (reference frequency). Terms occurring with higher frequency in the sample text than in the reference text are considered important to the sample text. A difference between the respective sample and reference frequencies of a term may be used to determine an importance score. Terms can be ranked and/or added to an affinity set as a function of importance score or rank. When there are too few terms to determine sample frequencies, those terms may be used in a search query to retrieve documents, which then serve as the sample text for determining sample frequencies. The important terms may be used for document summarization, query refinement, cross-language translation, and cross-language query expansion.
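The comparison described in the abstract can be illustrated with a short sketch. This is not the patented implementation. It assumes a simple lowercase regex tokenizer and measures frequency as each term's share of all tokens, so that texts of different lengths are comparable. It uses the plain difference of sample and reference frequencies as the importance score, and it forms the affinity set from the top-scoring terms whose score is positive. The function names, the `top_k` and `max_docs` parameters, and the caller-supplied `search` function used for the query fallback are all hypothetical.

```python
import re
from collections import Counter
from typing import Callable


def relative_frequencies(text: str) -> dict[str, float]:
    """Map each lowercase word token to its share of all tokens in `text`."""
    # Assumption: a simple regex tokenizer; the abstract does not specify one.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {term: n / total for term, n in counts.items()}


def importance_scores(sample: str, reference: str) -> dict[str, float]:
    """Score each sample term as sample frequency minus reference frequency."""
    sample_freq = relative_frequencies(sample)
    reference_freq = relative_frequencies(reference)
    # Terms absent from the reference text get a reference frequency of 0.
    return {t: f - reference_freq.get(t, 0.0) for t, f in sample_freq.items()}


def affinity_set(sample: str, reference: str, top_k: int = 5) -> list[str]:
    """Rank terms more frequent in the sample than the reference; keep top_k."""
    scores = importance_scores(sample, reference)
    important = [t for t, s in scores.items() if s > 0]
    # Sort by descending score, then alphabetically for a stable tie-break.
    important.sort(key=lambda t: (-scores[t], t))
    return important[:top_k]


def sample_text_for_terms(
    terms: list[str], search: Callable[[str], list[str]], max_docs: int = 10
) -> str:
    """Fallback for too few terms: retrieve documents to use as sample text."""
    # `search` is a hypothetical retrieval function supplied by the caller;
    # it takes a query string and returns a list of document texts.
    docs = search(" ".join(terms))[:max_docs]
    return "\n".join(docs)
```

For example, `affinity_set("cat cat dog the", "the the dog bird")` returns `["cat"]`. "cat" makes up half of the sample and does not appear in the reference. "dog" has the same frequency in both texts, and "the" is less frequent in the sample. Using relative frequencies rather than raw counts is a design choice of this sketch: it keeps a long reference corpus from drowning out a short sample.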

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
US10/469,445 | US20040098385A1 (en) | 2002-02-26 | 2002-02-26 | Method for indentifying term importance to sample text using reference text

Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date
US27196201P | 2001-02-28 | 2001-02-28
US60/271,962 | 2001-02-28 |

Publications (2)

Publication Number | Publication Date
WO2002069203A2 (en) | 2002-09-06
WO2002069203A3 (en) | 2003-09-04

Family ID: 23037833

Family Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
PCT/US2002/006036 | WO2002069203A2 (en) | 2001-02-28 | 2002-02-26 | Method for identifying term importance to a sample text using reference text

Country Status (1)

Country | Link
WO (1) | WO2002069203A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
US6545836B1 (en) | 1999-11-12 | 2003-04-08 | Acorn Technologies, Inc. | Servo control apparatus and method using absolute value input signals
CN108259980A (en) * | 2017-12-29 | 2018-07-06 | 伟乐视讯科技股份有限公司 | A dynamic signal search method

Citations (2)

Publication Number | Priority Date | Publication Date | Assignee | Title
EP0889419A2 (en) * | 1997-07-02 | 1999-01-07 | Matsushita Electric Industrial Co., Ltd. | Keyword extracting system and text retrieval system using the same
US6018733A (en) * | 1997-09-12 | 2000-01-25 | Infoseek Corporation | Methods for iteratively and interactively performing collection selection in full text searches

Non-Patent Citations (3)

Title
CARROLL J M ET AL: "COMPUTER SELECTION OF KEYWORDS USING WORD-FREQUENCY ANALYSIS", AMERICAN DOCUMENTATION, ASIS. WASHINGTON, US, vol. 20, no. 3, July 1969 (1969-07-01), pages 227 - 233, XP001088701 *
G. SALTON, M. J. MCGILL: "Introduction to Modern Information Retrieval", MCGRAW-HILL, NEW YORK, XP002242139 *
H. P. EDMUNDSON, R. E. WYLLYS: "Automatic Abstracting and Indexing - Survey and Recommendations", COMMUNICATIONS OF THE ACM, vol. 4, no. 5, May 1961 (1961-05-01), New York, pages 226 - 234, XP002242138 *

Also Published As

Publication Number | Publication Date
WO2002069203A2 (en) | 2002-09-06

Similar Documents

Publication Publication Date Title
Vechtomova et al. A study of the effect of term proximity on query expansion
WO2005033967A3 (en) Systems and methods for searching using queries written in a different character-set and/or language from the target pages
WO2001022251A3 (en) Apparatus for and method of searching
WO2005091825A3 (en) Keyword recommendation for internet search engines
EP1524610A3 (en) Systems and methods for performing electronic information retrieval
WO2002035316A3 (en) Method and apparatus for query and analysis
WO2007024769A3 (en) Semantic discovery engine
WO2006110684A3 (en) System and method for searching for a query
EP1638032A3 (en) Method, system and apparatus for maintaining user privacy in a knowledge interchange system
WO2001042952A3 (en) Method and system for constructing personalized result sets
WO2003065248A3 (en) Retrieving matching documents by queries in any national language
EP0964344A3 (en) Method of and apparatus for forming an index, use of an index and a storage medium
EP1122015A3 (en) Electronic manual search system, searching method, and storage medium
Ghwanmeh et al. Enhanced algorithm for extracting the root of Arabic words
WO2008022150A3 (en) Method and apparatus for identifying and classifying query intent
EP2264619A3 (en) Search engine for video and graphics
WO2002069203A3 (en) Method for identifying term importance to a sample text using reference text
Crimp et al. Refining query expansion terms using query context
Katzer The development of a semantic differential to assess users' attitudes towards an on‐line interactive reference retrieval system
Sousa et al. Exploring different methods for solving analogies with portuguese word embeddings
WO2005045696A3 (en) Method for providing a flat view of a hierarchical namespace without requiring unique leaf names
EA200100467A1 (en) METHOD OF SEARCHING FOR STORAGE OF ELECTRON DOCUMENTS AND THEIR FRAGMENTS ON STORAGE DEVICES
Uzêda et al. Evaluation of automatic text summarization methods based on rhetorical structure theory
Sadat et al. Query expansion techniques for the CLEF bilingual track
WO2004102305A3 (en) A method of providing website searching service and a system thereof

Legal Events

Code | Title | Description
WWE | WIPO information: entry into national phase | Ref document number: 10469445; Country of ref document: US
121 | EP: The EPO has been informed by WIPO that EP was designated in this application
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642
122 | EP: PCT application non-entry in European phase
NENP | Non-entry into the national phase | Ref country code: JP
WWW | WIPO information: withdrawn in national office | Country of ref document: JP