WO2002069203A3 - Method for identifying term importance to a sample text using reference text
- Publication number
- WO2002069203A3
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sample
- text
- terms
- frequency
- sample text
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
Abstract
A method and apparatus for identifying important terms in a sample text. A frequency of occurrence of terms in the sample text (sample frequency) is compared to a frequency of occurrence of those terms in a reference text (reference frequency). Terms occurring with higher frequency in the sample text than in the reference text are considered important to the sample text. A difference between the respective sample and reference frequencies of a term may be used to determine an importance score. Terms can be ranked and/or added to an affinity set as a function of importance score or rank. When there are insufficient terms for determining a sample frequency, those terms may be used in a search query to identify documents for use as sample text to determine sample frequencies. The important terms may be used for document summarization, query refinement, cross-language translation, and cross-language query expansion.
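The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the regex tokenizer, the use of relative frequencies (counts divided by text length), and the hypothetical helper names `relative_frequencies`, `importance_scores`, and `affinity_set` with their `min_score` and `max_terms` parameters are all assumptions made for illustration. The abstract itself states only that a difference between sample and reference frequencies may be used as an importance score and that terms may be ranked or added to an affinity set.

```python
from collections import Counter
import re


def relative_frequencies(text):
    """Map each lowercase word in `text` to its share of all word tokens."""
    # Assumption: a simple regex tokenizer; the source does not specify one.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    return {term: count / total for term, count in Counter(tokens).items()}


def importance_scores(sample_text, reference_text):
    """Score each sample term as sample frequency minus reference frequency."""
    sample = relative_frequencies(sample_text)
    reference = relative_frequencies(reference_text)
    # Terms absent from the reference text get a reference frequency of 0.
    return {term: freq - reference.get(term, 0.0) for term, freq in sample.items()}


def affinity_set(scores, min_score=0.0, max_terms=10):
    """Rank terms by descending score and keep those above `min_score`."""
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, score in ranked if score > min_score][:max_terms]


sample = "the servo motor drives the servo arm"
reference = "the cat sat on the mat and the dog sat on the rug"
scores = importance_scores(sample, reference)
top_terms = affinity_set(scores, max_terms=3)
print(top_terms)
```

In this toy input, "the" is relatively more frequent in the reference text than in the sample, so its score is negative and it is excluded, while "servo" ranks first. A production system would presumably use a much larger reference corpus and might normalize or smooth the frequencies differently.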
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/469,445 US20040098385A1 (en) | 2002-02-26 | 2002-02-26 | Method for indentifying term importance to sample text using reference text |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US27196201P | 2001-02-28 | 2001-02-28 | |
US60/271,962 | 2001-02-28 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002069203A2 (en) | 2002-09-06 |
WO2002069203A3 (en) | 2003-09-04 |
Family
ID=23037833
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2002/006036 WO2002069203A2 (en) | 2001-02-28 | 2002-02-26 | Method for identifying term importance to a sample text using reference text |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2002069203A2 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6545836B1 (en) | 1999-11-12 | 2003-04-08 | Acorn Technologies, Inc. | Servo control apparatus and method using absolute value input signals |
CN108259980A (en) * | 2017-12-29 | 2018-07-06 | 伟乐视讯科技股份有限公司 | A kind of dynamic signal search method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0889419A2 (en) * | 1997-07-02 | 1999-01-07 | Matsushita Electric Industrial Co., Ltd. | Keyword extracting system and text retrieval system using the same |
US6018733A (en) * | 1997-09-12 | 2000-01-25 | Infoseek Corporation | Methods for iteratively and interactively performing collection selection in full text searches |
- 2002-02-26: WO application PCT/US2002/006036 (WO2002069203A2), not active, application discontinued
Non-Patent Citations (3)
Title |
---|
CARROLL J M ET AL: "COMPUTER SELECTION OF KEYWORDS USING WORD-FREQUENCY ANALYSIS", AMERICAN DOCUMENTATION, ASIS. WASHINGTON, US, vol. 20, no. 3, July 1969 (1969-07-01), pages 227 - 233, XP001088701 * |
G. SALTON, M. J. MCGILL: "Introduction to Modern Information Retrieval", MCGRAW-HILL, NEW YORK, XP002242139 * |
H. P. EDMUNDSON, R. E. WYLLYS: "Automatic Abstracting and Indexing - Survey and Recommendations", COMMUNICATIONS OF THE ACM, vol. 4, no. 5, May 1961 (1961-05-01), New York, pages 226 - 234, XP002242138 * |
Also Published As
Publication number | Publication date |
---|---|
WO2002069203A2 (en) | 2002-09-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Vechtomova et al. | A study of the effect of term proximity on query expansion | |
WO2005033967A3 (en) | Systems and methods for searching using queries written in a different character-set and/or language from the target pages | |
WO2001022251A3 (en) | Apparatus for and method of searching | |
WO2005091825A3 (en) | Keyword recommendation for internet search engines | |
EP1524610A3 (en) | Systems and methods for performing electronic information retrieval | |
WO2002035316A3 (en) | Method and apparatus for query and analysis | |
WO2007024769A3 (en) | Semantic discovery engine | |
WO2006110684A3 (en) | System and method for searching for a query | |
EP1638032A3 (en) | Method, system and apparatus for maintaining user privacy in a knowledge interchange system | |
WO2001042952A3 (en) | Method and system for constructing personalized result sets | |
WO2003065248A3 (en) | Retrieving matching documents by queries in any national language | |
EP0964344A3 (en) | Method of and apparatus for forming an index, use of an index and a storage medium | |
EP1122015A3 (en) | Electronic manual search system, searching method, and storage medium | |
Ghwanmeh et al. | Enhanced algorithm for extracting the root of Arabic words | |
WO2008022150A3 (en) | Method and apparatus for identifying and classifying query intent | |
EP2264619A3 (en) | Search engine for video and graphics | |
WO2002069203A3 (en) | Method for identifying term importance to a sample text using reference text | |
Crimp et al. | Refining query expansion terms using query context | |
Katzer | The development of a semantic differential to assess users' attitudes towards an on‐line interactive reference retrieval system | |
Sousa et al. | Exploring different methods for solving analogies with portuguese word embeddings | |
WO2005045696A3 (en) | Method for providing a flat view of a hierarchical namespace without requiring unique leaf names | |
EA200100467A1 (en) | METHOD OF SEARCHING FOR STORAGE OF ELECTRON DOCUMENTS AND THEIR FRAGMENTS ON STORAGE DEVICES | |
Uzêda et al. | Evaluation of automatic text summarization methods based on rhetorical structure theory | |
Sadat et al. | Query expansion techniques for the CLEF bilingual track | |
WO2004102305A3 (en) | A method of providing website searching service and a system thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | WWE | WIPO information: entry into national phase | Ref document number: 10469445; Country of ref document: US |
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
 | REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
 | 122 | EP: PCT application non-entry in European phase | |
 | NENP | Non-entry into the national phase | Ref country code: JP |
 | WWW | WIPO information: withdrawn in national office | Country of ref document: JP |