WO2001039038A1 - Method and device for retrieving information - Google Patents

Method and device for retrieving information

Info

Publication number
WO2001039038A1
Authority
WO
Grant status
Application
Patent type
Prior art date
Application number
PCT/BE2000/000140
Other languages
French (fr)
Inventor
Nicolas Poncelet
Original Assignee
Datastat
Priority date
1999-11-25
Filing date
2000-11-24
Publication date
2001-05-31

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30634 Querying
    • G06F17/30657 Query processing
    • G06F17/30675 Query execution
    • G06F17/30684 Query execution using natural language analysis

Abstract

The invention concerns a method for retrieving information from a database, comprising the steps of: a) reading at least one keyword selected by the user; b) comparing the keyword with the data of the database; c) generating a list of descriptive elements; and d) representing the list of descriptive elements. The method further comprises the steps of: e) generating a list of keywords, each keyword of the list being associated with at least one descriptive element of the list of descriptive elements; and f) representing the list of keywords, indicating for each keyword its frequency of assignment in the list of descriptive elements. The invention also concerns a system for implementing said method.

Description

"Method and system for information collection."

The present invention relates to a method for collecting information from a database, comprising the steps of: reading at least one keyword selected by the user; comparing the keyword with the data in the database; generating a list of descriptive elements, the selected keyword being associated with each of the descriptive elements; and representing the list of descriptive elements.

Such a method is well known from Internet search engines such as Yahoo® and Alta Vista®. When users want information on a certain topic, they enter a keyword or select a keyword corresponding to a search category. The results are then presented to the user as a list of descriptive elements relevant to the requested topic, each descriptive element indicating information such as the site name, the URL, a brief description of the page and the date of last modification.

The problem with the known methods, when used on large databases such as those available on the Internet, is that a search on a given topic can return a very large number of results. Take for example the word "patent" in the Alta Vista® search engine: entering this word as a keyword returns tens of thousands of web pages. The user can then either browse the results or refine the search by adding one or more keywords to the query, in combination with the keyword initially chosen. This can make the search long, and the result may still not correspond to the desired information.

An object of the present invention is to provide an alternative to the known methods and systems that allows the user to obtain the desired information more efficiently.

This object is achieved, in the method according to the invention, by the further steps of generating a list of keywords, each keyword in the list of keywords being associated with at least one descriptive element of the list of descriptive elements; and representing the list of keywords, indicating for each keyword its frequency of assignment in the list of descriptive elements.
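Steps a) to f) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each descriptive element is taken to be a plain string, the keywords associated with an element are assumed to be its distinct lowercased words, and a keyword's frequency is assumed to be the number of descriptive elements it is associated with. The function names `words`, `select_elements` and `keyword_list` are hypothetical.

```python
import re
from collections import Counter


def words(element: str) -> set[str]:
    """Keywords associated with a descriptive element (assumed: its distinct lowercased words)."""
    return set(re.findall(r"[a-z0-9']+", element.lower()))


def select_elements(database: list[str], keyword: str) -> list[str]:
    """Steps a) to c): keep the descriptive elements associated with the selected keyword."""
    key = keyword.lower()
    return [element for element in database if key in words(element)]


def keyword_list(elements: list[str]) -> Counter:
    """Step e): for each keyword, count the descriptive elements it is associated with."""
    freq: Counter = Counter()
    for element in elements:
        freq.update(words(element))  # a set, so a keyword counts at most once per element
    return freq


if __name__ == "__main__":
    database = ["I like the taste", "It is too expensive", "It is very nice", "The taste is excellent"]
    hits = select_elements(database, "taste")
    for element in hits:  # step d): represent the descriptive elements
        print(element)
    for keyword, n in keyword_list(hits).most_common():  # step f): keywords with their frequencies
        print(keyword, n)
```

Counting each keyword once per element keeps the displayed number equal to the number of results the user would get by selecting that keyword next.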

By producing and representing a list of keywords, the method gives the user a better overview of the topic. In particular, if the user wishes to refine the search, he is better guided in selecting new keywords, which increases the chances of obtaining the desired results. Because the frequency of each keyword is indicated, the user can estimate in advance the number of results he will obtain.

In a preferred embodiment, the method further comprises the steps of generating a new list of descriptive elements, each descriptive element of the new list being associated with the selected keyword and with another keyword selected by the user from the list of keywords; representing the new list of descriptive elements; generating a new list of keywords, each keyword of the new list being associated with at least one descriptive element of the new list of descriptive elements; and representing the new list of keywords. In particular, these steps are applied repeatedly, so that the user refines the search step by step by combining the keywords chosen at each stage. Advantageously, the list (respectively the new list) of keywords is represented in order of frequency, preferably in order of decreasing frequency. In many cases, the keywords with the highest frequency turn out to be the most relevant.
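The repeated refinement (steps g) to j)) can be sketched as below. It assumes that the keywords of an element are its distinct lowercased words and that the new list keeps only the elements associated with every keyword selected so far. The function `refine`, the alphabetical tie-break and the automatic choice of the top keyword (standing in for the user's choice) are all hypothetical.

```python
import re
from collections import Counter


def words(element: str) -> set[str]:
    # Assumption: an element's keywords are its distinct lowercased words.
    return set(re.findall(r"[a-z0-9']+", element.lower()))


def refine(database: list[str], selected: list[str]) -> tuple[list[str], list[tuple[str, int]]]:
    """One pass of steps g) to j) for all keywords selected so far.

    Returns the new list of descriptive elements (each associated with every
    selected keyword) and the new keyword list in decreasing frequency,
    leaving out the keywords already selected.
    """
    chosen = {k.lower() for k in selected}
    elements = [e for e in database if chosen <= words(e)]
    freq: Counter = Counter()
    for e in elements:
        freq.update(words(e) - chosen)
    # Decreasing frequency; ties broken alphabetically (an assumption).
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return elements, ranked


if __name__ == "__main__":
    database = ["I like the taste", "It is too expensive", "It is very nice", "The taste is excellent"]
    selected = ["is"]
    while True:  # repeat the steps, as in the preferred embodiment
        elements, ranked = refine(database, selected)
        print(selected, elements, ranked)
        # Stop when the most frequent keyword would no longer narrow the list.
        if not ranked or ranked[0][1] == len(elements):
            break
        selected.append(ranked[0][0])  # stands in for the user's choice
```

Each pass strictly shrinks the list of descriptive elements, so the demonstration loop always terminates.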

According to an alternative, the list (respectively the new list) of keywords is represented in alphabetical order. This solution is useful when the user is looking for specific words. To group descriptive elements having the same meaning in a database, a number of descriptive elements of the list are classified in a single category.
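The two orderings can be illustrated with a small hypothetical helper, `order_keywords`, which takes keyword frequencies and returns them either by decreasing frequency (with an assumed alphabetical tie-break) or in plain alphabetical order:

```python
from collections import Counter


def order_keywords(freq: Counter, order: str = "frequency") -> list[tuple[str, int]]:
    """Represent a keyword list by decreasing frequency or in alphabetical order."""
    if order == "frequency":
        # Preferred embodiment: decreasing frequency (alphabetical tie-break is an assumption).
        return sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    if order == "alphabetical":
        # Alternative: useful when the user is looking for a specific word.
        return sorted(freq.items())
    raise ValueError(f"unknown order: {order!r}")


freq = Counter({"taste": 2, "like": 1, "excellent": 1})
print(order_keywords(freq))                  # [('taste', 2), ('excellent', 1), ('like', 1)]
print(order_keywords(freq, "alphabetical"))  # [('excellent', 1), ('like', 1), ('taste', 2)]
```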

The invention also relates to a system for collecting information from a database, comprising: means arranged to read at least one keyword selected by the user; means arranged to compare the keyword with the data from the database; and means arranged to generate a list of descriptive elements, the selected keyword being associated with each of the descriptive elements. The system according to the invention is characterized in that it further comprises means arranged to generate a list of keywords, each keyword being associated with at least one descriptive element of the list of descriptive elements.

In particular, the system comprises a filter arranged to remove predetermined keywords from the list of keywords. This eliminates irrelevant keywords such as "the", "a", "an", "to", etc. from the list of keywords. Advantageously, the filter is further arranged to delete keywords having fewer characters than a certain limit. For example, keywords of 1, 2 or 3 characters are not displayed, because they are most likely irrelevant.

Details of the invention are described hereinafter with reference to the drawings, which illustrate an operational example of the method according to the invention.

Figure 1 illustrates the step of selecting a data bank.

Figure 2 shows the descriptive elements of the database selected from the list of Figure 1.

Figure 3 shows the list of keywords associated with the descriptive elements of Figure 2.

Figure 4 illustrates a new list of descriptive elements associated with the keyword selected from the list of Figure 3.

Figure 5 illustrates a new list of keywords associated with the new list of descriptive elements of Figure 4.

Figure 6 illustrates a new list of descriptive elements associated with the keywords selected in the previous steps.

The invention aims to provide a method and system for efficiently collecting information from a database. The principle of operation is explained below using some examples. Consider a database containing the following four descriptive elements:

• "I like the taste"

• "It is too expensive"

• "It is very nice"

• "The taste is excellent"

According to the invention, the user can request a list of keywords associated with these sentences, i.e. with these descriptive elements. Here, the keywords are the words that appear in the descriptive elements; in general, each descriptive element can be associated with a number of keywords. The resulting list of keywords is as follows:

• "is" 3

• "taste" 2

• "it" 2

• "the" 2
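Assuming the frequency counts the descriptive elements associated with a keyword, and that words are compared without regard to case, the list above follows from the definition below, where D = {d_1, d_2, d_3, d_4} are the four descriptive elements in the order given and W(d) is the set of words of element d:

```latex
\begin{aligned}
f(k) &= \bigl|\{\, d \in D : k \in W(d) \,\}\bigr| \\
f(\text{is}) &= |\{d_2, d_3, d_4\}| = 3 \\
f(\text{taste}) &= |\{d_1, d_4\}| = 2 \\
f(\text{it}) &= |\{d_2, d_3\}| = 2 \\
f(\text{the}) &= |\{d_1, d_4\}| = 2
\end{aligned}
```

Every other word (i, like, too, expensive, very, nice, excellent) appears in a single element, so f = 1 for each of them; none of these words appears in the list above. Because no word is repeated within a sentence, counting raw occurrences instead would give the same values here.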

When the user selects one of the keywords, such as "taste", he obtains a new list of descriptive elements, each of which is associated with the keyword "taste":

• "I like the TASTE"

• "The TASTE is excellent"

After reading these two sentences, one sees that they have the same meaning. In a next step, they are therefore classified in the same category, called e.g. "Like the taste". The method thus helps the user to filter the descriptive elements and to determine more easily which descriptive elements have the same meaning and should be grouped under a single category. Grouped descriptive elements are removed from the list.

The process is then repeated until no two descriptive elements having the same meaning remain.
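The grouping step can be sketched as a small bookkeeping helper. Deciding that two sentences mean the same thing is left to the user, as in the description; the hypothetical `classify` function below only records the user's grouping under a category name and returns the list without the grouped elements, so the cycle of selecting a keyword and grouping can be repeated on what remains.

```python
def classify(remaining: list[str], categories: dict[str, list[str]],
             name: str, members: list[str]) -> list[str]:
    """Group descriptive elements under one category and return the list without them.

    Deciding which elements share a meaning is left to the user; this helper
    only records the grouping and removes the grouped elements.
    """
    missing = [m for m in members if m not in remaining]
    if missing:
        raise ValueError(f"not in the current list: {missing}")
    categories.setdefault(name, []).extend(members)
    return [e for e in remaining if e not in members]


remaining = ["I like the taste", "It is too expensive", "It is very nice", "The taste is excellent"]
categories: dict[str, list[str]] = {}
# After selecting "taste", the user judges these two sentences to mean the same thing.
remaining = classify(remaining, categories, "Like the taste",
                     ["I like the taste", "The taste is excellent"])
print(remaining)   # ['It is too expensive', 'It is very nice']
print(categories)  # {'Like the taste': ['I like the taste', 'The taste is excellent']}
```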

Another example is explained below with reference to Figures 1 to 6. In a first step, illustrated in Figure 1, the user selects one of the available databases: Q15, Q20 or Q6A. In this example, the user selects database Q15. This step can be regarded as the selection of a first keyword. The descriptive elements of the selected database are then displayed on screen, as shown in Figure 2. In the example illustrated, each descriptive element consists of a series of words. In an Internet search engine application, each descriptive element comprises, for example, several sets of words, such as the site title, a brief description (for example up to 60 characters), a more elaborate description (up to 256 characters), the URL, etc.

The user can then call up a screen showing the keywords associated with the descriptive elements. Preferably, the keywords are displayed in descending order of frequency, as shown in Figure 3. The user thus discovers that the word "lack" is associated 28 times with the descriptive elements. The list also contains irrelevant words such as "and", "in", "on" and "to". These can be removed from the list of keywords by entering them in a word filter ("excluded words").
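The word filter of claims 9 and 10 can be sketched as follows, assuming keyword frequencies are held in a plain dictionary. `filter_keywords` and `EXCLUDED_WORDS` are hypothetical names; the excluded words are those mentioned in the description, and the default limit of 4 characters reflects the example of hiding keywords of 1, 2 or 3 characters.

```python
# Hypothetical excluded-word list, seeded with the words named in the description.
EXCLUDED_WORDS = {"the", "a", "an", "and", "in", "on", "to"}


def filter_keywords(freq: dict[str, int], excluded: set[str] = EXCLUDED_WORDS,
                    min_length: int = 4) -> dict[str, int]:
    """Remove predetermined keywords and keywords shorter than min_length.

    With min_length=4, keywords of 1, 2 or 3 characters are hidden.
    """
    return {
        keyword: n
        for keyword, n in freq.items()
        if keyword.lower() not in excluded and len(keyword) >= min_length
    }


# Keyword list from the four-sentence example.
print(filter_keywords({"is": 3, "taste": 2, "it": 2, "the": 2}))  # {'taste': 2}
```

Passing `min_length=1` applies only the excluded-word list, which here removes "the" and keeps "is", "taste" and "it".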

If the user selects the keyword "lack", he obtains a new list of descriptive elements (Figure 4). This list contains the descriptive elements of Figure 2 with which the word "lack" is associated. In this case, the word "lack" is associated with a descriptive element when it appears as such in that element. According to an alternative, it is sufficient that the keyword is associated with the descriptive element. In an Internet search engine application, this could be implemented by having each site registered with the search engine's host provide a number of keywords that are not necessarily displayed on the user's screen.
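The two ways of associating a keyword with a descriptive element can be sketched with a hypothetical `Element` record for a search-engine entry. The field names, the `hidden_keywords` field for keywords registered by the site, and the `associated` function are assumptions for illustration; the 60- and 256-character limits from the example are noted in comments but not enforced.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical search-engine descriptive element; the field names are assumptions."""
    title: str
    summary: str        # brief description, e.g. up to 60 characters
    description: str    # more elaborate description, e.g. up to 256 characters
    url: str
    hidden_keywords: set[str] = field(default_factory=set)  # registered by the site, not displayed

    def visible_words(self) -> set[str]:
        text = " ".join((self.title, self.summary, self.description))
        return set(re.findall(r"[a-z0-9']+", text.lower()))


def associated(element: Element, keyword: str, literal_only: bool = False) -> bool:
    """literal_only=True: the keyword must appear as such in the element.
    Otherwise a keyword registered for the element also counts (the alternative)."""
    k = keyword.lower()
    if k in element.visible_words():
        return True
    return not literal_only and k in {h.lower() for h in element.hidden_keywords}


site = Element("Cheese shop", "Fresh cheese delivered", "Regional cheeses.",
               "http://www.example.com/", {"dairy"})
print(associated(site, "dairy"), associated(site, "dairy", literal_only=True))  # True False
```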

As a next step, the user can request a new list of keywords (Figure 5), this time those appearing in the new list of descriptive elements. In this list, the previously selected keyword ("lack") is excluded. The user can refine the search by selecting a new keyword, such as "confidential". After this keyword is selected, a new list of descriptive elements appears on the screen (Figure 6). The keywords "taste" and "confidence" are associated with each descriptive element of the list.

The method of the invention can be applied to any type of database. In particular, it can be used to facilitate Internet searches. In this application, an advertising strip ("banner") could also be associated with a keyword, so that when the user selects a particular keyword, the advertising strip associated with that keyword is displayed.

Claims

1. A method for collecting information from a database, comprising the steps of: a) reading at least one keyword selected by the user; b) comparing the keyword with the data in the database; c) generating a list of descriptive elements, the selected keyword being associated with each of the descriptive elements; and d) representing the list of descriptive elements; characterized in that it further comprises the steps of: e) generating a list of keywords, each keyword in the list of keywords being associated with at least one descriptive element of the list of descriptive elements; and f) representing the list of keywords, indicating a frequency of assignment of each keyword in the list of descriptive elements.
2. The method of claim 1, further comprising the steps of: g) generating a new list of descriptive elements, each descriptive element of the new list being associated with the selected keyword and with another keyword selected by the user from the list of keywords; h) representing the new list of descriptive elements; i) generating a new list of keywords, each keyword of the new list being associated with at least one descriptive element of the new list of descriptive elements; and j) representing the new list of keywords.
3. The method of claim 2, wherein steps g) to j) are applied repeatedly.
4. The method according to any one of claims 1 to 3, wherein the list, respectively the new list, of keywords is represented in order of frequency.
5. The method of claim 4, wherein the list, respectively the new list, of keywords is represented in order of decreasing frequency.
6. The method according to any one of claims 1 to 3, wherein the list, respectively the new list, of keywords is represented in alphabetical order.
7. The method according to any one of the preceding claims, wherein a number of descriptive elements of the list are classified in a single category.
8. A system for collecting information from a database, comprising: a) means arranged to read at least one keyword selected by the user; b) means arranged to compare the keyword with the data from the database; and c) means arranged to produce a list of descriptive elements, the selected keyword being associated with each of the descriptive elements; characterized in that it further comprises: d) means arranged to produce a list of keywords, each keyword being associated with at least one descriptive element of the list of descriptive elements.
9. The system of claim 8, further comprising a filter arranged to remove predetermined keywords from the list of keywords.
10. The system of claim 9, wherein the filter is further arranged to delete keywords having fewer characters than a certain limit.

Priority Applications (2)

Application Number Priority Date Filing Date Title
BE9900767A BE1013153A3 (en) 1999-11-25 1999-11-25 Method and system for information collection.
BE9900767 1999-11-25

Publications (1)

Publication Number Publication Date
WO2001039038A1 (en) 2001-05-31

Family

ID=3892178

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/BE2000/000140 WO2001039038A1 (en) 1999-11-25 2000-11-24 Method and device for retrieving information

Country Status (2)

Country Link
BE (1) BE1013153A3 (en)
WO (1) WO2001039038A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7676462B2 (en) 2002-12-19 2010-03-09 International Business Machines Corporation Method, apparatus, and program for refining search criteria through focusing word definition

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0364179A2 (en) * 1988-10-11 1990-04-18 NeXT COMPUTER, INC. Method and apparatus for extracting keywords from text
EP0741364A1 (en) * 1995-05-01 1996-11-06 Xerox Corporation Automatic method of selecting multi-word key phrases from a document
WO1999012108A1 (en) * 1997-09-04 1999-03-11 British Telecommunications Public Limited Company Methods and/or systems for selecting data sets
US5987457A (en) * 1997-11-25 1999-11-16 Acceleration Software International Corporation Query refinement method for searching documents

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JO T C: "News article classification based on categorical points from keywords in backdata", COMPUTATIONAL INTELLIGENCE FOR MODELLING, CONTROL AND AUTOMATION. INTELLIGENT IMAGE PROCESSING, DATA ANALYSIS AND INFORMATION RETRIEVAL (CONCURRENT SYSTEMS ENGINEERING SERIES VOL.56), COMPUTATIONAL INTELLIGENCE FOR MODELLING, CONTROL AND AUTOMATION., 1999, Amsterdam, Netherlands, IOS Press, Netherlands, pages 211 - 214, XP000964992, ISBN: 90-5199-475-3 *

Also Published As

Publication number Publication date Type
BE1013153A3 (en) 2001-10-02 grant

Similar Documents

Publication Publication Date Title
Chen Knowledge management systems: a text mining perspective
US5787411A (en) Method and apparatus for database filter generation by display selection
US6711585B1 (en) System and method for implementing a knowledge management system
Callan et al. Automatic discovery of language models for text databases
Gutwin et al. Improving browsing in digital libraries with keyphrase indexes
US6944612B2 (en) Structured contextual clustering method and system in a federated search engine
US6915308B1 (en) Method and apparatus for information mining and filtering
US7555476B2 (en) Apparatus and methods for organizing and/or presenting data
US6014662A (en) Configurable briefing presentations of search results on a graphical interface
US7185001B1 (en) Systems and methods for document searching and organizing
US6721736B1 (en) Methods, computer system, and computer program product for configuring a meta search engine
US20020052928A1 (en) Computer method and apparatus for collecting people and organization information from Web sites
US20050027694A1 (en) User-friendly search results display system, method, and computer program product
US20020054167A1 (en) Method and apparatus for filtering and displaying a thought network from a thought's perspective
US6128635A (en) Document display system and electronic dictionary
US5926811A (en) Statistical thesaurus, method of forming same, and use thereof in query expansion in automated text searching
US20070255735A1 (en) User-context-based search engine
US20050198070A1 (en) Method and system for compression indexing and efficient proximity search of text data
US7505956B2 (en) Method for classification
US6067552A (en) User interface system and method for browsing a hypertext database
US7013300B1 (en) Locating, filtering, matching macro-context from indexed database for searching context where micro-context relevant to textual input by user
US20050060304A1 (en) Navigational learning in a structured transaction processing system
US20040088312A1 (en) System and method for determining community overlap
US6044365A (en) System for indexing and retrieving graphic and sound data
Chen et al. Internet browsing and searching: User evaluation of category map and concept space techniques

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ CZ DE DE DK DK DM DZ EE EE ES FI FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase