WO2004083999A2 - System and method for internal search using a controlled vocabulary - Google Patents


Info

Publication number
WO2004083999A2
Authority
WO
WIPO (PCT)
Prior art keywords
controlled vocabulary
search
term
terms
controlled
Prior art date
Application number
PCT/US2003/007461
Other languages
English (en)
Other versions
WO2004083999A3 (fr)
Inventor
Liu Songqiao
Original Assignee
Webchoir, Inc.
Priority date 2003-03-12 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date 2003-03-12
Publication date 2004-09-30
Application filed by Webchoir, Inc.
Publication of WO2004083999A2
Publication of WO2004083999A3


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3322 Query formulation using system suggestions
    • G06F16/3323 Query formulation using system suggestions using document space presentation or visualization, e.g. category, hierarchy or range presentation and selection

Definitions

  • The present invention relates to the use of controlled vocabulary data to facilitate and improve an Internet or database search.
  • A controlled vocabulary is a tool for fields that need to describe numerous and varied items precisely. For example, a museum can use a controlled vocabulary to index the objects in its collection.
  • A controlled vocabulary identifies the terms used in a particular field and defines relationships between those terms.
  • A controlled vocabulary does not contain every term that may be used in a field. Instead, it is a limited set of relevant terms used in that field.
  • A controlled vocabulary is thus a collection of descriptive terms. Examples of controlled vocabularies include thesauri, subject headings and classifications.
  • A major purpose of a controlled vocabulary is to match the terms a researcher brings to the system with the terms used by an indexer. Whenever an item has alternative names, the indexer must choose one for indexing and provide an entry under each of the others indicating the preferred term. For example, a library controlled vocabulary may index all full-length works of fiction as "novels"; someone who searches for "mysteries" must then be told to look under "novels" instead. This is no problem if the two words are true synonyms, and even if they differ slightly in meaning it may still be preferable to choose one and index everything under it. The controlled vocabulary therefore indicates synonyms for its terms.
  • A controlled vocabulary also describes other types of relationships between words.
  • In particular, a controlled vocabulary often organizes terms hierarchically.
  • In the present example, the term "novels" can be a subset of the term "works of fiction" (which might also include "poems" and "short stories").
  • The controlled vocabulary specifies where in the hierarchy each term falls, so that broader terms and narrower terms can be identified.
  • Other types of relationships can also be specified by the controlled vocabulary.
  • The present invention overcomes the limitations of the prior art by providing a system and method of generating a search request for a data repository using controlled vocabularies.
  • The method includes the steps of invoking a command on a graphical user interface to activate a controlled vocabulary display program containing a controlled vocabulary; selecting at least one term of interest in the controlled vocabulary; retrieving additional terms related to the term of interest from the controlled vocabulary by a filter means selected by a user; and formulating a search query by combining the selected term and the related terms according to the searcher's preferences.
  • In one embodiment, the data repository is the Internet.
  • In that case, the query is a URL constructed from the selected term and additional terms to improve precision or increase recall.
  • Figure 1 is a block diagram of a general purpose computer system that can implement the method of the present invention.
  • Figure 2 illustrates a display window of a graphical user interface used to display the terms of a controlled vocabulary.
  • Figure 3 illustrates the search pane portion of the display window of Figure 2.
  • Referring to FIG. 1, a general purpose computer system 110 that can be used to implement the method of the present invention is illustrated.
  • Computer system 110 includes a central processing unit (CPU) 111, a read-only memory (ROM) 112, a random access memory (RAM) 113, expansion RAM 114, input/output (I/O) circuitry 115, a display assembly 116, an input device 117, and an expansion bus 120.
  • Computer system 110 may also optionally include a mass storage unit 119, such as a disk drive unit or nonvolatile memory such as flash memory, and a real-time clock 121. Some type of mass storage 119 is generally considered desirable.
  • However, mass storage 119 can be eliminated by providing enough RAM 113 and expansion RAM 114 to store user application programs and data.
  • In that case, volatile RAMs 113 and 114 can optionally be provided with a backup battery to prevent the loss of data when computer system 110 is turned off.
  • Nevertheless, it is generally desirable to have some type of long-term mass storage 119, such as a commercially available hard disk drive, nonvolatile memory such as flash memory, battery-backed RAM, PC data cards, or the like.
  • The thesaurus data used by the present invention will generally be found on mass storage device 119.
  • In operation, information is input into computer system 110 by typing on a keyboard, manipulating a mouse or trackball, or "writing" on a tablet or on a position-sensing screen of display assembly 116.
  • CPU 111 then processes the data under control of an operating system and an application program, such as a program performing the steps of the inventive method described above, stored in ROM 112 and/or RAM 113.
  • CPU 111 then typically produces data that is output to display assembly 116 to produce appropriate images on its screen.
  • Suitable computers for implementing the present invention are well known in the art and may be obtained from various vendors.
  • The preferred embodiment of the present invention is intended to be implemented on a personal computer system or web server.
  • Other suitable computers include mainframe computers, multiprocessor computers and workstations.
  • The program of the present invention is stored on mass storage device 119 until a user of computer system 110 initiates its operation. Portions of the program may then be transferred to RAM 113 while the program executes.
  • Alternatively, the program of the present invention may reside in RAM 113 or ROM 112.
  • Figure 2 shows a display window 150 of a GUI that contains the elements of the controlled vocabulary.
  • The sample controlled vocabulary illustrated in Figure 2 relates to the general field of mythology. It will be apparent to those of skill in the art that this example is given for illustrative purposes only, and that a controlled vocabulary for any conceivable subject can be used with equal effectiveness.
  • The controlled vocabulary elements 151, 152, 153, 154, etc. are displayed in display pane 160. As shown in Figure 2, the terms are arranged in a hierarchical format.
  • Display pane 170 displays the terms of the controlled vocabulary that are related to the particular term of interest, as described more fully below. The relationships of further terms to the selected term are also shown.
  • The controlled vocabulary terms are not limited to a hierarchical display. In an alternative embodiment, the terms are organized alphabetically. Other arrangements, such as by string length or chronologically (e.g., by date of creation), can be used with equal effectiveness.
  • The term "Major Gods" 152 is organized as a narrower term of "Mythology" 151 and is therefore indented beneath it in the hierarchical tree in display pane 160. Further indented beneath "Major Gods" are a number of terms representing specific gods, including the term "Ares" 154.
  • The user then selects a term of interest to be searched in a data repository, such as the Internet or a proprietary database.
  • The user selects the term of interest by navigating the hierarchy with standard tools such as cursor keys or a pointing device.
  • A Boolean keyword search can also be used to locate a term. In the example of Figure 2, the term "Ares" 154 has been selected and is highlighted.
  • Computer system 110 then retrieves the data file for the selected term and displays the detailed information for that term in display pane 170.
  • A method of retrieving controlled vocabulary data in the form of thesaurus data, as used in the present invention, is described in co-pending patent application serial number , assigned to the assignee of the present invention.
  • The user can therefore see the descriptor to be searched in its hierarchical context and view the descriptor's details when moving from one descriptor to another. As a result, the user always knows exactly what is being searched, with no guesswork and no ambiguity.
  • Display window 150 also includes a search pane 180, shown in more detail in Figure 3 according to a preferred embodiment of the present invention.
  • Search pane 180 includes a Website drop-down list 181 listing the available search engines.
  • In the example shown, the search engine "GOOGLE" has been selected.
  • Other search engines, such as Yahoo, Alta Vista, Goto or DogPile, can be used with equal effectiveness.
  • The user can also add any desired commercial search engine or custom Internet search tool.
  • A Language drop-down list 182 permits searching in a specific language; in the present example, the default setting is "All Languages". Additional check boxes, such as Broader Term 183 and/or subject Category 184, add (AND) further terms to the query when checked and can improve the precision of the search.
  • The search results can be retrieved from the search engine and displayed in a pane similar to that of Figure 2, including hyperlinks that give direct access to each result. A search using only the word "Ares" and the selected engine is a conventional state-of-the-art search: in an experiment with the GOOGLE search engine, the search term "Ares" returned some 636,000 "hits", clearly an unsatisfactory result. The present invention refines this search by ANDing the broader term of "Ares" to the search query. A GOOGLE search with the refined query returned 325 pages, most of which were relevant.
  • The system generates a query for the search engine by using the selected term and any related terms indicated in the search pane to construct a URL for the Internet search engine.
  • The present invention can also be used to broaden a search that returns few hits.
  • For this purpose, controlled vocabularies typically include synonyms for each term in the vocabulary.
  • In another example, a conventional search on the term "Ares" yielded no documents.
  • Adding the synonym (UF or ALT) "Mars" produced 39 relevant pages. Accordingly, a system and method of using controlled vocabulary data to improve a database search has been described. It is to be understood that the foregoing description has been made with respect to specific embodiments for illustrative purposes only. The overall scope of the present invention is limited only by the following claims.
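The query construction described above (ANDing a broader term to narrow a search, ORing a synonym to broaden it, then encoding the result into a search-engine URL) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `ENGINES` table, the helper names, the Google endpoint `https://www.google.com/search` with its `q` parameter, and the explicit AND/OR Boolean syntax are all assumptions, and real engines differ in which operators they honor.

```python
from urllib.parse import urlencode

# Assumed endpoint table for the engines in the Website list (181).
# Only Google's public search URL is filled in; this is illustrative.
ENGINES = {"GOOGLE": "https://www.google.com/search"}


def quoted(term: str) -> str:
    """Quote a descriptor so multi-word terms such as Major Gods stay a phrase."""
    return f'"{term}"'


def build_query(term: str, broader_terms=(), synonyms=()) -> str:
    """Combine the selected term with related controlled-vocabulary terms.

    Synonyms (UF/ALT) are ORed with the term to increase recall;
    broader terms are ANDed onto the result to improve precision.
    """
    query = " OR ".join(quoted(t) for t in (term, *synonyms))
    if synonyms and broader_terms:
        query = f"({query})"  # group the OR clause before ANDing
    for bt in broader_terms:
        query += f" AND {quoted(bt)}"
    return query


def build_search_url(engine: str, query: str) -> str:
    """Encode the query as the engine's q parameter (spaces become '+')."""
    return ENGINES[engine] + "?" + urlencode({"q": query})


if __name__ == "__main__":
    narrow = build_query("Ares", broader_terms=["Major Gods"])
    broad = build_query("Ares", synonyms=["Mars"])
    print(build_search_url("GOOGLE", narrow))
    print(build_search_url("GOOGLE", broad))
```

For the Figure 2 example, where "Major Gods" is the broader term of "Ares", `build_query("Ares", broader_terms=["Major Gods"])` yields `"Ares" AND "Major Gods"`, which `urlencode` turns into `q=%22Ares%22+AND+%22Major+Gods%22`. Keeping query building separate from URL encoding lets other engines be added to the table without touching the Boolean logic.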

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a method of generating a search request for a data repository, comprising the steps of invoking a command on a graphical user interface to activate a controlled vocabulary display program containing a controlled vocabulary, selecting at least one term of interest in the controlled vocabulary, retrieving additional terms related to the terms of interest from the controlled vocabulary by means of a filter element chosen by a user, and formulating a search query by combining the selected term and the related terms according to the searcher's preferences.
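The filtering step of the method, retrieving terms related to the term of interest according to the relationship types a user chooses, can be sketched with a small in-memory thesaurus. This is a minimal sketch under stated assumptions: the `Term` and `ControlledVocabulary` classes, the BT/NT/UF filter codes, and the `SAMPLE` entries are hypothetical illustrations modeled on the mythology example of Figure 2, not the patent's data format. Treating "Mars" as a non-preferred synonym of "Ares" follows the synonym example in the description.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One descriptor in a hypothetical in-memory controlled vocabulary."""
    name: str
    broader: list[str] = field(default_factory=list)   # BT: broader terms
    narrower: list[str] = field(default_factory=list)  # NT: narrower terms
    used_for: list[str] = field(default_factory=list)  # UF: non-preferred synonyms


class ControlledVocabulary:
    def __init__(self, terms: list[Term]) -> None:
        self.terms = {t.name: t for t in terms}
        # Map each non-preferred word to its preferred descriptor
        # (e.g. a search for "Mars" is redirected to "Ares").
        self.preferred = {alt: t.name for t in terms for alt in t.used_for}

    def lookup(self, word: str) -> Term:
        """Resolve a word to its preferred descriptor; KeyError if unknown."""
        return self.terms[self.preferred.get(word, word)]

    def related(self, word: str, filters: set[str]) -> list[str]:
        """Return related terms for the relationship types the user checked."""
        term = self.lookup(word)
        by_type = {"BT": term.broader, "NT": term.narrower, "UF": term.used_for}
        # Iterate in a fixed BT, NT, UF order so results are deterministic.
        return [t for code in ("BT", "NT", "UF") if code in filters
                for t in by_type[code]]


# Sample entries modeled on the mythology example of Figure 2 (hypothetical).
SAMPLE = ControlledVocabulary([
    Term("Mythology", narrower=["Major Gods"]),
    Term("Major Gods", broader=["Mythology"], narrower=["Ares"]),
    Term("Ares", broader=["Major Gods"], used_for=["Mars"]),
])

if __name__ == "__main__":
    print(SAMPLE.related("Ares", {"BT"}))  # broader term, used to narrow a search
    print(SAMPLE.related("Ares", {"UF"}))  # synonym, used to broaden a search
```

Resolving non-preferred words through the `preferred` map mirrors the "see novels instead of mysteries" behavior described for library vocabularies, so a user who starts from "Mars" still reaches the "Ares" descriptor and its relationships.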

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US36389503P 2003-03-12 2003-03-12
US60/363,895 2003-03-12

Publications (2)

Publication Number Publication Date
WO2004083999A2 (fr) 2004-09-30
WO2004083999A3 (fr) 2009-06-18

Family

ID=33029620

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2003/007461 WO2004083999A2 (fr) 2003-03-12 2003-03-12 System and method for internal search using a controlled vocabulary

Country Status (1)

Country Link
WO (1) WO2004083999A2 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6385602B1 (en) * 1998-11-03 2002-05-07 E-Centives, Inc. Presentation of search results using dynamic categorization
US6564213B1 (en) * 2000-04-18 2003-05-13 Amazon.Com, Inc. Search query autocompletion

Also Published As

Publication number Publication date
WO2004083999A3 (fr) 2009-06-18

Similar Documents

Publication Publication Date Title
US7111237B2 (en) Blinking annotation callouts highlighting cross language search results
US7958153B2 (en) Systems and methods for employing an orthogonal corpus for document indexing
EP2546766B1 (fr) Dynamic web browser search box
US6101503A (en) Active markup--a system and method for navigating through text collections
US8229730B2 (en) Indexing role hierarchies for words in a search index
JP3282937B2 (ja) Information retrieval method and system
US7668887B2 (en) Method, system and software product for locating documents of interest
US20060122997A1 (en) System and method for text searching using weighted keywords
US8886642B2 (en) Method and system for unified searching and incremental searching across and within multiple documents
US20030225756A1 (en) System and method for internet search using controlled vocabulary data
US20120265763A1 (en) System and method for dynamically configuring content-driven relationships among data elements
EP1986113A2 (fr) Method for retrieving information items
US8612431B2 (en) Multi-part record searches
WO2004083999A2 (fr) System and method for internal search using a controlled vocabulary
EP2181403B1 (fr) Indexing of role hierarchies for words in a search index
WO2000062198A2 (fr) Systems and methods of using an orthogonal corpus for indexing documents
Zhou et al. CMedPort: Intelligent searching for Chinese medical information
MacDougall Signposts on the information superhighway: indexes and access
Carson et al. Acrobat for AEC Knowledge Management
Smith et al. Enhancing end-user searching on HealthInsite
Cooper et al. Query-Free Information Retrieval: Active Markup and Summarization
Maness Digital Commons@ DU
Romanik Next Generation Information Retrieval
JP2003216644A (ja) Information retrieval device and computer-readable storage medium

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): CA CN JP MX RU

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR

121 EP: The EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase in:

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP