WO1999021110A1 - Computerized thesaurus management - Google Patents

Computerized thesaurus management

Info

Publication number
WO1999021110A1
Authority
WO
WIPO (PCT)
Prior art keywords
thesaurus
term
terms
entry
subsets
Prior art date
Application number
PCT/US1998/022215
Other languages
English (en)
Inventor
Samir Ibrahim Abed
Coyla Bell Barry
Paul Nathan Bennett
David Gregory Gadbois
Keith Michael Goolsbey
Alberta Sprott Mckay
Kenneth S. Murray
Karen Elizabeth Pittman
Nicholas Paul Siegel
Reubin Clifton Thompson, Jr.
Original Assignee
Glaxo Group Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Glaxo Group Ltd.
Priority to JP2000517360A priority Critical patent/JP2001521225A/ja
Priority to EP98953805A priority patent/EP1023679A1/fr
Priority to AU11081/99A priority patent/AU1108199A/en
Publication of WO1999021110A1 publication Critical patent/WO1999021110A1/fr

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374: Thesaurus

Definitions

  • This invention relates to computerized information management and, more particularly, to a system and method of managing multiple thesaurus subsets in an integrated manner.
  • a thesaurus is a collection of information about a particular field.
  • “thesaurus” is understood to mean an organization of words according to concepts they convey, providing synonyms, hierarchical relationships, and the like.
  • the IBM® Thesaurus Administrator/2 and Thesaurus End User System Toolkit/2 provide a computerized system for acquiring, creating, using, and maintaining terminological data thesauri. Products such as this commonly maintain hierarchical relationships among terms, such as broader term, narrower term, part, and instance relationships. These relationships can often be helpful to users.
  • Such products inform the user that "house" is a narrower term for "building," or that a "lens" is a part of a "camera."
  • Other such products include Lexico/2TM from Project Management, Inc., Bethesda, MD; MultiTes from MultiSystems, Miami, FL; TCS [Thesaurus Construction System] from Liu-Palmer, Los Angeles, CA; BiblioTech® PRO from Comstow Information Services, Inc., Harvard, MA; and numerous others.
  • One problem not adequately addressed by any of the known systems is how to efficiently manage multiple thesaurus subsets in an integrated manner, i.e., so that subset(s) from which information is obtained is essentially transparent to the user.
  • a system for integrated access to information from a number of separate thesaurus subsets includes an input subsystem for accepting a thesaurus request; a processing subsystem that collects, from the subsets, entries that correspond to the request, treating the subsets as a single integrated thesaurus; and an output subsystem for displaying the retrieved entries.
  • the system accepts as input information about requested changes and modifications pertaining to a thesaurus request, and implements those changes in a number of thesaurus subsets.
  • a request for a change is implemented as a single, integrated request, and the performance of corresponding changes to multiple thesaurus subsets appears to the user as a single, integrated action.
  • the processing subsystem integrates the thesaurus subsets as the single integrated thesaurus using concepts to establish relationships among terms in the thesaurus subsets.
  • the processing subsystem addresses the thesaurus subsystems so as to provide integration thereof as the single integrated thesaurus. In one embodiment, the integration is performed without actually merging the thesaurus subsets. Alternatively, the processing subsystem merges the thesaurus subsets to form the single integrated thesaurus.
  • the processing system may also be configured to either designate, or to receive as input a designation, of the thesaurus subsets that are active thesauri, such that the processing system will only collect the select entries from the active thesauri.
  • the output subsystem is configured to display the select entries together, irrespective of the subset from which each of the select entries is retrieved.
  • the output subsystem may be configured to display, along with the select entries, an indication of the subset from which each of the select entries is retrieved.
  • the processing subsystem further comprises a correlator configured to establish the equality of meaning of terms or concepts across the thesaurus subsets.
  • the correlator determines the meaning of a term with reference to the other terms to which the term being correlated is related.
  • an update processing system is configured to apply an integrity constraint that serves to define a constraint on the relations between thesaurus terms, wherein the input subsystem is operatively coupled to the update processing system so as to accept the integrity constraint.
  • the integrity constraints include at least one of the set of: specifying whether a relationship is one-to-one; specifying whether the relationship is one-to-many; specifying whether the relationship is many-to-one; specifying whether a relationship is many-to-many; specifying whether a relationship is transitive; specifying whether the relationship is symmetric; specifying whether the relationship is reflexive; specifying whether the relationship is irreflexive; specifying whether the relationship is among items each of which has corresponding preferred terms.
  • the subsystem is configured to detect violations of integrity constraints in thesaurus information.
  • the output subsystem is optionally configured to perform at least one of the set of: notifying the user of the violation; executing a correction of the violation; taking responsive action to the violation.
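  • The constraint checks listed above can be illustrated with a minimal sketch. It assumes a relation is stored as a set of (term, related term) pairs; the function name `check_relation` and its flags are hypothetical and are not taken from the patent, and only three of the listed constraint kinds are covered.

```python
def check_relation(pairs, *, irreflexive=False, symmetric=False, many_to_one=False):
    """Return a list of readable violations for a relation given as (a, b) pairs.

    Hypothetical helper: each flag switches on one of the constraint kinds
    named in the text (irreflexive, symmetric, many-to-one).
    """
    pairs = set(pairs)
    violations = []
    if irreflexive:
        # No item may be related to itself.
        for a, b in sorted(pairs):
            if a == b:
                violations.append(f"irreflexive: {a} is related to itself")
    if symmetric:
        # Every (a, b) needs a matching (b, a).
        for a, b in sorted(pairs):
            if (b, a) not in pairs:
                violations.append(f"symmetric: ({b}, {a}) is missing")
    if many_to_one:
        # Each left-hand item may point at only one right-hand item.
        targets = {}
        for a, b in sorted(pairs):
            targets.setdefault(a, set()).add(b)
        for a, bs in sorted(targets.items()):
            if len(bs) > 1:
                violations.append(f"many-to-one: {a} maps to {sorted(bs)}")
    return violations


broader = {("house", "building"), ("building", "building")}
print(check_relation(broader, irreflexive=True))
```

  • Returning the violations as data, rather than raising an error, leaves the caller free to notify the user, apply a correction, or take some other responsive action, which are the three options the text names.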
  • the processing subsystem is configured to define a preferred term and a non-preferred term for a concept, in response to signaling from the input subsystem; the processing subsystem being further configured to swap the preferred term and the non-preferred term in response to further signaling from the input subsystem.
  • the processing system may swap the preferred term and the non-preferred term across one or across a plurality of the thesaurus subsets.
  • the processing system may also be configured to validate the changes across one or more thesaurus subsets.
  • Figure 1 is a block diagram showing a thesaurus manager in accordance with the present invention.
  • Figure 2 is a block diagram illustrating the knowledge base shown in Figure 1.
  • Figure 3 is a flow diagram illustrating processing in response to a user query, in accordance with the present invention.
  • Figure 4 is a flow diagram illustrating integrated maintenance of multiple thesaurus subsets, in accordance with the present invention.
  • Figure 5 is a flow diagram illustrating correlation processing, in accordance with the present invention.
  • Figure 6 is a flow diagram showing details of processing performed to identify correlation candidates.
  • DESCRIPTION OF A PREFERRED EMBODIMENT
  • Thesaurus manager 100 is a preferred embodiment of a computer-implemented system for maintaining an enterprise-wide set of vocabularies for a large organization.
  • each such operational group may maintain a thesaurus, or multiple thesauri, for providing the personnel of such group with a reference for the standard vocabulary used within the group.
  • thesauri may be maintained for each of the following purposes: Research and Development, Product Literature, Marketing, Litigation, Manufacturing, and Regulatory Compliance.
  • the organization may acquire external thesauri developed by others; these need to be integrated in the same manner as diverse internal thesaurus systems.
  • the overall thesaurus of the organization may be considered a superset of these specialized thesauri, and it will be expected that many of the specialized thesauri will include similar concepts under either similar or different terms, and similar terms for both similar and different concepts.
  • Thesaurus manager 100 operates to provide maintenance of terms across all such specialized thesauri in a single step, to permit thesauri to be viewed in an integrated manner that permits ready differentiation among inconsistent or otherwise different thesauri, to allow the selection of preferred terms to be used for particular concepts throughout the organization, and to provide a single tool and source of authority for managing the various subset thesauri as a single integrated thesaurus.
  • Because the Thesaurus Manager allows different terms to be correlated to a particular concept, different terms can be used throughout the organization, with the Thesaurus Manager providing the link to the same underlying concept. This implementation allows the organization to speak consistently about the same things without enforcing a single standard vocabulary across the organization.
  • the various subsystems illustrated in Figures 1 and 2 are implemented by software controlling one or more general purpose computers, as described herein.
  • software is stored in a conventional data storage device (e.g., disk drive subsystem) and is conventionally transferred to random access memory of the computer(s) as needed for execution.
  • thesaurus manager 100 consists of system code 110 and a knowledge base 120.
  • the knowledge base 120 serves as the store for thesaurus information and an underlying concept system, while the system code 110 handles all updating and retrieval of thesaurus information, all maintenance and utility functions, and all interaction with external entities.
  • Thesaurus data files 170 may be loaded into thesaurus manager 100, or may be created by outputting a thesaurus subset.
  • External programs e.g., 160
  • thesaurus manager 100 maintains a variety of external logs 180, which record user requests and actions.
  • thesaurus manager 100 supports a rich, expressive relation set and allows users to customize that set to their own needs.
  • thesaurus manager 100 maintains terms as lexical entries associated with underlying concepts defined or denoted in a logical conceptual language and stored in knowledge base 120, and it is these underlying concepts that relationships are established between.
  • If the organization using thesaurus manager 100 is a pharmaceuticals company, there may be a concept for AGENTS-WHICH-RELIEVE-PAIN and a concept for ORALLY-ADMINISTERED-AGENTS.
  • Corresponding lexical entries might be "analgesic" and "tablet." Relationships connecting "analgesic" and "tablet" in this example are made in thesaurus manager 100 not between those terms per se, but rather between the underlying concepts.
  • a benefit to such an approach is that it permits modeling of more complex types of relationships among concepts than would be possible if term relationships were relied upon. For instance, in addition to the traditional thesaurus relationships of "broader term,” “narrower term,” “related term,” and “scope of term,” many other conceptual relationships are available.
  • the preferred embodiment provides an interface that equates the "Preferred Term" to the underlying concept, so that within a thesaurus subset one can use the preferred term string to uniquely identify a concept.
  • the "Use For" terms are present in thesaurus manager 100 as simple strings associated with a Preferred Term/Concept.
  • Thesaurus manager 100 provides the ability to introduce new lexical relationships in addition to "Use For".
  • Lexical relationships record strings that can be used within thesaurus manager 100 to search for a concept. Such search strings are either Preferred Terms, or Alternate Terms (strings recorded using Use For or any of the other lexical relations added by users).
  • user 150 might create and use these lexical relationships in addition to "Use For”: “Former Term,” to record out-of-use terms for a concept; “French Term,” to record French language terms for a concept; and “Slang Term,” to record informally used terms for a concept.
  • Thesaurus information maintained within thesaurus manager knowledge base 121 is divided up into a number of user-defined thesaurus subsets. Several subsets may contain relationships among the same set of concepts, but each subset may use a different relationship to express the link between concepts.
  • a corporate thesaurus subset of thesaurus manager knowledge base 121 may link "Allopurinol" with "Zyloprim" using the relationship [GenericName-Trademark], while a product literature subset links "Allopurinol" with "Zyloprim" using the relationship [UseFor], and while a research and development subset links "Allopurinol" with "Zyloprim" using the relationship [Synonym].
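  • One way to picture subsets that link the same pair of items through different relationships is the sketch below. The `Subset` class, the `relations_between` function, and the subset names are illustrative assumptions; the patent does not describe its storage layout at this level, and the endpoints are treated here as opaque identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Subset:
    """Hypothetical in-memory thesaurus subset: a name plus a set of links."""
    name: str
    links: set = field(default_factory=set)  # (item, relation, item) triples

    def relate(self, a, relation, b):
        self.links.add((a, relation, b))


def relations_between(subsets, a, b):
    """Map each subset name to the relations it uses to link a with b."""
    found = {}
    for subset in subsets:
        rels = sorted(r for (x, r, y) in subset.links if (x, y) == (a, b))
        if rels:
            found[subset.name] = rels
    return found


corporate = Subset("corporate")
literature = Subset("product literature")
research = Subset("research and development")
corporate.relate("Allopurinol", "GenericName-Trademark", "Zyloprim")
literature.relate("Allopurinol", "UseFor", "Zyloprim")
research.relate("Allopurinol", "Synonym", "Zyloprim")

print(relations_between([corporate, literature, research], "Allopurinol", "Zyloprim"))
```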
  • SYSTEM CODE 110 COMPONENTS
  • The primary components of system code 110 are the IO layer 111, retrieval and update modules 112 and 113, reporting subsystem 114, and maintenance subsystem 115.
  • user 150 or external program 160 places a query, or a request for update of thesaurus information, to the thesaurus manager via the IO layer 111, and receives information or confirmation of update via the same module.
  • the retrieval module 112 obtains from thesaurus knowledge base 121 the information that satisfies the query and passes it back to the user or external program via the HTML (Hypertext Markup Language) module or API (Application Programming Interface), respectively.
  • update module 113 verifies the validity of the change and changes the thesaurus information in thesaurus knowledge base 121, passing confirmation back to the user or external program via the HTML or API modules 111.1, 111.2, respectively.
  • Update module 113 provides these updating functions: (a) loading thesaurus data from thesaurus data files 170 into thesaurus knowledge base 121, (b) editing thesaurus information already present in thesaurus knowledge base 121, (c) copying thesaurus information from one thesaurus subset within thesaurus knowledge base 121 to another such subset, (d) creating, renaming or deleting thesaurus subsets, (e) creating, renaming or killing thesaurus relationships, (f) fixing integrity violations of data within a thesaurus subset, and (g) "correlating", or establishing correspondences between, concepts within one thesaurus subset and those in all others and/or those in the generalized knowledge base 122.
  • Loading a Thesaurus
  • Editing operations include adding new terms, renaming existing terms, deleting terms, adding or deleting relationships between terms, and creating, renaming or deleting thesaurus subsets. All of these actions may be performed across one or more Thesaurus Subsets as a single integrated operation. Naturally, when specifying a change to be done in several thesauri, there may be unanticipated, to the user, reasons why the change may not be performed in all subsets. For example, suppose the user asks to rename the term "Canine" to "Dog” in all thesaurus subsets. This operation may fail in one subset because there is no term equivalent to "Canine” to rename. In another subset it might fail because "Dog" already is present in that subset as a preferred term.
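  • The "Canine" to "Dog" example can be sketched as a per-subset check that runs before anything is changed. This is a simplified illustration, not the patent's algorithm: each subset is reduced to a set of preferred term strings, "equivalent term" is narrowed to an exact string match, and the names `plan_rename` and `apply_rename` are hypothetical.

```python
def plan_rename(subsets, old, new):
    """Evaluate a rename separately in each subset.

    subsets maps a subset name to its set of preferred terms.
    Returns (accepted, rejected): accepted lists the subsets where the
    rename can proceed; rejected maps a subset name to the reason it cannot.
    """
    accepted, rejected = [], {}
    for name, terms in subsets.items():
        if old not in terms:
            rejected[name] = f"no term {old!r} to rename"
        elif new in terms:
            rejected[name] = f"{new!r} is already a preferred term"
        else:
            accepted.append(name)
    return accepted, rejected


def apply_rename(subsets, accepted, old, new):
    """Perform the rename only in the subsets that passed the check."""
    for name in accepted:
        subsets[name].remove(old)
        subsets[name].add(new)


subsets = {
    "RD": {"Canine", "Feline"},
    "CI": {"Cat"},
    "MK": {"Canine", "Dog"},
}
accepted, rejected = plan_rename(subsets, "Canine", "Dog")
print(accepted, rejected)
```

  • Separating the planning step from the apply step lets a caller show the rejected subsets to a human user before committing the accepted ones.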
  • the Update module evaluates each suggested change separately. If all can be performed, all are performed. If any cannot be performed, the Update module requests optional confirmation of the accepted changes, which can be obtained if the edits were initiated through an interface with a human user.
  • Copying a Thesaurus Subset
  • thesaurus information may be copied from one thesaurus subset to another.
  • a number of parameters control what information is copied: Start Terms, Cutoff Terms, and Cutoff Level. If none of these parameters is provided, the entire content of the source thesaurus is copied to the target thesaurus. If Start Terms are given, copying begins with those terms and proceeds down the "narrower term” hierarchy. If Cutoff Terms are given, no terms more specific than those terms, according to the "narrower term” hierarchy, are copied.
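  • A minimal sketch of how these copy parameters could select terms follows. It assumes the "narrower term" hierarchy is available as a dictionary and interprets Cutoff Level as a maximum number of hierarchy levels below the starting terms; the function name `select_terms` and its argument names are hypothetical.

```python
from collections import deque


def select_terms(narrower, top_terms, start_terms=None, cutoff_terms=(), cutoff_level=None):
    """Return the set of terms a copy operation would include.

    narrower: dict mapping a term to its list of narrower terms.
    top_terms: top terms of the source subset, used when no start terms are given.
    """
    cutoff_terms = set(cutoff_terms)
    queue = deque((term, 0) for term in (start_terms or top_terms))
    copied = set()
    while queue:
        term, depth = queue.popleft()
        if term in copied:
            continue
        copied.add(term)
        if term in cutoff_terms:
            continue  # the cutoff term is copied, nothing more specific is
        if cutoff_level is not None and depth >= cutoff_level:
            continue  # depth limit reached, do not descend further
        for child in narrower.get(term, []):
            queue.append((child, depth + 1))
    return copied


narrower = {
    "agent": ["analgesic", "antacid"],
    "analgesic": ["oral analgesic"],
    "oral analgesic": ["aspirin tablet"],
}
print(sorted(select_terms(narrower, ["agent"], cutoff_terms=["analgesic"])))
```

  • A breadth-first walk is used so that each term is first reached at its smallest distance from the starting terms, which keeps the depth limit well defined when a term has more than one broader term.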
  • Apart from Cutoff Terms, an integer may be specified as a Cutoff Level, which limits the copy operation to terms no more than that many "narrower term" hierarchy levels away either from the Start Terms, if any were specified, or, if no Start Terms were given, from the top terms of the source thesaurus subset.
  • Operations on Thesaurus Subsets
  • new thesaurus subsets may be created. After creation, such subsets are empty. The names and abbreviations for thesaurus subsets may be changed, so long as the new name or abbreviation is not already in use. Finally, an existing thesaurus subset may be deleted. Deleting a subset that contains thesaurus information has the effect of deleting all the contained thesaurus information.
  • Operations on Thesaurus Relations
  • new thesaurus relations may be created. Existing relations may be renamed, so long as the name is not already in use. Finally, thesaurus relations may be killed. If a thesaurus relation is in use in one or more of the thesaurus subsets, killing it has the effect of deleting all thesaurus information represented with that relation.
  • Repairing Integrity Violations
  • Integrity checker 118 is a tool for detecting and repairing thesaurus information that violates the set of active integrity rules. Once a violation has been detected, update module 113 is responsible for repairing the violation.
  • Correlating
  • Correlator 116 is described in detail below. Once a correlation has been discovered and confirmed, update module 113 handles the merging of the two correlated concepts into one.
  • Retrieval module 112
  • Retrieval module 112 provides these retrieval functions: (a) outputting thesaurus information from the Thesaurus Knowledge Base 121 as a new Thesaurus Data File 170, (b) search for a thesaurus concept starting from any term (lexical entry) or subword of a term, (c) retrieval of all, or just particular, thesaurus relationships a term is involved in, (d) retrieval of all thesaurus subsets, and (e) retrieval of all thesaurus relationships. For (a), (b) and (c), the Retrieval module maintains the notion of Active Thesaurus Subsets, which are set by the user or specified in a query; these retrievals obtain information only from the Active subsets.
  • Reporting module 114
  • Reporting module 114 generates and/or retrieves saved reports of user actions, thesaurus and user statistics, and system information, as described in greater detail below.
  • Maintenance module 115
  • Maintenance module 115 provides maintenance functions of transcripting of changes to thesaurus information, saving and loading of backups, and user management.
  • IO Layer 111
  • The IO Layer 111 handles interaction with users and external programs, and consists of two parts: an HTML (Hypertext Markup Language) module 111.1, which is responsible for interaction with a user 150 via an external World Wide Web browser 140, and an API (Application Programming Interface) 111.2, which is responsible for communication with external programs 160.
  • the HTML (Hypertext Markup Language) module 111.1 handles all human interaction with thesaurus manager 100. In standard operation, it listens for connecting external Web browser 140, and when a connection is made, the HTML module generates the appropriate HTML page for the purpose of interacting with the user. These automatically generated HTML pages contain buttons, type-in boxes, and menus appropriate to the user's task. When the user clicks on a link or submits a form, the HTML module receives that submission, passes control to the necessary subsystem(s) for processing, then generates an HTML page with the results and dispatches it to the external web browser.
  • Any number of Users 150 may have sessions open with thesaurus manager 100; the HTML subsystem 111.1 maintains the state of each of these connections, so that a user sees only the results of his or her own interactions.
  • HTML module 111.1 provides a conventional web-browser interface for thesaurus manager 100.
  • HTML module 111.1 presents User 150 with a number of choices in a graphical user interface. Since the HTML module allows the user to access and control the operation of system code 110, interface features of HTML module 111.1 can be categorized as accessing each of the four main internal subsystems of System Code 110: Retrieval, Update, Reporting and Maintenance.
  • HTML module 111.1 provides standard hyperlink displays so that the user can see underlined and in color information that provides active links to related topics.
  • the Full Term Display for ZANTAC might provide a collection of synonyms, broader terms, narrower terms, and otherwise conceptually related terms, each underlined to allow the user to click on such term to obtain further information specific to that term.
  • the subset of thesauri containing ZANTAC-related concepts is also displayed in an underlined fashion, and the user can click on each named thesaurus subset to get more information about that subset (e.g., which organizational unit promulgated it and what purpose it is intended to serve).
  • the relation symbols presented by HTML module 111.1 such as "SN" for scope note, "BT” for broader term, and "NT” for narrower term, are underlined links that the user can click to obtain more information about those relation symbols if needed.
  • HTML module 111.1 provides small icons that appear next to entries permitting immediate access to different types of displays pertaining to such entries.
  • one of the underlined narrower term entries for ZANTAC might be "zantac chewdose tablet;" HTML interface 130 places next to that entry small icons for alphabetical and full record displays of that entry.
  • the user 150 is able to immediately move from the full-term record for ZANTAC to an alphabetical display of a narrower term, if desired.
  • HTML module 111.1 presents thesaurus information as a series of automatically generated HTML pages.
  • a typein box where the user may specify a term to examine, and one of four modes for viewing that term: Hierarchical Display, Full Term Display, Alphabetical Display, or Show Siblings Display.
  • each of the main term displays (Hierarchical, Full Term, Siblings, and Alphabetical) shows thesaurus information about the selected term, with hyperlinks to related information, from all of the user-chosen "Active Thesauri."
  • Thesaurus information from other thesaurus subsets that are not active is not shown.
  • Each of the four kinds of display shows a union of all Active Thesauri in a single, combined display, as though it were a single (monolithic) thesaurus.
  • Because these displays show information from potentially many thesauri at one time, they use a special convention for displaying terms which mean the same thing, i.e., use the same underlying concept, but which have different preferred term strings in some of the Active Thesauri.
  • a consumer information thesaurus CI might use the preferred term "Zantac” for the same concept referred to, in a research and development thesaurus RD, as "ranitidine hydrochloride.” If the thesauri CI and RD were both active, this term would be displayed as follows:
  • the Hierarchical Display shows the position of the selected term in a hierarchy of more general and more specific terms, according to some hierarchical thesaurus relationship.
  • the "broader term" relation is used as the relation determining the hierarchy, but some other hierarchical relation may be selected for this.
  • Each more general and more specific term is a hyperlink; clicking on that term brings up a hierarchical display focusing on that term.
  • the Hierarchy Display optionally shows other relationships the selected term is involved in. The relationships themselves are links to pages of information about the relationship, whereas the related terms are each links to a Hierarchy Display focused on that term.
  • the Hierarchy Display shows the "top terms" or major thesaurus partitions the selected term is present in.
  • Each term on the Hierarchy Display page is annotated with the thesaurus subsets in which it appears. These annotations are hyperlinks; clicking on one displays information about the associated term, but only from that subset. Finally, like the other main term displays, the Hierarchy Display provides small icons next to each displayed term permitting one-click access to the Full Term and Alphabetical displays of that term.
  • the Full Term Display collects all thesaurus facts about a single term. It shows each relationship the term is involved in, and for each such relationship, a list of the other terms or strings related to the displayed term by that relationship.
  • the relationships are hyperlinks to a page of information about the relation, while the related terms are hyperlinks to a Hierarchical Display about the related term.
  • small icons that provide one-click access to the Full Term and Alphabetical Displays for that term accompany each related term.
  • the Full Term Display operates in two modes: Thesauri Separate and Thesauri Merged.
  • In Thesauri Separate mode, there is a separate section for each Thesaurus Subset the term is present in. Information about the term in that Subset appears in that section.
  • In Thesauri Merged mode (the default), each term that is related to the displayed term is annotated with a list of thesaurus subset symbols, indicating which Thesaurus Subsets the relationship is present in.
  • These annotations are links; clicking on one displays information about the displayed term, ONLY from that subset.
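  • The merged presentation, with each related term annotated by the subsets that contain the relationship, might be assembled along the following lines. The data layout, the function name `merged_view`, and the sample facts are assumptions made for illustration; only the active subsets contribute, as the text requires.

```python
def merged_view(subsets, active, term):
    """Combine facts about `term` from the active subsets only.

    subsets: dict mapping a subset abbreviation to a list of
             (term, relation, related_term) facts.
    Returns {relation: {related_term: [subset abbreviations]}}.
    """
    view = {}
    for abbrev in sorted(active):
        for t, relation, related in subsets.get(abbrev, []):
            if t == term:
                view.setdefault(relation, {}).setdefault(related, []).append(abbrev)
    return view


# Illustrative facts only; CI and RD follow the subset abbreviations used in the text.
subsets = {
    "CI": [("Zantac", "BT", "gastrointestinal agent"),
           ("Zantac", "NT", "zantac chewdose tablet")],
    "RD": [("Zantac", "BT", "gastrointestinal agent")],
    "MK": [("Zantac", "BT", "branded product")],
}
print(merged_view(subsets, {"CI", "RD"}, "Zantac"))
```

  • Grouping the same facts by subset abbreviation first, rather than by relation, would give the Thesauri Separate layout instead.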
  • the Full Term Display also has links to some of the update functionality of HTML module 111.1: "Full Term Edit,” “Correlate Concept,” “Uncorrelate,” “Convert Preferred Terms to Use Fors,” “Import Use Fors,” and “Swap Preferred Term and Use For.”
  • the Alphabetical Display shows a list of terms, alphabetized, in KWIC (Key Word In Context) format.
  • the user-selected term or string is positioned in the middle, with several terms alphabetically before, and several terms alphabetically after it.
  • Terms in a thesaurus may consist of several words, and KWIC alphabetizes on each subword of the term.
  • a user may choose to display "analgesic," with one Active Thesaurus that contains "analgesic agent" and "oral analgesic."
  • the word "analgesic" by itself is not a term in the Active Thesauri, so the Alphabetical Display will center around the phrase "analgesic would appear here." Following this line will be a line for "oral analgesic", then a line for "analgesic agent."
  • the lines previous to "analgesic would appear here" would be occupied by terms prior to "analgesic" in the alphabet, such as "gastrointestinal agent" (alphabetized by its second word) or "aluminum hydroxide" (alphabetized by its first word).
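  • The KWIC ordering in this example can be reproduced with a short sketch. It assumes each term is indexed once per word and that ties on the keyword are broken by the words following it; that tie-break is an assumption of this sketch, chosen because it yields the line order given in the example. The function names are hypothetical.

```python
import bisect


def kwic_entries(terms):
    """Index every term under each of its words, sorted KWIC-style."""
    entries = []
    for term in terms:
        words = term.split()
        for i, word in enumerate(words):
            rest = " ".join(words[i + 1:]).lower()
            entries.append((word.lower(), rest, term))
    return sorted(entries)


def alphabetical_display(terms, query, context=2):
    """Return display lines centered on `query`, with `context` lines each side."""
    entries = kwic_entries(terms)
    keys = [keyword for keyword, _, _ in entries]
    pos = bisect.bisect_left(keys, query.lower())
    before = [term for _, _, term in entries[max(0, pos - context):pos]]
    after = [term for _, _, term in entries[pos:] if term != query][:context]
    center = query if query in terms else f"{query} would appear here"
    return before + [center] + after


terms = ["analgesic agent", "oral analgesic", "gastrointestinal agent", "aluminum hydroxide"]
for line in alphabetical_display(terms, "analgesic"):
    print(line)
```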
  • each term displayed on the Alphabetical Display is a hyperlink to the Hierarchy Display view of the term.
  • a small icon that provides a hyperlink to the Full Term view of the term accompanies each term.
  • the Alphabetical Display has a button that toggles the display of alternate terms. When turned off, only Preferred Terms are indexed. When turned on, all of the user-selected Alternate Terms are indexed. As described above, Alternate Terms are strings recorded using Use For or any of the user-defined Lexical Relationships.
  • the Show Siblings Display shows the selected term with all of its sibling terms in all of the Active Thesauri.
  • Sibling terms are grouped according to the parent term ("broader term") they share with the displayed term.
  • Each sibling term is a hyperlink to a Hierarchical Display page about that sibling term.
  • small icons accompany each term that permit one-click access to Full Term and Alphabetical displays about the sibling term.
  • each term is followed by a list of thesaurus annotations, which are also hyperlinks. The annotations indicate which thesauri the term is a sibling in, and clicking on one of these annotations brings up a page of Full Term information about the sibling term, but only in the thesaurus subset represented by the clicked annotation.
  • This display collects, on one page, information that would otherwise be at least two clicks away, and sometimes more.
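  • Grouping siblings by shared parent is straightforward once the "broader term" links are inverted. The sketch below works on a single merged hierarchy and uses a hypothetical function name, `siblings_by_parent`; per-subset annotations are left out for brevity.

```python
def siblings_by_parent(broader, term):
    """Group the siblings of `term` under each broader term they share with it.

    broader: dict mapping a term to its list of broader (parent) terms.
    """
    # Invert the broader-term links to get each parent's children.
    children = {}
    for child, parents in broader.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    return {
        parent: sorted(c for c in children[parent] if c != term)
        for parent in broader.get(term, [])
    }


broader = {
    "man": ["male", "human"],
    "woman": ["female", "human"],
    "child": ["human"],
    "bull": ["male"],
    "stallion": ["male"],
}
print(siblings_by_parent(broader, "man"))
```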
  • the term "man" might have two different broader terms: "male" and "human." Sibling terms of "man" according to the parent term "human" might include "woman" and "child." Sibling terms of "man" according to the parent term "male" might include "bull," "stallion," etc. So in a thesaurus system which is not necessarily a strict tree, but allows terms to have more than one parent, it can often be quite a lot of work to locate all the sibling terms if one does not use the Siblings Display.
  • Update features of HTML module 111.1
  • The update features of HTML module 111.1 are provided as a number of automatically generated HTML pages containing conventional HTML forms that can be filled out and submitted by the user to perform changes to thesaurus information. For most update features, the system has the ability to update multiple thesaurus subsets at one time.
  • Each of these causes an HTML page to load, containing the appropriate blanks, radio buttons or checkboxes to allow the user to specify the change to make. Submitting the form causes processing of the change.
  • the Quick Edit page supports adding, deleting or editing thesaurus information of already-present terms. It requests a term to edit, a set of thesauri in which to perform the change, a thesaurus relation, and the type of operation (add, delete or edit).
  • Quick Edit uses the "Editing Thesaurus Information" subsystem of update module 113 to verify and perform the changes.
  • Add Term supports adding a term to one or more thesaurus subsets not yet containing the term. It requests a Preferred Term string, a set of thesauri in which to add the term, and (optionally) an existing term to serve as the "broader term” for the new term. Add Term uses the "Editing Thesaurus Information" subsystem of update module 113 to verify and perform the changes.
  • Delete Term supports removal of a term from one or more thesaurus subsets. It can either delete the term and all its narrower terms, or merely “splice out" the term, connecting the former term's narrower terms up to its prior broader terms, depending on user choice. It requests the term to delete and one or more thesauri from which to delete it. Delete Term uses the "Editing Thesaurus Information" subsystem of update module 113 to verify and perform the changes.
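  • The two deletion modes can be contrasted in a small sketch. It assumes a single subset held as a dictionary from each term to its set of broader terms; the function name `delete_term` and the `splice` flag are hypothetical. In the non-splice mode every narrower term is removed, including any that also has another broader term, which is one reading of "the term and all its narrower terms."

```python
def delete_term(broader, term, splice=False):
    """Delete `term` from a hierarchy, in place, and return the removed terms.

    broader: dict mapping every term to the set of its broader terms.
    splice=True reconnects the narrower terms to the deleted term's parents;
    splice=False removes the term together with everything narrower than it.
    """
    children = [t for t, parents in broader.items() if term in parents]
    removed = {term}
    if splice:
        parents = set(broader.get(term, set()))
        for child in children:
            broader[child].discard(term)
            broader[child] |= parents
    else:
        stack = list(children)
        while stack:
            t = stack.pop()
            if t in removed:
                continue
            removed.add(t)
            stack.extend(c for c, ps in broader.items() if t in ps)
    for t in removed:
        broader.pop(t, None)
    for parents in broader.values():
        parents -= removed
    return removed


hierarchy = {
    "agent": set(),
    "analgesic": {"agent"},
    "oral analgesic": {"analgesic"},
    "aspirin tablet": {"oral analgesic"},
}
print(delete_term(hierarchy, "analgesic", splice=True), hierarchy)
```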
  • Rename Term supports changing the Preferred Term string for a concept in one or more thesauri.
  • Rename Term uses the "Editing Thesaurus Information" subsystem of update module 113 to verify and perform the changes.
  • Create Thesaurus supports the introduction of a new, empty thesaurus subset. It requests a name and an abbreviation for the new thesaurus.
  • Create Thesaurus uses the "Editing Thesaurus Information" subsystem of update module 113 to verify and perform the change.
  • Delete Thesaurus supports the removal of an entire thesaurus subset, along with its contents, from thesaurus manager 100. It allows the user to pick the thesaurus to delete from a menu of available thesauri, and uses the "Editing Thesaurus Information" subsystem of update module 113 to verify and perform the change.
  • Rename Thesaurus supports changing the name and/or abbreviation for a thesaurus subset. It requests either a new name or a new abbreviation or both, and uses the "Editing Thesaurus Information" subsystem of update module 113 to verify and perform the changes.
  • Copy Thesaurus supports copying thesaurus information from one thesaurus subset to another. It requests a source thesaurus and a target thesaurus (from pick menus).
  • Copy Thesaurus uses the "Copying A Thesaurus Subset” subsystem of update module 113 to verify and perform the changes.
  • Relation supports the addition of a new relation to the set available for use to express thesaurus information. Lexical, Hierarchical, Documentation and Custom relations may be created.
  • It requests the relation name; the name of the inverse relation; whether the relation is one-to-one, one-to-many, many-to-one, or many-to-many; whether the relation is reflexive, irreflexive, symmetric, or transitive; and whether it relates two Preferred Terms (i.e., concepts) or relates a Preferred Term to a string.
  • Relation uses the "Operations on Thesaurus Relations" subsystem of update module 113 to verify and perform the changes.
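  • The properties requested for a new relation could be captured in a small record such as the one below. This is a sketch only: the class and field names are assumptions, and the two consistency checks are generic properties of binary relations rather than checks the text attributes to update module 113.

```python
from dataclasses import dataclass
from enum import Enum


class Cardinality(Enum):
    ONE_TO_ONE = "one-to-one"
    ONE_TO_MANY = "one-to-many"
    MANY_TO_ONE = "many-to-one"
    MANY_TO_MANY = "many-to-many"


@dataclass(frozen=True)
class RelationDefinition:
    name: str
    inverse_name: str
    cardinality: Cardinality
    reflexive: bool = False
    irreflexive: bool = False
    symmetric: bool = False
    transitive: bool = False
    relates_term_to_string: bool = False  # False: relates two Preferred Terms

    def validate(self):
        """Return a list of problems with the definition (empty if none)."""
        problems = []
        if self.reflexive and self.irreflexive:
            problems.append("a relation cannot be both reflexive and irreflexive")
        if self.symmetric and self.inverse_name != self.name:
            problems.append("a symmetric relation must be its own inverse")
        return problems


broader_term = RelationDefinition(
    "Broader Term", "Narrower Term", Cardinality.MANY_TO_MANY,
    irreflexive=True, transitive=True,
)
print(broader_term.validate())
```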
  • Full Term Edit supports the addition, deletion and editing of relationships among existing terms, and among existing terms and strings, across all subsets, in one action.
  • This facility uses the "Editing Thesaurus Information” subsystem of update module 113 to verify and perform the changes.
  • This facility provides increased capability for managing the granularity of terms across thesaurus subsets. Import Use Fors allows the user to quickly import Use For terms from other thesauri. It uses the "Editing Thesaurus Information” subsystem of update module 113 to verify and perform the changes.
  • Swap Preferred Term with Use For allows the user to easily choose the terms to swap. It uses the "Editing Thesaurus Information" subsystem of update module 113 to verify and perform the changes.
  • Integrity Check Thesaurus calls an Integrity Checker subsystem of update module 113 on each fact in the selected thesaurus. It runs until a problem is found, until a time limit is reached, or until the entire thesaurus is checked. If a problem is found, it is fixed automatically by an Integrity Checker subsystem of update module 113 where possible; otherwise the user is presented with a set of repair options.
  • the "Integrity Checker" 118 may also be called on a single term. In this case, each thesaurus fact concerning the term, in all thesaurus subsets in which it appears, is checked. Problems that can be fixed automatically are fixed; those with several repair options are presented to the user. In each case, the repair is performed by an Integrity Checker subsystem of update module 113.
  • Correlate supports the establishment of correspondences between all the terms of a thesaurus subset (that can be matched) with terms in other thesaurus subsets and with concepts in the generalized knowledge base 122. It uses correlator 116, a subsystem of system code 110, which is described below.
  • Correlated Concept allows correlator 116 to be called on a single Preferred Term (i.e., concept), and correspondences to be set up between that term and terms in other thesauri or a concept in the generalized knowledge base 122.
  • Uncorrelate breaks the concept apart into separate concepts. This is useful when incorrect correlations have accidentally been performed.
  • Load Thesaurus supports loading thesaurus information from thesaurus data file 170 and into a Thesaurus Subset. It uses the "Loading a Thesaurus Subset" subsystem of update module 113 to perform the load.
  • the Reporting features of HTML module 111.1 are provided as a set of automatically generated HTML pages, accessible from the "Utilities" page, which present information such as thesaurus statistics, user statistics, and operations reports.
  • Thesaurus statistics include information concerning the number of preferred terms and other lexical entries in each thesaurus, the number of concepts that underlie terms in each thesaurus, the number of facts in each thesaurus, and the creator and creation time of each thesaurus.
  • Thesaurus statistics also include the number of concepts, terms and facts in the integrated thesaurus as a whole.
  • User statistics include the following information for each month: for each user, the number of logins during the month and the number of pages requested, per thesaurus and in total. Operations reports show a log of changes made to the thesaurus information; this log may be sorted in various ways (by user, by date, by time, by thesaurus).
Maintenance features of HTML module 111.1
  • The Maintenance features of HTML module 111.1 are provided as a set of automatically generated HTML pages. In a preferred embodiment, not all aspects of maintenance module 115 are under user control, but those that are can be accessed via links from the "Utilities" page. Maintenance features under administrator control include "Quick State Snapshot," "Backup Thesaurus Information to File," "Manage Users," and "System Information."
API module 111.2
  • API module 111.2 provides an Application Programming Interface that permits an external program to submit queries and requests for update to thesaurus manager 100.
  • API module 111.2 performs retrieval only, but in another embodiment, it performs update in addition to retrieval.
  • the API module 111.2 is a server that uses a stream connection, such as TCP, and SMTP-like commands and responses.
  • the client program sends commands to the server requesting information or update, and the server responds with the information or confirmation of change, or with an indication of why the command could not be performed.
  • retrieval functionality of API module 111.2 includes the ability to (a) test if a string is a known term in a thesaurus subset, (b) retrieve all thesaurus subsets, (c) retrieve all relations, (d) retrieve the thesaurus subsets which contain thesaurus information about a term, (e) retrieve the Narrower Terms, Use Fors, all recursive Narrower Terms, all Use Fors of the term and of all its Narrower Terms, all equivalent terms, or all terms related to a term by a given relation, within some set of thesaurus subsets, and (f) retrieve all the Top Terms (major thesaurus partitions) of a given set of thesaurus subsets.
  • an application program that uses thesaurus information to expand a keyword query for the purpose of retrieving documents might connect to the API module 111.2 to retrieve all Use Fors of all narrower terms of the original, user-provided keywords, and incorporate this information into the search to increase the number of relevant retrievals.
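  • The query-expansion use case can be sketched as below. The text does not give the command syntax of API module 111.2, so this sketch does not model the stream connection at all; it assumes the client has already retrieved the Narrower Term and Use For information into two hypothetical dictionaries, and the function names are invented for illustration.

```python
def recursive_narrower(narrower, term):
    """All narrower terms of `term`, at any depth (cycle-safe)."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        for child in narrower.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def expand_query(keywords, narrower, use_for):
    """Add recursive narrower terms, and the Use Fors of all of them, to the keywords."""
    expanded = set(keywords)
    for keyword in keywords:
        terms = {keyword} | recursive_narrower(narrower, keyword)
        expanded |= terms
        for term in terms:
            expanded |= set(use_for.get(term, ()))
    return expanded


# Hypothetical thesaurus data
narrower = {"vehicle": ["car", "truck"], "car": ["sports car"]}
use_for = {"car": ["automobile"], "truck": ["lorry"]}
print(sorted(expand_query(["vehicle"], narrower, use_for)))
```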
  • update functionality of API module 111.2 includes the ability to (a) create, rename or delete a thesaurus subset, (b) add, rename or delete a term to, in, or from a single thesaurus subset, or, optionally, a set of thesaurus subsets, all in one command, (c) edit thesaurus information of existing terms in (optionally, multiple) existing thesaurus subsets.
  • an application program that automatically extends a thesaurus subset by accessing online text might connect to the API module 111.2 to test whether a term is known, and add it to the thesaurus if it is not known.
  • Logging module 117 is responsible for maintaining logs 180. These logs record two types of information: (a) changes to thesaurus information done by the update module 113, and (b) Thesaurus Subsets accessed by user 150 via the web interface implemented by HTML module 111.1. Reporting module 114 reads these logs.
Integrity Checker 118
  • Integrity Checker 118 is a tool for detecting and repairing thesaurus information that violates the set of active integrity rules. The following rules are used in a preferred embodiment:
  • a “top” term must have no broader terms (BTs).
  • the program will identify purported top concepts that have broader terms.
  • a word on the stoplist cannot be either a preferred term or an alternate, "Use For” term.
  • the program will identify concepts which have a stoplist word as a preferred term or a "Use For” term.
  • BT and RT may not both relate a term to the same term.
  • the program will identify pairs of concepts related by both Broader Term and Related Term.
  • A term and its BT cannot both have RT relations to the same term. (No RT-BT-RT triangles.)
  • the program will identify pairs of concepts that are related by Broader Term/Narrower Term and are both related by Related Term to a common third term.
  • all of these integrity rules are active, but in an alternate embodiment users choose which rules to apply.
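  • The four integrity rules listed above can be illustrated with a short checker. This is a sketch under stated assumptions: BT facts are held as hypothetical `(term, broader_term)` pairs, RT facts as unordered pairs, and the problem labels are invented. It only detects violations; the repair step is not modeled.

```python
def rt_partners(rt, term):
    """Terms linked to `term` by the symmetric Related Term relation."""
    return {other for pair in rt if term in pair for other in pair if other != term}


def check_integrity(preferred_terms, top_terms, bt, rt, use_for, stoplist):
    """Return a list of rule violations found in one thesaurus subset."""
    problems = []
    for term, broader in sorted(bt):
        if term in top_terms:  # a top term must have no broader terms
            problems.append(("top-term-has-BT", term, broader))
        if frozenset((term, broader)) in rt:  # BT and RT on the same pair
            problems.append(("BT-and-RT", term, broader))
        for third in sorted(rt_partners(rt, term) & rt_partners(rt, broader)):
            problems.append(("RT-BT-RT-triangle", term, broader, third))
    for term in sorted(preferred_terms):
        for string in sorted({term} | set(use_for.get(term, ()))):
            if string.lower() in stoplist:  # stoplist words may not be terms
                problems.append(("stoplist-word", term, string))
    return problems


problems = check_integrity(
    preferred_terms={"animal", "dog", "the"},
    top_terms={"animal"},
    bt={("dog", "animal")},
    rt={frozenset(("dog", "animal"))},
    use_for={},
    stoplist={"the"},
)
print(problems)
```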
  • Integrity checker 118 operates off the definitions of the thesaurus relations used to express the thesaurus information.
  • the definition of that new relation directs which integrity rules will be applied to it by integrity checker 118.
  • Knowledge base 120 is, in a preferred embodiment, implemented using the Cyc ® Knowledge Base available from Cycorp, Inc., of Austin, Texas.
  • This knowledge base includes a generalized knowledge base 122 that consists of approximately one half-million hand entered formulas (or "rules") that are part of human consensus reality knowledge. When used as part of the preferred embodiment, it also includes thesaurus knowledge base 121.
  • Knowledge Base 120 features Formal language
  • the formulas of knowledge base 120 are encoded in a formal language, CycL.
  • Concepts in a formal language are represented by symbols, and these symbols are combined in meaningful ways to form logical Formulas.
  • Formulas are like sentences: each states some fact about the world. For example, from the concepts of TREE, OUTDOOR-REGION, and PROPERTY-OF-BEING-LOCATED-IN-PLACE, we can form a formula which says "trees are located outdoors." From the concepts of TO-MEAN-SOMETHING, STRING-OF-CHARACTERS, and AUTOMOBILE, we can form a formula which says "one meaning for the string, 'car,' is the concept automobile."
Contexts
  • knowledge base 120 is divided into a large number of Contexts, each of which is essentially a bundle of formulas that share a common set of assumptions and which are consistent with each other.
  • a context mechanism allows the knowledge base 120 to independently maintain formulas that are prima facie contradictory, by having them reside in different contexts. For example, in a context about the United Kingdom, there might be a formula which says that driving is done on the left side of the road, whereas in a context that assumes a United States location, there will be a formula stating that driving is done on the right side of the road. Notice that the same concepts
  • knowledge base 120 includes a Lexicon of over 12,000 root English words. These words are related in knowledge base 120 to the Concepts that are the meanings of the words. An English word may have many meanings, and the Lexicon of knowledge base 120 accounts for this. The Lexicon recognizes any form of a word.
  • Lexicon information maintained for the root word “swim” is sufficient to map any of these strings into the concept for SWIMMING: “swim,” “swims,” “swam,” “swimming.” Lexicon information is used by correlator 116. Feature summary
  • knowledge base 120 is implemented using the Cyc ® Knowledge Base
  • any knowledge base may be used to implement knowledge base 120 so long as it uses a formal representation language that forms Formulas by combining Concepts and possesses a rich store of Concepts and knowledge about those concepts.
  • Generalized Knowledge Base 122
  • Generalized knowledge base 122 contains, in a preferred embodiment, a rich store of Concepts and of rules that are part of human consensus reality knowledge.
  • generalized knowledge base 122 contains thousands of concepts, including various kinds of intelligent agents like people, everyday objects from paperclips to aircraft carriers, anatomical concepts, substances from water to wood to pharmaceuticals, units of measure like inch, a plethora of actions from scratching to lecturing to thundershowers to collisions to surgery, and the like.
  • Thesaurus Knowledge Base 121 is a matter of design choice and is not essential to the implementation of thesaurus manager 100.
  • Thesaurus knowledge base 121 contains thesaurus information represented using the Concepts and Formulas that are encoded in the formal language of knowledge base 120.
  • thesaurus information is represented in CycL.
  • thesaurus knowledge base 121 contains one or more thesaurus subsets 211 as defined by users. Each thesaurus subset is represented as a Context of knowledge base 120. The formulas of such a context express the thesaurus information of that thesaurus subset.
  • thesaurus information is expressed as formulas in the formal language of knowledge base 120.
  • For each Preferred Term in a thesaurus there exists exactly one knowledge base Concept. For example, consider the Preferred Term "Zantac" within a generalized pharmaceutical thesaurus subset, which is represented there by the underlying concept ZANTAC-THE-PRODUCT.
  • Thesaurus knowledge base 121 contains a number of thesaurus subsets, in this instance Subset A 211, Subset B 212, and Subset C 213. Each of these subsets includes a set of relationships among concepts as illustrated by the links (solid lines, representing relationships) between nodes (dots, representing Concepts) in Figure 2. While some concepts may only be involved in relationships in a single subset, other concepts appear in multiple subsets. The dashed lines between subsets in Figure 2 indicate that the same underlying concept is referred to in each.
  • thesaurus manager 100 integrates multiple thesaurus subsets by sharing concepts among subsets. A single such concept may thus have different Preferred Terms, different Alternate Terms, even different knowledge expressed about it in different thesaurus subsets, but these differing descriptions do not conflict, since each is partitioned away from the others by treating the thesaurus subsets as contexts.
  • Thesaurus manager 100 also integrates thesaurus subsets with generalized knowledge base 122 by using the same concepts, where possible and appropriate, in each.
  • the general concept DOG might appear in several thesauri as well as in the Generalized KB.
  • it might have "Canis familiaris” as its preferred term, with “dog” as an alternate term.
  • "dog” might be the preferred term, with “doggie” as an alternate term.
  • the concept DOG will be involved in formulas such as “dogs are commonly kept as pets by people,” “dogs like to eat meat,” “young dogs are playful,” and so on.
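  • The DOG example can be sketched with one shared concept and per-context term assignments. This is only an illustration of the partitioning idea: the class, the context names, and the dictionary layout are assumptions and do not reflect how CycL or the Cyc Knowledge Base stores formulas.

```python
class ContextStore:
    """Holds, per context, the terms attached to each shared concept."""

    def __init__(self):
        self.contexts = {}  # context name -> {concept: {"preferred", "alternates"}}

    def assert_terms(self, context, concept, preferred, alternates=()):
        entry = {"preferred": preferred, "alternates": set(alternates)}
        self.contexts.setdefault(context, {})[concept] = entry

    def preferred_term(self, context, concept):
        return self.contexts[context][concept]["preferred"]

    def contexts_of(self, concept):
        """Every context in which the shared concept appears."""
        return sorted(c for c, entries in self.contexts.items() if concept in entries)


store = ContextStore()
# Hypothetical thesaurus subsets, each treated as its own context
store.assert_terms("ZoologyThesaurus", "DOG", "Canis familiaris", ["dog"])
store.assert_terms("PetThesaurus", "DOG", "dog", ["doggie"])

print(store.contexts_of("DOG"))
print(store.preferred_term("ZoologyThesaurus", "DOG"))
print(store.preferred_term("PetThesaurus", "DOG"))
```

  • Because each assignment lives in its own context, the two different Preferred Terms for DOG coexist without conflict.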
  • Correlator 116 is a tool that is used to establish the equality of concepts across thesaurus subsets and the generalized knowledge base 122.
  • the Correlator plays this role in several types of processing done by thesaurus manager 100: (a) during a load of a thesaurus, (b) at the time a new term is added, and (c) when a User 150 invokes correlator 116 via the web interface implemented by HTML module 111.1. Correlation at load time
  • a term definition consists of a Preferred Term string together with the relationships that term has to other terms.
  • Correlator 116 is invoked on each Preferred Term string to determine if an existing Thesaurus Subset already refers to a Concept by that same Preferred Term string. If so, the matching, already-existing concept will be used for the loaded term. This is performed automatically, without user interaction (unless a "re-use concepts?" parameter is turned off).
  • correlator 116 is called on the Preferred Term string entered by the user to see if a pre-existing concept might match.
  • Concepts appearing in thesaurus Subsets other than those to which the term is being added, and concepts appearing in the generalized knowledge base 122 are considered as candidates for re-use. If candidates are found, the user will be asked via the web interface to confirm or choose among the candidates. If a candidate concept is chosen, it will be used to represent the added term. If no concept is chosen, a fresh concept will be generated to represent the added term.
  • correlator 116 may be invoked via the web interface supported by HTML code module 111.1. Given a thesaurus subset, correlator 116 visits every concept mentioned in the thesaurus information of that subset, and attempts to find concepts not mentioned in that subset, but instead mentioned in generalized knowledge base
  • Candidate concepts are presented to user 150 via HTML code 111.1. If the user allows the correlation, the two concepts are merged into one.
  • correlator 116 finds a set of concepts not currently equivalent, which can be considered as correlation candidates. Correlator 116 judges candidates according to a set of heuristics.
  • If the Preferred Term string of the starting concept is one of the Alternate Terms of a matching concept, weakly favor the matching concept as a correlation candidate.
  • If there is overlap between the Alternate Terms of the starting concept and the Alternate Terms of a matching concept, weakly favor the matching concept as a correlation candidate.
  • Referring to FIG. 3, there is shown a flow diagram illustrating processing in response to a user query for thesaurus information, in accordance with the present invention.
  • the user 150 while browsing 305 one of the standard browsing pages displayed by External Web Browser 140, clicks on either a) the name of a term, to show a Hierarchy Display page about the term, b) the full Term icon, to show a Full Term Display about the term, or c) the Alpha icon, to show the Alphabetical Index centered on the term. (Note that other actions may accomplish the same result as this click, namely typing in the name of a term into a "type in" box on the standard page header, and choosing one of the main modes in which to view the term.)
  • HTML module 111.1 processes 310 that click and dispatches the request to the Retrieval module 112. This module performs a lookup 315 procedure, retrieving thesaurus information about the chosen term from all active thesaurus subsets, in this case Subsets 211, 212, and 214. Subset 213, also depicted, is not among the Active Thesaurus Subsets, so information is not retrieved from that subset.
  • the Retrieval module uses the thesaurus information to build 320 an Output Item, which is passed on to HTML module 111.1 for formatting. HTML module 111.1 formats 325 the information as part of a standard World Wide Web page layout appropriate to the type of display requested by the user. This information is streamed to the External Web Browser 140, where it is displayed so User 150 may view the new browsing page 330.
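  • The lookup, build, and format steps of FIG. 3 can be sketched as two small functions. The data layout, function names, and HTML markup below are assumptions made for illustration; the text does not specify the structure of an Output Item or of the generated pages.

```python
from html import escape


def build_output_item(term, subsets, active):
    """Collect facts about `term` from the active thesaurus subsets only."""
    item = {}
    for name in active:
        facts = subsets.get(name, {}).get(term, [])
        if facts:
            item[name] = list(facts)
    return item


def format_page(term, item):
    """Render the output item as a minimal HTML fragment."""
    lines = [f"<h1>{escape(term)}</h1>", "<ul>"]
    for subset, facts in item.items():
        for relation, value in facts:
            lines.append(
                f"<li>{escape(subset)}: {escape(relation)} {escape(value)}</li>"
            )
    lines.append("</ul>")
    return "\n".join(lines)


# Hypothetical subsets; "C" holds data but is not among the active subsets
subsets = {
    "A": {"dog": [("BT", "animal")]},
    "C": {"dog": [("BT", "pet")]},
}
item = build_output_item("dog", subsets, active=["A", "B"])
print(format_page("dog", item))
```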
  • FIG. 4 illustrates integrated maintenance of multiple thesaurus subsets, in accordance with the present invention.
  • the user 150 while viewing one of the standard browsing pages 405 displayed by External Web Browser 140, clicks on a menu button to perform one of the editing procedures described above.
  • HTML module 111.1 receives and processes this request to edit, by formatting 410 the requested editing page and streaming it to External Web Browser 140.
  • Web Browser 140 displays this edit page 415, which contains typein boxes, pick menus and/or buttons as needed for the requested editing procedure.
  • the Web Browser 140 dispatches them to HTML module 111.1, which processes 420 the editing instructions into discrete Operations.
  • Update module 113 verifies 420 each operation according to the integrity constraints present on the relation involved. The update module first confirms 425 that the operation satisfies the integrity constraint present on the relation involved. If all operations are OK 430, update module 113 performs each change 435 in the thesaurus selected for that change. If any operations are not valid 440, HTML module 111.1 formats 445 a verification page 450 that is sent to user 150 via External Web Browser 140.
  • the verification page 450 merely contains an explanation of why the operations could not be performed. If at least some of the operations appear to be valid, the user has the option to OK them on the verification page 450. Sometimes, one or more of the requested operations may be valid if the directives input by User 150 are interpreted in an alternate fashion, e.g., if the user input a "Use For" instead of a "Preferred Term." The verification page 450 will always check with the user before performing the operation in this case.
  • the External Web Browser dispatches the page to HTML module 111.1, which processes 455 the verification input.
  • Update module 113 actually performs 435 each operation in the thesaurus subset selected by the operation. In the diagram, no changes were requested for Subset 213, so none are performed there.
  • the HTML module 111.1 formats 460 a results page 465 and streams it to the external web browser 140.
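  • The verify-then-perform flow of FIG. 4 can be sketched as follows. This is a simplified illustration: the `Operation` record, the three validity checks, and the returned dictionaries are assumptions, and the sketch omits the step in which the user may approve the valid part of a partly invalid request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    action: str    # "add" or "delete"
    subset: str    # thesaurus subset in which to perform the change
    relation: str
    term: str
    value: str


def verify(op, subsets):
    """Return None if the operation is valid, otherwise an explanation."""
    facts = subsets.get(op.subset)
    if facts is None:
        return f"unknown thesaurus subset {op.subset!r}"
    fact = (op.relation, op.term, op.value)
    if op.action == "add":
        if op.relation == "BT" and op.term == op.value:
            return "a term cannot be its own broader term"
        if fact in facts:
            return "fact is already present"
    elif op.action == "delete" and fact not in facts:
        return "fact is not present"
    return None


def submit(operations, subsets):
    """Perform all operations only if every one of them verifies."""
    rejected = {}
    for op in operations:
        reason = verify(op, subsets)
        if reason is not None:
            rejected[op] = reason
    if rejected:
        return {"status": "needs-verification", "rejected": rejected}
    for op in operations:
        fact = (op.relation, op.term, op.value)
        if op.action == "add":
            subsets[op.subset].add(fact)
        else:
            subsets[op.subset].discard(fact)
    return {"status": "done", "performed": len(operations)}


subsets = {"A": set(), "B": set(), "C": set()}
ops = [Operation("add", s, "BT", "dog", "animal") for s in ("A", "B")]
print(submit(ops, subsets))
```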
  • Figure 5 illustrates processing performed in response to a Correlate Concept request from User 150.
  • User 150 has requested that correlation be performed for a particular underlying concept of a thesaurus Preferred Term present in thesaurus Subset 211, indicated by a gray dot in the figure.
  • Generalized knowledge base 122 which are not present in Subset 211, and which are likely to mean the same thing as the starting concept 517 in Subset 211.
  • Figure 6 shows in more detail how correlation candidates are found.
  • two candidates 518, 519 were found — one 519 from Subset 212, and one 518 present in the Generalized Knowledge Base 122.
  • HTML module 111.1 formats 520 a page 525 which allows User 150 to choose one of the candidates, or alternatively, to type in a Preferred Term (identifying an underlying concept) from another thesaurus to correlate with the starting concept 517.
  • Web Browser 140 displays this page 525. At this point the user may decide not to perform any correlation at all, and may simply go on to another task.
  • HTML module 111.1 formats 540 a result page 545, which is dispatched to and displayed by Web Browser 140 for User 150 to view.
  • the example depicted shows processing in response to a user's request to find and establish correlations for a particular concept.
  • the Correlator 116 may also be invoked on an entire thesaurus, sweeping through the subset and finding correlations for each concept present in the subset. Processing for each of these concepts is the same as what is depicted here, but interaction for User 150 differs because the Correlator 116 finds candidates for up to 10 starting concepts at a time, or until a time limit is reached. The user 150 then handles all these concepts as a batch.
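  • Batch collection for a whole-thesaurus sweep might look like the sketch below. The limit of 10 starting concepts comes from the text; the five-second default, the function names, and the choice to count only concepts that actually have candidates are assumptions.

```python
import time


def collect_batch(concepts, find_candidates, max_concepts=10, time_limit_s=5.0):
    """Gather correlation candidates until the batch is full or time runs out."""
    deadline = time.monotonic() + time_limit_s
    batch = {}
    for concept in concepts:
        if len(batch) >= max_concepts or time.monotonic() >= deadline:
            break
        candidates = find_candidates(concept)
        if candidates:
            batch[concept] = candidates
    return batch


# Hypothetical candidate finder: every even-numbered concept has one candidate
batch = collect_batch(range(100), lambda c: [f"candidate-{c}"] if c % 2 == 0 else [])
print(len(batch))
```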
  • Figure 6 shows details of the "Identify Correlation Candidates" step 515 of correlation processing.
  • For the concept to be correlated, several lists of concepts are obtained. These are not necessarily obtained in parallel, but the order in which the lists are retrieved does not matter.
  • a list of concepts which have the same or similar Preferred Term string, in another thesaurus subset, as the starting concept is retrieved 602 and is given a strong weight.
  • a list of concepts which have Alternate Terms (terms linked to a concept via "Use For" or one of the other, user-defined, Lexical Relations), in another thesaurus subset, which are the same or similar to the Preferred Term string of the starting concept, is retrieved 604; these concepts are weighted weakly.
  • a list of concepts which have some Alternate Term that is the same as, or similar to, one of the Alternate Terms of the starting concept is retrieved 606; these concepts are weighted weakly.
  • a list of concepts is retrieved 608 by querying Lexicon 122.1 for concepts that serve as one of the meanings for the Preferred Term string of the starting concept. These concepts are given medium weighting.
  • Another list of concepts is retrieved 610 by querying Lexicon 122.1 for concepts that serve as one of the meanings for any of the Alternate Terms of the starting concept. These concepts are given a weak weight.
  • These five concept lists are merged 620 or combined additively — i.e., if a concept appears in more than one list, the weights from each list are added together.
  • the resulting list in which each concept appears only once, associated with the sum of its weights from the five starting lists, is subjected to several filters.
  • the correlator 116 ensures 630 that there is no thesaurus overlap. If a concept is present in any of the thesauri of the starting concept, it is removed from the list. For these purposes, the Generalized Knowledge Base is treated as a thesaurus. Therefore, if the starting concept is involved in Generalized Knowledge Base formulas, any candidate concept also involved in Generalized Knowledge Base formulas is also removed from the list.
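  • The additive merge 620 and the overlap filter 630 can be sketched as below. The text names the weights only as strong, medium, and weak, so the numeric values here are assumptions, as are the function names and the convention of treating the Generalized Knowledge Base as a subset called "GKB".

```python
from collections import Counter

# Assumed numeric values for the qualitative weights in the text
STRONG, MEDIUM, WEAK = 3.0, 2.0, 1.0


def merge_candidates(weighted_lists):
    """Sum the weights of each concept across all candidate lists."""
    totals = Counter()
    for weight, concepts in weighted_lists:
        for concept in set(concepts):
            totals[concept] += weight
    return totals


def remove_overlap(totals, starting_subsets, subsets_of):
    """Drop candidates that share any subset with the starting concept."""
    return {
        concept: weight
        for concept, weight in totals.items()
        if not (subsets_of.get(concept, set()) & starting_subsets)
    }


lists = [
    (STRONG, ["C1"]),        # same or similar Preferred Term string (602)
    (WEAK, ["C1", "C2"]),    # Alternate Term matches the Preferred Term (604)
    (WEAK, ["C3"]),          # overlapping Alternate Terms (606)
    (MEDIUM, ["C2"]),        # Lexicon meaning of the Preferred Term (608)
    (WEAK, []),              # Lexicon meaning of an Alternate Term (610)
]
subsets_of = {"C1": {"B"}, "C2": {"GKB"}, "C3": {"A"}}
merged = merge_candidates(lists)
print(remove_overlap(merged, starting_subsets={"A"}, subsets_of=subsets_of))
```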
  • a graph isomorphism filter is applied 640 to each candidate concept, comparing the relationships the starting concept is involved in with the relationships the candidate concept is involved in.
  • a graph is created for each term. Terms in a thesaurus are linked to many other terms through a variety of relationships. These links are used to create a graph over the terms of the thesaurus. For each term, then, the links from that central term to other linking terms constitute a subgraph which serves as a signature for the central term. The subgraph of a candidate is then compared to the subgraph of the term being correlated. The similarity of the subgraphs can be treated as a graph isomorphism problem between the two subgraphs of the terms. The number of links in one subgraph which can be mapped on to an isomorphic link in the other subgraph serves as an indicator of the number of links, or relations, that are shared by the terms.
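  • A much simplified version of the link comparison is sketched below. It is not a general graph isomorphism procedure: it assumes each signature is just the collection of `(relation, neighbor term string)` links leaving the central term, and counts the links the two signatures have in common. The function name and sample data are hypothetical.

```python
from collections import Counter


def shared_links(links_a, links_b):
    """Count links of one signature that have a matching link in the other."""
    return sum((Counter(links_a) & Counter(links_b)).values())


# Hypothetical signatures: (relation, neighbor) pairs around each central term
starting = [("BT", "mammal"), ("NT", "poodle"), ("RT", "leash")]
candidate = [("BT", "mammal"), ("NT", "poodle"), ("NT", "beagle")]
print(shared_links(starting, candidate))
```

  • Matching neighbors by their term strings is a shortcut; comparing by underlying concept, where correlations already exist, would be closer to the description.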
  • a string similarity filter is applied 650 to each candidate concept. If the Preferred Term string or Alternate Term strings for a candidate concept are not equal, yet are string-similar, to the Preferred Term string or Alternate Term strings of the starting concept, the weight for that candidate concept is increased slightly.
  • the string-similarity routine looks for missing, substituted or transposed letters as well as singular vs. plural, common variants in spelling (e.g., British vs. American English word endings) and differing verb conjugations. After these filters have been applied, concepts that exceed a certain, parameterized weight cutoff 660 are returned as correlation candidates.
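  • One way to detect missing, substituted, or transposed letters is an edit distance that counts adjacent transpositions as a single edit. The sketch below uses the optimal string alignment distance, which is a swapped-in standard technique and not necessarily the routine used by correlator 116; the one-edit threshold is an assumption, and verb conjugations and regional spelling tables are not modeled.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insert, delete, substitute, transpose."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def string_similar(a, b, max_edits=1):
    """True if the strings differ, ignoring case, by at most `max_edits` edits."""
    a, b = a.lower(), b.lower()
    return a != b and osa_distance(a, b) <= max_edits


for pair in [("dog", "dogs"), ("colour", "color"), ("recieve", "receive"), ("dog", "cat")]:
    print(pair, string_similar(*pair))
```

  • With a one-edit threshold, a simple plural ("dog" / "dogs") and a British/American ending ("colour" / "color") both qualify, while unrelated strings do not.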


Abstract

The invention relates to an integrated system for accessing information contained in several separate thesaurus subsets, comprising: an input subsystem for entering a thesaurus query; a processing subsystem that retrieves from the subsets the entries matching the query, treating each subset as a separate integrated thesaurus; and an output subsystem that displays the retrieved entries. An HTML interface lets the user access the system through a web browser. The retrieved information consists of displayed entries, optionally accompanied by related terms or by details of the thesaurus subsets from which they were extracted. The thesaurus subsets can be combined into a single thesaurus, either actually or virtually. The thesaurus subsets can be combined into a single thesaurus through the use of concepts that establish relationships between thesaurus terms. The entries or terms of a thesaurus subset can be correlated in order to eliminate multiple entries having an identical or similar meaning within a given thesaurus subset or across several subsets. Restrictions can be placed on the relationships between certain terms of a given thesaurus subset, or between several thesaurus subsets, in order to maintain the consistency of the thesaurus.
PCT/US1998/022215 1997-10-22 1998-10-20 Gestion informatisee de thesaurus WO1999021110A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2000517360A JP2001521225A (ja) 1997-10-22 1998-10-20 コンピュータシソーラスマネージャ
EP98953805A EP1023679A1 (fr) 1997-10-22 1998-10-20 Gestion informatisee de thesaurus
AU11081/99A AU1108199A (en) 1997-10-22 1998-10-20 Computer thesaurus manager

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US6313697P 1997-10-22 1997-10-22
US60/063,136 1997-10-22

Publications (1)

Publication Number Publication Date
WO1999021110A1 true WO1999021110A1 (fr) 1999-04-29

Family

ID=22047163

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/022215 WO1999021110A1 (fr) 1997-10-22 1998-10-20 Gestion informatisee de thesaurus

Country Status (4)

Country Link
EP (1) EP1023679A1 (fr)
JP (1) JP2001521225A (fr)
AU (1) AU1108199A (fr)
WO (1) WO1999021110A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1233349A2 (fr) * 2001-02-20 2002-08-21 Hitachi, Ltd. Méthode d'affichage de données et appareil à utiliser pour l'analyse de textes
EP1445708A1 (fr) * 2001-10-17 2004-08-11 Japan Science and Technology Agency Procede et programme de recherche d'information, support d'enregistrement lisible par ordinateur sur lequel est enregistre le programme de recherche d'information
WO2016190495A1 (fr) * 2015-05-28 2016-12-01 삼성에스디에스 주식회사 Procédé de gestion de règles à base de données non structurées et dispositif associé
US10733223B2 (en) 2008-01-08 2020-08-04 International Business Machines Corporation Term-driven records file plan and thesaurus design

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3055137C (fr) * 2011-01-07 2023-09-12 Ihab Francis Ilyas Systemes et procedes pour analyser et synthetiser des representations de connaissances complexes

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KRAMER R. ET AL: "Thesaurus federations: loosely integrated thesauri for document retrieval in networks based on Internet technologies", INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES, vol. 1, no. 2, September 1997 (1997-09-01), http://link.springer.de/link/service/journals/00799/papers/7001002/70010122.pdf, XP002094377 *
M. SINTICHAKIS AND P. CONSTANTOPOULOS: "A Method for Monolingual Thesauri Merging", PROC. 20TH INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, ACM SIGIR, 27 JULY 1997, PHILADELPHIA, PA, USA., http://www.ics.forth.gr/proj/isst/Publications/paperlink/A_method_for_Monoling_Thesauri_Merg.ps.gz, pages 129 - 138, XP000775964 *
NISHIKAWA N ET AL: "Allowing multiple experts to revise a thesaurus database", DESIGN OF COMPUTING SYSTEMS: COGNITIVE CONSIDERATIONS. PROCEEDINGS OF THE SEVENTH INTERNATIONAL CONFERENCE ON HUMAN-COMPUTER INTERACTION (HCI INTERNATIONAL '97), PROCEEDINGS OF HCI INTERNATIONAL 97. 7TH INTERNATIONAL CONFERENCE ON HUMAN COMPUTER INTE, ISBN 0-444-82183-X, 1997, Amsterdam, Netherlands, Elsevier, Netherlands, pages 371 - 374 vol.2, XP002094378 *

Cited By (9)

Publication number Priority date Publication date Assignee Title
EP1233349A2 (en) * 2001-02-20 2002-08-21 Hitachi, Ltd. Data display method and apparatus for use in text analysis
EP1233349A3 (en) * 2001-02-20 2004-10-13 Hitachi, Ltd. Data display method and apparatus for use in text analysis
EP1445708A1 (en) * 2001-10-17 2004-08-11 Japan Science and Technology Agency Information searching method, information searching program, and computer-readable recording medium on which information searching program is recorded
EP1445708A4 (en) * 2001-10-17 2006-12-27 Japan Science & Tech Agency Information searching method, information searching program, and computer-readable recording medium on which information searching program is recorded
US7346614B2 (en) 2001-10-17 2008-03-18 Japan Science And Technology Corporation Information searching method, information searching program, and computer-readable recording medium on which information searching program is recorded
US10733223B2 (en) 2008-01-08 2020-08-04 International Business Machines Corporation Term-driven records file plan and thesaurus design
WO2016190495A1 (en) * 2015-05-28 2016-12-01 삼성에스디에스 주식회사 Unstructured data-based rule management method and apparatus therefor
KR20160139590A (en) * 2015-05-28 2016-12-07 삼성에스디에스 주식회사 Unstructured data-based rule management method and apparatus therefor
KR101716692B1 (en) * 2015-05-28 2017-03-15 삼성에스디에스 주식회사 Unstructured data-based rule management method and apparatus therefor

Also Published As

Publication number Publication date
AU1108199A (en) 1999-05-10
EP1023679A1 (fr) 2000-08-02
JP2001521225A (ja) 2001-11-06

Similar Documents

Publication Publication Date Title
He et al. Automatic integration of web search interfaces with wise-integrator
Jayapandian et al. Automated creation of a forms-based database query interface
US6385600B1 (en) System and method for searching on a computer using an evidence set
US6694331B2 (en) Apparatus for and method of searching and organizing intellectual property information utilizing a classification system
US9547287B1 (en) System and method for analyzing library of legal analysis charts
US20040186705A1 (en) Concept word management
US7984047B2 (en) System for extracting relevant data from an intellectual property database
US20020065857A1 (en) System and method for analysis and clustering of documents for search engine
US20120131049A1 (en) Search Tools and Techniques
US20080027933A1 (en) System and method for location, understanding and assimilation of digital documents through abstract indicia
US7389289B2 (en) Filtering search results by grade level readability
US20100029580A1 (en) Method for Diagnosing Non-Small Cell Lung Carcinoma
Johnson et al. DEVISE: a framework for the evaluation of Internet search engines
Bettahar et al. Towards a Semantic Interoperability in an e‑Government Application
EP1023679A1 (en) Computerized thesaurus management
Kim et al. A framework for design rationale retrieval
Jagerman Creating, maintaining and applying quality taxonomies
EP1158424A1 (en) System and method for publishing and classifying documents on a network
Khoo et al. Task-based navigation of a taxonomy interface to a digital repository
De Haan et al. Beginning Oracle SQL
Calvanese et al. Building a digital library of newspaper clippings: The LAURIN project
US20030028370A1 (en) System and method for providing a fixed grammar to allow a user to create a relational database without programming
Young et al. Aquifers of Texas bibliography to support the Brackish Resources Aquifer Characterization System (BRACS) program final report. Austin (Texas): Texas Water Development Board
Scott et al. Creating a massive master index for HTML and print
Jung Designing and understanding information retrieval systems using collaborative filtering in an academic library environment

Legal Events

Date Code Title Description
AK Designated states (Kind code of ref document: A1; Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW)
AL Designated countries for regional patents (Kind code of ref document: A1; Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
REG Reference to national code (Ref country code: DE; Ref legal event code: 8642)
WWE WIPO information: entry into national phase (Ref document number: 1998953805; Country of ref document: EP)
WWE WIPO information: entry into national phase (Ref document number: 09529868; Country of ref document: US)
NENP Non-entry into the national phase (Ref country code: KR)
ENP Entry into the national phase (Ref country code: JP; Ref document number: 2000 517360; Kind code of ref document: A; Format of ref document f/p: F)
WWP WIPO information: published in national office (Ref document number: 1998953805; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: CA)
WWW WIPO information: withdrawn in national office (Ref document number: 1998953805; Country of ref document: EP)