EP1155377A1 - Method and apparatus for dynamically presenting a set of documents organized by a hierarchy of indexing concepts - Google Patents

Method and apparatus for dynamically presenting a set of documents organized by a hierarchy of indexing concepts

Info

Publication number
EP1155377A1
EP1155377A1 EP00907906A EP00907906A EP1155377A1 EP 1155377 A1 EP1155377 A1 EP 1155377A1 EP 00907906 A EP00907906 A EP 00907906A EP 00907906 A EP00907906 A EP 00907906A EP 1155377 A1 EP1155377 A1 EP 1155377A1
Authority
EP
European Patent Office
Prior art keywords
documents
concept
indexing
document
concepts
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00907906A
Other languages
English (en)
French (fr)
Inventor
Ido Dagan
Yitzhak Stauber
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LingoMotors Inc
Original Assignee
Focusengine Software Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Focusengine Software Ltd
Publication of EP1155377A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results

Definitions

  • a search mechanism typically attaches to each document a set of indexing concepts.
  • An indexing concept is a symbol or value that characterizes the document, and is typically used within search queries or within routing queries ("queries " that specify which documents will be routed to an addressee).
  • Typical types of indexing concepts include (but are not limited to):
  • Topical categories, also known as controlled keywords, topics, descriptors etc.
  • Topical categories are symbols denoting topical issues, which are usually general or abstract concepts that do not necessarily appear literally in the text.
  • a topical category may be "Company Acquisition". This term, serving as the name of the category, may not appear literally in a document that describes such an event.
  • Document meta-data items such as document source, type, author and date.
  • indexing concepts may also be used to determine routine routing of incoming documents to addressees.
  • The process of associating indexing concepts with documents (the indexing process) is performed either manually, automatically, or by some combination of the two modes.
  • indexing concepts that consist of terms and names from the document text
  • the indexing process usually involves scanning the text of the document, identifying words, terms and names, and possibly bringing these terms to some canonical form (e.g. the grammatical base form (lemma) of the word).
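The scanning and canonicalization step described above can be sketched as follows. This is a minimal illustration only: the `LEMMAS` table and the `index_terms` function are hypothetical names introduced here, and a real system would use a full morphological analyzer rather than a hand-written lookup.

```python
import re

# Hypothetical, tiny lemma table (an assumption for illustration only).
LEMMAS = {"acquired": "acquire", "acquires": "acquire", "companies": "company"}

def index_terms(text):
    """Scan the text, extract words, and map each word to a canonical form."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    # Words without an entry in the lemma table are kept unchanged.
    return {LEMMAS.get(word, word) for word in words}

terms = index_terms("Acme acquired two companies.")
```

The resulting term set can then serve as the indexing concepts of the document, or as input to a classifier that assigns topical categories.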
  • Meta-data indexing concepts are often determined by the systems, in which the document is created or received, but may also be handled manually.
  • the first approach is based on manual definition of the rules, or some other type of logic, by which a document is being classified to a category based on the terms in the text.
  • some systems allow users (or administrators) to define complex queries, which may include Boolean and other types of conditions (such as weights and proximity) that the terms in the document should satisfy.
  • a document that satisfies these conditions is classified to the category.
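A manually defined classification rule of this kind might look like the sketch below. The nested-tuple rule format, the `matches` function and the example rule are assumptions made for illustration; they do not reproduce the rule language of any particular product, and the weight and proximity conditions mentioned above are left out.

```python
def matches(rule, terms):
    """Evaluate a nested Boolean rule against the set of terms of a document."""
    op, *args = rule
    if op == "term":
        return args[0] in terms
    if op == "and":
        return all(matches(r, terms) for r in args)
    if op == "or":
        return any(matches(r, terms) for r in args)
    if op == "not":
        return not matches(args[0], terms)
    raise ValueError(f"unknown operator: {op}")

# Hypothetical rule for the category "Company Acquisition".
COMPANY_ACQUISITION = (
    "and",
    ("or", ("term", "acquire"), ("term", "merger"), ("term", "takeover")),
    ("term", "company"),
)

is_acquisition = matches(COMPANY_ACQUISITION, {"acme", "acquire", "company"})
```

A document whose terms satisfy the rule is classified to the category even though the category name itself never appears in the text.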
  • An example of such a system is the Topics™ system that was developed by Verity Inc., USA.
  • the second approach is based on automatic learning of the "logic" which entails the classification of the document to a category.
  • Methods belonging to this approach utilize a set of training documents, for which the correct categories are known in advance (usually as the result of manual classification of these documents).
  • a learning method may then include a learning phase, in which some model of the category is constructed.
  • a model may include terms that are highly associated with the category, and possibly some weights that quantify the degree of correlation (entailment) between each term and the category.
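One very simple way to build such a category model from training documents is sketched below. The weighting scheme (document frequency inside the category minus document frequency outside it) is an assumption chosen for brevity, not a method stated in the text; `learn_category_model` and the training data are hypothetical.

```python
from collections import Counter

def learn_category_model(training_docs, category):
    """Build {term: weight} from (term_set, category_set) training pairs."""
    in_cat = [terms for terms, cats in training_docs if category in cats]
    out_cat = [terms for terms, cats in training_docs if category not in cats]
    df_in = Counter(term for terms in in_cat for term in terms)
    df_out = Counter(term for terms in out_cat for term in terms)
    model = {}
    for term, count in df_in.items():
        inside = count / len(in_cat)
        outside = df_out[term] / len(out_cat) if out_cat else 0.0
        weight = inside - outside
        if weight > 0:  # keep only terms positively associated with the category
            model[term] = weight
    return model

training = [
    ({"acquire", "company"}, {"Company Acquisition"}),
    ({"merger", "company"}, {"Company Acquisition"}),
    ({"company", "earnings"}, {"Earnings"}),
]
model = learn_category_model(training, "Company Acquisition")
```

In this toy data the term "company" appears equally often inside and outside the category, so it receives no weight, while "acquire" and "merger" are retained.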
  • a learning method may be memory based, in which case the learning method simply stores the training data in some useful format.
  • the method classifies it automatically by consulting or applying the category model (or by simply comparing the document to the training data, in case of a memory based approach). Examples for trainable (learning) classification systems are described in:
  • a common method for display is to present a list of items, each providing some high level information about a document, such as the document title, meta-data items (such as author, source or date) and possibly a short summary.
  • the list may be sorted by document publication date or by some relevance score, which quantifies the degree of relevance of the document to the user's query, as hypothesized by the search system.
  • Another display method is a hierarchical display, in which documents are organized in a hierarchical structure, similar to a graphical user interface displaying a hierarchical file system.
  • U.S. patent 5,924,090 "Method and Apparatus for Searching a Database of Records" discloses a system for searching a database and presenting to the user a small number of categories along with a list of the most relevant records that satisfy a query.
  • the methodology of the Krellenstein patent has a sophisticated clustering algorithm that includes three primary steps: identifying candidate categories, weighting candidate categories and displaying a set of search result categories selected from the candidate categories.
  • A typical result of the system according to the Krellenstein patent is illustrated in Fig. 1, as extracted from the www.northernlight.com site.
  • the query text categorization (1) results in 19,215 documents (records) (2) (of which 6 are shown in the first page).
  • the documents are assigned to 15 categories (3).
  • the set of categories is determined by applying the specified clustering steps: identifying candidate categories, weighting candidate categories and displaying a set of search result categories selected from the candidate categories.
  • the user can repeat this process further narrowing the search with each iteration.
  • double clicking the category Special collection documents (4) will result in applying the specified steps again, giving rise to the search results illustrated in Fig. 2.
  • the invention provides for a method for dynamically presenting a set of documents to users, comprising:
  • each concept in said at least one organizing hierarchical display is associated with a respective subset of documents, from among said organized document subset.
  • the invention further provides for a method for presenting a set of documents to users, comprising:
  • the invention further provides for a method for presenting a set of documents to users, comprising:
  • the invention provides for a method for dynamically presenting set of documents to users, comprising: (a) providing a predetermined hierarchy of indexing concepts;
  • the invention provides for a system that includes a processor associated with a memory and display for dynamically presenting a set of documents to users, comprising:
  • the memory is configured to store a predetermined hierarchy of indexing concepts
  • the memory is configured to store a set of documents
  • the processor is configured to provide hierarchical display of the indexing concepts such that each indexing concept in the hierarchy is associated with a sub-set of documents from among said set of documents;
  • the processor is configured to apply steps that include the following (i) to (iii) , as many times as required:
  • each concept in said at least one organizing hierarchical display is associated with a respective subset of documents, from among said organized document subset.
  • the invention provides for a system that includes a processor associated with a memory and display for presenting a set of documents to users comprising:
  • the memory is configured to store indexing concepts and a set of documents; the set of documents are associated with concepts in accordance with triggering terms in said documents;
  • (b) the processor is configured to select a document from said set;
  • the processor is configured to select at least one concept associated with said document;
  • the processor is configured to quantify the importance of the triggering terms of said at least one concept and in response thereto, determining the important triggering terms;
  • the processor is configured to emphasize in the display the important triggering terms that correspond to said at least one concept.
  • the invention provides for a system that includes a processor associated with a memory and display for presenting a set of documents to users, comprising:
  • the memory is configured to store indexing concepts and a set of documents; the set of documents are associated with concepts in accordance with triggering terms in said documents;
  • the processor is configured to select a document from said set;
  • (c) the processor is configured to select at least one concept associated with said document;
  • the processor is configured to quantify the importance of the triggering terms of said at least one concept and in response thereto, determining the important triggering terms;
  • the processor is configured to obtain a summary in said display based on said important triggering terms.
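The emphasis and summary steps of the two systems above can be sketched as follows. The text does not say how term importance is quantified, so the numeric weights, the top-k cut-off, upper-casing as the form of emphasis, and sentence selection as the form of summary are all assumptions; every function name is hypothetical.

```python
import re

def important_terms(trigger_weights, top_k=2):
    """Rank triggering terms by (assumed) weight and keep the top_k."""
    ranked = sorted(trigger_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]

def emphasize(text, terms):
    """Emphasize each important term; upper-casing stands in for highlighting."""
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: m.group(0).upper(), text)

def summarize(text, terms):
    """Keep only the sentences that contain at least one important term."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    lowered = [t.lower() for t in terms]
    return [s for s in sentences if any(t in s.lower() for t in lowered)]

text = "Acme will acquire Widget Corp. The weather was fine. The merger closes in May."
top = important_terms({"acquire": 0.9, "merger": 0.7, "will": 0.1})
highlighted = emphasize(text, top)
summary = summarize(text, top)
```

Both outputs are driven by the same list of important triggering terms, which mirrors the way the two systems differ only in their final step.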
  • the invention provides for a system that includes a processor associated with a memory and display for dynamically presenting set of documents to users, comprising:
  • the memory is configured to store a predetermined hierarchy of indexing concepts
  • the memory is configured to store a set of documents;
  • the processor is configured to provide a hierarchical display of the indexing concepts; the indexing concepts are associated with the set of documents; the processor is configured to apply the following steps (d) to (f) as many times as required:
  • (d) determining a subset of documents by utilizing the hierarchical display, thereby rendering it an organized document subset;
  • Figure 1 - illustrates a screen result of a database search system in accordance with the prior art
  • Figure 2 - illustrates a screen result of a database search system in accordance with the prior art
  • Figure 3 - illustrates a generalized computer system.
  • FIG. 4 - illustrates a flowchart of the preferred embodiment of the invention.
  • Figure 5 - illustrates a top pane: the concept hierarchical display; left Top pane: tree representation of hierarchical document set display; Bottom: document list of a document subset.
  • Figure 6 - illustrates a left Top pane: pie representation of hierarchical document set display.
  • Figure 7 - illustrates a left Top pane: pie representation of hierarchical document set display.
  • Figure 8 - illustrates an overlapping window - Top pane: document important terms
  • Bottom pane document full text and terms highlighting.
  • Figure 9 - illustrates a left Top pane: a document subset that has been "organized by".
  • Right Top pane: the topics that have performed the "organization".
  • Figure 10 - illustrates an overlapping window - Top pane: document important terms.
  • Bottom pane document summary and terms highlighting.
  • Figure 11 - illustrates a left Top pane: tree representation of hierarchical document set display.
  • Figure 12 - illustrates a left Top pane: tree representation of hierarchical document set display. Overlapping window - Top pane: automatic important terms selection. Bottom pane: document text and automatically selected terms highlighting.
  • Figure 13 - illustrates a left Top pane: a document subset that has been "organized by" twice.
  • Right Top pane: the topics that have performed the second "organization"; and,
  • Figures 14 to 21 illustrate a succession of screen results obtained by applying the method in accordance with one embodiment of the invention.
  • the invention provides novel methods for utilizing textual information that considerably increase the effectiveness of the end user when dealing with large volumes of documents.
  • a typical embodiment of the invention is used in a computer system, as illustrated e.g. in Fig. 3.
  • the computer system (30) includes a processor unit (31) with input and output (32 and 33) and associated display (32) and memory (not shown).
  • the computer system (30) is configured to display documents and information about them in order to fulfill some information needs of end users (referred to in the following as "system").
  • The invention is, of course, not bound by any specific realization of the computer system, which may include any known structure such as a conventional Personal Computer (P.C.) in either stand-alone or network configuration, all as required and appropriate.
  • Fig. 4 provides a high-level flow chart of a typical embodiment of the invention within some computer system (the details of the components of the invention are described below).
  • the system presents a document set (41) in a hierarchical display (42).
  • the structure of the display may be modified
  • the user may select a node (standing for indexing concept) (44) within the hierarchy, and ask for a display of information about the documents that are associated with the selected node (45).
  • the displayed information may include one or more of the following: the number of documents in the sub-set associated with the specified indexing concept, the percentage thereof from among the entire document set, the document title, meta-data elements (such as source and date) and optionally a short summary of the document.
  • the information is of course not limited to the specified details and may vary, depending upon the particular application.
  • the user may then select a particular document (16) for display, leading to the display of the full document text or of a summary of the document.
  • the content of the summary, as well as highlighting within the text, are determined automatically by some indexing concepts, that are determined by default to be in focus of attention of the user.
  • the user may then select different indexing
  • a method and system for presenting document sets and their content to the user of a system in an effective manner refers to any situation in which some document set has to be presented by the system, at any point of time, for purposes such as exploration, scanning, reading or analysis.
  • the term document should be construed in a broad manner to encompass any record in a database including, but not limited to, a text and/or text/image document.
  • the displayed document set may be e.g. the output of a search query that is applied to a search engine (e.g. AltaVista), or an entire document collection indexed by the system, or any other document set that is provided as an input for displaying to the user in accordance with the invention.
  • the documents in the presented document set are characterized by indexing concepts, as described above. That is, a typical document is characterized by several indexing concepts that are logically associated thereto. A document is considered indexed by the indexing concepts characterizing it.
  • the possible indexing concepts for documents in the system are arranged in a predetermined hierarchy of indexing concepts (hierarchy in short), as illustrated e.g. in Fig. 5 (31). That is, a parent concept (which is itself an indexing concept) is defined for each indexing concept. For example, in Fig. 5 (33) "Countries" is the parent of (34) "Latin America". One or several concepts that are defined as roots of the hierarchy may not have a parent node. For example, in Fig. 5 (32) "All" is the root. Usually, each concept in the hierarchy has only one parent, giving the hierarchy the form of a tree data structure (or several trees in case of several roots). The described functionality can also accommodate situations where some nodes have more than one parent. The terms concept and node are used interchangeably to denote an indexing concept within the hierarchy. Those versed in the art will readily appreciate that the structure of the indexing concept hierarchy is substantially predetermined.
  • the predetermined structure does not necessarily mean that the indexing categories may not be subject to modification.
  • the hierarchy may include an indexing concept
  • the system may include a mechanism to recognize dynamically that a new name appearing in a document is a company, and define that name as an indexing concept for the document which is a daughter of the node "Companies”.
  • the system includes a filtering mechanism which, in response to a filtering criterion, decides whether or not an indexing concept is displayed in the hierarchy.
  • the filtering criterion may filter out concepts associated with a small number of documents, below a certain threshold, or concepts that are associated only with documents whose score for a search query, whose results list constitutes the document set to be displayed, is low.
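The two filtering criteria just described can be sketched as a single predicate. The threshold values, the data and the name `keep_concept` are assumptions made for illustration.

```python
def keep_concept(doc_ids, scores, min_docs=2, min_score=0.5):
    """Decide whether a concept is shown in the hierarchical display.

    A concept is filtered out when it has fewer than min_docs documents,
    or when every one of its documents scores below min_score for the query.
    """
    if len(doc_ids) < min_docs:
        return False
    return any(scores[d] >= min_score for d in doc_ids)

# Hypothetical query scores and concept-to-document associations.
scores = {"d1": 0.9, "d2": 0.2, "d3": 0.1}
concept_docs = {"France": ["d1", "d2"], "Spain": ["d2", "d3"], "Peru": ["d1"]}
shown = [c for c, docs in concept_docs.items() if keep_concept(docs, scores)]
```

Here "Peru" is dropped for having too few documents and "Spain" for having only low-scoring ones.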
  • a system displays the concept hierarchy (in a hierarchical display) by any visualization mechanism that is suitable for displaying a hierarchical structure.
  • the most typical display form for a hierarchy is a tree display, as in Fig. 5 (37), in which each node of the tree corresponds to one concept in the hierarchy. Clicking on a node (or on a special sign, such as "+", that is attached to the node) leads to displaying or hiding its daughters.
  • Hierarchical display mechanisms may show one level of siblings in the hierarchy at a time, by showing a list of elements, each represented by some symbol or icon, where clicking on an element leads to displaying its siblings, while some other option enables getting back (up) in the hierarchy (for example, the "My Computer” icon in the Windows-98/NT system available from Microsoft Inc, USA).
  • any hierarchical display mechanism can be used to display the hierarchy of indexing concepts, where user interaction with the display mechanism controls the display of different portions of the hierarchy.
  • Another non-limiting example of hierarchical display is a chart, e.g. a pie chart.
  • This subsection defines a hierarchical display of a presented document set (containing documents indexed by indexing concepts).
  • the hierarchical display serves as a "table of contents" for the document set, which facilitates navigating and browsing of document sets.
  • the scheme of a hierarchical document set display is available in previous systems, but the invention includes some specific enhancements to this scheme, as noted below.
  • the hierarchical document set display is based on the concept hierarchical display, and can be realized by any mechanism for displaying hierarchies, just like the concept hierarchical display discussed above.
  • Fig. 5 (37) is a hierarchical document set display in tree form.
  • a set of documents which is a subset of the currently presented document set, is associated with each concept (node) in the hierarchy.
  • a set of documents is associated with the node (39) "Countries".
  • the associated document set for a concept in the hierarchy contains all documents that are indexed by that concept.
  • the associated document set for a concept is defined to include all documents associated with any of its descendants in the hierarchy.
  • the document set of the concept "Countries" includes all documents indexed by any country or geographical region, assuming that these concepts are all descendants of the concept "Countries" in the hierarchical display.
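The definition of the associated document set can be sketched as a recursive union over the concept hierarchy. The `children` and `indexed` tables below are hypothetical sample data, and the sketch assumes the hierarchy contains no cycles.

```python
# Hypothetical hierarchy: each concept maps to its daughter concepts.
children = {
    "All": ["Countries"],
    "Countries": ["Latin America", "Europe"],
    "Latin America": ["Brazil", "Chile"],
    "Europe": ["France"],
}
# Hypothetical index: documents indexed directly by each concept.
indexed = {
    "Countries": {"d4"},
    "Brazil": {"d1", "d2"},
    "Chile": {"d2"},
    "France": {"d3"},
}

def associated_docs(concept):
    """Documents indexed by the concept itself or by any of its descendants."""
    docs = set(indexed.get(concept, ()))
    for child in children.get(concept, ()):
        docs |= associated_docs(child)
    return docs

latin_america = associated_docs("Latin America")
countries = associated_docs("Countries")
```

Because sets are used, a document indexed by several descendants (here "d2") is counted once in the set of the ancestor.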
  • a hierarchical document display thus includes a display of the concept hierarchy (as described above), augmented with some information at each concept node about the document set associated with that node.
  • the information about the associated document set may include, by one embodiment, one or more of the following items:
  • Some key information about prominent topics described within documents of the document set such as most frequent or prominent key terms within the documents of the set, and/or the list of all or some of the indexing concepts for the documents.
  • Fig. 5 is a tree display of the hierarchy with associated information about the document set of each node, containing number of documents and percentage relative to the parent node document set.
  • a pie (or bar) chart can be used to display several sibling nodes (daughters of a common parent).
  • Fig. 6 (44) is a pie representing the daughter nodes of "Countries".
  • Each pie slice corresponds to one concept and its size indicates the proportion of its associated document set relative to the parent node document set.
  • the quantitative graphical display mechanism may be interactive, in a similar manner to the interactive tree presentation of the concept hierarchy. For example, double clicking on a pie slice may lead to displaying the pie of the daughters of the selected node. For example, double clicking on the slice in Fig. 6 (45), corresponding to "Latin America", leads to the display in Fig. 7 (47), a pie presenting the daughters of "Latin America".
  • the displayed daughters of a node may be sorted alphabetically, or by some characterizing quantitative information, in particular by the size of the associated document set for each daughter.
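The per-node information used by the tree and pie displays (document count, percentage relative to the parent node, and ordering by size) can be sketched as follows. The function name `daughter_summary` and the sample data are assumptions.

```python
def daughter_summary(parent_docs, daughters):
    """Return (name, count, percent of parent) rows, largest document set first."""
    total = len(parent_docs)
    rows = []
    for name, docs in daughters.items():
        count = len(docs & parent_docs)
        percent = 100.0 * count / total if total else 0.0
        rows.append((name, count, percent))
    # Sort by descending size; ties are broken alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows

parent = set(range(8))
daughters = {"Chile": {3, 4}, "Brazil": {0, 1, 2, 3}, "Peru": {5, 6}}
rows = daughter_summary(parent, daughters)
```

Each row corresponds to one pie slice or one tree node label. The percentages need not sum to 100, since a document may be indexed by several sibling concepts or by none of them.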
  • a system may combine both a pie chart display and a tree display. When viewing the tree display with a certain node selected, and switching to the pie chart display, the system will present the pie that corresponds to the daughters of the selected node.
  • the graphical display may present further information about the documents in the associated document set, such as their titles, meta-data elements, document summaries or the full text of the document.
  • Fig. 5 (42) is the list of titles for the documents associated with the node "Latin America”.
  • Fig. 10 (48) is a summary of a selected document in the document list.
  • a display of the full text of a document is presented in Fig. 8 (52).
  • (54) is a list of indexing concepts for the document.
  • concepts in the hierarchical display to which no documents are attached may be omitted from the display. For example, in Fig. 5 no documents are associated with the indexing concept (36) "Bahamas" in the concept hierarchy, thus in the hierarchical indexing concept display, this concept does not
  • the concepts in the hierarchical display are subject to a filtering criterion in order to determine whether or not they will be displayed in said hierarchy.
  • the filtering criteria concern which folders in deeper levels of the hierarchy tree will be displayed in said hierarchy.
  • the necessity of this criterion stems from the fact that the display area allocated to the hierarchy in the display screen may not be sufficient to accommodate the entire hierarchy, and accordingly only portion thereof is displayed, e.g. few levels, and only in response to user selection further levels are displayed (instead of the previously higher levels). For example: if the top level
  • More advanced filtering criterion may rank folders (standing by this embodiment for nodes) to be presented according to, say the number of documents in it and the quality of their match to the current
  • an "Others" node is added to each list of siblings having a common parent.
  • the documents associated with the "Others" node are those associated with the parent
  • the hierarchical indexing concept display may be restricted to a particular sub-part of the hierarchy, determined by some mechanism, rather than presenting the full hierarchy. For example, it is possible to present the hierarchical indexing concept display using only the "Countries" sub-tree of the hierarchy. This non-limiting modification also falls within the definition of predetermined hierarchical indexing concept display.
  • the hierarchical indexing concept set display serves as a "table of contents" for the document set and can be used as a method for displaying document sets to
  • the hierarchical indexing concept set display is limited because it has a static structure, which is equivalent to the structure of the concept hierarchy.
  • one of the leaves of the tree may be the country "France", as in Fig. 11 (55), containing 45 documents.
  • This section defines a novel mechanism provided by the invention for presenting dynamic "tables of contents” displays for document sets, enabling the user to dynamically modify and refine the document display whilst maintaining the predetermined hierarchical indexing concept display.
  • the dynamic display is by itself hierarchical, utilizing the specified predetermined hierarchy of categories, and thus provides all the functionality of the hierarchical document set display, as described above.
  • a document set is presented in some manner, possibly by the (static) hierarchical indexing concept display.
  • the dynamic display is created by a series of "organize by" operations, each specified by two definitions:
  • selecting the document set may, preferably, correspond to selecting a node in the hierarchical document set display.
  • selecting the node "France" in Fig. 11 (55) defines the document set associated with this node as the subset to be organized. This subset is termed the organized document subset.
  • the selected subset corresponds to a node in the display, that node is termed the organized node.
  • the selection of the "organized" document subset is performed on the basis of information displayed in the hierarchy, e.g. defining an indexing concept in the hierarchy as an organized by concept and rendering the documents associated therewith as the specified "organized" document subset.
  • the node "Companies” may be selected as an organizing node (57 in Fig. 9), to organize the document subset associated with the node "France”.
  • any concept in the indexing concept hierarchy display is associated with a respective subset of documents from among the organized document subset.
  • a document may be associated with more than one concept of the organizing hierarchical display.
  • a "respective" subset of documents encompasses also the special situation in which a concept is associated with no documents.
  • the "organize by" operation may be interpreted as a recursive application of the hierarchical indexing concept display, as its effect is to provide a new hierarchical display for a node within a previously displayed hierarchy.
  • the hierarchical display is maintained predetermined considering that in the modified presentation, substantially, the same concepts are employed, which makes it easier for the user to follow "well known” and familiar concepts, even after applying the "organizing" operation.
  • the organizing node can be the root of the concept hierarchy, in which case the organized document subset will be displayed by a hierarchical indexing concept set display that corresponds to the entire concept hierarchy.
  • a system may apply only this special case (always organizing by the full hierarchy considering the root as the organizing node), in which case it is necessary to define only the organized node in order to apply an "organize by" operation.
  • a system may implement the hierarchical document display such that at each point of time the user view is focused only on one node of the tree. In this case, applying the "organize by" operation implies implicitly that the organized node is the currently displayed node, saving the need of an explicit definition of the organized node.
  • the default definition of organizing concept as the root node and the organized by concept as the currently displayed node may be realized by a single user operation say, for example, clicking on a predetermined icon.
  • the hierarchical display of the organized subset is displayed as a new, dynamically created, daughter (or daughters) of the selected organized node.
  • the node "Companies” in Fig. 7 (60) is added dynamically as a new daughter node of (59) the node "France”, modifying the hierarchical display that was presented to the user just before applying the "organize by" operation.
  • a new daughter node either replaces or is added as a sibling to the previously existing daughters of the organized node.
  • any part of the new display may be subject to further “organize by” operations.
  • a node that was added to the hierarchy in a previous "organize by” operation may be selected as the organized subset in a later operation.
  • Subsequent "organize by” operations on the modified dynamic display may be applied as requested by the user.
  • the node “Boeing” which has been created by a previous “organize by” operation is later selected as an organized node, where the organizing node is (65) "Activities”.
  • a node “Activities” (70) is dynamically added to the display, and its daughters (64) (signifying documents indexed by both "France” and “Boeing” and by some activity) are associated, each, with information that pertains to these documents. For example, there are 19 documents indexed by “France” (67) "Boeing” (69) and “Agreement” (71).
  • the specified "organize by" operation may be applied recursively (repeated) as many times as required, each time in respect of newly selected "organized" and "organizing" concepts.
  • the basic form of the "organize by" operation may consist of selecting one node in a hierarchical display as the organized node, and one node in the concept hierarchy as the organizing node.
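The "organize by" operation and its repeated application can be sketched as follows. The data structures, the document ids and the function names are hypothetical; empty daughter nodes are dropped here, which is one display option among those the text allows.

```python
# Hypothetical concept hierarchy and index (illustrative data only).
children = {
    "Companies": ["Boeing", "Airbus"],
    "Activities": ["Agreement", "Lawsuit"],
}
indexed = {
    "France": {1, 2, 3, 4},
    "Boeing": {1, 2, 5},
    "Airbus": {3},
    "Agreement": {1, 5},
    "Lawsuit": {2},
}

def associated_docs(concept):
    """Documents indexed by the concept or by any of its descendants."""
    docs = set(indexed.get(concept, ()))
    for child in children.get(concept, ()):
        docs |= associated_docs(child)
    return docs

def organize_by(organized_docs, organizing_concept):
    """Display the organizing sub-hierarchy restricted to the organized subset."""
    docs = associated_docs(organizing_concept) & organized_docs
    daughters = []
    for child in children.get(organizing_concept, ()):
        node = organize_by(organized_docs, child)
        if node["docs"]:  # omit concepts with no documents in the subset
            daughters.append(node)
    return {"concept": organizing_concept, "docs": docs, "daughters": daughters}

# First operation: organize the documents of "France" by "Companies".
by_company = organize_by(associated_docs("France"), "Companies")
# Second operation: organize the dynamically created "Boeing" node by "Activities".
boeing = by_company["daughters"][0]
by_activity = organize_by(boeing["docs"], "Activities")
```

Each returned node would be attached as a new daughter of the organized node in the display. Since the second call receives only the documents of the first result, every document under it is indexed by "France", by "Boeing" and by some activity.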
  • the following paragraph describes extensions to the basic form.
  • the organized node "France" may be organized by both "Companies" and "Activities", which means that all the documents associated with the indexing concept "France" will be organized by the indexing concept "Companies" and separately by the indexing concept "Activities". If desired, the nodes "Companies" and "Activities" are added as daughters to "France".
  • Multiple selection of organized nodes has the effect of applying the "organize by" operation simultaneously to all selected nodes. For example, applying an "organize by" operation with the same organizing node to both nodes "France” and "Spain".
  • the net effect of selecting more than one organized node is that each node is associated with its respective organized subset of documents, and then some operator or operators is (are) applied to the specified subsets so as to constitute a resulting organized subset of documents that is then subject to the organizing operation.
  • there is a first subset of documents associated with France and a second subset of documents associated with Spain.
  • the operator that is applied to the subsets is OR, giving rise to a document subset that includes documents that pertain only to Spain, only to France or to both. This resulting subset of documents is then subject to the organizing operation by one or more organizing concepts.
  • the set of documents may be obtained by applying a search query to, say, a conventional search engine that operates similarly to AltaVista, and the resulting set is displayed along with the hierarchical display of the invention.
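Combining the subsets of several organized nodes can be sketched with ordinary set operators. The text names OR for this example; the AND variant shown alongside it is an assumption about what "some operator" might also cover, and `combine` is a hypothetical helper.

```python
from functools import reduce
import operator

def combine(subsets, op=operator.or_):
    """Fold the document subsets of the selected organized nodes with one operator."""
    return reduce(op, subsets)

# Hypothetical document ids associated with each organized node.
france = {1, 2, 3}
spain = {3, 4}

either = combine([france, spain])               # OR: France only, Spain only, or both
both = combine([france, spain], operator.and_)  # AND: assumed alternative operator
```

The combined subset then plays the role of the organized document subset in a subsequent organizing operation.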
  • FIG. 14 illustrates a predetermined indexing concept hierarchy (140) that includes 1 1 ,000 documents (142) that constitute the document set and are broken down by the hierarchy concepts.
  • Applying a query results in 318 documents (see 151 in Fig. 15) that are broken down by the concept hierarchy.
  • the list of documents is displayed (152), and, by this example, the first four documents are shown in the first page.
  • the query itself ("pagers") is automatically assigned to categories in the hierarchy as if it were a document.
  • the resulting category is illustrated in the "Related category" field (153), to wit: Telecom All > Applications > Messaging > Paging. All the categories except "Paging" are shown in the hierarchical presentation (151, 154, and 155).
  • Paging is a sub category of Applications and can be shown if the Browse section of the screen is enlarged, or if the user decides to show it by, say, clicking a specified symbol (as described above).
  • Fig. 17 is the same as Fig. 16 except that now the documents that are associated with the sub-category Telecom Service Companies (171) are shown. This may be achieved by simply clicking the relevant category in the hierarchy (in this particular example Telecom Service Companies, not shown in the hierarchy of Fig. 17), whereupon the documents associated therewith are shown.
  • the documents that are shown obviously relate to "paging" and telecom service companies.
  • Fig. 18 illustrates yet another degree of detail wherein only documents that pertain to Sky Tel (181) (which forms a sub-category of the specified Telecom Service Companies, not shown in the hierarchy of Fig. 18) are shown.
  • the user simply clicks the products category (200) in Fig. 20 and the 8 relevant documents are shown in the search section of the screen (201). If, from among the specified 8 documents, only those that concern Motorola are of interest, the user simply clicks the Motorola category (210) in Fig. 21 and, in response thereto, the pertinent 3 documents are shown.
  • the invention provides, in accordance with another aspect thereof, new mechanisms for presenting parts of or all of the text of a document in a dynamic and effective manner. These mechanisms direct the attention of the user to relevant parts of the document and enable quick focusing on these parts. For example, these relevant parts might be text segments that contain relevant information for the user or that can help in deciding about the relevance of the document.
  • the decision of which parts of the document should be in focus is dynamic, and may be changed according to user guidance or to the context in which the document is being displayed.
  • the parts of the document which should be highlighted or be included in a summary are determined according to a set of (one or more) indexing concepts, among the indexing concepts of the document, that are considered to be in focus at a certain stage of user interaction with the system. These indexing concepts are called focus indexing concepts.
  • the highlighting and summarization for a given focus indexing concept is determined by the important triggering terms for that concept.
  • the triggering terms for a concept are the occurrences in the document of all terms which entail the attachment (or classification) of the concept to the document.
  • Highlighting and an extracted summary will include the important triggering terms for the concept, or short segments of text that are considered to be important.
  • the degree of importance of terms and segments may be quantified by some scoring mechanism, where the degree of importance of the terms in a segment is a factor in determining the degree of importance of the segment.
  • the invention provides dynamic methods for determining (quantifying) which triggering terms and segments are important in a given context of the user interaction with the system that displays the documents.
  • the quantifying step assigns the same degree of importance to all triggering terms.
  • the latter option does not apply to the aspect which concerns emphasizing important triggering terms.
  • when emphasizing important triggering terms, not all the triggering terms are ranked with the same degree of importance.
  • the important triggering terms and segments are presented to the user, either in a form of an extracted summary, which contains the important terms and/or segments, or by highlighting the important terms within the display of the full document, or by some combination of the two methods.
  • the term "important" refers to the case where the degree of importance of triggering terms and segments can be quantified and the display is restricted to those with the highest importance.
  • the number of terms or segments to be included in the display is determined by some mechanism, such as a threshold on the degree of importance or on the number of items to be included. This ranking mechanism by degree of importance is necessary when there are many important terms or segments and it is desired to limit their display in order to achieve optimal focus of attention by the user.
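The two limiting mechanisms mentioned above (a threshold on the degree of importance, or a cap on the number of items) can be sketched as follows. The function name `select_important`, its parameters `min_score` and `max_items`, and the sample scores are assumptions made for illustration; the text does not prescribe a particular scoring scale.

```python
def select_important(scored_items, min_score=None, max_items=None):
    """Rank (item, score) pairs by descending score, then apply an optional
    score threshold and an optional cap on the number of items kept."""
    ranked = sorted(scored_items, key=lambda pair: pair[1], reverse=True)
    if min_score is not None:
        ranked = [pair for pair in ranked if pair[1] >= min_score]
    if max_items is not None:
        ranked = ranked[:max_items]
    return ranked


# Hypothetical importance scores for three terms.
items = [("pager", 0.9), ("network", 0.4), ("message", 0.7)]
by_threshold = select_important(items, min_score=0.5)
by_count = select_important(items, max_items=1)
```

Both limits can be passed together, in which case the threshold is applied first and the cap second.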
  • FIG. 10 displays a summary of a document, in which the important terms are highlighted.
  • the important terms were determined relative to the highlighted indexing concepts "Latin America" (50) and "Lockheed Martin" (51), which are in the focus of interest to the user (as explained below).
  • the summary includes segments of the text that contain the important terms.
  • Fig. 8 presents a full display of a document text, in which important terms (relative to the indexing concept (54), see below) are highlighted. While the general scheme of highlighting triggering terms in some form in a document for display is available in previous systems, the invention, in this aspect, concerns selecting important terms, as described below.
  • One non-limiting method in the context of the invention refers to selecting the important triggering terms in a document with respect to an indexing concept that is determined to be in focus (of interest) at a certain stage of the user interaction with the system.
  • the indexing concept "Product specifications/capabilities" (54) is selected to be in focus.
  • This part of the invention refers to the case where the indexing concept was assigned to the document by some text classification method, as described above. Such a method classifies the document to a certain indexing concept based on words, terms or their combinations that appear in the document. It is assumed that it is possible to trace within the classification system which words or terms in the document entailed the classification to the given indexing concept.
  • a trainable text classification method in which the terms and the degree to which they entail classification to the indexing concept are learned from training documents, for which it is previously known whether they belong to the indexing concept or not.
  • This method applies a Bayesian learning scheme for text classification. For a given category, the method computes (during the training phase) certain weights for terms (words or phrases) in the text, with respect to the category. The score of the category for a particular document is computed as a function (usually some sort of a normalized sum) of the weights of the terms that appear in the document. When computing the category score for a document, it is possible to trace the relative contribution of each term in the document to the accumulative score. Thus, triggering terms in this method will be those terms that provided the highest contribution to the accumulative score of the document.
  • the important triggering terms are those term occurrences that significantly contributed to the classification of the document to the focus indexing concept.
  • the triggering terms for the indexing concept "Product Specification/Capabilities" (54) are highlighted within the text (52) of the document.
  • their degree of importance would be proportional to this degree of relative contribution to classification.
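Tracing each term's relative contribution to the category score, as described above, might look like the following sketch. It is not the patent's classifier: the term weights are hypothetical hand-picked values rather than weights learned in a Bayesian training phase, and dividing by the document length is just one assumed form of "normalized sum". The function names are likewise illustrative.

```python
def term_contributions(tokens, weights):
    """Return each term's share of the category score.

    The score is assumed to be a length-normalized sum of term weights,
    so a term contributes weight * occurrences / number_of_tokens.
    """
    total = len(tokens)
    contributions = {}
    if total == 0:
        return contributions
    for token in tokens:
        weight = weights.get(token, 0.0)
        if weight:
            contributions[token] = contributions.get(token, 0.0) + weight / total
    return contributions


def category_score(tokens, weights):
    """Category score of a document: the sum of all term contributions."""
    return sum(term_contributions(tokens, weights).values())


def triggering_terms(tokens, weights, top_n=2):
    """Terms with the highest contribution to the category score."""
    contributions = term_contributions(tokens, weights)
    return sorted(contributions, key=contributions.get, reverse=True)[:top_n]


# Hypothetical weights for one category and a four-token document.
weights = {"battery": 2.0, "range": 1.0}
tokens = ["the", "battery", "range", "battery"]
contributions = term_contributions(tokens, weights)
```

Here each contribution doubles as the term's degree of importance, so the same dictionary can drive both highlighting and ranking.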
  • the method described above for selecting the important triggering terms for an indexing concept in focus could be combined with simpler methods for identifying the triggering terms for an indexing concept (such methods are not part of the invention).
  • the important term is simply the occurrence of the indexing concept in the text.
  • the triggering terms are simply all the terms that appear in the query (similar to document search systems that highlight matching query terms in the retrieved documents).
  • Another method within the invention refers to selecting important terms and segments for display by selecting dynamically several focus indexing concepts.
  • One way of selecting the focus indexing concepts is by letting the user select them interactively from the list of all indexing concepts of the document. In Fig. 10 the user has selected (50) "Latin America" and (51) "Lockheed Martin" as focus indexing concepts. Consequently, the selected important terms, which are highlighted in the document text (48), are the triggering terms for both (50) and (51).
  • Other mechanisms for selecting the set of focus indexing concepts may be applied as well, such as the method described next.
  • the important triggering terms and segments are selected from the important triggering terms and segments of each one of the focus indexing concepts, applying some procedure that combines them and reevaluates their degree of importance with respect to the complete set of focus indexing concepts.
  • the degree of importance of a triggering term or segment with respect to the complete set of focus indexing concepts may be defined (referred to also as quantified) by its maximal (or minimal) degree of importance for any of the individual indexing concepts (applying a disjunctive (or conjunctive) reasoning criterion), or by computing some averaging function of the individual importance degrees.
  • the display of important terms or segments for the complete set of focus indexing concepts may distinguish between terms that were selected originally for the different indexing concepts that compose the set. For example, a different color is attributed to each indexing concept, and the important terms related to this concept are highlighted by the corresponding color.
  • the indexing concept "LATIN AMERICA" (50) is highlighted with a blue background and "LOCKHEED MARTIN" (51) is highlighted with a pink background (blue appears darker than pink in the black and white printing).
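The combination of per-concept importance degrees described above (maximal, minimal, or averaged) can be sketched as below. The data layout (`per_concept` mapping each focus concept to its term scores), the choice to treat a term missing for a concept as having importance 0.0, and the sample scores are all assumptions for illustration.

```python
from statistics import mean

# Combination modes: "max" is disjunctive, "min" is conjunctive, "mean" averages.
COMBINERS = {"max": max, "min": min, "mean": mean}


def combined_importance(per_concept, mode="max"):
    """Combine per-concept term importance into one score per term.

    per_concept maps each focus indexing concept to {term: importance}.
    A term that is absent for a concept is assumed to score 0.0 there.
    """
    combine = COMBINERS[mode]
    terms = set().union(*per_concept.values())
    return {
        term: combine([scores.get(term, 0.0) for scores in per_concept.values()])
        for term in terms
    }


# Hypothetical importance degrees for two focus indexing concepts.
per_concept = {
    "Latin America": {"Brazil": 0.75, "satellite": 0.25},
    "Lockheed Martin": {"satellite": 0.5},
}
disjunctive = combined_importance(per_concept, "max")
conjunctive = combined_importance(per_concept, "min")
averaged = combined_importance(per_concept, "mean")
```

Under the conjunctive mode, "Brazil" drops to zero because it triggers only one of the two concepts, which matches the intuition that a conjunctive criterion favours terms relevant to every focus concept. Per-concept highlighting colors could be kept by remembering which concept supplied each term's score.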
  • Another method within the invention refers to the selection of default focus indexing concepts, to be used automatically as the focus indexing concepts when the document is presented to the user.
  • the default focus indexing concepts are selected according to the selection conditions that were applied in the process that led to the display of the document.
  • the indexing concepts contained in the query become the default focus indexing concepts.
  • a particular setting for this method occurs when the document is selected for display within the hierarchical document set display or within the dynamic hierarchical document set display.
  • a document was selected for display from the node (document subset) (61) "ARGENTINA".
  • the default focus indexing concept is (62) "ARGENTINA".
  • the triggering term "Argentina" (63) is highlighted within the document text.
  • a document is selected for display from the document set that is associated with a certain node in the hierarchy.
  • the documents in this set satisfy a logical condition that is equivalent to a search query which is a conjunction (logical AND) of all indexing concepts in the path from the root of the displayed hierarchy to the selected node.
  • the default focus indexing concepts are the concepts along this path.
  • parts of this path may correspond to paths within the concept hierarchy and parts of the path might be created dynamically within the dynamic hierarchical document set display.
  • the documents associated with the node "Agreement" (71) satisfy a logical AND condition for all indexing concepts on the displayed path from the root of the tree to this node.
  • the method of viewing document sets that are attached to concept nodes in a (possibly dynamic) hierarchical document set display may be combined with the use of explicit search queries issued by the user.
  • if the document set attached to a concept node is restricted by an additional condition supplied in an explicit search query, then the default focus indexing concepts will be a combination of the concepts of the path, as described above, and the concepts that are included in the query.
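The path-based selection described above can be sketched as follows: the documents at a node satisfy the AND of all concepts on the path from the root, and the default focus indexing concepts are the path concepts combined with any concepts from an explicit query. The index contents, document ids, and function names are hypothetical, and the query is simplified to a plain list of indexing concepts.

```python
def documents_at_node(path, index, all_docs):
    """Documents at a node: the AND of every concept on the root-to-node path."""
    result = set(all_docs)
    for concept in path:
        result &= index.get(concept, set())
    return result


def default_focus_concepts(path, query_concepts=()):
    """Path concepts followed by query concepts, de-duplicated, order preserved."""
    return list(dict.fromkeys([*path, *query_concepts]))


# Hypothetical index: indexing concept -> set of document ids.
index = {
    "Argentina": {1, 2, 5},
    "Agreement": {2, 3, 5},
    "Lockheed Martin": {5, 6},
}
all_docs = {1, 2, 3, 4, 5, 6}
path = ["Argentina", "Agreement"]

node_docs = documents_at_node(path, index, all_docs)
restricted_docs = node_docs & index["Lockheed Martin"]  # extra query condition
focus = default_focus_concepts(path, ["Lockheed Martin", "Agreement"])
```

The path may mix concepts from the predetermined hierarchy with concepts added dynamically by "organize by" operations; the sketch treats both kinds identically.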
  • although the organized document subset is determined by defining one (or more) of the concepts in the hierarchy as an "organized by" concept, thereby rendering the subset of documents associated therewith an "organized document subset", this is not necessarily always the case.
  • any determination of a subset of documents (organized document subset) by utilizing the so-displayed hierarchy, i.e. implemented using information derived from the so-displayed hierarchy, is embraced by the invention.
EP00907906A 1999-02-25 2000-02-25 Methode und vorrichtung zur dynamischen darstellung einer durch eine hierarchie von indexkonzepten organisierten menge von dokumenten Withdrawn EP1155377A1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12159699P 1999-02-25 1999-02-25
US121596P 1999-02-25
PCT/IL2000/000117 WO2000051024A1 (en) 1999-02-25 2000-02-25 Method and apparatus for dynamically displaying a set of documents organized by a hierarchy of indexing concepts

Publications (1)

Publication Number Publication Date
EP1155377A1 true EP1155377A1 (de) 2001-11-21

Family

ID=22397683

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00907906A Withdrawn EP1155377A1 (de) 1999-02-25 2000-02-25 Methode und vorrichtung zur dynamischen darstellung einer durch eine hierarchie von indexkonzepten organisierten menge von dokumenten

Country Status (5)

Country Link
EP (1) EP1155377A1 (de)
AU (1) AU2936600A (de)
CA (1) CA2371244A1 (de)
IL (1) IL145049A0 (de)
WO (1) WO2000051024A1 (de)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2404337A1 (en) * 2000-03-27 2001-10-04 Documentum, Inc. Method and apparatus for generating metadata for a document
AU2002210882A1 (en) * 2000-10-17 2002-05-15 Focusengine Software Ltd. Integrating search, classification, scoring and ranking
NO20052215L (no) 2005-05-06 2006-11-07 Fast Search & Transfer Asa Fremgangsmate til bestemmelse av kontekstuell sammendragsinformasjon over dokumenter
EP2050024A1 (de) * 2006-07-27 2009-04-22 Sapio Systems Aps Verfahren zum verarbeiten einer sammlung von dokumentquellen
NO325864B1 (no) 2006-11-07 2008-08-04 Fast Search & Transfer Asa Fremgangsmåte ved beregning av sammendragsinformasjon og en søkemotor for å støtte og implementere fremgangsmåten

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0051024A1 *

Also Published As

Publication number Publication date
WO2000051024A1 (en) 2000-08-31
CA2371244A1 (en) 2000-08-31
AU2936600A (en) 2000-09-14
IL145049A0 (en) 2002-06-30

Similar Documents

Publication Publication Date Title
Carpineto et al. Exploiting the potential of concept lattices for information retrieval with CREDO.
US7496567B1 (en) System and method for document categorization
US5721897A (en) Browse by prompted keyword phrases with an improved user interface
US5598557A (en) Apparatus and method for retrieving and grouping images representing text files based on the relevance of key words extracted from a selected file to the text files
US5924090A (en) Method and apparatus for searching a database of records
US20030061209A1 (en) Computer user interface tool for navigation of data stored in directed graphs
US7523095B2 (en) System and method for generating refinement categories for a set of search results
US7216115B1 (en) Apparatus and method for displaying records responsive to a database query
US5787422A (en) Method and apparatus for information accesss employing overlapping clusters
US7130848B2 (en) Methods for document indexing and analysis
US20020049705A1 (en) Method for creating content oriented databases and content files
EP1024437B1 (de) Multimodaler Informationzugriff
US8332439B2 (en) Automatically generating a hierarchy of terms
US20010039490A1 (en) System and method of analyzing and comparing entity documents
WO2007136560A2 (en) Method and system for information extraction and modeling
JP3643470B2 (ja) 文書検索システムおよび文書検索支援方法
US20090083312A1 (en) Document composition system and method
WO2000051024A1 (en) Method and apparatus for dynamically displaying a set of documents organized by a hierarchy of indexing concepts
Attardi et al. Theseus: categorization by context
WO2002037328A2 (en) Integrating search, classification, scoring and ranking
JPH09311805A (ja) 文書処理方法及び装置
JP2004348768A (ja) 文書検索方法
EP1282844A2 (de) Verfahren zur erzeugung von inhaltsorientierten datenbanken und inhaltsdateien
CN116108129A (zh) 基于语义的多维度文本文献关联检索方法及检索系统
Jones PROGRESSIVE DISCOVERY OF DOCUMENT CONTENT

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010822

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

17Q First examination report despatched

Effective date: 20011204

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: LINGOMOTORS, INC.

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20020615