EP1155377A1 - Method and apparatus for dynamically displaying a set of documents organized by a hierarchy of indexing concepts - Google Patents

Method and apparatus for dynamically displaying a set of documents organized by a hierarchy of indexing concepts

Info

Publication number
EP1155377A1
EP1155377A1 EP00907906A EP00907906A EP1155377A1 EP 1155377 A1 EP1155377 A1 EP 1155377A1 EP 00907906 A EP00907906 A EP 00907906A EP 00907906 A EP00907906 A EP 00907906A EP 1155377 A1 EP1155377 A1 EP 1155377A1
Authority
EP
European Patent Office
Prior art keywords
documents
concept
indexing
document
concepts
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00907906A
Other languages
German (de)
French (fr)
Inventor
Ido Dagan
Yitzhak Stauber
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LingoMotors Inc
Original Assignee
Focusengine Software Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Focusengine Software Ltd

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/338Presentation of query results

Definitions

  • a search mechanism typically attaches to each document a set of indexing concepts.
  • An indexing concept is a symbol or value that characterizes the document, and is typically used within search queries or within routing queries ("queries " that specify which documents will be routed to an addressee).
  • Typical types of indexing concepts include (but are not limited to):
  • Topical categories, also known as controlled keywords, topics, descriptors, etc.
  • Topical categories are symbols denoting topical issues, which are usually general or abstract concepts that do not necessarily appear literally in the text.
  • For example, a topical category may be "Company Acquisition". This term, serving as the name of the category, may not appear literally in a document that describes such an event.
  • Document meta-data items such as document source, type, author and date.
  • indexing concepts may also be used to determine routine routing of incoming documents to addressees.
  • The process of associating indexing concepts with documents (the indexing process) is performed manually, automatically, or by some combination of the two modes.
  • indexing concepts that consist of terms and names from the document text
  • the indexing process usually involves scanning the text of the document, identifying words, terms and names, and possibly bringing these terms to some canonical form (e.g. the grammatical base form (lemma) of the word).
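The scanning step described above can be sketched as follows. This is a minimal illustration, not the method of the application: the function names and the naive suffix-stripping rules are assumptions that stand in for a real lemmatizer.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately naive suffix rules standing in for a real lemmatizer.
_SUFFIXES = ("ing", "ed", "es", "s")


def canonical_form(word: str) -> str:
    """Lower-case a word and strip at most one common English suffix."""
    word = word.lower()
    for suffix in _SUFFIXES:
        # Keep at least four characters so short words are left untouched.
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def index_terms(text: str) -> set[str]:
    """Scan the text, identify words, and return their canonical forms."""
    words = re.findall(r"[A-Za-z]+", text)
    return {canonical_form(w) for w in words}
```

A production system would replace `canonical_form` with a proper morphological analyzer; the point here is only the scan, identify, normalize sequence.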
  • Meta-data indexing concepts are often determined by the systems in which the document is created or received, but may also be handled manually.
  • the first approach is based on manual definition of the rules, or some other type of logic, by which a document is classified into a category based on the terms in the text.
  • some systems allow users (or administrators) to define complex queries, which may include Boolean and other types of conditions (such as weights and proximity) that the terms in the document should satisfy.
  • a document that satisfies these conditions is classified to the category.
  • An example of such a system is the Topics™ system developed by Verity Inc., USA.
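A manually defined classification rule of the kind described above might look like the following sketch. The rule helpers, the category rule itself and the term lists are hypothetical; they do not reproduce the syntax of any actual product.

```python
from __future__ import annotations

import re
from typing import Callable, Dict, Set

Rule = Callable[[Set[str]], bool]


def any_of(*terms: str) -> Rule:
    """Rule satisfied when at least one of the terms occurs in the document."""
    return lambda doc_terms: any(t in doc_terms for t in terms)


def all_of(*rules: Rule) -> Rule:
    """Rule satisfied when every sub-rule is satisfied (Boolean AND)."""
    return lambda doc_terms: all(rule(doc_terms) for rule in rules)


# Hypothetical, manually written rule for one topical category.
CATEGORY_RULES: Dict[str, Rule] = {
    "Company Acquisition": all_of(
        any_of("acquisition", "acquires", "takeover"),
        any_of("company", "firm"),
    ),
}


def classify_by_rules(text: str) -> Set[str]:
    """Return every category whose rule is satisfied by the document text."""
    doc_terms = set(re.findall(r"[a-z]+", text.lower()))
    return {name for name, rule in CATEGORY_RULES.items() if rule(doc_terms)}
```

Note that the category name itself never has to appear in the text; only the conditions on the terms matter. Weights and proximity conditions, which the text also mentions, are omitted from this sketch.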
  • the second approach is based on automatic learning of the "logic" which entails the classification of the document to a category.
  • Methods belonging to this approach utilize a set of training documents, for which the correct categories are known in advance (usually as the result of manual classification of these documents).
  • a learning method may then include a learning phase, in which some model of the category is constructed.
  • a model may include terms that are highly associated with the category, and possibly some weights that quantify the degree of correlation (entailment) between each term and the category.
  • a learning method may be memory based, in which case the learning method simply stores the training data in some useful format.
  • Given a new document, the method classifies it automatically by consulting or applying the category model (or by simply comparing the document to the training data, in the case of a memory-based approach). Examples of trainable (learning) classification systems are described in:
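The learning approach can be illustrated with a toy model for a single category. The weighting scheme below (the difference between the fractions of positive and negative training documents containing a term) is an assumption chosen for brevity; the text does not prescribe any particular formula.

```python
from __future__ import annotations

from collections import Counter


def learn_category_model(training: list[tuple[str, bool]]) -> dict[str, float]:
    """Learning phase: derive one weight per term from labeled documents.

    Each training item is (document text, whether it belongs to the category).
    """
    pos_counts: Counter = Counter()
    neg_counts: Counter = Counter()
    n_pos = n_neg = 0
    for text, in_category in training:
        terms = set(text.lower().split())
        if in_category:
            n_pos += 1
            pos_counts.update(terms)
        else:
            n_neg += 1
            neg_counts.update(terms)
    vocabulary = set(pos_counts) | set(neg_counts)
    # Weight = share of positive documents with the term minus share of negative ones.
    return {
        t: pos_counts[t] / max(n_pos, 1) - neg_counts[t] / max(n_neg, 1)
        for t in vocabulary
    }


def classify(text: str, model: dict[str, float], threshold: float = 0.0) -> bool:
    """Apply the category model: sum the weights of the terms in a new document."""
    score = sum(model.get(t, 0.0) for t in set(text.lower().split()))
    return score > threshold
```

A memory-based variant would skip `learn_category_model` and compare the new document directly against the stored training documents.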
  • a common method for display is to present a list of items, each providing some high level information about a document, such as the document title, meta-data items (such as author, source or date) and possibly a short summary.
  • the list may be sorted by document publication date or by some relevance score, which quantifies the degree of relevance of the document to the user's query, as hypothesized by the search system.
  • Another display method is a hierarchical display, in which documents are organized in a hierarchical structure, similar to a graphical user interface displaying a hierarchical file system.
  • U.S. patent 5,924,090 "Method and Apparatus for Searching a Database of Records" (the Krellenstein patent) discloses a system for searching a database and presenting to the user a small number of categories along with a list of the most relevant records that satisfy a query.
  • the methodology of the Krellenstein patent has a sophisticated clustering algorithm that includes three primary steps: identifying candidate categories, weighting candidate categories and displaying a set of search result categories selected from the candidate categories.
  • A typical result of the system according to the Krellenstein patent is illustrated in Fig. 1, as extracted from the www.northernlight.com site.
  • the query "text categorization" (1) results in 19,215 documents (records) (2), of which 6 are shown on the first page.
  • the documents are assigned to 15 categories (3).
  • the set of categories is determined by applying the specified sophisticated clustering, which includes identifying candidate categories, weighting candidate categories and displaying a set of search result categories selected from the candidate categories.
  • the user can repeat this process, further narrowing the search with each iteration.
  • double-clicking the category "Special collection documents" (4) will result in applying the specified steps again, giving rise to the search results illustrated in Fig. 2.
  • the invention provides for a method for dynamically presenting a set of documents to users, comprising:
  • each concept in said at least one organizing hierarchical display is associated with a respective subset of documents, from among said organized document subset.
  • the invention further provides for a method for presenting a set of documents to users, comprising:
  • the invention further provides for a method for presenting a set of documents to users, comprising:
  • the invention provides for a method for dynamically presenting a set of documents to users, comprising: (a) providing a predetermined hierarchy of indexing concepts;
  • the invention provides for a system that includes a processor associated with a memory and display for dynamically presenting a set of documents to users, comprising:
  • the memory is configured to store a predetermined hierarchy of indexing concepts;
  • the memory is configured to store a set of documents;
  • the processor is configured to provide a hierarchical display of the indexing concepts such that each indexing concept in the hierarchy is associated with a sub-set of documents from among said set of documents;
  • the processor is configured to apply steps that include the following (i) to (iii), as many times as required:
  • each concept in said at least one organizing hierarchical display is associated with a respective subset of documents, from among said organized document subset.
  • the invention provides for a system that includes a processor associated with a memory and display for presenting a set of documents to users, comprising:
  • the memory is configured to store indexing concepts and a set of documents; the set of documents are associated with concepts in accordance with triggering terms in said documents; (b) the processor is configured to select a document from said set;
  • the processor is configured to select at least one concept associated with said document;
  • the processor is configured to quantify the importance of the triggering terms of said at least one concept and, in response thereto, to determine the important triggering terms;
  • the processor is configured to emphasize in the display the important triggering terms that correspond to said at least one concept.
  • the invention provides for a system that includes a processor associated with a memory and display for presenting a set of documents to users, comprising:
  • the memory is configured to store indexing concepts and a set of documents; the set of documents are associated with concepts in accordance with triggering terms in said documents;
  • the processor is configured to select a document from said set; (c) the processor is configured to select at least one concept associated with said document;
  • the processor is configured to quantify the importance of the triggering terms of said at least one concept and, in response thereto, to determine the important triggering terms;
  • the processor is configured to obtain a summary in said display based on said important triggering terms.
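The two systems above (emphasizing important triggering terms, and building a summary from them) can be sketched together. The text does not say how importance is quantified, so this sketch assumes plain occurrence counts; upper-casing stands in for on-screen highlighting, and all function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter


def important_terms(text: str, triggering_terms: set[str], top_k: int = 1) -> list[str]:
    """Quantify importance by occurrence count (an assumption) and keep the top terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w in triggering_terms)
    return [term for term, _ in counts.most_common(top_k)]


def emphasize(text: str, terms: list[str]) -> str:
    """Emphasize the important terms; upper-casing stands in for highlighting."""
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: m.group(0).upper(), text)


def summarize(text: str, terms: list[str]) -> str:
    """Build a summary from the sentences that contain an important term."""
    wanted = {t.lower() for t in terms}
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if wanted & set(re.findall(r"[a-z]+", s.lower()))]
    return " ".join(kept)
```

For a document indexed by a hypothetical concept "Agreement" with triggering terms such as "agreement" and "signed", `important_terms` selects the terms, and the same selection then drives both `emphasize` and `summarize`.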
  • the invention provides for a system that includes a processor associated with a memory and display for dynamically presenting a set of documents to users, comprising:
  • the memory is configured to store a predetermined hierarchy of indexing concepts;
  • the memory is configured to store a set of documents;
  • the processor is configured to provide a hierarchical display of the indexing concepts; the indexing concepts are associated with the set of documents; the processor is configured to apply the following steps (d) to (f) as many times as required: (d) determining a subset of documents by utilizing the hierarchical display, thereby rendering it an organized document subset;
  • Figure 1 - illustrates a screen result of a database search system in accordance with the prior art.
  • Figure 2 - illustrates a screen result of a database search system in accordance with the prior art.
  • Figure 3 - illustrates a generalized computer system.
  • Figure 4 - illustrates a flowchart of the preferred embodiment of the invention.
  • Figure 5 - illustrates a Top pane: the concept hierarchical display. Left Top pane: tree representation of hierarchical document set display. Bottom: document list of a document subset.
  • Figure 6 - illustrates a left Top pane: pie representation of hierarchical document set display.
  • Figure 7 - illustrates a left Top pane: pie representation of hierarchical document set display.
  • Figure 8 - illustrates an overlapping window. Top pane: document important terms. Bottom pane: document full text and terms highlighting.
  • Figure 9 - illustrates a left Top pane: a document subset that has been "organized by". Right Top pane: the topics that have performed the "organization".
  • Figure 10 - illustrates an overlapping window. Top pane: document important terms. Bottom pane: document summary and terms highlighting.
  • Figure 11 - illustrates a left Top pane: tree representation of hierarchical document set display.
  • Figure 12 - illustrates a left Top pane: tree representation of hierarchical document set display. Overlapping window. Top pane: automatic important terms selection. Bottom pane: document text and automatically selected terms highlighting.
  • Figure 13 - illustrates a left Top pane: a document subset that has been "organized by" twice. Right Top pane: the topics that have performed the second "organization"; and,
  • Figures 14 to 21 illustrate a succession of screen results obtained by applying the method in accordance with one embodiment of the invention.
  • the invention provides novel methods for utilizing textual information that considerably increase the effectiveness of the end user when dealing with large volumes of documents.
  • a typical embodiment of the invention is used in a computer system, as illustrated, e.g., in Fig. 3.
  • the computer system (30) includes a processor unit (31) with input and output (32 and 33) and associated display (32) and memory (not shown).
  • the computer system (30) is configured to display documents and information about them in order to fulfill some information needs of end users (referred to in the following as the "system").
  • The invention is, of course, not bound by any specific realization of a computer system, which may include any known structure such as a conventional Personal Computer (P.C.) in either a stand-alone or a network configuration, all as required and appropriate.
  • Fig. 4 provides a high-level flow chart of a typical embodiment of the invention within some computer system (the details of the components of the invention are described below).
  • the system presents a document set (41) in a hierarchical display (42).
  • the structure of the display may be modified
  • the user may select a node (standing for indexing concept) (44) within the hierarchy, and ask for a display of information about the documents that are associated with the selected node (45).
  • the displayed information may include one or more of the following: the number of documents in the sub-set associated with the specified indexing concept, the percentage thereof from among the entire document set, the document title, meta-data elements (such as source and date) and optionally a short summary of the document.
  • the information is of course not limited to the specified details and may vary, depending upon the particular application.
  • the user may then select a particular document (16) for display, leading to the display of the full document text or of a summary of the document.
  • the content of the summary, as well as highlighting within the text, are determined automatically by some indexing concepts, that are determined by default to be in focus of attention of the user.
  • the user may then select different indexing
  • a method and system for presenting document sets and their content to the user of a system in an effective manner refers to any situation in which some document set has to be presented by the system, at any point of time, for purposes such as exploration, scanning, reading or analysis.
  • the term document should be construed in a broad manner to encompass any record in a database including, but not limited to, a text and/or text/image document.
  • the displayed document set may be e.g. the output of a search query that is applied to a search engine (e.g. AltaVista), or an entire document collection indexed by the system, or any other document set that is provided as an input for displaying to the user in accordance with the invention.
  • the documents in the presented document set are characterized by indexing concepts, as described above. That is, a typical document is characterized by several indexing concepts that are logically associated thereto. A document is considered indexed by the indexing concepts characterizing it.
  • the possible indexing concepts for documents in the system are arranged in a predetermined hierarchy of indexing concepts ("hierarchy" in short), as illustrated e.g. in Fig. 5 (31). That is, a parent concept (which is itself an indexing concept) is defined for each indexing concept. For example, in Fig. 5 (33) "Countries" is the parent of (34) "Latin America". One or several concepts that are defined as roots of the hierarchy may not have a parent node. For example, in Fig. 5 (32) "All" is the root. Usually, each concept in the hierarchy has only one parent, giving the hierarchy the form of a tree data structure (or several trees in the case of several roots). The described functionality can also accommodate situations where some nodes have more than one parent. The terms concept and node are used interchangeably to denote an indexing concept within the hierarchy. Those versed in the art will readily appreciate that the structure of the indexing concept hierarchy is substantially predetermined.
  • the predetermined structure does not necessarily mean that the indexing categories may not be subject to modification.
  • the hierarchy may include an indexing concept
  • the system may include a mechanism to recognize dynamically that a new name appearing in a document is a company, and define that name as an indexing concept for the document which is a daughter of the node "Companies".
  • the system includes a filtering mechanism which, in response to a filtering criterion, decides whether or not an indexing concept is displayed in the hierarchy.
  • the filtering criterion may filter out concepts associated with a small number of documents, below a certain threshold, or concepts that are associated only with documents whose score for a search query, whose results list constitutes the document set to be displayed, is low.
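The two filtering criteria just mentioned (too few documents, or only low-scoring documents) could be combined as in the sketch below. The function name, the thresholds and the score scale are hypothetical values picked for illustration.

```python
from __future__ import annotations


def should_display(
    concept_docs: set[str],
    query_scores: dict[str, float],
    min_docs: int = 2,
    min_score: float = 0.5,
) -> bool:
    """Decide whether a concept is shown in the hierarchy.

    The concept is hidden when it has fewer than `min_docs` documents, or when
    none of its documents reaches `min_score` for the current search query.
    """
    if len(concept_docs) < min_docs:
        return False
    return any(query_scores.get(doc, 0.0) >= min_score for doc in concept_docs)
```

Documents missing from `query_scores` are treated as scoring zero, which is a design choice of this sketch rather than something stated in the text.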
  • a system displays the concept hierarchy (in a hierarchical display) by any visualization mechanism that is suitable for displaying a hierarchical structure.
  • the most typical display form for a hierarchy is a tree display, as in Fig. 5 (37), in which each node of the tree corresponds to one concept in the hierarchy. Clicking on a node (or on a special sign, such as "+", that is attached to the node) leads to displaying or hiding its daughters.
  • Hierarchical display mechanisms may show one level of siblings in the hierarchy at a time, by showing a list of elements, each represented by some symbol or icon, where clicking on an element leads to displaying its daughters, while some other option enables getting back (up) in the hierarchy (for example, the "My Computer" icon in the Windows-98/NT system available from Microsoft Inc, USA).
  • any hierarchical display mechanism can be used to display the hierarchy of indexing concepts, where user interaction with the display mechanism controls the display of different portions of the hierarchy.
  • Another non-limiting example of hierarchical display is a chart, e.g. a pie chart.
  • This subsection defines a hierarchical display of a presented document set (containing documents indexed by indexing concepts).
  • the hierarchical display serves as a "table of contents" for the document set, which facilitates navigating and browsing of document sets.
  • the scheme of a hierarchical document set display is available in previous systems, but the invention includes some specific enhancements to this scheme, as noted below.
  • the hierarchical document set display is based on the concept hierarchical display, and can be realized by any mechanism for displaying hierarchies, just like the concept hierarchical display discussed above.
  • Fig. 5 (37) is a hierarchical document set display in tree form.
  • a set of documents, which is a subset of the currently presented document set, is associated with each concept (node) in the hierarchy.
  • a set of documents is associated with the node (39) "Countries".
  • the associated document set for a concept in the hierarchy contains all documents that are indexed by that concept.
  • In addition, the associated document set for a concept is defined to include all documents associated with any of its descendants in the hierarchy.
  • the document set of the concept "Countries" includes all documents indexed by any country or geographical region, assuming that these concepts are all descendants of the concept "Countries" in the hierarchical display.
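The definition of the associated document set (documents indexed by the concept itself plus those of all its descendants) lends itself to a short recursive sketch. The dictionary-based representation of the hierarchy and the sample data are assumptions made for illustration.

```python
from __future__ import annotations


def associated_documents(
    concept: str,
    children: dict[str, list[str]],
    indexed_by: dict[str, set[str]],
) -> set[str]:
    """Documents indexed by the concept itself or by any of its descendants."""
    docs = set(indexed_by.get(concept, set()))
    for child in children.get(concept, []):
        docs |= associated_documents(child, children, indexed_by)
    return docs


# Hypothetical hierarchy fragment and index.
children = {
    "All": ["Countries"],
    "Countries": ["Latin America", "France"],
    "Latin America": ["Brazil"],
}
indexed_by = {
    "Latin America": {"d4"},
    "Brazil": {"d1", "d2"},
    "France": {"d2", "d3"},
}
```

Because the result is a set, a document indexed by several descendants (here `d2`) is counted once, and the same function works unchanged when a node has more than one parent.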
  • a hierarchical document display thus includes a display of the concept hierarchy (as described above), augmented with some information at each concept node about the document set associated with that node.
  • the information about the associated document set may include, by one embodiment, one or more of the following items:
  • Some key information about prominent topics described within documents of the document set such as most frequent or prominent key terms within the documents of the set, and/or the list of all or some of the indexing concepts for the documents.
  • Fig. 5 is a tree display of the hierarchy with associated information about the document set of each node, containing number of documents and percentage relative to the parent node document set.
  • a pie (or bar) chart can be used to display several sibling nodes (daughters of a common parent).
  • Fig. 6 (44) is a pie representing the daughter nodes of "Countries".
  • Each pie slice corresponds to one concept and its size indicates the proportion of its associated document set relative to the parent node document set.
  • the quantitative graphical display mechanism may be interactive, in a similar manner to interactive tree presentation of the concept hierarchy. For example, double clicking on a pie slice may lead to displaying the pie of the daughters of the selected node. For example, double clicking on the slice in Fig. 6 (45), corresponding to "Latin America", leads to the display in Fig. 7 (47), a pie presenting the daughters of "Latin America".
  • the displayed daughters of a node may be sorted alphabetically, or by some characterizing quantitative information, in particular by the size of the associated document set for each daughter.
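The per-node figures shown in the tree and pie displays (document count, percentage relative to the parent, daughters sorted by size) can be computed as in this sketch. The function name and the row format are assumptions.

```python
from __future__ import annotations


def daughter_summary(
    parent_docs: set[str],
    daughter_docs: dict[str, set[str]],
) -> list[tuple[str, int, float]]:
    """Return (daughter, document count, percent of parent), largest set first."""
    rows = []
    for name, docs in daughter_docs.items():
        count = len(docs & parent_docs)
        percent = 100.0 * count / len(parent_docs) if parent_docs else 0.0
        rows.append((name, count, percent))
    # Sort by descending size; ties are broken alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows
```

Since a document may be indexed by several sibling concepts, the percentages need not add up to 100; a pie display would have to normalize the slice sizes or handle the overlap in some other way.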
  • a system may combine both a pie chart display and a tree display. When viewing the tree display with a certain node selected, and switching to the pie chart display, the system will present the pie that corresponds to the daughters of the selected node.
  • the graphical display may present further information about the documents in the associated document set, such as their titles, meta-data elements, document summaries or the full text of the document.
  • Fig. 5 (42) is the list of titles for the documents associated with the node "Latin America".
  • Fig. 10 (48) is a summary of a selected document in the document list.
  • a display of the full text of a document is presented in Fig. 8 (52).
  • (54) is a list of indexing concepts for the document.
  • concepts in the hierarchical display to which no documents are attached may be omitted from the display. For example, in Fig. 5 no documents are associated with the indexing concept (36) "Bahamas" in the concept hierarchy, thus in the hierarchical indexing concept display, this concept does not appear.
  • the concepts in the hierarchical display are subject to a filtering criterion in order to determine whether or not they will be displayed in said hierarchy.
  • a filtering criterion may concern which folders in deeper levels of the hierarchy tree will be displayed in said hierarchy.
  • the necessity of this criterion stems from the fact that the display area allocated to the hierarchy in the display screen may not be sufficient to accommodate the entire hierarchy; accordingly, only a portion thereof is displayed, e.g. a few levels, and further levels are displayed (instead of the previously displayed higher levels) only in response to user selection. For example: if the top level
  • A more advanced filtering criterion may rank the folders (which, in this embodiment, stand for nodes) to be presented according to, say, the number of documents in each folder and the quality of their match to the current
  • an "Others" node is added to each list of siblings having a common parent.
  • the documents associated with the "Others" node are those associated with the parent
  • the hierarchical indexing concept display may be restricted to a particular sub-part of the hierarchy, determined by some mechanism, rather than presenting the full hierarchy. For example, it is possible to present the hierarchical indexing concept display using only the "Countries" sub-tree of the hierarchy. This non-limiting modification also falls in the definition of predetermined hierarchical indexing concept display.
  • the hierarchical indexing concept set display serves as a "table of contents" for the document set and can be used as a method for displaying document sets to
  • the hierarchical indexing concept set display is limited because it has a static structure, which is equivalent to the structure of the concept hierarchy.
  • one of the leaves of the tree may be the country "France", as in Fig. 11 (55), containing 45 documents.
  • This section defines a novel mechanism provided by the invention for presenting dynamic "tables of contents" displays for document sets, enabling the user to dynamically modify and refine the document display whilst maintaining the predetermined hierarchical indexing concept display.
  • the dynamic display is by itself hierarchical, utilizing the specified predetermined hierarchy of categories, and thus provides all the functionality of the hierarchical document set display, as described above.
  • a document set is presented in some manner, possibly by the (static) hierarchical indexing concept display.
  • the dynamic display is created by a series of "organize by" operations, each specified by two definitions:
  • selecting the document set may, preferably, correspond to selecting a node in the hierarchical document set display.
  • selecting the node "France" in Fig. 11 (55) defines the document set associated with this node as the subset to be organized. This subset is termed the organized document subset.
  • when the selected subset corresponds to a node in the display, that node is termed the organized node.
  • the selection of the "organized" document subset is performed on the basis of information displayed in the hierarchy, e.g. defining an indexing concept in the hierarchy as an organized by concept and rendering the documents associated therewith as the specified "organized" document subset.
  • the node "Companies" may be selected as an organizing node (57 in Fig. 9), to organize the document subset associated with the node "France".
  • any concept in the indexing concept hierarchy display is associated with a respective subset of documents from among the organized document subset.
  • a document may be associated with more than one concept of the organizing hierarchical display.
  • a "respective" subset of documents encompasses also the special situation in which a concept is associated with no documents.
  • the "organize by" operation may be interpreted as a recursive application of the hierarchical indexing concept display, as its effect is to provide a new hierarchical display for a node within a previously displayed hierarchy.
  • the hierarchical display is maintained predetermined considering that in the modified presentation, substantially, the same concepts are employed, which makes it easier for the user to follow "well known" and familiar concepts, even after applying the "organizing" operation.
  • the organizing node can be the root of the concept hierarchy, in which case the organized document subset will be displayed by a hierarchical indexing concept set display that corresponds to the entire concept hierarchy.
  • a system may apply only this special case (always organizing by the full hierarchy considering the root as the organizing node), in which case it is necessary to define only the organized node in order to apply an "organize by" operation.
  • a system may implement the hierarchical document display such that at each point of time the user view is focused only on one node of the tree. In this case, applying the "organize by" operation implies implicitly that the organized node is the currently displayed node, saving the need of an explicit definition of the organized node.
  • the default definition of the organizing concept as the root node and the organized by concept as the currently displayed node may be realized by a single user operation, for example, clicking on a predetermined icon.
  • the hierarchical display of the organized subset is displayed as a new, dynamically created, daughter (or daughters) of the selected organized node.
  • the node "Companies" in Fig. 9 (60) is added dynamically as a new daughter node of (59) the node "France", modifying the hierarchical display that was presented to the user just before applying the "organize by" operation.
  • a new daughter node either replaces or is added as a sibling to the previously existing daughters of the organized node.
  • any part of the new display may be subject to further “organize by” operations.
  • a node that was added to the hierarchy in a previous "organize by" operation may be selected as the organized subset in a later operation.
  • Subsequent "organize by" operations on the modified dynamic display may be applied as requested by the user.
  • the node "Boeing", which has been created by a previous "organize by" operation, is later selected as an organized node, where the organizing node is (65) "Activities".
  • a node "Activities" (70) is dynamically added to the display, and its daughters (64) (signifying documents indexed by both "France" and "Boeing" and by some activity) are associated, each, with information that pertains to these documents. For example, there are 19 documents indexed by "France" (67), "Boeing" (69) and "Agreement" (71).
  • the specified "organize by" operation may be applied recursively (repeated) as many times as required, each time in respect of newly selected "organized by" and "organizing" concepts.
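The "organize by" operation and its repeated application can be sketched as follows. This is one possible reading under stated assumptions: the organized document subset is intersected with the document set of every concept under the organizing node, and concepts left with no documents are dropped. The data structures, the function name and the sample documents are hypothetical.

```python
from __future__ import annotations


def organize_by(
    organized_docs: set[str],
    organizing_node: str,
    children: dict[str, list[str]],
    indexed_by: dict[str, set[str]],
) -> dict:
    """Build a new sub-hierarchy for the organized subset under the organizing node."""

    def subtree_docs(node: str) -> set[str]:
        # Documents indexed by the node or by any of its descendants.
        docs = set(indexed_by.get(node, set()))
        for child in children.get(node, []):
            docs |= subtree_docs(child)
        return docs

    def build(node: str) -> dict:
        docs = subtree_docs(node) & organized_docs
        kids = [build(child) for child in children.get(node, [])]
        # Daughters without documents in the organized subset are omitted.
        return {"concept": node, "docs": docs, "children": [k for k in kids if k["docs"]]}

    return build(organizing_node)


# Hypothetical data echoing the France / Companies / Activities example.
children = {"Companies": ["Boeing", "Airbus"], "Activities": ["Agreement"]}
indexed_by = {
    "France": {"d1", "d2", "d3"},
    "Boeing": {"d1", "d2", "d9"},
    "Airbus": {"d8"},
    "Agreement": {"d1", "d7"},
}

# First operation: organize the "France" documents by "Companies".
by_company = organize_by(indexed_by["France"], "Companies", children, indexed_by)
boeing = by_company["children"][0]
# Second operation: organize the new "Boeing" node by "Activities".
by_activity = organize_by(boeing["docs"], "Activities", children, indexed_by)
```

In a display, the returned dictionary would be attached as a new daughter of the organized node, so the second call above corresponds to adding "Activities" under the dynamically created "Boeing" node.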
  • the basic form of the "organize by" operation may consist of selecting one node in a hierarchical display as the organized node, and one node in the concept hierarchy as the organizing node.
  • the following paragraph describes extensions to the basic form.
  • the organized node "France" may be organized by both "Companies" and "Activities", which means that all the documents associated with the indexing concept "France" will be organized by the indexing concept "Companies" and, separately, by the indexing concept "Activities". If desired, the nodes "Companies" and "Activities" are added as daughters to "France".
  • Multiple selection of organized nodes has the effect of applying the "organize by" operation simultaneously to all selected nodes, for example applying an "organize by" operation with the same organizing node to both nodes "France" and "Spain".
  • the net effect of selecting more than one organized node is that each node is associated with its respective "organized by" subset of documents, and then some operator (or operators) is applied to the specified subsets so as to constitute a resulting organized subset of documents that is then subject to the organizing operation.
  • there is a first subset of documents associated with France and a second subset of documents associated with Spain.
  • the operator that is applied to the subsets is OR, giving rise to a document subset that includes documents that pertain only to Spain, only to France, or to both. This resulting subset of documents is then subjected to the organizing operation by one or more organizing concepts.
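Combining the subsets of several selected organized nodes reduces to a set operation. The sketch below implements the OR case from the France/Spain example; the AND branch is an assumption added only to illustrate "some operator or operators", and the function name is hypothetical.

```python
from __future__ import annotations


def combine_subsets(subsets: list[set[str]], operator: str = "OR") -> set[str]:
    """Combine the document subsets of several organized nodes into one."""
    if not subsets:
        return set()
    if operator == "OR":
        # Documents associated with at least one of the selected nodes.
        return set().union(*subsets)
    if operator == "AND":
        # Assumed alternative: documents associated with all selected nodes.
        return set(subsets[0]).intersection(*subsets[1:])
    raise ValueError(f"unsupported operator: {operator}")
```

The combined set then plays the role of the organized document subset in a subsequent "organize by" operation.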
  • the set of documents may be obtained by applying a search query to say conventional search engine that operates similarly to as AltaVista and display the resulting set along with the hierarchical display of the invention.
  • FIG. 14 illustrates a predetermined indexing concept hierarchy (140) that includes 11,000 documents (142) that constitute the document set and are broken down by the hierarchy concepts.
  • Applying a query results in 318 documents (see 151 in Fig. 15) that are broken down by the concept hierarchy.
  • the list of documents is displayed (152), and, by this example, the first four documents are shown in the first page.
  • the query itself ("pagers") is automatically assigned to categories in the hierarchy as if it were a document.
  • the resulting category is illustrated in the "Related category" field (153), to wit: Telecom All > Applications > Messaging > Paging. All the categories, except for "Paging", are shown in the hierarchical presentation (151, 154, and 155).
  • Paging is a sub category of Applications and can be shown if the Browse section of the screen is enlarged, or if the user decides to show it by, say, clicking a specified symbol (as described above).
  • Fig. 17 is the same as Fig. 16 except that now the documents that are associated with the sub-category Telecom Service Companies (171) are shown. This may be achieved by simply clicking the relevant category in the hierarchy (in this particular example, Telecom Service Companies, not shown in the hierarchy of Fig. 17), whereupon the documents associated therewith are shown.
  • the documents that are shown obviously relate to "paging" and telecom service companies.
  • Fig. 18 illustrates yet another degree of detail wherein only documents that pertain to Sky Tel 181 (which forms sub-category of the specified Telecom Service Companies - not shown in the hierarchy of Fig. 18) are shown.
  • the user simply clicks the products category (200) in Fig. 20 and the 8 relevant documents are shown at the search section of the screen (201). If, from among the specified 8 documents, only those that concern Motorola are of interest, the user simply clicks the Motorola category (210) in Fig. 21 and in response thereto the pertinent 3 documents are shown.
  • the invention provides, in accordance with another aspect thereof, new mechanisms for presenting parts of or all of the text of a document in a dynamic and effective manner. These mechanisms direct the attention of the user to relevant parts of the document and enable quick focusing on these parts. For example, these relevant parts might be text segments that contain relevant information for the user or that can help in deciding about the relevance of the document.
  • the decision of which parts of the document should be in focus is dynamic, and may be changed according to user guidance or to the context in which the document is being displayed.
  • the parts of the document which should be highlighted or be included in a summary are determined according to a set of (one or more) indexing concepts, among the indexing concepts of the document, that are considered to be in focus at a certain stage of user interaction with the system. These indexing concepts are called focus indexing concepts.
  • the highlighting and summarization for a given focus indexing concept is determined by the important triggering terms for that concept.
  • the triggering terms for a concept are the occurrences in the document of all terms which entail the attachment (or classification) of the concept to the document.
  • Highlighting and an extracted summary will include the important triggering terms for the concept, or short segments of text that are considered to be important.
  • the degree of importance of terms and segments may be quantified by some scoring mechanism, where the degree of importance of the terms in a segment is a factor in determining the degree of importance of the segment.
  • the invention provides dynamic methods for determining (quantifying) which triggering terms and segments are important in a given context of the user interaction with the system that displays the documents.
  • the quantifying step assigns the same degree of importance to all triggering terms.
  • the latter option does not apply to the aspect which concerns emphasizing important triggering terms.
  • when emphasizing important triggering terms, not all the triggering terms are ranked with the same degree of importance.
  • the important triggering terms and segments are presented to the user, either in a form of an extracted summary, which contains the important terms and/or segments, or by highlighting the important terms within the display of the full document, or by some combination of the two methods.
  • the term "important" refers to the case where the degree of importance of triggering terms and segments can be quantified and the display is restricted to those with the highest importance.
  • the number of terms or segments to be included in the display is determined by some mechanism, such as a threshold on the degree of importance or on the number of items to be included. This ranking mechanism by degree of importance is necessary when there are many important terms or segments and it is desired to limit their display in order to achieve optimal focus of attention by the user.
  • FIG. 10 displays a summary of a document, in which the important terms are highlighted.
  • the important terms were determined relative to the highlighted indexing concepts "Latin America" (50) and "Lockheed Martin" (51), which are in the focus of interest to the user (as explained below).
  • the summary includes segments of the text that contain the important terms.
  • Fig. 8 presents a full display of a document text, in which important terms (relative to the indexing concept (54), see below) are highlighted. While the general scheme of making some form of highlighting triggering terms in a document for display is available in previous systems, the invention, by this aspect, concerns selecting important terms, described below.
  • One non-limiting method in the context of the invention refers to selecting the important triggering terms in a document with respect to an indexing concept that is determined to be in focus (of interest) at a certain stage of the user interaction with the system.
  • the indexing concept "Product specifications/capabilities” (54) is selected to be in focus.
  • This part of the invention refers to the case where the indexing concept was assigned to the document by some text classification method, as described above. Such a method classifies the document to a certain indexing concept based on words, terms or their combinations that appear in the document. It is assumed that it is possible to trace within the classification system which words or terms in the document entailed the classification to the given indexing concept.
  • a trainable text classification method in which the terms and the degree to which they entail classification to the indexing concept are learned from training documents, for which it is previously known whether they belong to the indexing concept or not.
  • This method applies a Bayesian learning scheme for text classification. For a given category, the method computes (during the training phase) certain weights for terms (words or phrases) in the text, with respect to the category. The score of the category for a particular document is computed as a function (usually some sort of a normalized sum) of the weights of the terms that appear in the document. When computing the category score for a document, it is possible to trace the relative contribution of each term in the document to the accumulative score. Thus, triggering terms in this method will be those terms that provided the highest contribution to the accumulative score of the document.
  • the important triggering terms are those term occurrences that significantly contributed to the classification of the document to the focus indexing concept.
  • the triggering terms for the indexing concept "Product Specification/Capabilities" (54) are highlighted within the text (52) of the document.
  • their degree of importance would be proportional to this degree of relative contribution to classification.
  • the method described above for selecting the important triggering terms for an indexing concept in focus could be combined with simpler methods for identifying the triggering terms for an indexing concept (such methods are not part of the invention).
  • the important term is simply the occurrence of the indexing concept in the text.
  • the triggering terms are simply all the terms that appear in the query (similar to document search systems that highlight matching query terms in the retrieved documents).
  • Another method within the invention refers to selecting important terms and segments for display by selecting dynamically several focus indexing concepts.
  • One way of selecting the focus indexing concepts is by letting the user select them interactively from the list of all indexing concepts of the document. In Fig. 10 the user has selected (50) "Latin America" and (51) "Lockheed Martin" as focus indexing concepts. Consequently, the selected important terms, which are highlighted in the document text (48), are the triggering terms for both (50) and (51).
  • Other mechanisms for selecting the set of focus indexing concepts may be applied as well, such as the method described next.
  • the important triggering terms and segments are selected from the important triggering terms and segments of each one of the focus indexing concepts, applying some procedure that combines them and reevaluates their degree of importance with respect to the complete set of focus indexing concepts.
  • the degree of importance of a triggering term or segment with respect to the complete set of focus indexing concepts may be defined (referred to also as quantified) by its maximal (or minimal) degree of importance for any of the individual indexing concepts (applying a disjunctive (or conjunctive) reasoning criterion), or by computing some averaging function of the individual importance degrees.
  • the display of important terms or segments for the complete set of focus indexing concepts may distinguish between terms that were selected originally for the different indexing concepts that compose the set. For example, a different color is attributed to each indexing concept, and the important terms related to this concept are highlighted by the corresponding color.
  • the indexing concept "LATIN AMERICA" (50) is highlighted with a blue background and "LOCKHEED MARTIN" (51) is highlighted with a pink background (blue appears darker than pink in black and white printing).
  • Another method within the invention refers to the selection of default focus indexing concepts, to be used automatically as the focus indexing concepts when the document is presented to the user.
  • the default focus indexing concepts are selected according to the selection conditions that were applied in the process that led to the display of the document.
  • the indexing concepts contained in the query become the default focus indexing concepts.
  • a particular setting for this method occurs when the document is selected for display within the hierarchical document set display or within the dynamic hierarchical document set display.
  • a document was selected for display from the node (document subset) (61) "ARGENTINA”.
  • the default focus indexing concept is (62) "ARGENTINA".
  • the triggering term "Argentina” (63) is highlighted within the document text.
  • a document is selected for display from the document set that is associated with a certain node in the hierarchy.
  • the documents in this set satisfy a logical condition that is equivalent to a search query which is a conjunction (logical AND) of all indexing concepts in the path from the root of the displayed hierarchy to the selected node.
  • the default focus indexing concepts are the concepts along this path.
  • parts of this path may correspond to paths within the concept hierarchy and parts of the path might be created dynamically within the dynamic hierarchical document set display.
  • the documents associated with the node "Agreement" (71) satisfy a logical AND condition for all indexing concepts on the displayed path from the root of the tree to this node.
  • the method of viewing document sets that are attached to concept nodes in a (possibly dynamic) hierarchical document set display may be combined with the use of explicit search queries issued by the user.
  • the document set attached to a concept node is restricted by an additional condition supplied in an explicit search query, then the default focus indexing concepts will be a combination of the concepts of the path, as described above, and the concepts that are included in the query.
  • although the organized document subset is determined by defining one (or more) of the concepts in the hierarchy as the "organized by" concept, thereby rendering the subset of documents associated therewith the "organized document subset", this is not necessarily always the case.
  • any determination of a subset of documents (organized document subset) by utilizing the so displayed hierarchy, i.e. implemented using information derived from the so displayed hierarchy, is embraced by the invention.
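The "organize by" operation described in the items above can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the hierarchy, the document index and the function names (`docs_for`, `docs_under`, `organize_by`) are all hypothetical. The subsets of the selected organized nodes are combined with OR (set union), and the result is broken down by the hierarchy rooted at the organizing concept.

```python
# Hypothetical concept hierarchy: parent -> children (names are illustrative only).
HIERARCHY = {
    "Companies": ["Lockheed Martin", "Motorola"],
    "Lockheed Martin": [],
    "Motorola": [],
}

# Hypothetical index: document id -> set of indexing concepts attached to it.
DOC_CONCEPTS = {
    "d1": {"France", "Motorola"},
    "d2": {"Spain", "Lockheed Martin"},
    "d3": {"France", "Spain", "Motorola"},
    "d4": {"Germany", "Motorola"},
}


def docs_for(concept):
    """Documents indexed directly by the given concept."""
    return {d for d, concepts in DOC_CONCEPTS.items() if concept in concepts}


def docs_under(concept):
    """Documents indexed by the concept or by any of its descendants."""
    docs = docs_for(concept)
    for child in HIERARCHY.get(concept, []):
        docs |= docs_under(child)
    return docs


def organize_by(organized_nodes, organizing_concept):
    """OR together the subsets of the organized nodes, then break the result
    down by the hierarchy rooted at the organizing concept."""
    organized = set()
    for node in organized_nodes:
        organized |= docs_for(node)

    def build(concept):
        return {
            "concept": concept,
            "docs": sorted(organized & docs_under(concept)),
            "children": [build(c) for c in HIERARCHY.get(concept, [])],
        }

    return build(organizing_concept)


tree = organize_by(["France", "Spain"], "Companies")
```

In this sketch the document `d4` is indexed by "Motorola" but never appears in the resulting tree, because it belongs to neither the "France" nor the "Spain" subset.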

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A system that includes a processor associated with a memory and display for dynamically presenting a set of documents to users. The memory is configured to store a predetermined hierarchy of indexing concepts and is configured to store a set of documents. The processor is configured to provide a hierarchical display of the indexing concepts. The indexing concepts are associated with the set of documents. The processor is further configured to apply the following as many times as required: determining a subset of documents by utilizing the hierarchical display, thereby rendering it an organized document subset; defining at least one indexing concept in the hierarchical display so as to constitute an 'organizing' concept; and providing an organizing hierarchical display of indexing concepts, wherein the root of the organizing hierarchical display is the organizing concept, and wherein concepts in the organizing hierarchical display are associated with the organized document subset.

Description

METHOD AND APPARATUS FOR DYNAMICALLY DISPLAYING A SET OF DOCUMENTS ORGANIZED BY A HIERARCHY OF INDEXING CONCEPTS
FIELD AND BACKGROUND OF THE INVENTION
The amount of textual information that is available in computerized media has increased dramatically in recent years. As a result, there is an increasing need for end users to have effective tools for searching, browsing, navigating, reading and analyzing collections of textual documents. Current common practice, within organizations as well as on the Internet, is having a search engine that indexes a large repository of documents and enables users to issue a search query and to get in response all documents that satisfy the search conditions. Usually, a list of titles, along with some additional information, is presented for each document and the user can further ask for the display of specific documents from the list. The list of documents is often sorted by some relevance ranking, which is intended to approximate the degree of relevance of the document to the query. Sorting by date is also often available.
A search mechanism typically attaches to each document a set of indexing concepts. An indexing concept is a symbol or value that characterizes the document, and is typically used within search queries or within routing queries ("queries" that specify which documents will be routed to an addressee). Typical types of indexing concepts include (but are not limited to):
1. Topical categories (also known as controlled keywords, topics, descriptors etc.). These are symbols denoting topical issues, which are usually general or abstract concepts that do not necessarily appear literally in the text. For example, a topical category may be "Company Acquisition". This term, serving as the name of the category, may not appear literally in a document that describes such an event.
2. Important terms and names of entities (such as countries, companies, products and people) which appear or are referred to in the text (as is or by synonyms).
3. Document meta-data items, such as document source, type, author and date.
In the following, a document is considered indexed by the indexing concepts characterizing it. Apart from being used in ad-hoc search queries, indexing concepts may also be used to determine routine routing of incoming documents to addressees.
The process of associating indexing concepts to documents (the indexing process) is performed either manually, automatically, or by some combination of the two modes. With respect to indexing concepts that consist of terms and names from the document text, the indexing process usually involves scanning the text of the document, identifying words, terms and names, and possibly bringing these terms to some canonical form (e.g. the grammatical base form (lemma) of the word). Meta-data indexing concepts are often determined by the systems, in which the document is created or received, but may also be handled manually.
Of particular interest to the invention is the indexing process for topical categories (categories, in short). In many systems, it is possible for the user to manually assign topical categories to a document. More recently, there have been developed a number of methods for assigning topical categories to documents automatically, which are referred to here as automatic text classification methods. Such methods classify documents to appropriate categories taken from a predetermined list of possible categories. Classification is performed by some mechanism that receives the document text as input and determines the appropriate categories based on the words, terms or their combinations that appear in the document.
There are two common approaches for automatic text classification methods. The first approach is based on manual definition of the rules, or some other type of logic, by which a document is being classified to a category based on the terms in the text. For example, some systems allow users (or administrators) to define complex queries, which may include Boolean and other types of conditions (such as weights and proximity) that the terms in the document should satisfy. A document that satisfies these conditions is classified to the category. An example for such a system is the Topics ™ system that was developed by Verity Inc., USA.
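A minimal sketch of the first (manually defined) approach is given below. It is not the syntax of any particular product; the rule format (`all_of`, `any_of`, `none_of`), the example rule and the function names are assumptions made purely for illustration. Weights and proximity conditions, which the text also mentions, are omitted for brevity.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(rule, text):
    """Evaluate a simple Boolean rule against the words of a document."""
    words = set(tokenize(text))
    all_ok = all(t in words for t in rule.get("all_of", []))
    any_terms = rule.get("any_of", [])
    any_ok = not any_terms or any(t in words for t in any_terms)
    none_ok = not any(t in words for t in rule.get("none_of", []))
    return all_ok and any_ok and none_ok


# Hypothetical, manually written rule for one topical category.
RULES = {
    "Company Acquisition": {
        "any_of": ["acquires", "acquired", "takeover", "merger"],
        "none_of": ["rumor"],
    },
}


def classify(text):
    """Return every category whose rule the document satisfies."""
    return [category for category, rule in RULES.items() if matches(rule, text)]


labels = classify("Acme acquired Widget Corp. last week")
```

Note that the category name itself never appears in the example document; the classification follows only from the terms listed in the rule.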
The second approach is based on automatic learning of the "logic" which entails the classification of the document to a category. Methods belonging to this approach utilize a set of training documents, for which the correct categories are known in advance (usually as the result of manual classification of these documents). A learning method may then include a learning phase, in which some model of the category is constructed. For example, such a model may include terms that are highly associated with the category, and possibly some weights that quantify the degree of correlation (entailment) between each term and the category. Alternatively, a learning method may be memory based, in which case the learning method simply stores the training data in some useful format. Then, when a new document is given for classification, the method classifies it automatically by consulting or applying the category model (or by simply comparing the document to the training data, in case of a memory based approach). Examples for trainable (learning) classification systems are described in:
1. C. Apte, F. Damerau and S. Weiss, 1994. Towards language independent automated learning of text categorization models, in Proceedings of ACM-SIGIR Conference on Information Retrieval.
2. W. W. Cohen, 1995. Text categorization and relational learning, in Machine Learning Journal, pages 124-132.
3. W. W. Cohen and Y. Singer, 1996. Context-sensitive learning methods for text categorization, in Proceedings of the 19th Annual Int. ACM Conference on Research and Development in Information Retrieval, pages 307-315.
4. D. Lewis, 1992. An evaluation of phrasal and clustered representations on a text categorization problem, in Proc. of the 15th Int. ACM-SIGIR Conference on Information Retrieval, pages 37-50.
5. D. Lewis and M. Ringuette, 1994. A comparison of two learning algorithms for text categorization, in Proc. of Symposium on Document Analysis and Information Retrieval, pages 81-93.
6. D. Lewis, R. E. Schapire, J. P. Callan and R. Papka, 1996. Training algorithms for linear text classifiers, in SIGIR '96: Proc. of the 19th Int. Conference on Research and Development in Information Retrieval.
7. K. Tzeras and S. Hartmann, 1993. Automatic Indexing Based on Bayesian Inference Networks, in Proc. of 16th Int. ACM SIGIR Conference on Research and Development in Information Retrieval, pages 22-34.
8. E. Wiener, J. Pedersen and A. Weigend, 1995. A neural network approach to topic spotting, in Symposium on Document Analysis and Information Retrieval, pages 317-332.
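The learning phase described above, in which a category model of weighted terms is built from training documents, might look roughly like the following sketch. It uses a simple smoothed log-odds weight per term, which is an assumption chosen for illustration and is not the method of any of the cited works; the function names and training sentences are likewise hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(docs):
    """docs: list of (text, in_category) pairs with known labels.
    Returns a category model mapping each term to a log-odds weight."""
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for text, in_category in docs:
        terms = set(tokenize(text))  # count each term once per document
        if in_category:
            pos.update(terms)
            n_pos += 1
        else:
            neg.update(terms)
            n_neg += 1
    weights = {}
    for term in set(pos) | set(neg):
        p = (pos[term] + 1) / (n_pos + 2)  # add-one smoothing
        q = (neg[term] + 1) / (n_neg + 2)
        weights[term] = math.log(p / q)
    return weights


def score(weights, text):
    """Category score: sum of term weights, normalized by document length."""
    terms = tokenize(text)
    if not terms:
        return 0.0
    return sum(weights.get(t, 0.0) for t in terms) / len(terms)


training = [
    ("pager network upgrade", True),
    ("new pager released", True),
    ("bank merger announced", False),
    ("new bank opens", False),
]
weights = train(training)
```

Terms that occur mainly in positive training documents receive positive weights, terms that occur mainly in negative ones receive negative weights, and terms that are equally frequent in both (here "new") end up with a weight of zero.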
Once documents have been obtained by a user, as a result of some search or some routing mechanism, these documents are typically displayed in one of several formats. A common method for display is to present a list of items, each providing some high level information about a document, such as the document title, meta-data items (such as author, source or date) and possibly a short summary. The list may be sorted by document publication date or by some relevance score, which quantifies the degree of relevance of the document to the user's query, as hypothesized by the search system. Another display method is a hierarchical display, in which documents are organized in a hierarchical structure, similar to a graphical user interface displaying a hierarchical file system.
U.S. patent 5,924,090 (Krellenstein), "Method and Apparatus for Searching a Database of Records", discloses a system for searching a database and presenting to the user a small number of categories along with a list of the most relevant records that satisfy a query. The methodology of the Krellenstein patent has a sophisticated clustering algorithm that includes three primary steps: identifying candidate categories, weighting candidate categories and displaying a set of search result categories selected from the candidate categories. A typical result of the system according to the Krellenstein patent is illustrated in Fig. 1, as extracted from the www.northernlight.com site.
Thus, as shown, the query "text categorization" (1) results in 19,215 documents (records) (2) (of which 6 are shown in the first page). The documents are assigned to 15 categories (3). The set of categories is determined after applying the specified sophisticated clustering, including identifying candidate categories, weighting candidate categories and displaying a set of search result categories selected from the candidate categories. In accordance with the specified system, the user can repeat this process, further narrowing the search with each iteration. Thus, double clicking the category Special collection documents (4) will result in applying the specified steps again, giving rise to the search results illustrated in Fig. 2. As shown, there are 2,057 records (5) in the sought category (6) that, in turn, are assigned to 12 categories (7). As readily arises from the search results depicted in Fig. 2, the resulting categories are determined dynamically and, accordingly, each search is likely to give rise to a different set of categories. This approach has a significant shortcoming in that every time there is a different list of categories, so the user depends on "luck" as to whether the categories of interest are included in the list or not. In addition, there is no fixed structure that the user knows and can expect, in order to look for the categories that are of interest to him.
Several systems and methods provide a summarization mechanism, which automatically produces a summary for a document. The summary is produced based on various rules or other criteria that evaluate the degree of importance of different parts of the document. The summary is typically constructed as an extract of important sentences or paragraphs taken from the document. For example, systems that offer summaries include the LinguistX software package from InXight Inc., USA, and the "AutoSummarize" option in Word, available from Microsoft Inc., USA.
When displaying the full text of a document, many search systems highlight the search words that were matched in the document text.
The current common practice for utilizing textual information does not sufficiently satisfy the increasing needs of individuals and organizations. Searching for information in large repositories is often a very tedious process, preventing effective utilization of information that is potentially available to the user. In particular, searches made with current techniques in large repositories often retrieve large document sets, making it extremely difficult and often impractical for the user to browse and sift through the retrieved documents and extract the relevant knowledge hidden in the vast amount of information. The bottleneck in information quest processes thus becomes the amount of time necessary for users to satisfy their information needs, as current processes require too much of the user's time.
There is accordingly a need in the art to provide for a system and method that substantially reduces or overcomes the drawbacks of hitherto known techniques, and for increasing the effectiveness of user effort in information quest processes.
SUMMARY OF THE INVENTION
The invention provides for a method for dynamically presenting a set of documents to users, comprising:
(a) providing a predetermined hierarchy of indexing concepts;
(b) providing a set of documents;
(c) providing hierarchical display of the indexing concepts such that each indexing concept in the hierarchy is associated with a sub-set of documents from among said set of documents;
(d) applying steps that include the following (i) to (iii), as many times as required:
(i) determining a subset of documents by utilizing the hierarchical display, thereby rendering it organized document subset;
(ii) defining at least one indexing concept in the hierarchical display so as to constitute a respective organizing concept; and
(iii) providing at least one organizing hierarchical display of indexing concepts, wherein the root of said at least one organizing hierarchical display being the respective at least one organizing concept; each concept in said at least one organizing hierarchical display is associated with a respective subset of documents, from among said organized document subset.
The invention further provides for a method for presenting a set of documents to users, comprising:
(a) providing indexing concepts and a set of documents; the set of documents are associated with concepts in accordance with triggering terms in said documents;
(b) selecting a document from said set;
(c) selecting at least one concept associated with said document;
(d) quantifying the importance of the triggering terms of said at least one concept and in response thereto, determining the important triggering terms; and
(e) emphasizing the important triggering terms that correspond to said at least one concept.
The invention further provides for a method for presenting a set of documents to users, comprising:
(a) providing indexing concepts and a set of documents; the set of documents are associated with concepts in accordance with triggering terms in said documents;
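Steps (d) and (e) can be sketched as follows for the case of one or more selected concepts. This is only an illustration under stated assumptions: the per-concept term weights are invented, square brackets stand in for visual highlighting, and the combination of importance across concepts by maximum, minimum or mean is one possible choice rather than a prescribed one.

```python
import re

# Hypothetical per-concept importance of triggering terms (illustrative values).
CONCEPT_WEIGHTS = {
    "Latin America": {"argentina": 0.9, "brazil": 0.8, "region": 0.2},
    "Lockheed Martin": {"lockheed": 1.0, "martin": 0.6, "aircraft": 0.3},
}

# Ways of combining a term's importance over several selected concepts.
COMBINERS = {
    "max": max,                              # important for any concept
    "min": min,                              # important for every concept
    "mean": lambda xs: sum(xs) / len(xs),    # averaged importance
}


def importance(term, concepts, how="max"):
    """Quantify the importance of a term with respect to the selected concepts."""
    values = [CONCEPT_WEIGHTS[c].get(term, 0.0) for c in concepts]
    return COMBINERS[how](values)


def emphasize(text, concepts, threshold=0.5, how="max"):
    """Wrap every sufficiently important triggering term in square brackets."""
    def mark(match):
        word = match.group(0)
        if importance(word.lower(), concepts, how) >= threshold:
            return f"[{word}]"
        return word

    return re.sub(r"[A-Za-z0-9]+", mark, text)


selected = ["Latin America", "Lockheed Martin"]
marked = emphasize("Lockheed Martin sold aircraft to Argentina.", selected)
```

With the default threshold, "aircraft" is a triggering term but is not emphasized, because its importance (0.3) falls below 0.5; this reflects the point that not every triggering term need be treated as important.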
(b) selecting a document from said set;
(c) selecting at least one concept associated with said document;
(d) quantifying the importance of the triggering terms of said at least one concept and in response thereto, determining the important triggering terms; and
(e) obtaining a summary based on said important triggering terms.
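Step (e) can be illustrated by a small extractive summarizer. This is a sketch under assumptions, not the claimed method itself: segments are approximated by sentences, a segment's importance is taken to be the sum of the weights of the important triggering terms it contains, and the function name, weights and sample text are hypothetical.

```python
import re


def summarize(text, term_weights, max_segments=2):
    """Return the highest-scoring segments, in their original order."""
    # Approximate segments by sentences ending in '.', '!' or '?'.
    segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for position, segment in enumerate(segments):
        words = re.findall(r"[a-z0-9]+", segment.lower())
        seg_score = sum(term_weights.get(w, 0.0) for w in words)
        if seg_score > 0:  # keep only segments with a triggering term
            scored.append((seg_score, position, segment))
    # Highest score first; ties are broken by position in the document.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_segments]
    return [segment for _, _, segment in sorted(top, key=lambda t: t[1])]


# Hypothetical important triggering terms for the selected concept.
weights = {"paging": 1.0, "pagers": 0.8, "argentina": 0.5}
text = ("The company reported earnings. Paging revenue grew in Argentina. "
        "Weather was mild. New pagers ship next month.")
summary = summarize(text, weights)
```

Limiting the output with `max_segments` plays the role of a threshold on the number of items to be included in the summary.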
Still further, the invention provides for a method for dynamically presenting a set of documents to users, comprising:
(a) providing a predetermined hierarchy of indexing concepts;
(b) providing a set of documents;
(c) providing hierarchical display of the indexing concepts; the indexing concepts are associated with the set of documents;
(d) determining a subset of documents by utilizing the hierarchical display, thereby rendering it organized document subset;
(e) defining at least one indexing concept in the hierarchical display so as to constitute a respective "organizing" concept; and
(f) providing at least one organizing hierarchical display of indexing concepts, wherein the root of said at least one organizing hierarchical display being the respective at least one organizing concept, wherein concepts in said organizing hierarchical display are associated with the organized document subset;
(g) repeating steps (d) to (f), as many times as required.
Still further, the invention provides for a system that includes a processor associated with a memory and display for dynamically presenting a set of documents to users, comprising:
(a) the memory is configured to store a predetermined hierarchy of indexing concepts;
(b) the memory is configured to store a set of documents;
(c) the processor is configured to provide hierarchical display of the indexing concepts such that each indexing concept in the hierarchy is associated with a sub-set of documents from among said set of documents;
(d) the processor is configured to apply steps that include the following (i) to (iii), as many times as required:
(i) determining a subset of documents by utilizing the hierarchy of display, thereby rendering it organized document subset;
(ii) defining at least one indexing concept in the hierarchy of display so as to constitute a respective "organizing" concept; and
(iii) providing at least one organizing hierarchical display of indexing concepts, wherein the root of said at least one organizing hierarchical display being the respective at least one organizing concept; each concept in said at least one organizing hierarchical display is associated with a respective subset of documents, from among said organized document subset.
Yet further, the invention provides for a system that includes a processor associated with a memory and display for presenting a set of documents to users comprising:
(a) the memory is configured to store indexing concepts and a set of documents; the set of documents are associated with concepts in accordance with triggering terms in said documents;
(b) the processor is configured to select a document from said set;
(c) the processor is configured to select at least one concept associated with said document;
(d) the processor is configured to quantify the importance of the triggering terms of said at least one concept and in response thereto, determining the important triggering terms; and
(e) the processor is configured to emphasize in the display the important triggering terms that correspond to said at least one concept.
The invention provides for a system that includes a processor associated with a memory and display for presenting a set of documents to users, comprising:
(a) the memory is configured to store indexing concepts and a set of documents; the set of documents are associated with concepts in accordance with triggering terms in said documents;
(b) the processor is configured to select a document from said set;
(c) the processor is configured to select at least one concept associated with said document;
(d) the processor is configured to quantify the importance of the triggering terms of said at least one concept and in response thereto, determining the important triggering terms; and
(e) the processor is configured to obtain a summary in said display based on said important triggering terms.
Still further, the invention provides for a system that includes a processor associated with a memory and display for dynamically presenting a set of documents to users, comprising:
(a) the memory is configured to store a predetermined hierarchy of indexing concepts;
(b) the memory is configured to store a set of documents;
(c) the processor is configured to provide a hierarchical display of the indexing concepts; the indexing concepts are associated with the set of documents; the processor is configured to apply the following steps (d) to (f) as many times as required:
(d) determining a subset of documents by utilizing the hierarchical display, thereby rendering it an organized document subset;
(e) defining at least one indexing concept in the hierarchical display so as to constitute a respective "organizing" concept; and
(f) providing at least one organizing hierarchical display of indexing concepts, wherein the root of said at least one organizing hierarchical display is the respective at least one organizing concept, and wherein concepts in said organizing hierarchical display are associated with the organized document subset.
BRIEF DESCRIPTION OF THE DRAWINGS
For better understanding, the invention will now be described by way of examples only, with reference to the accompanying drawings in which:
Figure 1 - illustrates a screen result of a database search system in accordance with the prior art;
Figure 2 - illustrates a screen result of a database search system in accordance with the prior art;
Figure 3 - illustrates a generalized computer system.
Figure 4 - illustrates a flowchart of the preferred embodiment of the invention.
Figure 5 - illustrates a Top pane: the concept hierarchical display. Left Top pane: tree representation of hierarchical document set display. Bottom: document list of a document subset.
Figure 6 - illustrates a left Top pane: pie representation of hierarchical document set display.
Figure 7 - illustrates a left Top pane: pie representation of hierarchical document set display.
Figure 8 - illustrates an overlapping window - Top pane: document important terms. Bottom pane: document full text and terms highlighting.
Figure 9 - illustrates a left Top pane: a document subset that has been "organized by". Right Top pane: the topics that have performed the "organization".
Figure 10 - illustrates an overlapping window - Top pane: document important terms. Bottom pane: document summary and terms highlighting.
Figure 11 - illustrates a left Top pane: tree representation of hierarchical document set display.
Figure 12 - illustrates a left Top pane: tree representation of hierarchical document set display. Overlapping window - Top pane: automatic important terms selection. Bottom pane: document text and automatically selected terms highlighting.
Figure 13 - illustrates a left Top pane: a document subset that has been "organized by" twice. Right Top pane: the topics that have performed the second "organization"; and,
Figures 14 to 21 illustrate a succession of screen results obtained by applying the method in accordance with one embodiment of the invention.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT
It should be noted that in the context of the invention, the terms concept and category are used interchangeably. In connection with some embodiments the term node signifies concept or category.
The invention provides novel methods for utilizing textual information that considerably increase the effectiveness of the end user when dealing with large volumes of documents. A typical embodiment of the invention is used in a computer system, as illustrated e.g. in Fig. 3. The computer system (30) includes a processor unit (31) with input and output (32 and 33) and associated display (32) and memory (not shown). The computer system (30) is configured to display documents and information about them in order to fulfill some information needs of end users (referred to in the following as "system"). The invention is, of course, not bound by any specific realization of computer system and may include any known structure such as a conventional Personal Computer (P.C.) in either stand-alone or network configuration, all as required and appropriate. Fig. 4 provides a high-level flow chart of a typical embodiment of the invention within some computer system (the details of the components of the invention are described below). The system presents a document set (41) in a hierarchical display (42). The structure of the display may be modified dynamically by an "organize by" operation (43) maintaining, however, a predetermined structure of the hierarchy. The user may select a node (standing for an indexing concept) (44) within the hierarchy, and ask for a display of information about the documents that are associated with the selected node (45). The displayed information may include one or more of the following: the number of documents in the sub-set associated with the specified indexing concept, the percentage thereof from among the entire document set, the document title, meta-data elements (such as source and date) and optionally a short summary of the document. The information is of course not limited to the specified details and may vary, depending upon the particular application.
The user may then select a particular document (16) for display, leading to the display of the full document text or of a summary of the document. The content of the summary, as well as the highlighting within the text, are determined automatically by some indexing concepts that are determined by default to be in the focus of attention of the user. The user may then select different indexing concepts to be in focus, leading to modified highlighting and summary.
The rest of the section describes the details of the preferred embodiment of the invention.
Setting and Input
Document set
In accordance with the invention, there is provided a method and system for presenting document sets and their content to the user of a system in an effective manner. The invention thus refers to any situation in which some document set has to be presented by the system, at any point of time, for purposes such as exploration, scanning, reading or analysis. The term document should be construed in a broad manner to encompass any record in a database including, but not limited to, a text and/or text/image document. The displayed document set may be e.g. the output of a search query that is applied to a search engine (e.g. AltaVista), or an entire document collection indexed by the system, or any other document set that is provided as an input for displaying to the user in accordance with the invention.
Indexing concepts
The documents in the presented document set are characterized by indexing concepts, as described above. That is, a typical document is characterized by several indexing concepts that are logically associated thereto. A document is considered indexed by the indexing concepts characterizing it.
Concept hierarchy
The possible indexing concepts for documents in the system are arranged in a predetermined hierarchy of indexing concepts (hierarchy in short), as illustrated e.g. in Fig. 5 (31). That is, a parent concept (which is an indexing concept by itself) is defined for each indexing concept. For example, in Fig. 5 (33) "Countries" is the parent of (34) "Latin America". One or several concepts that are defined as roots of the hierarchy may not have a parent node. For example, in Fig. 5 (32) "All" is the root. Usually, each concept in the hierarchy has only one parent, giving the hierarchy the form of a tree data structure (or several trees in case of several roots). The described functionality can also accommodate situations where some nodes have more than one parent. The terms concept and node are used interchangeably to denote an indexing concept within the hierarchy. Those versed in the art will readily appreciate that the structure of the indexing concept hierarchy is substantially predetermined.
Those versed in the art will readily appreciate that the predetermined structure does not necessarily mean that the indexing categories may not be subject to modification.
For example, the hierarchy may include an indexing concept "Companies", such that some of its specific daughters are not predetermined. The system may include a mechanism to recognize dynamically that a new name appearing in a document is a company, and define that name as an indexing concept for the document which is a daughter of the node "Companies".
By another embodiment, notwithstanding the predetermined nature of the hierarchy, the system includes a filtering mechanism which, in response to a filtering criterion, decides whether or not an indexing concept is displayed in the hierarchy. For example, the filtering criterion may filter out concepts associated with a small number of documents, below a certain threshold, or concepts that are associated only with documents whose score is low for the search query whose results list constitutes the document set to be displayed.
The Concept Hierarchical Display
In an embodiment of the invention, a system displays the concept hierarchy (in a hierarchical display) by any visualization mechanism that is suitable for displaying a hierarchical structure. The most typical display form for a hierarchy is a tree display, as in Fig. 5 (37), in which each node of the tree corresponds to one concept in the hierarchy. Clicking on a node (or on a special sign, such as "+", that is attached to the node) leads to displaying or hiding its daughters. Other hierarchical display mechanisms may show one level of siblings in the hierarchy at a time, by showing a list of elements, each represented by some symbol or icon, where clicking on an element leads to displaying its siblings, while some other option enables getting back (up) in the hierarchy (for example, the "My Computer" icon in the Windows-98/NT system available from Microsoft Inc, USA). For the purpose of the invention, any hierarchical display mechanism can be used to display the hierarchy of indexing concepts, where user interaction with the display mechanism controls the display of different portions of the hierarchy. Another non-limiting example of a hierarchical display is a chart, e.g. a pie chart.
Hierarchical Document Set Display
This subsection defines a hierarchical display of a presented document set (containing documents indexed by indexing concepts). The hierarchical display serves as a "table of contents" for the document set, which facilitates navigating and browsing of document sets. The scheme of a hierarchical document set display is available in previous systems, but the invention includes some specific enhancements to this scheme, as noted below.
The hierarchical document set display is based on the concept hierarchical display, and can be realized by any mechanism for displaying hierarchies, just like the concept hierarchical display discussed above. For example, Fig. 5 (37) is a hierarchical document set display in tree form. In addition to the predetermined hierarchy of concepts (as explained above), a set of documents, which is a subset of the currently presented document set, is associated with each concept (node) in the hierarchy. In Fig. 5, a set of documents is associated with the node (39) "Countries". The associated document set for a concept in the hierarchy (the document set of the node) contains all documents that are indexed by that concept. In certain embodiments of the invention, the associated document set for a concept is defined to include all documents indexed by any of its descendants in the hierarchy. For example, the document set of the concept "Countries" includes all documents indexed by any country or geographical region, assuming that these concepts are all descendants of the concept "Countries" in the hierarchical display.
It is simple to compute the document set that is associated with a given node in the hierarchical display. As a non-limiting example, such computation may scan all documents in the displayed document set and check for each of them if it is associated with the given concept. A hierarchical document display thus includes a display of the concept hierarchy (as described above), augmented with some information at each concept node about the document set associated with that node. The information about the associated document set may include, by one embodiment, one or more of the following items:
1. The number of documents in the associated set. In Fig. 5 (40) there are 13 documents in the set associated with "Latin America".
2. The percentage (proportion) of documents associated with the concept relative to the number of documents associated with its parent in the hierarchy. In Fig. 5 (41) 7% of the documents in the set associated with "Latin America" relate to "Argentina" (note that a 0% number represents a small positive percentage that was rounded to 0).
3. Some key information about prominent topics described within documents of the document set, such as most frequent or prominent key terms within the documents of the set, and/or the list of all or some of the indexing concepts for the documents.
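The computation of a node's associated document set, together with the count and percentage items listed above, can be sketched as follows. This is only an illustrative sketch under stated assumptions: the dictionaries `PARENT` and `DOC_CONCEPTS`, the document identifiers and the function names are hypothetical and do not appear in the specification, and the sketch follows the embodiment in which a node's document set also includes documents indexed by its descendants.

```python
# Hypothetical toy data (not from the specification):
# child concept -> parent concept, and document -> its indexing concepts.
PARENT = {
    "Countries": "All",
    "Latin America": "Countries",
    "Argentina": "Latin America",
    "Brazil": "Latin America",
}
DOC_CONCEPTS = {
    "d1": {"Argentina"},
    "d2": {"Brazil"},
    "d3": {"Latin America"},
    "d4": {"Brazil", "Argentina"},
}


def ancestors_and_self(concept):
    """Yield the concept itself and then every ancestor up to the root."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)


def document_set(node):
    """Scan all documents; keep those indexed by the node or by a descendant."""
    return {
        doc
        for doc, concepts in DOC_CONCEPTS.items()
        if any(node in ancestors_and_self(c) for c in concepts)
    }


def node_info(node):
    """Return (document count, percentage relative to the parent's set)."""
    count = len(document_set(node))
    parent = PARENT.get(node)
    if parent is None:
        return count, None  # a root has no parent to compare against
    parent_count = len(document_set(parent))
    pct = round(100 * count / parent_count) if parent_count else 0
    return count, pct
```

In this toy data, `node_info("Argentina")` reports the size of the "Argentina" set and its share of the "Latin America" set. The rounding step is also where a small positive proportion may be displayed as 0%.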
It should be noted that the nature and form of presenting the specified types of information (by this particular example number of documents, percentage and prominent topics) is only an example and accordingly other types of information may be presented in addition to or instead of the specified items. Likewise, and as will be explained in greater detail below, the concepts and their associated information are not limited to a specific form of graphical and/or textual representation. Reverting now to the specified types, these or other types of information may be presented either textually or graphically. Fig. 5 (37) is a tree display of the hierarchy with associated information about the document set of each node, containing the number of documents and the percentage relative to the parent node document set. In particular, since the hierarchical display of indexing concepts may include numerical data, such as numbers and proportions, mechanisms for displaying quantitative information may be used for the display. For example, a pie (or bar) chart can be used to display several sibling nodes (daughters of a common parent). Fig. 6 (44) is a pie representing the daughter nodes of "Countries". Each pie slice corresponds to one concept and its size indicates the proportion of its associated document set relative to the parent node document set. The quantitative graphical display mechanism may be interactive, in a similar manner to the interactive tree presentation of the concept hierarchy. For example, double clicking on a pie slice may lead to displaying the pie of the daughters of the selected node. For example, double clicking on the slice in Fig. 6 (45), corresponding to "Latin America", leads to the display in Fig. 7 (47), a pie presenting the daughters of "Latin America".
The displayed daughters of a node may be sorted alphabetically, or by some characterizing quantitative information, in particular by the size of the associated document set for each daughter.
In accordance with the invention, different display mechanisms are provided. According to the invention, several different display mechanisms may be used interchangeably within a system for the hierarchical document set display, letting the user switch from one to another while maintaining the position within the hierarchy. For example, a system may combine both a pie chart display and a tree display. When viewing the tree display with a certain node selected, and switching to the pie chart display, the system will present the pie that corresponds to the daughters of the selected node.
The graphical display may present further information about the documents in the associated document set, such as their titles, meta-data elements, document summaries or the full text of the document. For example, Fig. 5 (42) is the list of titles for the documents associated with the node "Latin America". Fig. 10 (48) is a summary of a selected document in the document list. A display of the full text of a document is presented in Fig. 8 (52). (54) is a list of indexing concepts for the document. Optionally, concepts in the hierarchical display to which no documents are attached may be omitted from the display. For example, in Fig. 5 no documents are associated with the indexing concept (36) "Bahamas" in the concept hierarchy, thus in the hierarchical indexing concept display, this concept does not appear as a daughter of (40) "Latin America".
In a more generalized embodiment, the concepts in the hierarchical display are subject to a filtering criterion in order to determine whether or not they will be displayed in said hierarchy. A typical, yet not exclusive, example of a filtering criterion concerns which folders in deeper levels of the hierarchy tree will be displayed. The necessity of this criterion stems from the fact that the display area allocated to the hierarchy in the display screen may not be sufficient to accommodate the entire hierarchy, and accordingly only a portion thereof is displayed, e.g. a few levels, and only in response to user selection are further levels displayed (instead of the previously displayed higher levels). For example, the top level and only some of its daughters may be shown, with, say, a "..." symbol indicating that there are more daughters that can be displayed if the user explicitly opens the parent node (as, say, in AltaVista). A more advanced filtering criterion may rank folders (standing by this embodiment for nodes) to be presented according to, say, the number of documents in each and the quality of their match to the current "query" (query means the entire sequence of operations that led to the display of the current results). Thus, folders having high rank may be displayed in the limited display zone instead of other folders having lower rank, notwithstanding the fact that the higher ranked folders reside in a lower level in the hierarchy as compared to the lower ranked folders. Obviously, the user can display the rest of the folders (which are currently not displayed due to their low rank) by, say, clicking the specified "..." symbol.
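A rank-based filtering criterion of the kind described above might be sketched as follows. The specification does not give a ranking formula; the product of document count and match score used here is purely an assumption for illustration, as are the function name `visible_folders`, its parameters and the tuple layout.

```python
def visible_folders(folders, max_shown, min_docs=1):
    """Pick the folders to display in a limited display zone.

    folders: list of (name, doc_count, match_score) tuples (hypothetical layout).
    Folders with fewer than min_docs documents are filtered out entirely.
    The rank (doc_count * match_score) is an assumed formula, not the patent's.
    """
    candidates = [f for f in folders if f[1] >= min_docs]
    ranked = sorted(candidates, key=lambda f: f[1] * f[2], reverse=True)
    shown = [name for name, _, _ in ranked[:max_shown]]
    if len(ranked) > max_shown:
        shown.append("...")  # indicates that further folders can be opened
    return shown
```

Because the ranking ignores the level at which a folder resides, a deeper folder with a high rank can displace a shallower folder with a low rank, as the text describes.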
According to an embodiment of the invention, an "Others" node is added to each list of siblings having a common parent. By this embodiment, the documents associated with the "Others" node are those associated with the parent node but not with any of its daughters in the concept hierarchy. For example, an "Others" node that is a daughter of the node "Europe" will be associated with all documents indexed by "Europe" but not by any particular European country.
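The "Others" node amounts to a set difference, as the following minimal sketch shows. The function name and the document identifiers are hypothetical.

```python
def others_documents(parent_docs, daughter_docs):
    """Documents associated with the parent but with none of its daughters.

    parent_docs: set of document ids associated with the parent node.
    daughter_docs: dict mapping each daughter concept to its set of document ids.
    """
    covered = set().union(*daughter_docs.values())
    return parent_docs - covered
```

For a parent "Europe" with daughters "France" and "Spain", the result holds the documents indexed by "Europe" that are not associated with either country.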
The hierarchical indexing concept display may be restricted to a particular sub-part of the hierarchy, determined by some mechanism, rather than presenting the full hierarchy. For example, it is possible to present the hierarchical indexing concept display using only the "Countries" sub-tree of the hierarchy. This non-limiting modification also falls within the definition of a predetermined hierarchical indexing concept display.
Dynamic Hierarchical Document Indexing Concept Set Display
The hierarchical indexing concept set display serves as a "table of contents" for the document set and can be used as a method for displaying document sets to the user. However, the hierarchical indexing concept set display is limited because it has a static structure, which is equivalent to the structure of the concept hierarchy. For example, when presenting a large document set by the hierarchical indexing concept display, one of the leaves of the tree may be the country "France", as in Fig. 11 (55), containing 45 documents. No further organization is given for these 45 documents, since "France" is a leaf in the concept hierarchy. This section defines a novel mechanism provided by the invention for presenting dynamic "table of contents" displays for document sets, enabling the user to dynamically modify and refine the document display whilst maintaining the predetermined hierarchical indexing concept display. This mechanism is called the dynamic hierarchical document set display (dynamic display). The dynamic display is by itself hierarchical, utilizing the specified predetermined hierarchy of categories, and thus provides all the functionality of the hierarchical document set display, as described above.
In accordance with one embodiment, at the initial stage of the dynamic display, a document set is presented in some manner, possibly by the (static) hierarchical indexing concept display. The dynamic display is created by a series of "organize by" operations, each specified by two definitions:
1. Defining a document subset (or set) to be organized (constituting the "organized" document subset) by the "organize by" operation. For a hierarchical presentation, selecting the document set may, preferably, correspond to selecting a node in the hierarchical document set display. For example, selecting the node "France" in Fig. 11 (55) defines the document set associated with this node as the subset to be organized. This subset is termed the organized document subset. When the selected subset corresponds to a node in the display, that node is termed the organized node. The selection of the "organized" document subset is performed on the basis of information displayed in the hierarchy, e.g. defining an indexing concept in the hierarchy as an organized by concept and rendering the documents associated therewith as the specified "organized" document subset.
2. Defining a node of the concept hierarchy to serve as the root of the sub-tree by which the document subset will be organized. This node (or corresponding sub-tree) is termed the organizing node (sub-tree). For example, the node "Companies" may be selected as an organizing node (57 in Fig. 9), to organize the document subset associated with the node "France".
The effect of applying the "organize by" operation is to provide an "organizing" hierarchical indexing concept display (as defined above) for the organized document subset, which is restricted to the sub-hierarchy under the organizing node. In the above example, the documents associated with the node "France" will be displayed in a hierarchical indexing concept display that is restricted to the sub-tree of the concept hierarchy rooted by the node "Companies" (having all companies as daughters). This display appears in Fig. 9 (60), where the "Companies" node (60) is the root of the hierarchical display for the "France" document set, and (58) are the daughters of (60). This would have the effect of presenting which companies appear as indexing concepts in documents that are also indexed by "France", along with quantitative information about the documents indexed by each company. For example, there are 27 documents indexed by both "France" and "Boeing". The indexing concept Boeing (61) signifies, due to its position in the hierarchy, the path from the root, to wit: All -> Countries -> West Europe -> France -> Companies -> Boeing. Put differently, indexing concept (61) is associated with the documents indexed by both "France" (a country in West Europe) and "Boeing" (a company). The pertinent information that is associated with this concept is 27 (number of documents) and 60% (standing for 27 documents out of the 45 associated with indexing concept (60), "Companies"). Accordingly, any concept in the indexing concept hierarchy display is associated with a respective subset of documents from among the organized document subset. Obviously a document may be associated with more than one concept of the organizing hierarchical display. A "respective" subset of documents also encompasses the special situation in which a concept is associated with no documents.
The "organize by" operation may be interpreted as a recursive application of the hierarchical indexing concept display, as its effect is to provide a new hierarchical display for a node within a previously displayed hierarchy. However, the hierarchical display is maintained predetermined considering that in the modified presentation, substantially, the same concepts are employed, which makes it easier for the user to follow "well known" and familiar concepts, even after applying the "organizing" operation.
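The "organize by" operation and its recursive application can be sketched as follows. This is a minimal sketch under assumptions: the tables `CHILDREN` and `DOC_CONCEPTS`, the document identifiers and the nested-dictionary result format are hypothetical, chosen only to illustrate restricting an organized document subset to the sub-tree under an organizing node.

```python
# Hypothetical toy data (not from the specification).
CHILDREN = {
    "Companies": ["Boeing", "Airbus"],
    "Activities": ["Agreement"],
}
DOC_CONCEPTS = {
    "d1": {"France", "Boeing", "Agreement"},
    "d2": {"France", "Airbus"},
    "d3": {"France"},
    "d4": {"Spain", "Boeing"},
    "d5": {"France", "Boeing"},
}


def docs_under(node, docs):
    """Documents in `docs` indexed by `node` or by any of its descendants."""
    found = {d for d in docs if node in DOC_CONCEPTS[d]}
    for child in CHILDREN.get(node, []):
        found |= docs_under(child, docs)
    return found


def organize_by(organized_docs, organizing_node):
    """Build the organizing display for the organized document subset.

    Each node maps to its respective subset (possibly empty) of the
    organized documents, plus the displays of its daughters.
    """
    def build(node):
        return {
            "docs": docs_under(node, organized_docs),
            "daughters": {c: build(c) for c in CHILDREN.get(node, [])},
        }

    return build(organizing_node)
```

Since the subset attached to any node of the result is itself a document set, it can be passed back in as `organized_docs` with a different organizing node. This mirrors the recursive reading of the operation: organizing the "France" documents by "Companies", then the resulting "Boeing" subset by "Activities".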
As a special case, the organizing node can be the root of the concept hierarchy, in which case the organized document subset will be displayed by a hierarchical indexing concept set display that corresponds to the entire concept hierarchy. A system may apply only this special case (always organizing by the full hierarchy considering the root as the organizing node), in which case it is necessary to define only the organized node in order to apply an "organize by" operation. Furthermore, a system may implement the hierarchical document display such that at each point of time the user view is focused only on one node of the tree. In this case, applying the "organize by" operation implies implicitly that the organized node is the currently displayed node, saving the need of an explicit definition of the organized node. If desired, by a specific embodiment, the default definition of organizing concept as the root node and the organized by concept as the currently displayed node may be realized by a single user operation say, for example, clicking on a predetermined icon.
As a particular (but not the only) mechanism of operation, in the case where the organized document subset corresponds to a specific selected node in a hierarchical display, the hierarchical display of the organized subset is displayed as a new, dynamically created, daughter (or daughters) of the selected organized node. In the example above, the node "Companies" in Fig. 9 (60) is added dynamically as a new daughter node of (59) the node "France", modifying the hierarchical display that was presented to the user just before applying the "organize by" operation. Several variations of the method may be implemented, in which a new daughter node either replaces or is added as a sibling to the previously existing daughters of the organized node. Notwithstanding the modification, the predetermined hierarchy of concepts is maintained in the sense that the category "company" is already known to the user (see e.g. Fig. 3) before applying the specified "organize by" operation.
Once a modified hierarchical display has been created by applying an "organize by" operation, as described above, any part of the new display may be subject to further "organize by" operations. In particular, a node that was added to the hierarchy in a previous "organize by" operation may be selected as the organized subset in a later operation. Subsequent "organize by" operations on the modified dynamic display may be applied as requested by the user. In Fig. 13 (69) the node "Boeing", which has been created by a previous "organize by" operation (as in Fig. 9), is later selected as an organized node, where the organizing node is (65) "Activities". Thus, in this example, a node "Activities" (70) is dynamically added to the display, and its daughters (64) (signifying documents indexed by both "France" and "Boeing" and by some activity) are associated, each, with information that pertains to these documents. For example, there are 19 documents indexed by "France" (67), "Boeing" (69) and "Agreement" (71). The specified organized by operation may be applied recursively (repeated) as many times as required, each time in respect of newly selected "organized by" and "organizing" concepts.
The basic form of the "organize by" operation consists of selecting one node in a hierarchical display as the organized node, and one node in the concept hierarchy as the organizing node. The following paragraphs describe extensions to the basic form.
Multiple selection for simultaneous operation
Multiple selection of organizing nodes within a single "organize by" operation has the effect, by one embodiment, of adding all the selected nodes as daughters of the organized node. For example, the organized node "France" may be organized by "Companies" and "Activities", which means that all the documents associated with the indexing concept "France" will be organized by the indexing concept "Companies" and separately by the indexing concept "Activities". If desired, the nodes "Companies" and "Activities" are added as daughters to "France".
Multiple selection of organized nodes has the effect of applying the "organize by" operation simultaneously to all selected nodes, for example, applying an "organize by" operation with the same organizing node to both nodes "France" and "Spain". The net effect of selecting more than one organized node is that each node is associated with its respective organized by subset of documents, and then some operator or operators is (are) applied to the specified subsets so as to constitute a resulting organized subset of documents that is then subject to the organizing operation. In the latter example there is a first subset of documents associated with France and a second subset of documents associated with Spain. By this particular example the operator that is applied to the subsets is OR, giving rise to a document subset that includes documents that pertain only to Spain, only to France or to both. This resulting subset of documents is then subjected to the organizing operation by one or more organizing concepts.
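Combining the subsets of several organized nodes with an operator can be sketched as follows. The text names OR as the operator of this particular example; the AND variant mentioned in the comment is an assumption about what another operator might be, and the function name and document identifiers are hypothetical.

```python
from functools import reduce
import operator


def combine_organized(subsets, op=operator.or_):
    """Fold the organized nodes' document subsets with a set operator.

    The default, operator.or_, is set union (the OR of the example above).
    Passing operator.and_ (intersection) is an assumed alternative operator.
    """
    return reduce(op, subsets)
```

With the default operator, the subsets for "France" and "Spain" yield the documents pertaining only to France, only to Spain or to both. The resulting set is what the organizing operation then receives.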
In accordance with an embodiment of the invention, the set of documents may be obtained by applying a search query to, say, a conventional search engine that operates similarly to AltaVista, and displaying the resulting set along with the hierarchical display of the invention.
Thus, for example, Figs. 14 to 21 show a succession of screen results obtained by applying the method in accordance with one embodiment of the invention. Fig. 14 illustrates a predetermined indexing concept hierarchy (140) that includes 11,000 documents (142) that constitute the document set and are broken down by the hierarchy concepts.
Applying a query (e.g. "pagers", 143) results in 318 documents (see 151 in Fig. 15) that are broken down by the concept hierarchy. The list of documents is displayed (152), and, by this example, the first four documents are shown in the first page. The query itself ("pagers") is automatically assigned to categories in the hierarchy as if it were a document. The resulting category is illustrated in the "Related category" field (153), to wit: Telecom All > Applications > Messaging > Paging. All the categories, except for "Paging", are shown in the hierarchical presentation (151, 154, and 155). Paging is a sub category of Applications and can be shown if the Browse section of the screen is enlarged, or if the user decides to show it by, say, clicking a specified symbol (as described above).
Clicking the Paging category will render it the organized by category and All (i.e. the root) the organizing category. The net effect is that the 122 documents that are associated with Paging are now broken down by the entire hierarchical tree, as shown in Fig. 16. The "results for" field shows that the display corresponds to the query "paging" (which by this example matches one of the categories). The four documents shown in the search section are the first 4 out of 122 documents that meet the search. Fig. 17 is the same as Fig. 16 except that now the documents that are associated with the sub-category Telecom Service Companies (171) are shown. This may be achieved by simply clicking the relevant category in the hierarchy (by this particular example Telecom Service Companies - not shown in the hierarchy of Fig. 17), whereupon the documents associated therewith are shown. The documents that are shown obviously relate to "paging" and telecom service companies.
Fig. 18 illustrates yet another degree of detail wherein only documents that pertain to Skytel (181) (which forms a sub-category of the specified Telecom Service Companies - not shown in the hierarchy of Fig. 18) are shown.
Now, Skytel constitutes the "organized by" concept and the documents associated therewith constitute the organized document subset. Next, clicking the Zoom In symbol (182) will render the Telecom All root category (183) the organizing category, and the resulting hierarchical display is depicted in Fig. 19.
There are 12 documents (191) broken down by the predetermined categories. Thus, for example, 8 documents are associated with the category Business (192). Categories that have no documents associated therewith are not shown. Incidentally, the information that pertains to the subset of documents associated with each category is simply the number of documents (12 and 8 in the latter example). The 12 documents concern both Skytel and paging. Four out of the 12 pertinent documents are shown in the Search section of the screen (193).
Considering now that only the documents from among the specified 12 documents that concern product companies (the "products" node) are of interest, the user simply clicks the products category (200) in Fig. 20 and the 8 relevant documents are shown at the search section of the screen (201). If, from among the specified 8 documents, only those that concern Motorola are of interest, the user simply clicks the Motorola category (210) in Fig. 21 and in response thereto the pertinent 3 documents are shown.
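The drill-down walked through in Figs. 14 to 21 can be illustrated by a minimal sketch. It assumes a hypothetical in-memory representation that is not part of the specification: `children` maps each concept to its child concepts, `doc_concepts` maps a document identifier to the concepts directly attached to it, and a document counts towards a concept if it is attached to that concept or to any of its descendants. The function names are illustrative only.

```python
def docs_under(concept, children, doc_concepts, subset):
    """Documents of `subset` attached to `concept` or to any of its descendants."""
    found = {d for d in subset if concept in doc_concepts.get(d, ())}
    for child in children.get(concept, ()):
        found |= docs_under(child, children, doc_concepts, subset)
    return found


def breakdown(root, children, doc_concepts, subset):
    """Map each concept under the organizing concept `root` to its document count.

    Concepts with no documents in `subset` are omitted, mirroring the
    statement that empty categories are not shown.
    """
    counts = {}

    def visit(concept):
        n = len(docs_under(concept, children, doc_concepts, subset))
        if n == 0:
            return  # an empty concept cannot have non-empty descendants
        counts[concept] = n
        for child in children.get(concept, ()):
            visit(child)

    visit(root)
    return counts


# Hypothetical toy data (not taken from the figures).
children = {"All": ["Business", "Products"], "Products": ["Motorola", "Nokia"]}
doc_concepts = {
    1: {"Business", "Motorola"},
    2: {"Business"},
    3: {"Motorola"},
    4: {"Products"},
    5: {"Nokia"},
}
organized_subset = {1, 2, 3, 4}
overview = breakdown("All", children, doc_concepts, organized_subset)

# "Clicking" Products narrows the organized document subset, which is then
# broken down again by the whole tree rooted at the organizing concept "All".
narrowed = docs_under("Products", children, doc_concepts, organized_subset)
zoomed = breakdown("All", children, doc_concepts, narrowed)
```

Keeping the organized document subset separate from the organizing concept is what lets the same subset be re-displayed under any root, as in the Zoom In step of Fig. 18.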
Selecting text terms and segments for focused reading
In addition to the dynamic display, which provides a "table of contents" style display for document sets, the invention provides, in accordance with another aspect thereof, new mechanisms for presenting parts of or all of the text of a document in a dynamic and effective manner. These mechanisms direct the attention of the user to relevant parts of the document and enable quick focusing on these parts. For example, these relevant parts might be text segments that contain relevant information for the user or that can help in deciding about the relevance of the document. The decision of which parts of the document should be in focus is dynamic, and may be changed according to user guidance or to the context in which the document is being displayed.
There are two typical ways in an embodiment of the invention for focusing the user's attention on particular parts or pieces of information in the document. The first is by highlighting the parts of the text that should be in focus (based on important triggering terms), and the second is by creating a summary for the document that contains the parts in focus (based on important triggering terms).
According to the invention, the parts of the document which should be highlighted or be included in a summary are determined according to a set of (one or more) indexing concepts, among the indexing concepts of the document, that are considered to be in focus at a certain stage of user interaction with the system. These indexing concepts are called focus indexing concepts.
According to the invention, the highlighting and summarization for a given focus indexing concept is determined by the important triggering terms for that concept. The triggering terms for a concept are the occurrences in the document of all terms which entail the attachment (or classification) of the concept to the document. Highlighting and an extracted summary will include the important triggering terms for the concept, or short segments of text that are considered to be important. The degree of importance of terms and segments may be quantified by some scoring mechanism, where the degree of importance of the terms in a segment is a factor in determining the degree of the segment importance. The invention provides dynamic methods for determining (quantifying) which triggering terms and segments are important in a given context of the user interaction with the system that displays the documents.
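As one possible illustration of a scoring mechanism in which term importance is a factor in segment importance, the following sketch scores each sentence by the summed importance of the triggering terms it contains and extracts the top-scoring sentences as a summary. The sentence splitter, the additive score, the case-insensitive substring matching and the function names are all simplifying assumptions, not details given in the specification.

```python
import re


def score_segments(text, term_importance):
    """Split `text` into sentences and score each by the triggering terms it contains."""
    segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for segment in segments:
        lowered = segment.lower()
        # Assumption: segment importance is the plain sum of term importances.
        score = sum(w for term, w in term_importance.items() if term.lower() in lowered)
        scored.append((segment, score))
    return scored


def extract_summary(text, term_importance, max_segments=2):
    """Return up to `max_segments` highest-scoring segments, in document order."""
    scored = score_segments(text, term_importance)
    top = sorted(scored, key=lambda pair: pair[1], reverse=True)[:max_segments]
    keep = {segment for segment, score in top if score > 0}
    return [segment for segment, _ in scored if segment in keep]
```

Giving every triggering term the same importance (for example 1.0) reduces the score to a count of triggering terms per segment.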
It should be noted that in the context of obtaining a summary according to one embodiment of the invention, the quantifying step assigns the same degree of importance to all triggering terms. The latter option does not apply to the aspect which concerns emphasizing important triggering terms. Put differently, insofar as emphasizing important triggering terms, not all the triggering terms are ranked with the same degree of importance.
The important triggering terms and segments are presented to the user, either in a form of an extracted summary, which contains the important terms and/or segments, or by highlighting the important terms within the display of the full document, or by some combination of the two methods. When using the term important, one refers to the case where the degree of importance of triggering terms and segments can be quantified and the display is restricted to those with the highest importance. The number of terms or segments to be included in the display is determined by some mechanism, such as a threshold on the degree of importance or on the number of items to be included. This ranking mechanism by degree of importance is necessary when there are many important terms or segments and it is desired to limit their display in order to achieve optimal focus of attention by the user. Fig. 10 (48) displays a summary of a document, in which the important terms are highlighted. (The important terms were determined relative to the highlighted indexing concepts "Latin America" (50) and "Lockheed Martin" (51), which are in the focus of interest to the user, as explained below.) The summary includes segments of the text that contain the important terms. Fig. 8 (52) presents a full display of a document text, in which important terms (relative to the indexing concept (54), see below) are highlighted. While the general scheme of highlighting triggering terms in a document for display is available in previous systems, the invention, by this aspect, concerns selecting important terms, as described below.
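The two limiting mechanisms mentioned above (a threshold on the degree of importance, or a cap on the number of items) can be sketched as follows, together with a plain-text form of highlighting. The marker strings and function names are hypothetical; an actual display would use colors or other emphasis.

```python
import re


def select_important(term_importance, threshold=None, max_items=None):
    """Rank terms by importance, then apply an optional threshold and/or item cap."""
    ranked = sorted(term_importance.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(term, w) for term, w in ranked if w >= threshold]
    if max_items is not None:
        ranked = ranked[:max_items]
    return [term for term, _ in ranked]


def highlight(text, terms, open_mark="[[", close_mark="]]"):
    """Wrap every case-insensitive occurrence of the selected terms in markers."""
    if not terms:
        return text
    # Longer terms first, so a multi-word term wins over a shorter term it contains.
    pattern = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.sub(pattern, lambda m: open_mark + m.group(0) + close_mark,
                  text, flags=re.IGNORECASE)
```

For example, `highlight("The pager network", select_important({"pager": 0.8, "the": 0.1}, threshold=0.4))` marks only "pager".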
Selecting the important triggering terms within a text classification system that quantifies the importance of triggering terms
One non-limiting method in the context of the invention refers to selecting the important triggering terms in a document with respect to an indexing concept that is determined to be in focus (of interest) at a certain stage of the user interaction with the system. For example, in Fig. 8 the indexing concept "Product specifications/capabilities" (54) is selected to be in focus. This part of the invention refers to the case where the indexing concept was assigned to the document by some text classification method, as described above. Such a method classifies the document to a certain indexing concept based on words, terms or their combinations that appear in the document. It is assumed that it is possible to trace within the classification system which words or terms in the document entailed the classification to the given indexing concept. Optionally, it is possible to quantify within the system the relative contribution of each term to the classification of the document to the indexing concept. In certain embodiments, a trainable text classification method is used, in which the terms and the degree to which they entail classification to the indexing concept are learned from training documents for which it is previously known whether they belong to the indexing concept or not.
As non-limiting examples for possibilities for determining triggering terms, consider the following trainable text classification methods.
• D. Lewis. 1992, An evaluation of phrasal and clustered representations on a text categorization problem, in Proc. of the 15th Int. ACM-SIGIR Conference on Information Retrieval, pages 37 — 50. This method applies a Bayesian learning scheme for text classification. For a given category, the method computes (during the training phase) certain weights for terms (words or phrases) in the text, with respect to the category. The score of the category for a particular document is computed as a function (usually some sort of a normalized sum) of the weights of the terms that appear in the document. When computing the category score for a document, it is possible to trace the relative contribution of each term in the document to the accumulative score. Thus, triggering terms in this method will be those terms that provided the highest contribution to the accumulative score of the document.
• E. Wiener and J. Pedersen and A. Weigend, 1995, A neural network approach to topic spotting, in Symposium on Document Analysis and Information Retrieval, pages 317—332.
• W.W. Cohen, Text categorization and relational learning, in Machine Learning Journal, 1995, pages 124 — 132. This method learns classification rules for each category, that consist of words or combination of words. Each "firing" of a rule, that is, the occurrence of the word or word combination of the rule in the document, entails the classification of the document to the category. Thus, in this method, the words and word combinations in the rules that matched in the document will be considered as triggering terms in the document.
According to the invention, the important triggering terms, to be included in a summary or to be highlighted, are those term occurrences that significantly contributed to the classification of the document to the focus indexing concept. In Fig. 8 the triggering terms for the indexing concept "Product Specification/Capabilities" (54) are highlighted within the text (52) of the document. Furthermore, when the relative contribution of triggering terms to classification can be determined (traced), their degree of importance would be proportional to this degree of relative contribution to classification.
It should be noted that the method described above for selecting the important triggering terms for an indexing concept in focus could be combined with simpler methods for identifying the triggering terms for an indexing concept (such methods are not part of the invention). For example, when the indexing concept is identical to a term or name that appears explicitly in the document text, the important term is simply the occurrence of the indexing concept in the text. (E.g. when the indexing concept is "France", the important terms are simply the explicit occurrences of the term "France" in the text.) Another example is a topical indexing concept that is identified in the text by a manually defined query. In this case the triggering terms are simply all the terms that appear in the query (similar to document search systems that highlight matching query terms in the retrieved documents). Those versed in the art will readily appreciate that the invention is not bound to the specific techniques described for determining important triggering terms.
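The tracing of term contributions can be sketched for the two kinds of classifier discussed above. The first function assumes a deliberately simplified additive scorer (category score = sum of term weight times term count); it is not a reproduction of any of the cited methods, and the weights are hypothetical inputs assumed to have been learned elsewhere. The second function assumes rules given as tuples of words that all have to occur in the document for the rule to fire.

```python
def term_contributions(doc_term_counts, weights):
    """Relative contribution of each term to an additive category score.

    Only terms with a positive weight are treated as triggering terms.
    The returned shares sum to 1 and can serve as degrees of importance.
    """
    raw = {
        term: weights[term] * count
        for term, count in doc_term_counts.items()
        if weights.get(term, 0) > 0
    }
    total = sum(raw.values())
    if total == 0:
        return {}
    return {term: value / total for term, value in raw.items()}


def fired_rule_terms(doc_words, rules):
    """Words of every rule whose words all occur in the document."""
    words = set(doc_words)
    triggered = set()
    for rule in rules:
        if set(rule) <= words:  # the rule "fires"
            triggered |= set(rule)
    return triggered
```

With the additive scorer, importance is graded by contribution; with the rule-based variant, every word of a fired rule is a triggering term without further ranking.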
Multiple focus indexing concepts
Another method within the invention refers to selecting important terms and segments for display by selecting dynamically several focus indexing concepts. One way of selecting the focus indexing concepts is by letting the user select them interactively from the list of all indexing concepts of the document. In Fig. 10 the user has selected (50) "Latin America" and (51) "Lockheed Martin" as focus indexing concepts. Consequently, the selected important terms, which are highlighted in the document text (48), are the triggering terms for both (50) and (51). Other mechanisms for selecting the set of focus indexing concepts may be applied as well, such as the method described next. According to the invention, the important triggering terms and segments are selected from the important triggering terms and segments of each one of the focus indexing concepts, applying some procedure that combines them and reevaluates their degree of importance with respect to the complete set of focus indexing concepts. For example, the degree of importance of a triggering term or segment with respect to the complete set of focus indexing concepts may be defined (referred to also as quantified) by its maximal (or minimal) degree of importance for any of the individual indexing concepts (applying a disjunctive (or conjunctive) reasoning criterion), or by computing some averaging function of the individual importance degrees. According to the invention, the display of important terms or segments for the complete set of focus indexing concepts may distinguish between terms that were selected originally for the different indexing concepts that compose the set. For example, a different color is attributed to each indexing concept, and the important terms related to this concept are highlighted by the corresponding color. In Fig. 10 the indexing concept "LATIN AMERICA" (50) is highlighted with a blue background and "LOCKHEED MARTIN" (51) is highlighted with a pink background (blue appears darker than pink in the black and white printing). Accordingly, the triggering terms for both concepts ("Brazil" and "Amazon" for "LATIN AMERICA" and "Lockheed Martin" for the indexing concept "LOCKHEED MARTIN") are highlighted in the corresponding colors in the document text (48).
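The combination of importance degrees over several focus indexing concepts can be sketched as follows. The treatment of a term that is absent for some concept as having importance 0.0 for that concept is an assumption of this sketch, as are the function names and the rule that a term takes the color of the concept for which it scores highest.

```python
def combine_importance(per_concept, mode="max"):
    """Combine {concept: {term: importance}} into one {term: importance} map.

    mode="max"  -> disjunctive criterion (important for any focus concept)
    mode="min"  -> conjunctive criterion (important for every focus concept)
    mode="mean" -> a simple averaging function
    """
    terms = set().union(*[scores.keys() for scores in per_concept.values()])
    combined = {}
    for term in terms:
        values = [scores.get(term, 0.0) for scores in per_concept.values()]
        if mode == "max":
            combined[term] = max(values)
        elif mode == "min":
            combined[term] = min(values)
        elif mode == "mean":
            combined[term] = sum(values) / len(values)
        else:
            raise ValueError("unknown mode: " + mode)
    return combined


def term_colors(per_concept, concept_colors):
    """Give each term the color of the focus concept for which it scores highest."""
    best, colors = {}, {}
    for concept, scores in per_concept.items():
        for term, importance in scores.items():
            if importance > best.get(term, float("-inf")):
                best[term] = importance
                colors[term] = concept_colors[concept]
    return colors
```

Under the conjunctive ("min") criterion, a term that triggers only one of the focus concepts drops to zero importance, so that mode is mainly useful when the same terms trigger several focus concepts.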
Default focus indexing concepts
Another method within the invention refers to the selection of default focus indexing concepts, to be used automatically as the focus indexing concepts when the document is presented to the user. According to the invention, the default focus indexing concepts are selected according to the selection conditions that were applied in the process that led to the display of the document. In particular, when the document is displayed as a result of a search query that contains indexing concepts then the indexing concepts contained in the query become the default focus indexing concepts. A particular setting for this method occurs when the document is selected for display within the hierarchical document set display or within the dynamic hierarchical document set display. In Fig. 12 a document was selected for display from the node (document subset) (61) "ARGENTINA". Accordingly, the default focus indexing concept is (62) "ARGENTINA" and the triggering term "Argentina" (63) is highlighted within the document text.
In this setting of a hierarchical display, a document is selected for display from the document set that is associated with a certain node in the hierarchy. The documents in this set satisfy a logical condition that is equivalent to a search query which is a conjunction (logical AND) of all indexing concepts in the path from the root of the displayed hierarchy to the selected node. Thus, according to the invention, the default focus indexing concepts are the concepts along this path. Recall that parts of this path may correspond to paths within the concept hierarchy and parts of the path might be created dynamically within the dynamic hierarchical document set display. For example, in Fig. 13 the documents associated with the node "Agreement" (71) satisfy a logical AND condition for all indexing concepts on the displayed path from the root of the tree to this node. Optionally, for a pair of concepts x and y in the set of default focus indexing concepts, such that x is an ancestor of y in the concept hierarchy, it is possible to exclude x from the set of default focus indexing concepts. In the example of Fig. 11, it is possible to exclude "West Europe" from the set of default focus indexing concepts, since it is likely that the focus of interest for the user is concerned in particular with "France", which is a daughter of "West Europe" in the concept hierarchy.
In some systems, the method of viewing document sets that are attached to concept nodes in a (possibly dynamic) hierarchical document set display may be combined with the use of explicit search queries issued by the user. In this case, if the document set attached to a concept node is restricted by an additional condition supplied in an explicit search query, then the default focus indexing concepts will be a combination of the concepts of the path, as described above, and the concepts that are included in the query.
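The selection of default focus indexing concepts from the displayed path, the optional exclusion of ancestors, and the combination with concepts from an explicit query can be sketched as follows. The `parent` map (child concept to parent concept in the concept hierarchy) and the function names are hypothetical; the concept names in the usage note are taken from the examples above.

```python
def ancestors(concept, parent):
    """All ancestors of `concept` in the concept hierarchy."""
    found = []
    while concept in parent:
        concept = parent[concept]
        if concept in found:  # guard against a malformed (cyclic) hierarchy
            break
        found.append(concept)
    return set(found)


def default_focus_concepts(path, parent, query_concepts=(), exclude_ancestors=True):
    """Default focus concepts for a document opened from a node of the display.

    `path` lists the concepts from the root of the displayed hierarchy to the
    selected node; `query_concepts` are concepts of an explicit search query.
    """
    # Merge path and query concepts, dropping duplicates but keeping order.
    candidates = list(dict.fromkeys(list(path) + list(query_concepts)))
    if not exclude_ancestors:
        return candidates
    covered = set()
    for concept in candidates:
        covered |= ancestors(concept, parent)
    # Drop x whenever some other candidate y has x as an ancestor.
    return [c for c in candidates if c not in covered]
```

With `parent = {"France": "West Europe"}` and the path `["West Europe", "France", "Agreement"]`, the sketch keeps "France" and "Agreement" and drops "West Europe".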
Alphabetical characters and Roman symbols are designated in the description below for convenience only and do not necessarily imply a particular order of the method steps.
The present invention has been described with a certain degree of particularity, but those versed in the art will readily appreciate that various alterations and modifications may be carried out without departing from the scope of the following Claims. Thus, by way of example, whereas, typically, the organized document subset is determined by defining one (or more) of the concepts in the hierarchy as an "organized by" concept, thereby rendering the subset of documents associated therewith the "organized document subset", this is not necessarily always the case. Thus, according to a more generalized embodiment, any determination of a subset of documents (organized document subset) by utilizing the so displayed hierarchy (i.e. implemented using information derived from the so displayed hierarchy) is embraced by the invention.

Claims

CLAIMS:
1. A method for dynamically presenting set of documents to users, comprising:
(a) providing a predetermined hierarchy of indexing concepts;
(b) providing a set of documents;
(c) providing hierarchical display of the indexing concepts such that each indexing concept in the hierarchy is associated with a sub-set of documents from among said set of documents;
(d) applying steps that include the following (i) to (iii), as many times as required:
(i) determining a subset of documents by utilizing the hierarchical display, thereby rendering it organized document subset;
(ii) defining at least one indexing concept in the hierarchical display so as to constitute a respective organizing concept; and
(iii) providing at least one organizing hierarchical display of indexing concepts, wherein the root of said at least one organizing hierarchical display being the respective at least one organizing concept; each concept in said at least one organizing hierarchical display is associated with a respective subset of documents, from among said organized document subset.
2. The method of Claim 1, wherein said "organizing" concept being the root concept of said hierarchical display.
3. The method according to claim 9, wherein said "organized by" concept is the concept on which the hierarchical display is focused at a given time.
4. The method according to any one of the preceding Claims, wherein said steps (d)(i) and (d)(ii) are obtained by activating a single command.
5. The method according to any one of the preceding Claims, wherein said hierarchical display is in a form of tree.
6. The method according to any one of Claims 1 to 4, wherein said hierarchical display is in a form of a chart.
7. The method according to Claim 6, wherein said chart and tree representations are interchangeable whilst maintaining the position in the hierarchical display.
8. The method according to any one of the preceding Claims, wherein concepts having no documents associated therewith are not displayed.
9. The method according to any one of the preceding Claims, wherein said step (d)(i) includes: defining at least one indexing concept in the hierarchical display so as to constitute a respective "organized by" concept; the documents associated with said organized concept constitute organized document subset.
10. The method according to any one of the preceding Claims, further including "others" concept in at least one position in said hierarchical display.
11. The method according to any one of the preceding Claims, further comprising the step of: displaying at least one desired document or portion thereof from among the documents.
12. The method according to Claim 11, further comprising the step of: displaying said document with emphasis on important triggering terms that correspond to default focus indexing concepts.
13. The method according to Claim 12, further comprising the step of: obtaining a summary based on said important triggering terms.
14. The method according to Claim 12, wherein said emphasis being highlighting the important triggering terms in a predetermined color.
15. The method according to any one of the preceding Claims, further comprising the step of applying a filtering criterion in said step (c) in order to determine the concepts that will be in said hierarchical display.
16. The method according to any one of the preceding Claims, further comprising the step of applying a filtering criterion in said step (d)(iii) in order to determine the concepts that will be in said organizing hierarchical display.
17. The method according to any one of the preceding Claims, comprising the following preliminary step of: applying a search query to a search engine and obtaining as a result said set of documents, stipulated in said step (b).
18. The method according to Claim 17, further comprising the step of displaying said set of documents in a displaying format of said search engine.
19. A method for presenting set of documents to users comprising
(a) providing indexing concepts and a set of documents; the set of documents are associated with concepts in accordance with triggering terms in said documents;
(b) selecting a document from said set;
(c) selecting at least one concept associated with said document;
(d) quantifying the importance of the triggering terms of said at least one concept and in response thereto, determining the important triggering terms; and
(e) emphasizing the important triggering terms that correspond to said at least one concept.
20. The method according to Claim 19, wherein said emphasis being highlighting the important triggering terms in a color that corresponds to respective indexing concept.
21. The method according to Claim 19, further comprising the step of: obtaining a summary based on said important triggering terms.
22. A method for presenting set of documents to users comprising
(a) providing indexing concepts and a set of documents; the set of documents are associated with concepts in accordance with triggering terms in said documents;
(b) selecting a document from said set;
(c) selecting at least one concept associated with said document;
(d) quantifying the importance of the triggering terms of said at least one concept and in response thereto, determining the important triggering terms; and
(e) obtaining a summary based on said important triggering terms.
23. The method according to Claim 22, wherein said quantifying step renders all the triggering terms, as important triggering terms.
24. A method for dynamically presenting set of documents to users, comprising
(a) providing a predetermined hierarchy of indexing concepts;
(b) providing a set of documents;
(c) providing hierarchical display of the indexing concepts; the indexing concepts are associated with the set of documents;
(d) determining a subset of documents by utilizing the hierarchical display, thereby rendering it organized document subset;
(e) defining at least one indexing concept in the hierarchical display so as to constitute a respective "organizing" concept; and
(f) providing at least one organizing hierarchical display of indexing concepts, wherein the root of said at least one organizing hierarchical display being the respective at least one organizing concept, wherein concepts in said organizing hierarchical display are associated with the organized document subset;
(g) repeating steps (d) to (f), as many times as required.
25. The method according to Claim 24, wherein said step (d) includes: defining at least one indexing concept in the hierarchy so as to constitute a respective "organized by" concept; the documents associated with said organized concept constitute organized document subset.
26. The method according to Claim 25, comprising the following preliminary step of: applying a search query to a search engine and obtaining as a result said set of documents, stipulated in said step (b).
27. The method according to Claim 26, further comprising the step of displaying said set of documents in a displaying format of said search engine.
28. A system that includes a processor associated with a memory and display for dynamically presenting set of documents to users, comprising:
(a) the memory is configured to store a predetermined hierarchy of indexing concepts;
(b) the memory is configured to store a set of documents;
(c) the processor is configured to provide hierarchical display of the indexing concepts such that each indexing concept in the hierarchy is associated with a sub-set of documents from among said set of documents;
(d) the processor is configured to apply steps that include the following (i) to (iii), as many times as required:
(i) determining a subset of documents by utilizing the hierarchy of display, thereby rendering it organized document subset;
(ii) defining at least one indexing concept in the hierarchy of display so as to constitute a respective "organizing" concept; and
(iii) providing at least one organizing hierarchical display of indexing concepts, wherein the root of said at least one organizing hierarchical display being the respective at least one organizing concept; each concept in said at least one organizing hierarchical display is associated with a respective subset of documents, from among said organized document subset.
29. A system that includes a processor associated with a memory and display for presenting a set of documents to users comprising:
(a) the memory is configured to store indexing concepts and a set of documents; the set of documents are associated with concepts in accordance with triggering terms in said documents;
(b) the processor is configured to select a document from said set;
(c) the processor is configured to select at least one concept associated with said document;
(d) the processor is configured to quantify the importance of the triggering terms of said at least one concept and in response thereto, determining the important triggering terms; and
(e) the processor is configured to emphasize in the display the important triggering terms that correspond to said at least one concept.
30. A system that includes a processor associated with a memory and display for presenting set of documents to users comprising:
(a) the memory is configured to store indexing concepts and a set of documents; the set of documents are associated with concepts in accordance with triggering terms in said documents;
(b) the processor is configured to select a document from said set;
(c) the processor is configured to select at least one concept associated with said document;
(d) the processor is configured to quantify the importance of the triggering terms of said at least one concept and in response thereto, determining the important triggering terms; and
(e) the processor is configured to obtain a summary in said display based on said important triggering terms.
31. A system that includes a processor associated with a memory and display for dynamically presenting set of documents to users, comprising:
(a) the memory is configured to store a predetermined hierarchy of indexing concepts;
(b) the memory is configured to store a set of documents;
(c) the processor is configured to provide hierarchical display of the indexing concepts; the indexing concepts are associated with the set of documents; the processor is configured to apply the following steps (d) to (f) as many times as required:
(d) determining a subset of documents by utilizing the hierarchical display, thereby rendering it organized document subset;
(e) defining at least one indexing concept in the hierarchical display so as to constitute a respective "organizing" concept; and
(f) providing at least one organizing hierarchical display of indexing concepts, wherein the root of said at least one organizing hierarchical display being the respective at least one organizing concept, wherein concepts in said organizing hierarchical display are associated with the organized document subset.
EP00907906A 1999-02-25 2000-02-25 Method and apparatus for dynamically displaying a set of documents organized by a hierarchy of indexing concepts Withdrawn EP1155377A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12159699P 1999-02-25 1999-02-25
US121596P 1999-02-25
PCT/IL2000/000117 WO2000051024A1 (en) 1999-02-25 2000-02-25 Method and apparatus for dynamically displaying a set of documents organized by a hierarchy of indexing concepts

Publications (1)

Publication Number Publication Date
EP1155377A1 (en) 2001-11-21

Family

ID=22397683

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00907906A Withdrawn EP1155377A1 (en) 1999-02-25 2000-02-25 Method and apparatus for dynamically displaying a set of documents organized by a hierarchy of indexing concepts

Country Status (5)

Country Link
EP (1) EP1155377A1 (en)
AU (1) AU2936600A (en)
CA (1) CA2371244A1 (en)
IL (1) IL145049A0 (en)
WO (1) WO2000051024A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004501421A (en) * 2000-03-27 2004-01-15 ドキュメンタム,インコーポレイティド Method and apparatus for generating metadata for documents
AU2002210882A1 (en) * 2000-10-17 2002-05-15 Focusengine Software Ltd. Integrating search, classification, scoring and ranking
NO20052215L (en) 2005-05-06 2006-11-07 Fast Search & Transfer Asa Procedure for determining contextual summary information of documents
US20090240687A1 (en) * 2006-07-27 2009-09-24 Thomas Eskebaek Method of Processing a Collection of Document Sources
NO325864B1 (en) 2006-11-07 2008-08-04 Fast Search & Transfer Asa Procedure for calculating summary information and a search engine to support and implement the procedure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0051024A1 *

Also Published As

Publication number Publication date
AU2936600A (en) 2000-09-14
CA2371244A1 (en) 2000-08-31
IL145049A0 (en) 2002-06-30
WO2000051024A1 (en) 2000-08-31

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010822

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

17Q First examination report despatched

Effective date: 20011204

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: LINGOMOTORS, INC.

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20020615