EP1282844A2 - Method for generating content-oriented databases and content files - Google Patents

Method for generating content-oriented databases and content files

Info

Publication number
EP1282844A2
EP1282844A2 EP01923952A EP01923952A EP1282844A2 EP 1282844 A2 EP1282844 A2 EP 1282844A2 EP 01923952 A EP01923952 A EP 01923952A EP 01923952 A EP01923952 A EP 01923952A EP 1282844 A2 EP1282844 A2 EP 1282844A2
Authority
EP
European Patent Office
Prior art keywords
content
knowledge
node
user
relevant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP01923952A
Other languages
English (en)
French (fr)
Other versions
EP1282844A4 (de
Inventor
Irit Haviv-Segal
Amir Viner
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
E-Base Ltd
Original Assignee
E-Base Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by E-Base Ltd
Publication of EP1282844A2
Publication of EP1282844A4

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data

Definitions

  • the present invention offers a new approach to knowledge management and the reorganization of professional electronic databases.
  • US patent 692,181 describes a system and method for generating reports from a computer database.
  • This invention enables the user to make decisions, without requiring the user to understand or interpret data itself.
  • This invention includes a method of creating data types and data relationships within a database, for generating reports for users, that includes the steps of: organizing the data within the database into columns of tables, providing a computer coupled to the database that executes an application program that generates the report, recording a business concept by the application program, recording an attribute associated with the business concept by the application program, displaying a list of the columns of tables in the database by the computer, recording a mapping of the attribute to one of the columns in the list, displaying a list of business indicators by the computer, recording a mapping of one of the business indicators to the column, joining the attribute table with the business indicator table so that the application program can use the additional table to create the report.
  • This system does not extract the important information from the different sources but rather gathers them together according to a specific terminology inserted by the user.
  • US patent 5,768,578 describes an improved information retrieval system user interface for retrieving information from a plurality of sources and for storing information source descriptions in a knowledge base.
  • the user interface includes a hypertext browser and a knowledge base browser/ editor.
  • the hypertext browser allows a user to browse an unstructured information space through the use of interactive hypertext links.
  • the knowledge base browser/ editor displays a directed graph representing a generalization taxonomy of the knowledge base, with the nodes representing concepts and edges representing relationships between concepts.
  • the system allows users to store information source descriptions in the knowledge base via graphical pointing means. By dragging an iconic representation of an information source from the hypertext browser to a node in the directed graph, the system will store an information source description object in the knowledge base.
  • the knowledge base browser/ editor is also used to browse the information source descriptions previously stored in the knowledge base.
  • the result of such browsing is an interactive list of information source descriptions which may be used to retrieve documents into the hypertext browser.
  • the system also allows for querying a structured information source and using query results to focus the hypertext browser on the most relevant unstructured data sources.
  • the new invention is aimed at integrating all four of them creating a combined solution.
  • the four markets are:
  • Forrester www.forrester.com
  • Content sites: those that use information and entertainment to attract or retain an audience in order to sell advertising or subscriptions.
  • the market's leading players are:
  • FIGURE 1 shows an example of a search request using a content specific database from Lexis (www.Lexis.com). Lexis is an example of a popular existing search tool that uses professional on-line databases.
  • Knowledge management platforms: The topic of knowledge management encompasses a myriad of concepts and applications having to do with the purposeful generation, diffusion, and application of knowledge towards fulfilling an organization's objectives.
  • The market's leading players are: www.Microsoft.com, Lotus Notes (http://www.lotus.com/home.nsf/welcome/km), www.kmsoftware.com, www.Adexperts.com, www.inova.com, www.equifax.com.
  • Current knowledge management platforms, such as those listed above, are intended to supply users with an integrated platform to organize their database in order to efficiently extract information. No known system represents an integrated solution that combines the technology with the specific terminology of a professional field. Therefore no system can slice down actual content from a textual source and automatically extract relevant pieces of information.
  • Although "search engine" is really a general class of programs, the term is often used to specifically describe systems like Alta Vista (www.altavista.com) and Excite (www.excite.com) that enable users to search for documents on the World Wide Web and USENET newsgroups.
  • a search engine works by sending out a spider (an intelligent software agent, or program, that searches for information on the World Wide Web by locating new documents and new sites by following hypertext links from server to server) to fetch as many documents as possible.
  • Another program called an indexer
  • Each search engine uses a proprietary algorithm to create its indices such that, ideally, only meaningful results are returned for each query.
  • The market's leading players are: Zapper (www.zapper.com), Copernic (www.copernic.com), Google (www.google.com) and Alta Vista (www.altavista.com).
  • These search engines and other smart engines are constantly improving their ability to index online sites and utilize sophisticated spiders.
  • Google, for example, highlights search phrases within the search results page.
  • Zapper can "understand" the contextual environment of the terms from within the paragraph they were invoked from.
  • Copernic uses leading engines to aggregate all of their search results on one screen.
  • Octopus, for example, clips relevant data and content from various Web sites and pulls it all together in one dynamic browser page, called a "View."
  • Correlate enables a user to create visual Knowledge Maps by dragging and dropping MS-Office documents, emails, web content and other data.
  • the above tools significantly limit user research, owing in general to several setbacks.
  • The first setback is that navigation or information research is generally based on links that are scattered within web sites. The main trigger for clicking a link is to advance to a different location that might enfold another aspect of the desired information. This kind of navigation is completely unstructured and relies heavily on intuition and luck.
  • The second setback is owing to the employment of search engines that enable the user to articulate a desired phrase and then check sequentially each one of the search results. Users are usually overwhelmed with an enormous number of results following a query, which they must filter and screen manually in order to retrieve the required pieces of information. This procedure often leaves the user empty handed, frustrated and exhausted.
  • The present invention solves many of the above-mentioned problems and overcomes many of the above-mentioned limitations. This is achieved by providing a user-friendly platform for an automated construction of content-oriented databases, where knowledge is organized according to content, rather than according to its initial sources.
  • the invention includes an innovative platform for an automated reorganization of knowledge, where the system automatically filters, slices, maps and links fragments of the initial files onto a modular structure of knowledge.
  • the present invention organizes knowledge in a context driven way, so that it may be integrated within the corpus of any different professional field.
  • the present invention organizes only relevant paragraphs from different textual sources, according to sophisticated linguistic rules. This innovative procedure dramatically improves the quality and relevance of the paragraphs that are retrieved and decreases the initial amount of text the user had to go through.
  • the present invention offers substantial benefits over the traditional keyword based search procedures.
  • the new invention presents a platform for an automated construction of content-oriented databases, where knowledge is organized according to content, rather than according to its initial sources.
  • The invention includes an innovative platform for an automated reorganization of knowledge, where the system automatically filters, slices, maps and links fragments of the initial files onto a modular structure of knowledge. Eventually, the system virtually substitutes the initial source files by content-files, where all of the relevant fragments from all relevant source-files are automatically integrated and hung onto the relevant node of a modular structure of knowledge.
  • The new invention offers to substitute the concept of "search" by the concept of "mapping," such that instead of running Boolean searches, the user is guided to the relevant pieces of information via a map of links, which reflects the modular structure of the relevant field of knowledge. Because each node is linked to a content-file, the user is further guided to relevant fragments of information, with no need to engage in time-consuming, costly search processes.
  • The new platform presents a novel integration of the following new concepts: 1. Let's go backwards - While in current databases, the user proceeds from huge databases to concise pieces of information, the present invention guides the user from concise knowledge to more elaborate information.
  • a modular structure of knowledge (forms the basis of the database) - Unlike the content-neutral technological platforms for knowledge management, the present invention reorganizes the database onto a modular structure that reflects knowledge.
  • The modular structure of knowledge contains all the ideas in a specific field of knowledge and is arranged according to a hierarchy where the top nodes are more general and the lower ones are more specific. This structure is initially created by an expert in a particular field, according to industry standards.
  • Content files - Instead of constructing the database according to source-files, the present invention creates content-files.
  • The content-file is a "multiple windows" window which integrates all of the relevant fragments of the source-files within one virtual file. Accordingly, all the paragraphs in the content file deal with the same idea, and are linked back to the initial source file.
  • the database is content-oriented - Instead of the current content-neutral construction of databases, the present invention reconstructs a content-oriented database, where the information fragments are allocated according to the modular structure of knowledge.
  • The database is made up of links - While in current knowledge tools, the textual sources are generally part of the database, the database of the present invention contains only the links to textual sources. This feature keeps the database light in size and saves CPU processing time.
  • Virtual retrieval - Instead of overloading the system with real time processes for each search query, the present invention achieves virtual retrieval of knowledge. This is achieved by doing "pre-analysis" of texts before they are uploaded to the system. As the user activates a node, the system just has to retrieve all the relevant paragraphs that are allocated to the node using pointers. This procedure creates a virtual content file that is instantly retrieved and is constantly updated.
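The pointer-based "virtual retrieval" idea described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the class names `Pointer` and `ContentDatabase`, the method names, and the in-memory dictionaries are all assumptions made for illustration. The sketch keeps the full-text sources in one store, keeps only pointers per node in another, and assembles the content file on demand.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Pointer:
    """Hypothetical pointer: a source document and a paragraph position in it."""
    source_id: str
    paragraph_index: int


@dataclass
class ContentDatabase:
    # "Physical" store of full-text sources: source_id -> list of paragraphs.
    sources: dict = field(default_factory=dict)
    # "Logical" store: node name -> list of pointers (no text is copied here).
    node_pointers: dict = field(default_factory=dict)

    def add_source(self, source_id, paragraphs, tags):
        """Pre-analysis step: store the source and record pointers for tagged paragraphs.

        `tags` maps a paragraph index to the node names that paragraph was tagged with.
        """
        self.sources[source_id] = list(paragraphs)
        for index, nodes in tags.items():
            for node in nodes:
                self.node_pointers.setdefault(node, []).append(Pointer(source_id, index))

    def content_file(self, node):
        """Assemble the virtual content file for a node by following its pointers."""
        return [
            (p.source_id, self.sources[p.source_id][p.paragraph_index])
            for p in self.node_pointers.get(node, [])
        ]
```

Because `content_file` is rebuilt from the current pointers on each call, a source added later shows up automatically the next time the node is activated, which mirrors the "constantly updated" property claimed in the text.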
  • the present invention enables an automated process that includes: filtering and mapping of fragments of knowledge onto the modular structure.
  • the invention further enables the automatic creation of an objective modular structure of knowledge that is based on the structure that was found in relevant sources.
  • An additional embodiment of the present invention enables integration of the present invention within old information searching formats, such that the searcher, when using conventional search tools, is instantly directed to the relevant content file.
  • a solution is provided for researchers, wherein prior classifications of experts in the field are utilized, in order to enable professional-level searches by non-experts.
  • A further embodiment of the present invention is an application for content providers, enabling automated ideas aggregation, fragmentation and organization.
  • A further embodiment of the present invention is an application for enterprise information portals, wherein personal and public content is integrated into one knowledge base, such that a personalized enterprise portal is created that replaces the worker's desktop, and allows access to the enterprise and personal knowledge, online and offline.
  • FIGURE 1 shows an example of a current search method using a content specific database.
  • FIGURE 2 clarifies the structure and role of the outlines, as seen in a content file, according to the present invention.
  • FIGURE 3 illustrates a user navigation session, or the process whereby the user navigates through various outlines, until arriving at the desired content file.
  • FIGURE 4 illustrates a multiple windows window according to the present invention.
  • FIGURES 5A and 5B illustrate the system architecture and workflow, according to the present invention.
  • FIGURE 6 illustrates a visual presentation of a node and the idea it conveys.
  • FIGURE 7 illustrates examples of the table structure within the present invention.
  • FIGURES 8.1 - 8.4 demonstrate the filtering and mapping procedures upon one modular structure of knowledge.
  • FIGURE 9 summarizes the novel elements in the new platform of the present invention.
  • FIGURE 10 describes the novelties in the various system elements.
  • the present invention relates to a system and method for enhancing both the retrieval and the acquisition of knowledge from electronic databases, incorporating content expertise, linguistics, and search technology.
  • the present invention presents a platform for an automated construction of content-oriented databases, where knowledge is organized according to content, rather than according to its initial sources.
  • The invention includes an innovative platform for an automated reorganization of knowledge, where the system automatically filters, slices, maps and links fragments of the initial files onto a modular structure of knowledge.
  • The present invention provides for an innovative knowledge management application, according to the following features: After an electronic database is organized according to the concept and application of the present invention, a user is able to retrieve the relevant pieces of information in just a few clicks. The system does not settle for guiding the user to the relevant files, but further extracts the relevant fragments from within each source-file. All fragments that are relevant to one specified subject are integrated within one virtual content-file. Most importantly, the new concept is designed in a way that makes automation of knowledge management possible. Accordingly, the present invention presents a system for knowledge management that automatically filters, maps and retrieves fragments of information according to the user's needs.
  • the present invention fosters the automated attachment of all relevant paragraphs from various relevant sources to a modular structure of knowledge.
  • This "modular structure" refers to a hierarchy-based index that covers all the ideas in a content specific field of knowledge.
  • The structure is built so that the upper nodes (which may be described as subjects or information categories) are more general and the lower ones convey specific ideas.
  • the invention achieves intuitive access to concise content.
  • This new format further overcomes the setbacks of current navigation methods, as described above, by automatically mapping databases and guiding the user in a tailored path to concise content within just a few clicks.
  • The present invention's innovative approach to knowledge is content-oriented, rather than source-oriented. Instead of overwhelming the user with huge amounts of "full text sources" as a result of a search process undertaken, the present invention supplies concise content with an option to go back to the full text if needed.
  • Instead of making the user go through a collection of "search results", the user is provided with a smart collection of paragraphs, or actual fragments of content, that convey the solution to a desired question.
  • The desired result of a search is seen as a combination of different angles that explain the same issue. Finally, if the user chooses to elaborate on a specific angle, the platform can facilitate a simple connection back to the full text. The present invention thereby facilitates direct access to paragraphs rather than files.
  • Prior art tools for information research typically provide access to a wide information base, content-neutral search engines, and arbitrary categorical organization of the database. This in turn requires the user to run the content-dependent searches, overview the files detected by the search engine, filter, screen, map and patch fragments of information manually from the initial "full text" sources, and digest the relevant pieces of information.
  • The present invention automatically filters, slices, maps and links fragments of every file onto a modular structure of knowledge; dynamically creates a modular structure to guide the user to the desired concise content; virtually creates content files that integrate all of the relevant fragments of the relevant source-files within one editable virtual file; and interacts with the user in order to deliver a comprehensive tailored solution on one screen, using three complementary cognitive modes of presentation. Consequently, according to the present invention, a user is guided through the platform's modular structure and receives the relevant pieces of information that reflect knowledge, within just a few clicks. The user can optionally jump to the "full text" mode of presentation that is linked to every fragment, and can create and save his/her own personal modular structure for a research project.
  • the present invention is attuned to the needs of users to define independently the exact search phrase.
  • the present invention provides a renewed concept of a "search engine" that includes an interactive interface that is responsive to the user's requests.
  • the system of the present invention digests the various meanings that emerge from the search phrase. This means that the system can locate the various routes that end with a node that contains the search phrase.
  • the user is then invited to choose among several different contexts that might match his/her specific point of reference.
  • the user is then transferred to the relevant node in the modular structure and is presented with a content file that deals with the desired search term within the correct contextual reference.
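The context-selection step described above (locating the various routes that end with a node containing the search phrase, then letting the user pick one) can be sketched as a simple tree walk. This is only an illustration under assumptions: the nested-dictionary layout, the function name `find_routes`, and matching on the node name alone (rather than on a node's full word group) are hypothetical choices, not details given in the source.

```python
def find_routes(tree, phrase, path=()):
    """Return every root-to-node route whose final node name contains the phrase.

    `tree` is a nested dict: node name -> dict of child nodes (assumed layout).
    Each route is a tuple of node names, so the routes can be shown to the
    user as alternative contexts for the same search phrase.
    """
    routes = []
    for name, children in tree.items():
        route = path + (name,)
        if phrase.lower() in name.lower():
            routes.append(route)
        # Keep descending: deeper nodes may also contain the phrase.
        routes.extend(find_routes(children, phrase, route))
    return routes
```

With a tree in which "liability" appears under both "shareholders" and "directors", the function returns two routes, and the interface would ask the user which of the two contexts is meant before opening the corresponding content file.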
  • the present invention thereby delivers an interactive interface that is responsive to the user's requests and redefines the traditional
  • the present invention allows the user a simple yet highly effective way of gathering information from a substantial quantity of electronic sources.
  • The system of the present invention facilitates the user's access to the relevant pieces of information and concentrates the concise content on one computer screen.
  • the interface is currently designed with ASP technologies using compiled components (COM).
  • DNA concept. All access to the database is achieved by using pointers, without the need to scan the whole database. For this reason, the results appear on the user screen substantially faster than those attained using conventional processing of search queries.
  • The user can navigate in the modular structure using a set of links. The links direct the user to the required node. Once the user reaches this node, all the paragraphs that appear in the table are virtually presented, meaning that the actual content of the relevant paragraphs is presented, extracted from their actual sources. A click on one of the paragraphs connects the user to the relevant source in the "source table".
  • the present invention reorganizes the database onto a modular structure that reflects knowledge.
  • A user begins by following a map of links, presented on a floating window. The tour through the links does not require any expertise, as the user is guided from more general subjects to the more detailed ones.
  • The map of links mirrors the modular structure of the knowledge base, and is presented within a "knowledge tree."
  • A "knowledge tree" refers to the directory structure that is hierarchical, reflecting at one time potentially multiple information options on multiple levels.
  • each node is accompanied by short outlines.
  • The outlines are usually summaries written by experts at the higher and more general levels of the modular structure, and are later taken from the content file as the paragraph that is highly representative of the node's idea. Outlines are used to guide the user in choosing the correct node in the following stage.
  • Figure 2 illustrates the structure and role of the outlines: Assume, for example, a layman seeks materials on a legal subject, in the field of corporate law. In following the map of links 20, the user will begin by double clicking the word "corporations" on the floating window. In reaction, the system introduces the four main subjects, or nodes, of corporate law.
  • the accompanying outlines 21 briefly explain the content of each subject.
  • The outlines guide the user in choosing among the four nodes. By double clicking the desired node, s/he will proceed to the next stage on the knowledge tree, wherein more specific nodes are shown, with their accompanying outlines. In this manner, the system enables users who are not familiar with the professional terminology to get access to the relevant sources.
  • Figure 3 describes the "guided tour", or an example of a user navigation session, in which the user 31 navigates through various outlines 30, until arriving at the most relevant outline.
  • Each outline contains basic paragraphs that describe the current node in which the user is stationed. The paragraphs describe each branch of the knowledge tree, so that the user can see what the various nodes are about, and thereby navigate to links that are connected or flow from the current node, according to the criterion of the user. This provides the user with a roadmap to know where to navigate.
  • This tour enables the user to proceed from the initial node 32, which in the example is the general topic of "corporations", to the following nodes 33, 36-39, until arriving at the desired content file.
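The guided tour through nodes and outlines can be pictured with a small sketch. The `Node` class, its fields, and the `guided_tour` function are hypothetical names chosen for illustration; the source describes the behavior (outlines shown at each stage, the user choosing the next node) but not a data model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    outline: str  # short text that explains what the node is about
    children: list = field(default_factory=list)

    def child(self, name):
        """Return the direct child with the given name."""
        for candidate in self.children:
            if candidate.name == name:
                return candidate
        raise KeyError(name)


def guided_tour(root, choices):
    """Follow the user's choices from the root node.

    Returns the outlines shown at each stage (child name -> outline) together
    with the node the user finally arrives at.
    """
    current = root
    shown = []
    for choice in choices:
        # At each stage the user sees the outlines of the current node's children.
        shown.append({c.name: c.outline for c in current.children})
        current = current.child(choice)
    return shown, current
```

In this sketch the user never types a query: each step only requires picking one of the child nodes after reading its outline, which is the sense in which the tour needs no prior command of the professional terminology.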
  • This window is an aggregated window, further subdivided into a plurality of separate windows, each able to be controlled by the user.
  • This multiple-windows window can be seen in Figure 4.
  • The system of the present invention smartly integrates all of the relevant fragments of all relevant files that deal with the specified node and convey its meaning. In this way the user is able to simultaneously gain access to multiple highly relevant extracts.
  • Every sub window in the content file reflects a paragraph that is tagged with a pointer from the original source. The paragraph conveys the node's idea.
  • a link back to the "full text" source is assigned to every sub window.
  • Every sub-window's title is a reference to the source file so that the user can easily cite it.
  • Activating all the pointers that lead from a desired node to tagged paragraphs from relevant sources creates the content file.
  • the content file relies on pre-analysis of the texts - this means that every new source that is added to the content oriented database is first tagged with the ideas it conveys. Every one of the source's paragraphs is scanned for the ideas it conveys. The relevant paragraphs are then attached with pointers to the relevant node.
  • The content file is virtual - This means that when the content file is activated, all the pointers that currently lead from the node to the relevant paragraphs will be gathered in a multiple-windows window. The activated content file is therefore always updated with all the latest content that was added to the content sources.
  • The "view source" button enables one-click access to the full text of each source file.
  • Figure 4 illustrates an example of such a situation, wherein three internal windows can be viewed in the large left hand block. The visual tree can be seen in the right hand block.
  • While source-files are organized according to their initial sources, content-files are organized according to content and meaning. Unlike current navigation and research systems, which confront the user with source-files, the system of the present invention enables direct access to the content files, without requiring a prior viewing of the source file.
  • The content-file provides a powerful way of presenting concise content:
  • No need to engage in search - The modular structure of knowledge guides the user to the desired content file (using the outlines) in a way that:
    o The user does not have to master the relevant terminologies.
    o The user does not have to use Boolean logic.
    o The user does not get an overflow of search results.
  • The content file delivers in one aggregated file the end results of a research work.
  • The paragraphs that users get in the content file reflect a comprehensive and concise knowledge about one professional idea within just a few clicks.
  • the present invention gathers relevant textual sources from dedicated databases, according to particular subjects as required.
  • This process employs smart searches executed manually by experts. The number of searches is relatively small, as there are relatively few higher nodes.
  • The higher nodes' structure follows the main classification of the commonly used professional literature 502. This procedure ensures that the upper structure is known and familiar to the user. This process also includes categorization of the data according to the primary levels of a knowledge tree.
  • Modular Structure Creation 505 - The expert goes back to the professional literature 502 and, according to the order of appearance in these texts, constructs a modular structure of knowledge 506 for the particular subject being researched. This means, for example, that if term A precedes term B a sufficient number of times within the textual sources, it will be positioned above term B in the modular structure of knowledge. Furthermore, groups of terms that convey the same professional meaning are grouped into word groups and are allocated to the same node. The modular structure is built in a hierarchical way, such that every node has only one father. This process is based on the inner structure of texts, as determined by an expert, or as compiled automatically according to the inherent structure of language, as described below.
  • 4. Filtering - The filtering procedure 508 is automatic and, as can be seen in figure 5B, relies on the professional terminology 507 as well as on the initial database
  • each paragraph within every source is scanned by a filtering engine.
  • The scanning procedure checks each paragraph for the existence of professional terminology within it. If the paragraph does not include any professional terms it will be filtered out of the system. This means that such a paragraph will remain in the initial database 501, alternatively referred to as the Documents table (70 in Figure 7), where it will be untagged, and therefore not linked to any nodes or other tables.
  • The paragraphs which have professional terms are tagged within the initial database 501, and the links to these paragraphs are stored in the paragraphs table 72.
  • This paragraphs table 72 therefore does not store actual content from the source documents, but stores only links or pointers to the relevant paragraphs in the original source documents.
  • The paragraphs table 72 is therefore extremely light and fast, and is able to instruct the content oriented database 701 to compile the relevant content on demand.
  • The documents table 70 is equivalent to the initial database, storing the original full text documents for possible future reference. This ensures that the filtered out paragraphs are mapped by the system, even though logically they are not part of the knowledge tree and outlines. Other criteria may be used, such as excluding paragraphs that are considered short (for example, if they are less than three lines long). The rationale behind these rules is that if a paragraph is less than three lines long it is not likely to convey a professional idea.
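The filtering rules described above (keep a paragraph only if it contains professional terminology, and optionally drop paragraphs shorter than three lines) can be sketched as follows. The function name `filter_paragraphs`, the plain substring matching, and counting lines with `splitlines()` are assumptions for illustration; the source does not specify how terms are matched or how lines are counted.

```python
def filter_paragraphs(paragraphs, terms, min_lines=3):
    """Split paragraph positions into (kept, filtered_out).

    Only positions are returned, not text: in the spirit of the paragraphs
    table, the kept list acts as a set of pointers into the stored source,
    while filtered-out paragraphs simply stay untagged in the documents table.
    """
    lowered_terms = [term.lower() for term in terms]
    kept, filtered_out = [], []
    for index, text in enumerate(paragraphs):
        has_term = any(term in text.lower() for term in lowered_terms)
        long_enough = len(text.splitlines()) >= min_lines
        if has_term and long_enough:
            kept.append(index)
        else:
            filtered_out.append(index)
    return kept, filtered_out
```

Setting `min_lines=1` disables the length rule, which corresponds to using the terminology check alone.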
  • node A and node B that have the same term within their word groups
  • a paragraph that includes this term would be assigned to the relevant node according to context. This is done by examining the source of the paragraph for indications of the existence of one of the fathers of the nodes. If the father of node
  • The content file 513 is the collection of all the paragraphs that were allocated to the node during the mapping phase.
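The context-based allocation used in the mapping phase, where two nodes share a term and the paragraph's source is examined for signs of one of the nodes' fathers, can be sketched roughly as below. The source text describing this rule is incomplete, so the details here are assumptions: the function name `assign_node`, the substring test for a father's name, and returning `None` when zero or several fathers are found are all illustrative choices.

```python
def assign_node(source_text, candidates):
    """Pick the candidate node whose father is mentioned in the paragraph's source.

    `candidates` maps a node identifier to the name of its father node.
    Returns None when no father, or more than one father, is mentioned,
    because the source does not say how such cases are resolved.
    """
    text = source_text.lower()
    matches = [node for node, father in candidates.items() if father.lower() in text]
    return matches[0] if len(matches) == 1 else None
```

For example, if "liability" belongs to the word groups of a node under "shareholders" and of a node under "directors", a paragraph taken from a source that discusses only directors would be allocated to the node under "directors".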
  • the content file 513 as described above, is a new mode of fragmental presentation, which enables the user to get acquainted with a variety of fragments that deal with the same professional idea.
  • a link back to the source is attached to every paragraph.
  • the paragraphs are organized in a format of a "multiple windows" window, which allows the user to navigate each paragraph separately, as can be seen in phase 513.
  • The method of creating the content files based on the modular structure of knowledge is as follows: 1. From Initial Files to the Structure of Knowledge:
  • In order to enable the user immediate access to content-files, the system must substitute the initial data sources by a content-oriented database, where the allocation of texts to units is determined by content (e.g., shareholders' liability for corporate actions), and not by source (e.g., The Delaware Code).
  • Every source is fragmented into paragraphs that convey meanings and ideas. Only those fragments that convey the ideas are mapped according to the suitable node in the modular structure of knowledge. Other paragraphs are not mapped.
  • This method replaces the current system of classifying each "full text" with the relevant category such as topic, place of issuing, origin etc.
  • the system must also preserve the initial allocation to source files, in order to enable users access to the "full text" (i.e., the "view source" button).
  • the new platform of the present invention achieves these goals by splitting between the initial database 501, containing the full data sources, and a new, content oriented database, which is constructed from the set of links according to the modular structure of knowledge.
  • This separation between the physical database (full text documents) 501 and the logical database (content oriented database) has the following advantages:
  • the content oriented database can be implemented on any given "full text" database and can reorganize it according to the modular structure of knowledge. The only requirement is that the initial database contains files that deal with the content oriented field of knowledge.
  • Every paragraph contains an average of 2k of information, so the time it takes to upload it is significantly shorter, compared to a full text that contains an average of 200k of data.
  • the database is extremely light and therefore enables extremely quick retrieval of content files and search results.
  • Node-to-Node - Links that form the structure of knowledge by means of father and son (hierarchical) relations.
  • the fixed window of the multiple-windows window screen may contain the visual presentations, which simulate and clarify the linguistic ideas.
  • the visual presentations are static or dynamic illustrations that vividly convey the idea of the node. These visual presentations might include a specific use of the professional idea within the text or the general idea.
  • Figure 6 describes a professional idea from the legal field of knowledge in a specific environment.
  • the visual presentation presents the legal idea of controlling shareowner. This is a template that will be later filled with data. At present this illustration can describe a legal situation.
  • the presentation includes the name of the court opinion 61, the controlling shares that are held by people 62, and the controlling shares of institutions or organizations 63. Furthermore, the company that is being dealt with is illustrated with reference to 64. This illustration can help the user understand the concepts that each full text describes.
  • Figure 7 represents an example of tables that constitute the content oriented database of the present invention. According to Figure 7, each rectangle represents a table within the new content oriented database.
  • initial documents are stored within the documents table 70. Every document is filtered in order to detect relevant paragraphs. These paragraphs contain parts of the original, or source, document and convey relevant content (depending on the field of knowledge). For example, within the legal profession, relevant paragraphs from court opinions would convey the ruling.
  • These paragraphs are tagged 71 within the source file, stored in the documents table 70, and links to these tags are added to the paragraphs table 72. The filtering procedure is described below.
  • the nodes table 76 contains all the ideas within a specific field of knowledge.
  • the table represents a hierarchy of ideas where the initial node is the most general and is linked in a father-son relation to the sub-ideas it conveys, and so on. Every node can be conveyed in a finite number of ways using a finite set of terminologies.
  • the Word Group table 78 attaches to every node (idea) all the relevant terminologies that can sum up to convey the same idea.
  • the Word Group table 78 contains all the similar phrases or synonyms that are attached to the node. In this way, user searches may locate content files that were not directly searched for, based on similarity of context of the searched phrase.
  • Content Table 74 therefore contains links to all the paragraphs that are attached to every node, using the Word Group table 78 in order to detect relevant terminologies within the paragraphs. If a paragraph reference from within the paragraphs table 72 was not assigned to any node via the Content Node table 74, it is passed on to an expert 79.
  • the expert will detect the idea that the paragraph conveys, and he/she will add the appropriate node to the Node table 76, with the appropriate synonyms or phrases added to the Word Group table 78. This procedure will ensure that the next time a paragraph that conveys the same idea is added to the Paragraph table 72, it will find an appropriate node that represents its idea.
  • Any search for terms within the content oriented database does not include the documents table 70, but rather only the Word Group table 78, which contains all the terminology in a content specific field of knowledge. This procedure saves CPU time and allows the distinction of the different ideas that can be conveyed by the same terminology.
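The tables of Figure 7 and the search path through the Word Group table can be sketched with an in-memory SQLite database. The column names, the sample rows and the `search` function are assumptions made for illustration; the description names the tables but does not give their schema, and it mentions MS SQL Server rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents  (doc_id INTEGER PRIMARY KEY, full_text TEXT);
CREATE TABLE paragraphs (par_id INTEGER PRIMARY KEY, doc_id INTEGER, tag TEXT);
CREATE TABLE nodes      (node_id INTEGER PRIMARY KEY, label TEXT, father_id INTEGER);
CREATE TABLE word_group (node_id INTEGER, term TEXT);
CREATE TABLE content    (node_id INTEGER, par_id INTEGER);
""")

# Invented sample rows: one document, one tagged paragraph, one node with two terms.
conn.execute("INSERT INTO documents VALUES (1, 'full text of a court opinion')")
conn.execute("INSERT INTO paragraphs VALUES (10, 1, 'p3')")
conn.execute("INSERT INTO nodes VALUES (100, 'Takeover', NULL)")
conn.executemany(
    "INSERT INTO word_group VALUES (?, ?)",
    [(100, "takeover"), (100, "tender offer")],
)
conn.execute("INSERT INTO content VALUES (100, 10)")


def search(term: str) -> list[tuple[str, int, str]]:
    """Look the term up in word_group only, then follow links to paragraph tags."""
    rows = conn.execute(
        """SELECT n.label, p.doc_id, p.tag
           FROM word_group w
           JOIN nodes n      ON n.node_id = w.node_id
           JOIN content c    ON c.node_id = n.node_id
           JOIN paragraphs p ON p.par_id = c.par_id
           WHERE w.term = ?""",
        (term.lower(),),
    )
    return rows.fetchall()
```

The query never reads `documents.full_text`; it touches only the term, node and link tables, which is the property the description credits with saving CPU time.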
  • the new database is structured upon the set of links, and only contains pointers to the relevant paragraphs of the relevant files of each link.
  • the pointers enable the system to undertake dynamic retrieval of fragments from the initial files, tailored according to the subject matter.
  • the set of links reflects the structure of knowledge, and the pointers reflect the reorganization of initial texts in content-dependent units, wherein the retrieved fragments reflect the actual content.
  • the set of links, together with the pointers and sets of fragments, form the content-oriented database.
  • the Internet invites a reorganization of knowledge which puts the links ahead of the texts, such that the links would determine the access to the texts and not vice versa.
  • the platform of the present invention makes this shift, and constructs its database on sets of links.
  • the fragments that "survived" the filtering are then mapped using the mapping engine on the modular structure according to their content by means of assigning pointers from the nodes to the relevant paragraphs, which it represents.
  • This procedure is based on the modular structure, which conveys all the possible links. Relying on the assumption that knowledge rarely changes, most of the sliced paragraphs find their place on the modular structure according to their meaning. Sometimes a paragraph would convey more than one idea. In this situation it would be linked to more than one node. If the system was not able to find a suitable node for the paragraph, it means that there is a new node in the modular structure. The modular structure of knowledge would then be updated manually, by an expert, according to the nodes' context.
  • the present system is constructed upon the systematic structure of the links, as the set of links mirrors the modular structure of knowledge in each of the specified fields.
  • the present platform enables dynamic retrieval of paragraphs referred to by the links, and thereby further enables the construction of novel combinations of the textual fragments into a content file. This means that all the paragraphs that are linked to a node can easily be retrieved following the user's request. These fragments are taken from the initial texts "as is" and their collection within the content file can provide a comprehensive collection of references on a certain idea.
  • the system can identify the relevant paragraphs by the pre-analysis that the textual sources have gone through upon their arrival to the system.
  • Every new source is filtered by the filtering engine into relevant paragraphs that are automatically linked to the relevant nodes, allowing easy retrieval later on when the node is activated by the user.
  • These fragments refer to actual content extracts taken from source texts.
  • the system can pre-analyze the knowledge base, so that searches are not required to comb actual source documents but rather the content oriented database within the word group table. This procedure saves on CPU time as well and enables immediate retrieval.
  • the database is constructed upon the links; the latter are not apparent to the user, and therefore the desired outcomes are accomplished with no need for intensive labor or elaborate activities of quality assurance.
  • the database of links does not include the texts, but rather, pointers to the relevant fragments in the initial database 501.
  • a pointer is a link from the node to a relevant paragraph.
  • Each node may have many pointers that are linked to several paragraphs. Furthermore, some paragraphs have several pointers from different nodes attached to them.
  • the system can dynamically retrieve the relevant fragments due to the pointers.
  • the following description enumerates the three fundamental elements of knowledge management, as defined by the present invention: 1. Construction of the modular structure of knowledge - The construction is a semi-automated procedure wherein the computer traces the terminology and suggests a formation. The formation is based upon the inner structure of ideas in the texts, as will be described below. This approach ensures that reappearing phenomena are captured, thereby achieving an objective and comprehensive formation of knowledge. A human expert then has to refine the initial structure according to context. 2. Filtering the initial files, such that only the relevant fragments are entered into the mapping system. Every field of knowledge is given different filtering rules according to the content specific needs and interests.
  • Tagging is the process whereby relevant paragraphs are automatically allocated to the relevant nodes within the modular structure of knowledge. This tagging is executed by the mapping engine, during the mapping process. Mapping is the process whereby, according to the allocations of each paragraph, a pointer is assigned from the node to the corresponding paragraph.
  • Clusters of Meaning are accurate guidelines for the automated Mapping:
  • the present invention claims that every professional field consists of a finite number of "terms" or phrases that convey content specific meanings within the field of knowledge.
  • the automated mapping can be guided by the rules governing the appearance of such terms within the texts.
  • A set of content driven synonyms: In order to locate the terms according to their meaning and group them into clusters, there is a need to identify similarity of meaning among the terms. This procedure is undertaken by professional experts, who have the ability to recognize the content-specific meaning of the terms and find different means to articulate them. After the experts recognize the synonyms, the system creates word groups out of them. A label is assigned to every group, capturing the core idea it encompasses.
  • the textual expression tends to proceed from the more general terms and ideas to the more specific and concrete ones. Accordingly, the more general term will always appear within the text before the more concrete and specific term is used.
  • the modular structure of knowledge is grounded within the textual expression itself: in presenting some detailed idea, the author always begins by reference to the more general idea.
  • the more general content-specific term will always appear in the text before the detailed ones.
  • the various terms or ideas may be placed upon each other, in order to represent repeating structures in a text. The sum of all these structures makes up the modular structure of knowledge, which reflects the content specific knowledge.
  • This modular structure is made up of two categories. The initial higher levels are those categorized by subject specific experts. The lower levels are those derived from the inner structure of the text, as described above.
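One naive way to turn the general-before-specific observation into a first suggestion for the lower levels is sketched below. It is only an illustration: the description does not disclose the actual procedure, and the functions `order_by_first_appearance` and `suggest_father_son`, the sample sentence and the chaining heuristic are all assumptions.

```python
def order_by_first_appearance(text: str, terms: list[str]) -> list[str]:
    """Return the terms found in the text, ordered by where each first appears."""
    lowered = text.lower()
    positions = {term: lowered.find(term.lower()) for term in terms}
    found = [term for term in terms if positions[term] >= 0]
    return sorted(found, key=lambda term: positions[term])


def suggest_father_son(text: str, terms: list[str]) -> list[tuple[str, str]]:
    """Pair each term with the term that first appears immediately before it.

    The earlier term is only a candidate father; an expert is expected to
    accept, reject or rearrange every suggested pair.
    """
    ordered = order_by_first_appearance(text, terms)
    return list(zip(ordered, ordered[1:]))


sample = (
    "In a takeover, the bidder may launch a tender offer, "
    "and the target board may respond with a poison pill."
)
suggestions = suggest_father_son(sample, ["poison pill", "takeover", "tender offer"])
```

On the sample sentence the heuristic proposes "takeover" as father of "tender offer", and also chains "poison pill" under "tender offer". The second pair is the kind of suggestion a human expert would be expected to correct during refinement.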
  • the purpose of the filtering tool of the present invention is to filter and thereby limit irrelevant paragraphs from textual sources.
  • the relevance of the paragraphs is content dependent. This is done by allocating textual cues to a filtering algorithm. For every contextual field there exist different contextual cues.
  • the filtering tool tags paragraphs that are not filtered out, from the documents table. These paragraphs are linked to a paragraph table, which is subsequently linked to a relevant nodes content table, according to the word groups of the node.
  • the tables of the present invention contain links to relevant content, and not source data. This substantially speeds up the searching and processing ability of the present invention.
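A minimal sketch of such a filtering step follows, assuming that textual cues can be expressed as regular expressions and that the short-paragraph rule counts non-empty lines. The cue list `CUES`, the threshold name `MIN_LINES` and the sample paragraphs are hypothetical; real cues would be supplied per field of knowledge.

```python
import re

MIN_LINES = 3  # example rule from the description: under three lines is too short

# Hypothetical cues for a legal corpus; real cues would come from an expert.
CUES = [re.compile(r"\bheld\b", re.I), re.compile(r"\bshall\b", re.I)]


def is_relevant(paragraph: str) -> bool:
    """Keep a paragraph only if it is long enough and contains at least one cue."""
    lines = [line for line in paragraph.splitlines() if line.strip()]
    if len(lines) < MIN_LINES:
        return False
    return any(cue.search(paragraph) for cue in CUES)


def filter_paragraphs(paragraphs: list[str]) -> list[int]:
    """Return the indices (tags) of the paragraphs that survive filtering."""
    return [index for index, text in enumerate(paragraphs) if is_relevant(text)]


paragraphs = [
    "Short heading",
    "The court held that the\ncontrolling shareholder owed\na duty to the minority.",
    "Counsel appeared for\nboth parties on the\nfirst day of trial.",
]
surviving = filter_paragraphs(paragraphs)
```

Only indices are returned, in keeping with the idea that the tables hold links to relevant content rather than the content itself.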
  • the purpose of the mapping tool of the present invention is to allocate paragraphs from the "paragraph table" to the modular structure of knowledge.
  • This tool tags every paragraph with indicatory terms that identify several nodes of the modular structure by using several combining devices.
  • a searching mechanism that has a few guiding rules for the identification of indicatory terms within the paragraph. The combination of these rules assures that the paragraph deals with the node's idea.
  • the tables that are used in this section are:
  • for each paragraph in the paragraph table, the system of the present invention examines whether one of the indicatory terms, or a combination of indicatory terms, appears in it. If so, the "inter node container" table adds the paragraph to the appropriate node.
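The check for indicatory terms or combinations of terms might look like the sketch below. It assumes plain lower-cased substring matching and represents each combination as a set of terms that must all be present; `WORD_GROUPS` and `map_paragraph` are hypothetical names, and the guiding rules of the actual searching mechanism are not given in the description.

```python
# Hypothetical word groups: node -> list of term combinations.
# Each combination is a set of indicatory terms that must all appear together.
WORD_GROUPS = {
    "Takeover": [{"takeover"}, {"tender offer"}],
    "Controlling shareholder": [{"controlling", "shareholder"}],
}


def map_paragraph(paragraph: str) -> list[str]:
    """Return every node for which one of its term combinations fully appears."""
    text = paragraph.lower()
    return [
        node
        for node, combinations in WORD_GROUPS.items()
        if any(all(term in text for term in combo) for combo in combinations)
    ]


matches = map_paragraph("The controlling shareholder launched a tender offer.")
```

A paragraph can match more than one node, as in the example, and an empty result marks a paragraph that no existing node accounts for.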
  • The Structure: The modular structure of knowledge is then constructed upon the clusters of meaning, as well as their use within the texts.
  • Features, Qualities and Capabilities:
  • Depth v. Breadth: The basic guideline for constructing the knowledge tree is to avoid unnecessary depth. This guideline allows the user to reach the most distant lexical term in the fewest clicks possible.
  • Links Organization: The modular structure is linked in such a way that every lexical item has only one generalized term that encompasses it. This organization ensures that there is only one route leading from the most distant and specific lexical item to the most generalized one.
  • Expertise Representation: The choice of lexical terms and their organization in the modular structure represents the whole field of knowledge and the expert's knowledge.
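The single-father rule and the depth guideline can be illustrated with a small sketch. The tree fragment in `FATHER` is invented, and `path_to_root` and `depth` are hypothetical helper names.

```python
# Hypothetical fragment of a knowledge tree: each node maps to its single father.
FATHER = {
    "Corporate law": None,
    "Takeover": "Corporate law",
    "Tender offer": "Takeover",
    "Poison pill": "Takeover",
}


def path_to_root(node: str) -> list[str]:
    """Follow father links upward; one father per node means exactly one route."""
    path = [node]
    while FATHER[path[-1]] is not None:
        path.append(FATHER[path[-1]])
    return path


def depth(node: str) -> int:
    """Number of steps (clicks) from the most general node down to this node."""
    return len(path_to_root(node)) - 1


route = path_to_root("Poison pill")
```

Storing one father per node makes the route unique by construction, and the maximum of `depth` over all nodes is the figure that the "avoid unnecessary depth" guideline tries to keep small.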
  • the Content Editor is a tool, used by an expert 515, or some other person responsible for creating, defining and maintaining the structure and rules (content keys) used for the mapping / filtering of content files.
  • Content Editors are mostly professionals with extensive knowledge in their particular field of expertise (e.g., Corporate Law). They require an easy-to-use, easy-to-understand interface, in which to build and maintain the "knowledge trees". This interface, or content editing tool, is currently created using ASP (Active Server Pages) software and MS SQL Server 2000.
  • the Content Editor builds the hierarchical structure of the knowledge tree. He/she also assigns the mapping / filtering parameters, effectively giving meaning to a vast amount of data.
  • the use of human content editors enables the preparation of highly professional content structures.
  • a specific discipline would thereby require a basic initial infusion of an infrastructure for a specific body of content, by an editor. Following this initial stage, the content base for the specific discipline may expand infinitely with a negligible investment in re-editing, as it is based on the initial programming.
  • the content editor uses a set of basic editing tools to construct the modular structure and to feed to the system all the indicatory terms. There are, however, preparations that take place before this procedure can take place.
  • the first is the collection of all the indicatory words and their synonyms.
  • a semi- automatic procedure arranges these terms onto a modular structure. This semi-automatic procedure includes the automatic detection of the relevant terminologies from a bank of relevant sources and their automatic arrangement according to semantic relations within a modular structure of knowledge.
  • An expert refines the structure according to context, classifying relevant nodes and word groups using the content editing tool. Professional dictionaries are inefficient since they do not convey all the possibilities that are used in the professional field. Relying on the sources themselves, only words that appear in texts are extracted. The system, therefore, scans new sources, and automatically looks for these terms when the commonly used terms are already detected.
  • the system can detect new terms that were not used before. This is done using the combination of the filtering and mapping engines in the following way. A paragraph that was not filtered out is a relevant paragraph. This paragraph has to be allocated to a node on the modular structure of knowledge according to the indicatory words that it contains. If the mapping engine was not able to allocate the paragraph to the modular structure, it means that there is a new term hiding within it. The paragraph is transferred to an expert, who, according to its context, can add a node to the tree with the new indicatory term that was not detected by the system. This ensures that the next time a paragraph containing the new term is mapped, the system will be able to allocate it properly and automatically.
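The feedback loop described here, in which an unallocated paragraph goes to an expert and the expert's new node lets later paragraphs map automatically, can be sketched as follows. The in-memory `word_groups` dictionary, the `expert_queue` list and both function names are hypothetical, and term matching is simplified to lower-cased substring tests.

```python
# Hypothetical starting state: one node with its word group.
word_groups = {"Takeover": {"takeover", "tender offer"}}
expert_queue: list[str] = []  # paragraphs that no node could account for


def allocate(paragraph: str) -> list[str]:
    """Return the matching nodes; queue the paragraph for an expert if none match."""
    text = paragraph.lower()
    nodes = [
        node
        for node, terms in word_groups.items()
        if any(term in text for term in terms)
    ]
    if not nodes:
        expert_queue.append(paragraph)
    return nodes


def expert_adds_node(label: str, terms: list[str]) -> None:
    """Stand-in for the expert adding a node and its indicatory terms."""
    word_groups[label] = {term.lower() for term in terms}


first = allocate("The board adopted a poison pill.")
expert_adds_node("Poison pill", ["poison pill", "rights plan"])
second = allocate("A poison pill deters hostile bidders.")
```

The first paragraph finds no node and is queued; after the expert's addition, the second paragraph containing the same term is allocated without manual work.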
  • This component is a software means, currently created using Perl, XML, an algorithm language, ASP, SQL, and Visual Basic software, wherein Visual Basic Components store tables in the MS-SQL database.
  • the purpose of the filtering tool is to filter out irrelevant paragraphs from textual sources.
  • the relevance of the paragraphs is content dependent. This is achieved by allocating textual cues to a filtering algorithm.
  • the filtering algorithm relies on the linguistic expert to extract those rules from a collection of representative samples of relevant sources within the specific field of knowledge.
  • the filtering tool uses two main tables, a "source table" and a "paragraph table", where the paragraphs in the "paragraph table" are taken from a "full text" source in the "source table".
  • the Mapping Engine applies the content keys assigned by the Content Editor and performs mapping of the text objects (Word, Excel, HTML, raster files, PDF, etc.) in a File Bank.
  • a file bank is a collection of tagged sources that have gone through the filtering process. These tagged paragraphs are later assigned to the relevant node.
  • Content Keys are a new technological concept which utilizes mapping algorithms. These algorithms are based on mathematical set theory (for example, hierarchical father / son relationship, property inheritance, etc.).
  • the purpose of the mapping tool is to allocate paragraphs from the "paragraph table" to the modular structure of knowledge. This tool tags every paragraph with indicatory terms that identify several nodes of the modular structure, by using several combined devices.
  • This procedure is made up of two main functions:
  • FIGURES 8.1 - 8.4 illustrate the 4 stages of system analysis.
  • In Figure 8.1, an illustration is provided of a modular structure of knowledge in the legal field dealing with takeover. Each node is followed by its nodeID. The figure represents just a segment from the whole modular structure in corporate law. The categorization into nodes is automatically constructed upon a sample of highly relevant textual sources dealing with takeover, in this case. An expert later refines the construction.
  • Figure 8.2 provides a segment of an example illustrating the division of a single source into paragraphs. As can be seen, each paragraph, or section of the text that is separated by at least a tab, is placed on its own and defined as a paragraph.
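The division into paragraphs can be sketched under one possible reading of "separated by at least a tab", namely that a new paragraph begins wherever a line starts with one or more tab characters. This reading, the function name `split_paragraphs` and the sample text are assumptions.

```python
import re


def split_paragraphs(source: str) -> list[str]:
    """Split the source wherever a new line starts with at least one tab."""
    parts = re.split(r"\n\t+", source)
    return [part.strip() for part in parts if part.strip()]


sample_source = (
    "\tFirst paragraph, line one.\n"
    "Still the first paragraph.\n"
    "\tSecond paragraph.\n"
    "\t\tThird paragraph."
)
pieces = split_paragraphs(sample_source)
```

Lines that do not start with a tab stay attached to the paragraph above them, so a multi-line paragraph remains a single unit.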
  • Figure 8.3 illustrates a section from the output of the filtering engine, whereby the original text is divided up into those texts that are filtered out, and those that must be mapped.
  • the underlined texts are to be filtered out, and the bold texts represent the paragraphs that need to be mapped.
  • Figure 8.4 illustrates the mapping of the paragraphs onto the different nodes of the relevant modular structure of knowledge. As can be seen in the figure, all the nodes that appeared in the text are presented as titles. Following the node name, all the paragraphs from the sample sources that deal with the specified idea are accumulated. This provides a collection of the ideas that were conveyed in the sample text and the paragraphs that were automatically detected by the system and assigned to them.
  • the best mode of the present invention is the development and usage of a knowledge management tool for a specialized field of knowledge, such as the legal profession.
  • Such a tool provides substantially improved accuracy and efficiency in conducting online research, and subsequently managing the research.
  • An additional embodiment of the present invention enables integration of the present invention within old information searching formats.
  • the user may gain access to the content files through a conventional smart search engine.
  • the user enters the subject matter which she is searching for into the search box and clicks "search".
  • the system recognizes the relevant node on the knowledge tree, and instantly directs the user to the relevant content file. In this manner, the user can enjoy the "brain" of the system, as well as the advantages of content files, with no need to "follow the map of links."
  • a solution is provided for professional researchers, such as legal firms performing legal research.
  • the research is done by assistants, who must go through the following procedure:
  • the system provides an effective solution to the overflow of information, whereby the user can achieve superior results even to those of an experienced professional expert.
  • the billing system attunes the fees to each "tour" on the modular structure, such that the professional pays “per-use”.
  • a further embodiment of the present invention is an application for content providers.
  • Content is traditionally compiled manually, which generally requires a significant number of workers.
  • in prior art content provider systems, as information doubles in shorter time frames, the manual method is becoming increasingly impractical.
  • the "cheap" personnel who are hired are generally not capable of dealing effectively with the overflow of information, which inevitably results in lower standards of content.
  • the present embodiment of the present invention includes the following innovative aspects:
  • Automated content aggregation - the present invention acquires all the relevant textual sources needed for each topic using advanced searches.
  • a further embodiment of the present invention is an application for enterprise information portals, wherein the proliferation of interest in "knowledge management" in the last few years is a reflection that information has finally gained visibility as a major corporate asset. Furthermore, sharing information across the organization and between organizations, to support greater learning and competitiveness, has resulted in moving to the next level of information management (IM): knowledge management.
  • enterprises using prior art knowledge management systems lose billions of dollars a year because of inefficiencies resulting from intellectual rework, substandard performance and an inability to find knowledge resources. This is expected to become substantially more acute.
  • the current embodiment of the present invention provides:
  • the system enhances knowledge acquisition in several ways:
  • a smart process of uploading sources onto databases shifts the process of uploading sources on the computer, from a source-based upload to a content-based upload.
  • when a file is uploaded on the system of the present invention, its paragraphs are automatically linked to the relevant content files.
  • Smart tools: the present invention replaces Boolean search engines with smart tools for filtering and mapping the sources.
  • the smart tools dramatically reduce the amounts of information, and resolve the problems of content neutral search tools.
  • An "automated duplication" of expert searches: Because the mapping process is attuned to the modular structure of knowledge, and because the modular structures are constructed by content experts, the system of the present invention enables the automated duplication of expert searches.
  • the content files are continuously updated.
  • A new concept of integration: The present invention integrates the organization of databases with the knowledge tree, the user's interface, and the user's workplace. Accordingly, all sources are automatically organized within a synchronized structure without burdening the user.
  • Figure 9 illustrates the novel elements in the platform of the present invention.
  • Figure 10 is a table summarizing the novelty in each of the new system's elements.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
EP01923952A 2000-04-19 2001-04-19 Verfahren zur erzeugung von inhaltsorientierten datenbanken und inhaltsdateien Withdrawn EP1282844A4 (de)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US19900800P 2000-04-19 2000-04-19
US199008P 2000-04-19
US22669400P 2000-08-22 2000-08-22
US226694P 2000-08-22
PCT/IL2001/000364 WO2001079957A2 (en) 2000-04-19 2001-04-19 A method for creating content oriented databases and content files

Publications (2)

Publication Number Publication Date
EP1282844A2 true EP1282844A2 (de) 2003-02-12
EP1282844A4 EP1282844A4 (de) 2005-03-02

Family

ID=26894366

Family Applications (1)

Application Number Title Priority Date Filing Date
EP01923952A Withdrawn EP1282844A4 (de) 2000-04-19 2001-04-19 Verfahren zur erzeugung von inhaltsorientierten datenbanken und inhaltsdateien

Country Status (3)

Country Link
EP (1) EP1282844A4 (de)
AU (1) AU5063201A (de)
WO (1) WO2001079957A2 (de)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080005667A1 (en) 2006-06-28 2008-01-03 Dias Daniel M Method and apparatus for creating and editing electronic documents
CN113641782A (zh) * 2020-04-27 2021-11-12 北京庖丁科技有限公司 基于检索语句的信息检索方法、装置、设备和介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5692181A (en) * 1995-10-12 1997-11-25 Ncr Corporation System and method for generating reports from a computer database
WO1998020436A2 (en) * 1996-11-07 1998-05-14 Natrificial Llc Method and apparatus for organizing and processing information using a digital computer
US5768578A (en) * 1994-02-28 1998-06-16 Lucent Technologies Inc. User interface for information retrieval system
WO1998028696A1 (en) * 1996-12-24 1998-07-02 Correlate Technologies Ltd. Computer software and user interface for information management
US6037944A (en) * 1996-11-07 2000-03-14 Natrificial Llc Method and apparatus for displaying a thought network from a thought's perspective

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6070160A (en) * 1995-05-19 2000-05-30 Artnet Worldwide Corporation Non-linear database set searching apparatus and method
US6144944A (en) * 1997-04-24 2000-11-07 Imgis, Inc. Computer system for efficiently selecting and providing information
US6134532A (en) * 1997-11-14 2000-10-17 Aptex Software, Inc. System and method for optimal adaptive matching of users to most relevant entity and information in real-time

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5768578A (en) * 1994-02-28 1998-06-16 Lucent Technologies Inc. User interface for information retrieval system
US5692181A (en) * 1995-10-12 1997-11-25 Ncr Corporation System and method for generating reports from a computer database
WO1998020436A2 (en) * 1996-11-07 1998-05-14 Natrificial Llc Method and apparatus for organizing and processing information using a digital computer
US6037944A (en) * 1996-11-07 2000-03-14 Natrificial Llc Method and apparatus for displaying a thought network from a thought's perspective
WO1998028696A1 (en) * 1996-12-24 1998-07-02 Correlate Technologies Ltd. Computer software and user interface for information management

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of WO0179957A2 *

Also Published As

Publication number Publication date
AU5063201A (en) 2001-10-30
EP1282844A4 (de) 2005-03-02
WO2001079957A2 (en) 2001-10-25
WO2001079957A3 (en) 2002-02-21

Similar Documents

Publication Publication Date Title
US20020049705A1 (en) Method for creating content oriented databases and content files
US9384245B2 (en) Method and system for assessing relevant properties of work contexts for use by information services
US7895595B2 (en) Automatic method and system for formulating and transforming representations of context used by information services
US20030061209A1 (en) Computer user interface tool for navigation of data stored in directed graphs
US8131779B2 (en) System and method for interactive multi-dimensional visual representation of information content and properties
US6327586B1 (en) System method and computer program product to automate the management and analysis of heterogeneous data
US9449080B1 (en) System, methods, and user interface for information searching, tagging, organization, and display
US9659084B1 (en) System, methods, and user interface for presenting information from unstructured data
US10133823B2 (en) Automatically providing relevant search results based on user behavior
US20050055321A1 (en) System and method for providing an intelligent multi-step dialog with a user
EP1667034A2 (de) System und Verfahren für interaktive mehrdimensionale visuelle Darstellung von Informationsinhalten und Eigenschaften
US11308177B2 (en) System and method for accessing and managing cognitive knowledge
EP1212697A1 (de) Verfahren und gerät um durch anwender definiertes technisches wörterbuch, das online datenbasen verwendet, zu bauen
Sciascio et al. Supporting exploratory search with a visual user-driven approach
EP1282844A2 (de) Verfahren zur erzeugung von inhaltsorientierten datenbanken und inhaltsdateien
Venkatsubramanyan et al. Techniques for organizing and presenting search results: A survey
Stuckenschmidt et al. A topic-based browser for large online resources
CA2528506A1 (en) System and method for interactive multi-dimensional visual representation of information content and properties
Greene et al. Browsing publication data using tag clouds over concept lattices constructed by key-phrase extraction
Spinakis et al. Text mining tools: Evaluation methods and criteria
Buchanan Integrating information seeking and information structuring: spatial hypertext as an interface to the digital library.
FR3136298A1 (fr) Procede d’association d’une donnee a un document numerique, systeme associe
Edwards et al. MeSH represented MEDLINE query results
White et al. Features of Exploratory Search Systems
Vrandecic et al. D7. 2.1 SEKT Methodology: Initial Lessons Learned and Tool Design

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20021015

AK Designated contracting states

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

A4 Supplementary search report drawn up and despatched

Effective date: 20050119

RIC1 Information provided on ipc code assigned before grant

Ipc: 7G 06F 17/00 A

17Q First examination report despatched

Effective date: 20050415

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20051026