EP1240596A2 - Method and apparatus for storing and retrieving knowledge - Google Patents
Method and apparatus for storing and retrieving knowledge
- Publication number
- EP1240596A2, EP00980758A
- Authority
- EP
- European Patent Office
- Prior art keywords
- knowledge
- noun
- verb phrase
- linked
- pair
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
- G06N5/025—Extracting rules from data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Definitions
- the present invention relates to systems and methods for organizing, storing and retrieving knowledge.
- Embodiments of the present invention help a user to find knowledge pertinent to a user, rather than finding unconnected, raw information.
- Knowledge is used herein to refer to information which has been organized according to context and meaning, while information is used herein to refer to raw information which is unorganized or organized according to other criteria, such as input format, vocabulary used, source, etc.
- embodiments of the present invention receive information as natural language and produce therefrom a linked knowledge model.
- the linked knowledge model can then be navigated to extract knowledge, according to the relationships "How", "Why", "What" and "What is" discussed above, using a user interface which visually facilitates making the connections between pieces of information.
- Embodiments of the present invention can increase a user's knowledge by augmenting the knowledge base available to the user in the form of a linked knowledge model. Gaps in the linked knowledge model and edges or boundaries of the linked knowledge model are susceptible to automatic detection. Embodiments of the invention can therefore detect such gaps and edges and search for missing information on an ad hoc, regularly scheduled or other basis.
- Some embodiments of the invention can include user interfaces which automatically analyze the knowledge sought in inquiries, for example received by email, and prepare responses to those inquiries based on the contents of a dynamic linked knowledge model accessible to the user. Moreover, the contents of queries and other received information can be related automatically to the information already contained in the linked knowledge model, to expand the model.
- the linked knowledge model can also be presented in various useful formats, depending on the type of knowledge sought.
- a system for storing and retrieving knowledge comprising a text analyzer having an input which receives information as natural language and which produces at an output annotated information, a parser having an input which receives the annotated information from the text analyzer and which produces and stores in a memory a linked knowledge model thereof, and a user interface presenting at least one pair of complementary relationships to an operator, through which the operator can navigate the linked knowledge model to retrieve knowledge.
- the system can further comprise a knowledge research agent having an input receiving the linked knowledge model and having an output on which the knowledge research agent produces a query compatible with an external search engine.
- the Hyperknowledge agent may automatically augment the linked knowledge model with new information found by executing the query on the external search engine.
- the parser can further comprise a Rules Engine executing a plurality of rules defining a structure of knowledge embedded in information expressed in natural language.
- the linked knowledge model may include connected noun/verb phrase pairs, and the plurality of rules may define how sentences in natural language relate to connected noun/verb phrase pairs in the linked knowledge model.
- the rules may include at least one semantic rule, at least one syntactic rule or at least one context rule.
- the method comprises presenting to an operator a selector by which the operator indicates one of a pair of complementary relationships and a query input by which the operator indicates query text, and combining the one of the pair of complementary relationships indicated and the query text to form a query by which knowledge is retrieved.
- the selector may further define four directions characterized by complementary pairs of directions, each direction representing a relationship.
- the selector may further be presented in a graphical form resembling a compass rose.
- the method may further comprise storing knowledge in a linked knowledge model including noun/verb phrase pairs, searching the linked knowledge model for a first noun/verb phrase pair defined by the query text, and retrieving a second noun/verb phrase pair from the linked knowledge model bearing the indicated one of the pair of complementary relationships to the first noun/verb phrase pair.
- searching may further comprise parsing a natural language query text into a searchable Boolean expression.
- the identified portions of the linked knowledge model may further include a first noun/verb phrase pair lacking a second noun/verb phrase pair related by at least one of two complementary relationships defined between the noun/verb phrase pairs.
- each of the foregoing aspects of the invention may be embodied as a software product including a machine readable medium on which is encoded a sequence of software instructions which when executed direct the performance of one of the methods discussed above, or which cause a general purpose computer to act as a special purpose system as defined above.
- Fig. 1 is a schematic drawing of a compass rose used to indicate connection between information comprising knowledge
- Fig. 2 is a screen shot of a user interface for searching a knowledge base
- Fig. 3 is a screen shot of a user interface presenting search results in a navigable format
- Fig. 4 is a block diagram of a system in which an embodiment of the invention is incorporated;
- Fig. 5 is a block diagram of a Natural Language Parser embodying aspects of the invention;
- Fig. 6 is a portion of a screen shot showing a user interface for indicating relationships between information in input text comprising knowledge
- Fig. 7 is a knowledge diagram relating information in a structure representing knowledge.
- Directional search capability, for example using a Compass Search Agent (see Fig. 4, 401), allows a user to specify a knowledge direction in which to execute a specified search. Thus, a search produces more focussed answers to a user's query. The data structures and visual presentations according to Davies facilitate this capability.
- When a user desires to retrieve knowledge, the user enters a search phrase into a search box 201, as shown in Fig. 2, for example.
- the user selects the desired knowledge direction, for example by using a conventional pointing device such as a mouse to select one of the compass points in a compass rose 202 representing the four knowledge directions.
- When the search is executed on a linked knowledge model, the operation is simple. Processes or noun/verb phrases matching the search phrase are retrieved using a fuzzy match algorithm. The linked knowledge model is then navigated, i.e., the selected concept is changed, one step in the requested direction, and the resulting processes or noun/verb phrases are returned.
- For example, a search for "employ staff" in the "Why" direction will return knowledge explaining why staff should be employed, as opposed to general information on employing staff, how to go about employing staff or what staff should be employed. If a "What" or "What is" direction is requested, then noun phrases in that direction are returned, together with a selection list of verbs related to the noun phrases returned, also extracted from the linked knowledge model. If a general search is desired, the user can elect to specify no direction; however, information overload is likely to result when a large linked knowledge model is involved.
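- The two steps described above (fuzzy retrieval, then one navigation step in the requested direction) can be illustrated with a minimal sketch. The in-memory model, the function names and the 0.8 similarity threshold are assumptions made for illustration only; the source does not specify the fuzzy match algorithm, so Python's standard `difflib` similarity ratio is used as a stand-in.

```python
from difflib import SequenceMatcher

# Hypothetical linked knowledge model: each phrase maps a direction name
# to the phrases that lie one step away in that direction.
MODEL = {
    "employ staff": {
        "why": ["grow the business"],
        "how": ["advertise vacancies", "interview candidates"],
    },
    "grow the business": {"how": ["employ staff"]},
    "advertise vacancies": {"why": ["employ staff"]},
    "interview candidates": {"why": ["employ staff"]},
}


def fuzzy_matches(phrase, model, threshold=0.8):
    """Return model phrases whose similarity to `phrase` meets the threshold."""
    phrase = phrase.lower()
    return [p for p in model
            if SequenceMatcher(None, phrase, p).ratio() >= threshold]


def directional_search(phrase, direction, model=MODEL):
    """Fuzzy-match the phrase, then move one step in the requested direction."""
    results = []
    for match in fuzzy_matches(phrase, model):
        for target in model[match].get(direction, []):
            if target not in results:
                results.append(target)
    return results
```

Calling `directional_search("employ staf", "why")` tolerates the misspelling and returns only the Why neighbours of "employ staff", which is the focusing effect the passage describes.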
- Another example search is shown in Fig. 3.
- the user has entered the search phrase 301 "maximize web market."
- the user has asked the question "How do I maximize my web market?" by clicking on the "How" arrow 302 of the compass rose.
- the search results are narrowly focussed on "How to maximize web market."
- the user can click on the desired search direction of a compass rose 303 adjacent the topic of interest 304.
- If the search is executed on a generalized information source such as the collection of hypertext documents commonly known as the World Wide Web, stored on the international computer network commonly known as the Internet, for example, then a connection to a conventional search engine is first established.
- the search engine is passed a query, which comprises the requested direction appended to the search phrase.
- a Natural Language Parser (see Fig. 4, 403), as shown in Fig. 5, and described in greater detail below, analyzes input information and structures the information in a linked knowledge model.
- a Text Analyzer analyses natural language sentences, annotating them with linguistic and semantic tags.
- a Context Engine ensures that cross-sentence relationships are tracked, allowing information from one sentence to be placed in proper context relative to information from other sentences.
- a Parser Engine then creates noun/verb phrase pairs and the links between them, by first annotating the noun/verb phrase pairs created as described below in connection with the Text Analyzer and then comparing the annotated noun/verb phrase pairs with a linked knowledge model corpus (KMC).
- the Parser Engine also relies on rules supplied and applied by a Rules Engine.
- the KMC consists of a large number of linked knowledge models, built manually from a range of texts and other knowledge sources. There may be many types of corpora, each focussing on a specific domain of knowledge, such as the knowledge contained within popular business books. The structure of these corpora is described below. The user may select a particular corpus if the input text matches the corpus' subject. Each noun/verb phrase pair in the KMC is tagged with a selection of annotations, specifically selected to enable patterns between the text and corresponding noun/verb phrase pair to be detected. These annotations can be regarded as belonging to various levels of knowledge and are described now.
- the lowest level annotations are pure linguistic analysis of the text. These include part of speech (e.g. "using" is a verb, "customer" is a noun) and word frequencies compared to Standard English, possibly extracted from the British National Corpus (to highlight where a word or phrase appears more often than would be expected).
- base word forms, e.g. "customers" to "customer", "using" to "use"
- syntactic structures, commonly known as the "parse tree" of a sentence. All these annotations can be automatically added to the KMC using existing tools such as Link Grammar and WordNet.
- the middle level annotations are at the semantic level. Here, each word or multiword phrase in both the text and noun/verb phrase pairs is tagged with its semantic field.
- the highest level of annotation is the noun/verb phrase pairs themselves. Due to their structure (connected to other noun/verb phrase pairs via What, How, Why and What-is links) and content (verb phrases and noun phrases) they constitute the most semantically rich data in the KMC. Because of this, and to ensure accuracy, the noun/verb phrase pairs and their related natural language text are manually co-referenced. To facilitate pattern matching within the KMC, the noun/verb phrase pairs need to be represented in a textual markup form that describes the structure of a linked knowledge model.
- One possible implementation is a format based on SGML (Standard
- HKML HKML
- the HKML structure contains any number of tags, some of which are defined below. Each tag is used to describe a unique part of a linked knowledge model. The majority of tags are required to be closed by their closing counterpart, usually denoted by a forward slash '/' inside the tag. Some of the tags are described below, but others could be added to define further parts of the linked knowledge model. Also, other names or syntactic arrangements could be used, as known in the art.
- • <ENVIRONMENT> - defines the scope of the entire environment - closed by the </ENVIRONMENT> tag.
- • <KNOWDES> - defines the noun/verb phrase pair section of the model - closed by the </KNOWDES> tag.
- • <KNOWDE> - defines a single noun/verb phrase pair in a model - closed by the </KNOWDE> tag.
- o attribute - KEY number.
- o attribute - CONNECTOR cnAND/cnOR/cnTHEN.
- o attribute - SUPERIOR number (defines the Type noun/verb phrase pair of an Instance noun/verb phrase pair).
- • <OBJECTIVE> - defines a special noun/verb phrase pair - closed by the </OBJECTIVE> tag. Used to surround the <KNOWDE> tag in the case that the noun/verb phrase pair is an objective.
- • <WHY> - defines the Why noun/verb phrase pair of a relation - closed by the optional </WHY> tag.
- o attribute - WHYKEY number.
- One or more linked knowledge models can be contained in an environment file. Hyperknowledge environment files are written in HKML, although other environments can be constructed using other markup languages. Each Hyperknowledge environment file (denoted by the <ENVIRONMENT> tag) contains an unlimited number of linked knowledge models (denoted by the <MODEL> tag). Each model contains two sections: the noun/verb phrase pair section, denoted by the <KNOWDES> tag, and the Relations (noun/verb phrase pair connections) section, denoted by the <RELATIONS> tag.
- the noun/verb phrase pair section contains an unlimited number of noun/verb phrase pairs, each one denoted by the <KNOWDE> tag.
- the <OBJECTIVE> tag can be used to surround a <KNOWDE> tag to denote that the noun/verb phrase pair is an objective of the model.
- Each noun/verb phrase pair contains several attributes: its unique identifier (key), its connector type and its owner noun/verb phrase pair in the case of it being a WHAT of a noun/verb phrase pair.
- the tags <VERBPHRASE> and <NOUNPHRASE> must be included between the opening and closing <KNOWDE> tags. These two tags define the two pieces of text that make up the noun/verb phrase pair.
- the Relations section contains an unlimited number of Relations (each one denoted by the <RELATION> tag).
- Each <RELATION> tag defines the relationship between a pair of noun/verb phrase pairs.
- the <HOW> and <WHY> tags must be included between the opening and closing <RELATION> tags. These two tags define the two actual How and Why noun/verb phrase pairs involved in the Relation.
- Each Relation contains several attributes including its unique identifier, relKey, and the next Relation in the chain, nextRel.
- unique identifiers i.e., keys are used to match up individual noun/verb phrase pairs to those included in Relations. This helps describe the unique connections between noun/verb phrase pairs so that the structure is unambiguous.
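- To make the nesting of these tags concrete, the following sketch builds a small environment file and then resolves a Relation back to its noun/verb phrase pairs through their keys. It is illustrative only: HKML is described as SGML-based, whereas the sketch uses Python's standard XML library as a stand-in; the HOWKEY attribute name, the value "0" for "no next Relation" and the helper names are assumptions, and only the tag names and the KEY, CONNECTOR, WHYKEY, relKey and nextRel attributes come from the description above.

```python
import xml.etree.ElementTree as ET


def add_knowde(parent, key, verb, noun, connector="cnAND"):
    """Append a <KNOWDE> with its required <VERBPHRASE> and <NOUNPHRASE>."""
    knowde = ET.SubElement(parent, "KNOWDE", KEY=str(key), CONNECTOR=connector)
    ET.SubElement(knowde, "VERBPHRASE").text = verb
    ET.SubElement(knowde, "NOUNPHRASE").text = noun
    return knowde


env = ET.Element("ENVIRONMENT")
model = ET.SubElement(env, "MODEL")

# Noun/verb phrase pair section; the first pair is wrapped as an objective.
knowdes = ET.SubElement(model, "KNOWDES")
objective = ET.SubElement(knowdes, "OBJECTIVE")
add_knowde(objective, 1, "experience", "e-commerce success")
add_knowde(knowdes, 2, "address", "global economy")

# Relations section: pair 2 is a How of pair 1, and pair 1 is its Why.
relations = ET.SubElement(model, "RELATIONS")
relation = ET.SubElement(relations, "RELATION", relKey="1", nextRel="0")
ET.SubElement(relation, "HOW", HOWKEY="2")   # HOWKEY is an assumed name
ET.SubElement(relation, "WHY", WHYKEY="1")

hkml = ET.tostring(env, encoding="unicode")


def why_of(root, how_key):
    """Follow a Relation from a How key to the text of its Why pair."""
    for rel in root.iter("RELATION"):
        if rel.find("HOW").get("HOWKEY") == str(how_key):
            why_key = rel.find("WHY").get("WHYKEY")
            for knowde in root.iter("KNOWDE"):
                if knowde.get("KEY") == why_key:
                    verb = knowde.findtext("VERBPHRASE")
                    noun = knowde.findtext("NOUNPHRASE")
                    return f"{verb} {noun}"
    return None
```

Because every Relation refers to pairs only by key, the pair text is stored once and the link structure stays unambiguous, which is the point made in the preceding paragraph.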
- the Rules Engine includes syntactic and semantic rules, both manually developed and automatically created via Data Oriented Parsing of the corpus.
- the final output of the Parser Engine is a linked knowledge model including noun/verb phrase pairs connected via "How-Why-What-What is" links.
- Various interfaces are then employed to output this structure to the user. These are described in more detail later.
- the Natural Language Parser (Fig. 4, 403) shown in detail in Fig. 5 can reside on a remote or local server computer, or can reside on a client computer, such as a personal computer used by the user.
- Locating the parser on a server on the Internet is particularly advantageous when used for extracting knowledge from the World Wide Web.
- Locating the Natural Language Parser (Fig. 4, 403) on a user's client computer is particularly advantageous when used for extracting knowledge from the user's personal documents, reports, presentations, email, etc.
- Processes at the western edge (see Compass Rose of Fig. 1) of the linked knowledge model delimit the boundary of "Why" understanding and knowledge, while processes at the eastern edge (see Compass Rose of Fig. 1) of the linked knowledge model delimit the boundary of "How" understanding.
- Processes at the edges of the model or whose number of connections indicates gaps in knowledge are first identified. Then directional searching is used to find the missing information, for example on the Internet or in a user's unanalyzed documents.
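- Edge detection of this kind can be sketched as follows. The representation of relations as (How key, Why key) tuples and the function name are assumptions for illustration; the sketch only identifies pairs with no known Why or no known How, which would then be handed to a directional search in the missing direction.

```python
def find_edges(pairs, relations):
    """Return the pairs lacking a Why link and the pairs lacking a How link.

    pairs:     iterable of noun/verb phrase pair keys
    relations: list of (how_key, why_key) tuples, meaning that the first
               pair is a How of the second and the second is its Why
    """
    has_why = {how for how, _why in relations}   # some Why is known for these
    has_how = {why for _how, why in relations}   # some How is known for these
    return {
        "why": sorted(k for k in pairs if k not in has_why),
        "how": sorted(k for k in pairs if k not in has_how),
    }


pairs = ["grow the business", "employ staff", "advertise vacancies"]
relations = [
    ("employ staff", "grow the business"),
    ("advertise vacancies", "employ staff"),
]
edges = find_edges(pairs, relations)
# Each entry becomes a candidate directional search, e.g. a "why" search
# for every pair listed under edges["why"].
searches = [(key, direction) for direction, keys in edges.items() for key in keys]
```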
- Agents facilitate the performance of the above described processes by facilitating importation of natural language information in a variety of formats.
- Agents have been implemented based on application programming interfaces (APIs) included in a product of Hyperknowledge Corporation of Woburn, Massachusetts, which produces linked knowledge models according to Davies.
- the Hyperknowledge product used includes standard Automation, DCOM and CORBA interfaces. Other standard and custom interfaces could be implemented and used in other products.
- the interface agent detects the arrival of new email.
- the interface agent invokes the natural language extraction technology described above to extract knowledge from the email message.
- a model built from a single email can be useful as an aid to understanding a particularly long or complex message.
- Processes extracted from the email to form the model of the email message can next be searched for in a local or other linked knowledge model accessible to the user. References to related processes and other information can then be integrated with the model of the email. Likewise, the model of the email can be integrated with the existing linked knowledge model, thus extending the knowledge and understanding contained therein.
- An advantage of extending the email model as described is that the user has more complete information when the email is finally read.
- An advantage of extending the local linked knowledge model as described is that users of that local linked knowledge model have more complete information when the local linked knowledge model is consulted.
- Extracted knowledge can be used to dynamically create organized structures within the email reader client software. For example, the key subject matter of the email could be used to create folders to which the email can be stored or linked. Extracted knowledge can also be used to automatically generate informative replies to emailed questions. When extracted knowledge points to a particular part of the local linked knowledge model, and the agent finds the connection has a desired level of confidence, then an automatic reply can be generated which includes the related part of the local linked knowledge model.
- the user has received the following email: John ,
- the user decides to have the Natural Language Parser 403 read the text, in order to (a) summarize the content, (b) markup the key points as hypertext links and (c) use the links created at step (b) to find out more.
- the email text is annotated with syntactic and semantic information. This information is used to extract candidate noun/verb phrase pairs from the email text. These will include:
- the noun/verb phrase pair Matcher searches the KMC for noun/verb phrase pairs that are near matches, by comparing the textual, syntactic and semantic information of the candidate noun/verb phrase pairs with the annotated noun/verb phrase pairs in the KMC.
- the candidate noun/verb phrase pair "deal with worldwide economy” is found to have a good semantic and syntactic match with the KMC noun/verb phrase pair Address Global Economy.
- the Model Builder 405 consequently stores the "deal with worldwide economy" noun/verb phrase pair and along with it a link to the Address Global Economy noun/verb phrase pair in the KMC.
- This link is tagged with a percentage confidence figure, based on the level of match between the candidate noun/verb phrase pair's annotations and the KMC noun/verb phrase pair's annotations.
- all the annotations are stored with the noun/verb phrase pairs, including links back to the original email text. This association allows a hyperlink to be displayed in the email text.
- the HOWs and WHYs of the Address Global Economy KMC noun/verb phrase pair are examined, and compared against other candidate noun/verb phrase pairs extracted from the email text. A good match is also found between the candidate noun/verb phrase pair "be successful with the proposed electronic commerce strategy" and the KMC noun/verb phrase pair Experience E-Commerce Success.
- the Model Builder 405 stores this connection and the percentage match. Also, this candidate noun/verb phrase pair is connected as a WHY of the "deal with worldwide economy" candidate noun/verb phrase pair by the Model Builder 405.
- the Natural Language Parser 403 includes several subcomponents. Various types of input 500 are all converted to text 501, ready for analysis by the Text Analyzer 502. Once analyzed, noun/verb phrase pairs can be extracted from the text as candidates for inclusion in the final model. The Text Analyzer tags the text with the same annotations used in the KMC. These are described in detail below.
- Language Structure Tools 503 are used to create these annotations, including, but not limited to, Link Grammar, WordNet and Longman's Dictionary of Contemporary English.
- the Text Analyzer then summarizes the supplied text using techniques such as, but not limited to, Shannon's Information Theory.
- the Context Engine 504 is used to ensure that cross-sentence relationships are tracked, allowing information from one sentence to be placed in proper context relative to information from other sentences. Resources such as Anaphoric Resolution Tools assist in this process.
- sentences are not always self-contained units of knowledge on which the Parser Engine 505 can independently operate. For example, consider the sentence, "Click it to start the music.” The referent of the pronoun it cannot be determined without examining a previous sentence.
- the Context Engine 504 builds noun associations that are refined over plural sentences, thus allowing resolution of cross-sentence context references.
- the Text Analyzer 502 builds candidate noun/verb phrase pairs from the text, using the annotations previously built. For example, the Parts of Speech analysis will indicate the presence and location of verbs and nouns, which can be used to produce noun/verb phrase pairs.
- An "importance" value can be assigned to noun/verb phrase pairs at this point, using results of the summary analysis (if applied) and the word-frequency annotation. If a word or phrase appears more often than would be expected, the importance of the noun/verb phrase pair it creates is increased.
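- One way to picture the word-frequency part of this "importance" value is as a ratio of observed to expected frequency. The sketch below is an assumption-laden illustration, not the scoring used by the system described: the reference frequencies, the `floor` parameter and the choice to score a pair by its most over-represented word are all hypothetical.

```python
from collections import Counter


def importance(pair_words, text_words, reference_freq, floor=1e-6):
    """Score a noun/verb phrase pair by its most over-represented word.

    reference_freq maps a word to its expected relative frequency in
    general English (hypothetical values supplied by the caller); `floor`
    avoids division by zero for words missing from the reference.
    """
    counts = Counter(word.lower() for word in text_words)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    score = 0.0
    for word in pair_words:
        observed = counts[word.lower()] / total
        expected = max(reference_freq.get(word.lower(), 0.0), floor)
        score = max(score, observed / expected)
    return score


text = "customers buy products customers return".split()
reference = {"customers": 0.001, "buy": 0.01}   # hypothetical frequencies
```

With these inputs a pair containing "customers" scores higher than one containing only "buy", because "customers" appears in the text far more often than the reference frequency would predict.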
- the Parser Engine 505 receives the annotated text and linked candidate noun/verb phrase pairs 506 from the Text Analyzer 502. It then utilizes the Rules Engine 507 and Hyperknowledge Corpus 508 to produce the final linked noun/verb phrase pair structure. Before describing the constituents of the Parser Engine 505, the Rules Engine 507 will be described.
- the Rules Engine 507 contains a list of rules that facilitate faster analysis of natural language text, leading to quicker and more accurate output of corresponding noun/verb phrase pairs. Such rules can be manually discovered, and the KMC used to develop, test and refine those invented rules. A potential rule can quickly be applied to the KMC and the number of cases where the rule works, and where there are exceptions, discovered. If the number of exceptions is low, the rule can then be accepted and further refined to deal with those exceptions.
- the Natural Language Parser 403 can begin to automatically extract patterns and rules contained within the KMC.
- the first task of the Parser Engine 505 is to take the candidate noun/verb phrase pairs, in order of importance, and find matching noun/verb phrase pairs in the KMC. Matches are found by a Matcher 508 comparing the annotations of a candidate noun/verb phrase pair with the noun/verb phrase pairs and their annotations inside the KMC. Matching patterns of linguistic and semantic data in the KMC indicate a matching noun/verb phrase pair. The noun/verb phrase pair that best matches, i.e., matches most of the annotations, is selected, and is passed into the Model Builder 509.
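- The selection of the best-matching KMC pair can be sketched as an annotation-overlap score expressed as a percentage, in line with the confidence figure mentioned earlier. The annotation names and values below are hypothetical, and scoring by the share of exactly equal annotations is an assumption standing in for whatever comparison the Matcher actually performs.

```python
def match_confidence(candidate, corpus_pair):
    """Percentage of the candidate's annotations shared by the corpus pair."""
    if not candidate:
        return 0.0
    shared = sum(1 for name, value in candidate.items()
                 if corpus_pair.get(name) == value)
    return 100.0 * shared / len(candidate)


def best_match(candidate, corpus):
    """Return (key, confidence) for the corpus pair matching most annotations."""
    key = max(corpus, key=lambda k: match_confidence(candidate, corpus[k]))
    return key, match_confidence(candidate, corpus[key])


# Hypothetical annotations: base forms and semantic fields per phrase.
corpus = {
    "Address Global Economy": {
        "verb_base": "address", "verb_field": "handling",
        "noun_head": "economy", "noun_field": "economics",
    },
    "Employ Staff": {
        "verb_base": "employ", "verb_field": "hiring",
        "noun_head": "staff", "noun_field": "people",
    },
}
candidate = {
    "verb_base": "deal with", "verb_field": "handling",
    "noun_head": "economy", "noun_field": "economics",
}
```

Here the candidate "deal with worldwide economy" shares three of its four annotations with Address Global Economy, so that pair is selected and the link can be stored with the resulting percentage.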
- Noun/verb phrase pairs in the KMC connected to this noun/verb phrase pair are then analyzed semantically, and the candidate noun/verb phrase pairs are searched for matches. If found, new noun/verb phrase pairs are created, replacing the words of the KMC noun/verb phrase pair with the words of the supplied text, whilst retaining the same semantic content. This process is iteratively repeated until a significant proportion of the supplied text is covered, and linked to noun/verb phrase pairs.
- the Rules Engine 507 is employed to assist with finding matching noun/verb phrase pairs in the KMC, or to suggest new noun/verb phrase pairs and connections. For example, consider the sentence, "He put the food in the fridge because he wanted to freeze the food." Here, two candidate noun/verb phrase pairs identified by the Text Analyzer are "Put food in Fridge" and "Freeze food". The presence of the conjunction "because" indicates that there is a likelihood that these noun/verb phrase pairs should be connected by a WHY connection. This can further be confirmed by the presence of semantically matching examples in the KMC.
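- A single syntactic rule of this kind can be sketched as follows. The function name and return format are assumptions, and the sketch works on raw clause text rather than on the reduced noun/verb phrase pairs the Text Analyzer would supply; it shows only how the conjunction "because" can propose a WHY link between the clause before it and the clause after it.

```python
import re


def because_rule(sentence):
    """Propose a WHY link when a sentence contains the conjunction 'because'.

    Returns a dict with the clause before 'because' as the How side and
    the clause after it as the Why side, or None if the rule does not fire.
    """
    parts = re.split(r"\bbecause\b", sentence, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) != 2:
        return None
    action, reason = (part.strip(" .,") for part in parts)
    if not action or not reason:
        return None
    return {"how": action, "why": reason, "link": "WHY"}


proposal = because_rule(
    "He put the food in the fridge because he wanted to freeze the food."
)
```

A proposed link like this is only a likelihood; as the passage notes, it would still be confirmed against semantically matching examples in the KMC.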
- the Model Builder 509 accepts noun/verb phrase pairs from the Noun/verb phrase pair Matcher 508 and constructs a model consisting of connected noun/verb phrase pairs. Some of the noun/verb phrase pairs will match with text supplied to the parser engine, and links between these items are maintained. Other noun/verb phrase pairs will exist only in the KMC, along with a probability of the accuracy and relevance of the noun/verb phrase pair. These noun/verb phrase pairs are stored as links from the noun/verb phrase pairs in the built model to noun/verb phrase pairs in the KMC. It is then clear to see which noun/verb phrase pairs represent the structure of the supplied natural language text, and which noun/verb phrase pairs extend the knowledge not contained in the text, but already known about in the KMC.
- Figure 6 demonstrates a sample output of parsed natural language text.
- Text 600 in which noun/verb phrase pairs have been identified is displayed with underlined hyperlinks 601 for each noun/verb phrase pair. Floating the cursor over a hyperlink presents the user with a compass 602, which is used to navigate the text according to the What/How/Why/What-is structure created by the Model Builder. This structure is further described below.
- the hyperlink 601 "designing killer apps" can be navigated in the HOW direction by clicking the blue east-pointing arrow 603. The destination will be another part of the text within the document.
- This hyperlink 601 also corresponds to a noun/verb phrase pair in the KMC that has WHY noun/verb phrase pairs which are not mentioned in the supplied text. This is indicated by the hollow red west-pointing arrow 604. Following this link will take the user to a page listing the WHY noun/verb phrase pairs of the "Design Killer App" noun/verb phrase pair, from which further knowledge navigation can occur.
- the complete system as shown in Fig. 4, is now discussed.
- a user enters the system through a home page 406.
- the home page 406 may be implemented as a hypertext page on the World Wide Web, as a presentation using proprietary software or otherwise, as would be known to those skilled in this art.
- the home page presents the user with options for browsing a linked knowledge model 407 directly or for searching the model 407 using directional searching, as described above. Browsing the linked knowledge model 407 directly has been described in Davies.
- When directional searching is invoked by a user, the parameters of the user's query are passed to a Compass Search Agent 401 which operates as described above in the section Directional Search Capability.
- the Compass Search Agent 401 therefore connects to a local linked knowledge model 406 and other search engines 407, as desired, to implement any desired level of functionality.
- a Knowledge Research Agent 402 can also be connected to the Compass Search Agent 401, for example to operate in the background. Depending on the results of searches performed by the Compass Search Agent 401, the Knowledge Research Agent 402 can continually increase the knowledge available in the linked knowledge model. Moreover, the Knowledge Research Agent 402 can be connected to receive inputs from a Natural Language Parser 403 which in turn receives inputs from new documents and information sources, including the queries made and results returned by the Compass Search Agent 401.
- Interface Agents 404 to other products and a Builder 405 may also be provided.
- Interface Agents 404 to other products have been explained above, for example in connection with researching answers to email queries.
- the Builder 405 is a conventional product for manually constructing linked knowledge models 407, and has been described in Davies and mentioned above.
- The Underlying Data Structure
- the output from the Natural Language Parser can be stored by a variety of mechanisms.
- One appropriate mechanism is a relational database, but alternative storage mechanisms could be employed.
- the output is similar in its structure to that of the KMC and it represents the links between the output linked knowledge model and the original Natural Language Parser input text.
- the structure also includes associations between candidate noun/verb phrase pairs created by the Model Builder and noun/verb phrase pairs in the KMC. Against each association, the percentage of accuracy of the match is stored.
- the KMC structure is implemented as a large relational database.
- This relational database consists of many relational tables, each holding information concerning individual annotations in the corpus.
- the KMC structure is not limited to that described below as more relational tables could be added to cope with new annotation types.
- the highest-level relational table lists all the noun/verb phrase pairs. Each noun/verb phrase pair has a unique key. This identifies an individual noun/verb phrase pair and allows it to be cross-referenced against entries in other relational tables. An example of a small part of this relational table is shown below. The numbers in each field are the foreign keys of the other relational tables.
- Each noun/verb phrase pair is broken down into a noun phrase and a verb phrase. This results in two further relational tables, each one storing the structure of the respective phrase. To break down the phrases further, two more relational tables hold the individual verb phrase words and noun phrase words. The individual words in these tables are linked to other relational tables, which include those storing parts-of-speech, semantic mark-up, base form, word frequency, syntactic structure and more types. Also stored against each entry in the individual word tables is the position of that word in the original corpus text.
- The original corpus text is listed, word-by-word, in a single relational table. Each word in this table is again linked to the other relational tables storing the different annotation types referred to in the previous paragraph.
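- The breakdown of a noun/verb phrase pair into phrase tables, word tables and a word-by-word corpus table can be sketched as follows. This is a minimal illustration, not the schema of the actual KMC: every table, column and function name here is an assumption, the annotation tables (parts-of-speech, base form and so on) are omitted for brevity, and the four-word corpus is invented.

```python
import sqlite3

SCHEMA = """
-- The corpus text, one row per word, keyed by its position.
CREATE TABLE corpus_word (position INTEGER PRIMARY KEY, word TEXT NOT NULL);
CREATE TABLE noun_phrase (id INTEGER PRIMARY KEY);
CREATE TABLE verb_phrase (id INTEGER PRIMARY KEY);
-- Highest-level table: each pair has a unique key and two foreign keys.
CREATE TABLE nv_pair (
    id             INTEGER PRIMARY KEY,
    noun_phrase_id INTEGER NOT NULL REFERENCES noun_phrase(id),
    verb_phrase_id INTEGER NOT NULL REFERENCES verb_phrase(id)
);
-- Individual phrase words, each stored with its position in the corpus.
CREATE TABLE noun_phrase_word (
    phrase_id       INTEGER NOT NULL REFERENCES noun_phrase(id),
    corpus_position INTEGER NOT NULL REFERENCES corpus_word(position)
);
CREATE TABLE verb_phrase_word (
    phrase_id       INTEGER NOT NULL REFERENCES verb_phrase(id),
    corpus_position INTEGER NOT NULL REFERENCES corpus_word(position)
);
"""


def phrase_text(conn, word_table, phrase_id):
    """Rebuild a phrase from its word rows, in corpus order."""
    if word_table not in ("noun_phrase_word", "verb_phrase_word"):
        raise ValueError(f"unknown word table: {word_table}")
    rows = conn.execute(
        f"SELECT c.word FROM {word_table} AS w "
        "JOIN corpus_word AS c ON c.position = w.corpus_position "
        "WHERE w.phrase_id = ? ORDER BY w.corpus_position",
        (phrase_id,),
    ).fetchall()
    return " ".join(word for (word,) in rows)


def pair_text(conn, pair_id):
    """Return (verb phrase, noun phrase) for one noun/verb phrase pair."""
    noun_id, verb_id = conn.execute(
        "SELECT noun_phrase_id, verb_phrase_id FROM nv_pair WHERE id = ?",
        (pair_id,),
    ).fetchone()
    return (phrase_text(conn, "verb_phrase_word", verb_id),
            phrase_text(conn, "noun_phrase_word", noun_id))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical corpus text: "agents answer email queries".
conn.executemany("INSERT INTO corpus_word VALUES (?, ?)",
                 list(enumerate(["agents", "answer", "email", "queries"])))
conn.execute("INSERT INTO noun_phrase VALUES (1)")
conn.execute("INSERT INTO verb_phrase VALUES (1)")
conn.execute("INSERT INTO nv_pair VALUES (1, 1, 1)")
conn.executemany("INSERT INTO noun_phrase_word VALUES (?, ?)", [(1, 2), (1, 3)])
conn.execute("INSERT INTO verb_phrase_word VALUES (1, 1)")

print(pair_text(conn, 1))
```

- In this sketch the word tables store only positions, so the word text itself lives in one place (the corpus table) and any annotation table keyed by position would apply to both views.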
- Further relational tables include those storing connections between noun/verb phrase pairs, and those storing connections between words.
- Each noun/verb phrase pair stored in the main relational table has a link to a relational table storing how connections and also to a relational table storing why connections.
- Each of these relational tables stores uniquely identified connections and their respective how and why noun/verb phrase pairs.
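- The how and why connection tables can be sketched in the same way. The names `how_connection`, `why_connection`, `link`, `how` and `why` are hypothetical, as are the sample pairs. The sketch also assumes that a how connection from pair A to pair B implies a why connection from B back to A, which the text above does not state explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nv_pair (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
-- Each connection is uniquely identified and names both pairs involved.
CREATE TABLE how_connection (
    id          INTEGER PRIMARY KEY,
    pair_id     INTEGER NOT NULL REFERENCES nv_pair(id),
    how_pair_id INTEGER NOT NULL REFERENCES nv_pair(id)
);
CREATE TABLE why_connection (
    id          INTEGER PRIMARY KEY,
    pair_id     INTEGER NOT NULL REFERENCES nv_pair(id),
    why_pair_id INTEGER NOT NULL REFERENCES nv_pair(id)
);
""")


def link(conn, why_id, how_id):
    """Record that how_id says HOW why_id is done, and the inverse WHY."""
    conn.execute(
        "INSERT INTO how_connection (pair_id, how_pair_id) VALUES (?, ?)",
        (why_id, how_id))
    conn.execute(
        "INSERT INTO why_connection (pair_id, why_pair_id) VALUES (?, ?)",
        (how_id, why_id))


def how(conn, pair_id):
    """Texts of the pairs that answer 'how?' for the given pair."""
    rows = conn.execute(
        "SELECT p.text FROM how_connection AS c "
        "JOIN nv_pair AS p ON p.id = c.how_pair_id "
        "WHERE c.pair_id = ? ORDER BY c.id", (pair_id,))
    return [text for (text,) in rows]


def why(conn, pair_id):
    """Texts of the pairs that answer 'why?' for the given pair."""
    rows = conn.execute(
        "SELECT p.text FROM why_connection AS c "
        "JOIN nv_pair AS p ON p.id = c.why_pair_id "
        "WHERE c.pair_id = ? ORDER BY c.id", (pair_id,))
    return [text for (text,) in rows]


# Hypothetical pairs, invented for this sketch.
conn.executemany("INSERT INTO nv_pair VALUES (?, ?)", [
    (1, "satisfy customers"),
    (2, "answer queries"),
    (3, "search knowledge model"),
])
link(conn, 1, 2)   # how to satisfy customers: answer queries
link(conn, 2, 3)   # how to answer queries: search knowledge model

print(how(conn, 1), why(conn, 3))
```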
- A relational table storing the syntactic structure of the text is also defined. This stores the types of links between individual words, i.e., subject or object links.
- An example of the noun/verb phrase pair connection table for how connections is shown below.
- Relation 4 defines the connection between the two noun/verb phrase pairs listed in the first table above.
- Collocation is stored in a single relational table. Collocation is defined as the frequency of repetition of word patterns, i.e., the number of times two words appear next to each other within the corpus.
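- Collocation as defined above, for the two-word case, amounts to counting adjacent word pairs and storing the counts in one table. The following is a minimal sketch: the function names, the `collocation` table layout, the lowercasing step and the sample sentence are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter


def count_collocations(words):
    """Count how often each pair of adjacent words occurs."""
    lowered = [w.lower() for w in words]  # assumption: case is ignored
    return Counter(zip(lowered, lowered[1:]))


def store_collocations(conn, counts):
    """Write the counts into a single relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS collocation ("
        " first_word TEXT NOT NULL,"
        " second_word TEXT NOT NULL,"
        " frequency INTEGER NOT NULL,"
        " PRIMARY KEY (first_word, second_word))")
    conn.executemany(
        "INSERT INTO collocation VALUES (?, ?, ?)",
        [(a, b, n) for (a, b), n in counts.items()])


def top_collocations(conn, limit):
    """Most frequent adjacent word pairs, ties broken alphabetically."""
    return conn.execute(
        "SELECT first_word, second_word, frequency FROM collocation "
        "ORDER BY frequency DESC, first_word, second_word LIMIT ?",
        (limit,)).fetchall()


# Hypothetical corpus text, invented for this sketch.
corpus = ("the knowledge model links the knowledge base "
          "to the knowledge model").split()
counts = count_collocations(corpus)

conn = sqlite3.connect(":memory:")
store_collocations(conn, counts)
print(top_collocations(conn, 2))
```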
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16709299P | 1999-11-23 | 1999-11-23 | |
US167092P | 1999-11-23 | | |
PCT/US2000/032252 WO2001039025A2 (en) | 1999-11-23 | 2000-11-22 | Methods and apparatus for storing and retrieving knowledge |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1240596A2 (de) | 2002-09-18 |
Family
ID=22605900
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP00980758A Withdrawn EP1240596A2 (de) | 1999-11-23 | 2000-11-22 | Methods and apparatus for storing and retrieving knowledge |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1240596A2 (de) |
JP (1) | JP2003515815A (de) |
AU (1) | AU1797801A (de) |
CA (1) | CA2392539A1 (de) |
WO (1) | WO2001039025A2 (de) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113761162B (zh) * | 2021-08-18 | 2023-12-05 | Zhejiang University (浙江大学) | A context-aware code search method |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB9322137D0 (en) * | 1993-10-27 | 1993-12-15 | Logical Water Limited | A system and method for defining a process structure for performing a task |
JP3225900B2 (ja) * | 1997-09-12 | 2001-11-05 | NEC Corporation (日本電気株式会社) | Event analysis method and apparatus |
2000
- 2000-11-22 JP JP2001540619A patent/JP2003515815A/ja active Pending
- 2000-11-22 AU AU17978/01A patent/AU1797801A/en not_active Abandoned
- 2000-11-22 CA CA002392539A patent/CA2392539A1/en not_active Abandoned
- 2000-11-22 WO PCT/US2000/032252 patent/WO2001039025A2/en active Application Filing
- 2000-11-22 EP EP00980758A patent/EP1240596A2/de not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
See references of WO0139025A2 * |
Also Published As
Publication number | Publication date |
---|---|
WO2001039025A3 (en) | 2002-01-10 |
AU1797801A (en) | 2001-06-04 |
JP2003515815A (ja) | 2003-05-07 |
CA2392539A1 (en) | 2001-05-31 |
WO2001039025A2 (en) | 2001-05-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6823325B1 (en) | Methods and apparatus for storing and retrieving knowledge | |
Strötgen et al. | Multilingual and cross-domain temporal tagging | |
Kowalski | Information retrieval architecture and algorithms | |
US6662152B2 (en) | Information retrieval apparatus and information retrieval method | |
JP4658420B2 (ja) | System for generating a normalized representation of a character string | |
RU2488877C2 (ru) | Identification of semantic relationships in reported speech | |
US20030101182A1 (en) | Method and system for smart search engine and other applications | |
Al-Zoghby et al. | Arabic semantic web applications–a survey | |
CN102253930B (zh) | Method and device for text translation | |
EP2162833A1 (de) | Method, system and computer program for intelligent text annotation | |
US20040117173A1 (en) | Graphical feedback for semantic interpretation of text and images | |
JP2009048441A (ja) | Information retrieval system, method and program, and method for providing an information retrieval service | |
Pyshkin et al. | Approaches for web search user interfaces | |
AU2005202353A1 (en) | Methods and apparatus for storing and retrieving knowledge | |
Demetriou et al. | Utilizing text mining results: The pasta web system | |
WO2001039025A2 (en) | Methods and apparatus for storing and retrieving knowledge | |
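KR20000063488A (ko) | Apparatus and method for automatically building a semantic knowledge database from electronic documents, and recording medium therefor | |
KR20200122089A (ko) | Method and apparatus for searching electronic documents using a local index | |
JP2000105769A (ja) | Document display method | |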
Amitay | What lays in the layout | |
Shidha et al. | Chem Text Mining-An Outline | |
JPH1145269A (ja) | Document management support system and computer-readable recording medium storing a program for causing a computer to function as that system | |
Chanod | Natural language processing and digital libraries | |
JP3281361B2 (ja) | Document retrieval apparatus and document retrieval method | |
Wouda | Similarity between Index Expressions |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20020614 |
AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
AX | Request for extension of the European patent | Free format text: AL;LT;LV;MK;RO;SI |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: DAVIES, TREVOR BRYAN |
RIN1 | Information on inventor provided before grant (corrected) | Inventor name: DAVIES, TREVOR BRYAN |
17Q | First examination report despatched | Effective date: 20050107 |
RIN1 | Information on inventor provided before grant (corrected) | Inventor name: DAVIES, TREVOR BRYAN; Inventor name: TROMANS, CHRISTOPHER R.; Inventor name: HANLON, MARK D. |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: HYPERKNOWLEDGE MANAGEMENT SERVICES AG |
GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
18D | Application deemed to be withdrawn | Effective date: 20071212 |