WO2005093601A2

WO2005093601A2 - Multi-agent text analysis

Info

Publication number: WO2005093601A2
Application number: PCT/GB2005/001103
Authority: WO
Inventors: George Rzevski; Petr Skobelev; Igor Minakov
Original assignee: Magenta Corporation Ltd
Priority date: 2004-03-26
Filing date: 2005-03-23
Publication date: 2005-10-06
Also published as: GB2412451A; WO2005093601A3; GB0406866D0

Abstract

A method of operating a computer system to generate a semantic descriptor representing a piece of text, the method comprising: instantiating a plurality of agents as executable programs, each agent representing a word in said piece of text, wherein each agent is operable to exchange messages with another agent and comprises a decision engine for executing a decision making process based on information from an ontology database and messages from said other agent to implement a relation with said other agent and to determine whether the relation is satisfactory or not based on grammatical rules stored in the ontology database.

Description

AUTOMATED TEXT ANALYSIS

The present invention is concerned with automated text analysis and is particularly but not exclusively concerned with software agents and a computer system for executing such agents for implementing text analysis.

Text analysis, in particular text understanding, is required in a variety of circumstances. Possible applications of text understanding include:

• Written communication between people and computers
• Written communication among computers
• Software translators
• Text referencing engines
• Search engines
• Auto-abstracting engines
• Annotation and classification systems
• Document-flow management systems

Despite a considerable research effort in areas such as computer linguistics, artificial intelligence and neural networks the problem of text understanding by computers has not been previously solved. The reason for this failure lies in the fact that the previously proposed solutions to this problem have always been strictly centralised, sequential and static.

One object of the present invention is to provide an improved method of text analysis, in particular text understanding.

One aspect of the invention provides a method of operating a computer system to generate a semantic descriptor representing a piece of text, the method comprising: instantiating a plurality of agents as executable programs, each agent representing a word in said piece of text, wherein each agent is operable to exchange messages with another agent and comprises a decision engine for executing a decision making process based on information from an ontology database and messages from said other agent to implement a relation with said other agent and to determine whether the relation is satisfactory or not based on grammatical rules stored in the ontology database.

Another aspect of the invention provides a computer system configured to generate a semantic descriptor representing a piece of text, the system comprising: a plurality of word agents implemented as executable programs, each comprising at least one property defining a category of a word represented by the agent; an ontology database storing grammatical rules defining acceptable relationships between words of different categories, wherein the word agents are operable to negotiate by exchanging messages, said messages containing said at least one property corresponding to the category of the word represented by the agent so that a decision engine can establish a relation based on said grammatical rules; and a store for holding a semantic descriptor generated by the word agents.

A still further aspect of the invention provides a method of automated comparison of two pieces of text, each piece of text being represented by a semantic descriptor in the form of an ontology network with objects representing categories of words in the text and relations based on grammatical rules linking said objects, the method comprising: determining kinships between objects and relations in the semantic descriptors of the pieces of text and assigning values, said values representing an indicator of how closely related the two pieces of text are.

The main idea of the approach proposed herein is that a software agent is assigned to each word of a section of the text under consideration. Agents have access to a comprehensive repository of knowledge (the ontology database) about possible meanings of words in the text, and engage in negotiation with each other until a consensus is reached on meaning of each word and each sentence. In some cases, the method may discover several contradictory meanings of a sentence. The conflict can then be resolved by an agent-triggered consultation with the user and consequent updating of the repository of knowledge.

In the preferred embodiment, to simplify the process of extracting meanings, the method performs an initial morphological and syntactic analysis of the text, followed by semantic analysis. After that, a pragmatics program implements user-defined applications using the semantic descriptor which has been generated.

In the embodiment described in the following, the following features are present:

Decision making rules are specified in ontology, which incorporates general knowledge on text understanding, language oriented rules and specific knowledge on the problem domain.

Every word in the text under consideration is given the opportunity to autonomously and practically search for its own meaning using knowledge available in the ontology database.

Tentative decisions are reached through a process of consultation and negotiation among all words.

The final decision on the meaning of every word is reached through a consensus among all words.

Semantic descriptors are produced for individual sentences and for the whole text.

The extraction of meanings follows an autonomous trial and error pattern (self-organisation).

The process of meaning extraction can be regulated by modifying ontology.

For a better understanding of the present invention and to show how the same may be carried into effect, reference will now be made by way of example to the accompanying drawings in which:

Figure 1 is a schematic block diagram of an architecture for implementing text analysis;
Figure 2 is a schematic diagram illustrating the process flow;
Figure 3 is a block diagram of an agent architecture;
Figure 4 is a block diagram of the architecture of an agent body;
Figure 4A is a flow chart illustrating agent operation;
Figure 5 illustrates a piece of text to be analysed;
Figure 6 illustrates a first morphological analysis phase;
Figure 7 illustrates a second, syntactical analysis phase;
Figure 8 illustrates a displayed semantic descriptor of one sentence;
Figure 8A is a schematic diagram of an ontology network;
Figures 8B and 8C are specific examples of a stored semantic descriptor;
Figure 9 illustrates a displayed semantic descriptor of an abstract;
Figure 10 is a displayed semantic descriptor of the abstract;
Figure 11 illustrates a semantic descriptor of a search enquiry;
Figure 12 is an example of ontology kinship;
Figure 13A illustrates a ranking of analysed abstracts; and
Figure 13B illustrates the comparison of semantic descriptors of the analysed abstracts.

In the following description the terms below are used and so they are defined here for ease of the reader. It is to be noted that agents which are suitable for use in the present invention are described fully in WO 03/067432, and are described herein to the extent necessary to understand the present invention.

An "agent" is a software object capable of contributing to the accomplishment of a task by:

• Accessing domain knowledge
• Reasoning about its task
• Composing meaningful messages
• Sending them to other agents or humans
• Interpreting received messages
• Making decisions based on domain knowledge and collected information
• Acting upon decisions in a meaningful manner

A "multi-agent system" is a system consisting of agents competing or cooperating with each other with a view to accomplishing system tasks. The main principle of achieving goals within such a system is negotiation among agents, aimed at finding a balance between the many different interests of individual agents.

"Ontology" is a conceptual description of a domain of the virtual world under consideration. Concepts are organised in terms of objects, processes, attributes and relations, thus forming a "semantic network". Values defining instances of concepts are stored in associated databases. Concepts and values together form the domain knowledge.

A "syntactic descriptor" is a network of words linked by syntactic relations representing a grammatically correct sentence.

A "semantic descriptor" is a network of grammatically and semantically compatible words, which represents a computer readable interpretation of the meaning of a text. A semantic ontology describes all possible meanings of words in a domain, and a semantic descriptor describes the meaning of a particular text.

"Self-organisation" is the capability of a system to autonomously, i.e. without human intervention, modify existing and/or establish new relationships among its components with a view to increasing a given value or recovering from a disturbance, such as an unexpected addition or subtraction of a component. In the context of text understanding any autonomous change of a link between two agents representing different meanings of words is considered as a step in the process of self-organisation.

"Evolution" is the capability of a system to autonomously modify its components and/or links in response, or in anticipation of changes in its environment. In the context of text understanding any autonomous update of ontology based on the newly acquired information is considered as a step in the process of evolution.

Before describing the preferred embodiments of the invention, a brief explanation of the underlying agent architecture will be given. A virtual world is, as the name suggests, an artificial context that is created in an attempt to simulate a real context. The virtual world in the case of the present invention is a context where agents representing words interact to determine the meaning of a text. These agents are software objects which are capable of interpreting information received from other agents and from external sources like environment events or dialog with a user.

The virtual world is created using agents having properties and attributes which can establish relations with one another by identifying potential partners using identifiable characteristics. Such a matching process is called a negotiation. A relation between agents is only established if all agents to the negotiation agree and the agreement will be granted only if the proposed relation meets predefined criteria. In the present case, a matching of certain characteristics between agents representing words is based on the grammatical and syntactical rules of language. Agents communicate with each other and with the virtual world by "messages".
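The negotiation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class name WordAgent, the GRAMMAR_RULES table and the relation names are hypothetical stand-ins for the agents, the ontology database and its grammatical rules.

```python
from dataclasses import dataclass, field

# Hypothetical rule table standing in for the ontology database:
# (category of proposer, category of responder) -> relation name.
GRAMMAR_RULES = {
    ("noun", "verb"): "subject-of",
    ("adjective", "noun"): "modifies",
}


@dataclass
class WordAgent:
    word: str
    category: str
    relations: list = field(default_factory=list)

    def propose(self, other):
        """Send a proposal message; record the relation only if the responder accepts."""
        message = {"sender": self.word, "category": self.category}
        relation = other.consider(message)
        if relation is not None:
            self.relations.append((relation, other.word))
            other.relations.append((relation, self.word))
        return relation

    def consider(self, message):
        """Accept a proposal only if a grammar rule permits the pairing."""
        return GRAMMAR_RULES.get((message["category"], self.category))


gene = WordAgent("gene", "noun")
encodes = WordAgent("encodes", "verb")
the = WordAgent("the", "article")

accepted = gene.propose(encodes)  # permitted by the ("noun", "verb") rule
rejected = the.propose(encodes)   # no rule for ("article", "verb"), so no relation
```

Here the message carries only the category of the sender, which is the minimum the responder needs to consult the rule table; a fuller sketch would also carry attributes such as gender and number.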

Figure 1 is a schematic diagram illustrating a text analysis system in accordance with one embodiment of the invention. Reference numeral 2 denotes a piece of text which can be in any appropriate electronic format. The text can start out in paper format and be converted to electronic format by a user entering it into a computer manually, or by some kind of optical character recognition process. The electronic format 32 of the text is supplied to a processor 34 which is connected to a memory 42 which is shown as a single memory block, but could be one or more memory blocks according to any particular design criteria. The memory 42 holds an ontology database 28 and a set of code sequences representing programs executable by the processor 34. The programs include a morphological analysis program P1, a syntactical analysis program P2, a semantic analysis program P3, a pragmatics analysis program P4, an agent creation program P5 and a system dispatcher P6. The programs P1 to P3 will be grouped together and referred to herein as the text understanding programs. They share some features in common as will become clear later. The processor 34 is connected to a display 36 and to a user interface 38. The user interface can take the form of any known user interface, for example a keyboard and/or mouse and/or press/touch display. The text 32 is analysed by the text understanding processes P1, P2 and P3, which are executed on the processor 34 in a manner which will be described in more detail in the following, and the results are displayed to a user on the display 36 according to the pragmatics analysis P4. In the case of uncertain or unsatisfactory results, these can be flagged to a user on the display 36 and a user can use the user interface 38 to modify the results.

The aim of the text analysis is to generate a semantic descriptor of the text 32 which can then be used for other purposes, either to display a meaning of a text to a user, for example in another language, or for comparison with similarly computer generated semantic descriptors of the text for search and comparison purposes. This is indicated diagrammatically in Figure 1 as a semantic descriptor 40 being the output of the process.

The processes which are executed in the processor 34 are described in detail below. Figure 2 is a schematic diagram illustrating the data flow. Figure 2 illustrates a plurality of agents A1, A2, A3a, A3b which are assigned to a plurality of words w1, w2, w3 in a sentence. Each agent can access the ontology database 28 to acquire knowledge relating to morphology, syntax or semantics depending on the stage of the text analysis process that the agent is implementing. The agents A1 ... A3b are created by the agent creation program P5. The agent creation program is executed on the processor 34 (Figure 1). Once created, agents are stored in the memory 42.

The text analysis process will now be described and consists of four stages:

• Morphological analysis
• Syntactic analysis
• Semantic analysis
• Pragmatics

The text is divided into sentences. Sentences are fed into the text analysis process one by one. In each of the following processes, agents negotiate with each other by transmitting messages defining their properties and attributes. Values are assigned according to the results of previous negotiations of agents, i.e. how good the solution found by the agents was. If all agents in the sentence agreed on their corresponding meanings and the meaning of the sentence, then the value is high. If there are some contradictions and conflicts which were not resolved, then the value is lower, depending on the number of conflicts and the degree of contradiction. In that case a new negotiation can be carried out to try and increase the value. The value is stored locally with individual agents on the ontology database.
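As a rough illustration of how such a value could behave, the following sketch scores a negotiated sentence from a list of unresolved conflicts. The formula, the function name sentence_value and the representation of each conflict as a degree of contradiction between 0 and 1 are all assumptions made for this example; the description does not specify how the value is computed.

```python
def sentence_value(conflicts):
    """Score a negotiated sentence.

    `conflicts` is a list of contradiction degrees in (0, 1], one per
    unresolved conflict. The result is 1.0 when there are no conflicts and
    falls as conflicts become more numerous or more severe. The formula is
    an illustrative assumption, not taken from the description.
    """
    penalty = sum(conflicts)
    return 1.0 / (1.0 + penalty)


no_conflicts = sentence_value([])           # every agent agreed
two_conflicts = sentence_value([0.5, 0.5])  # two moderate contradictions remain
```

A renegotiation step would then keep whichever outcome yields the higher value.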

Morphological Analysis (P1)

1. An agent A1 ... A3b is assigned to each word in the sentence.

2. The agents A1 ... A3b access the ontology database 28 and acquire relevant knowledge on morphology (at morphology section 28M).

3. The agents execute morphological analysis of the sentence and establish characteristics of each word, such as gender, number, case, tense, etc.

4. If morphological analysis results in polysemy, i.e. a situation in which some words could play several roles in a sentence (a noun or adjective or verb), several agents are assigned to the same word, each representing one of its possible roles. This is shown with word w3 which is assigned agents A3a, A3b.
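Steps 1 to 4 can be sketched as follows. This is a simplified illustration only: the MORPHOLOGY dictionary is a hypothetical stand-in for the morphology section of the ontology database, agents are reduced to plain dictionaries, and the example sentence is invented.

```python
# Hypothetical morphology section of the ontology: word -> possible readings.
MORPHOLOGY = {
    "the": [{"type": "article"}],
    "record": [
        {"type": "noun", "number": "singular"},
        {"type": "verb", "tense": "present"},
    ],
    "plays": [
        {"type": "verb", "tense": "present", "number": "singular"},
        {"type": "noun", "number": "plural"},
    ],
}


def create_agents(sentence):
    """Return one agent (a dict) per possible reading of each word."""
    agents = []
    for position, word in enumerate(sentence.lower().split()):
        # A word with several readings (polysemy) yields several agents.
        readings = MORPHOLOGY.get(word, [{"type": "unknown"}])
        for reading in readings:
            agents.append({"word": word, "position": position, **reading})
    return agents


agents = create_agents("The record plays")
```

In this example "record" and "plays" each receive two agents, one per possible role, in the same way that word w3 is assigned agents A3a and A3b.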

Syntactical Analysis (P2)

5. The agents A1 ... A3b access the ontology database 28 and acquire relevant knowledge on syntax (from the syntax portion 28SY).

6. The agents execute syntactical analysis where they aim at identifying the syntactical structure of the sentence. For example, a Subject searches for a Predicate of the same gender and number, and a predicate looks for a suitable Subject and Objects. Conflicts are resolved through a process of negotiation. A grammatically correct sentence is represented by means of a Syntactic Descriptor 31.

7. If results of the syntactical analysis are ambiguous, i.e. several variants of the syntactic structure of the sentence under consideration are feasible, each feasible variant is represented by a different Syntactic Descriptor.
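Steps 6 and 7 can be illustrated with a small sketch in which each subject candidate looks for a predicate candidate that agrees with it. The agent dictionaries and the function name syntactic_variants are hypothetical, and only agreement in number is checked, since the English example has no grammatical gender.

```python
from itertools import product


def syntactic_variants(agents):
    """Pair each subject candidate with each predicate candidate that agrees in number."""
    subjects = [a for a in agents if a["type"] == "noun"]
    predicates = [a for a in agents if a["type"] == "verb"]
    variants = []
    for subj, pred in product(subjects, predicates):
        # Two readings of the same word cannot form a subject-predicate pair.
        same_word = subj["position"] == pred["position"]
        if not same_word and subj["number"] == pred["number"]:
            variants.append({"subject": subj["word"], "predicate": pred["word"]})
    return variants


# Two polysemous words, each with a noun reading and a verb reading.
agents = [
    {"word": "results", "position": 0, "type": "noun", "number": "plural"},
    {"word": "results", "position": 0, "type": "verb", "number": "singular"},
    {"word": "show", "position": 1, "type": "noun", "number": "singular"},
    {"word": "show", "position": 1, "type": "verb", "number": "plural"},
]
variants = syntactic_variants(agents)
```

Both readings survive the agreement check here, so two variants are returned, corresponding to two different Syntactic Descriptors that are passed on for semantic analysis.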

Semantic Analysis (P3)

8. The agents A1 ... A3b access the ontology database 28 and acquire relevant knowledge on semantics (from the semantics portion 28SE).

9. Each grammatically correct version of the sentence under consideration is subjected to semantic analysis. This analysis is aimed at establishing the semantic compatibility of words in each grammatically correct sentence. The agents learn from the ontology database 28 possible meanings of words that they represent and by consulting each other attempt to eliminate inappropriate alternatives.

10. Once agents agree on a grammatically and semantically correct sentence, they create a Semantic Descriptor 33 of the sentence, which is a network of concepts and values contained in the sentence.

11. If a solution that satisfies all agents cannot be found, agents compose a message which is displayed to the user on display 36 explaining the difficulties and suggesting how the issues could be resolved.

12. Each new grammatically and semantically correct sentence generated by the steps 1 to 11 is checked for semantic compatibility with Semantic Descriptors of preceding sentences. In the process agents may decide to modify previously agreed semantic interpretations of words or sentences (self-organisation).

13. When all sentences are processed, the final Semantic Descriptor 40 of the whole document is constructed thus providing a computer readable semantic interpretation of the text.
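The elimination of inappropriate alternatives in step 9 can be sketched as a simple pruning loop. This is an illustration under strong assumptions: the COMPATIBLE set is a hypothetical stand-in for the semantics portion of the ontology, only adjacent words consult each other, and the concept names are invented for the example.

```python
# Hypothetical semantics section: ordered pairs of concepts that may be linked.
COMPATIBLE = {
    ("Vector (plasmid)", "Have"),
    ("Have", "Gene"),
}


def compatible(left, right):
    return (left, right) in COMPATIBLE


def prune(candidates):
    """Remove meanings that no meaning of an adjacent word is compatible with.

    `candidates` holds one set of possible concepts per word, in sentence
    order. The loop repeats until no set changes; sets only ever shrink,
    so it always terminates.
    """
    changed = True
    while changed:
        changed = False
        for i in range(len(candidates) - 1):
            left, right = candidates[i], candidates[i + 1]
            keep_left = {a for a in left if any(compatible(a, b) for b in right)}
            keep_right = {b for b in right if any(compatible(a, b) for a in keep_left)}
            if keep_left != left or keep_right != right:
                candidates[i], candidates[i + 1] = keep_left, keep_right
                changed = True
    return candidates


meanings = prune([
    {"Vector (plasmid)", "Vector (mathematics)"},  # "vector" is ambiguous
    {"Have"},
    {"Gene"},
])
```

If any set ends up empty, no consistent interpretation was found, which corresponds to the situation in step 11 where the agents report the difficulty to the user.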

Pragmatics (P4)

14. The agents access the ontology database 28 and acquire relevant knowledge on pragmatics, which is closely related to the application at hand.

15. At this stage agents consider their application-oriented tasks and decide if they need to execute any additional processes. For example, if the application is a Person-Computer dialogue, agents may decide that they need to ask the user to supply some additional information; if the application is a Search Engine, agents will compare the Semantic Descriptor of the search request with Semantic Descriptors of available search results; if the application is a Classifier, agents will compare Semantic Descriptors of different documents and form groups of documents with semantic proximity.
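For the Search Engine case, a very reduced sketch of the comparison is given below. It assumes, purely for illustration, that a Semantic Descriptor can be flattened to a set of (subject class, relation, object class) triples and that only exact matches count; the function name match_score, the example query and the example abstracts are hypothetical.

```python
def match_score(query, document):
    """Fraction of query triples (subject, relation, object) found in the document."""
    if not query:
        return 0.0
    return len(query & document) / len(query)


query = {("Organism", "Have", "Sequence")}
abstracts = {
    "A": {("Organism", "Have", "Sequence"), ("Gene", "Encode", "Polymerase")},
    "B": {("Vector", "Have", "Gene")},
}

# Rank abstracts from the best match to the worst.
ranking = sorted(
    abstracts,
    key=lambda name: match_score(query, abstracts[name]),
    reverse=True,
)
```

Exact matching is the simplest possible criterion; partial matches between related concepts would need additional knowledge from the ontology.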

Figure 3 is a schematic diagram of a basic agent architecture. An agent A comprises two main parts: an agent descriptor 4 and an agent body 6. The agent descriptor 4 is associated with a property store 8 which holds the word type of the word which the agent represents after this has been accessed from the morphology part of the ontology database 28. The word type defines the morphological and syntactical properties of the word, e.g. object, subject, noun, verb, etc. If necessary, more than one property can be stored for an object. The property store 8 is associated with a set of attributes 10 which represent the characteristics, e.g. gender, number, etc., of the word which is represented by the agent. The agent A also has an agent body 6 which comprises a set of elements common to all agents and which has access to the ontology database 28.

Figure 4 shows the components of the agent body 6. The agent body has sensors 20, actuators 22, a scene memory 24, a decision making machine 26, a fact memory 30 and a command memory 14. In the present embodiment, the sensor 20 is a vision sensor which an agent uses to read a current scenario and receive simple data without time-consuming negotiations with other agents. That is, the visual sensor 20 allows an agent to interact directly with other agents by viewing any open data fields. A visual sensor is the mechanism which is used by agents to read the open data fields of an agent descriptor 4. Typically, the sensing mechanism consists of a software procedure and data structure built into the agent body. Alternatively, the vision sensor mechanism can be transferred to an agent upon request from a base class held elsewhere in the system.

In the present embodiment, the actuator 22 represents a means of accessing an agent's database. In the present case, the actuator 22 takes the form of a software procedure and data structure which allows an agent to send messages to another agent so as to communicate with it to establish relations. The scene memory 24 holds information about other agents in the environment with whom the present agent might wish to establish relations. The decision making machine 26 is the core of the agent body and interfaces with the other components of the agent body. It uses its own knowledge to make a decision based on its received inputs and to implement the required course of action to be output. In reaching a decision, the decision making machine 26 can select from the command memory 14 the required process for a particular situation. Each of the analysis stages is held in the form of a code sequence in the command memory (forming part of memory 42) to be implemented by the agent under the control of the decision making machine. These stages (morphology, syntax, etc.) apply not to each word individually, but to the sentence as a whole. Therefore all agents first go through the morphology stage, establishing morphological properties; after that they proceed to the syntactical analysis stage and decide on the syntax of the sentence, and so on. The fact memory 30 stores the facts which were found by agents during each stage, for example the best results out of several possible options.

The operation of each agent is based on a clock cycle which for example in the present embodiment is 300 μs. This clock cycle is allotted to all agents by the system dispatcher P6. The system dispatcher can be considered to be a unique and independent software object, which is executed on the processor to organise the activities of the agents.

Figure 4A shows a flow chart describing the steps of an agent operational cycle. Step S40 indicates the start of the agent operational cycle when an agent receives a clock edge from the system dispatcher. All of the active word agents operate in synchronisation by working in parallel for each clock cycle. At step S44 each agent decides whether any important events have occurred, for example the creation of new word agents. This decision is affected by the input sensors; in Figure 4A a simple agent having a mailbox input sensor S42 is shown. Each agent performs a check on whether the clock cycle has run out from step S60 and receives any inputs from the mailbox at S42. If there is a new event in the scene then at step S46 the current scenario is delayed and at step S48 a new scenario is selected based on the agent ontology 28. A scenario is a set of information or data defining one possible view of the virtual world. Scenarios are used for negotiation procedures (giving different abilities to create "groups" of agents), or for selecting the best option when agents have found several alternatives. If there are no new events at step S44, the agent then proceeds to step S54 where it chooses the next command of the current scenario. Also, at the end of step S50, once the new scenario has been executed, the agent moves to step S54. At step S56, the next command of the current scenario is executed and output to an email actuator mechanism 22. At step S60 the agent checks whether the clock cycle has expired. If not, then the algorithm returns to step S44 where the agent checks whether any other important events have entered the scene. If the clock has expired, step S62 stipulates that the end of the agent operational cycle has been reached.
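The operational cycle can be sketched as a simple loop. The sketch below is a loose reading of the flow chart under stated assumptions: the function name run_cycle and its parameters are hypothetical, a step budget (max_steps) stands in for the timed clock cycle so that the example is deterministic, and "delaying" the current scenario is modelled by placing the commands of the new scenario in front of the remaining ones.

```python
from collections import deque


def run_cycle(mailbox, scenarios, current, max_steps):
    """Run one agent operational cycle, loosely following steps S40 to S62.

    mailbox   -- deque of event names (the mailbox input sensor, S42)
    scenarios -- dict mapping an event name to the list of commands handling it
    current   -- deque of commands remaining in the current scenario
    max_steps -- stand-in for the clock budget (an assumption; the source uses time)
    """
    executed = []
    steps = 0
    while steps < max_steps:                  # S60: has the clock cycle expired?
        if mailbox:                           # S44: has a new event occurred?
            event = mailbox.popleft()
            if event in scenarios:            # S46/S48: delay current, select new scenario
                current.extendleft(reversed(scenarios[event]))
        if not current:
            break                             # nothing left to execute in this cycle
        executed.append(current.popleft())    # S54/S56: choose and execute next command
        steps += 1
    return executed, current                  # S62: end of the operational cycle


executed, remaining = run_cycle(
    mailbox=deque(["new_agent"]),
    scenarios={"new_agent": ["greet", "propose"]},
    current=deque(["a", "b", "c"]),
    max_steps=3,
)
```

In this run the new event pre-empts the current scenario, its two commands are executed first, and the delayed scenario resumes with its first command before the budget is used up.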

A specific example of text analysis will now be given. Figure 5 illustrates a text 32 of a particular article. The text includes a number of headings H1, H2 etc., each heading associated with a text portion TP1, TP2 etc. Some of the text portions (e.g. TP3, TP5) have more than one sentence, the sentences being labelled S1, S2 etc. in each text portion. Consider the sentence in the text portion TP2.

Initially each word is assigned to an agent. This is shown in Figure 6. The agents A1 ... A7 access the morphology part of the ontology database 28 which returns properties and attributes associated with each of the words. For example, the word "containing" would return a property type of "verb", with an attribute of "gerund". The property type is stored in the property store 8 and the attributes are stored in the attribute locations 10 as illustrated in Figure 3. A similar process is carried out for each of the other words. The agents then interact to form relations using information recalled from the syntax part of the ontology database.

Figure 7 is a schematic diagram illustrating the syntactical analysis stage. At this level the co-operation of words is aimed at defining their syntactical roles in the sentence ('subject', 'predicate'). For example, rules for deciding what a 'subject' is are given in the ontology database 28. Rules stipulate that a 'subject' must search for a 'predicate' of the same gender and number. A 'predicate' looks for a 'subject' and for various types of 'objects'. On the basis of such rules, a word decides which other words it can be combined with, and what characteristics they should possess for that to happen. If the match is satisfactory for both parties, then the value of this match is greater. Then the new process of negotiation begins with agents looking for a 'pair' of words. Here, as before, several alternatives of the syntactic structure of the phrase may be found, when several words respond to a request of the subject agent. The result of this analysis is a phrase (or variants of a phrase), which is represented as a network with each word agent viewed as a member of a sentence - subject, predicate, object.

After the syntactical analysis has been carried out, semantic analysis is carried out with reference to the semantic part of the ontology database 28. This generates a semantic descriptor for the text portion TP2 which is displayed in the form shown in Figure 8. The semantic descriptor takes the form of a network of linked concepts and values, and is stored in two formats. The graphical display is stored as a binary stream, while the concepts and values are stored in the form of an ontology network 50 shown in Figure 8A. Figure 8A is a general schematic diagram of the structure of the ontology network 50 as held in the ontology database 28. This structure can be applied to a number of different applications, and in the case of text analysis it is applied in relation to a semantic descriptor as will now be described. For each object 52 its class in the ontology is specified, it is given a unique identifier (in case there are several instances of the same class in the scene), and values of known properties of each object are specified. These properties are labelled 54 in Figure 8A. Then, all instances of the relations relating to the object need to be specified. A relation 56 is shown in Figure 8A associated with the object 52 and linking to the instances of the relation 58 where the class of the relation is specified. For the members of the relation, the instance identifiers of the objects which the particular relation links are specified, and their corresponding roles in the relation (subject or object of the relation) are also specified. This is done using the relation descriptor 60 with its properties 62, and the subject and objects of relation 64.

Figure 8B shows a specific example of scene objects storage for each of the objects 52. In the example of Figure 8 there are two objects 52A, 52B which are represented by the nodes N which are labelled Locus 1 and Gene 2. The object Locus 1 has no property, while the object Gene 2 has a first property 54A, Reporter with Value = True, and a second property 54B, name with Value = xylE. Figure 8C illustrates the scene relations storage, showing a relation 56 which is the relation Have with the instance identifier 4, which has a subject of the relation 64A, Locus 1, and an object of the relation 64B, Gene 2. As can be seen from Figure 8A, the subject and object of the relation 64A, 64B link back to object classes, in this case to the class instances 52A, 52B where the properties are defined.
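The storage just described can be pictured with two small record types. This is only an illustrative sketch: the class names SceneObject and SceneRelation, their field names and the describe helper are hypothetical, while the example values (Locus 1, Gene 2, Have 4 and the two properties of the gene) are those of Figures 8B and 8C.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    cls: str            # class of the object in the ontology
    instance_id: int    # unique identifier within the scene
    properties: dict = field(default_factory=dict)


@dataclass
class SceneRelation:
    cls: str            # class of the relation in the ontology
    instance_id: int
    subject: SceneObject
    obj: SceneObject    # object of the relation


locus = SceneObject("Locus", 1)  # no properties
gene = SceneObject("Gene", 2, {"Reporter": True, "name": "xylE"})
have = SceneRelation("Have", 4, subject=locus, obj=gene)


def describe(rel):
    """Render a relation as 'Subject id -Relation id-> Object id'."""
    return (
        f"{rel.subject.cls} {rel.subject.instance_id} "
        f"-{rel.cls} {rel.instance_id}-> "
        f"{rel.obj.cls} {rel.obj.instance_id}"
    )


summary = describe(have)
```

Because the relation holds references to the object records, the properties of the subject and object remain defined in one place, as in the ontology network of Figure 8A.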

The semantic descriptor as illustrated takes the form of a network of nodes N interconnected by links L. The nodes N that are illustrated in Figure 8 are Locus 1, Have 4 and Gene 2. Each node consists of a word identifying the nature of the node and a reference numeral which acts as the unique instance identifier, because in general in the scene there could be several objects or relations of the same class with different attribute values, so these numerals are included to make them visually different. Thus, this semantic descriptor indicates the relations between the gene and locus and gene properties. That is, the gene has the property Reporter=TRUE (see display field 35), and the gene has the gene name "xylE" (see display field 38). Thus, by establishing relations between the word agents, the sentence has been completely understood and the relations between the gene and locus and gene properties have been determined.

The system proceeds to analyse the remainder of the text in the same way, sentence by sentence. It is possible that analysis of a subsequent sentence may throw some light on the semantic descriptor of a preceding sentence. Figure 9 illustrates the semantic descriptor of the text during analysis of the final sentence, S4, of the abstract. In this case, the following links were added to the semantic descriptor during the analysis of this last sentence: the link between Gene 2 and Have 44, the link between Gene 2 and Insert 41, the link between Locus 31 and Have 44, the link between Locus 31 and Have 52 and Operon 45, and the links between Downstream from 49, Operon 45 and Result 47. As a result of the analysis of the last sentence, the system discovered some new concepts and new relations between the existing nodes of the descriptor, including a new relation "Have" between the gene and the locus. Furthermore, the gene has obtained a new Insert relation and the relation "Have" has been established between the locus and the new node, Operon.

Figure 10 illustrates the display 36 showing the semantic descriptor of the abstract, that is the text portion TP5. Once again, the semantic descriptor takes the form of a plurality of nodes, each node representing a concept. For example, the node which is labelled "Vector" has a name "pUC-derived" and this has a gene which is the node labelled Gene in Figure 10 with the properties promoterless=true, name=xylE and the property Transcribed=true (see fields 35, 35', 38). The semantic descriptor indicates that the gene encodes (see the node labelled Encode which is linked to the node labelled Gene) the polymerase named 3-dioxygenase, which belongs to the bacteria=Pseudomonas putida. This can be seen by the node labelled bacterium which is connected by a link to a node labelled Have which is connected by a link to the node labelled Polymerase. The gene also has the terminator with the property Transcribed=true (see the node Have connected between the node labelled Gene and the node labelled Terminator), which is linked to lambda phage. Note that the display program allows showing properties of only one object at a time. On the screen of Figure 10 one can see only one object with properties, namely the Gene with name=xylE, Promoterless and Transcribed. Other properties would show only if the program selected the corresponding node, by a mouse click or any suitable selection procedure.

Once the semantic descriptor has been created and stored in the form of an ontology network 50, it can be utilised for a number of different purposes. In the following example, it forms one of a number of abstracts which can be searched using a search semantic descriptor. In order to conduct such a search, a search semantic descriptor which represents the inquiry needs to be formulated. Figure 11 shows the semantic descriptor of a request to search for abstracts in which an Organism is connected with a Sequence through the relation Have. This semantic descriptor is matched against the semantic descriptors of a number of abstracts. Comparing a semantic descriptor of a query with a semantic descriptor of a text is essentially a comparison of two semantic networks. A search is carried out for sets of concepts and relations which are isomorphic or, at least, exhibit a level of kinship (partial match). Levels of kinship between concepts are specified in the ontology, which will now be discussed in more detail with reference to Figure 12. Figure 12 represents a part of the problem domain ontology. In the scene there are instances of classes of ontology concepts, and the kinship between them is determined on the basis of the ontology. There are several types of ontology kinship:

Family - means that two instances of objects in the scene have one common ascendant in the ontology tree (e.g. Activation site, Terminator and Operon belong to the same family of the Sequence class).

Brother - means that two instances of objects in the scene have a common ascendant, and these objects are located on one level of the hierarchy (e.g. Locus, Operon, Transcript, Gene and Site are brothers). Other brothers illustrated in Figure 12 are, for example, Promoter, Operator, Activation site and Repression site.

Parent-Child - means that one class is a direct descendant of the other class (e.g. Site for Sequence, Binding site for Site and Promoter for Binding site).

Heir - means that one of the classes is a descendant of the other class. This differs from the parent-child kinship because the class does not need to be a direct descendant but could be multiple levels below. Examples would be Operator for Site or Promoter for Sequence.

Uncle - means that one class is a brother of the parent of the other class (for example Operon for Terminator).

Equality - means that two instances of objects in the scene belong to the same class of the ontology, i.e. it is an exact match.

Using these types, a value is assigned for the validity of each type, so that it is possible to calculate the distance between any two concepts in the ontology. This assists in calculating the distance between two semantic descriptors by calculating distances between partially corresponding objects and making a sum of these distances. In this way, values can be determined for semantic descriptor matching.
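The kinship types and the summed distance can be illustrated by the following minimal sketch. It rests on several assumptions that are not taken from the description: the `PARENT` table is a hypothetical reconstruction of part of Figure 12 (in particular, placing Terminator directly under Site is a guess that is merely consistent with the Uncle example), the numeric values in `KINSHIP_DISTANCE` are invented for illustration because no values are given, and the function names are hypothetical. Where several kinship types apply to a pair, the sketch returns the most specific one.

```python
# Hypothetical fragment of the Figure 12 ontology: child class -> parent class.
PARENT = {
    "Locus": "Sequence", "Operon": "Sequence", "Transcript": "Sequence",
    "Gene": "Sequence", "Site": "Sequence",
    "Binding site": "Site",
    "Terminator": "Site",            # assumption: exact parent is not stated
    "Promoter": "Binding site", "Operator": "Binding site",
    "Activation site": "Binding site", "Repression site": "Binding site",
}

# Invented distance values per kinship type (smaller means closer).
KINSHIP_DISTANCE = {
    "equality": 0, "parent-child": 1, "brother": 2, "heir": 2,
    "uncle": 3, "family": 4, "unrelated": 10,
}


def ancestors(cls):
    """Return the chain of ancestors of cls, nearest first."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def is_brother(a, b):
    """Same hierarchy level and a common ancestor, as in the Brother definition."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return a != b and len(anc_a) == len(anc_b) and bool(set(anc_a) & set(anc_b))


def kinship(a, b):
    """Classify the most specific kinship between two ontology classes."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    if a == b:
        return "equality"
    if anc_a[:1] == [b] or anc_b[:1] == [a]:
        return "parent-child"
    if b in anc_a or a in anc_b:
        return "heir"
    if is_brother(a, b):
        return "brother"
    if (anc_b and is_brother(a, anc_b[0])) or (anc_a and is_brother(b, anc_a[0])):
        return "uncle"
    if set(anc_a) & set(anc_b):
        return "family"
    return "unrelated"


def descriptor_distance(pairs):
    """Sum the concept distances over pairs of corresponding objects."""
    return sum(KINSHIP_DISTANCE[kinship(a, b)] for a, b in pairs)


pairs = [("Gene", "Gene"), ("Promoter", "Operator"), ("Operon", "Terminator")]
print([kinship(a, b) for a, b in pairs])   # equality, brother, uncle
print(descriptor_distance(pairs))          # 0 + 2 + 3 = 5
```

The sketch takes the pairing of corresponding objects as given; how that pairing is found is a separate matter.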

The comparison is executed by agent negotiations. An agent represents each concept in a semantic descriptor, and each Concept Agent of each descriptor tries to find its best match among the other semantic descriptors.
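The negotiation protocol itself is not detailed here. Purely as an illustration of how Concept Agents might each seek a best match, the sketch below substitutes a simple proposal-and-refusal scheme (in the style of deferred acceptance): each query concept proposes to its closest candidate, a candidate keeps only the closest proposer, and a refused concept moves on to its next choice. The function name `negotiate`, the cost table and all its values are hypothetical, and this scheme is not guaranteed to minimise the total distance.

```python
def negotiate(costs):
    """Match query concepts to candidate concepts by proposal and refusal.

    costs[q][c] is the distance between query concept q and candidate c.
    Returns a dict mapping each matched query concept to its candidate.
    """
    # Each query agent ranks the candidates from closest to farthest.
    preferences = {q: sorted(row, key=row.get) for q, row in costs.items()}
    next_choice = {q: 0 for q in costs}
    holder = {}                       # candidate -> query agent currently accepted
    free = list(costs)
    while free:
        q = free.pop(0)
        if next_choice[q] >= len(preferences[q]):
            continue                  # q has been refused by every candidate
        c = preferences[q][next_choice[q]]
        next_choice[q] += 1
        rival = holder.get(c)
        if rival is None:
            holder[c] = q             # candidate was unmatched and accepts
        elif costs[q][c] < costs[rival][c]:
            holder[c] = q             # candidate switches to the closer proposer
            free.append(rival)
        else:
            free.append(q)            # refused; q will try its next choice
    return {q: c for c, q in holder.items()}


# Hypothetical distances between three query concepts and three concepts
# found in the semantic descriptor of one abstract.
costs = {
    "Organism": {"Bacterium": 1, "Gene": 10, "Terminator": 10},
    "Sequence": {"Bacterium": 10, "Gene": 1, "Terminator": 2},
    "Site":     {"Bacterium": 10, "Gene": 2, "Terminator": 3},
}
matching = negotiate(costs)
total = sum(costs[q][c] for q, c in matching.items())
print(matching, total)
```

In this example Sequence and Site both propose to Gene first; Gene keeps Sequence, the closer of the two, and Site falls back to Terminator. Summing the distances of the agreed pairs gives a single value per abstract, which could then be used to order abstracts by closeness to the inquiry.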

A concept object has the same structure as a word agent, but stores properties, characteristics and attributes of a concept instead of a word and has different code sequences in its command memory to implement matching.

There are standard criteria for determining kinships in textual analysis. In the present case, the search for isomorphism or kinship is done by agents. Figure 13A shows the ranking of analysed abstracts according to the degree of matching to the inquiry. Figure 13B shows the comparison of semantic descriptors of analysed abstracts, showing similar concepts in the selected semantic descriptor to that of the inquiry.

Claims

1. A method of operating a computer system to generate a semantic descriptor representing a piece of text, the method comprising: instantiating a plurality of agents as executable programs, each agent representing a word in said piece of text, wherein each agent is operable to exchange messages with another agent and comprises a decision engine for executing a decision making process based on information from an ontology database and messages from said other agent to implement a relation with said other agent and to determine whether the relation is satisfactory or not based on grammatical rules stored in the ontology database.

2. A method according to claim 1, wherein in a first phase each agent determines from a morphology section of the ontology database the category of words that it represents.

3. A method according to claim 2, wherein there is associated with at least one category at least one attribute for the word represented by the agent.

4. A method according to claim 2 or 3, wherein in a second phase each agent determines from a syntax section of the ontology database a subset of said grammatical rules which govern the syntactical relation between words represented by the agents.

5. A method according to claim 4, wherein in a third phase each agent determines from a semantics section of the ontology database a further subset of the said grammatical rules which govern semantic relationships between words represented by the agents.

6. A method according to claim 1, which further comprises the step of using the semantic descriptor in a user defined application.

7. A method according to claim 6, wherein said user defined application is a search application.

8. A computer system configured to generate a semantic descriptor representing a piece of text, the system comprising: a plurality of word agents implemented as executable programs, each comprising at least one property defining a category of a word represented by the agent; an ontology database storing grammatical rules defining acceptable relationships between words of different categories, wherein the word agents are operable to negotiate by exchanging messages, said messages containing said at least one property corresponding to the category of the word represented by the agent so that a decision engine can establish a relation based on said grammatical rules; and a store for holding a semantic descriptor generated by the word agents.

9. A computer system according to claim 8, wherein the ontology database comprises a morphology section for use in assigning categories to words represented by agents.

10. A computer system according to claim 8 or 9, wherein the ontology database comprises a syntax section holding a subset of said grammatical rules which govern the syntactical relation between the words represented by the agents.

11. A computer system according to claim 8 or 9, wherein the ontology database comprises a semantics section which holds a subset of said grammatical rules which govern semantic relationships between words represented by the agents.

12. A computer system according to claim 8, wherein the store for holding the semantic descriptor has a structure in the form of an ontology network which has objects representing said word categories held in association with said relations.

13. A computer system according to claim 8, which comprises a display for displaying a graphical format of said semantic descriptor in the form of nodes and links.

14. A computer program product comprising program code means in the form of a sequence of computer instructions which, when loaded into a computer, implement the steps of independent claim 1.

15. A method of automated comparison of two pieces of text, each piece of text being represented by a semantic descriptor in the form of an ontology network with objects representing categories of words in the text and relations based on grammatical rules linking said objects, the method comprising: determining kinships between objects and relations in the semantic descriptors of the pieces of text and assigning values, said values representing an indicator of how closely related the two pieces of text are.