WO2002049253A2 - Method and interface for intelligent user-machine interaction - Google Patents


Info

Publication number
WO2002049253A2
Authority
WO
WIPO (PCT)
Prior art keywords
user
computerized system
objects
input
textual
Application number
PCT/IL2001/001164
Other languages
French (fr)
Other versions
WO2002049253A3 (en)
Inventor
Ofer Alt
Simon Rapoport
Oren Shamir
Ilya Knyazhansky
Original Assignee
Poly Information Ltd.
Application filed by Poly Information Ltd. filed Critical Poly Information Ltd.
Priority to AU2002222491A1
Publication of WO2002049253A2
Publication of WO2002049253A3

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/1822 - Parsing for meaning understanding
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • Reference components refer to data that can be stored externally (such as in external databases, Internet sites, etc.) and are used during the generation of grammar patterns for including concrete values with their abstract objects.
  • Each Data Source 411a to 411n is capable of providing such concrete values during the grammar generation process. It is important to note that the generation of reference components can be done off-line, when performance considerations are less critical. During the application run-time, all reference data is easily accessible.
  • the Grammar Generation system 400 operates as follows:
  • All GPatterns 402a to 402n are activated. All Ontology objects 401a to 401d are processed and grouped by numerous criteria defined by GPatterns 402a to 402n.
  • Lexical Parser 407 queries Lexomas 404 from the database.
  • Every Lexoma 404 is processed by Lexical Parser 407, which comprises Answer Generator 403, Lexical Inflector 405 and Thesaurus 406, according to the instructions (provided by the LML) that are stored in Lexoma 404. Information gathered by GPatterns 402a to 402n is inserted into the relevant parts of the Lexoma 404, as indicated by the LML.
  • Data Sources 411a to 411n are used to supply concrete information to Lexical Parser 407.
  • The resulting output of Lexical Parser 407 is Poly Grammar 408, which can be converted to a Specific ASR Grammar 409.
  • Specific ASR Grammar 409 is generated and provides everything necessary for interacting with a user.
  • Fig. 5 is a block diagram of a human-machine conversation system according to a preferred embodiment of the invention.
  • the human-machine conversation system 500 comprises an ASR system 501, which is based upon Specific ASR Grammar 409 (converted from Poly Grammar 408), Meaning Resolvers (MRs) 504a to 504n and MR Manager 503.
  • MRs 504a to 504n are responsible for "solving" different kinds of logic patterns (GPatterns), derived from an identified question (the ASR 501 converts User's 550 verbal question into an identified question).
  • MR Manager 503 is responsible for deciding which MR among MRs 504a to 504n will "solve" the logic pattern (GPattern) derived from the identified question, and provides MRs 504a to 504n with all the information necessary for "solving".
  • MRs 504a to 504n differ considerably in their complexity, from "solving" simple logic patterns (GPatterns) to "solving" complex transactions and maintaining an intelligent dialogue with the User 550.
  • Each MR 504a to 504n operates on Ontology component 401 entities of system 500, as well as accessing Information Server 410 (for obtaining information) and External System 520 for performing Transactions. Its capabilities also include maintaining the dialog context and resolving ambiguity, misunderstanding or missing information.
  • MRs 504a to 504n perform "solving" in a specific context (for example, a particular user's context). This context may imply information filtering according to the user's personal preferences, or restricting access to particular kinds of information or actions by consulting with External Application 310.
  • All MRs 504a to 504n form an MR Suite (MRS) 505, which is an integrated part of the system 500 process.
  • MRS 505 is responsible for coordinating the participating MRs 504a to 504n.
  • MRS 505 members are able to cooperate by exchanging information received from the User 550 or other sources, capture the User's 550 behavior during the dialog and lead the User 550 to a particular predefined goal.
  • MRs 504a to 504n are able to provide intelligent answers. This functionality includes using different types of answers, such as slang, humor, short answers or more descriptive answers according to user preferences, or even answers adapted by capturing parameters related to the user's mood from the previous dialog; using the proper tense, number and person; substituting synonyms in the answer sentences; and providing different forms of answer for the same question.
  • System 500 comprises Lexical Parser 407, according to a preferred embodiment of the invention, wherein the Lexical Parser 407 is now used to generate the answer on-line (instead of generating the Poly Grammar off-line).
  • the human-machine conversation system 500 operates as follows (a minimal sketch follows this list):
  • User 550 calls the human-machine conversation system 500.
  • ASR 501 recognizes the question/statement of User 550.
  • the identified question/statement of User 550 is forwarded to the MR Manager 503.
  • MR Manager 503 decides to which specific MR 504a to 504n this identified question/statement will be forwarded, and accordingly activates this specific MR.
  • the MR 504a to 504n receives information from MR Manager 503, from the relevant Data Sources 411a to 411n and from the discourse context of the conversation. In this way, the specific MR 504a to 504n tries to obtain the information required for the preparation of a meaningful answer to User 550.
  • the MR 504a to 504n generates a response to the User 550 in order to complete the required information.
  • To transform data into the human's natural language, the standard mechanism of Lexical Parser 407 is activated.
  • the generated answer is passed to User 550 by, for example, TTS.
  • the present invention can also be used for other human-machine interfaces, such as an automatic e-mail reader and responder, or automatic support on web-sites using regular "Chat" web tools.
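
To illustrate the dispatch step in the operation listed above, here is a minimal Python sketch of an MR Manager choosing a Meaning Resolver by logic pattern; the pattern names, resolver bodies and question encoding are assumptions for illustration, not the patent's implementation:

    # Sketch of the conversation loop of Fig. 5: the MR Manager dispatches
    # the identified question to the Meaning Resolver that can "solve" its
    # logic pattern, and the answer is handed on for lexical generation/TTS.
    def mr_property(question, data):               # solves 'property of object'
        return data.get(tuple(question["args"]), "unknown")

    def mr_fallback(question, data):               # keeps the dialog going
        return "could you rephrase that?"

    RESOLVERS = {"property_of_object": mr_property}

    def mr_manager(question, data):
        """Pick the MR for the question's logic pattern and let it solve."""
        resolver = RESOLVERS.get(question["pattern"], mr_fallback)
        return resolver(question, data)

    data = {("sky", "color"): "blue"}
    question = {"pattern": "property_of_object", "args": ["sky", "color"]}
    print(mr_manager(question, data))              # -> blue, then on to TTS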

Abstract

A method for allowing interaction between a user and a computerized system (202) by using human natural language and/or textual data exchange. A conversation domain consisting of a plurality of phrases having valid logical meaning is generated. Each phrase corresponds to an aspect of the operation of, and/or the goods/services provided by, the computerized system (202). Data is exchanged between the user and the computerized system (202), and the computerized system (202) is operated by using at least one of the phrases.

Description

METHOD AND INTERFACE FOR INTELLIGENT USER-MACHINE
INTERACTION
Field of the Invention
The present invention relates to the field of human-machine interfaces. More particularly, the invention relates to a method and interface for intelligent user-machine interaction in natural language.
Background of the Invention
In recent years, man has been trying to communicate vocally with machines. The ability to communicate in such a way has many advantages in several fields. The many advantages of accessing computerized systems through voice in natural language are considered common knowledge today.
In recent years, several technological attempts have been made to access computerized systems by using human voice with different products, such as Voice extensible Markup Language™ (VoiceXML™ ) by VoiceXML™ Forum (founded by AT&T, IBM, Lucent Technologies and Motorola; the Forum web site is: http://www.voicexml.org/), Nuance Grammar Builder™ (Nuance Communications 2000, Menlo Park, CA, USA) etc. Such technological attempts are, among others, Voice Recognition (VR), Automatic Speech Recognition (ASR), and Text-To-Speech conversion.
In VR technology, the system tries to recognize the voice of the user and to react according to the user's orders. The problem with such technology is that the computer must learn the tones of every user's voice under lab conditions in order to correctly identify the voice of the user and correctly interpret his meaning. This technology, obviously, is not suitable for the wide public domain. In ASR technology, the system is able to recognize the user's voice without teaching the system different user voices under lab conditions. However, such technology is limited to predetermined sets of sentences. ASR technology has a set of predetermined databases, such as Names, Addresses and Numbers, that are compared with the answers of a user. ASR technology guides the conversation with the user, and waits for very specific answers. No Artificial Intelligence is used in ASR technology.
An additional technology that has been developed is Interactive Voice Response (IVR), which is actually a menu that gives the user the ability to choose between two or more possibilities at each step of the conversation. This technology is very similar to ASR technology in its predetermined and limited set of sentences.
All major breakthroughs in recent years have revolved around modules called Speech-To-Text (STT) and Text-To-Speech (TTS). This core technology acts as a translator between a human voice and written text in the computer. For example, when a person says "happy", it is translated by the module from the acoustic environment to text in a computer. However, the STT and TTS modules do not have any intelligence, but are simple translators between the acoustic environment and the written computer environment. The main breakthrough recently has been in the accuracy and reliability of such technologies. Several applications have been developed according to this STT and TTS technology.
One of the applications developed according to the STT and TTS modules for conversation between man and machine is known as the 'structured' conversation. In such an application, the machine checks its database for text which is identical to the text that was given to it by the user. The machine directs the conversation so that in each step of the conversation it knows exactly what kind of data the user inputs, and checks for identical text. Such modules can be used, for example, in phone directories, wherein the conversation is started by the service computer and 'structured' for the user, as in the following:
Computer: 'please say last name'
User: 'Bond'
Computer: 'please say first name'
User: 'James'
And so on.
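To make the 'structured' conversation concrete, here is a minimal Python sketch of such an exact-match application; the directory data, prompts and function names are illustrative assumptions, not taken from the patent:

    # A 'structured' conversation: at every step the machine knows which
    # field it expects, and it only checks the reply against identical text.
    directory = {("bond", "james"): "555-0123"}   # (last, first) -> phone

    def structured_conversation(replies):
        """replies: an iterator of user answers, one per prompt."""
        answers = []
        for prompt in ("please say last name", "please say first name"):
            print("Computer:", prompt)
            reply = next(replies).strip().lower()
            print("User:", reply)
            answers.append(reply)
        number = directory.get(tuple(answers))    # exact-text lookup only
        print("Computer:", number or "no such listing")

    structured_conversation(iter(["Bond", "James"]))

Anything outside the expected exact text fails, which is precisely the limitation the invention addresses.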
Unfortunately, none of the known technologies and applications has succeeded in creating a fluent conversation between man and machine.
All the methods described above have not yet provided satisfactory solutions to the problem of providing a computerized system that can verbally/textually communicate freely with a user.
It is an object of the present invention to provide a method and interface for intelligent user-machine interaction with an ability to communicate with a user in natural language.
It is another object of the present invention to provide a human-machine interface with a level of artificial intelligence.
Other objects and advantages of the invention will become apparent as the description proceeds.
Summary of the Invention
The present invention is directed to a method for allowing interaction between a user and a computerized system by using human natural language and/or textual data exchange. A conversation domain that consists of a plurality of phrases having valid logical meaning is generated. Each phrase corresponds to an aspect of the operation of, and/or the goods/services provided by the computerized system. Data exchange between the user and the computerized system, and operation of the computerized system are allowed by using at least one of the phrases.
Preferably, a verbal and/or textual input from the user that matches one of the phrases is received. Whenever required, the input is converted into textual data. The context of the textual data is analyzed by associating selected words and their logic relations, obtained from the input, with a predetermined set of words, stored in a first accessible database, and restricted by the operation of, and/or the goods/services provided by, the computerized system, and by accurately obtaining the idea expressed by the textual data. The idea is associated with a set of keywords representing the goods/services, stored in a second internal/external accessible database. A search for obtaining data representing the goods/services is carried out in the second internal/external accessible database. Information related to the search results is obtained, according to the idea. Transacting with the computerized system may also be carried out, according to the idea. A textual response phrase that represents selected record(s) from the search and/or transaction results is synthesized by using words, selected from the first and/or the second databases, according to the context of the idea. Whenever required, the textual response phrase is converted into speech, to be played to the user, and/or to be displayed to the user.
The context analysis of the textual data and/or the association operation(s) may be performed by an internal/external application employing artificial intelligence. The search and/or transaction operation(s) may be performed by an internal/external application. Preferably, each idea is represented by a plurality of physical and/or abstract objects and the relations between them, which belong to an ontology component associated with a predicted reality of a user while interacting with the computerized system. The textual response phrase may be synthesized by an answer generator that uses grammar templates, which are associated with the logic determined from the input, and by a combination of objects and their corresponding relations, which are selected by resolving the logic. A dialog with the user may be performed whenever the context of the input cannot be properly resolved.
Preferably, the conversation domain is generated by defining an ontology domain consisting of a plurality of physical and/or abstract objects and the relations between the objects, which belong to a predicted reality of a user while interacting with the computerized system, as well as defining a plurality of logic patterns from the objects and their relations, each of the logic patterns consisting of a combination of selected objects and their corresponding relations. All the objects in the ontology domain are sorted and/or grouped according to criteria determined by the logic patterns. Sorted and/or grouped objects are forwarded to a lexical parser that generates different phrases using lexical templates, inflection and a thesaurus for formatting the objects.
The present invention is also directed to an interface for allowing interaction between a user and a computerized system, by using human natural language and/or textual data exchange, operating according to the method described herein above.
The present invention is further directed to a computerized system capable of interacting with a user by using human natural language and/or textual data exchange, operating according to the method described herein above.
Brief Description of the Drawings
The above and other characteristics and advantages of the invention will be better understood through the following illustrative and non-limitative detailed description of preferred embodiments thereof, with reference to the appended drawings, wherein:
Fig. 1 is a block diagram of a computerized system operated by voice, according to the prior art;
Fig. 2 is a block diagram of a computerized system with an enhanced man-machine interface operated by voice, according to a preferred embodiment of the invention;
Fig. 3 is a flow chart of a process for a computerized system with enhanced man-machine interface operated by voice, according to a preferred embodiment of the invention;
Fig. 4 is a block diagram of a Grammar Generation unit for a computerized system with enhanced man-machine interface operated by voice, according to a preferred embodiment of the invention; and
Fig. 5 is a block diagram of a man-machine conversation system according to a preferred embodiment of the invention.
Detailed Description of Preferred Embodiments
Fig. 1 is a block diagram of a conventional computerized system 100 operated by voice. A human voice 104 is received as input to an STT module 101, such as a Speech Recognition module, which translates the human voice 104 into digital data that can be converted into textual characters. The translated digital/textual data is then processed and identified by a Computerized System 102. Computerized System 102 checks its database of answers (which can be predetermined or generated during a session) or predetermined words, in order to communicate with the user using human voice 104. The Computerized System 102 selects the appropriate word or sentence, in text format, and forwards it to a Text-To-Speech module 103. TTS unit 103 translates the text into an acoustic format, which is then output and heard by a user 105 (or seen, if a textual response is output). However, as was mentioned hereinbefore, such a system suffers from a lack of intelligence, and therefore does not enable users to talk freely with machines in a way that a computer will 'understand' a request from a user, in order to perform an operation. This system only translates information from an acoustic (or text-oriented) environment to text format, and vice versa. Of course, there are simpler conventional computerized systems which receive textual inputs/requests from the user (e.g., via chats, a keyboard, E-mail messages, etc.).
Fig. 2 is a block diagram of a computerized system 200 with an enhanced man-machine interface operated by voice, according to a preferred embodiment of the invention. The computerized system 200 comprises a True Conversation™ unit 202 that is connected between an enhanced STT module 201 and Computerized System 102, and it also connects Computerized System 102 and TTS module 103. True Conversation™ unit 202 is a human-machine natural-language interface that reproduces the natural-language transmission of information, by modeling the speaker's generated information and the intention behind it, and the listener's interpretation. Computerized system 200 technology completes the conversation solution by adding human conversational capabilities to the conventional computerized system 100.
Fig. 3 is a flow chart of the operations carried out in a computerized system with a human-machine interface with enhanced capability, operated by voice. Block 301, which represents a user sentence as the input to a computerized system, such as system 200, is the beginning of the flow chart 300. At the next step 302, the user sentence is converted to text by a speech recognition unit. The next step 303 comprises Context Handling for the Natural Language of the user. Context Handling is used to track the conversation in order to know the user's intention at any time. For example, during a conversation, a user may use the term 'it' in a sentence instead of a noun used in an earlier sentence. The Context Handling component in Block 303 may store subjects, objects and indirect objects that were mentioned directly or indirectly during the interaction with the user. The Natural Language component in Block 303 identifies the logic template behind the lexical representation of the input sentence. Together, the Context Handling and the Natural Language components in Block 303 identify the logic pattern of the sentence (objects and the relations between them).
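As an illustration of the Context Handling described above, the following minimal Python sketch stores mentioned nouns and substitutes the most recent one for 'it'; the class, its single-antecedent rule and the sample sentences are simplifying assumptions, not the patent's method:

    # Sketch of Block 303's Context Handling: remember recently mentioned
    # nouns so that a later pronoun such as 'it' can be resolved to the
    # most recent antecedent (a deliberately simplified rule).
    class ContextHandler:
        def __init__(self):
            self.mentioned = []                  # subjects/objects, most recent last

        def observe(self, nouns):
            self.mentioned.extend(nouns)

        def resolve(self, word):
            if word == "it" and self.mentioned:
                return self.mentioned[-1]        # most recently mentioned noun
            return word

    ctx = ContextHandler()
    ctx.observe(["movie"])                       # e.g. "Find me a movie about spies"
    print([ctx.resolve(w) for w in ["when", "was", "it", "released"]])
    # -> ['when', 'was', 'movie', 'released']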
Block 304 comprises Artificial Intelligence (AI) modules for performing a "solve" operation. The "solve" operation is carried out by retrieving or requesting information from a database, or performing/requesting transactions with said database, according to the meaning of the input sentence, as received from the user (this module will be further described with respect to the Meaning Resolvers of Fig. 5 hereinafter). After the logic template has been identified by Block 303, the AI Block 304 associates a particular logic pattern with specific objects, and the relations between them, that are retrieved from databases, and obtains the objects that comply with the logic represented by this pattern. The term "Transaction" is meant to include any action of interacting with a system that may cause changes in the system.
Block 304 comprises at least a minimal level of intelligence, required for handling basic logic patterns that were identified in Block 303, and may consult with an external block 305, which contains an application for outside influence on the "behavior" (i.e., the operation and/or the level of intelligence) of the AI modules in Block 304. At the next step 306, a Search or Transaction operation is made, followed by a conclusion about the results of the Search or the Transaction operation. The conclusion may be reached after consulting with an External Application Block 310. The Search or Transaction may be executed in External Systems 307, such as the Internet, a predetermined database, a billing system, etc. For example, if the search concerned restaurants in a specific area, then the results, according to a preferred embodiment of the invention, will be as follows: Type of Restaurant = Italian; Address = Wall St. 27. At the next step 308, the search results are translated into text format in the natural language of the user (by a Natural Language component) with the help of the Context Handling component (of Block 303 hereinabove). At the final step 309, the text produced at step 308 is translated back to speech.
Fig. 4 schematically illustrates a Grammar Generation system according to a preferred embodiment of the invention. The Grammar Generation system 400 is a system that can generate a wide range of grammar variations that can be used to allow communication between the user and the computerized system interfaces, e.g., Automatic Speech Recognition (ASR) products, such as VoiceXML™ and Nuance Grammar.
The Grammar Generation system 400 comprises an Ontology component 401, which is a computer representation of a specific human vision of the actual reality, among several different human visions of said reality. Ontology component 401 consists of Objects 401a to 401d and Relations 421ac, 421da and 421bd between them. Ontology Objects 401a to 401d represent real physical or abstract objects of the actual world, usually (but not necessarily) represented by nouns in human languages. A computerized machine may contain one or more Ontology components 401, for different subjects of interest. It is important to note that there is no absolute truth in the structure of the Ontology component 401, or in the structure of any other Ontology component. Every Ontology component, with all its Objects and Relations, reflects some reality, but of course, it cannot reflect all the possible information variations in the universe. Therefore, Ontology component 401, for example, can be a business-specific reality, or a reality organized by any other principle. The only goal that should be achieved in the optimal way is an equal expectation from a listener and a speaker of their virtual world. For instance, comprehensive knowledge of all aspects of the cinema industry (movies, actors, gossip, critics etc.) is expected from the system's Ontology component specialized in this specific field of cinema. Presumably, no deep (if any) knowledge of nuclear physics or genetics will be expected from a system specializing in the cinema industry. Thus, there are no Objects related to nuclear physics in the Ontology component of this system, i.e., there are no such Objects in the universe known to this system.
According to another embodiment of the invention, several domains of knowledge, which limit the scope of conversations and understanding between the user and the computerized system, can be derived from a corresponding Ontology component 401. Systems with a human-machine interface may have more than a single Ontology component 401, according to another embodiment of the invention.
Ontology component 401 contains Objects 401a to 401d and Relations 421ac, 421da and 421bd between them (which are not words, phrases or any other lexical entities). Natural languages are ambiguous by their nature, and therefore, language independence is important for correct, unambiguous context evaluation. Language independence is crucial in multi-lingual environments. Ontology component 401 keeps information related to meta-data (data that describes data components and the relations between them; for example, if the type of information is a train time-table, the meta-data describes the fields of the train time-table, such as arrivals and departures, while the raw data is the actual arrivals/departures). It describes Relations 421ac, 421da and 421bd between abstract Objects 401a to 401d of the actual world, but does not contain concrete data. Ontology component 401 will, for instance, retain information about the President and his relations to the country, but not the information related to specific presidents. However, a system, such as the Computerized system 200, contains the data related to the location of such information, as well as the manner in which it can be accessed. The Ontology component 401 comprises directions for retrieving raw data in an external data storage.
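The Ontology component described above is essentially a typed graph of abstract Objects and Relations that carries meta-data and retrieval directions, but no concrete data. A minimal Python sketch follows; the object names, relation type and 'source' directions are illustrative assumptions:

    # Sketch of an Ontology component: language-independent Objects and
    # typed Relations that hold meta-data and directions to raw data in
    # external storage, but no concrete data themselves.
    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class OntologyObject:
        name: str                      # abstract concept, not a lexical entry
        source: str = ""               # direction for retrieving raw data

    @dataclass
    class Ontology:
        objects: set = field(default_factory=set)
        relations: set = field(default_factory=set)    # (subject, type, object)

        def relate(self, a, rel_type, b):
            self.objects.update({a, b})
            self.relations.add((a, rel_type, b))

    cinema = Ontology()
    actor = OntologyObject("actor", source="sql://cinema/actors")
    movie = OntologyObject("movie", source="sql://cinema/movies")
    cinema.relate(actor, "plays_in", movie)    # meta-data only: no specific actor
    print(cinema.relations)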
Grammar Patterns (GPatterns) 402a to 402n are functional entities responsible for choosing Ontology Objects 401a to 401d and Relations 421ac, 421da and 421bd corresponding to specific conditions, represented by logic patterns. Each GPattern may have several common lexical representations (termed Lexomas, hereinbelow). For example, if one object is a person and the other object is a date, and the relation type between them is represented by the word "of", then two possible Lexomas may be:
"when he was born?" or
"on what date he was born"
GPatterns 402a to 402n are logical rules that decide with which set of Ontology Objects, selected from 401a to 401d, a specific type of common lexical representation (Lexoma) of this specific logic pattern should be generated. GPatterns 402a to 402n are conditions that can be defined on totally different levels of complexity. For example, GPattern 402a (which is simpler) will find, among all Objects 401a to 401d, an object that is related to the object "actor", and GPattern 402n (which is more complicated) will find all Objects that are directly connected and have type "A" relations, and are not related by any indirect connection with any other Object having a type "B" Relation (for example, an actor that is related to movies made in France with a budget below $1M). GPattern 402n will retrieve a very limited object population, but will return Objects with very particular attributes. On the other hand, GPattern 402a will retrieve a less limited object population, but will return Objects with more common attributes.
Each GPattern 402a to 402n comprises an Ontology Mask (OM) 412a to 412n, respectively, and a Scope of Domain (SOD) 413a to 413n, respectively. Each OM 412a to 412n defines specific conditions applied to Objects 401a to 401d and Relations 421ac, 421bd and 421da. For example, a specific condition may result in the selection of objects having common attributes (for example, all objects having type "A" relations). "X" of "Y" may represent an OM, and <"actor","movie"> may represent a corresponding SOD. The resulting GPattern will be "actor" of "movie".
Each sentence in human language is based upon a certain logic selection (OM) of Objects and the Relations between them. This logic is described by the conditions of OM 412a to 412n. OM 412a to 412n entities are fixed logic patterns. Different GPatterns are generated by applying different parameters/values (SODs) to the same OM.
SOD 413a to 413n defines application boundaries for a specific OM among OM 412a to 412n, respectively. SOD 413a to 413n boundaries limit the population of Objects 401a to 401d which may comply with this specific OM. The combination of OMs 412a to 412n and SODs 413a to 413n, respectively, creates the respective GPatterns 402a to 402n, which can extract an object population that complies with certain conditions and resides within specific application boundaries. The whole GPatterns (402a to 402n) concept is based on the assumption that the human ability to build logical sentences is grounded in his understanding of the world (Ontology), represented as Objects and Relations.
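A minimal Python sketch of this OM/SOD composition (the predicate, the relation set and the SOD encoding are simplified assumptions): the OM is a fixed logic condition over the ontology, and applying a different SOD to the same OM yields a different GPattern:

    # Sketch of GPattern = Ontology Mask + Scope of Domain. The OM is a
    # fixed logic condition over Objects/Relations; applying different SODs
    # (object boundaries) to the same OM yields different GPatterns.
    def om_x_of_y(relations):
        """Fixed logic pattern: every (X, 'of', Y) pair in the ontology."""
        return [(x, y) for (x, rel, y) in relations if rel == "of"]

    def gpattern(om, sod, relations):
        """Limit the OM's selection to the SOD's object boundaries."""
        return [pair for pair in om(relations) if pair == sod]

    relations = {("actor", "of", "movie"), ("capital", "of", "country")}
    print(gpattern(om_x_of_y, ("actor", "movie"), relations))
    # -> [('actor', 'movie')], i.e. the GPattern "actor" of "movie"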
Each human spoken sentence, generated from GPatterns 402a to 402n, has a corresponding module capable of "solving" the logic pattern (GPattern). Such a module is called a Meaning Resolver (MR) 504a to 504n and will be described with respect to Fig. 5 hereinafter.
According to a preferred embodiment of the invention, Ontology Objects 401a to 401d do not contain lexical entries. Therefore, the Grammar Generation system 400 has to transform its internal representation (Ontology, GPatterns and MRs) into an external, human-understandable form. The functional entity that is responsible for this transformation is the Lexical Parser 407.
Lexical Parser 407 is a functional entity that is responsible for all lexical aspects of sentence generation. It handles all lexicon, syntax and semantics issues. GPatterns 402a to 402n extract groups of Ontology component 401 entities (Objects 401a to 401d and Relations 421ac, 421da and 421bd). Answer Generator 403 (which is part of the Lexical Parser 407) takes this information and attempts to fit it to numerous lexical templates. These templates are called Lexomas 404, and are explained in detail hereinbelow. Lexical Parser 407 uses several internal mechanisms, such as Lexical Inflector 405 and Thesaurus 406, to handle different grammar aspects, such as morphology inflections, lexicon etc. While Inflector 405 is a subsystem that is responsible for lexical inflections, Thesaurus 406 provides synonyms for every Ontology Object 401a to 401d. The resulting output from the Lexical Parser 407 is Poly Grammar 408 (explained in detail hereinbelow).
Lexoma 404 is a Lexical Template, which is a textual string written in LML™ (Lexoma Markup Language). LML is a language that has been developed especially for creating Lexical Templates. Besides plain text, Lexoma 404 holds considerable additional information, such as morpho-syntactic properties, lexicon instructions, concrete value definitions etc. Lexomas 404 are used during grammar and answer generation for a user.
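The LML notation itself is not disclosed in this text, so the following Python sketch uses a plain format string to stand in for a Lexoma's template-plus-slots idea; the slot names and the helper function are assumptions:

    # Sketch of a Lexoma as a lexical template with slots: two different
    # Lexomas realize the same person/date logic pattern. A plain Python
    # format string stands in for the (unpublished) LML notation.
    LEXOMAS = ["when {person} was born?", "on what date {person} was born"]

    def realize(template, **slots):
        """Fill a Lexoma's slots with concrete values (e.g. from Data Sources)."""
        return template.format(**slots)

    for lexoma in LEXOMAS:
        print(realize(lexoma, person="the actor"))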
Poly Grammar 408 is the output provided by Lexical Parser 407; it represents a wide range of grammar variations, in an intermediate format, that can be used to allow communication between the user and the computerized system interfaces. Poly Grammar 408 can be transformed into several information delivery techniques, such as voice detection, text detection etc. Fig. 4 shows, for example, a transformation of Poly Grammar 408 information into ASR Specific Grammar 409 format. ASR Specific Grammar 409 is a specific format of voice detection.
Grammar Generator 400, which creates Poly Grammar 408, can transform Poly Grammar 408 into a wide range of grammars compatible with a wide range of ASR producers, such as the VoiceXML™ Forum, Nuance etc. Poly Grammar 408 contains much more information than is required for a specific ASR function. This additional information comprises morpho-syntactic information, and information required by Meaning Resolvers 504a to 504n (MRs 504a to 504n will be described with respect to Fig. 5 hereinafter).
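As an illustration of such a transformation, the Python sketch below renders a toy intermediate grammar into JSGF, a published ASR grammar format used here purely as an example target; the patent's own Poly Grammar representation is internal and richer than this:

    # Sketch of transforming an intermediate grammar into one concrete ASR
    # format. JSGF is used only as an example target; the real Poly Grammar
    # carries far more (morpho-syntactic and MR-related) information.
    poly_grammar = {
        "query": ["when <person> was born", "on what date <person> was born"],
        "person": ["james bond", "the actor"],
    }

    def to_jsgf(grammar, public_rule="query", name="demo"):
        lines = ["#JSGF V1.0;", f"grammar {name};"]
        for rule, variants in grammar.items():
            prefix = "public " if rule == public_rule else ""
            lines.append(f"{prefix}<{rule}> = " + " | ".join(variants) + ";")
        return "\n".join(lines)

    print(to_jsgf(poly_grammar))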
A GPattern, such as GPattern 402a, is a "content agent" for the conversation, i.e., the utterances in the sentences of the conversation will describe the logic of object selection and the relations between them (GPattern).
The following paragraph shows the logical concept of a conversation between two individuals, in accordance with a preferred embodiment of the present invention:
The human language is a tool for transferring ideas between humans. One individual has some ideas in his mind and he wants to share them with another individual. By talking with the other individual, he actually "downloads" them (converts ideas into sentences) and "transfers" them (i.e., speaks) to him. The other individual receives (i.e., listens) and "uploads" them (i.e., converts sentences into ideas). The sentences for a specific idea may vary (i.e., one individual may think, "well, I would say it differently"), but the logic behind them should be similar (otherwise misunderstanding will occur).
"Download" (converting ideas into sentences) - the individual may choose any of the predefined logic patterns, i.e., GPatterns (each individual acquires this set of tools in his childhood), and converts it to one of its common language representations (Lexomas 404).
"Upload" (converts sentences into ideas) — the second individual seeks the right logic pattern in his "toolbox". He searches and detects — which logic pattern (GPattern 402a to 402n) can be represented by such sentence. Every logic pattern (GPattern 402a to 402n) has a corresponding module that can solve the meaning of the sentence (MRs 504a to 504n). Finally, he activates this module (MR 504a to 504n) and understands the idea. For example, a person wants to know the color of the sky. He chooses "property of object" logic pattern (GPattern 402a to 402n) and embeds it into a question (or request) representation (Lexoma 404) "what is the color of the sky?". He converts it to sound (TTS - speak). The other participant converts the sound into sort of text (VR - listen) and detects the logic pattern (GPattern 402a to 402n) behind it. Every logic pattern (GPattern 402a to 402n) has a corresponding module that can "solve" the question (MR 504a to 504n). He activates this appropriate module (MR 504a to 504n) and obtains the answer, such as "blue".
Information Server 410 comprises a plurality of Data Sources 411a to 411n and a Data Source Manager 412 for managing those Data Sources 411a to 411n. Information Server 410 is not limited to a specific language (used for information exchange) or to a specific hardware platform that contains the information. The Information Server 410 model extends the traditional approach of storing information in relational databases, so that backward compatibility can be fully supported. Information Server 410 can be based on any open industry specification (such as the eXtensible Markup Language [XML]) with broad industry support, and works with all major established database products.
Each Data Source 411a to 411n is an entity that encapsulates a source of information for a particular Ontology object 401a to 401d. The operation of Data Sources 411a to 411n is transparent to external users. The Data Sources 411a to 411n functionality may be accessed using a conventional Application Program Interface (such as the Structured Query Language [SQL]). Data Sources 411a to 411n can access different information stores, such as relational and other databases, Internet-based information, e-mail servers and information residing on other hardware platforms. The information provided by Data Sources 411a to 411n is well structured (i.e., the data is organized in a known structure), self-descriptive (i.e., contains internal meta-data) and suitable for easy manipulation.
Information Server 410 is an entity that supplies information for one or more objects and their respective relations, as defined by Ontology component 401, according to a preferred embodiment of the invention. From the user's point of view, it encapsulates the actual Data Source 411a, or even several Data Sources of information 411a to 411n, thereby providing a standard way of accessing data using a known Application Program Interface (such as the SQL language). Information Server 410 combines the information received from several Data Sources 411a to 411n. These Data Sources 411a to 411n may be redundant, allowing higher availability, or may supply different information related to the same topic, which allows more accurate information to be obtained. The Information Server can operate on the well-structured information presented to it by Data Sources 411a to 411n in any common information format (such as XML). Information Server 410 uses an architecture into which new Data Sources can be easily integrated.
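A minimal sketch of this encapsulation and merging behavior is given below; the class and method names are assumptions, since the document specifies behavior rather than an API.

```python
# A hedged sketch of an Information Server that hides several Data
# Sources behind one query interface, merging redundant or
# complementary answers.

class DataSource:
    """Encapsulates one store of well-structured, self-descriptive data."""
    def __init__(self, name: str, records: dict):
        self.name = name
        self.records = records

    def query(self, key: str) -> dict:
        return self.records.get(key, {})

class InformationServer:
    def __init__(self, sources: list):
        self.sources = sources  # may be redundant, for higher availability

    def query(self, key: str) -> dict:
        merged: dict = {}
        for src in self.sources:  # combine complementary information
            merged.update(src.query(key))
        return merged

server = InformationServer([
    DataSource("db", {"sky": {"color": "blue"}}),
    DataSource("web", {"sky": {"composition": "air"}}),
])
print(server.query("sky"))  # -> {'color': 'blue', 'composition': 'air'}
```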
Reference components refer to data that can be stored externally (such as in external databases, in Internet sites, etc.) and are used during the generation of grammar patterns for including concrete values with their abstract objects. Each Data Source 411a to 411n is capable of providing such concrete values during the grammar generation process. It is important to note that the generation of a reference component can be done off-line, when performance considerations are less critical. During the application run-time, all reference data is easily accessible.
According to a preferred embodiment of the invention, as described in Fig. 4, the Grammar Generation system 400 operates as follows (a code sketch of this flow appears after the steps below):
All GPatterns 402a to 402n are activated. All Ontology objects 401a to 401d are processed and grouped according to numerous criteria defined by GPatterns 402a to 402n.
The information that is extracted, sorted and grouped by GPatterns 402a to 402n is forwarded to the Lexical Parser 407. Lexical Parser 407 queries Lexomas 404 from the database.
Every Lexoma 404 is processed by Lexical Parser 407 (which comprises Answer Generator 403, Lexical Inflector 405 and Thesaurus 406) according to the instructions (provided by the LML) that are stored in the Lexoma 404. The information gathered by GPatterns 402a to 402n is inserted into the relevant parts of the Lexoma 404, as indicated by the LML.
Data Sources 411a to 411n are used to supply concrete information to Lexical Parser 407.
The output of Lexical Parser 407 is Poly Grammar 408, which can be converted into a specific ASR grammar 409.
The Specific ASR Grammar 409 is generated, and provides all that is necessary for interacting with a user.
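By way of illustration only, the following minimal sketch condenses the steps above into runnable Python. The Ontology, the single logic pattern, the template strings and the toy ASR output format are all hypothetical stand-ins for the GPattern, Lexoma and Poly Grammar mechanisms described herein.

```python
# A hedged, end-to-end sketch of grammar generation: GPatterns group
# Ontology entities, the Lexical Parser expands them through lexical
# templates, and the result is converted into a toy ASR grammar.

ONTOLOGY = [{"object": "sky", "properties": ["color"]}]
LEXOMAS = ["what is the {prop} of the {obj}?",
           "tell me the {prop} of the {obj}"]

def run_gpatterns(ontology):
    """Group (object, property) pairs per a property-of-object pattern."""
    return [(o["object"], p) for o in ontology for p in o["properties"]]

def lexical_parser(groups, lexomas):
    """Expand every grouping through every lexical template."""
    return [t.format(obj=obj, prop=prop)
            for (obj, prop) in groups for t in lexomas]

poly_grammar = lexical_parser(run_gpatterns(ONTOLOGY), LEXOMAS)
asr_grammar = " | ".join(f"({p})" for p in poly_grammar)  # toy ASR format
print(asr_grammar)
```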
Fig. 5 is a block diagram of a human-machine conversation system according to a preferred embodiment of the invention. The human-machine conversation system 500 comprises an ASR system 501, which is based upon the Specific ASR Grammar 409 (converted from Poly Grammar 408), Meaning Resolvers (MRs) 504a to 504n and MR Manager 503. MRs 504a to 504n are responsible for "solving" different kinds of logic patterns (GPatterns) derived from an identified question (the ASR 501 converts User 550's verbal question into an identified question). MR Manager 503 is responsible for deciding which MR among MRs 504a to 504n will "solve" the logic pattern (GPattern) derived from the identified question, and provides that MR with all the information necessary for the "solving".
"Solving" that is performed by MRs 504a to 504n considerably differs in their complexity, from "solving" simple logic patterns (GPatterns), to "solving" complex transactions and maintaining an intelligent dialogue with the User 550. Each MR 504a to 504n operates on Ontology component 401 entities of system 500, as well as accessing Information Server 410 (for obtaining information) and External System 520 for performing Transactions. Its capability also includes maintaining the dialog context and the ability to resolve ambiguity, misunderstanding or missing information.
Each MR 504a to 504n performs its "solving" in a specific context (for example, a particular user's context). This context may imply filtering information according to the user's personal preferences, or restricting access to particular kinds of information or actions by consulting with External Application 310.
All MRs 504a to 504n form an MR Suite (MRS) 505, which is an integrated part of the system 500 process. MRS 505 is responsible for coordinating the participating MRs 504a to 504n. MRS 505 members are able to cooperate by exchanging information received from the User 550 or from other sources, to capture the User 550's behavior during the dialog, and to lead the User 550 to a particular predefined goal.
MRs 504a to 504n are able to provide intelligent answers. This functionality includes using different types of answers, such as slang, humor, short answers or more descriptive answers, according to the user's preferences, or even by capturing parameters related to the user's mood from the previous dialog. It also includes using the proper tense, number and person, substituting synonyms in the answer sentences, and providing different forms of answer for the same question.
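By way of illustration only, the sketch below shows one simple way such style-aware answers could be rendered from the same resolved fact; the style names and templates are hypothetical.

```python
# A hedged sketch of style-aware answer generation: one resolved fact is
# rendered in different forms according to user preferences.

ANSWER_STYLES = {
    "short": "{value}.",
    "descriptive": "The {prop} of the {obj} is {value}.",
    "humor": "Last time I checked, the {obj} was still {value}!",
}

def render_answer(prop, obj, value, style="descriptive"):
    template = ANSWER_STYLES.get(style, ANSWER_STYLES["descriptive"])
    return template.format(prop=prop, obj=obj, value=value)

print(render_answer("color", "sky", "blue", style="short"))  # -> blue.
print(render_answer("color", "sky", "blue", style="humor"))
```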
System 500 comprises Lexical Parser 407, according to a preferred embodiment of the invention, wherein the Lexical Parser 407 is now used to generate the answer on-line (instead of generating the Poly Grammar off-line). The human-machine conversation system 500 operates as follows (a code sketch of this runtime flow appears after the steps below):
User 550 calls the human-machine conversation system 500.
ASR 501 recognizes the question/statement of User 550 and produces an identified question/statement.
The identified question/statement of User 550 is forwarded to the MR Manager 503.
MR Manager 503 decides to which specific MR 504a to 504n this identified question/statement will be forwarded, and accordingly activates this specific MR.
The activated MR 504a to 504n receives information from MR Manager 503, from the relevant Data Sources 411a to 411n, and from the discourse context of the conversation. In this way, the specific MR 504a to 504n tries to obtain the information required for the preparation of a meaningful answer to User 550.
If some additional input data required for the "solving" is unavailable, the MR 504a to 504n generates a response to the User 550 in order to complete the required information.
If all the input necessary for the "solving" is available, the required information is retrieved from the relevant Data Source 411a to 411n.
To transform the data into human natural language, the standard mechanism of Lexical Parser 407 is activated.
The generated answer is passed to User 550 by, for example, TTS.
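By way of illustration only, the following sketch models the runtime flow above: an MR Manager routes an identified question to the resolver registered for its logic pattern, the resolver asks the user for missing input or retrieves the data, and an answer sentence is produced. All class, function and slot names are hypothetical.

```python
# A hedged sketch of runtime dispatch from MR Manager to a Meaning
# Resolver, including the "ask for missing information" branch.

class MRManager:
    """Decides which Meaning Resolver (MR) will handle a logic pattern."""

    def __init__(self):
        self.mrs = {}  # logic pattern name -> resolver function

    def register(self, pattern, mr):
        self.mrs[pattern] = mr

    def dispatch(self, pattern, slots, context):
        mr = self.mrs.get(pattern)
        if mr is None:
            return "I did not understand the question."
        missing = [name for name, value in slots.items() if value is None]
        if missing:
            # Ask the user to complete the required information.
            return f"Could you tell me the {missing[0]}?"
        return mr(slots, context)

def mr_property(slots, context):
    """Resolver for the property-of-object pattern, using a toy data source."""
    record = context["data"].get(slots["obj"], {})
    value = record.get(slots["prop"], "unknown")
    return f"The {slots['prop']} of the {slots['obj']} is {value}."

manager = MRManager()
manager.register("property_of_object", mr_property)
ctx = {"data": {"sky": {"color": "blue"}}}  # stand-in for Data Sources
print(manager.dispatch("property_of_object", {"obj": "sky", "prop": "color"}, ctx))
# -> The color of the sky is blue.
print(manager.dispatch("property_of_object", {"obj": "sky", "prop": None}, ctx))
# -> Could you tell me the prop?
```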
Of course, the present invention can be used for other human-machine interfaces, such as an automatic e-mail reader and responder, or automatic support provided in web sites by using regular "Chat" web tools.
The above examples and description have, of course, been provided only for the purpose of illustration, and are not intended to limit the invention in any way. As will be appreciated by the skilled person, the invention can be carried out in a great variety of ways, such as communicating with the user using e-mail messages, fax transmissions or Short Messaging Services (SMS), or employing more than one of the techniques described above, all without exceeding the scope of the invention.

Claims

1. A method for allowing interaction between a user and a computerized system by using human natural language and/or textual data exchange, comprising: a) generating a conversation domain consisting of a plurality of phrases having valid logical meaning, each of which corresponds to an aspect of the operation of, and/or the goods/services provided by, said computerized system; and b) allowing data exchange between said user and said computerized system, and operation of said computerized system, by using at least one of said phrases.
2. A method according to claim 1, comprising: a) receiving a verbal and/or textual input from said user that matches one of the phrases; b) whenever required, converting said input into textual data; c) analyzing the context of said textual data by associating selected words and their logic relations, obtained from said input, with a predetermined set of words stored in a first accessible database and restricted by the operation of, and/or the goods/services provided by, said computerized system, and accurately obtaining the idea expressed by said textual data; d) associating said idea with a set of keywords representing said goods/services, stored in a second accessible database; e) searching for data representing said goods/services in said second accessible database and obtaining information related to the search results, and/or transacting with said computerized system, according to said idea; f) synthesizing a textual response phrase that represents a selected record(s) from said search and/or transaction results, by using words selected from said first and/or said second databases, according to the context of said idea; and g) whenever required, converting said textual response phrase into speech, to be played to said user, and/or to be displayed to said user.
3. A method according to claim 2, wherein the input from the user is unguided, free continuous input.
4. A method according to claim 2, wherein the input from the user is a guided input.
5. A method according to claim 2, wherein the input from the user is a part of an unguided, free continuous conversation.
6. A method according to claim 2, wherein the input from the user is a part of a guided, continuous conversation.
7. A method according to claim 2, wherein the first and/or the second accessible databases reside within the computerized system.
8. A method according to claim 2, wherein the first and/or the second accessible databases are external to the computerized system.
9. A method according to claim 2, wherein the context analysis of the textual data and/or the association operation(s) are performed by an internal/external application employing artificial intelligence.
10. A method according to claim 2, wherein the search and/or transaction operation(s) are performed by an internal/external application.
11. A method according to claim 2, wherein each idea is represented by a plurality of physical and/or abstract objects and relations between them, that belong to an ontology component, associated with a predicted reality of a user, while interacting with the computerized system.
12. A method according to claim 2, wherein the textual response phrase is synthesized by an answer generator that uses grammar templates, that are associated with the logic determined from the input and by a combination of objects and their corresponding relation, that are selected by resolving said logic.
13. A method according to claim 12, further comprising performing a dialog with the user whenever the context of the input cannot be properly resolved.
14. A method according to claim 1, wherein the conversation domain is generated by performing the following steps: a) defining an ontology domain consisting of a plurality of physical and/or abstract objects and the relations between said objects, that belong to the reality associated with a predicted reality of a user, while interacting with the computerized system; b) defining a plurality of logic patterns from said objects and their relations, each of said logic patterns consisting of a combination of selected objects and their corresponding relations; c) sorting and/or grouping all the objects in said ontology domain according to criteria determined by said logic patterns; and d) forwarding sorted and/or grouped objects to a lexical parser that generates different phrases using lexical templates, inflection and thesaurus for formatting said objects.
15. Interface for allowing interaction between a user and a computerized system, by using human natural language and/or textual data exchange, operating according to the method described in any one of the preceding claims.
16. A computerized system capable of interacting with a user by using human natural language and/or textual data exchange, operating according to the method described in any one of the preceding claims.
17. A method for allowing interaction between a user and a computerized system by using human natural language and/or textual data exchange, substantially as described and illustrated.
18. Interface for allowing interaction between a user and a computerized system, by using human natural language and/or textual data exchange, substantially as described and illustrated.
19. A computerized system capable of interacting with a user and by using human natural language and/or textual data exchange, substantially as described and illustrated.