WO2002049253A2 - Method and interface for intelligent user-machine interaction - Google Patents


Info

Publication number
WO2002049253A2
Authority
WO
WIPO (PCT)
Prior art keywords
user
computerized system
objects
input
textual
Application number
PCT/IL2001/001164
Other languages
French (fr)
Other versions
WO2002049253A3 (en)
Inventor
Ofer Alt
Simon Rapoport
Oren Shamir
Ilya Knyazhansky
Original Assignee
Poly Information Ltd.
Application filed by Poly Information Ltd. filed Critical Poly Information Ltd.
Priority to AU2002222491A1
Publication of WO2002049253A2
Publication of WO2002049253A3

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/1822 - Parsing for meaning understanding
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • Reference components refer to data that can be stored externally (such as in external databases, Internet sites, etc.) and are used during the generation of grammar patterns for including concrete values with their abstract objects.
  • Each Data Source 411a to 411n is capable of providing such concrete values during the grammar generation process. It is important to note that the generation of reference components can be done off-line, when performance considerations are less critical. During the application run-time, all reference data is easily accessible.
  • the Grammar Generation system 400 operates as follows:
  • All GPatterns 402a to 402n are activated. All Ontology objects 401a to 401d are processed and grouped by numerous criteria defined by GPatterns 402a to 402n.
  • Lexical Parser 407 queries Lexomas 404 from the database.
  • Every Lexoma 404 is processed by Lexical Parser 407, which comprises Answer Generator 403, Lexical Inflector 405 and Thesaurus 406, according to the instructions (provided by the LML) that are stored in Lexoma 404. Information gathered by GPatterns 402a to 402n is inserted into the relevant parts of the Lexoma 404, as indicated by the LML.
  • Data Sources 411a to 411n are used to supply concrete information to Lexical Parser 407.
  • The resulting output of Lexical Parser 407 is Poly Grammar 408, which can be converted to a Specific ASR Grammar 409.
  • Specific ASR Grammar 409 is generated and provides everything necessary for interacting with a user.
  • Fig. 5 is a block diagram of a human-machine conversation system according to a preferred embodiment of the invention.
  • the human-machine conversation system 500 comprises an ASR system 501, which is based upon Specific ASR Grammar 409 (converted from Poly Grammar 408), Meaning Resolvers (MRs) 504a to 504n and MR Manager 503.
  • MRs 504a to 504n are responsible for "solving" different kinds of logic patterns (GPatterns), derived from an identified question (the ASR 501 converts User's 550 verbal question into an identified question).
  • MR Manager 503 is responsible for deciding which MR among MRs 504a to 504n will "solve" the logic pattern (GPattern) derived from the identified question, and provides MRs 504a to 504n with all the information necessary for "solving".
  • MRs 504a to 504n differ considerably in their complexity, from "solving" simple logic patterns (GPatterns) to "solving" complex transactions and maintaining an intelligent dialogue with the User 550.
  • Each MR 504a to 504n operates on Ontology component 401 entities of system 500, as well as accessing Information Server 410 (for obtaining information) and External System 520 for performing Transactions. Its capabilities also include maintaining the dialog context and resolving ambiguity, misunderstanding or missing information.
  • MRs 504a to 504n perform "solving" in a specific context (for example, a particular user's context). This context may imply information filtering according to the user's personal preferences, or restricting access to particular kinds of information or actions by consulting with External Application 310.
  • All MRs 504a to 504n form an MR Suite (MRS) 505, which is an integrated part of the system 500 process.
  • MRS 505 is responsible for coordinating the participating MRs 504a to 504n.
  • MRS 505 members are able to cooperate by exchanging information received from the User 550 or other sources, capture the User's 550 behavior during the dialog and lead the User 550 to a particular predefined goal.
  • MRs 504a to 504n are able to provide intelligent answers. This functionality includes using different types of answers, such as slang, humor, short answers or more descriptive answers according to user preferences, or even answers adapted by capturing parameters related to the user's mood from the previous dialog; using the proper tense, number and person; substituting synonyms in the answer sentences; and providing different forms of answer for the same question.
  • System 500 comprises Lexical Parser 407, according to a preferred embodiment of the invention, wherein the Lexical Parser 407 is now used to generate the answer on-line (instead of generating the Poly Grammar off-line).
  • the human-machine conversation system 500 operates as follows (a minimal sketch follows this list):
  • User 550 calls the human-machine conversation system 500.
  • ASR 501 recognizes the question/statement of User 550.
  • the identified question/statement of User 550 is forwarded to the MR Manager 503.
  • MR Manager 503 decides to which specific MR 504a to 504n this identified question/statement will be forwarded, and accordingly activates this specific MR.
  • the MR 504a to 504n receives information from MR Manager 503, from the relevant Data Sources 411a to 411n and from the discourse context of the conversation. In this way, the specific MR 504a to 504n tries to obtain the information required for the preparation of a meaningful answer to User 550.
  • the MR 504a to 504n generates a response to the User 550 in order to complete the required information.
  • To transform data into the human's natural language, the standard mechanism of Lexical Parser 407 is activated.
  • the generated answer is passed to User 550 by, for example, TTS.
  • the present invention can also be used for other human-machine interfaces, such as an automatic e-mail reader and responder, or automatic support on web-sites using regular "Chat" web tools.
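
To illustrate the dispatch step in the operation listed above, here is a minimal Python sketch of an MR Manager choosing a Meaning Resolver by logic pattern; the pattern names, resolver bodies and question encoding are assumptions for illustration, not the patent's implementation:

    # Sketch of the conversation loop of Fig. 5: the MR Manager dispatches
    # the identified question to the Meaning Resolver that can "solve" its
    # logic pattern, and the answer is handed on for lexical generation/TTS.
    def mr_property(question, data):               # solves 'property of object'
        return data.get(tuple(question["args"]), "unknown")

    def mr_fallback(question, data):               # keeps the dialog going
        return "could you rephrase that?"

    RESOLVERS = {"property_of_object": mr_property}

    def mr_manager(question, data):
        """Pick the MR for the question's logic pattern and let it solve."""
        resolver = RESOLVERS.get(question["pattern"], mr_fallback)
        return resolver(question, data)

    data = {("sky", "color"): "blue"}
    question = {"pattern": "property_of_object", "args": ["sky", "color"]}
    print(mr_manager(question, data))              # -> blue, then on to TTS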

Abstract

A method for allowing interaction between a user and a computerized system (202) by using human natural language and/or textual data exchange. A conversation domain consisting of a plurality of phrases having valid logical meaning is generated. Each phrase corresponds to an aspect of the operation of, and/or the goods/services provided by, the computerized system (202). Data is exchanged between the user and the computerized system (202), and the computerized system (202) is operated by using at least one of the phrases.

Description

METHOD AND INTERFACE FOR INTELLIGENT USER-MACHINE
INTERACTION
Field of the Invention
The present invention relates to the field of human-machine interfaces. More particularly, the invention relates to a method and interface for intelligent user-machine interaction in natural language.
Background of the Invention
In recent years, man has been trying to communicate vocally with machines. The ability to communicate in such a way has many advantages in several fields. The many advantages of accessing computerized systems through voice in natural language are considered common knowledge today.
In recent years, several technological attempts have been made to access computerized systems by using human voice with different products, such as Voice extensible Markup Language™ (VoiceXML™ ) by VoiceXML™ Forum (founded by AT&T, IBM, Lucent Technologies and Motorola; the Forum web site is: http://www.voicexml.org/), Nuance Grammar Builder™ (Nuance Communications 2000, Menlo Park, CA, USA) etc. Such technological attempts are, among others, Voice Recognition (VR), Automatic Speech Recognition (ASR), and Text-To-Speech conversion.
In VR technology, the system tries to recognize the voice of the user and to react according to the user's orders. The problem with such technology is that the computer must learn the tones of every user's voice under lab conditions in order to correctly identify the voice of the user and correctly interpret his meaning. This technology, obviously, is not suitable for the wide public domain. In ASR technology, the system is able to recognize the user's voice without teaching the system different user voices under lab conditions. However, such technology is limited to predetermined sets of sentences. ASR technology has a set of predetermined databases, such as Names, Addresses and Numbers, that are compared with the answers of a user. ASR technology guides the conversation with the user, and waits for very specific answers. No Artificial Intelligence is used in ASR technology.
An additional technology that has been developed is Interactive Voice Response (IVR), which is actually a menu that gives the user the ability to choose between two or more possibilities at each step of the conversation. This technology is very similar to ASR technology in its predetermined and limited set of sentences.
All major breakthroughs in recent years have revolved around modules called Speech-To-Text (STT) and Text-To-Speech (TTS). This core technology acts as a translator between a human voice and written text in the computer. For example, when a person says "happy", it is translated by the module from the acoustic environment to text in a computer. However, the STT and TTS modules do not have any intelligence, but are simple translators between the acoustic environment and the written computer environment. The main breakthrough recently has been in the accuracy and reliability of such technologies. Several applications have been developed according to this STT and TTS technology.
One of the applications developed according to the STT and TTS modules for conversation between man and machine is known as the 'structured' conversation. In such an application, the machine checks its database for text which is identical to the text that was given to it by the user. The machine directs the conversation so that in each step of the conversation it knows exactly what kind of data the user inputs, and checks for identical text. Such modules can be used, for example, in phone directories, wherein the conversation is started by the service computer and 'structured' for the user, as in the following:
Computer: 'please say last name'
User: 'Bond'
Computer: 'please say first name'
User: 'James'
And so on.
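To make the 'structured' conversation concrete, here is a minimal Python sketch of such an exact-match application; the directory data, prompts and function names are illustrative assumptions, not taken from the patent:

    # A 'structured' conversation: at every step the machine knows which
    # field it expects, and it only checks the reply against identical text.
    directory = {("bond", "james"): "555-0123"}   # (last, first) -> phone

    def structured_conversation(replies):
        """replies: an iterator of user answers, one per prompt."""
        answers = []
        for prompt in ("please say last name", "please say first name"):
            print("Computer:", prompt)
            reply = next(replies).strip().lower()
            print("User:", reply)
            answers.append(reply)
        number = directory.get(tuple(answers))    # exact-text lookup only
        print("Computer:", number or "no such listing")

    structured_conversation(iter(["Bond", "James"]))

Anything outside the expected exact text fails, which is precisely the limitation the invention addresses.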
Unfortunately, none of the known technologies and applications has succeeded in creating a fluent conversation between man and machine.
All the methods described above have not yet provided satisfactory solutions to the problem of providing a computerized system that can verbally/textually communicate freely with a user.
It is an object of the present invention to provide a method and interface for intelligent user-machine interaction with an ability to communicate with a user in natural language.
It is another object of the present invention to provide a human-machine interface with a level of artificial intelligence.
Other objects and advantages of the invention will become apparent as the description proceeds.
Summary of the Invention
The present invention is directed to a method for allowing interaction between a user and a computerized system by using human natural language and/or textual data exchange. A conversation domain that consists of a plurality of phrases having valid logical meaning is generated. Each phrase corresponds to an aspect of the operation of, and/or the goods/services provided by the computerized system. Data exchange between the user and the computerized system, and operation of the computerized system are allowed by using at least one of the phrases.
Preferably, a verbal and/or textual input from the user that matches one of the phrases is received. Whenever required, the input is converted into textual data. The context of the textual data is analyzed by associating selected words and their logic relations, obtained from the input, with a predetermined set of words, stored in a first accessible database, and restricted by the operation of, and/or the goods/services provided by, the computerized system, and by accurately obtaining the idea expressed by the textual data. The idea is associated with a set of keywords representing the goods/services, stored in a second internal/external accessible database. A search for obtaining data representing the goods/services is carried out in the second internal/external accessible database. Information related to the search results is obtained, according to the idea. Transacting with the computerized system may also be carried out, according to the idea. A textual response phrase that represents selected record(s) from the search and/or transaction results is synthesized by using words, selected from the first and/or the second databases, according to the context of the idea. Whenever required, the textual response phrase is converted into speech, to be played to the user, and/or to be displayed to the user.
The context analysis of the textual data and/or the association operation(s) may be performed by an internal/external application employing artificial intelligence. The search and/or transaction operation(s) may be performed by an internal/external application. Preferably, each idea is represented by a plurality of physical and/or abstract objects and the relations between them, which belong to an ontology component associated with a predicted reality of a user while interacting with the computerized system. The textual response phrase may be synthesized by an answer generator that uses grammar templates, which are associated with the logic determined from the input, and by a combination of objects and their corresponding relations, which are selected by resolving the logic. A dialog with the user may be performed whenever the context of the input cannot be properly resolved.
Preferably, the conversation domain is generated by defining an ontology domain consisting of a plurality of physical and/or abstract objects and the relations between the objects, which belong to a predicted reality of a user while interacting with the computerized system, as well as defining a plurality of logic patterns from the objects and their relations, each of the logic patterns consisting of a combination of selected objects and their corresponding relations. All the objects in the ontology domain are sorted and/or grouped according to criteria determined by the logic patterns. Sorted and/or grouped objects are forwarded to a lexical parser that generates different phrases using lexical templates, inflection and a thesaurus for formatting the objects.
The present invention is also directed to an interface for allowing interaction between a user and a computerized system, by using human natural language and/or textual data exchange, operating according to the method described herein above.
The present invention is further directed to a computerized system capable of interacting with a user by using human natural language and/or textual data exchange, operating according to the method described herein above.
Brief Description of the Drawings
The above and other characteristics and advantages of the invention will be better understood through the following illustrative and non-limitative detailed description of preferred embodiments thereof, with reference to the appended drawings, wherein:
Fig. 1 is a block diagram of a computerized system operated by voice, according to the prior art;
Fig. 2 is a block diagram of a computerized system with an enhanced man-machine interface operated by voice, according to a preferred embodiment of the invention;
Fig. 3 is a flow chart of a process for a computerized system with enhanced man-machine interface operated by voice, according to a preferred embodiment of the invention;
Fig. 4 is a block diagram of a Grammar Generation unit for a computerized system with enhanced man-machine interface operated by voice, according to a preferred embodiment of the invention; and
Fig. 5 is a block diagram of a man-machine conversation system according to a preferred embodiment of the invention.
Detailed Description of Preferred Embodiments
Fig. 1 is a block diagram of a conventional computerized system 100 operated by voice. A human voice 104 is received as input to an STT module 101, such as a Speech Recognition module, which translates the human voice 104 into digital data that can be converted into textual characters. The translated digital/textual data is then processed and identified by a Computerized System 102. Computerized System 102 checks its database of answers (which can be predetermined or generated during a session) or predetermined words, in order to communicate with the user using human voice 104. The Computerized System 102 selects the appropriate word or sentence, in text format, and forwards it to a Text-To-Speech module 103. TTS unit 103 translates the text into an acoustic format, which is then output and heard by a user 105 (or seen, if a textual response is output). However, as was mentioned hereinbefore, such a system suffers from a lack of intelligence, and therefore does not enable users to talk freely with machines in a way that a computer will 'understand' a request from a user, in order to perform an operation. This system only translates information from an acoustic (or text-oriented) environment to text format, and vice versa. Of course, there are simpler conventional computerized systems which receive textual inputs/requests from the user (e.g., via chats, a keyboard, E-mail messages, etc.).
Fig. 2 is a block diagram of a computerized system 200 with an enhanced man-machine interface operated by voice, according to a preferred embodiment of the invention. The computerized system 200 comprises a True Conversation™ unit 202 that is connected between an enhanced STT module 201 and Computerized System 102, and it also connects Computerized System 102 and TTS module 103. True Conversation™ unit 202 is a human-machine natural-language interface that reproduces the natural-language transmission of information, by modeling the speaker's generated information and the intention behind it, and the listener's interpretation. Computerized system 200 technology completes the conversation solution by adding human conversational capabilities to the conventional computerized system 100.
Fig. 3 is a flow chart of the operations carried out in a computerized system with a human-machine interface with enhanced capability, operated by voice. Block 301, which represents a user sentence as the input to a computerized system, such as system 200, is the beginning of the flow chart 300. At the next step 302, the user sentence is converted to text by a speech recognition unit. The next step 303 comprises Context Handling for the Natural Language of the user. Context Handling is used to track the conversation in order to know the user's intention at any time. For example, during a conversation, a user may use the term 'it' in a sentence instead of a noun used in an earlier sentence. The Context Handling component in Block 303 may store subjects, objects and indirect objects that were mentioned directly or indirectly during the interaction with the user. The Natural Language component in Block 303 identifies the logic template behind the lexical representation of the input sentence. Together, the Context Handling and the Natural Language components in Block 303 identify the logic pattern of the sentence (objects and the relations between them).
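As an illustration of the Context Handling described above, the following minimal Python sketch stores mentioned nouns and substitutes the most recent one for 'it'; the class, its single-antecedent rule and the sample sentences are simplifying assumptions, not the patent's method:

    # Sketch of Block 303's Context Handling: remember recently mentioned
    # nouns so that a later pronoun such as 'it' can be resolved to the
    # most recent antecedent (a deliberately simplified rule).
    class ContextHandler:
        def __init__(self):
            self.mentioned = []                  # subjects/objects, most recent last

        def observe(self, nouns):
            self.mentioned.extend(nouns)

        def resolve(self, word):
            if word == "it" and self.mentioned:
                return self.mentioned[-1]        # most recently mentioned noun
            return word

    ctx = ContextHandler()
    ctx.observe(["movie"])                       # e.g. "Find me a movie about spies"
    print([ctx.resolve(w) for w in ["when", "was", "it", "released"]])
    # -> ['when', 'was', 'movie', 'released']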
Block 304 comprises Artificial Intelligence (AI) modules for performing a "solve" operation. The "solve" operation is carried out by retrieving or requesting information from a database, or performing/requesting transactions with said database, according to the meaning of the input sentence, as received from the user (this module will be further described with respect to the Meaning Resolvers of Fig. 5 hereinafter). After the logic template has been identified by Block 303, the AI Block 304 associates a particular logic pattern with specific objects, and the relations between them, that are retrieved from databases, and obtains the objects that comply with the logic represented by this pattern. The term "Transaction" is meant to include any action of interacting with a system that may cause changes in the system.
Block 304 comprises at least a minimal level of intelligence, required for handling basic logic patterns that were identified in Block 303, and may consult with an external block 305, which contains an application for outside influence on the "behavior" (i.e., the operation and/or the level of intelligence) of the AI modules in Block 304. At the next step 306, a Search or Transaction operation is made, followed by a conclusion about the results of the Search or the Transaction operation. The conclusion may be reached after consulting with an External Application Block 310. The Search or Transaction may be executed in External Systems 307, such as the Internet, a predetermined database, a billing system, etc. For example, if the search concerned restaurants in a specific area, then the results, according to a preferred embodiment of the invention, will be as follows: Type of Restaurant = Italian; Address = Wall St. 27. At the next step 308, the search results are translated into text format in the natural language of the user (by a Natural Language component) with the help of the Context Handling component (of Block 303 hereinabove). At the final step 309, the text produced at step 308 is translated back to speech.
Fig. 4 schematically illustrates a Grammar Generation system according to a preferred embodiment of the invention. The Grammar Generation system 400 is a system that can generate a wide range of grammar variations that can be used to allow communication between the user and the computerized system interfaces, e.g., Automatic Speech Recognition (ASR) products, such as VoiceXML™ and Nuance Grammar.
The Grammar Generation system 400 comprises an Ontology component 401, which is a computer representation of a specific human vision of the actual reality, among several different human visions of said reality. Ontology component 401 consists of Objects 401a to 401d and Relations 421ac, 421da and 421bd between them. Ontology Objects 401a to 401d represent real physical or abstract objects of the actual world, usually (but not necessarily) represented by nouns in human languages. A computerized machine may contain one or more Ontology components 401, for different subjects of interest. It is important to note that there is no absolute truth in the structure of the Ontology component 401, or in the structure of any other Ontology component. Every Ontology component, with all its Objects and Relations, reflects some reality, but of course, it cannot reflect all the possible information variations in the universe. Therefore, Ontology component 401, for example, can be a business-specific reality, or a reality organized by any other principle. The only goal that should be achieved in the optimal way is an equal expectation from a listener and a speaker of their virtual world. For instance, comprehensive knowledge of all aspects of the cinema industry (movies, actors, gossip, critics etc.) is expected from the system's Ontology component specialized in this specific field of cinema. Presumably, no deep (if any) knowledge of nuclear physics or genetics will be expected from a system specializing in the cinema industry. Thus, there are no Objects related to nuclear physics in the Ontology component of this system, i.e., there are no such Objects in the universe known to this system.
According to another embodiment of the invention, several domains of knowledge, which limit the scope of conversations and understanding between the user and the computerized system, can be derived from a corresponding Ontology component 401. Systems with a human-machine interface may have more than a single Ontology component 401, according to another embodiment of the invention.
Ontology component 401 contains Objects 401a to 401d and Relations 421ac, 421da and 421bd between them (which are not words, phrases or any other lexical entities). Natural languages are ambiguous by their nature, and therefore, language independence is important for correct, unambiguous context evaluation. Language independence is crucial in multi-lingual environments. Ontology component 401 keeps information related to meta-data (data that describes data components and the relations between them; for example, if the type of information is a train time-table, the meta-data describes the fields of the train time-table, such as arrivals and departures, while the raw data is the actual arrivals/departures). It describes Relations 421ac, 421da and 421bd between abstract Objects 401a to 401d of the actual world, but does not contain concrete data. Ontology component 401 will, for instance, retain information about the President and his relations to the country, but not the information related to specific presidents. However, a system, such as the Computerized system 200, contains the data related to the location of such information, as well as the manner in which it can be accessed. The Ontology component 401 comprises directions for retrieving raw data in an external data storage.
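The Ontology component described above is essentially a typed graph of abstract Objects and Relations that carries meta-data and retrieval directions, but no concrete data. A minimal Python sketch follows; the object names, relation type and 'source' directions are illustrative assumptions:

    # Sketch of an Ontology component: language-independent Objects and
    # typed Relations that hold meta-data and directions to raw data in
    # external storage, but no concrete data themselves.
    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class OntologyObject:
        name: str                      # abstract concept, not a lexical entry
        source: str = ""               # direction for retrieving raw data

    @dataclass
    class Ontology:
        objects: set = field(default_factory=set)
        relations: set = field(default_factory=set)    # (subject, type, object)

        def relate(self, a, rel_type, b):
            self.objects.update({a, b})
            self.relations.add((a, rel_type, b))

    cinema = Ontology()
    actor = OntologyObject("actor", source="sql://cinema/actors")
    movie = OntologyObject("movie", source="sql://cinema/movies")
    cinema.relate(actor, "plays_in", movie)    # meta-data only: no specific actor
    print(cinema.relations)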
Grammar Patterns (GPatterns) 402a to 402n are functional entities responsible for choosing Ontology Objects 401a to 401d and Relations 421ac, 421da and 421bd corresponding to specific conditions, represented by logic patterns. Each GPattern may have several common lexical representations (termed Lexomas, hereinbelow). For example, if one object is a person and the other object is a date, and the relation type between them is represented by the word "of", then two possible Lexomas may be:
"when he was born?" or
"on what date he was born"
GPatterns 402a to 402n are logical rules that decide with which set of Ontology Objects, selected from 401a to 401d, a specific type of common lexical representation (Lexoma) of this specific logic pattern should be generated. GPatterns 402a to 402n are conditions that can be defined on totally different levels of complexity. For example, GPattern 402a (which is simpler) will find, among all Objects 401a to 401d, an object that is related to the object "actor", and GPattern 402n (which is more complicated) will find all Objects that are directly connected and have type "A" relations, and are not related by any indirect connection with any other Object having a type "B" Relation (for example, an actor that is related to movies made in France with a budget below $1M). GPattern 402n will retrieve a very limited object population, but will return Objects with very particular attributes. On the other hand, GPattern 402a will retrieve a less limited object population, but will return Objects with more common attributes.
Each GPattern 402a to 402n comprises an Ontology Mask (OM) 412a to 412n, respectively, and a Scope of Domain (SOD) 413a to 413n, respectively. Each OM 412a to 412n defines specific conditions applied to Objects 401a to 401d and Relations 421ac, 421bd and 421da. For example, a specific condition may result in the selection of objects having common attributes (for example, all objects having type "A" relations). "X" of "Y" may represent an OM, and <"actor","movie"> may represent a corresponding SOD. The resulting GPattern will be "actor" of "movie".
Each sentence in human language is based upon a certain logic selection (OM) of Objects and the Relations between them. This logic is described by the conditions of OM 412a to 412n. OM 412a to 412n entities are fixed logic patterns. Different GPatterns are generated by applying different parameters/values (SODs) to the same OM.
SOD 413a to 413n defines application boundaries for a specific OM among OM 412a to 412n, respectively. SOD 413a to 413n boundaries limit the population of Objects 401a to 401d which may comply with this specific OM. The combination of OMs 412a to 412n and SODs 413a to 413n, respectively, creates the respective GPatterns 402a to 402n, which can extract an object population that complies with certain conditions and resides within specific application boundaries. The whole GPatterns (402a to 402n) concept is based on the assumption that the human ability to build logical sentences is grounded in his understanding of the world (Ontology), represented as Objects and Relations.
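A minimal Python sketch of this OM/SOD composition (the predicate, the relation set and the SOD encoding are simplified assumptions): the OM is a fixed logic condition over the ontology, and applying a different SOD to the same OM yields a different GPattern:

    # Sketch of GPattern = Ontology Mask + Scope of Domain. The OM is a
    # fixed logic condition over Objects/Relations; applying different SODs
    # (object boundaries) to the same OM yields different GPatterns.
    def om_x_of_y(relations):
        """Fixed logic pattern: every (X, 'of', Y) pair in the ontology."""
        return [(x, y) for (x, rel, y) in relations if rel == "of"]

    def gpattern(om, sod, relations):
        """Limit the OM's selection to the SOD's object boundaries."""
        return [pair for pair in om(relations) if pair == sod]

    relations = {("actor", "of", "movie"), ("capital", "of", "country")}
    print(gpattern(om_x_of_y, ("actor", "movie"), relations))
    # -> [('actor', 'movie')], i.e. the GPattern "actor" of "movie"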
Each human spoken sentence, generated from GPatterns 402a to 402n, has a corresponding module capable of "solving" the logic pattern (GPattern). Such a module is called a Meaning Resolver (MR) 504a to 504n and will be described with respect to Fig. 5 hereinafter.
According to a preferred embodiment of the invention, Ontology Objects 401a to 401d do not contain lexical entries. Therefore, the Grammar Generation system 400 has to transform its internal representation (Ontology, GPatterns and MRs) into an external, human-understandable form. The functional entity that is responsible for this transformation is the Lexical Parser 407.
Lexical Parser 407 is a functional entity that is responsible for all lexical aspects of sentence generation. It handles all lexicon, syntax and semantics issues. GPatterns 402a to 402n extract groups of Ontology component 401 entities (Objects 401a to 401d and Relations 421ac, 421da and 421bd). Answer Generator 403 (which is part of the Lexical Parser 407) takes this information and attempts to fit it to numerous lexical templates. These templates are called Lexomas 404, and are explained in detail hereinbelow. Lexical Parser 407 uses several internal mechanisms, such as Lexical Inflector 405 and Thesaurus 406, to handle different grammar aspects, such as morphology inflections, lexicon etc. While Inflector 405 is a subsystem that is responsible for lexical inflections, Thesaurus 406 provides synonyms for every Ontology Object 401a to 401d. The resulting output from the Lexical Parser 407 is Poly Grammar 408 (explained in detail hereinbelow).
Lexoma 404 is a Lexical Template, which is a textual string written in LML™ (Lexoma Markup Language). LML is a language that has been developed especially for creating Lexical Templates. Besides plain text, Lexoma 404 holds considerable additional information, such as morpho-syntactic properties, lexicon instructions, concrete value definitions etc. Lexomas 404 are used during grammar and answer generation for a user.
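The LML notation itself is not disclosed in this text, so the following Python sketch uses a plain format string to stand in for a Lexoma's template-plus-slots idea; the slot names and the helper function are assumptions:

    # Sketch of a Lexoma as a lexical template with slots: two different
    # Lexomas realize the same person/date logic pattern. A plain Python
    # format string stands in for the (unpublished) LML notation.
    LEXOMAS = ["when {person} was born?", "on what date {person} was born"]

    def realize(template, **slots):
        """Fill a Lexoma's slots with concrete values (e.g. from Data Sources)."""
        return template.format(**slots)

    for lexoma in LEXOMAS:
        print(realize(lexoma, person="the actor"))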
Poly Grammar 408 is the output provided by Lexical Parser 407; it represents a wide range of grammar variations, in an intermediate format, that can be used to allow communication between the user and the computerized system interfaces. Poly Grammar 408 can be transformed into several information delivery techniques, such as voice detection, text detection etc. Fig. 4 shows, for example, a transformation of Poly Grammar 408 information into ASR Specific Grammar 409 format. ASR Specific Grammar 409 is a specific format of voice detection.
Grammar Generator 400, which creates Poly Grammar 408, can transform Poly Grammar 408 into a wide range of grammars compatible with a wide range of ASR producers, such as the VoiceXML™ Forum, Nuance etc. Poly Grammar 408 contains much more information than is required for a specific ASR function. This additional information comprises morpho-syntactic information, and information required by Meaning Resolvers 504a to 504n (MRs 504a to 504n will be described with respect to Fig. 5 hereinafter).
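As an illustration of such a transformation, the Python sketch below renders a toy intermediate grammar into JSGF, a published ASR grammar format used here purely as an example target; the patent's own Poly Grammar representation is internal and richer than this:

    # Sketch of transforming an intermediate grammar into one concrete ASR
    # format. JSGF is used only as an example target; the real Poly Grammar
    # carries far more (morpho-syntactic and MR-related) information.
    poly_grammar = {
        "query": ["when <person> was born", "on what date <person> was born"],
        "person": ["james bond", "the actor"],
    }

    def to_jsgf(grammar, public_rule="query", name="demo"):
        lines = ["#JSGF V1.0;", f"grammar {name};"]
        for rule, variants in grammar.items():
            prefix = "public " if rule == public_rule else ""
            lines.append(f"{prefix}<{rule}> = " + " | ".join(variants) + ";")
        return "\n".join(lines)

    print(to_jsgf(poly_grammar))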
A GPattern, such as GPattern 402a, is a "content agent" for the conversation, i.e., the utterances in the sentences of the conversation will describe the logic of object selection and the relations between them (GPattern).
The following paragraph shows the logical concept of a conversation between two individuals, in accordance with a preferred embodiment of the present invention:
The human language is a tool for transferring ideas between humans. One individual has some ideas in his mind and he wants to share them with another individual. By talking with the other individual, he actually "downloads" them (converts ideas into sentences) and "transfers" them (i.e., speaks) to him. The other individual receives (i.e., listens) and "uploads" them (i.e., converts sentences into ideas). The sentences for a specific idea may vary (i.e., one individual may think, "well, I would say it differently"), but the logic behind them should be similar (otherwise misunderstanding will occur).
"Download" (converting ideas into sentences) - the individual may choose any of the predefined logic patterns, i.e., GPatterns (each individual acquires this set of tools in his childhood), and converts it to one of its common language representations (Lexomas 404).
"Upload" (converts sentences into ideas) — the second individual seeks the right logic pattern in his "toolbox". He searches and detects — which logic pattern (GPattern 402a to 402n) can be represented by such sentence. Every logic pattern (GPattern 402a to 402n) has a corresponding module that can solve the meaning of the sentence (MRs 504a to 504n). Finally, he activates this module (MR 504a to 504n) and understands the idea. For example, a person wants to know the color of the sky. He chooses "property of object" logic pattern (GPattern 402a to 402n) and embeds it into a question (or request) representation (Lexoma 404) "what is the color of the sky?". He converts it to sound (TTS - speak). The other participant converts the sound into sort of text (VR - listen) and detects the logic pattern (GPattern 402a to 402n) behind it. Every logic pattern (GPattern 402a to 402n) has a corresponding module that can "solve" the question (MR 504a to 504n). He activates this appropriate module (MR 504a to 504n) and obtains the answer, such as "blue".
Information Server 410 comprises a plurality of Data Sources 411a to 411n and a Data Source Manager 412 for managing those Data Sources 411a to 411n. Information Server 410 is not limited to a specific language (used for information exchange) or to a specific hardware platform that contains the information. The Information Server 410 model extends the traditional approach of storing information in relational databases, so that backward compatibility can be fully supported. Information Server 410 can be based on any open industry specification (such as the eXtensible Markup Language [XML]) with broad industry support, and works with all major established database products.
Each Data Source 411a to 411n is an entity that encapsulates a source of information for a particular Ontology object 401a to 401d. The operation of Data Sources 411a to 411n is transparent to external users. The Data Sources 411a to 411n functionality may be accessed using a conventional Application Program Interface (such as the Structured Query Language [SQL]). Data Sources 411a to 411n can access different information stores, such as relational and other databases, Internet-based information, e-mail servers and information residing on other hardware platforms. The information provided by Data Sources 411a to 411n is well structured (i.e., the data is organized in a known structure), self-descriptive (i.e., contains internal meta-data) and suitable for easy manipulation.
Information Server 410 is an entity that supplies information for one or more objects and their respective relations, as defined by Ontology component 401, according to a preferred embodiment of the invention. From the user's point of view, it encapsulates the actual Data Source 411a, or even several Data Sources of information 411a to 411n, thereby providing a standard way of accessing data using a known Application Program Interface (such as the SQL language). Information Server 410 combines the information received from several Data Sources 411a to 411n. These Data Sources 411a to 411n may be redundant, allowing higher availability, or may supply different information related to the same topic, which allows more accurate information to be obtained. The Information Server can operate on the well-structured information presented to it by Data Sources 411a to 411n in any common information format (such as XML). Information Server 410 uses an architecture into which new Data Sources can be easily integrated.
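A minimal sketch of this encapsulation and merging behavior is given below; the class and method names are assumptions, since the document specifies behavior rather than an API.

```python
# A hedged sketch of an Information Server that hides several Data
# Sources behind one query interface, merging redundant or
# complementary answers.

class DataSource:
    """Encapsulates one store of well-structured, self-descriptive data."""
    def __init__(self, name: str, records: dict):
        self.name = name
        self.records = records

    def query(self, key: str) -> dict:
        return self.records.get(key, {})

class InformationServer:
    def __init__(self, sources: list):
        self.sources = sources  # may be redundant, for higher availability

    def query(self, key: str) -> dict:
        merged: dict = {}
        for src in self.sources:  # combine complementary information
            merged.update(src.query(key))
        return merged

server = InformationServer([
    DataSource("db", {"sky": {"color": "blue"}}),
    DataSource("web", {"sky": {"composition": "air"}}),
])
print(server.query("sky"))  # -> {'color': 'blue', 'composition': 'air'}
```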
Reference components refer to data that can be stored externally (such as in external databases, in Internet sites, etc.) and are used during the generation of grammar patterns for including concrete values with their abstract objects. Each Data Source 411a to 411n is capable of providing such concrete values during the grammar generation process. It is important to note that the generation of a reference component can be done off-line, when performance considerations are less critical. During the application run-time, all reference data is easily accessible.
According to a preferred embodiment of the invention, as described in Fig. 4, the Grammar Generation system 400 operates as follows (a code sketch of this flow appears after the steps below):
All GPatterns 402a to 402n are activated. All Ontology objects 401a to 401d are processed and grouped according to numerous criteria defined by GPatterns 402a to 402n.
The information that is extracted, sorted and grouped by GPatterns 402a to 402n is forwarded to the Lexical Parser 407. Lexical Parser 407 queries Lexomas 404 from the database.
Every Lexoma 404 is processed by Lexical Parser 407 (which comprises Answer Generator 403, Lexical Inflector 405 and Thesaurus 406) according to the instructions (provided by the LML) that are stored in the Lexoma 404. The information gathered by GPatterns 402a to 402n is inserted into the relevant parts of the Lexoma 404, as indicated by the LML.
Data Sources 411a to 411n are used to supply concrete information to Lexical Parser 407.
The output of Lexical Parser 407 is Poly Grammar 408, which can be converted into a specific ASR grammar 409.
The Specific ASR Grammar 409 is generated, and provides all that is necessary for interacting with a user.
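By way of illustration only, the following minimal sketch condenses the steps above into runnable Python. The Ontology, the single logic pattern, the template strings and the toy ASR output format are all hypothetical stand-ins for the GPattern, Lexoma and Poly Grammar mechanisms described herein.

```python
# A hedged, end-to-end sketch of grammar generation: GPatterns group
# Ontology entities, the Lexical Parser expands them through lexical
# templates, and the result is converted into a toy ASR grammar.

ONTOLOGY = [{"object": "sky", "properties": ["color"]}]
LEXOMAS = ["what is the {prop} of the {obj}?",
           "tell me the {prop} of the {obj}"]

def run_gpatterns(ontology):
    """Group (object, property) pairs per a property-of-object pattern."""
    return [(o["object"], p) for o in ontology for p in o["properties"]]

def lexical_parser(groups, lexomas):
    """Expand every grouping through every lexical template."""
    return [t.format(obj=obj, prop=prop)
            for (obj, prop) in groups for t in lexomas]

poly_grammar = lexical_parser(run_gpatterns(ONTOLOGY), LEXOMAS)
asr_grammar = " | ".join(f"({p})" for p in poly_grammar)  # toy ASR format
print(asr_grammar)
```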
Fig. 5 is a block diagram of a human-machine conversation system according to a preferred embodiment of the invention. The human-machine conversation system 500 comprises an ASR system 501, which is based upon the Specific ASR Grammar 409 (converted from Poly Grammar 408), Meaning Resolvers (MRs) 504a to 504n and MR Manager 503. MRs 504a to 504n are responsible for "solving" different kinds of logic patterns (GPatterns) derived from an identified question (the ASR 501 converts User 550's verbal question into an identified question). MR Manager 503 is responsible for deciding which MR among MRs 504a to 504n will "solve" the logic pattern (GPattern) derived from the identified question, and provides that MR with all the information necessary for the "solving".
"Solving" that is performed by MRs 504a to 504n considerably differs in their complexity, from "solving" simple logic patterns (GPatterns), to "solving" complex transactions and maintaining an intelligent dialogue with the User 550. Each MR 504a to 504n operates on Ontology component 401 entities of system 500, as well as accessing Information Server 410 (for obtaining information) and External System 520 for performing Transactions. Its capability also includes maintaining the dialog context and the ability to resolve ambiguity, misunderstanding or missing information.
Each MR 504a to 504n performs its "solving" in a specific context (for example, a particular user's context). This context may imply filtering information according to the user's personal preferences, or restricting access to particular kinds of information or actions by consulting with External Application 310.
All MRs 504a to 504n form an MR Suite (MRS) 505, which is an integrated part of the system 500 process. MRS 505 is responsible for coordinating the participating MRs 504a to 504n. MRS 505 members are able to cooperate by exchanging information received from the User 550 or from other sources, to capture the User 550's behavior during the dialog, and to lead the User 550 to a particular predefined goal.
MRs 504a to 504n are able to provide intelligent answers. This functionality includes using different types of answers, such as slang, humor, short answers or more descriptive answers, according to the user's preferences, or even by capturing parameters related to the user's mood from the previous dialog. It also includes using the proper tense, number and person, substituting synonyms in the answer sentences, and providing different forms of answer for the same question.
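By way of illustration only, the sketch below shows one simple way such style-aware answers could be rendered from the same resolved fact; the style names and templates are hypothetical.

```python
# A hedged sketch of style-aware answer generation: one resolved fact is
# rendered in different forms according to user preferences.

ANSWER_STYLES = {
    "short": "{value}.",
    "descriptive": "The {prop} of the {obj} is {value}.",
    "humor": "Last time I checked, the {obj} was still {value}!",
}

def render_answer(prop, obj, value, style="descriptive"):
    template = ANSWER_STYLES.get(style, ANSWER_STYLES["descriptive"])
    return template.format(prop=prop, obj=obj, value=value)

print(render_answer("color", "sky", "blue", style="short"))  # -> blue.
print(render_answer("color", "sky", "blue", style="humor"))
```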
System 500 comprises Lexical Parser 407, according to a preferred embodiment of the invention, wherein the Lexical Parser 407 is now used to generate the answer on-line (instead of generating the Poly Grammar off-line). The human-machine conversation system 500 operates as follows (a code sketch of this runtime flow appears after the steps below):
User 550 calls the human-machine conversation system 500.
ASR 501 recognizes the question/statement of User 550 and produces an identified question/statement.
The identified question/statement of User 550 is forwarded to the MR Manager 503.
MR Manager 503 decides to which specific MR 504a to 504n this identified question/statement will be forwarded, and accordingly activates this specific MR.
The activated MR 504a to 504n receives information from MR Manager 503, from the relevant Data Sources 411a to 411n, and from the discourse context of the conversation. In this way, the specific MR 504a to 504n tries to obtain the information required for the preparation of a meaningful answer to User 550.
If some additional input data required for the "solving" is unavailable, the MR 504a to 504n generates a response to the User 550 in order to complete the required information.
If all the input necessary for the "solving" is available, the required information is retrieved from the relevant Data Source 411a to 411n.
To transform the data into human natural language, the standard mechanism of Lexical Parser 407 is activated.
The generated answer is passed to User 550 by, for example, TTS.
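By way of illustration only, the following sketch models the runtime flow above: an MR Manager routes an identified question to the resolver registered for its logic pattern, the resolver asks the user for missing input or retrieves the data, and an answer sentence is produced. All class, function and slot names are hypothetical.

```python
# A hedged sketch of runtime dispatch from MR Manager to a Meaning
# Resolver, including the "ask for missing information" branch.

class MRManager:
    """Decides which Meaning Resolver (MR) will handle a logic pattern."""

    def __init__(self):
        self.mrs = {}  # logic pattern name -> resolver function

    def register(self, pattern, mr):
        self.mrs[pattern] = mr

    def dispatch(self, pattern, slots, context):
        mr = self.mrs.get(pattern)
        if mr is None:
            return "I did not understand the question."
        missing = [name for name, value in slots.items() if value is None]
        if missing:
            # Ask the user to complete the required information.
            return f"Could you tell me the {missing[0]}?"
        return mr(slots, context)

def mr_property(slots, context):
    """Resolver for the property-of-object pattern, using a toy data source."""
    record = context["data"].get(slots["obj"], {})
    value = record.get(slots["prop"], "unknown")
    return f"The {slots['prop']} of the {slots['obj']} is {value}."

manager = MRManager()
manager.register("property_of_object", mr_property)
ctx = {"data": {"sky": {"color": "blue"}}}  # stand-in for Data Sources
print(manager.dispatch("property_of_object", {"obj": "sky", "prop": "color"}, ctx))
# -> The color of the sky is blue.
print(manager.dispatch("property_of_object", {"obj": "sky", "prop": None}, ctx))
# -> Could you tell me the prop?
```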
Of course, the present invention can be used for other human-machine interfaces, such as an automatic e-mail reader and responder, or automatic support provided in web sites by using regular "Chat" web tools.
The above examples and description have, of course, been provided only for the purpose of illustration, and are not intended to limit the invention in any way. As will be appreciated by the skilled person, the invention can be carried out in a great variety of ways, such as communicating with the user using e-mail messages, fax transmissions or Short Messaging Services (SMS), or employing more than one of the techniques described above, all without exceeding the scope of the invention.

Claims

1. A method for allowing interaction between a user and a computerized system by using human natural language and/or textual data exchange, comprising: a) generating a conversation domain consisting of a plurality of phrases having valid logical meaning, each of which corresponds to an aspect of the operation of, and/or the goods/services provided by, said computerized system; and b) allowing data exchange between said user and said computerized system, and operation of said computerized system, by using at least one of said phrases.
2. A method according to claim 1, comprising: a) receiving a verbal and/or textual input from said user that matches one of the phrases; b) whenever required, converting said input into textual data; c) analyzing the context of said textual data by associating selected words and their logic relations, obtained from said input, with a predetermined set of words stored in a first accessible database and restricted by the operation of, and/or the goods/services provided by, said computerized system, and accurately obtaining the idea expressed by said textual data; d) associating said idea with a set of keywords representing said goods/services, stored in a second accessible database; e) searching for data representing said goods/services in said second accessible database and obtaining information related to the search results, and/or transacting with said computerized system, according to said idea; f) synthesizing a textual response phrase that represents a selected record(s) from said search and/or transaction results, by using words selected from said first and/or said second databases, according to the context of said idea; and g) whenever required, converting said textual response phrase into speech, to be played to said user, and/or to be displayed to said user.
3. A method according to claim 2, wherein the input from the user is unguided, free continuous input.
4. A method according to claim 2, wherein the input from the user is a guided input.
5. A method according to claim 2, wherein the input from the user is a part of an unguided, free continuous conversation.
6. A method according to claim 2, wherein the input from the user is a part of a guided, continuous conversation.
7. A method according to claim 2, wherein the first and/or the second accessible databases reside within the computerized system.
8. A method according to claim 2, wherein the first and/or the second accessible databases are external to the computerized system.
9. A method according to claim 2, wherein the context analysis of the textual data and/or the association operation(s) are performed by an internal/external application employing artificial intelligence.
10. A method according to claim 2, wherein the search and/or transaction operation(s) are performed by an internal/external application.
11. A method according to claim 2, wherein each idea is represented by a plurality of physical and/or abstract objects and relations between them, that belong to an ontology component, associated with a predicted reality of a user, while interacting with the computerized system.
12. A method according to claim 2, wherein the textual response phrase is synthesized by an answer generator that uses grammar templates, that are associated with the logic determined from the input and by a combination of objects and their corresponding relation, that are selected by resolving said logic.
13. A method according to claim 12, further comprising performing a dialog with the user whenever the context of the input cannot be properly resolved.
14. A method according to claim 1, wherein the conversation domain is generated by performing the following steps: a) defining an ontology domain consisting of a plurality of physical and/or abstract objects and the relations between said objects, that belong to the reality associated with a predicted reality of a user, while interacting with the computerized system; b) defining a plurality of logic patterns from said objects and their relations, each of said logic patterns consisting of a combination of selected objects and their corresponding relations; c) sorting and/or grouping all the objects in said ontology domain according to criteria determined by said logic patterns; and d) forwarding sorted and/or grouped objects to a lexical parser that generates different phrases using lexical templates, inflection and thesaurus for formatting said objects.
15. Interface for allowing interaction between a user and a computerized system, by using human natural language and/or textual data exchange, operating according to the method described in any one of the preceding claims.
16. A computerized system capable of interacting with a user by using human natural language and/or textual data exchange, operating according to the method described in any one of the preceding claims.
17. A method for allowing interaction between a user and a computerized system by using human natural language and/or textual data exchange, substantially as described and illustrated.
18. Interface for allowing interaction between a user and a computerized system, by using human natural language and/or textual data exchange, substantially as described and illustrated.
19. A computerized system capable of interacting with a user and by using human natural language and/or textual data exchange, substantially as described and illustrated.