WO2005038775A1 - System, method, and programming language for developing and executing dialogs between a user and a virtual agent - Google Patents


Info

Publication number
WO2005038775A1
Authority
WO
WIPO (PCT)
Prior art keywords
dialog
speech
combination
script
interface
Prior art date
Application number
PCT/US2004/033186
Other languages
English (en)
Inventor
Michael Kuperstein
Original Assignee
Metaphor Solutions, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/915,955 external-priority patent/US20050080628A1/en
Application filed by Metaphor Solutions, Inc. filed Critical Metaphor Solutions, Inc.
Publication of WO2005038775A1 publication Critical patent/WO2005038775A1/fr
Priority to US11/145,540 priority Critical patent/US20060031853A1/en

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the patent entitled “Methods and apparatus for object-oriented rule-based dialogue management” discloses a dialogue manager that processes a set of frames characterizing a subject of the dialogue, where each frame includes one or more properties that describe an object which may be referenced during the dialogue.
  • a weight is assigned to each of the properties represented by the set of frames, such that the assigned weights indicate the relative importance of the corresponding properties.
  • the dialogue manager utilizes the weights to determine which of a number of possible responses the system should generate based on a given user input received during the dialogue.
  • the dialogue manager serves as an interface between the user and an application which is running on the system and defines the set of frames.
  • the dialogue manager supplies user requests to the application, and processes the resulting responses received from the application.
  • the dialogue manager uses the property weights to determine, for example, an appropriate question to ask the user in order to resolve ambiguities that may arise in execution of a user request in the application.
  • While this patent discloses a flexible dialog manager that deals with ambiguities, it does not focus on fast and easy development, since it does not deal well with the following: its organization of speech grammars and audio files is not efficient; manually determining the relative weights for all the frames requires much skill; and creating a means of asking the caller questions to resolve ambiguities requires much effort. It also does not deal well with interfaces to industry-standard protocols or with external data source integration.
  • the patent entitled "System and method for developing interactive speech applications" (U.S. Patent No. 6,173,266) is directed to the use of re-usable dialog modules that are configured together to quickly create speech applications.
  • The specific instance of each dialog module is determined by a set of parameters. This approach does improve the speed of development but lacks flexibility: a customer cannot easily change the parameter set of the dialog modules. Also, the dialog modules work within the syntax of a standard application interface like VoiceXML, which is still part of the problem of difficult development. In addition, dialog modules by themselves address neither the difficulty of implementing the complex conditional flow control inherent in good voice user interfaces, nor the difficulty of integrating external web services and data sources into the dialog.
  • the patent entitled "Natural language task-oriented dialog manager and method" (U.S. Patent No. 6,246,981) discloses the use of a dialog manager that is controllable through a backend and a script for determining a behavior for the dialog manager.
  • the recognizer may include a speech recognizer for recognizing speech and outputting recognized text.
  • the recognized text is output to a natural language understanding module for interpreting natural language supplied through the input.
  • the synthesizer may be a text to speech synthesizer.
  • the task-oriented forms may each correspond to a different task in the application, each form including a plurality of fields for receiving data supplied by a user at the input, the fields corresponding to information applicable to the application associated with the form.
  • the task-oriented form may be selected by scoring the forms relative to each other according to information needed to complete each form and the context of information input from a user.
  • the dialog manager may include means for formulating questions for one of prompting a user for needed information and clarifying information supplied by the user.
  • the dialog manager may include means for confirming information supplied by the user.
  • the dialog manager may include means for inheriting information previously supplied in a different context for use in a present form.
  • This patent views a dialog as filling in a set of forms.
  • the forms are declarative structures of the type "if the meaning of the user's text matches a specified subject then do the following".
  • the dialog manager in this patent allows some level of semantic flexibility, but does not address the development difficulties of real-world applications: the difficulty of creating the semantic parsing that gives the flexibility; organizing speech grammars and audio files; interacting with industry-standard speech interfaces; and integrating external web services and data sources into the dialog.
  • the patent entitled “Method and apparatus for discourse management” (U.S. Patent No. 6,356,869) discloses a method and an apparatus for performing discourse management.
  • the patent discloses a discourse management apparatus for assisting a user to achieve a certain task.
  • the discourse management apparatus receives information data elements from the user, such as spoken utterances or typed text, and processes them by implementing a finite state machine.
  • the finite state machine evolves according to the context of the information provided by the user in order to reach a certain state where a signal can be output having a practical utility in achieving the task desired by the user.
  • the context based approach allows the discourse management apparatus to keep track of the conversation state without the undue complexity of prior art discourse management systems.
  • the dialog manager in this patent uses the decoded output of a speech grammar to search the user interface data set for a corresponding spoken language interface element and data which is returned to the dialog manager when found.
  • the dialog manager provides the spoken language interface element associated data to the application or system for processing in accordance therewith.
  • This patent is a simpler form of U.S. Patent No. 6,246,981 discussed above and is focused on use with embedded devices. It is too rigid and too simplistic to be useful in many customer service applications where flexibility is required.
  • the ASR industry is aware of the complexity of using Voice XML and SALT and a number of software tools have been created to make dialog development with ASR much easier.
  • One of the better-known tools is sold by a company called Audium.
  • This is a development environment that incorporates flow diagrams for dialogs, similar to the Microsoft product VISIO, with drag-and-drop graphical elements representing parts of the dialog.
  • the Audium product represents a flow diagram style that most of the newer tools use.
  • Each graphical element in the flow diagram has a property sheet that the developer fills out.
  • the present invention provides an optimal combination of speed of development with flexibility of flow control and interfaces for commercial speech dialogs and applications.
  • Dialogs are viewed as procedural processes that are most easily managed by procedural programming languages.
  • the best examples of managing procedural processes having a high level of conditional flow control are standard programming languages like C++, Basic, Java and JavaScript. After more than 30 years of use, these languages have been honed to optimal use.
  • the present invention leverages the best features of these languages applied to real world automated speech response dialogs.
  • the present invention also represents a dialog as not just a sequence of forms.
  • a dialog may also include flow control, context management, call management, dynamic speech grammar generation, communication with service agents, data transaction management (e.g., database and web services) and fulfillment management which are either very difficult or not possible to program into current, standard voice interfaces such as Voice XML and SALT scripts.
  • the invention provides for integration of these functions into scripts.
  • the invention adapts features of standard procedural languages, dynamic web services and standard integrated development environments (IDEs), toward developing and running automated speech response dialogs.
  • A procedural software language or script language, called MetaphorScript, is provided. This high-level language is designed to develop and run dialogs which share knowledge between a person and a virtual agent for the purpose of solving a problem or completing a transaction.
  • the inherited speech dialog resources may include, for example, speech interface software drivers, automated dialog exception handling, organization of grammar and audio files to allow easy authoring and integration of grammar results with dialog variables.
  • the automated dialog exception handling may include handling the event when a user says nothing and times out and the event when the received speech is not known in a given speech grammar.
  • the language also allows proven applications to be linked as reusable building blocks with new applications, further leveraging development efforts.
  • editor allows the developer to develop an ASR dialog by entering text scripts in the script language syntax, which is similar to JavaScript. These scripts determine the flow control of a dialog.
  • editor allows the developer to enter information in a tree of property sheets associated with the scripts to determine dialog prompts, audio files, speech grammars, external interfaces and script language variables. It saves all the information about an application in an XML project file.
  • the defined project enables, builds and runs an application.
  • the linker reads the XML project file and checks the consistency of the scripts and associated properties, reports errors if any, and sets up the implementation of the run-time environment for the application project.
  • the run-time interpreter reads the XML project file and responds to a user through either a voice gateway using speech or through an Internet browser using HTML text exchanges, both of which are derived from the scripts, internal and external data sources and associated properties.
  • An HTML text dialog with users does not have any of the input grammars that a voice dialog has, since the input is just what the users type in, while the voice dialog requires a grammar to transcribe what the users say to text.
  • the text dialog mode may be used to simulate a speech dialog for debugging the flow of scripts.
  • the text dialog may be the basis for a virtual chat solution in the market.
  • One embodiment of the present invention includes a method and system for developing and running speech dialogs where each dialog is capable of supporting one or more turns of conversation between a user and virtual agent via a communications interface or data interface.
  • a communications interface typically interacts with a person while a data interface interacts with a computer, machine, software application, or other type of non-person user.
  • the system may include an editor for defining scripts and entering dialog information into a project file. Each script typically determines the flow control of one or more dialogs while each project file is typically associated with a particular dialog.
  • a linker may use a project configuration in the project file to set up the implementation of a run-time environment for an associated dialog.
  • a computer application, such as the Conversation Manager program, that may include a run-time interpreter typically delivers a result to either or both a communications interface and a data interface based on the dialog information in the project file and user input.
  • the communications interface preferably delivers a message to the user such as a person.
  • the data interface may deliver a message to a non-person user as well.
  • the message may be a response to a user query or may initiate a response from a user.
  • the communications interface may be any one or combination of a voice gateway, Web server, electronic mail server, instant messaging server (IMS), multimedia messaging server (MMS), or virtual chat system.
  • the application and voice gateway preferably exchange information using either the VoiceXML or SALT interface language.
  • the voice gateway message may be in the form of playing audio for the user derived from the speech grammar and audio files.
  • the message may be in various forms including text, HTML text, audio, an electronic mail message, an instant message, a multimedia message, or graphical image.
  • the user input may also be the form of text, HTML text, speech, an electronic mail message, an instant message, a multimedia message, or graphical image.
  • the dialog information typically includes either or a combination of dialog prompts, audio files, speech grammars, external interface references, one or more scripts, and script variables.
  • the application may perform interpretation on a statement by statement basis where each statement resides within the project file.
  • the editor preferably defines scripts using a unique script language.
  • the script language typically includes any one or combination of literals, integers, floating-point literals, Boolean literals, dialog variables, internal dialog variables, arrays, operators, functions, if/then statements, switch/case statements, loops, for loops, while loops, do/while loops, dialog statements, external interface statements, and special statements.
  • other script or software programming languages may also be used.
  • Such languages may include C#, C++, C, JAVA, JavaScript, JScript, VBScript, VB.Net, Perl, PHP, and other languages known to those skilled in the art.
  • the editor also preferably includes a graphical user interface (GUI) that allows a developer to perform any one of file navigation, project navigation, script text editing, property sheet editing, and linker reporting.
  • the linker may create the files, interfaces, and internal databases required by the interpreter of the speech dialog application.
  • the application typically uses an interpreter to parse and interpret script statements and associated properties in a script plan where each statement includes any one of dialog, flow control, external scripts, internal state change, references to external context information, and an exit statement.
  • the interpreter's result may also be based on any one or combination of external sources including external databases, web services, web pages through web servers, electronic mail servers, fax servers, CTI interfaces, Internet socket connections, and other dialog session applications. Yet further, the interpreter result may be based on a session state that determines where in a script to process a dialog session next. The interpreter also preferably saves the session state after returning the result to either or both the communications interface and data interface. In other embodiments, the scripts may be compiled directly into executable code avoiding the need for an interpreter. For example, a set of dialog scripts may be defined using the C# programming language and compiled directly into executable code.
  • Another embodiment of the present invention includes a speech dialog management system and method where each dialog supports one or more turns of conversation between a user and virtual agent using a communications interface or data interface.
  • the dialog management system preferably includes a computer and computer readable medium, operatively coupled to the computer, that stores text scripts and dialog information. Each text script then determines the recognition, response, and flow control of a dialog while an application, based on the dialog information and user input, delivers a result to either or both the communications interface and data interface.
  • Fig. 1 shows a speech dialog processing system in accordance with the principles of the present invention.
  • Fig. 2 shows a process flow according to principles of the present invention.
  • Fig. 3 shows an alternative embodiment of the dialog session processing system.
  • Fig. 4 is a top-level view of a graphical user interface (GUI) for a conversation manager editor with a linker tool encircled in the toolbar.
  • Fig. 5 is a detailed view of a section of the GUI of Fig. 4 corresponding to a file navigation tree function.
  • Fig. 6 is a detailed view of a section of the GUI of Fig. 4 corresponding to a project navigation tree function.
  • FIG. 7 is a detailed view of a section of the GUI of Fig. 4 corresponding to a script editor.
  • Fig. 8 is a detailed view of a section of the GUI of Fig. 4 corresponding to a dialog property sheet editor.
  • Fig. 9 is a detailed view of a section of the GUI of Fig. 4 corresponding to a dialog variable property sheet editor.
  • Fig. 10 is a detailed view of a section of the GUI of Fig. 4 corresponding to a recognition property sheet editor.
  • Fig. 11 is a detailed view of a section of the GUI of Fig. 4 corresponding to an interface property sheet editor.
  • Fig. 1 illustrates an embodiment of a speech dialog processing system 110 that includes communications interface 102, i.e., a voice gateway, and application server 103.
  • a telephone network 101 connects telephone user 100 to the voice gateway 102.
  • communications interface 102 provides capabilities that include telephony interfaces, speech recognition, audio playback, text-to-speech processing, and application interfaces.
  • the application server 103 may also interface with external data sources or services 105.
  • application server 103 includes a web server 203, web-linkage files such as Initial Speech Interface file 204 and ASP file 205, a dialog session manager Interpreter 206, application project files 207, session state files 210, Speech Grammar files 208, Audio files 209 and Call Log database 211, the combination of which is typically referred to as dialog session speech application 218.
  • development of dialog session speech application 218 may be performed in an integrated development environment using IDE GUI 217, which includes editor 214, linker 215 and debugger 216.
  • a session database 104 and external data sources 213 or services 105 are also connected to application server 103.
  • a data driven device interface 220 may be used to facilitate a dialog with a data driven device.
  • Web server 212 may enable back-end data transactions over the web. Operation of these elements of the speech dialog processing system 110 is described in further detail herein.
  • the unique script language is a dialog scripting language which is based on a specification subset of JavaScript but adds special functions focused on speech dialogs. Scripts written in the script language are written directly into project files 207 to allow Interpreter 206 to dynamically generate dialogs at run time.
  • the scripts are a sequence of functions, assignments of script variable expressions, logical operations, dialog interfaces and data interfaces (back end processing) as well as internal states.
  • a plan is a set of procedural steps that implements a process flow with a user, data sources and/or a live agent that may include conditional branches and loops.
  • a dialog interface specifies a single turn of conversation between a virtual agent and a user, i.e., person, whereby the virtual agent says something to a user and the virtual agent listens to recognize a response (or message) from the user.
  • the user's response is recognized using speech grammars 208 that may include standard grammars, as specified by the World Wide Web Consortium (W3C), that define expected utterances.
  • Script interpretation is done on a statement-by-statement basis. Each statement can only be on one line, except when there is a continuation character at the end of a line. Unlike JavaScript, there are no ";" characters at the end of each line.
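As an illustrative sketch only (not the patent's actual Interpreter code), statement-per-line parsing with a trailing continuation character might look like the following; the backslash as the continuation character is an assumption, since the patent does not name the character:

```javascript
// Hypothetical sketch: split script text into statements, one per line,
// joining lines that end with a continuation character (assumed "\" here).
function splitStatements(scriptText) {
  const statements = [];
  let pending = "";
  for (const rawLine of scriptText.split("\n")) {
    const line = pending + rawLine;
    if (line.endsWith("\\")) {
      pending = line.slice(0, -1); // strip the continuation mark, keep accumulating
    } else {
      pending = "";
      if (line.trim() !== "") statements.push(line.trim());
    }
  }
  if (pending.trim() !== "") statements.push(pending.trim());
  return statements;
}
```

Note that, consistent with the description above, no ";" terminator is expected at the end of a statement.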
  • other scripting or software programming languages may also be used.
  • such languages may include C#, C++, C, JAVA, JavaScript, JScript, VBScript, VB.Net, Perl, PHP, and other languages known to those skilled in the art.
  • Such languages may also be enhanced with specific functions focused on speech dialogs as discussed herein.
  • scripts generated from these scripting and software programming languages may be compiled directly into executable code avoiding the need for an interpreter.
  • a set of dialog scripts may be defined using the C# programming language and compiled directly into executable code.
  • a script may be called in two ways: The first script that is called in the beginning of any dialog is the one labeled as "start”. Every project typically has a "start" script. The other way a script is called is through a function called in one script which may refer to a function defined in another script, even across speech applications.
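The two calling paths above can be sketched as follows. This is an illustrative model, not the patent's implementation; the registry structure and the function names makeRuntime, run, and call are assumptions for the sketch:

```javascript
// Hypothetical sketch: scripts registered per application. Every dialog
// begins with the "start" script, and "app::fn" names resolve functions
// across linked applications.
function makeRuntime(applications) {
  function resolve(appName, fnName) {
    const app = applications[appName];
    if (!app || !app[fnName]) {
      throw new Error("unknown script: " + appName + "::" + fnName);
    }
    return app[fnName];
  }
  return {
    // Path 1: every project is expected to define a "start" script.
    run(appName) {
      return resolve(appName, "start")();
    },
    // Path 2: a function call that may refer to another application.
    call(qualifiedName) {
      const [appName, fnName] = qualifiedName.split("::");
      return resolve(appName, fnName)();
    },
  };
}
```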
  • Elements of the script language may include: • Literals - are used to represent values in the script language. These are fixed values, not variables in the script. Examples of literals include: 1234, "This is a literal", true.
  • Integers - are expressed in decimal.
  • a decimal integer literal typically comprises a sequence of digits without a leading 0 (zero) but can optionally have a leading '-'. Examples of integer literals are: 42, -345.
  • Floating-point literals - may have the following parts: a minus sign ("-"), a decimal integer, a decimal point (".") and a fraction (another decimal number).
  • a floating-point literal must have at least one digit. Some examples of floating-point literals are 3.1415, -3123.
  • Boolean literals - have the values: true, false, 1, 0, "yes" and "no".
  • String literals A string literal is zero or more characters enclosed in double (") quotation marks. A string is typically delimited by quotation marks. The following are examples of string literals: "blah”, "1234".
  • Dialog variables cannot be the same as any of the script keywords or special functions. Dialog variables are typically case sensitive: "My_variable" and "my_variable" are two different names to the script language, because they have different capitalization. Some examples of legal names are: number_of_hits, temp99, and read_RDF. Dialog variables from other linked applications may be referenced by preceding the variable name with the name of the application, with "::" in between. For example, to refer to a dialog variable named "street" in the application named "address", use "address::street". The linked application is typically listed in the project configuration.
  • For example: quotient = dividend / divisor.
  • the value of quotient will be 2.
  • the script language preferably recognizes the following types of values: string, integer, float, boolean, or nbest (described below). Examples include: numbers, such as 42 or 3.14159; logical (Boolean) values, either true or false, 1 or 0; strings, such as "Howdy!"; null, a special keyword which refers to a value of nothing; and nbest values holding multiple recognition choices, such as spelled letters.
  • the time out period is about 4 seconds.
  • Internal dialog variables include:
    o previous_subject (string) - previous subject, if any
    o previous_user_input (string) - previous user input
    o session_id (string) - unique ID for the current dialog session
    o subject (string) - current subject, if any
    o top_recognition_confidence (float) - top recognition confidence score for the current user input. The score measures how confident the speech recognizer is that the result matches what was actually spoken.
  • NBest Arrays - Most of the time a script plan gets some knowledge from the user with only one top choice, such as yes/no or a phone number. However, at times the script may require knowledge from the user that could be ambiguous, such as spelled letters. For example, "m" and "n", and "b" and "d", are probably difficult to distinguish.
  • By giving a dialog variable a value type of nbest, it will store a maximum of the top 5 choices that may be recognized by the speech grammar. The values are always strings.
  • the following syntax may be used: <nbest_variable>.<i>, where <i> is either an integer or a dialog variable with a value ranging from 0 to 4. The 0 choice is the top choice.
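The nbest behavior described above can be sketched as a small data structure. This is an illustrative model only; the makeNBest name and its accessor shape are assumptions, not the patent's API:

```javascript
// Hypothetical sketch: an nbest value stores at most the top 5
// recognition choices as strings, with index 0 the top choice.
function makeNBest(choices) {
  const top5 = choices.slice(0, 5).map(String); // cap at 5, always strings
  return {
    get(i) {
      if (i < 0 || i > 4) throw new RangeError("nbest index must be 0..4");
      return top5[i]; // undefined if fewer than i+1 choices were recognized
    },
    length: top5.length,
  };
}
```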
  • the allowed operations are "+", "-", "*", "/" and "%", and the logical operators below.
  • the "+” operator can be applied to integers, floats and strings. For strings, the "+” operator does a concatenation.
  • the "%” can only be applied to integers.
  • a developer may also assign a boolean expression using the "&&" and "||" operators.
  • a comparison operator compares its operands and returns a logical value based on whether the comparison is true or false.
  • the operands may be numerical or string values.
  • o Arithmetic Operators take numerical values (either literals or variables) as their operands and return a single numerical value.
  • the standard arithmetic operators are addition (+), subtraction (-), multiplication (*), division (/) and remainder (%). These operators work as they do in other programming languages, as well as, in standard arithmetic.
  • o Logical Operators Logical operators take Boolean (logical) values as operands and return a Boolean value. That is, they evaluate whether each subexpression within a Boolean expression is true or false, and then execute the operation on the respective truth values.
  • the operators include: and (&&) and or (||).
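Because the script language adopts these operators from JavaScript, their behavior can be illustrated directly in JavaScript:

```javascript
// Operator behaviors as described above, in JavaScript, whose semantics
// the script language follows for these operators.
const sum = 7 + 3;                         // arithmetic addition on integers
const concat = "area " + "51";             // "+" on strings performs concatenation
const remainder = 7 % 3;                   // "%" applies to integers
const inRange = (sum > 5) && (sum < 20);   // comparison results combined with "and"
const either = false || (remainder === 1); // "or" over Boolean subexpressions
```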
  • Functions - are one of the fundamental building blocks in the present script language.
  • a function is a script procedure or a set of statements.
  • a function definition has these basic parts: the keyword "function", a function name, and a parameter list, if any, between two parentheses; parameters are separated with commas.
  • the statements in the function are inside curly braces: "{ }". Defining the function gives the function a name and specifies what to do when the function is called.
  • the variables that will be called in that function must be declared. The following is an example of defining a function: function alert() { tell_alert }
  • the linked application is typically listed in the configuration property sheet that is described further herein below.
  • Function calls in linked applications may also pass dialog variables by value through a parameter list.
  • An example is: address::get_street(city, state, zip_code, street). All parameters are typically defined as dialog variables in both the calling application and the called application, and all parameters are both input and output values.
  • all values are passed from the calling application to the called application and then when the function returns, all values are passed back. If a function is called local to an application, the parameter list is ignored, because all dialog variables have a scope throughout an application.
  • Functions may be called from any application to any other application, if all the linked applications are listed in the configuration property sheet of the starting application. For example, in the starting application, "appO”, appl ::funl(x,y) can be called and then in the "appl " application, app2::fun2(a,b) can be called.
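The pass-by-value-in-both-directions behavior described above can be sketched as follows. This is an illustrative model; the callLinked helper and the plain-object representation of each application's dialog variables are assumptions for the sketch:

```javascript
// Hypothetical sketch: a cross-application call copies every parameter's
// value into the called application's dialog variables on entry, runs the
// called script, then copies all values back on return.
function callLinked(callerVars, calleeVars, paramNames, calledScript) {
  for (const name of paramNames) calleeVars[name] = callerVars[name]; // pass in
  calledScript(calleeVars);                                           // run called script
  for (const name of paramNames) callerVars[name] = calleeVars[name]; // pass back
}
```

For a local call within one application, no copying would be needed, since all dialog variables are in scope throughout the application, which matches the note above that the parameter list is then ignored.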
  • Switch/Case - statements allow choosing the execution of statements from a set of statements depending on matching a value of a specific case.
  • the syntax is: switch(<dialog variable>) { case <literal value>: (statements) break }
  • Loops - are useful for controlling dialog flow. Loops handle repetitive tasks extremely well, especially in the context of consecutive elements. Exception handling immediately springs to mind here, since most user inputs need to be checked for accuracy and looped if wrong. The two most common types of loops are for and while loops:
  • a "for loop” constitutes a statement including three expressions, enclosed in parentheses and separated by semicolons, followed by a block of statements executed in the loop.
  • a “for loop” resembles the following: for (initial-expression; condition; increment-expression) { statements }
  • the initial-expression is an assignment statement. It is typically used to initialize a counter variable. The condition is evaluated both initially and on each pass through the loop. If this condition evaluates to true, the statements in statements are performed. When the condition evaluates to false, the execution of the "for" loop stops.
  • the increment-expression is generally used to update or increment the counter variable.
  • the statements constitute a block of statements that are executed as long as condition evaluates to true. This may be a single statement or multiple statements. Although not required, it is good practice to indent these statements from the beginning of the "for" statement to make the program code more readable.
  • While Loops - The "while loop" is functionally similar to the "for" statement. The two can fill in for one another - using either one is only a matter of convenience or preference according to context. The "while" creates a loop that evaluates an expression, and if it is true, executes a block of statements. The loop then repeats, as long as the specified condition is true.
  • the syntax of while differs slightly from that of for: while (condition) { statements }
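The exception-handling use of loops mentioned above (checking user input for accuracy and looping if wrong) can be sketched as follows. The askUser and isValid callbacks and the retry limit are illustrative stand-ins, not constructs of the patent's script language:

```javascript
// Hypothetical sketch of the re-prompt pattern: keep asking until the
// user's input validates, up to a retry limit.
function getValidated(askUser, isValid, maxTries) {
  let attempt = 0;
  while (attempt < maxTries) {
    const answer = askUser(attempt);
    if (isValid(answer)) return answer;
    attempt++;
  }
  return null; // caller can then escalate, e.g. transfer to a live agent
}
```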
  • Dialog Statements - provide a high level reference to preset processes of telling the caller something and then recognizing what he said.
  • There are two dialog statement types: o get - gets a knowledge resource or concept from the user through a dialog interface and stores it in a dialog variable.
  • the syntax is "get( ⁇ dialog_variable>)”. An example is: “get(number_of_shares)” o tell - tells the user something.
  • the syntax is: "tell_*”. An example is : “tell_goodbye”
  • Each dialog statement has properties that need to be filled. They include: o name - of the dialog. o subject - of the dialog for context processing purposes. o say - what the caller will hear from the computer.
  • the syntax is an arbitrary combination of "<text> (<dialog variable>)". An example is: "(company) today has a stock price of (price)". This property provides for a powerful and flexible combination of static information (i.e., <text>) with highly variable information (i.e., <dialog variable>s).
  • the "say" value will be parsed by the Interpreter. Any parentheses containing a dialog variable will be processed so that the string and/or audio-file-path value stored in the dialog variables will be output to the voice gateway.
  • the dialog variable could result in text-to-speech of the value of "company” or playback of a recorded audio file associated with "company”.
  • Any text segment which is between parentheses will be processed so that the associated audio file in the "say_audio_list” will be played through the voice gateway.
  • say_variable - dynamic version of "say" stored in a dialog variable.
  • say_audio_list - the list of audio files associated with "say" text segments, in order.
  • the first text segment in "say” is associated with the first audio file, etc.
  • say_random_audio - enable the audio files for "say" to be played at random.
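The "say" parsing step described above can be sketched as follows. This is an illustrative Python approximation, not the patent's Interpreter: any parenthesized dialog-variable reference in the "say" value is replaced with that variable's current value before output to the voice gateway.

```python
import re

# Hypothetical sketch of "say" parsing: plain text segments pass through,
# and each (dialog_variable) reference is replaced with the variable's
# current value from the dialog-variable store.

def render_say(say, dialog_vars):
    """Substitute (name) references in a "say" value with values from dialog_vars."""
    def substitute(match):
        name = match.group(1)
        return str(dialog_vars.get(name, ""))
    return re.sub(r"\(([A-Za-z_]\w*)\)", substitute, say)
```

For example, rendering the "say" value from the text with `{"company": "Acme", "price": 42}` would produce "Acme today has a stock price of 42". In the real system each substituted value could instead trigger text-to-speech or playback of an associated audio file.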
  • speech grammars are either defined by the W3C standards body, known as SRGS (speech recognition grammar specification), or are a representation of Statistical Language Model speech recognition determined by a speech recognition engine manufacturer such as ScanSoft, Nuance or other providers.
  • External Interface Statements o interface - calls an external interface method or function.
  • the syntax is: "interface( ⁇ interface>)".
  • An example is: "interface(get_stock_price)" o db_get - gets the value of a dialog variable from a database value in a data source by using SQL database statements in a variable or in a literal.
  • An internal ODBC interface is used to execute this function.
  • the syntax is : "db_get( ⁇ data source>, ⁇ dialog variable>, ⁇ SQL>)”.
  • An example is "db_get(account_db,price,sql_statement)”.
  • o db_set - sets a database value in a data source from the value of a dialog variable by using SQL database statements.
  • An internal ODBC interface is used to execute this function.
  • the syntax is : "db_set( ⁇ data source>, ⁇ dialog variable>, ⁇ SQL>)".
  • An example is "db_set(account_db,price,sql_statement)”.
  • o db_sql - executes SQL database statements on a data source.
  • An internal ODBC interface is used to execute this function.
  • the syntax is : "db_sql( ⁇ data source>, ⁇ SQL>)”.
  • An example is "db_sql(account_db,sql_statement)”.
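The db_get and db_set semantics above can be sketched in Python. The specification calls for an internal ODBC interface; this stand-in uses the standard sqlite3 module instead, and the table name, function signatures and SQL here are invented for illustration only.

```python
import sqlite3

# Sketch of db_get/db_set semantics: db_get runs SQL against a data source
# and stores the first result value in a dialog variable; db_set writes a
# dialog variable's value back to the database via parameterized SQL.

def db_get(conn, dialog_vars, var_name, sql):
    """Run the SQL on the data source; store the first result value in a dialog variable."""
    row = conn.execute(sql).fetchone()
    dialog_vars[var_name] = row[0] if row else None
    return dialog_vars[var_name]

def db_set(conn, dialog_vars, var_name, sql):
    """Run parameterized SQL, supplying the dialog variable's value."""
    conn.execute(sql, (dialog_vars[var_name],))
    conn.commit()
```

A real deployment would hold the connection per data source (the `<data source>` argument in the script syntax) and build the SQL from a dialog variable or literal, as the text describes.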
  • the file is located in <install_directory>\speech_apps\call_logs\<app_name>\user_recordings
  • the syntax is: "record(<dialog_variable>)".
  • An example is: "record(welcome_message)" o call_transfer - transfers the call to another phone number through the value of the dialog variable.
  • the syntax is: "call_transfer(<phone>)".
  • An example is: "call_transfer(operator_phone)" o transfer_dialog - transfers the dialog to another Metaphor dialog through the value of the dialog variable.
  • the syntax is: "transfer_dialog(<dialog_variable>)".
  • An example is: "insert_string(buffer,start,"abcd")".
  • o replace_string - replaces one sub-string with another anywhere it appears.
  • the syntax is: “replace_string( ⁇ in- string>, ⁇ search>, ⁇ replace>)”.
  • An example is: “replace_string(buffer,”abc”,”def')”.
  • o erase_string - erases a sequence of a string starting at a beginning position for a specified length.
  • the syntax is: "erase_string( ⁇ in-string>, ⁇ start>, ⁇ length>)”.
  • An example is: "erase_string(buffer,start,length)”.
  • o substring - gets a sub-string of a string starting at a position for a specified length.
  • the syntax is: "substring( ⁇ in- string>, ⁇ start>, ⁇ length>, ⁇ sub-string>)”. An example is: “substring(name,0,3,part)”.
  • o string_length - gets the length of a string.
  • the syntax is: "string_length( ⁇ string>, ⁇ length>)”.
  • An example is: "string_length(buffer,length)”.
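The string statements above map naturally onto simple helper functions. The following Python equivalents are illustrative only (Python strings are immutable, so each returns a new value rather than writing the result into an in-string buffer as the script language does):

```python
# Hypothetical Python equivalents of the script-language string statements:
# replace_string, erase_string, substring and string_length.

def replace_string(in_string, search, replace):
    """Replace one sub-string with another anywhere it appears."""
    return in_string.replace(search, replace)

def erase_string(in_string, start, length):
    """Erase a run of characters starting at a beginning position for a given length."""
    return in_string[:start] + in_string[start + length:]

def substring(in_string, start, length):
    """Get a sub-string starting at a position for a specified length."""
    return in_string[start:start + length]

def string_length(in_string):
    """Get the length of a string."""
    return len(in_string)
```

For instance, mirroring the "substring(name,0,3,part)" example, `substring("metaphor", 0, 3)` yields the first three characters.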
  • o return - returns from a function call. Not required if there is a sequential end to a function.
  • the syntax is: "return". o exit - ends the dialog and hangs up. Not required if there is a sequential end of a script.
  • the syntax is: "exit".
  • a reference to an application dialog variable can be done on either side of an assignment statement.
  • the applications are tested as stand-alone applications and then, when they are ready to be linked, the "is_linked_application" property is enabled.
  • the developer needs to consider that the audio files referred to in linked applications may not change. So if two main applications use different voice talent in their recordings and both use the same linked application, there could be a sudden change of voice talent heard by the caller when the script transfers control between linked applications.
  • Commenting - Comments allow a developer to write notes within a program. They allow someone to subsequently browse the code and understand what the various functions do or what the variables represent.
  • Comments also allow a person to understand the code even after a period of time has elapsed.
  • a developer may only write one-line comments. For a one-line comment, precede the comment with "//". This indicates that everything written on that line after the "//" is a comment and the program should disregard it.
  • a sample script which defines a plan to achieve the goal of resetting a caller's personal identification number (PIN) is as follows: tell_introduction // say greeting
  • a preferred embodiment is a plugin to the open source, cross-platform Eclipse integrated development environment that extends the available resources of Eclipse to create the sections of the dialog session manager integrated development environment that is accessed using IDE GUI 217.
  • the editor 214 typically includes the following sections:
    • File navigation tree for file resources needed, including project files, audio files, grammar files, databases, image files, and examples.
    • Project navigation tree for single-project resources, including configurations, scripts, interfaces, prompts, grammars, audio files and dialog variables.
    • Script text editor.
    • Property sheet editor for editing values for existing property tags.
    • Linker reporting of linker errors and status.
  • Fig. 4 provides a screen shot of the top-level view of the GUI, which includes sections for the file navigation tree, project navigation tree, script editor, property sheet editor and linker 215 tool.
  • Figs. 5 through 11, respectively, provide more detailed views of these corresponding sections.
  • the editor 214 typically takes all the information that the developer enters into the GUI and saves it into the project file 207, i.e., an XML project file.
  • the schema of a typical project file 207 may be organized into the following
  • Initial speech interface file 204 is a web-linkage file for the dialog session speech application that interfaces with communications interface 102, i.e., the voice gateway. This is either a Voice XML file or a SALT file.
  • the voice gateway 102 maps an incoming call to the execution of this file, and this file in turn starts the dialog session application by calling the following web-linkage file with an initial state and application identifiers.
  • the ASP, JSP, PHP or ASP.NET file 205 is a web-linkage file for dynamic generation of Voice XML or SALT.
  • This file transfers the state and application information to the run-time Interpreter 206, and the multi-threaded Interpreter 206 returns the Voice XML or SALT that represents one turn of conversation.
  • a turn of conversation between a virtual agent and a user is where the virtual agent says something to a user and the virtual agent listens to recognize a response message from the user.
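A turn of conversation, as defined above, ultimately becomes a small VoiceXML (or SALT) document returned to the voice gateway. The sketch below is a hypothetical minimal example of such a per-turn document generator; the element layout, grammar URL and submit target are assumptions for illustration, not the patent's actual output:

```python
# Hypothetical sketch: build a minimal single-field VoiceXML form for one
# turn of conversation -- say a prompt, listen with a grammar, and submit
# the recognized response back to the web-linkage file for the next turn.

def voicexml_turn(prompt, grammar_url, next_url):
    """Return a minimal VoiceXML document for one turn of conversation."""
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.0">\n'
        '  <form>\n'
        '    <field name="user_response">\n'
        f'      <prompt>{prompt}</prompt>\n'
        f'      <grammar src="{grammar_url}"/>\n'
        '    </field>\n'
        '    <block>\n'
        f'      <submit next="{next_url}" namelist="user_response" method="post"/>\n'
        '    </block>\n'
        '  </form>\n'
        '</vxml>\n'
    )
```

Each call to the web-linkage file would return one such document, so the gateway and application server alternate for the life of the dialog.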
  • Linker 215 uses the project configuration in project file 207.
  • the project configuration specifies a configuration property sheet, defined using Editor 214 of Fig. 2, that includes the following parameters for a dialog session speech application: o application_name - name of the speech application. o is_linked_application - specifies whether the application is linked. The values are either "true" or "false". Default is "false".
  • o linked_application_list - list of application names of linked applications that the active application refers to.
  • o init_interface_file - the initial speech interface file called by the voice gateway 102.
  • the voice gateway 102 maps a phone number to this file path.
  • o phone_network - phone network encoding type such as PSTN, SIP or H323.
  • the phone network 101 determines the method of implementing certain interfaces such as computer telephony integration (CTI).
  • o speech_interface_type an industry standard interface type and version of either VoiceXML or SALT.
  • o voice_gateway_server the manufacturer of the voice gateway 102.
  • o voice_gateway_domain - domain URL used for retrieving files of recorded audio
  • o voice_gateway_ftp_username - username for the FTP o voice_gateway_ftp_password - password for the FTP o speech_recognition_type - manufacturer of the speech recognition engine software o text_to_speech_type - manufacturer of the text-to-speech engine software o database_server - manufacturer of the database server software o data_source_list - list of ODBC data sources, usernames and passwords used for external access to databases for values in the dialog o enable_call_logs - boolean for enabling call logging. The values are "true" or "false". The default is "false".
  • o call_log_type - Specifies the type of call log to generate. Values include "all", “caller”, “prompts", "whole_call”. The default is
  • the values are "none", "increment" or "accumulate" o interface_admin_email - used to report run-time errors o enable_html_debug - boolean for enabling debug in simulation mode.
  • the values are "true” or “false”. The default is “true”.
  • o session_state_directory - used for flexible location of the session state file in a RAID database when scaling up the network of application servers.
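For illustration, the parameters above are saved by the editor into the XML project file 207. A hypothetical fragment might look like the following; the patent does not give the schema, so all tag names and values here are invented:

```xml
<!-- Hypothetical project-file fragment; tag names and values are invented. -->
<configuration>
  <application_name>reset_pin</application_name>
  <is_linked_application>false</is_linked_application>
  <speech_interface_type>VoiceXML 2.0</speech_interface_type>
  <phone_network>SIP</phone_network>
  <voice_gateway_server>VoiceGenie</voice_gateway_server>
  <enable_call_logs>true</enable_call_logs>
  <call_log_type>whole_call</call_log_type>
</configuration>
```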
  • the Interpreter 206 typically dynamically processes the dialog session speech application by combining the following information:
  • the call comes into a communications interface 102, i.e., the voice gateway.
  • the voice gateway 102, which may be implemented using commercial voice gateway systems available from such vendors as VoiceGenie, Vocalocity, Genisys and others, has several internal processes that include: o Interfacing the phone call into data used internal to the voice gateway 102. Typical input protocols consist of incoming TDM-encoded or SIP-encoded signals coming from the call. o Speech recognition of the audio that the caller speaks into text strings to be processed by the application. o Audio playback of files to the caller. o Text-to-speech of text strings to the caller. o Voice gateway interface to an application server in either Voice XML or SALT.
  • the voice gateway 102 interfaces with application server 103 containing web server 203, application web-linkage files, Interpreter 206, application project file 207, and session state file 210 (Fig. 2).
  • the interface processing between the voice gateway 102 and application server 103 loops for every turn of conversation throughout the entire dialog session speech application.
  • Each speech application is typically defined by the application project file 207 for a certain dialog session.
  • When Interpreter 206 completes the processing for each turn of conversation, the session state is stored in session state file 210 and the file reference is stored in a session database 104.
  • the Interpreter 206 processes one turn of conversation each time with information from the voice gateway 102, internal project files 207, internal context databases and session state file 210.
  • Interpreter 206 may access external data sources 213 and services 105 including: o External databases o Web services o Website pages through web servers o Email servers o Fax servers o Computer telephone integration (CTI) interfaces o Internet socket connections o Other Metaphor speech applications
  • Fig. 2 shows the steps taken by Interpreter 206 in more detail:
  • the Application Interface 201 within communications interface 102 interfaces to Web server 203 within Application Server 202.
  • the Web Server 203 first serves back to the communications interface 102 initialization steps for the dialog session application from the Initial Speech Interface File 204. Thereafter, Application Interface 201 calls Web Server 203 to begin the dialog session application loop through ASP file 205, which executes Interpreter 206 for each turn of conversation.
  • Interpreter 206 gets the text of what the user says (or types) from Application Interface 201 as well as a service script Application Project File 207 and current state data from Session State File 210.
  • When Interpreter 206 completes the processing for one turn of conversation, it delivers that result back to Application Interface 201 through ASP file 205 and Web Server 203.
  • the result is typically in a standard interface language such as VoiceXML or SALT.
  • The result may also reference Speech Grammar Files 208 and Audio Files 209, which are then fetched through Web Server 203.
  • the voice gateway 102 plays audio for the user caller to hear the computer response message from a combination of audio files and text-to-speech and then the voice gateway 102 is prepared to recognize what the user will say next.
  • After Interpreter 206 returns the result, it saves the updated state data in Session State File 210 and may also log the results of that turn of conversation in Call Log File 211.
  • the entire Interpreter 206 loop is activated again to process the next turn of conversation.
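The per-turn loop described above (load the saved session state, process one turn, persist the state, return a result) can be sketched as follows. This simplified Python version is illustrative only: the real Interpreter 206 interprets script statements and emits VoiceXML or SALT, whereas this sketch models the script as a plain list of prompts and returns prompt strings.

```python
import json
import os

# Simplified sketch of the stateless per-request loop: each call resumes
# from the persisted session state, consumes one user input, advances one
# turn, and saves state for the next HTTP request from the voice gateway.

def run_turn(state_path, script, user_input):
    # Load the saved session state, defaulting to the start of the script.
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "vars": {}}

    # Store what the user said for the current statement, if anything.
    statement = script[state["step"]]
    if user_input is not None:
        state["vars"][statement.get("store", "last")] = user_input

    # Advance one turn and persist state for the next request.
    state["step"] += 1
    with open(state_path, "w") as f:
        json.dump(state, f)

    # Return the next prompt, or None when the dialog is finished.
    if state["step"] < len(script):
        return script[state["step"]]["say"]
    return None
```

Because the state lives in a file keyed by session (here, the file path stands in for the session identifier), any application server in a pool can handle the next turn, which is the point of the session state file and session database described above.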
  • Interpreter 206 will typically parse and interpret statements of script language and their associated properties in the script plan. Each of these statements may be either: o Dialog which specifies what to say to and what to recognize from the caller. The interpretation of a dialog statement will result in a VoiceXML, SALT or HTML output and control back to the voice gateway.
  • o Flow control of the script, which could contain conditional statements, loops, function calls or jumps.
  • the interpretation will execute the specified flow control and then interpret the next statement.
  • o External interface to a data source, a data service or call control. The interpretation will execute the exchange with the external interface with the appropriate parameters, syntax and protocol. Then the next statement will be interpreted if there is a return process in place.
  • o Internal state change. The interpretation will execute the changed state and then interpret the next statement. o If either an 'exit' or the final script statement is reached, the Interpreter will cause the voice gateway to hang up and end the processing of the application.
  • the dialog application 218, also referred to as a Conversation Manager (CM), operates in an integrated development environment (IDE) for developing automated speech applications that interact with caller users of phones 302, interact with data sources such as web server 212, CRM and Corporate Telephony Integration (CTI) units 213 and PC headsets 306, and with live agents through Automated Call Distributors (ACDs) 304 in circumstances when the call is transferred.
  • the CM 218 includes an editor 217, linker 215, debugger 300 and run-time interpreter 206 that dynamically generates voice gateway 102 scripts in Voice XML and SALT from the high-level design-scripting language described herein.
  • the CM 218 may also include an audio editor 308 to modify audio files 209.
  • the CM 218 may also provide an interface to a data driven device 220.
  • the CM 218 is as easy to use as writing a flowchart, with many inherited resources and modifiable properties that allow unprecedented speed in development.
  • Features of CM 218 typically include:
  • Runtime debugger 300 is available for text simulations of voice speech dialogs.
  • Interfaces to JDBC- and ODBC-capable databases including Microsoft SQL Server, Oracle, IBM DB2, and Informix; and interfaces including COM+, Web services, Microsoft Exchange and ACD screen pops.
  • CM 218 process flow for transactions either over the phone 302 or on a PC 306 are shown in the system diagram of Fig. 3.
  • the communications interface 102 i.e., voice gateway, picks up the call and maps the phone number of the call to the initial Voice XML file 204.
  • the initial Voice XML file 204 submits an ASP call to the application ASP file 205.
  • the application ASP file 205 initializes administrative parameters and calls the CM 218.
  • the CM 218 interprets the scripts written in the present script language using interpreter 206.
  • the script is an interpreted language that processes a series of dialog plans and process controls for interfacing to a user 100 (Fig. 1), databases 213, web and internal dialog context to achieve the joint goals of user 100 and virtual agent within CM 218.
  • When the code processes a plan for a user 100 interface, it delivers the prompt, speech grammar files 208 and audio files 209 needed for one turn of conversation to a media gateway such as communications interface 102 for final exchange with user 100.
  • the CM typically generates Voice XML on the fly as it interprets the script code. It initializes itself and reads the first plan in the ⁇ start> script.
  • This plan provides the first prompt and reference to any audio and speech recognition speech grammar files 208 for the user 100 interface. It formats the dialog interface into Voice XML and returns it to the Voice XML server 310 in the communications interface 102.
  • the Voice XML server 310 processes the request through its audio file player 314 and text-to-speech player 312 if needed and then waits for the user to talk.
  • When the user 100 is done speaking, his speech is recognized by the voice gateway 102 using the speech grammar provided and speech recognition unit 316. It is then submitted again to the application ASP file 205 in step 4. Steps 4 and 5 repeat for the entire dialog.
  • When CM 218 needs to get or set data externally, it can interface to web services 212 and CTI or CRM solutions and databases 213, either directly or through custom COM+ data interface 320.
  • An ODBC interface can be used from the CM 218 script language directly to any popular database.
  • If call logging is enabled, the user audio and dialog prompts used may be stored in database 211, and the call statistics for the application are incremented during a session. Detail and summary call analyses may also be stored in database 211 for generating customer reports. Implementations of conversations are extremely fast to develop because the developer never writes any Voice XML or SALT code and many exceptions in the conversations are handled automatically. An HTML debugger is also available for the script language.
  • Embodiments of the invention may be implemented as a computer program product that includes a computer readable and usable medium.
  • a computer usable medium may consist of a read only memory device, such as a CD ROM disk or conventional ROM devices, or a random access memory, such as a hard drive device or a computer diskette, having a computer readable program code stored thereon. While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.


Abstract

The invention concerns a speech dialog management system. In this system, each dialog can support at least one turn of conversation between a user and a virtual agent by means of a communications interface, a data interface, or a combination of these interfaces. The system comprises a computer and a computer-readable medium, operatively coupled to the computer, that stores scripts and dialog information. Each script determines recognition, response and flow control in a dialog, while an application running on the computer delivers a result to the communications interface, the data interface, or the combination of these interfaces, as a function of the dialog information and a user input.
PCT/US2004/033186 2003-10-10 2004-10-08 Systeme, methode, et langage de programmation pour developper et pour executer des dialogues entre un utilisateur et un agent virtuel WO2005038775A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/145,540 US20060031853A1 (en) 2003-10-10 2005-06-03 System and method for optimizing processing speed to run multiple dialogs between multiple users and a virtual agent

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US60/510,699 2003-10-10
US60/578,031 2004-06-08
US10/915,955 US20050080628A1 (en) 2003-10-10 2004-08-11 System, method, and programming language for developing and running dialogs between a user and a virtual agent
US10/915,955 2004-08-11

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/915,955 Continuation-In-Part US20050080628A1 (en) 2003-10-10 2004-08-11 System, method, and programming language for developing and running dialogs between a user and a virtual agent

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US11/145,540 Continuation-In-Part US20060031853A1 (en) 2003-10-10 2005-06-03 System and method for optimizing processing speed to run multiple dialogs between multiple users and a virtual agent

Publications (1)

Publication Number Publication Date
WO2005038775A1 true WO2005038775A1 (fr) 2005-04-28

Family

ID=34465819

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/033186 WO2005038775A1 (fr) 2003-10-10 2004-10-08 Systeme, methode, et langage de programmation pour developper et pour executer des dialogues entre un utilisateur et un agent virtuel

Country Status (1)

Country Link
WO (1) WO2005038775A1 (fr)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6173266B1 (en) * 1997-05-06 2001-01-09 Speechworks International, Inc. System and method for developing interactive speech applications
US6513009B1 (en) * 1999-12-14 2003-01-28 International Business Machines Corporation Scalable low resource dialog manager


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GAMM S ET AL: "The development of a command-based speech interface for a telephone answering machine", SPEECH COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 23, no. 1-2, October 1997 (1997-10-01), pages 161 - 171, XP004117216, ISSN: 0167-6393 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2882837A1 (fr) * 2005-03-07 2006-09-08 France Telecom Procede et dispositif de construction d'un dialogue vocal
WO2007042401A1 (fr) * 2005-10-11 2007-04-19 International Business Machines Corporation Integration d'une application rvi au sein d'un serveur d'applications base sur des standards
EP1936607A1 (fr) * 2006-12-22 2008-06-25 Sap Ag Test sur les applications de reconnaissance vocale automatique
US9640180B2 (en) 2014-07-31 2017-05-02 Google Inc. Conversational agent with a particular style of expression
US9418663B2 (en) 2014-07-31 2016-08-16 Google Inc. Conversational agent with a particular spoken style of speech
US9601115B2 (en) 2014-07-31 2017-03-21 Google Inc. Conversational agent with a particular spoken style of speech
WO2016018763A1 (fr) * 2014-07-31 2016-02-04 Google Inc. Agents de conversation
US9997158B2 (en) 2014-07-31 2018-06-12 Google Llc Conversational agent response determined using a sentiment or user profile data
US10325595B2 (en) 2014-07-31 2019-06-18 Google Llc Conversational agent response determined using a sentiment
US10726840B2 (en) 2014-07-31 2020-07-28 Google Llc Conversational agent response determined using a sentiment
US11423902B2 (en) 2014-07-31 2022-08-23 Google Llc Conversational agent response determined using a sentiment
US11900938B2 (en) 2014-07-31 2024-02-13 Google Llc Conversational agent response determined using a sentiment
US11574621B1 (en) * 2014-12-23 2023-02-07 Amazon Technologies, Inc. Stateless third party interactions


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

WWE Wipo information: entry into national phase

Ref document number: 11145540

Country of ref document: US

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWP Wipo information: published in national office

Ref document number: 11145540

Country of ref document: US

122 Ep: pct application non-entry in european phase