WO2001067241A1 - Virtual assistant engine - Google Patents

Virtual assistant engine

Info

Publication number
WO2001067241A1
WO2001067241A1 (PCT/US2001/006882)
Authority
WO
WIPO (PCT)
Prior art keywords
virtual assistant
user
application
engine
string
Prior art date
Application number
PCT/US2001/006882
Other languages
French (fr)
Inventor
Richard M. Ulmer
Edward Peebles
Derek Sanders
Original Assignee
Conita Technologies, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Conita Technologies, Inc. filed Critical Conita Technologies, Inc.
Priority to AU2001241965A priority Critical patent/AU2001241965A1/en
Publication of WO2001067241A1 publication Critical patent/WO2001067241A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/487Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4938Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M2201/00Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/39Electronic components, circuits, software, systems or apparatus used in telephone systems using speech synthesis
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M2201/00Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/40Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition

Definitions

  • the present invention relates to a computer-based virtual assistant engine.
  • a virtual assistant is a computer application that allows the mobile professional to access personal, company, and public information, including contacts, schedules, and databases from any interactive device, such as a telephone.
  • virtual assistant applications were hardcoded.
  • a monolithic program, written in the C++ programming language for example, would implement all of the functions of and interfaces with the virtual assistant. An example of such a virtual assistant application is described in U.S. Patent No.
  • the present invention solves this problem by defining a virtual assistant application in terms of discourses, grammars, event handlers, and other components that instruct a virtual assistant engine as to how to execute the application.
  • This advantageously permits integration of the virtual assistant with other commercially available applications, including messaging applications and database management applications.
  • the VA Application can be easily modified to satisfy specific requirements of a user.
  • the present invention relates to a virtual assistant system with many discrete features, each of which comprises a separate but related invention.
  • a virtual assistant engine for running a virtual assistant application, comprised of an interpreter for parsing, storing in a computer memory, and executing virtual assistant definition language source code for a virtual assistant application, a scripting object that provides methods and properties for creating a virtual assistant application and an abstraction layer for interfacing with a speech recognition server, telephony hardware and a text to speech server, wherein the scripting object provides the interface between the abstraction layer and the virtual assistant application.
  • the interpreter is comprised of a parser for parsing the virtual assistant definition language source code and storing the parsed virtual assistant definition language source code in the computer memory.
  • the parser is constructed using the Purdue Compiler Construction Tool Set (PCCTS).
  • the interpreter is further comprised of a state machine for executing the stored virtual assistant definition language source code.
  • the state machine determines the tasks to be performed by the virtual assistant application responsive to input from the user.
  • the state machine also manages barge-in commands received from the user.
  • the state machine manages external events responsive to output from the virtual assistant application, the output indicating that an external event has occurred, and is configured to cause the user to be notified of the occurrence of the external event. Examples of external events of which the virtual assistant user is notified are receipt of a telephone call, placing a telephone call, receipt of an electronic message, a meeting reminder, a task reminder, a change in a database and a change in monitored information.
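The event-notification flow above can be sketched in Python as follows. This is an illustrative sketch only, not code from the patent; the class and callback names (ExternalEventNotifier, speak) are hypothetical. Events posted by the application are queued and announced to the user at the state machine's next opportunity.

```python
# Hypothetical sketch of external-event handling: the application posts
# events (new email, meeting reminder, ...) and the state machine
# announces any pending ones between user interactions.
from queue import Queue

class ExternalEventNotifier:
    def __init__(self, speak):
        self.speak = speak          # callback that renders text to speech
        self.pending = Queue()

    def post(self, event_type, detail):
        # called by the application when an external event occurs
        self.pending.put((event_type, detail))

    def announce_pending(self):
        # called by the state machine between user interactions
        announced = []
        while not self.pending.empty():
            event_type, detail = self.pending.get()
            self.speak(f"Notification: {event_type}: {detail}")
            announced.append(event_type)
        return announced

spoken = []
notifier = ExternalEventNotifier(spoken.append)
notifier.post("new email", "from Alice")
notifier.post("meeting reminder", "3 PM staff meeting")
print(notifier.announce_pending())
```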
  • the interpreter is further comprised of: a scripting host object, and a scripting engine, whereby the scripting host object interfaces with the scripting engine.
  • the scripting engine executes scripts written in a scripting language, such as VBScript, JavaScript, Perl, Rexx and Python.
  • the interpreter is further comprised of a session object, which manages telephone calls to and from a virtual assistant application user.
  • the session object is comprised of a call state manager for tracking the status, for example connected, on hold and in conference, of a telephone call to or from the virtual assistant application user.
  • the session object is further comprised of a call object, for managing calls to the virtual assistant user from the virtual assistant and to the virtual assistant from the virtual assistant user.
  • the session object also is configured to generate and store in the computer memory a log of information about a virtual assistant application user session.
  • the information log includes information about call statistics (for example, call duration, DNIS number, ANI number), call counters and call transcription (commands issued by the user and responses from the virtual assistant).
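A minimal sketch of the kind of per-session information log the bullet describes, with call statistics, counters, and a command/response transcription. All names here are hypothetical illustrations, not the patent's actual data structures.

```python
# Hypothetical session log: call statistics (duration, DNIS, ANI),
# call counters, and a transcript of user commands and assistant replies.
import time

class SessionLog:
    def __init__(self, dnis, ani):
        self.start = time.time()
        self.stats = {"DNIS": dnis, "ANI": ani}
        self.counters = {"calls_placed": 0, "calls_received": 0}
        self.transcript = []   # (speaker, text) pairs

    def record(self, speaker, text):
        self.transcript.append((speaker, text))

    def close(self):
        self.stats["duration_sec"] = round(time.time() - self.start, 1)
        return {"stats": self.stats, "counters": self.counters,
                "transcript": self.transcript}

log = SessionLog(dnis="8005551000", ani="8645551234")
log.record("user", "read next message")
log.record("assistant", "You have two new messages.")
summary = log.close()
print(len(summary["transcript"]))
```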
  • the interpreter is further comprised of a discourse manager, which activates the appropriate discourse responsive to input from the virtual assistant application user.
  • the discourse manager also activates the appropriate grammar responsive to the active discourse.
  • the scripting object is configured to provide output to the user asynchronously. Such output is comprised of rendering text into speech or playing recorded prompts.
  • the scripting object is further comprised of a management interface, which is configured to generate and store in the computer memory a log of information about virtual assistant application errors.
  • the management interface also is configured to enable the management and configuration of a virtual assistant system by a system administrator.
  • the scripting object is further comprised of an interface for managing dynamic grammars.
  • the management of dynamic grammars is comprised of creating a user-specific grammar when a virtual assistant application user session begins, storing the user-specific grammar in the computer memory for use by a user during the user session, and deleting the user-specific grammar from the computer memory when the user session ends.
  • the user-specific grammar is generated from a user-specified database.
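The dynamic-grammar lifecycle above (create at session start from a user database, hold in memory, delete at session end) can be sketched as follows. The names and structure are assumptions for illustration, not the engine's actual interface.

```python
# Hypothetical dynamic-grammar manager: a user-specific grammar is built
# from the user's contact database at session start, held in memory for
# the session, and discarded when the session ends.
class DynamicGrammarManager:
    def __init__(self):
        self._grammars = {}            # session_id -> list of phrases

    def begin_session(self, session_id, contact_db):
        # compile phrases like "call <name>" from the user's database
        phrases = [f"call {name}" for name in contact_db]
        self._grammars[session_id] = phrases
        return phrases

    def recognize(self, session_id, phrase):
        return phrase in self._grammars.get(session_id, [])

    def end_session(self, session_id):
        self._grammars.pop(session_id, None)

mgr = DynamicGrammarManager()
mgr.begin_session("s1", ["alice", "bob"])
print(mgr.recognize("s1", "call alice"))   # True
mgr.end_session("s1")
print(mgr.recognize("s1", "call alice"))   # False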
  • the scripting object is further comprised of a state machine interface for controlling a state machine for managing external events, and a call management interface for controlling a session object, which manages telephone calls to and from a virtual assistant application user.
  • the abstraction layer is comprised of a speech recognition module, a telephony module and a text to speech module.
  • the speech recognition module is comprised of an interface between the scripting object and the speech recognition server.
  • the telephony module is comprised of an interface between the scripting object and the telephony hardware, such as, an adapter for allowing electronic communication between the virtual assistant application and the virtual assistant user and a conference call adapter.
  • the text to speech module is comprised of an interface between the scripting object and the text to speech server.
  • FIG. 1 is an overview of the virtual assistant (VA) of the present invention
  • FIG. 2 is a diagram of the VA Server
  • FIG. 3 is a diagram of the VA Studio
  • FIG. 4 is a diagram of the VA Engine conceptual model
  • FIG. 5 is a diagram of the VA Manager conceptual model
  • FIG. 6 is a screen shot of the Microsoft Management Console for managing the VA Server Manager
  • FIG. 7 is a screen shot of a web page that uses Active Server Pages to manage the VA Server Manager
  • FIG. 8 is a diagram of the component relationships of a VA Server Set
  • FIG. 9 is a diagram of a relatively small VA system
  • FIG. 10 is a diagram of a large VA system
  • FIG. 11 is a diagram of a very large VA system
  • FIG. 12 is a diagram of a VA discourse
  • FIG. 13 is a diagram of the VA Discourse/Grammar Model
  • FIG. 14 is a screen shot of the Main Application Window
  • FIG. 15 is a screen shot of the Topic View Window
  • FIG. 15 is a screen shot of the Task View Window
  • FIG. 16 is a screen shot of the Prompt Properties Dialogue Box
  • FIG. 17 is a screen shot of an Expanded Tree View of the Menu_Error Prompt Group.
  • FIG. 18 is a screen shot of the Prompt Group Properties Dialogue Box.
  • the virtual assistant of the present invention allows the mobile professional to access personal, company, and public information, including contacts, schedules, and databases from any interactive device, such as a telephone.
  • the virtual assistant (“VA”) system of the present invention is comprised of two main components: (1) the VA Server, which is built on a Windows NT telephony server platform, and (2) the VA Studio, which allows skilled information technology professionals to develop VA applications that interface with electronic messaging systems, such as Microsoft Exchange and Lotus Notes.
  • the VA Server is a component ofthe Service Deployment Environment (“SDE”), which is discussed in more detail below.
  • the VA Studio is a component of the Service Creation Environment (“SCE”).
  • the VA Server 10 is comprised of a human interface 12 and a network interface 14 for handling calls and providing automated access to corporate 28, private 30 and public 32 information repositories and sources.
  • the human interface 12 is comprised of a graphical user interface 22, which may be a web browser, a subscriber (or user) voice user interface 24, generally accessed by a telephone, and a public voice user interface 26.
  • the virtual assistant allows a user to use a voice interactive device, such as a telephone, either wired or wireless, to access and update such information.
  • the VA Server also manages all incoming communications by sorting, prioritizing, and filtering such communications, while providing notice to the user of important messages and events.
  • a core component of the VA Server 40 is the voice-enabled Virtual Machine 42, which is also referred to as the VA Engine.
  • the VA Engine receives spoken commands, interprets and executes them.
  • the VA Engine supports a COM interface 44, which in turn enables VA applications to provide voice access to network applications.
  • the VA Engine also supports a telephony interface 46 to voice messaging 52 and private branch exchange systems 54, enabling third-party systems to be integrated with the VA Server.
  • the VA Server conforms to Windows NT telephony and speech interface specifications.
  • the voice-messaging interface 56 supports the VPIM (Voice Profile for Internet Mail) standard, and provides a gateway between proprietary voice messaging systems and VA Server.
  • the VA system management services provide operations, administration and maintenance capability (OA&M) 60.
  • the OA&M applications also provide a Simple Network Management Protocol (SNMP) interface.
  • the VA Server is operable on Windows NT Server, release 4.0 or higher, in both single and multiprocessor configurations.
  • the VA Server can be ported to other computing platforms. Multiple systems may be clustered together to support higher system workloads and fail-safe operation.
  • the VA Application Suite, in the preferred embodiment, is compatible with a messaging server 62, such as Microsoft Exchange/Outlook.
  • the VA's architecture advantageously permits integration with other commercially available and customized messaging applications.
  • the VA Application can be easily modified to satisfy specific requirements of a user.
  • the basic functions of the VA Application include: Messaging - voice-mail, e-mail, and faxes
  • Call Control - allows remote users to perform conference calling and call management; notification and forwarding features allow remote users to be contacted immediately by phone/pager when they receive specific voice-mails, e-mails, faxes, or pages
  • Internet Applications - users can access the internet via an internet server 64 and obtain public information such as weather, travel, financial, competitive data and news
  • Intranet Applications - users can remotely access information contained on a corporate network (inside the company firewall) using the VA, for example, customer data, shipping and inventory information, sales reports, and financial data, or any information on a database server 66, including SQL databases such as Oracle or Informix.
  • the VA Studio 80 is comprised of a grammar generator 82 and a publishing toolkit 84.
  • the VA Studio allows a user to create, modify and debug applications that run on the VA Server 40 without requiring the user to be skilled in the complexities of the underlying components of the VA Server, such as the speech recognition engine, text to speech engine, switch control and unified messaging.
  • VA Studio employs a graphical user interface (GUI) application that runs on a Windows NT workstation. It allows developers to create projects, each of which defines a VA application.
  • VA Studio is a multiple document interface (MDI) application that follows the workspace-based model.
  • VA Studio follows the Microsoft Component Object Model (COM).
  • VA applications are developed using Active Scripting languages such as VBScript and JavaScript, thus enabling integration with a variety of third party components.
  • the VA applications created with the VA Studio will include voice query to SQL databases, message stores, business logic and mainframe applications.
  • VA applications are composed of discourses and resources. Discourses are the context of conversations between a user and the VA. Resources are items like voice prompts and dictionaries. A developer can utilize the VA Studio Wizard to generate a "skeleton" VA application template. Application templates consist of packages of predefined discourses and resources. Once a VA application template is generated, the application is further customized using any supported Active Scripting languages.
  • After the VA application is written in VA Studio, it is submitted to the build process.
  • VA Studio checks for dialog errors, builds a master intermediate grammar and builds a master lexicon. Once compiled and error-free, the application is ready to be published.
  • the VA Server allows a scripted application to access services such as voice mail, databases, and telephony equipment.
  • a VA application is created, modified, debugged and tested using the VA Studio.
  • the completed application is then automatically installed and configured to run on the VA Server, which enables the VA application to take incoming calls and provide access to both public and private information.
  • a VA application allows a user to manage electronic communications and access his or her business's computer resources through a telephone. Using speech recognition and text-to-speech technology, the VA communicates with callers in spoken English. By calling into the VA on a standard telephone, a user can perform functions such as the following:
  • the VA can perform many of the functions of a personal secretary, such as the following:
  • the VA performs the above functions by interfacing with a company's Microsoft Exchange server.
  • This application allows users to use their desktop Outlook software over the telephone.
  • the VA software includes a development platform (the SCE) and run-time platform (the SDE), which can host a variety of different VA's.
  • the SDE provides the core components necessary for the functionality of a VA: a telephony interface, speech recognition facilities, a text-to-speech engine, interfaces with databases and mail servers, and an administrative framework in which the assistant applications will run.
  • the SCE also includes development tools that programmers can use to create custom VA applications.
  • the VA Platform consists of three main components: • The Service Deployment Environment (SDE)
  • the function of each of these components can be understood using a World Wide Web analogy.
  • the SDE functions like a web server, providing connections with the network and telephone system, controlling the execution of VA applications, and providing resources such as text-to-speech and voice recognition engines that will be accessed by the applications that run on it.
  • VA applications are analogous to web pages, determining the content that will be presented and controlling the interactions with the user.
  • a VA application uses scripting languages such as VBScript, JavaScript, and Perl, so that developers can add significant functionality to a VA, such as performing mathematical calculations, processing text, and calling ActiveX and COM objects.
  • the SCE is the development environment used to create the VA applications.
  • the main component of the SCE is the VA Studio application, which is based on the
  • the SCE also includes a set of COM objects that can be used in applications to perform functions such as checking email, reading from a database, and manipulating sound files.
  • the Service Deployment Environment consists of eight processes that run simultaneously and perform the functions necessary to support a VA application.
  • each of these SDE components runs as a Windows NT Service or background process.
  • the components can be distributed across several servers and communicate over the network. Such distribution can allow, for example, one server to be dedicated to performing voice recognition functions while another supports the VA Engine that actually runs the applications. When multiple VA components are distributed across multiple machines, these machines are collectively termed a VA server set.
  • the VA Engine 100 is the virtual machine on which a VA application 102 runs. Based on the application's instructions, the VA Engine uses its telephony interface 104 to communicate with the user 106 and its speech interface 110 to translate speech into text and text into speech. The VA Engine connects to an Active Scripting Engine 112 to execute the scripts contained in the VA application, and it also communicates with administrative processes such as the VA Server 114 and VA Manager 116.
  • a VA Engine process can support user interaction over only one telephone line, but multiple VA Engines can be run simultaneously on a single platform. If the VA platform is connected to more than one telephone line, then a separate VA Engine will be running for each incoming line.
  • the Text-to-Speech (TTS) Server receives text from other components, translates it into speech (that is, into a sound file), and returns it to the requesting component.
  • This speech translation service is isolated in a separate component to improve performance and to allow for TTS vendor-independence.
  • the preferred embodiment uses the AcuVoice TTS system, but the platform can be easily modified to support a TTS engine from a different vendor. Only the TTS Server component would have to be modified for such a customization, not the entire platform.
  • Multiple VA Engines can use the same TTS Server process, and more than one TTS Server can be running at the same site, allowing translation services to be distributed across multiple machines for load-balancing.
  • the Recognition Server 122 receives sound files from other components, attempts to recognize them as speech, and returns the recognized text.
  • the Recognition Server is a component that isolates speech-recognition functions from the rest of the VA platform.
  • the server provides an interface to a third-party voice recognition engine (in the preferred embodiment, Nuance) that can be changed to a different vendor's brand without requiring the entire VA platform to be modified.
  • Multiple VA Engines can use the same Recognition Server process, and more than one Recognition Server can be running simultaneously.
  • the Recognition Server process requires three additional processes to be running:
  • the Resource Manager is a management process that automatically load-balances requests when more than one instance ofthe Recognition Server is running. Rather than making recognition requests to a particular Recognition Server, the VA Engine makes the request to the Resource Manager, which forwards it to the first available Recognition Server.
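The Resource Manager's first-available dispatch policy might be sketched as follows. The class names and the busy flag are hypothetical simplifications of the real process boundaries (the actual components are separate network services, not in-process objects).

```python
# Hypothetical sketch of Resource Manager dispatch: the VA Engine sends
# recognition requests to the manager, which forwards each one to the
# first available Recognition Server rather than to a fixed server.
class RecognitionServer:
    def __init__(self, name):
        self.name, self.busy = name, False

    def recognize(self, sound):
        return f"{self.name} recognized {sound}"

class ResourceManager:
    def __init__(self, servers):
        self.servers = servers

    def request(self, sound):
        for server in self.servers:
            if not server.busy:
                return server.recognize(sound)
        raise RuntimeError("no recognition server available")

rm = ResourceManager([RecognitionServer("rec1"), RecognitionServer("rec2")])
rm.servers[0].busy = True            # rec1 is occupied with another call
print(rm.request("hello.wav"))       # forwarded to rec2
```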
  • the Compilation Server compiles dynamic grammars.
  • the License Manager runs continually in the background and dispenses licenses to all requesting components. Only one License Manager need run in a single server set, but no Recognition Server components can launch unless the License Manager is already running. In the preferred embodiment, all of the sub-processes of the Recognition
  • the VA Server 114 performs persistent VA functions that occur even when no user is connected to a VA application. These functions include the following:
  • Only one VA Server can run on a system, but a single VA Server can provide persistent services to multiple VA Engines running both locally and on remote systems.
  • each system that is running one or more VA components should also be running the VA Manager application 116.
  • This application creates and monitors all VA components that are active on the system, and it provides management interfaces that are used for the following purposes:
  • the VA Manager provides the interface through which the VA Server Manager 130 communicates with all systems in use at the site.
  • the VA Server Manager 130 provides a single point of control for all ofthe processes and servers being used in a VA server set. It communicates with the VA Manager 116 running on each VA server in the set and, through this interface, allows an administrator to use a single system to manage the entire site. There are two ways an administrator can connect with the VA Server Manager application:
  • the VA software includes an MMC snap-in component 140 that allows the VA Server Manager services (and, thereby, the entire VA site) to be managed from the Microsoft Management Console application.
  • the VA software also includes an administrative web page 142 that uses Active Server Pages to interface with the VA Server Manager service, allowing an administrator to manage the site through a standard web browser.
  • the VA Server Manager 130 monitors all of the VA components (such as Recognition Servers 132, TTS Servers 134, and VA Engines 136) running on all the systems within the server set, and it can be configured to page the system administrator with an alert if components fail or other system-critical events occur.
  • the VA Server Manager process uses a Microsoft MSDE database to store configuration parameters and platform logs.
  • the MSDE database engine is installed automatically as part of the VA platform install, and the required tables and initial data are created during the installation routine.
  • the VA Server Manager uses a COM object called DBManager to communicate with the database. This object is created automatically at start-up by the VA Server Manager and provides a set of application programming interfaces (APIs) that other VA components can use to retrieve configuration information and log data.
  • the DBManager object automatically handles version checking, database restoration, and other database management functions.
  • the VA Web Server
  • the VA platform uses a VA Logging Tool, which is used by the administrator to view and manage system logs.
  • the VA software uses a set of shared directories for storing files necessary for platform operations. In a multi-server implementation, these shares are stored on a central server (the same server that hosts the VA Server Manager process) and can be accessed by all the systems in the server set.
  • the shared directories used by the VA platform are described in the table below. Table 1-1: VA Platform Shared Directories
  • %conitava%\VALogs - Used to store application logs
  • %conitava%\VAUsers - Used to store information about VA users
  • %conitava%\VAUtterances - Used to store temporary sound files containing the commands spoken by VA users
  • * %conitava% represents the base path under which the VA platform software was installed. By default, this path is c:\Program Files\Conita Virtual Assistant.

VA Platform Configurations
  • the service processes that make up the VA platform either can be run on a single server (a VA platform server) or can be distributed across multiple servers (a VA platform server set).
  • a single-server implementation is adequate for small companies that need to support only a few incoming VA calls at a time.
  • a server set implementation will be necessary for load balancing.
  • As the platform's primary server, the Server Set Controller Node 150 will host the following components:
  • the IIS web-server
  • Shared directories that will be used by all the servers in the server set to store logs, utterance files, application files, and user information
  • Each secondary node 160 in the set will host one or more instances of VA Engines 162, TTS Servers 164, and/or Recognition Servers 166. These processes will be monitored by a VA Manager process 170 on each server, which will in turn communicate with the VA Server Manager 172 on the Server Set Controller Node.
  • the lone server is configured as the controller node, hosting the database, web-server, and VA Server Manager process along with all other VA services.
  • All the VA components can be run on a single server 180.
  • Such a site could support several incoming telephone lines 182, allowing multiple instances of the VA application to run simultaneously.
  • VA components can be distributed across multiple systems.
  • a medium-sized company may, for instance, use a six-server rack 184, with two ofthe servers running VA Engines 186a, 186b, two servers running Recognition Servers 190a, 190b, one running VA Servers 192, and one running TTS Servers 194.
  • a large organization may require even more scalability.
  • the site may use upwards of eight systems for VA Engines 202, sixteen for Recognition Servers 204a, 204b, four for VA Servers 206, and four for TTS Servers 206.
  • the duties of the VA platform administrator include the following tasks:
  • using the VA management interfaces to manage the systems in the server set, start and stop the VA services, and run VA applications
  • a Virtual Assistant (“VA”) application is a set of scripts and resources that leads callers through an interaction with a virtual assistant.
  • the application provides the functions available to the user from the beginning of a call until its termination.
  • a single VA application may, for example, allow the caller to check email messages, check voice mail messages, and look up a phone number in a computerized address book.
  • Each VA application is made up of smaller units called Discourses, as illustrated in Figure 12.
  • Each of these units can be thought of as a single interaction or conversation between the user and the Virtual Assistant. This interaction may be as simple as the assistant's asking a question ("Are you sure you want to quit?") and the user's giving a response ("yes" or "no"). Or, the interaction may be more complicated, such as an email-checking discourse in which the assistant responds to several different types of user commands (e.g. "read next message" and "delete last message").
  • Each discourse can be linked to other discourses, which will become activated when a user gives a particular command or a system event occurs.
  • a "Main Menu” discourse 210 may provide the user with a list of available options.
  • the application will move from the "Main Menu” discourse to the "Check Email” discourse 212.
  • Each discourse is associated with a Grammar component, which is a list of the words and phrases that need to be recognized by that discourse. Depending on the functions performed by the discourse, its grammar may be simple or complex.
  • phrases included in a grammar are linked to actions that will be taken by the discourse when the user speaks a particular phrase. If the virtual assistant recognizes the user's saying, "Read next message,” in the "Check Email” discourse, it would read the next email message to the user by translating the email text to speech.
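The phrase-to-action linkage inside a discourse can be sketched as a mapping from grammar phrases to handler functions. This is an illustrative Python sketch; the actual engine drives this through the VA definition language and Active Scripting, and the names here are hypothetical.

```python
# Hypothetical discourse whose grammar maps each recognized phrase to a
# handler; an unrecognized phrase falls through to an error response.
class Discourse:
    def __init__(self, name):
        self.name = name
        self.grammar = {}              # phrase -> handler

    def on(self, phrase):
        def register(handler):
            self.grammar[phrase] = handler
            return handler
        return register

    def handle(self, phrase):
        handler = self.grammar.get(phrase)
        return handler() if handler else "I did not understand you"

check_email = Discourse("Check Email")

@check_email.on("read next message")
def read_next():
    return "Reading message one"

print(check_email.handle("read next message"))
print(check_email.handle("sing a song"))
```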
  • Figure 13 shows the basic relationship between discourses 220 and grammars 222 within a VA application.
  • the arrow 224 points from the grammar to the discourse (instead of the other way around) because the grammar is the driving force behind the virtual assistant's interaction with the user.
  • the virtual assistant engine attempts to recognize the phrase and looks it up in the grammar for the current discourse. If the phrase is found in the grammar, the engine calls the appropriate piece of code within the discourse to handle the required action.
  • the virtual assistant engine recognizes the phrase, looks it up in the grammar, and finds the trigger
  • a discourse can also perform predetermined actions when particular events occur. These events may be related to the interaction with the user, such as a speech-recognition error (RecognitionError), or may be system events unrelated to the caller's activity, such as the launching of the virtual assistant application on the server (ApplicationEntry).
  • a virtual assistant application developer can use these events in a number of different ways. Each time a speech-recognition error event occurs, for example, a piece of code in the discourse can be triggered that increments a running count of the number of errors that have occurred in a particular session or over a predetermined amount of time. If this count exceeds a preset limit (suggesting, for example, that the phone connection is poor), the virtual assistant can terminate the call. Similarly, the developer can write a piece of code that is triggered by the ApplicationEntry event and loads whatever initialization data is necessary to prepare the virtual assistant application for incoming calls.
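The error-counting strategy described above can be sketched as a small event handler. The limit value and all names are hypothetical.

```python
# Hypothetical handler for the speech-recognition error event: count
# errors and terminate the call once a preset limit is reached
# (suggesting, e.g., a poor phone connection).
class ErrorMonitor:
    def __init__(self, limit=3):
        self.limit, self.count, self.terminated = limit, 0, False

    def on_recognition_error(self):
        self.count += 1
        if self.count >= self.limit:
            self.terminated = True    # hang up the call
        return self.terminated

monitor = ErrorMonitor(limit=3)
results = [monitor.on_recognition_error() for _ in range(3)]
print(results)    # [False, False, True]
```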
  • a resource allows a discourse or event handler to reference components contained externally. These components include Phrase List and Script Modules, which a discourse uses to import pre-defined grammar entries and scripts, and Prompt Groups, which a discourse uses to incorporate voice output defined externally.
  • All resources are defined globally, so once a resource is defined it can be used from within any discourse in the application. With resources, a developer can both increase the speed of application development (by reusing grammar and code components) and improve the sophistication of the interface, such as a voice interface, by using recorded and dynamic prompts.
  • When creating an application, for example, a developer might write a piece of code that sends an email message through the user's SMTP (simple mail transport protocol) account. Rather than rewriting this code in each discourse that needs to send an email message, the developer can place the code in a script module resource and reference that resource in each discourse that needs it.
  • the script module resource can also be saved and used later in other VA applications.
  • Prompt group resources, similarly, allow a developer to define a particular piece of output and use it repeatedly throughout the application.
  • a developer can, for example, record a live person speaking the phrase "I did not understand you” and play the sound clip whenever an unrecognized command is spoken by a caller.
  • Prompt Groups can also be used to create dynamic output that selects from and combines various voice segments.
  • a dynamic prompt can change the output depending on the person's gender (e.g. "He is not available” or "She is not available”).
  • a developer can record several different versions ofthe same message and let the system choose among them. In this way, instead of always saying "I did not understand you” when a user speaks an unrecognized command, the application can say “I did not understand you”, “I do not know that command", or "That is not one ofthe commands I recognize.”
  • Virtual Assistant Definition Files The discourses, grammars, event handlers, and other components that constitute a virtual assistant application are saved in a Virtual Assistant Definition (.vad) file, which is published to the VASiteManager and instructs the platform how to execute the application.
  • the .vad file is to the virtual assistant platform as an HTML file or Active Server Page (.asp) is to a web server.
  • the .vad file for a complex VA application may be hundreds of pages in length.
  • a file for a very simple "hello world" application would be only a few lines of code.
  • a developer could, in theory, write a virtual assistant application by creating a .vad file in a text editor and copying it to the virtual assistant server platform.
  • the Service Creation Environment simplifies the application- writing process.
  • the SCE's VAStudio application provides a GUI (similar to Microsoft's Visual Studio) in which a developer visually lays out the various application components. Using VAStudio tools, the developer can then compile the application, build a .vad file, and publish it to the VA platform.
  • Directory: The complete path of the directory in which the project files are to be stored. The entire path can be entered in the text field, or the Browse button can be clicked and the path selected from a standard Windows file dialogue.
  • the new application project will be created, and, as shown in Figure 14, the main application window 230 will be displayed in the workspace.
  • the first step when writing a VA application is to create the first discourse.
  • When performing any of these three actions, the Discourse View window will appear in the main workspace.
  • the Discourse View window consists of two panes: Topics 234 and Event Handlers 236. Listed in these panes will be the topics and event handlers that have already been added to the discourse being edited. Both of these panes will be empty for a new discourse.
  • the Event Handler View window will open, but it will be empty because we have not yet added any code.
  • the greeting can be a simple one.
  • vavm.TTSString "Welcome to XYZ Incorporated. Please say the name of the department you wish to contact. To speak to an operator, say operator."
  • This code uses the vavm object's TTSString API to read a text string over the phone to the caller.
  • the VA application will greet the user when a call is received, but it has no mechanism for responding to the user's spoken command. This type of functionality is performed within a discourse's topics. The next step in creating a
  • Virtual Operator application is to add a topic that listens for main menu commands.
  • a drop down menu will open.
  • the Topic View window 240, which is illustrated in Figure 15, will open. This Topic View window contains the following three panes:
  • Phrase List The list of spoken phrases that will be recognized by this topic
  • Tasks The actions that will be taken based on the phrase(s) spoken by the user
  • Event Handlers The actions that will be taken when particular events occur, such as user inactivity and recognition errors.
  • Phrase Lists
  • phrase <Tag Value> .
  • Phrase indicates the phrase to be listened for. When this phrase is heard, the Recognition object's Tag will be set to the specified Value. Note: Each entry in a phrase list should end with a period.
  • the Recognition (or rec) object is the mechanism by which the speech recognition engine communicates with the VA application.
  • When the user speaks into the phone, or other input device, the sound is sent to the speech recognition engine, which attempts to "recognize" it, that is, to resolve it into a text string.
  • the VA virtual machine compares the recognized string to the entries in the phrase list. If the string is found in the list, the virtual machine sets a tag in the rec object to the value specified in the phrase list and executes the appropriate task for the phrase.
  • the developer should define the phrases the topic needs to listen for and how to assign the rec object's tags when each phrase is heard.
  • the rec object does not have pre-defined tag names that a developer should use; a tag can be created with any name, provided that it is a single word beginning with a capital letter.
  • a developer could have used the tag "Action”, “Trnsfr”, or even "Blue” instead of "Transfer”.
  • the values assigned to the tags can be any string the developer desires. Though two of the tag values in the above list match the phrase being listened for, the last one ("accnt") has been abbreviated. Tags and values are simply flags that are set by a developer for later use in the Tasks fields of his or her application.
  • Phrase List Operators In the sample phrase list above, all the phrases are single words. List entries, though, can contain multiple words joined by operators. The operators that can be used within a phrase list entry are listed in the table below:
  • This entry specifies that if the caller says either "human resources" or "personnel", the rec object's Transfer tag will be set to "hr". Similar to the or operator is the optional operator ('?'). This operator specifies that the word it precedes is not required. Suppose, for example, that some callers requested the sales department by saying "sales" while others used the term "product sales". In the phrase list, a developer could have the engine recognize either version by using a combination of and and or operators: ((product sales) | sales) <Transfer "sales"> .
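Putting these pieces together, a phrase list for the Virtual Operator's MenuResponses topic might read as follows. This is a sketch only; the exact list of departments is not given in the text, though the tag values "operator", "sales", "hr", and "accnt" appear in the surrounding examples.

```
operator <Transfer "operator"> .
((product sales) | sales) <Transfer "sales"> .
((human resources) | personnel) <Transfer "hr"> .
accounting <Transfer "accnt"> .
```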
  • the Phrase List pane 242 on the Topic View window is a text editor that allows a developer to enter and edit the various phrases to be recognized by the topic. To enter a new phrase, simply click inside the pane to activate the cursor, position the cursor at the end of the list, and type the new phrase list entry. A developer can also edit existing phrases, as with any text editor. It should be noted that the phrase list pane supports standard Windows text editor functions such as cut, paste, and undo. For example, for the Virtual Operator application, the list of phrases that will be recognized by the MenuResponses topic needs to be defined. Click inside the
  • Once the MenuResponses topic has its list of recognized phrases defined, the application needs to be configured to take the proper actions when the phrases are spoken by the caller. This functionality is implemented in bits of script code called tasks.
  • Tasks are the basic building block for the actions that will be taken by a VA application when a particular command is given by the user. Each task has two parts: • The trigger: The trigger is the event that causes the task to be performed.
  • An action is a set of instructions that the topic performs when a particular trigger is activated. These instructions are written in standard Windows scripting languages such as VBScript or JavaScript. (For the examples in this chapter, VBScript will be used.)
  • Defining a Task To define a new task for a topic, perform the following steps:
  • a new task will be empty of functionality when first created.
  • To edit the task double click on its name in the Tasks pane. As shown in Figure 15, the Task View window 250 will be opened in the workspace.
  • the Task View window contains two panes, Trigger 252 and Action 254.
  • Both these panes act as text editors, allowing a developer to type the free-form text that will be the trigger and the Action for the task.
  • the first step when creating a new task is to define its trigger.
  • This definition causes the virtual assistant to execute the task whenever the rec object's tag is equal to the specified value. For example, the first task to implement for the Virtual Operator's MenuResponses topic is to connect the caller to the operator when he or she says "operator".
  • the action for a particular task is defined as a series of scripting-language commands.
  • VBScript is used for the command scripts, but any standard Windows scripting language such as JavaScript could also be used.
  • the action the application is to perform in the Connect_To_Operator task is straightforward: transfer the user's call to the operator's extension. This function is performed by calling the vavm object's Transfer() API.
  • the API call has the following syntax:
  • PhoneNumber is the string containing the number to which the call should be transferred. For example, if the number of the XYZ Corporation switchboard is 555-
  • When this command is executed, the virtual machine will transfer the call to the specified number. When this occurs, the user will no longer be connected to the VA application, so the application will reset and wait for the next call.
  • new tasks can be added to handle the other menu functions (i.e. transferring the caller to a specified department). This is done by performing the steps described above, but using the rec tags defined in the phrase list as the triggers for the tasks.
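A department-transfer task might be sketched as follows. The trigger syntax shown in the comment is an assumption based on the description of triggers testing the rec object's tags, and the extension is a hypothetical placeholder, since the text gives no actual numbers.

```vbscript
' Task: Connect_To_Sales (sketch)
' Trigger (assumed syntax): rec.Transfer = "sales"
' Action:
vavm.Transfer "5551234"   ' hypothetical extension for the sales department
```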
  • the virtual machine sets the rec object's tag as instructed by the phrase's entry in the phrase list. Then, it scans the topic's task collection, comparing the trigger for each task against the rec object. For each task found whose trigger condition evaluates to true, the virtual machine will execute the script contained in that task's action definition. For example, when the Virtual Operator's Main Menu discourse is executed, the application will call the DiscourseEntry event and play the prompt, "Welcome to XYZ Incorporated. Please say the name of the department you wish to contact. To speak to an operator, say operator." The application will then wait for the user to speak and try to recognize the sound. If the phrase "operator" is heard, the virtual machine will set the rec object's Transfer tag to "operator" and scan the
  • the virtual assistant application has been configured to answer an incoming call, listen for a spoken department, and transfer the caller. However, there is no mechanism to handle a caller's saying the name of a department that is not included in the phrase list or to take actions if the caller says nothing for an extended period. These situations are addressed using event handlers.
  • a RecognitionError event occurs any time the user speaks a phrase that is not defined in a topic's phrase list and therefore is not understood by the speech recognition engine. Such an event would occur if, for example, the caller says a department name (such as "maintenance") that has not been defined in the phrase list for the MenuResponse topic. Similarly, if the caller mumbles a command or there is excessive background noise, the application may not recognize the speech and will fire a RecognitionError.
  • For the application to take an action when a recognition error occurs, a RecognitionError event handler should be inserted into the MainMenu discourse and a script should be added that will be executed when such an event occurs. This can be done by performing the following steps:
  • In the Event Handler View window, add the following code: vavm.TTSString "I'm sorry. I did not understand that response. Please say the name of the department you wish to contact. To speak to an operator, say operator."
  • When the user says a phrase that cannot be recognized, the application will play the above message and wait for the user's next command.
  • a second type of occurrence that a virtual assistant application should handle is Inactivity events. These events occur when there is no voice input from the caller for a specified amount of time (this period is variable and can be set by the application developer). For example, if the caller to the Virtual Operator says nothing for five seconds, the application may need to prompt the user for input. Such silence may indicate that the user is confused about the menu options and does not know what to say, so the prompt should give some direction about what to do next.
  • an Inactivity handler is defined by inserting an entry into the discourse's Event Handlers pane and the handler is then edited to add the VBScript to be executed when the event occurs.
  • ParameterName indicates the name of the parameter to be set, and Value indicates the value to which it should be set.
  • the inactivity timer can be set at the start of the Main Menu discourse. This can be accomplished by inserting the SetParameter call into the DiscourseEntry event handler for the Main Menu discourse.
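Such a call might be sketched as follows, assuming the InactivityTimer component-level parameter named later in this section; the value of 5000 and its units (milliseconds) are assumptions, since the text does not state them.

```vbscript
' DiscourseEntry handler: set the inactivity timeout for this discourse (sketch)
vavm.SetParameter "InactivityTimer", "5000"   ' hypothetical value; units assumed
```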
  • DiscourseStart event can be used to streamline the prompts presented to the user.
  • DiscourseEntry occurs only once during the discourse's execution, but DiscourseStart occurs each time the discourse resets after processing user input. For example, if the user says a phrase that is not recognized in the phrase list, the code in the RecognitionError event handler will execute. Then, the discourse will start over and wait for the next input from the user. At this point, the DiscourseStart event will occur again, but DiscourseEntry will not.
  • 1. The discourse begins execution and processes the code in the DiscourseEntry event.
  • 2. The DiscourseStart event occurs and its code is executed.
  • 3. The application waits for the user to speak.
  • the prompt structure in the Virtual Operator's Main Menu discourse can be simplified by defining a DiscourseStart event that prompts the user to name a department.
  • the greeting "Welcome to XYZ corporation" would remain in the DiscourseEntry event handler, since the user should be greeted only when the call is first received, not each time the discourse resets.
  • Because the DiscourseStart event now prompts the user each time a voice command is expected, the duplicated prompts in the RecognitionError and Inactivity event handlers can be removed. To make these changes, the event handlers in the Main Menu discourse should be edited as specified in the table below.
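Based on the description above, the revised handlers might read as follows; this is a sketch of the edits being described, not a listing from the text.

```vbscript
' DiscourseEntry handler: greet the caller once, when the call is first received
vavm.TTSString "Welcome to XYZ Incorporated."

' DiscourseStart handler: prompt for input each time the discourse resets
vavm.TTSString "Please say the name of the department you wish to contact. To speak to an operator, say operator."
```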
  • the application should be compiled, published and tested to see how the prompts are read to the user.
  • the prompts can then be modified and the application republished to determine how the flow of the discourse proceeds.
  • the BargeIn feature can be turned on and off by setting the AllowBargeIn component-level parameter.
  • DiscourseEntry Occurs when a discourse is activated. (This event fires only once during the execution of a discourse)
  • DiscourseStart Occurs each time a discourse resets following an event or the execution of a topic.
  • Inactivity Occurs when no input is received from the user during a set period of time. (The period can be set using the InactivityTimer component-level parameter.)
  • RecognitionResult Occurs when the user speaks a phrase that is recognized by the rec engine.
  • the next step in creating the VirtualOperator application may be to add a second discourse.
  • each set of tasks that should have its own set of recognized phrases should be isolated in its own discourse.
  • the VirtualOperator will prompt a user to name a department and then transfer the call as specified.
  • An employee may, for example, wish to call the VirtualOperator and access his or her voice messages or email.
  • This functionality needs to be isolated into separate discourses because it involves different prompts and commands than requesting a particular department.
  • vavm.TTSString "Welcome back"
  • vavm.SelectDiscourse "VerifyInternalCaller"
  • vavm's SelectDiscourse API call transfers the flow of control from the current discourse to the one named in the parameter.
  • the application will say "Welcome back" and switch to the VerifyInternalCaller discourse whenever a user says "It's me" or "personal" at the main menu prompt.
  • vavm's SelectDiscourse method, when called, instructs the virtual machine as to which discourse to switch to after the current discourse has completed its execution; it does not interrupt the execution of the current discourse. For this reason, the order of the commands in the InternalCaller topic's Action script could be reversed without changing the way the application executes: vavm.SelectDiscourse "VerifyInternalCaller" vavm.TTSString "Welcome back"
  • the VerifyInternalCaller discourse should be opened and a new DiscourseStart event handler should be added that prompts the user to enter his or her PIN.
  • the prompt may appear as follows: vavm.TTSString "Please enter your personal identification number"
  • When VerifyInternalCaller executes, it will prompt the user for a PIN and wait for a response.
  • One way to handle the response to the PIN prompt would be to listen for a particular four-digit number and, when one is recognized, trigger a task that corresponds to the user who called in. This could be done by creating a new topic (named, for example, PINResponse) and adding the following entries to the topic's phrase list:
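The entries themselves do not appear in the text; a sketch of what such a phrase list might look like follows. The PINs, user names, and the "User" tag are all hypothetical illustrations.

```
(one two three four) <User "jsmith"> .
(five six seven eight) <User "mjones"> .
```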
  • a resource allows topics and event handlers to reference components contained externally. These components include Phrase List and Script Modules, which a topic uses to import pre-defined grammar entries and scripts, and Prompt Groups, which topics and event handlers use to incorporate voice output defined externally. All resources are global, so once a resource is defined it can be used from within any discourse in the application. With resources, a developer can both increase the speed of application development (by reusing grammar and code components) and improve the sophistication of the voice interface (by using recorded and dynamic prompts).
  • the resources that are available in an application are listed in the "Resources" pane in the main application window. When a new application is created it will have no resources associated with it, so this pane will be empty. (Available resources can also be viewed in the left-hand application tree by clicking on the "ResourceView" tab at the bottom of the tree pane.)
  • Three types of resources are available: phrase lists, script modules, and prompt groups.
  • a Prompt Group is the output counterpart of a Grammar. Using prompt groups, the developer can define a set of phrases that will be output to the caller as needed during the execution of a VA application.
  • a prompt in a Prompt Group resource is a recorded prompt that will be played to the user or a TTS (Text-to-Speech) string that will be rendered as speech.
  • Prompt Groups also allow dynamic output that combines various static pieces of output depending on the identity of the caller and the circumstances. Such a resource would select pre-recorded excerpts and assemble them to say, for example, "he has ten messages", "she has six messages", and "he has one message".
  • a Prompt Group is associated only with the application as a whole. Once a Prompt Group has been defined, it can be used in any topic or event handler in the application.
  • the name of the prompt can be anything the developer chooses, but it should be descriptive ofthe prompt's purpose.
  • the new prompt group will be added to the list of available Resources on the main menu and to the tree of available Resources in the left-hand tree- view pane.
  • the new default group will be created with a default set of properties, including
  • a group When a group is first created, it does not contain any prompts. To add a prompt, perform the following steps: 1. In the Resources pane on the main menu, double click on the prompt group to which to add a prompt. The prompt group view window will be opened in the main workspace. (Because no prompts have been defined, this window will be empty.)
  • the Prompt Properties dialogue 259 is the primary interface for editing a prompt.
  • the fields in this dialogue have the following meanings:
  • the type ofthe prompt can be either Simple or Expression:
  • Simple prompts have only a Filename and/or Text string defined for their output; this output is static and cannot be changed during application execution.
  • When the Simple type is selected for a prompt, the Expression and Properties text fields are grayed out on the dialogue.
  • a prompt that says "Thank you for calling" is an example of a Simple prompt.
  • Expression prompts allow for dynamic output. Rather than playing a predefined text string or sound file, Expression prompts allow the developer to specify an expression that will be evaluated when the prompt is called and determine what sound is output to the user. When the Expression type is selected, the Filename and Text fields are grayed out on the dialogue. A prompt that says, "You have Xnew messages" (where the number X is passed into the prompt by the application) is one example of an Expression prompt.
  • the filename property contains the name of a sound file that will be played when the prompt is called. The property is used only for Simple prompts.
  • the text property contains a text string that will be converted to speech and played when the prompt is called.
  • the property is used only for Simple prompts.
  • a developer can define both a Filename and a Text property for a prompt. In such a case, the Text field will act as a backup.
  • the system will attempt first to play the sound file indicated by Filename. If this file cannot be found or read, the system will revert to the Text field and play the specified string through the Text-to-Speech facility.
  • the most basic use of a resource is to play a single prompt when a particular event occurs.
  • a developer could use a resource to play a greeting message when a caller first connects with the Virtual Assistant.
  • the developer would create such a resource by using the following steps:
  • Simple prompts that define a Filename and/or Text string are static, producing the same output each time they are played.
  • Expression prompts allow developers to insert macros and pieces of script into their resources so that the output will be dynamically determined at run time.
  • An example of such a resource is one that reports to the user how many new email messages are in his or her inbox. The output line will always be the same ("you have X new messages") except that the value of X will vary.
  • An Expression consists of a series of macros, references to other resources, or both.
  • Macros A macro, which is always preceded by a percent sign (%), will be translated by the VA virtual machine when the resource is processed.
  • the %silence(duration) macro, for example, will be translated to a duration-millisecond pause by the system.
  • the %string(output_string) macro instructs the virtual machine to translate the specified output_string of text into speech.
  • When using a literal string value as a macro parameter, the string should be enclosed in two quotation marks (e.g. ""string""). The first quotation mark in each pair acts as an escape character.
  • %silence(String duration) Pauses voice/sound output for the number of milliseconds specified by duration.
  • %play(String filename) Plays the sound file specified by filename.
  • %record(String filename) Records the user's voice input into the specified file. (This file will be saved in Sphere format.)
  • %phone(String number) Translates the text specified by number into speech in the format of a phone number. "5553322", for example, would be translated “five-five-five-three- three-two-two”.
  • %date(Date output_date) Translates the Date variable specified by output_date into speech.
  • %idate(String datestring) Intelligently translates the text in datestring to date-formatted speech.
  • %time(Date output_time) Translates the Date variable specified by output_time into speech.
  • %itime(String timestring) Intelligently translates the text in timestring to time-formatted speech.
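As a sketch of how these macros might combine in a resource's Expression field, the following speaks a fixed phrase, pauses briefly, and then reads the current time; the doubled quotation marks follow the escaping rule described earlier, and passing the VBScript Now function directly to %time is an assumption.

```
%string(""The current time is"") %silence(300) %time(Now)
```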
  • Macros by themselves offer limited utility. The real power of a macro appears when it is combined with resource arguments, which allow data to be sent to the resource when it is called by the application. The syntax for referencing arguments is @n, where n is the argument's position (e.g. @1 for the first argument).
  • a VA application, for example, might define a resource named check_mail that informs the user how many email messages are in his or her box.
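The Expression field for such a resource might be sketched as follows, with the message count passed in as the first argument (@1); this particular combination of %string with an argument reference is an assumption, not a listing from the text.

```
%string(""you have"") %string(@1) %string(""new messages"")
```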
  • a resource expression can call another resource, providing a sophisticated method for building dynamic output. Rather than using the Text-to-Speech facilities to render the entire prompt, for example, a developer may use recorded prompts for the constant elements in the prompt and the TTS facilities for the dynamic data.
  • the developer could record a human voice saying "the current time is," save it in a file named current_time.wav, and create a resource named current_time that had its file field set to "current_time.wav."
  • the developer could then create a second resource named read_time that added the dynamic data and called the first resource.
  • the expression field for the second resource would appear as follows:
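The expression itself is omitted here; based on the surrounding description, it would presumably combine a call to the current_time resource with the %time macro, along the lines of the following sketch (the resource-call syntax is an assumption):

```
current_time() %time(Now)
```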
  • the read_time resource When called at 3:12 PM, the read_time resource would say the following to the user: "The current time is three twelve p.m.”
  • the VBScript Now function returns a Date object specifying the current date and time according to the computer's system clock.
  • a resource expression can also specify arguments when calling another resource.
  • a developer for example, could call the read_time resource from within another resource using the following expression:
  • a resource returns only one prompt to the system at a time
  • multiple prompts can be defined within a single resource (hence the name "Prompt Group” as the resource type).
  • Using multiple prompts adds variety to the output of a VA application and allows the application's responses to be dynamically generated.
  • a developer may want to create a resource that plays a more detailed error message when the user is a customer than it does when the user is an employee.
  • Such a resource would define two possible prompts that would be played back depending on the identity of the current user.
  • the system determines which prompt to play back based on one or more Properties defined for each prompt in the prompt group.
  • To use such a resource, the developer should call it from within another resource and set the Property to the desired value.
  • the developer would need to first create an Expression-type resource that calls Menu_Error and sets the Insider Property to true or false. This call (defined in the Expression field ofthe calling resource) has the following syntax:
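The syntax itself does not appear in the text; a hypothetical call that invokes the Menu_Error prompt group while setting its Insider Property might look like the following sketch:

```
Menu_Error(Insider: true)
```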
  • an expression can also use a piece of scripting code to assign the value to the resource's Property.
  • the expression statement uses the following format:
  • How a Prompt Group Selects Prompts When a prompt group is invoked, it uses the custom-defined properties to test each prompt and determine which one should be returned for output. To do so, it steps through its list of prompts, evaluates the properties for each prompt, and returns the first prompt whose properties all evaluate to true.
  • Such a prompt group might be called from another resource with the following expression:
  • By default, a prompt group uses Sequential mode to select which prompt to return. That is, it starts with the first prompt in the list and proceeds in order to the last. In most cases, this is the desired behavior, but a developer may wish to have the prompts tested randomly.
  • In Random selection mode, the prompt group checks its prompts' properties in a random order until a true condition is found. This mode is useful for creating resources that provide variety in their output.
  • a resource that is played when a user makes an invalid menu selection might define several possible prompts that could be returned. Rather than saying "Invalid choice” each time, the application could say variously “that choice is incorrect,” “you selected an invalid option,” and “that choice is not available.” The system would randomly select which prompt to use, bringing variety to the user's interaction with the application.
  • a prompt group might be defined as follows:
  • the Prompts text area contains a list of the prompts that have been defined in the prompt group and the order in which they will be evaluated when the group is in Sequential Selection mode. To change the order of a prompt, click on its name to highlight it, then click the Up or Down buttons to move the prompt through the list.
  • Expression and Prompt Properties can be used to create sophisticated output.
  • vavm.PlayPrompt "num_messages(@1, @2)", num, num_unread
  • the variable num would be set to the number of messages in the user's mailbox and num_unread set to the number of messages that are unread.
  • a phrase list module allows the definition and importation of sections of a grammar that will be accessible from within phrase lists anywhere in the application, that is, in the phrase lists for a discourse or topic. Using a phrase list module saves a developer from having to repeatedly define the same phrases in multiple lists.
  • a Phrase List Module can be defined only globally. To create a new phrase list module, perform the following steps: 1. Open the main window for the application. 2. Right click inside the "Resources" pane. A drop-down menu will open.
  • phrase list module The format of a phrase list module is identical to that for regular phrase lists in a discourse or topic.
  • a developer can define rules, use typed variables, and implement dynamic grammars. For example, one use of a phrase list is to recognize affirmative and negative voice commands from the user. In a large application, multiple discourses and topics may ask the user to answer questions by saying "yes" or "no." Without a phrase list module, the developer would have to define the phrases "yes" and "no" in the phrase list for every discourse or topic that listened for such responses. If the developer wished to make his or her application flexible and recognize variant responses such as "yeah" and "okay," the amount of coding required would increase substantially.
  • by using a phrase list module, the developer can perform the work of defining acceptable "yes" and "no" responses only once and then use the same grammar code repeatedly throughout the application.
  • the text of such a phrase list module might be called YesNoEPL and could appear as follows:
  • the system will recognize a positive response and set the rec object's Answer tag to "yes" no matter whether the caller says "yes", "yup", "yes that's right", or any of several other affirmatives. Similarly, the system will recognize a negative response no matter whether the caller says "no" or any of several other negatives.
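The effect of such a module can be illustrated outside the VA grammar language. The Python sketch below is purely an analogy (YesNoEPL itself is written in the VA Studio grammar syntax, and the variant lists here are illustrative, not the module's actual contents): it maps caller utterances to a canonical answer, the way the module sets the rec object's Answer tag.

```python
# Analogy only: map caller utterances to a canonical answer the way a
# YesNoEPL phrase list module sets the rec object's Answer tag.
# The variant lists are illustrative, not the module's actual grammar.
AFFIRMATIVES = {"yes", "yup", "yeah", "okay", "yes that's right"}
NEGATIVES = {"no", "nope", "not really"}

def answer_tag(utterance):
    """Return "yes" or "no" for a recognized response, else None."""
    text = utterance.strip().lower()
    if text in AFFIRMATIVES:
        return "yes"
    if text in NEGATIVES:
        return "no"
    return None  # utterance not covered by this grammar
```

Any topic that needs a yes/no answer can then share this single mapping, which is the reuse benefit the phrase list module provides.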
  • a phrase list module can be inserted into the phrase list of a topic simply by adding the name of the module followed by a period.
  • YesNoEPL defined in the previous section, the developer would simply open the phrase list for the desired topic and add the following line:
  • the topic's phrase list, in addition to whatever other phrases and rules the developer wishes to define, will recognize all the variants of "yes" and "no" defined in the YesNoEPL module.
  • a developer can also import an existing module from a file. This allows modules that were created for other VA applications to be used in a new application. After developing a few full-featured applications, in fact, a VA developer should have a considerable library of phrase list modules on hand, allowing him or her to reuse the same sets of common responses such as affirmatives/negatives, email commands, and menu commands.
  • a phrase list module should be stored in a .vaplm file. These types of files are generated by the VA Studio compiler each time a project is built. In most instances, there will be only one .vaplm file for each application, and it will share the application's name.
  • the phrase list module file for the VirtualOperator project (VirtualOperator.vasp), for example, could be named VirtualOperator.vaplm.
  • once a phrase list module has been inserted from a file, it can be edited to add or remove grammar entries. To do so, double click on the name of the new module in the Resources list. An editor window will open containing the text of the phrase list.
Script Modules
  • Script Modules help the developer create applications rapidly. Instead of defining modules that can be used in a topic's phrase list, script modules allow developers to reuse the same piece of scripting code in multiple topics or event handlers.
  • the process for defining a script module is very similar to that for a phrase list module: 1. Open the main window for the application.
  • a script module is analogous to a C/C++ header file or an imported Java class.
  • constants and public functions are defined and made available outside the module by using the vavm's Export API method. Once defined, these constants and functions can be referenced in the code of any topic or event handler that imports them using the vavm's External method.
  • vavm.Export(ModuleName)
  • ModuleName is a string containing the module's name.
  • although ModuleName can be defined as a literal string, by convention VA applications define the module's name in a constant string called MODULE_NAME and use that constant in the Export method call.
  • one type of functionality that might be repeated in several topics in an application is the sending of an email message using the user's SMTP account.
  • This functionality involves dozens of lines of code, and by isolating it in a script module the developer can reuse the same code in multiple topics.
  • the example below shows a script module named "SendingEmail" that defines a public function (SendMailMessage) that will be used by all the topics in the application that need to send an email message.
  • vavm.SignalEvent 0, 3, "Sending Mail"
    oSMTP.Send theUserName, "ConitaCSR@conita.com", "ConitaCSR@conita.com", toUserAddress, 1, mailSubject, "EMAIL", "This is a test, an included file attached."
  • the module can be used within the code of any topic or event handler. To do so, use the vavm's External API method to import the module.
  • the syntax of External is as follows: vavm.External(ModuleName)
  • ModuleName is a string containing the name of the script module to be imported. Once External has been called, the script module's public subroutines and functions can be called just as if they were a part of the topic or event handler's code. In effect, the External method defines an object with the name of the module; any calls to that module's subroutines or functions will be made as if the developer were calling the methods of an object.
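The Export/External pairing behaves like a simple module registry: Export publishes a module's public functions under the module's name, and External returns a handle whose members are those functions. A rough Python analog follows (the registry and handle are illustrative, not the vavm implementation; the Chop function mirrors the SMUtility example shown later in this document).

```python
# Rough analog of vavm.Export / vavm.External: a registry that hands
# back module handles whose attributes are the exported functions.
_registry = {}

class ModuleHandle:
    def __init__(self, funcs):
        self.__dict__.update(funcs)

def export(module_name, **public_funcs):
    """Like vavm.Export: publish a module's public functions."""
    _registry[module_name] = public_funcs

def external(module_name):
    """Like vavm.External: import a previously exported module."""
    return ModuleHandle(_registry[module_name])

# A "SMUtility" module exports a Chop function that strips leading
# whitespace, mirroring the script-module example in this document.
export("SMUtility", Chop=lambda s: s.lstrip())
```

Once registered, a topic would obtain a handle with `util = external("SMUtility")` and call `util.Chop("   hi")`, just as the External method lets VA code call module functions as if they were object methods.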
  • Topic EmergencyNotification
  • Trigger: rec.Action "Notify"
  • vavm.TTSString "Please say the message you would like me to send to the administrator."
  • vavm.RecordWaveform "RecordedMessage"
  • vavm.TTSString "I'm sending the email notification now."
  • the User Object (VAUserDO)
  • Instantiating a User Object
  • a User object is instantiated using the vavm's CreateUserObj API call. This call returns a new User object with uninitialized values.
  • the method void Authenticate (String AuthString, String AuthMethod) authenticates the user contained in the object based on the criteria specified in the AuthMethod parameter. The possible values for AuthMethod correspond to the following fields in the VA database:
  • the method void setValue (String Name, String Value [, String Privilege] ) sets a keyword/value pair in the User object's memory cache. If the keyword Name already exists in the cache, its value will be updated to that specified by Value.
  • Privilege can be used to specify whether the privilege for the newly created keyword/value pair should be "Public” or "Private.”
  • a "Public” value (such as a telephone number) is accessible to anyone who wishes it.
  • a "Private” value (such as a PIN or account number) can be read only by a User object that has been successfully authenticated.
  • Calling the setValue method will change the keyword/value pair only in the memory cache, not in the actual database record. To change a user's database record, the CommitValue or CommitValues method must be called.
  • void DeleteValue deletes the keyword/value pair specified by Name. To delete a keyword from a user's database record, the User object must be successfully authenticated.
  • void CommitValue (BSTR Name) writes the value contained in the user keyword specified by Name to the database. For a value with "Private" privileges to be committed to the database, the User object must have been successfully authenticated.
  • void CommitValues writes all the keyword/values contained in the User object's memory cache to the database. For values with "Private" privileges to be committed to the database, the User object must have been successfully authenticated.
  • void RefreshValue (String Name) reloads the value of the keyword specified by Name from the database into the User object's memory cache.
  • void Ref reshValues reloads all user data values from the database into the User object's memory cache.
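Taken together, setValue, CommitValue, and RefreshValue describe a write-back cache sitting in front of the user's database record, with "Private" values gated by authentication. A schematic Python model of that behavior follows (the class and member names are hypothetical; the real object is the COM VAUserDO).

```python
# Schematic model of VAUserDO caching: set_value edits only the
# in-memory cache; commit_value writes one keyword back to the
# "database"; private values require prior authentication.
class UserRecord:
    def __init__(self, database):
        self._db = database            # stands in for the VA database row
        self._cache = dict(database)   # memory cache of keyword/value pairs
        self._privileges = {}
        self.authenticated = False

    def set_value(self, name, value, privilege="Public"):
        self._cache[name] = value
        self._privileges[name] = privilege

    def commit_value(self, name):
        if self._privileges.get(name) == "Private" and not self.authenticated:
            raise PermissionError("authenticate before committing private values")
        self._db[name] = self._cache[name]

    def refresh_value(self, name):
        self._cache[name] = self._db[name]
```

The model makes the documented contract visible: a set_value alone never touches the database, and a "Private" commit fails until the object has been authenticated.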
  • a developer can also treat a keyword as a member variable of the User object.
  • for example, a keyword such as EmailPassword could be referenced directly as a property of the User object.
  • the SMTP Object allows VA applications to interface with a mail server and send email messages for the user.
  • void Connect (String ServerName)
  • This method must be called before a message can be sent.
  • ServerName specifies the name of the SMTP server with which to connect and can be either in DNS name format (e.g. mail.bigmail.com) or IP address format (e.g. 123.45.231.2).
  • void Disconnect disconnects the object from the SMTP server.
  • void attach (String Filename) attaches the document specified by Filename to the outgoing email message. This method can be called multiple times to attach more than one document to a message, and it should be called before the send method.
  • when the attach method is called, the SMTP object will cache the filename internally. When the send method is called, the object will attach each of the documents in its cache to the message before sending it to the SMTP server.
  • void send (String FromName, String FromAddress ,
  • FromName A string containing the name to be displayed in the from field ofthe message.
  • Standard SMTP values for Importance are "high”, “low”, and "normal"
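The attach-then-send contract can be sketched with Python's standard email library: attach only caches filenames, and the cached names are folded into the outgoing message when it is built. The class below loosely mirrors the SMTP object's behavior and is not its actual COM interface; attachment contents are stubbed out.

```python
# Loose sketch of the attach/send contract using the stdlib email
# package: attach() only caches names; build() folds them into the message.
from email.message import EmailMessage

class OutgoingMail:
    def __init__(self):
        self._attachments = []   # filename cache, per the attach method

    def attach(self, filename):
        self._attachments.append(filename)

    def build(self, from_addr, to_addr, subject, body):
        msg = EmailMessage()
        msg["From"], msg["To"], msg["Subject"] = from_addr, to_addr, subject
        msg.set_content(body)
        for name in self._attachments:
            # a real implementation would read the file; we attach a stub
            msg.add_attachment(b"", maintype="application",
                               subtype="octet-stream", filename=name)
        return msg
```

Calling attach multiple times before build yields one message carrying every cached document, which is the multi-attachment behavior the text describes.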
  • X is an integer determined by the user. This call will invoke a new SMTP Object Session with the ID specified by X.
  • the Recognition object is used by the VA virtual machine to return the results of a speech recognition operation to the VA application. For example, when the user speaks into the telephone, the system recognizes the speech and compares the results with the phrase list for the active grammar. If a match for the recognized speech is found in the phrase list, the speech recognition data is stored in the rec Recognition object, which is passed back to the application.
  • a Recognition object is also returned by the vavm API RecognizeUtterance, which allows a developer to use the speech recognition facilities to recognize a sound file and translate it to text.
  • a Recognition object has no methods. Three member variables (listed in the table below) are used to store the speech recognition results. In addition, the user can define the custom tags that are set within grammars when particular phrases are recognized.
  • confidence (Integer): a value from 0 to 100 indicating the relative level of confidence that the speech was correctly recognized.
  • resultstring (String): the text string generated by the recognition engine when the speech was recognized.
  • utterance (String): the name of the file in which the utterance was recorded.
  • Using the Recognition Object
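The three member variables can be modeled as a plain record, with the confidence value gating whether the application trusts the result. The sketch below is a hypothetical stand-in, not the COM object itself; the accepted method simply illustrates confidence-threshold gating.

```python
# Hypothetical stand-in for the rec Recognition object: three result
# fields plus user-defined tags set by the grammar.
from dataclasses import dataclass, field

@dataclass
class Recognition:
    confidence: int          # 0-100 relative confidence
    resultstring: str        # text produced by the recognition engine
    utterance: str           # file in which the utterance was recorded
    tags: dict = field(default_factory=dict)

    def accepted(self, threshold):
        """Illustrates gating a result on a minimum confidence level."""
        return self.confidence >= threshold
```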
  • the Security object can be used by a VA to encrypt and decrypt text-file communications (such as email). It supports both standard encryption as well as high-security 64-bit algorithms.
  • String Encrypt (String Text, String Key) returns a string containing the encrypted version of Text as encoded using the specified Key.
  • String Decrypt (String EncryptedText, String Key) returns a string containing the decoded version of EncryptedText, as decoded using the specified Key.
  • String Base64Encrypt (String Text, String Key) returns a string containing the encrypted version of Text as encoded using the specified Key with Base-64 Encryption.
  • String Base64Decrypt (String EncryptedText, String Key) returns a string containing the decoded version of EncryptedText, as decoded using the specified Key with Base-64 Encryption.
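The method pairs are symmetric: decrypting with the same key returns the original text. The patent does not disclose the underlying algorithms, so the Python sketch below substitutes a toy XOR stream cipher, with Base-64 wrapping standing in for the Base64Encrypt/Base64Decrypt variants, purely to illustrate the round-trip contract.

```python
# Toy illustration of the Security object's round-trip contract.
# The XOR cipher is a placeholder, NOT the patent's actual algorithm.
import base64
from itertools import cycle

def _xor(data, key):
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def encrypt(text, key):
    return _xor(text.encode(), key.encode())

def decrypt(blob, key):
    return _xor(blob, key.encode()).decode()

def base64_encrypt(text, key):
    # Base-64 wrapping keeps the ciphertext printable, as needed for
    # text-file communications such as email.
    return base64.b64encode(encrypt(text, key)).decode()

def base64_decrypt(encoded, key):
    return decrypt(base64.b64decode(encoded), key)
```

Whatever cipher sits underneath, the documented guarantee is that Decrypt(Encrypt(Text, Key), Key) returns Text, which the sketch demonstrates.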
  • AudioDO Audio Object
  • the Audio object can be used to perform several audio utility functions for VA applications. These functions include the following:
  • void Connect (String ServerName, Integer Port) connects to port Port of the speech recognition server specified by ServerName.
  • the method void Disconnect () disconnects the Audio object from the server.
  • RiffFileName Boolean ConvertTo8Bit
  • void RiffToSphere (String RiffFileName, String SphereFileName) converts the Riff file contained in RiffFileName to Sphere format and saves it in the file SphereFileName.
  • InputBFileName, String OutputFileName reads the sound files in InputAFileName and InputBFileName, concatenates them, and stores the result in OutputFileName.
  • VA applications use static grammar definitions.
  • the phrase lists defined for these applications remain the same each time the application executes.
  • a developer will need to use grammars that change according to who is using the application or when the application is being used.
  • a developer may wish to implement an address book feature that allows users to call in and request the addresses and/or telephone number for contacts.
  • Dynamic grammars provide a way around these problems. They allow an application to import grammar definitions both at start-up and on the fly during the course of its execution. Using dynamic grammars, an address book application can wait until a call is received and, once the user is identified, load a list of recognizable contact names from that user's database records.
  • pron strings (short for pronunciation strings) contain the data that instructs the speech recognition engine how to identify particular phrases. Although developers of a VA application will probably never have to construct a pron string manually, it is useful to understand the information they contain.
  • the standard format for a pron string has four separate elements: word_name pronunciation recognition_probability NL_statement
  • word_name: the text representation of the word to be recognized.
  • pronunciation: a phonetic string indicating how the word is pronounced.
  • recognition_probability: the probability that the word will be recognized.
  • This string instructs the speech recognition engine that when the phonetic pattern "jh ao r jh" is heard, it should return the text string "george" to the VA virtual machine.
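Because the pronunciation element is itself a sequence of space-separated phonemes, a reader of a pron string has to treat everything between the word name and the trailing probability and NL-statement fields as the pronunciation. The Python sketch below illustrates that reading under the four-element format given above (the sample NL statement is invented, and the real field syntax may differ).

```python
# Hedged sketch: split a pron string into its four documented elements.
# Assumes the pronunciation is everything between the word name and the
# final probability / NL-statement fields; real delimiters may differ.
def parse_pron(pron):
    tokens = pron.split()
    word_name = tokens[0]
    nl_statement = tokens[-1]
    probability = float(tokens[-2])
    pronunciation = " ".join(tokens[1:-2])
    return word_name, pronunciation, probability, nl_statement
```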
  • One of the easiest ways to generate a pron string for a particular name is to use the AudioDO object's PronFactory interface.
  • This interface's methods allow a developer to submit a person's name and receive back a formatted pron string that is ready for insertion into a dynamic grammar.
  • the AudioDO object makes calls to a PronFactory server for the actual translation of the text to a pron string.
  • This PronFactory server runs as a Windows NT service.
  • AudioDO's Connect method should be called before calling the methods to generate pron strings.
  • the Connect method has the following syntax: Connect (String ServerName, Integer Port)
  • ServerName is the name of the server on which the PronFactory is running and Port is the port on which it is listening for requests.
  • the GeneratePron method can be used to generate a new pron string for a word.
  • the method has the following syntax:
  • the AudioDO includes a GeneratePronFromName method that converts a first and last name into a single pron string: String GeneratePronFromName (String FirstName, String LastName)
  • the VBScript code below instantiates an AudioDO object and uses it to generate a pron string for the name "Michael Harrison.”
  • Dynamic entries can be added to a grammar by using the Dynamic typed variable. Dynamic essentially creates a placeholder in the grammar that will be filled by dynamically generated entries at run time.
  • the format ofthe typed variable is as follows:
  • Name is the name used as an identifier for the dynamic grammar.
  • Setting Rec Object Tags
  • the returned pron string will automatically contain instructions that set tags in the rec object.
  • a rec object will be returned to the application with its FirstName and LastName tags set to the first and last name ofthe person identified.
  • the Dynamic typed variable might be used.
  • the application needs to recognize the phrase "give me the address for" plus the desired contact name.
  • the contact name is unknown at the time of writing the application (since the user will periodically add and remove names to his database-stored address book), so the Dynamic placeholder is used instead of the contact name in the phrase list:
  • a rec object will be returned with an Action tag set to "Query_Address” and FirstName and LastName tags set to the first and last name ofthe identified person.
  • GrammarName represents the name of the dynamic grammar to which the entry should be added. Entry is a pron string for the word that is being dynamically added.
  • because a pron string is a regular ASCII text string, it can also be stored in a database. By storing the pron string in the database, it can be loaded directly into the dynamic grammar when a user connection is made, thereby avoiding the overhead of converting contact names to pron strings each time a call is received by the application.
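The flow described in this section (a Dynamic placeholder compiled into the static phrase list, then filled per user with entries loaded from the database at call time) can be modeled schematically. The Python below illustrates the idea and is not the vavm's dynamic-grammar API; the names and tag contents are invented for the address-book example.

```python
# Model of a Dynamic placeholder: the static phrase is fixed at build
# time, while contact-name entries are loaded per user at call time.
class DynamicGrammar:
    def __init__(self, name):
        self.name = name
        self.entries = {}   # recognized word -> tags set on the rec object

    def add_entry(self, word, tags):
        """Analog of adding a stored pron string to the grammar."""
        self.entries[word] = tags

def match(utterance, prefix, grammar):
    """Recognize '<prefix> <contact name>' against the dynamic entries."""
    if not utterance.startswith(prefix + " "):
        return None
    name = utterance[len(prefix) + 1:]
    tags = grammar.entries.get(name)
    if tags is None:
        return None
    return {"Action": "Query_Address", **tags}
```

Loading a user's contacts at connection time then amounts to a series of add_entry calls, after which the fixed phrase plus any stored name is recognizable.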
  • the virtual assistant scripting application programming interfaces are as follows:
  • the module contains a single function, Chop, that receives a string as an input parameter, chops off any leading whitespace, and returns the modified string.
  • Public Const MODULE_NAME = "SMUtility"
    vavm.Export(MODULE_NAME)
    Public Function Chop(strInString as String) as String
        Chop = LTrim(strInString)
    End Function
  • Imports an external script module for use in the code of a topic or event handler. The module to be imported must have been defined in the application's Resources list and made available using the Export call.
  • once a script module has been made available using the External API, it can be called from any topic or event handler that shares the same namespace in which the call was made. It is common for VA developers to make all their External calls in an ApplicationEntry event handler so that the external modules will be available throughout the application.
  • CallerID The telephone number from which the user calls.
  • a developer can retrieve the ANI (CallerID) number for a particular call using the vavm.CallANI method and compare it against the value stored in the Users table.
  • DNISNumber The number dialed by the user to reach the VA.
  • each user can be given a different number to call to connect to the VA. Though these numbers will all connect to the same application, the DNISNumber identification method can be used to determine which user has called in.
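DNIS-based identification reduces to a lookup from the dialed number to the subscriber it was assigned to. A minimal sketch follows, with invented sample numbers and user names purely for illustration.

```python
# Minimal sketch of DNIS identification: each user is given a distinct
# number to dial, and the dialed (DNIS) number identifies the caller.
DNIS_TO_USER = {          # invented sample data
    "8005551001": "bsmith",
    "8005551002": "jdoe",
}

def identify_by_dnis(dnis_number):
    """Return the subscriber assigned to this dialed number, or None."""
    return DNIS_TO_USER.get(dnis_number)
```

All of these numbers reach the same application; only the table lookup distinguishes which user has called in, which is the identification method described above.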
  • VAUserDO User Object
  • the object is returned unauthenticated; the developer will need to call the User Object's Authenticate method. For more information, see Chapter 7, "Creating Advanced VA Applications."
  • Examples
  • The following code instantiates a VAUserDO object and sets its user to "Bill
  • the following code instantiates a VAUserDO object and identifies the user by
  • LoadDynamicGrammarFile
  • Syntax: vavm.LoadDynamicGrammarFile (String GrammarName, String FileName)
  • GrammarName The name of the grammar into which the new entries should be loaded.
  • Level (Optional) The verbosity level of the message. The valid range of values for Level is 1 to 5, with 1 indicating a brief message and 5 indicating a very verbose message. (To shorten the length of log files, Administrators can set an option to filter out messages with a verbosity greater than a particular level.)
  • PhoneNumber The phone number to be called.
  • See Also
  • %silence(String duration) Pauses voice/sound output for the number of milliseconds specified by duration.
  • %play(String filename) Plays the sound file specified by filename.
  • %record(String filename) Records the user's voice input into the specified file. (This file will be saved in Sphere format.)
  • %phone(String number) Translates the text specified by number into speech in the format of a telephone number. "5553322", for example, would be translated "five-five-five-three-three-two-two".
  • %date(Date output_date) Translates the Date variable specified by output_date into speech.
  • %idate(String datestring) Intelligently translates the text in datestring to date-formatted speech.
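The %phone directive's behavior is simple to model: each digit is spoken individually. The sketch below reproduces the documented "5553322" example; the hyphenated output format follows that example's transcription.

```python
# Sketch of the %phone directive: speak a telephone number digit by digit.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

def phone_to_speech(number):
    """Translate a number string into digit-by-digit speech text."""
    return "-".join(DIGIT_WORDS[d] for d in number if d.isdigit())
```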
  • Type Identifies the file's type. Currently, two types of sound files are supported: "Riff" (Microsoft Riff Format, .wav) and "Sphere" (Sphere format). If no type is specified, the method will attempt to detect the type automatically from the file.
  • RecordWaveform returns a long integer containing the length (in milliseconds) ofthe recorded clip. •Parameters
  • FileName The name of the file in which the recording should be saved.
  • Type Identifies the file's type. Currently, two types of sound files can be specified: "RIFF" (Microsoft's Riff/wav format) and "SPHERE"
  • DiscourseName The name of the discourse to be selected.
  • Timeout A long integer indicating the timeout value in milliseconds.
  • ParameterName The name of the parameter to be set.
  • SpeechRecProgID The ProgID of the Recognition engine that will be used by the VA Application.
  • TTSProgID The ProgID of the Text-to-Speech server that will be used by the VA Application.
  • InactivityTimeout The period in milliseconds that must pass without input from the VUI before an Inactivity event is fired.
  • ParameterName The name of the parameter to be set.
  • Value The value to which the parameter should be set.
  • UtteranceFileName string The base filename that will be used for saving temporary utterance files.
  • UtteranceDirectory Specifies the directory in which temporary utterance files will be written.
  • RecognitionTimeout
  • rec.numNBest (integer) The number of possible matching words that will be returned by the speech recognition engine. For most VA applications, this parameter should always be set to 1.
  • rec.GenPartialResults
  • rec.ConfidenceRejectionThreshold (integer, 0-100) The minimum confidence level that the speech recognition engine must reach before a phrase is considered to be properly recognized.
  • config.ServerHostname, config.ServerPort, config.RecClientHostname, config.RecClientPort, lm.Addresses
  • clientKillPlaybackOnBargeIn (TRUE/FALSE) Indicates whether the barge-in feature is active or inactive. Barge-in interrupts the current output stream when new user voice input is received, allowing a caller to interrupt the VA's output (e.g., clientKillPlaybackOnBargeIn "TRUE").
  • EventID The numeric ID corresponding to the event.
  • a set of events has been defined by the SDE, which uses them internally to log messages and alert administrators. These events are listed in the table below.
  • a developer can also create custom events using any Event ID he or she chooses, provided it is not already in use by the SDE.
  • Severity A long integer indicating the severity ofthe event (see table below).
  • Event Descriptor Event ID

Abstract

A virtual assistant engine (10) for running a virtual assistant application (28, 30, 32), comprised of an interpreter for parsing, storing in a computer memory, and executing source code for a virtual assistant application, a scripting object that provides methods and properties for creating a virtual assistant application and an abstraction layer for interfacing with a speech recognition server (24), telephony hardware and a text to speech server, wherein the scripting object provides the interface between the abstraction layer and the virtual assistant application.

Description

VIRTUAL ASSISTANT ENGINE
This application is related to application Serial No. , entitled Personal Virtual Assistant, Serial No. , entitled Virtual Assistant with Semantic Tagging, and Serial No. , entitled Virtual Assistant with Temporal Selectivity, which are filed simultaneously herewith, assigned to a common assignee, and are hereby incorporated by reference.
FIELD OF THE INVENTION
The present invention relates to a computer-based, virtual assistant engine.
BACKGROUND OF THE INVENTION
Mobile professionals, such as physicians, attorneys, sales representatives and other highly mobile professionals often find it difficult to communicate with clients, customers, colleagues and assistants. These mobile professionals travel frequently and are not accessible via a desk phone or traditional, wired computer network. They typically employ human assistants to relay important information, maintain their schedules and filter out all unnecessary interruptions. A virtual assistant is a computer application that allows the mobile professional to access personal, company, and public information, including contacts, schedules, and databases from any interactive device, such as a telephone. Previously, virtual assistant applications were hardcoded. In other words, a monolithic program written in the C++ programming language, for example, would implement all of the functions of and interfaces with the virtual assistant. An example of such a virtual assistant application is described in U.S. Patent No. 5,652,789 to Miner, et al., and assigned to Wildfire Communications, Inc. The problem with such prior art virtual assistant applications is that modifying or customizing the application is a difficult and time-consuming process. In order to make changes, a highly skilled computer programmer would be required to edit the source code, debug, recompile and link the edited source code. Then, the modified virtual assistant application had to be tested to ensure that it functioned as intended.
The present invention solves this problem by defining a virtual assistant application in terms of discourses, grammars, event handlers, and other components that instruct a virtual assistant engine as to how to execute the application. This advantageously permits integration of the virtual assistant with other commercially available applications, including messaging applications and database management applications. The VA Application can be easily modified to satisfy specific requirements of a user.
SUMMARY OF INVENTION
The present invention relates to a virtual assistant system with many discrete features, each of which comprises a separate but related invention. Thus, one aspect of the present invention is a virtual assistant engine for running a virtual assistant application, comprised of an interpreter for parsing, storing in a computer memory, and executing virtual assistant definition language source code for a virtual assistant application, a scripting object that provides methods and properties for creating a virtual assistant application and an abstraction layer for interfacing with a speech recognition server, telephony hardware and a text to speech server, wherein the scripting object provides the interface between the abstraction layer and the virtual assistant application.
The interpreter is comprised of a parser for parsing the virtual assistant definition language source code and storing the parsed virtual assistant definition language source code in the computer memory. The parser is constructed using the Purdue Compiler Construction Tool Set.
The interpreter is further comprised of a state machine for executing the stored virtual assistant definition language source code. The state machine determines the tasks to be performed by the virtual assistant application responsive to input from the user. The state machine also manages barge-in commands received from the user. In addition, the state machine manages external events responsive to output from the virtual assistant application, the output indicating that an external event has occurred, and is configured to cause the user to be notified of the occurrence of the external event. Examples of external events of which the virtual assistant user is notified are receipt of a telephone call, placing a telephone call, receipt of an electronic message, a meeting reminder, a task reminder, a change in a database and a change in monitored information. The interpreter is further comprised of: a scripting host object, and a scripting engine, whereby the scripting host object interfaces with the scripting engine. The scripting engine executes scripts written in a scripting language, such as VBScript, JavaScript, Perl, REXX and Python.
The interpreter is further comprised of a session object, which manages telephone calls to and from a virtual assistant application user. The session object is comprised of a call state manager for tracking the status, for example connected, on hold and in conference, of a telephone call to or from the virtual assistant application user. The session object is further comprised of a call object, for managing calls to the virtual assistant user from the virtual assistant and to the virtual assistant from the virtual assistant user. The session object also is configured to generate and store in the computer memory a log of information about a virtual assistant application user session. The information log includes information about call statistics (for example, call duration, DNIS number, ANI number), call counters and call transcription (commands issued by the user and responses from the virtual assistant).
The interpreter is further comprised of a discourse manager, which activates the appropriate discourse responsive to input from the virtual assistant application user. The discourse manager also activates the appropriate grammar responsive to the active discourse. The scripting object is configured to provide output to the user asynchronously. Such output is comprised of rendering text into speech or playing recorded prompts. The scripting object is further comprised of a management interface, which is configured to generate and store in the computer memory a log of information about virtual assistant application errors. The management interface also is configured to enable the management and configuration of a virtual assistant system by a system administrator.
The scripting object is further comprised of an interface for managing dynamic grammars. The management of dynamic grammars is comprised of creating a user specific grammar when a virtual assistant application user session begins, storing the user specific grammar in the computer memory for use by a user during the user session, and deleting the user specific grammar from the computer memory when the user session ends. The user specific grammar is generated from a user-specified database.
The scripting object is further comprised of a state machine interface for controlling a state machine for managing external events, and a call management interface for controlling a session object, which manages telephone calls to and from a virtual assistant application user.
The abstraction layer is comprised of a speech recognition module, a telephony module and a text to speech module. The speech recognition module is comprised of an interface between the scripting object and the speech recognition server. The telephony module is comprised of an interface between the scripting object and the telephony hardware, such as, an adapter for allowing electronic communication between the virtual assistant application and the virtual assistant user and a conference call adapter. The text to speech module is comprised of an interface between the scripting object and the text to speech server.
Other features and advantages will become apparent based on the following detailed description of the preferred embodiments and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an overview of the virtual assistant (VA) of the present invention;
FIG. 2 is a diagram of the VA Server;
FIG. 3 is a diagram of the VA Studio;
FIG. 4 is a diagram of the VA Engine conceptual model;
FIG. 5 is a diagram of the VA Manager conceptual model;
FIG. 6 is a screen shot of the Microsoft Management Console for managing the VA Server Manager;
FIG. 7 is a screen shot of a web page that uses Active Server Pages to manage the VA Server Manager;
FIG. 8 is a diagram of the component relationships of a VA Server Set;
FIG. 9 is a diagram of a relatively small VA system;
FIG. 10 is a diagram of a large VA system;
FIG. 11 is a diagram of a very large VA system;
FIG. 12 is a diagram of a VA discourse;
FIG. 13 is a diagram of the VA Discourse/Grammar Model;
FIG. 14 is a screen shot of the Main Application Window;
FIG. 15 is a screen shot of the Topic View Window;
FIG. 15 is a screen shot of the Task View Window;
FIG. 16 is a screen shot of the Prompt Properties Dialogue Box;
FIG. 17 is a screen shot of an Expanded Tree View of the Menu_Error Prompt Group; and
FIG. 18 is a screen shot of the Prompt Group Properties Dialogue Box.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
The subheadings used herein are meant only to aid the reader and are not meant to be limiting or controlling upon the invention. Generally, the contents of each subheading are readily utilized in the other subheadings.
Overview
Mobile professionals, such as physicians, attorneys, sales representatives and other highly mobile professionals often find it difficult to communicate with clients, customers, colleagues and assistants. These mobile professionals travel frequently and are not accessible via a desk phone or traditional, wired computer network. They typically employ assistants to relay important information, maintain their schedules and filter out all unnecessary interruptions. The virtual assistant of the present invention allows the mobile professional to access personal, company, and public information, including contacts, schedules, and databases from any interactive device, such as a telephone.
The virtual assistant ("VA") system of the present invention is comprised of two main components: (1) the VA Server, which is built on a Windows NT telephony server platform, and (2) the VA Studio, which allows skilled information technology professionals to develop VA applications that interface with electronic messaging systems, such as Microsoft Exchange and Lotus Notes. The VA Server is a component of the Service Deployment Environment ("SDE"), which is discussed in more detail below. The VA Studio is a component of the Service Creation Environment ("SCE"), which is also discussed in more detail below.
As shown in Figure 1, the VA Server 10 is comprised of a human interface 12 and a network interface 14 for handling calls and providing automated access to information in corporate 28, private 30 and public 32 information repositories and sources. The human interface 12 is comprised of a graphical user interface 22, which may be a web browser, a subscriber (or user) voice user interface 24, generally accessed by a telephone, and a public voice user interface 26. The virtual assistant allows a user to use a voice interactive device, such as a telephone, either wired or wireless, to access and update such information. The VA Server also manages all incoming communications by sorting, prioritizing, and filtering such communications, while providing notice to the user of important messages and events.
VA Server
As seen in Figure 2, a core component of the VA Server 40 is the voice-enabled Virtual Machine 42, which is also referred to as the VA Engine. The VA Engine receives spoken commands and interprets and executes them. The VA Engine supports a COM interface 44, which in turn enables VA applications to provide voice access to network applications.
The VA Engine also supports a telephony interface 46 to voice messaging 52 and private branch exchange systems 54, enabling third-party systems to be integrated with the VA Server. The VA Server conforms to Windows NT telephony and speech interface specifications. The voice-messaging interface 56 supports the VPIM (Voice Profile for Internet Mail) standard and provides a gateway between proprietary voice messaging systems and the VA Server.
The VA system management services provide operations, administration and maintenance capability (OA&M) 60. The OA&M applications also provide a Simple Network Management Protocol ("SNMP") interface to third-party management applications, for example, HP OpenView and CA Unicenter.
In the preferred embodiment, the VA Server is operable on Windows NT Server, release 4.0 or higher, in both single and multiprocessor configurations. Those skilled in the art, however, recognize that the VA Server can be ported to other computing platforms. Multiple systems may be clustered together to support higher system workloads and fail-safe operation.
VA Application Suite
The VA Application Suite, in the preferred embodiment, is compatible with a messaging server 62, such as Microsoft Exchange/Outlook. The VA's architecture, however, advantageously permits integration with other commercially available and customized messaging applications. The VA Application can be easily modified to satisfy specific requirements of a user. The basic functions of the VA Application include:
• Messaging - voice-mail, e-mail, and faxes
• Contact Management - scheduling, planning, group calendar, contact and referral organization
• Call Control - allows remote users to perform conference calling and call management; notification and forwarding features allow remote users to be contacted immediately by phone/pager when they receive specific voice-mails, e-mails, faxes, or pages
• Internet Applications - users can access the internet via an internet server 64 and obtain public information such as weather, travel, financial, competitive data and news
• Intranet Applications - users can remotely access information contained on a corporate network (inside the company firewall) using the VA, for example, customer data, shipping and inventory information, sales reports, and financial data, or any information on a database server 66, including SQL databases such as Oracle or Informix
• Customer Relationship Management applications - the VA Server integrates with commercially available customer relationship management (CRM) software applications 70, such as Siebel, Pivotal, SalesLogix and Onyx
VA Studio
As seen in Figure 3, the VA Studio 80 is comprised of a grammar generator 82 and a publishing toolkit 84. The VA Studio allows a user to create, modify and debug applications that run on the VA Server 40 without requiring the user to be skilled in the complexities of the underlying components of the VA Server, such as the speech recognition engine, text-to-speech engine, switch control and unified messaging.
VA Studio employs a graphical user interface (GUI) application that runs on a Windows NT workstation. It allows developers to create projects, each of which defines a VA application. VA Studio is a multiple document interface (MDI) application that follows the workspace-based model.
The VA Studio follows the Microsoft Component Object Model (COM). VA applications are developed using Active Scripting languages such as VBScript and JavaScript, thus enabling integration with a variety of third-party components. The VA applications created with the VA Studio will include voice query to SQL databases, message stores, business logic and mainframe applications.
VA applications are composed of discourses and resources. Discourses are the contexts of conversations between a user and the VA. Resources are items like voice prompts and dictionaries. A developer can utilize the VA Studio Wizard to generate a "skeleton" VA application template. Application templates consist of packages of predefined discourses and resources. Once a VA application template is generated, the application is further customized using any supported Active Scripting language.
After the VA application is written, it is submitted to the build process. During the build process, VA Studio checks for dialog errors, builds a master intermediate grammar and builds a master lexicon. Once compiled and error-free, the application is ready to be published.
When an application is published, it is transported from the VA Studio to the VA Server. The VA Server allows a scripted application to access services such as voice mail, databases, and telephony equipment.
A VA application is created, modified, debugged and tested using the VA Studio. The completed application is then automatically installed and configured to run on the VA Server, which enables the VA application to take incoming calls and provide access to both public and private information.
Platform Overview
An Introduction to Virtual Assistant Applications
A VA application allows a user to manage electronic communications and access his or her business's computer resources through a telephone. Using speech recognition and text-to-speech technology, the VA communicates with callers in spoken English. By calling into the VA on a standard telephone, a user can perform functions such as the following:
• Sending and receiving voice mail messages
• Checking, replying to, and forwarding email messages
• Looking up phone numbers and addresses in an electronic address book
• Accessing information in a company database
• Accessing information on the World Wide Web
In addition, the VA can perform many of the functions of a personal secretary, such as the following:
• Informing the user via pager when new voice and email messages arrive
• Filtering incoming voice mail, email, and pages as instructed by the user
• Automatically dialing phone numbers
In the preferred embodiment, the VA performs the above functions by interfacing with a company's Microsoft Exchange server. This application, in effect, allows users to use their desktop Outlook software over the telephone. The VA software includes a development platform (the SCE) and a run-time platform (the SDE), which can host a variety of different VA's. The SDE provides the core components necessary for the functionality of a VA: a telephony interface, speech recognition facilities, a text-to-speech engine, interfaces with databases and mail servers, and an administrative framework in which the assistant applications will run. The SCE also includes development tools that programmers can use to create custom VA applications.
VA Platform Components
As discussed above, the VA Platform consists of three main components:
• The Service Deployment Environment (SDE)
• Virtual Assistant Applications
• The Service Creation Environment (SCE)
The function of each of these components can be understood using a World Wide Web analogy. The SDE functions like a web server, providing connections with the network and telephone system, controlling the execution of VA applications, and providing resources such as text-to-speech and voice recognition engines that will be accessed by the applications that run on it.
The VA applications are analogous to web pages, determining the content that will be presented and controlling the interactions with the user. A VA application uses scripting languages such as VBScript, JavaScript, and Perl, so that developers can add significant functionality to a VA, such as performing mathematical calculations, processing text, and calling ActiveX and COM objects.
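The scripted logic described above can be illustrated with a short, purely hypothetical JavaScript sketch. None of these names come from the VA platform itself; it simply shows the kind of text-processing helper a VA script might contain, here expanding a phone number into digit words so a text-to-speech engine would read it digit by digit:

```javascript
// Illustrative text-processing helper of the kind a VA script might contain.
// This is a sketch only; it assumes no actual VA platform API.
const DIGIT_WORDS = ["zero", "one", "two", "three", "four",
                     "five", "six", "seven", "eight", "nine"];

function speakablePhoneNumber(raw) {
  // Keep only the digits, then map each one to its spoken word.
  return raw.replace(/\D/g, "")
            .split("")
            .map(function (d) { return DIGIT_WORDS[Number(d)]; })
            .join(" ");
}
```

A discourse script could pass the result of such a helper to the text-to-speech engine rather than the raw string, so that "555" is read "five five five" instead of "five hundred fifty-five".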
Just as Microsoft FrontPage and Netscape Composer are used to create web pages, the SCE is the development environment used to create the VA applications. The main component of the SCE is the VA Studio application, which is based on the Microsoft Visual Studio paradigm and provides a graphical environment with a variety of tools that can be used to create, debug, and publish applications that are run on the SDE. The SCE also includes a set of COM objects that can be used in applications to perform functions such as checking email, reading from a database, and manipulating sound files.
The SDE Service Processes
The Service Deployment Environment consists of eight processes that run simultaneously and perform the functions necessary to support a VA application. In the preferred embodiment, each of these SDE components runs as a Windows NT Service or background process.
Although they may all run on the same hardware platform, for large VA implementations the components can be distributed across several servers and communicate over the network. Such distribution can allow, for example, one server to be dedicated to performing voice recognition functions while another supports the VA Engine that actually runs the applications. When multiple VA components are distributed across multiple machines, these machines are collectively termed a VA server set.
The VA Engine
As illustrated in Figure 4, the VA Engine 100 is the virtual machine on which a VA application 102 runs. Based on the application's instructions, the VA Engine uses its telephony interface 104 to communicate with the user 106 and its speech interface 110 to convert speech into text and translate text into speech. The VA Engine connects to an Active Scripting Engine 112 to execute the scripts contained in the VA application, and it also communicates with administrative processes such as the VA Server 114 and VA Manager 116.
A VA Engine process can support user interaction over only one telephone line, but multiple VA Engines can be run simultaneously on a single platform. If the VA platform is connected to more than one telephone line, then a separate VA Engine will be running for each incoming line.
The Text-to-Speech (TTS) Server
The Text-to-Speech Server receives text from other components, translates it into speech (that is, into a sound file), and returns it to the requesting component. This speech translation service is isolated in a separate component to improve performance and to allow for TTS vendor-independence. The preferred embodiment uses the AcuVoice TTS system, but the platform can be easily modified to support a TTS engine from a different vendor. Only the TTS Server component would have to be modified for such a customization, not the entire platform.
Multiple VA Engines can use the same TTS Server process, and more than one TTS Server can be running at the same site, allowing translation services to be distributed across multiple machines for load-balancing.
The Recognition Server
The Recognition Server 122 receives sound files from other components, attempts to recognize them as speech, and returns the recognized text. Like the TTS Server, the Recognition Server is a component that isolates speech-recognition functions from the rest of the VA platform. The server provides an interface to a third-party voice recognition engine (in the preferred embodiment, Nuance) that can be changed to a different vendor's engine without requiring the entire VA platform to be modified. Multiple VA Engines can use the same Recognition Server process, and more than one Recognition Server can be running simultaneously.
Recognition Server Sub-Processes
The Recognition Server process requires three additional processes to be running:
The Resource Manager: The Resource Manager is a management process that automatically load-balances requests when more than one instance of the Recognition Server is running. Rather than making recognition requests to a particular Recognition Server, the VA Engine makes the request to the Resource Manager, which forwards it to the first available Recognition Server.
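The first-available forwarding strategy described for the Resource Manager can be sketched in a few lines of JavaScript. The server objects and their `busy` flag are assumptions for illustration; the real component's interfaces are not disclosed here:

```javascript
// Hedged sketch of first-available dispatch, as described for the
// Resource Manager. Server objects and the "busy" flag are hypothetical.
function makeResourceManager(servers) {
  return {
    // Forward a recognition request to the first server that is not busy.
    dispatch: function (soundFile) {
      for (let i = 0; i < servers.length; i++) {
        if (!servers[i].busy) {
          return servers[i].recognize(soundFile);
        }
      }
      return null; // no Recognition Server currently available
    }
  };
}

// Example: the first server is busy, so requests go to the second.
const servers = [
  { busy: true,  recognize: function (f) { return "A:" + f; } },
  { busy: false, recognize: function (f) { return "B:" + f; } }
];
const rm = makeResourceManager(servers);
```

A production dispatcher would also handle queuing and time-outs when every server is busy, but the forwarding decision itself is this simple.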
The Compilation Server: The Compilation Server compiles dynamic grammars.
The License Manager: The License Manager server runs continually in the background and dispenses licenses to all requesting components. Only one License Manager need run in a single server set, but no Recognition Server components can launch unless the License Manager is already running.
In the preferred embodiment, all of the sub-processes of the Recognition Server are recommended only for Nuance brand Recognition Servers. If a user uses different speech recognition software, a different set of processes may be needed.
The VA Server
The VA Server 114 performs persistent VA functions that occur even when no user is connected to a VA application. These functions include the following:
• Monitoring external sources such as email boxes, databases, and web sites for events (e.g. a new mail message arrives or a database field is updated)
• Applying rules and filters to external source events to determine whether the VA system should take any actions
• Paging users when specified events occur
Only one VA Server can run on a system, but a single VA Server can provide persistent services to multiple VA Engines running both locally and on remote systems.
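The rule-and-filter evaluation described for the VA Server can be sketched as a simple match over incoming events. The event shape and rule fields below are assumptions for illustration only, not the platform's actual data model:

```javascript
// Hedged sketch of rule/filter evaluation over external source events.
// Event properties and rule structure are hypothetical.
function evaluateRules(event, rules) {
  // Collect the action of every rule whose conditions all match the event.
  const actions = [];
  for (const rule of rules) {
    const matches = Object.keys(rule.when).every(function (key) {
      return event[key] === rule.when[key];
    });
    if (matches) actions.push(rule.action);
  }
  return actions;
}

// Example rules: page the user for email from one sender; log voice mail.
const rules = [
  { when: { type: "email", from: "boss@example.com" }, action: "page-user" },
  { when: { type: "voicemail" },                       action: "log-only" }
];
```

A persistent server process would run such an evaluation each time a monitored source (a mailbox, a database field, a web site) reports a change, then carry out the matched actions such as paging the user.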
The VA Manager
As illustrated in Figure 5, each system that is running one or more VA components should also be running the VA Manager application 116. This application creates and monitors all VA components that are active on the system, and it provides management interfaces that are used for the following purposes:
• Configuration (both at start-up and during runtime)
• Signaling of events such as errors and informational messages
• Logging of events
• Logging of each call received by the VA applications running on the system
• Performance monitoring (through an interface to Windows NT's Perfmon utility)
The VA Manager provides the interface through which the VA Server Manager 130 communicates with all systems in use at the site.
The VA Server Manager
The VA Server Manager 130 provides a single point of control for all of the processes and servers being used in a VA server set. It communicates with the VA Manager 116 running on each VA server in the set and, through this interface, allows an administrator to use a single system to manage the entire site. There are two ways an administrator can connect with the VA Server Manager application:
• Using the Microsoft Management Console (MMC): As illustrated in Figure 6, the VA software includes an MMC snap-in component 140 that allows the VA Server Manager services (and, thereby, the entire VA site) to be managed from the Microsoft Management Console application.
• Using an Administrative Web Page: The VA software also includes an administrative web page 142 that uses Active Server Pages to interface with the VA Server Manager service, allowing an administrator to manage the site through a standard web browser.
Returning to Figure 5, the VA Server Manager 130 monitors all of the VA components (such as Recognition Servers 132, TTS Servers 134, and VA Engines 136) running on all the systems within the server set, and it can be configured to page the system administrator with an alert if components fail or other system-critical events occur.
Additional VA Platform Components
In addition to the service processes, the following components are used on the VA platform.
The VA Database
The VA Server Manager process uses a Microsoft MSDE database to store configuration parameters and platform logs. The MSDE database engine is installed automatically as part of the VA platform installation, and the required tables and initial data are created during the installation routine.
The VA Server Manager uses a COM object called DBManager to communicate with the database. This object is created automatically at start-up by the VA Server Manager and provides a set of application programming interfaces (APIs) that other VA components can use to retrieve configuration information and log data. In addition, the DBManager object automatically handles version checking, database restoration, and other database management functions.
The VA Web Server
In the preferred embodiment, as illustrated in Figure 7, the VA platform uses a
Microsoft IIS Web Server to support browser-based administrative utilities. For example, the VA Logging Tool is used by the administrator to view and manage system logs.
VA Shared Directories
The VA software uses a set of shared directories for storing files necessary for platform operations. In a multi-server implementation, these shares are stored on a central server (the same server that hosts the VA Server Manager process) and can be accessed by all the systems in the server set. The shared directories used by the VA platform are described in the table below.

Table 1-1: VA Platform Shared Directories

Directory                      Description
%conitava%\VAApplications*     Used to store the source files for the applications that will run on the platform
%conitava%\VALogs              Used to store application logs
%conitava%\VAUsers             Used to store information about VA users
%conitava%\VAUtterances        Used to store temporary sound files containing the commands spoken by VA users

* %conitava% represents the base path under which the VA platform software was installed. By default, this path is c:\Program Files\Conita Virtual Assistant.

VA Platform Configurations
The service processes that make up the VA platform either can be run on a single server (a VA platform server) or can be distributed across multiple servers (a VA platform server set). A single-server implementation is adequate for small companies that need to support only a few incoming VA calls at a time. For larger companies, however, a server set implementation will be necessary for load balancing. As illustrated in Figure 8, when the VA platform is distributed across multiple servers, one node in the server set is designated the Server Set Controller Node 150. As the platform's primary server, the Server Set Controller Node will host the following components:
• The VA Server Manager service 152
• The VA DBManager service 154 and the VA database 156
• The IIS web server
• Shared directories that will be used by all the servers in the server set to store logs, utterance files, application files, and user information
Each secondary node 160 in the set will host one or more instances of VA Engines 162, TTS Servers 164, and/or Recognition Servers 166. These processes will be monitored by a VA Manager process 170 on each server, which will in turn communicate with the VA Server Manager 172 on the Server Set Controller Node 150. In single-server implementations, the lone server is configured as the controller node, hosting the database, web server, and VA Server Manager process along with all other VA services.
Scaling a VA Implementation
The way a business configures its VA platform will depend on the number of users who will be interacting with the VA application. As illustrated in Figure 9, for smaller sites, all the VA components can be run on a single server 180. Such a site could support several incoming telephone lines 182, allowing multiple instances of the VA application to run simultaneously.
For larger sites that need to support many simultaneous VA application sessions, the VA components can be distributed across multiple systems. As illustrated in Figure 10, a medium-sized company may, for instance, use a six-server rack 184, with two ofthe servers running VA Engines 186a, 186b, two servers running Recognition Servers 190a, 190b, one running VA Servers 192, and one running TTS Servers 194.
A large organization may require even more scalability. As illustrated in Figure 11, to support a public switch 196 with 32 incoming T1 lines 200, the site may use upwards of eight systems for VA Engines 202, sixteen for Recognition Servers 204a, 204b, four for VA Servers 206, and four for TTS Servers 206.
Duties of the VA Administrator
The duties of the VA platform administrator include the following tasks:
• Preparing the server(s) for installation of the VA platform software
• Installing the VA software on the systems
• Ensuring that the application software can communicate with the telephone system and other hosts such as a Microsoft Exchange server
• Configuring Microsoft Exchange to support VA users
• Using the VA management interfaces to manage the systems in the server set, start and stop the VA services, and run VA applications
• Monitoring the platform interfaces and error logs
• Maintaining the VA database
• Managing VA user accounts
In order to perform the above duties, a VA administrator needs to have experience with the following software packages:
• Windows NT
• Microsoft Exchange Server
• Microsoft Internet Information Server (IIS)
• Microsoft MSDE or SQL Server databases
Fundamentals of a VA Application
A Virtual Assistant ("VA") application is a set of scripts and resources that leads callers through an interaction with a virtual assistant. The application provides the functions available to the user from the beginning of a call until its termination. A single VA application may, for example, allow the caller to check email messages, check voice mail messages, and look up a phone number in a computerized address book.
Components of a VA Application
•Discourses
Each VA application is made up of smaller units called Discourses, as illustrated in Figure 12. Each of these units can be thought of as a single interaction or conversation between the user and the Virtual Assistant. This interaction may be as simple as the assistant's asking a question ("Are you sure you want to quit?") and the user's giving a response ("yes" or "no"). Or, the interaction may be more complicated, such as an email-checking discourse in which the assistant responds to several different types of user commands (e.g. "read next message" and "delete last message").
Each discourse can be linked to other discourses, which will become activated when a user gives a particular command or a system event occurs. As illustrated in Figure 12, a "Main Menu" discourse 210, for example, may provide the user with a list of available options. When the user chooses the "Check email" option, the application will move from the "Main Menu" discourse to the "Check Email" discourse 212.
•Grammars
Each discourse is associated with a Grammar component, which is a list of the words and phrases that need to be recognized by that discourse. Depending on the functions performed by the discourse, its grammar may be simple or complex.
Returning to Figure 12, for the "Confirm Exit" discourse 214, the grammar needs to contain only two words: "Yes" and "No." The "Check Email" discourse 212, on the other hand, needs more words and phrases in its grammar to recognize all the verbal commands that would be used to read, reply to, and delete email.
The phrases included in a grammar are linked to actions that will be taken by the discourse when the user speaks a particular phrase. If the virtual assistant recognizes the user's saying, "Read next message," in the "Check Email" discourse, it would read the next email message to the user by translating the email text to speech.
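The phrase-to-action linkage described above can be sketched as a lookup table mapping recognized phrases to handler code. The data layout is illustrative only; the platform's actual grammar format is described later in the specification:

```javascript
// Minimal sketch of a grammar linking recognized phrases to discourse
// actions. The object layout here is an assumption for illustration.
const checkEmailGrammar = {
  "read next message":   function () { return "reading message"; },
  "delete last message": function () { return "deleting message"; }
};

function handleUtterance(grammar, phrase) {
  const action = grammar[phrase.toLowerCase()];
  // An unrecognized phrase would typically raise a RecognitionError event.
  return action ? action() : "recognition-error";
}
```

In the real platform the handler for "read next message" would fetch the email text and pass it to the text-to-speech engine rather than returning a string, but the lookup-and-dispatch structure is the same.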
The syntactical requirements for a grammar will be explained in more detail below. Figure 13, however, shows the basic relationship between discourses 220 and grammars 222 within a VA application. The arrow 224 points from the grammar to the discourse (instead of the other way around) because the grammar is the driving force behind the virtual assistant's interaction with the user. When the user speaks into the phone, the virtual assistant engine attempts to recognize the phrase and looks it up in the grammar for the current discourse. If the phrase is found in the grammar, the engine calls the appropriate piece of code within the discourse to handle the required action. When a user says "read next message," for example, the virtual assistant engine recognizes the phrase, looks it up in the grammar, and finds the trigger <command= "Read next">. The engine then calls the code matching the trigger in the discourse and executes it. In this case, the code for the trigger <command= "Read next"> would convert the text of the email message to a voice string and play it to the user over the phone or other device the user is using to interact with the virtual assistant, such as a personal digital assistant or any other device into which a user can input, and from which a user can receive, information.
•Events
In addition to responding to the user's voice commands, a discourse can also perform predetermined actions when particular events occur. These events may be related to the interaction with the user, such as a speech-recognition error (RecognitionError), or they may be system events unrelated to the caller's activity, such as the launching of the virtual assistant application on the server (ApplicationEntry).
A virtual assistant application developer can use these events in a number of different ways. Each time a speech-recognition error event occurs, for example, a piece of code in the discourse can be triggered that increments a running count of the number of errors that have occurred in a particular session or over a predetermined amount of time. If this count exceeds a preset limit (suggesting, for example, that the phone connection is poor), the virtual assistant can terminate the call. Similarly, the developer can write a piece of code that is triggered by the ApplicationEntry event and loads whatever initialization data is necessary to prepare the virtual assistant application for incoming calls.
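The error-counting pattern just described can be sketched as a small per-session counter. The function and return values are illustrative names, not the platform's actual event API:

```javascript
// Sketch of the RecognitionError counting pattern described above:
// increment a per-session counter on each error and terminate the call
// once a preset limit is exceeded. All names are hypothetical.
function makeErrorTracker(limit) {
  let count = 0;
  return {
    onRecognitionError: function () {
      count++;
      // Too many errors suggests a poor connection; hang up the call.
      return count > limit ? "terminate-call" : "reprompt";
    }
  };
}
```

An event handler bound to RecognitionError would call `onRecognitionError` each time the recognizer fails, re-prompting the caller until the limit is crossed.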
•Resources
A resource allows a discourse or event handler to reference components contained externally. These components include Phrase Lists and Script Modules, which a discourse uses to import pre-defined grammar entries and scripts, and Prompt Groups, which a discourse uses to incorporate voice output defined externally. All resources are defined globally, so once a resource is defined it can be used from within any discourse in the application. With resources, a developer can both increase the speed of application development (by reusing grammar and code components) and improve the sophistication of the interface, such as a voice interface, by using recorded and dynamic prompts.
When creating an application, for example, a developer might write a piece of code that sends an email message through the user's SMTP (simple mail transport protocol) account. Rather than rewriting this code in each discourse that needs to send an email message, the developer can place the code in a script module resource and reference that resource in each discourse that needs it. The script module resource can also be saved and used later in other VA applications.
Prompt group resources, similarly, allow a developer to define a particular piece of output and use it repeatedly throughout the application. A developer can, for example, record a live person speaking the phrase "I did not understand you" and play the sound clip whenever an unrecognized command is spoken by a caller.
Prompt Groups can also be used to create dynamic output that selects from and combines various voice segments. When informing a caller that a person is not available, a dynamic prompt can change the output depending on the person's gender (e.g. "He is not available" or "She is not available"). To make output less repetitive, a developer can record several different versions of the same message and let the system choose among them. In this way, instead of always saying "I did not understand you" when a user speaks an unrecognized command, the application can say "I did not understand you", "I do not know that command", or "That is not one of the commands I recognize."
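Both kinds of dynamic prompt behavior described above can be sketched in a few lines. The function names and the injected selection index are assumptions for illustration; the platform's Prompt Group structure itself is defined elsewhere in the specification:

```javascript
// Sketch of dynamic prompt selection: pick among recorded variants so the
// assistant is less repetitive, and vary one segment by gender.
// All names here are hypothetical.
function choosePrompt(variants, pick) {
  // "pick" is injected so selection can be random in production
  // but deterministic when testing.
  return variants[pick % variants.length];
}

function notAvailablePrompt(gender) {
  // Combine a gender-dependent segment with a fixed segment.
  return (gender === "female" ? "She" : "He") + " is not available";
}
```

In practice, each variant would name a recorded sound clip rather than a text string, and `pick` would come from a random number generator.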
Virtual Assistant Definition Files
The discourses, grammars, event handlers, and other components that constitute a virtual assistant application are saved in a Virtual Assistant Definition (.vad) file, which is published to the VASiteManager and instructs the platform how to execute the application. To use the previously mentioned web analogy, the .vad file is to the virtual assistant platform as an HTML or Active Server Page (.asp) file is to a web server.
The .vad file for a complex VA application may be hundreds of pages in length. A file for a very simple "hello world" application would be only a few lines of code. A developer could, in theory, write a virtual assistant application by creating a .vad file in a text editor and copying it to the virtual assistant server platform. Alternatively, the Service Creation Environment (SCE) simplifies the application-writing process. The SCE's VAStudio application provides a GUI (similar to Microsoft's Visual Studio) in which a developer visually lays out the various application components. Using VAStudio tools, the developer can then compile the application, build a .vad file, and publish it to the VA platform.
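The actual .vad schema is not disclosed in this excerpt, but as a purely hypothetical sketch, a few-lines "hello world" definition of the kind mentioned above might group a discourse, its grammar, and a triggered task, reusing the `<command= "...">` trigger notation shown earlier. Every element name below is an assumption for illustration:

```
<!-- Purely hypothetical sketch of a minimal "hello world" definition;
     the real .vad schema is not disclosed in this excerpt. -->
<application name="HelloWorld">
  <discourse name="Main">
    <grammar>
      <command= "hello">
    </grammar>
    <task trigger="hello">
      Say "Hello, world"
    </task>
  </discourse>
</application>
```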
Creating a Basic VA Application
The general steps used for creating a basic VA application are as follows:
1. Add the first discourse to the application
2. Add handlers for discourse events such as Discourse Entry and Recognition Error
3. Add a new topic to the discourse
4. Create a grammar for the discourse
5. Create tasks that are executed when particular phrases are spoken
6. Compile and build the application
7. Publish the application to the VA platform and run it
Each of these steps will be discussed below in sequence, illustrating them with a sample VirtualOperator application. The functionality of the VirtualOperator is limited: it merely answers the phone and transfers callers to the department they request. It is, however, a complete application that can be compiled, published, and executed. More complex programming techniques will be discussed later.
Creating the Application
To start a new application, open the File menu and select New. The New Project dialogue will open. The fields in this dialogue are completed as follows:
• Name: The name of the application to be created. The example application will be called VirtualOperator.
• Directory: The complete path of the directory in which the project files are to be stored. The entire path can be entered in the text field, or the Browse button can be clicked and the path selected from a standard Windows file dialogue.
After the two fields are completed, the new application project will be created, and, as shown in Figure 14, the main application window 230 will be displayed in the workspace.
Creating Discourses
The first step when writing a VA application is to create the first discourse.
This discourse will automatically execute when a new call is received by the VA.
•Adding a New Discourse
To add a new discourse to an application, use the following procedure:
1. Double click on the application name in the left-hand tree view to show the main screen.
2. Right click in the Discourses pane 232. A drop down menu will open.
3. From the drop down menu, select Insert New Discourse. A dialogue will open requesting the discourse's name.
4. Enter the name for the new discourse in the dialogue and click OK. The new discourse will be added to the list in the Discourses pane.
The name of a new discourse should begin with a capital letter and cannot contain white space. For the VirtualOperator application, a start-up discourse that will guide the caller through the main menu options is needed. To create it, the following steps are performed:
1. Right click in the Discourses pane. (This pane should be empty.) A drop down menu will open.
2. From drop down menu select Insert New Discourse. A dialogue will open requesting the discourse's name.
3. Enter "MainMenu" for the name ofthe new discourse and click OK. A discourse named "Main Menu" will be added to the list in the
Discourses pane,
4. When first created, the "Main Menu" discourse will not contain any functionality. The next step is to edit the discourse and add the logic needed to greet the caller. ^Editing a Discourse
•Opening the Discourse
To edit a discourse, a developer can perform any of the following actions:
• Return to the main workspace window and double click the name of the desired discourse in the Discourse pane.
• Double click on the discourse's name in the application tree.
• Right-click on the discourse's name in the Discourse pane of the main window and select Open Discourse from the drop-down menu.
When performing any of these three actions, the Discourse View window will appear in the main workspace. The Discourse View window consists of two panes: Topics 234 and Event Handlers 236. Listed in these panes will be the topics and event handlers that have already been added to the discourse being edited. Both of these panes will be empty for a new discourse.
To edit the newly created Main Menu discourse, double click on "Virtual Operator" in the application tree to open the main window in the workspace. Then, double click on "Main Menu" in the Discourse pane. The main window in the workspace will be replaced with the Discourse View window. The discourse is now opened for editing. Since the "Main Menu" discourse was just created, both its Topics and Event Handlers panes are empty. At this point, either a new topic or a new event handler can be added to the discourse.
To play a message that greets the user when the Main Menu discourse is executed, add a DiscourseEntry event handler that plays the greeting.
•Adding a New Event Handler
The general process for adding a new event handler to a discourse is as follows:
1. From the Discourse View window, right click inside the Event Handler pane 236. A drop down menu will open.
2. Select Insert New Event Handler from the drop down menu. A dialogue box will appear containing a list of available event handlers for the discourse.
3. In the dialogue box, select the type of event handler to be inserted. The new event handler will be added to the list in the Event Handler pane.
To add a DiscourseEntry event handler to the Virtual Operator's Main Menu discourse, the following steps are performed:
1. From the Discourse View window, right click inside the Event Handler pane. A drop down menu will open.
2. Select Insert New Event Handler from the drop down menu. A dialogue box will open containing a list of available event handlers for the discourse.
3. In the dialogue box, select DiscourseEntry. "DiscourseEntry" will be added to the list in the Event Handler pane.
At this point, a DiscourseEntry event handler has been created, but it does not do anything. The next step is to edit the event handler so that it plays a greeting message when the DiscourseEntry event occurs.
•Editing an Event Handler
To edit an event handler, double click the event handler's name in the Event Handler pane 236 of the Discourse View. The Event Handler View window will be loaded into the workspace.
To edit the DiscourseEntry event handler for the "Main Menu" discourse, double click on "DiscourseEntry" in the Event Handler pane. The Event Handler View window will open, but it will be empty because we have not yet added any code. For the initial version of Virtual Operator, the greeting can be a simple one. In the empty Event Handler View window, type the following code:
vavm.TTSString "Welcome to XYZ Incorporated. Please say the name of the department you wish to contact. To speak to an operator, say operator."
This code uses the vavm object's TTSString API to read a text string over the phone to the caller. When a new call is received by the Virtual Operator application, the Main Menu discourse will execute. This fires the DiscourseEntry event for Main
Menu, which will cause the system to say to the caller, "Welcome to XYZ
Incorporated."
•Adding a New Topic
At this point, the VA application will greet the user when a call is received, but it has no mechanism for responding to the user's spoken command. This type of functionality is performed within a discourse's topics. The next step in creating a
Virtual Operator application is to add a topic that listens for main menu commands.
To add a new topic to a discourse, perform the following steps:
1. From the Discourse View window, right click inside the Topic pane.
A drop down menu will open.
2. Select Insert New Topic from the drop down menu. A dialogue box will appear, prompting for the name of the new topic.
3. In the dialogue box, enter the name for the topic and click OK. The new topic will be added to the list in the Topic pane.
Note: Topic names should begin with a capital letter and cannot contain white space.
Developers have flexibility in choosing a topic's name, but as a general rule the name should be descriptive enough to indicate the topic's function. For example, if the topic being created will listen for user responses to the main menu, it could be named "MenuResponses".
To create the MenuResponses topic, the following steps are performed: 1. If the main window is not in view, double click on the Virtual Operator Node in the left-hand tree view.
2. Select the Main Menu discourse by double clicking on its name in the Discourse pane ofthe main window. This will open the Discourse View window for Main Menu.
3. From the Discourse View window, right click inside the Topics pane. A drop down menu will open.
4. Select Insert New Topic from the drop down menu. A dialogue box will appear, prompting for the name of the new topic.
5. In the dialogue box, enter "MenuResponses". The new topic will be added to the list in the Topic pane.
As with Discourses and Event Handlers, when a new topic is created it contains no functionality. The MenuResponses topic should be edited to make it listen for and respond to user commands.
•Editing a Topic
To edit a topic, open the Topic View window for the topic to be modified by double clicking on the topic's name in the Topics pane of the Discourse View window. (Alternately, right click on the topic's name and select Open Topic from the drop down menu, or double click on the topic's name in the left-hand application tree.) To edit the new MenuResponses topic, double click on "MenuResponses" in the Topics pane of the Discourse View for Main Menu. The Topic View window 240, which is illustrated in Figure 15, will open. This Topic View window contains the following three panes:
• Phrase List: The list of spoken phrases that will be recognized by this topic
• Tasks: The actions that will be taken based on the phrase(s) spoken by the user
• Event Handlers: The actions that will be taken when particular events occur, such as user inactivity and recognition errors.
•Phrase Lists
Before a topic can respond to user commands, a list of the phrases to listen for from the user should be designated. The basic format for an entry in the phrase list is as follows:
phrase <Tag=Value>.
Phrase indicates the phrase to be listened for. When this phrase is heard, the
Recognition object's Tag will be set to the specified Value. Note: Each entry in a phrase list should end with a period.
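The entry format above can be illustrated with a short Python sketch. This is purely illustrative; the platform's actual parser is not described in this document, and the regular expression here is an assumption about the syntax as presented:

```python
# Hypothetical parser for the phrase-list entry format described above:
#     phrase <Tag=Value>.
# Each entry names a phrase, a tag, a quoted value, and ends with a period.
import re

ENTRY = re.compile(r'^(?P<phrase>.+?)\s*<(?P<tag>\w+)\s*=\s*"(?P<value>[^"]*)"\s*>\s*\.$')

def parse_entry(line):
    """Split one phrase-list entry into (phrase, tag, value)."""
    m = ENTRY.match(line.strip())
    if m is None:
        raise ValueError("malformed phrase-list entry: " + line)
    return m.group("phrase"), m.group("tag"), m.group("value")

print(parse_entry('operator <Transfer="operator">.'))
# → ('operator', 'Transfer', 'operator')
```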
The Recognition (or rec) object is the mechanism by which the speech recognition engine communicates with the VA application. When the user speaks into the phone, or other input device, the sound is sent to the speech recognition engine, which attempts to "recognize" it— that is, to resolve it into a text string. The VA virtual machine then compares the recognized string to the entries in the phrase list. If the string is found in the list, the virtual machine sets a tag in the rec object to the value specified in the phrase list and executes the appropriate task for the phrase.
(Tasks will be explained in more detail below.)
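The matching behavior just described can be modeled in a few lines of Python. The class and function names here are illustrative stand-ins, not part of the VA platform's API:

```python
# Illustrative model of the rec-object mechanism: a recognized text string
# is looked up in the phrase list; on a match, the named tag is set on the
# rec object, otherwise a RecognitionError event fires.
phrase_list = {
    "operator":   ("Transfer", "operator"),
    "sales":      ("Transfer", "sales"),
    "accounting": ("Transfer", "accnt"),
}

class RecObject:
    """Carries the tags set by the virtual machine after recognition."""
    def __init__(self):
        self.tags = {}

def process_utterance(recognized_text, rec):
    entry = phrase_list.get(recognized_text)
    if entry is None:
        return "RecognitionError"      # phrase not in the list
    tag, value = entry
    rec.tags[tag] = value              # e.g. rec.Transfer = "sales"
    return "RecognitionResult"

rec = RecObject()
event = process_utterance("sales", rec)
print(event, rec.tags)   # RecognitionResult {'Transfer': 'sales'}
```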
When creating a phrase list for a topic, the phrases the topic needs to listen for, and how to assign the rec object's tags when each phrase is heard, should be defined.
In the DiscourseEntry event for the Main Menu discourse, a greeting message was defined that tells the user what the possible options are: he or she can say the name of a department and be transferred to it, or can say "operator" to speak to an operator. The phrase list for the MenuResponses topic, then, should contain entries for each of the departments at XYZ Corporation as well as the word "operator." The basic phrase list for doing so would appear as follows:
operator <Transfer="operator">.
sales <Transfer="sales">.
accounting <Transfer="accnt">.
The above list specifies that when the virtual machine hears the user say "operator," a rec object tag named Transfer will be set to the value "operator".
The rec object does not have pre-defined tag names that a developer should use; a tag can be created with any name, provided that it is a single word beginning with a capital letter. In the phrase list above, for example, a developer could have used the tag "Action", "Trnsfr", or even "Blue" instead of "Transfer". Similarly, the values assigned to the tags can be any string the developer desires. Though two of the tag values in the above list match the phrase being listened for, the last one ("accnt") has been abbreviated. Tags and values are simply flags that are set by a developer for later use in the Tasks fields of his or her application.
•Phrase List Operators
In the sample phrase list above, all the phrases are single words. List entries, though, can contain multiple words joined by operators. The operators that can be used within a phrase list entry are listed in the table below:
Operator        Description
(exp1 exp2)     exp1 and exp2
(exp1 | exp2)   exp1 or exp2
?exp1 exp2      (exp1 and exp2) or exp2
For example, some of the departments at XYZ Corporation may have two-word names. For these departments to be recognized in the phrase list, use the and operator to group their names together. To add "human resources" to the list, for example, use the following syntax:
(human resources) <Transfer="hr"> .
Not all callers, of course, will call a particular department by the same name. Some callers may, for example, use the term "personnel" for the human resources department while others will say "human resources". For flexibility, use the or operator (a pipe '|' symbol):
((human resources) | personnel) <Transfer="hr">.
This entry specifies that if the caller says either "human resources" or "personnel", the rec object's Transfer tag will be set to "hr". Similar to the or operator is the optional operator ('?'). This operator specifies that the word it precedes is not required. Suppose, for example, that some callers requested the sales department by saying "sales" while others used the term "product sales". In the phrase list, a developer could have the engine recognize either version by using a combination of and and or operators:
((product sales) | sales) <Transfer="sales">.
It should be noted that it would be more efficient to use the '?' operator to indicate that the word "product" is optional:
(?product sales) <Transfer="sales">.
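A rough Python sketch of how these three operators expand into the set of accepted word sequences follows. The function names are invented for illustration; the recognition engine itself compiles such expressions into a grammar rather than enumerating strings, but the accepted phrases are the same:

```python
# Illustrative expansion of the three phrase-list operators:
#   (a b)    -> "a b"         (and: both words, in order)
#   (a | b)  -> "a" or "b"    (or: either alternative)
#   ?a b     -> "a b" or "b"  (the '?'-marked word is optional)
def and_(*parts):
    """Concatenate alternatives from each part, in order."""
    results = [""]
    for part in parts:
        results = [(r + " " + p).strip() for r in results for p in part]
    return results

def or_(*parts):
    """Union of the alternatives from each part."""
    return [s for part in parts for s in part]

def opt(part):
    """Optional operator '?': the word may be present or absent."""
    return part + [""]

# ((human resources) | personnel)
print(or_(and_(["human"], ["resources"]), ["personnel"]))
# → ['human resources', 'personnel']

# (?product sales)
print(and_(opt(["product"]), ["sales"]))
# → ['product sales', 'sales']
```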
•Creating a Phrase List
The Phrase List pane 242 on the Topic View window is a text editor that allows a developer to enter and edit the various phrases to be recognized by the topic. To enter a new phrase, simply click inside the pane to activate the cursor, position the cursor at the end of the list, and type the new phrase list entry. A developer can also edit existing phrases, as with any text editor. It should be noted that the phrase list pane supports standard Windows text editor functions such as cut, paste, and undo. For the Virtual Operator application, the list of phrases that will be recognized by the MenuResponses topic needs to be defined. Click inside the Phrase List pane in the Topic View window and type the following entries:
operator <Transfer="operator">.
(?product sales) <Transfer="sales">.
accounting <Transfer="accnt">.
((human resources) | personnel) <Transfer="hr">.
This list defines the three departments that can be selected by the caller, as well as the operator who can be contacted in case assistance is needed.
Now that the MenuResponses topic has its list of recognized phrases defined, the application needs to be configured to take the proper actions when the phrases are spoken by the caller. This functionality is implemented in bits of script code called tasks.
•Using Tasks
Tasks are the basic building block for the actions that will be taken by a VA application when a particular command is given by the user. Each task has two parts:
• The trigger: The trigger is the event that causes the task to be performed. For most tasks, this trigger will be the speaking of a particular phrase by the caller.
• The action: An action is a set of instructions that the topic performs when a particular trigger is activated. These instructions are written in standard Windows scripting languages such as VBScript or JavaScript. (For the examples in this chapter, VBScript will be used.)
•Defining a Task
To define a new task for a topic, perform the following steps:
1. Open the Topic View window by double clicking on the desired topic name in the Discourse View or the left-hand application tree. The Topic View window will be displayed in the workspace.
2. Right click inside the Tasks pane ofthe Topic View window. A dropdown menu will open.
3. Select Insert New Task from the drop-down menu. A dialogue box will open, prompting for the name of the new task.
4. Enter the name for the task in the dialogue and click OK. The task will be added to the list in the Tasks pane. It should be noted that the name of a task should begin with a capital letter and cannot contain whitespace.
5. A new task will be empty of functionality when first created. To edit the task, double click on its name in the Tasks pane. As shown in Figure 15, the Task View window 250 will be opened in the workspace.
The Task View window contains two panes, Trigger 252 and Action 254.
Both these panes act as text editors, allowing a developer to type the free-form text that will be the trigger and the Action for the task.
•Defining the Trigger
The first step when creating a new task is to define its trigger. The trigger definition has the following syntax:
rec.Tag="value"
This definition causes the virtual assistant to execute the task whenever the rec object's Tag tag is equal to the specified value.
For example, the first task needed for the Virtual Operator's MenuResponses topic is one that connects the caller to the operator when he or she says "operator". This task can be created using the procedure outlined above, giving it the name "Connect_To_Operator"; then open the Task View window to edit the task.
Reconsidering the phrase list defined in the previous section, it can be seen that the rec object's Transfer tag is set to "operator" when the caller says "operator". The trigger for the task, then, would be defined as follows:
rec.Transfer="operator"
This line should be typed in the Trigger pane of the Task View window. The Connect_To_Operator task will execute whenever the rec object's Transfer tag is set to "operator" (that is, whenever the caller says the word "operator").
•Defining the Action
The action for a particular task is defined as a series of scripting-language commands. (For example, VBScript is used for the command scripts, but any standard Windows scripting language such as JavaScript could also be used.)
The action the application is to perform in the Connect_To_Operator task is straightforward: transfer the user's call to the operator's extension. This function is performed by calling the vavm object's TransferCall() API. The API call has the following syntax:
TransferCall PhoneNumber
Where PhoneNumber is the string containing the number to which the call should be transferred. For example, if the number of the XYZ Corporation switchboard is 555-0000, the following line in the Action pane of the Task View window will instruct the application to transfer the caller to the operator's line:
vavm.TransferCall "555-0000"
When this command is executed, the virtual machine will transfer the call to the specified number. When this occurs, the user will no longer be connected to the VA application, so the application will reset and wait for the next call.
It may not be desirable to immediately transfer the caller when the command "operator" is spoken. Instead, it may be desirable to first play a message to let the caller know the application understood the command and that the caller is about to be transferred. This is accomplished by adding the following line before the TransferCall command in the Action pane:
vavm.TTSString "Please hold while I transfer you to the operator."
Once this line is added, the Connect_To_Operator task will be complete. When the caller says the word "operator" at the main menu, he or she will hear "Please hold while I transfer you to the operator." The application will then transfer the call to the switchboard, ending the call.
•Defining the Remaining Tasks
Now that the action has been defined for the "operator" command, new tasks can be added to handle the other menu functions (i.e., transferring the caller to a specified department). This is done by performing the steps described above, using the rec tags defined in the phrase list as the triggers for each task.
For example, to create a new task named "Connect_To_HR" to transfer the call to the human resources department, the task's fields would appear as follows:
Trigger: rec.Transfer="hr"
Action: vavm.TTSString "Please hold while I connect you with the human resources department"
        vavm.TransferCall "555-0010"
•How Tasks Are Executed
When the VA application is executing a discourse, it performs the actions specified for initialization events (such as DiscourseEntry), then waits for the user to speak. When speech is heard, the virtual machine will send it to the speech recognition engine for recognition. The virtual machine then takes the text string received back from the recognition engine and scans the phrase list, looking for a match. If a match is found, the virtual machine sets the rec object's tag as instructed by the phrase's entry in the phrase list. Then, it scans the topic's task collection, comparing the trigger for each task against the rec object. For each task found whose trigger condition evaluates to true, the virtual machine will execute the script contained in that task's action definition.
For example, when the Virtual Operator's Main Menu discourse is executed, the application will call the DiscourseEntry event and play the prompt, "Welcome to XYZ Incorporated. Please say the name of the department you wish to contact. To speak to an operator, say operator." The application will then wait for the user to speak and try to recognize the sound. If the phrase "operator" is heard, the virtual machine will set the rec object's Transfer tag to "operator" and scan the MenuResponses topic's task list, looking for triggers that match the rec object. In this case, the Connect_To_Operator task's trigger will evaluate to true, so the virtual machine will execute the action script for the task. This script will tell the caller, "Please hold while I transfer you to the operator," then will transfer the caller to the phone number "555-0000". Because the TransferCall API ends the user session, the application will then reset and await the next call.
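The dispatch sequence described above can be modeled with a small Python sketch. All names here are illustrative; the say and transfer callables stand in for the TTSString and TransferCall APIs, and the real virtual machine works with compiled grammars rather than a dictionary:

```python
# Illustrative model of one utterance passing through the virtual machine:
# recognize -> set rec tag from the phrase list -> scan triggers -> run action.
phrases = {"operator": ("Transfer", "operator")}

def action_connect_to_operator(say, transfer):
    say("Please hold while I transfer you to the operator.")
    transfer("555-0000")

# Each task is a (trigger, action) pair; the trigger is a (tag, value) test.
tasks = [(("Transfer", "operator"), action_connect_to_operator)]

def dispatch(recognized_text, say, transfer):
    entry = phrases.get(recognized_text)
    if entry is None:                   # RecognitionError path
        say("I'm sorry. I did not understand that response.")
        return
    tag, value = entry
    rec = {tag: value}                  # set the rec object's tag
    for (trig_tag, trig_value), action in tasks:
        if rec.get(trig_tag) == trig_value:   # trigger evaluates to true
            action(say, transfer)

log = []
dispatch("operator", log.append, lambda n: log.append("TRANSFER " + n))
print(log)
# → ['Please hold while I transfer you to the operator.', 'TRANSFER 555-0000']
```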
Extending the Basic Application
Adding Additional Event Handlers
The virtual assistant application has been configured to answer an incoming call, listen for a spoken department, and transfer the caller. However, there is no mechanism to handle a caller saying the name of a department that is not included in the phrase list, or to take action if the caller says nothing for an extended period. These situations are addressed using event handlers.
•Handling RecognitionError Events
A RecognitionError event occurs any time the user speaks a phrase that is not defined in a topic's phrase list and therefore is not understood by the speech recognition engine. Such an event would occur if, for example, the caller says a department name (such as "maintenance") that has not been defined in the phrase list for the MenuResponses topic. Similarly, if the caller mumbles a command or there is excessive background noise, the application may not recognize the speech and will fire a RecognitionError.
For the application to take an action when a recognition error occurs, a RecognitionError event handler should be inserted into the MainMenu discourse and a script should be added that will be executed when such an event occurs. This can be done by performing the following steps:
1. Open the Discourse View window for the MainMenu discourse.
2. Right click inside the discourse's Event Handlers pane. A drop-down menu will open.
3. Select Add Event Handler from the menu. A combo box will open displaying the available event handlers that can be added.
4. From the combo box, select RecognitionError. An empty Event Handler View window will appear in the workspace.
5. In the Event Handler View window, enter the VBScript code that should be executed when a RecognitionError occurs.
In the example, because the Virtual Operator application will be used by callers who may not be familiar with using VA applications, the handler for recognition errors should provide a helpful prompt explaining what the user is expected to do. In the Event Handler View window, then, add the following code:
vavm.TTSString "I'm sorry. I did not understand that response. Please say the name of the department you wish to contact. To speak to an operator, say operator."
When the user says a phrase that cannot be recognized, the application will play the above message and wait for the user's next command.
•Handling Inactivity Events
A second type of occurrence that a virtual assistant application should handle is Inactivity events. These events occur when there is no voice input from the caller for a specified amount of time (this period is variable and can be set by the application developer). For example, if the caller to the Virtual Operator says nothing for five seconds, the application may need to prompt the user for input. Such silence may indicate that the user is confused about the menu options and does not know what to say, so the prompt should give some direction about what to do next.
As with any discourse-level event, an Inactivity handler is defined by inserting an entry into the discourse's Event Handlers pane; the handler is then edited to add the VBScript to be executed when the event occurs.
To add an Inactivity event handler, the procedure outlined above for inserting the RecognitionError event should be followed, but Inactivity should be selected as the event type, and the following code should be entered in the Event Handler View:
vavm.TTSString "I did not hear a response from you. Please say the name of the department you wish to contact. To speak to an operator, say operator."
•Setting the Inactivity Timer
Although the handler for Inactivity events has been added, an inactivity timer should also be set. By default, an application has no inactivity timer set, so the application will wait forever for the user to speak. The interval a VA will wait for a user response before firing an Inactivity event is determined by the component-level parameter "InactivityTimeout." A developer sets these parameters using the vavm object's SetParameter() method. The syntax of this call is as follows:
vavm.SetParameter ParameterName, Value
ParameterName indicates the name of the parameter to be set, and Value indicates the value to which it should be set. To set the inactivity timer in an application to five seconds, for example, the following script line would be used:
vavm.SetParameter "InactivityTimeout", 5
Where this line is inserted will depend on how the application is to behave. Until the SetParameter command is executed, the inactivity timer for the entire application will remain at its default start-up value of undefined (that is, the application will never fire an Inactivity event). Once the parameter is set, the new value will remain in effect until changed by a subsequent SetParameter call.
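The parameter semantics described above (undefined by default, persistent once set) can be sketched as follows. The function names are hypothetical models of the behavior, not the platform's API:

```python
# Illustrative model of the component-level parameter store: the inactivity
# timeout starts undefined (no Inactivity event ever fires) and, once set,
# stays in effect until a later SetParameter call changes it.
params = {}

def set_parameter(name, value):
    params[name] = value

def inactivity_event_due(silent_seconds):
    timeout = params.get("InactivityTimeout")
    if timeout is None:        # default: timer undefined, wait forever
        return False
    return silent_seconds >= timeout

print(inactivity_event_due(60))   # False: timer not yet set
set_parameter("InactivityTimeout", 5)
print(inactivity_event_due(6))    # True: 6s of silence exceeds 5s
```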
For example, for the Virtual Operator application, it may be useful to set the inactivity timer at the start of the Main Menu discourse. This can be accomplished by inserting the SetParameter call into the DiscourseEntry event handler for the Main
Menu discourse. This is done by opening the Discourse View window for Main Menu and double clicking on DiscourseEntry in the Event Handlers pane. The Event Handler View window will open in the workspace and should contain the TTSString API call that is used to greet the caller. The text in the Event Handler View window should be edited so that it appears as follows:
vavm.SetParameter "InactivityTimeout", 5
vavm.TTSString "Welcome to XYZ Incorporated. Please say the name of the department you wish to contact. To speak to an operator, say operator."
Any time the user is silent for more than five seconds, the Inactivity event will fire and the user will be prompted to say a department name.
•Using the DiscourseStart Event Handler
It can be observed that in the Main Menu discourse, several different events repeat basic menu instructions to the user, for example, "Please say the name of the department you wish to contact," or "To speak to an operator, say operator." The
DiscourseStart event can be used to streamline the prompts presented to the user.
The distinction between the DiscourseEntry and DiscourseStart events is an important one. DiscourseEntry occurs only once during the discourse's execution, but DiscourseStart occurs each time the discourse resets after processing user input. For example, if the user says a phrase that is not recognized in the phrase list, the code in the RecognitionError event handler will execute. Then, the discourse will start over and wait for the next input from the user. At this point, the DiscourseStart event will occur again, but DiscourseEntry will not.
The reason for the differing behavior between the DiscourseEntry and DiscourseStart events is due to the defined flow of a discourse. This flow can be described as follows:
1. The discourse begins execution and processes the code in the DiscourseEntry event
2. The DiscourseStart event occurs and its code is executed
3. The application waits for the user to speak. One of the three following things can happen:
• The user says nothing for the interval defined by the InactivityTimeout parameter, so an Inactivity event occurs. The application executes the code in the Inactivity event handler and resets to step #2 (DiscourseStart).
• The user says a phrase that is not recognized in the phrase list, so a RecognitionError event occurs. The application executes the code in the RecognitionError event handler and resets to step #2.
• The user says a phrase that is recognized in the phrase list. The rec object's tags are set as specified in the phrase list, and the appropriate task is executed. Unless the task performs an action to exit the discourse, the discourse will reset to step #2.
For example, in the Virtual Operator application, any time the caller speaks a recognizable phrase he or she will be transferred to a different phone line, which ends the discourse. However, the word "help" could be added to the phrase list and, any time the word is recognized, a task executed that reads extended instructions to the user (through the TTSString API call). When this task completes execution, the discourse would not end (since a command such as TransferCall that terminates the discourse was never issued) but would rather reset and fire a new DiscourseStart event.
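The flow enumerated above can be sketched as a simple loop. The event names match the document, but the code itself is an illustrative model, not platform code: DiscourseEntry fires once, DiscourseStart fires before every listen, and the error events reset back to DiscourseStart:

```python
# Illustrative model of the discourse flow: DiscourseEntry runs once,
# DiscourseStart runs before each listen, and Inactivity/RecognitionError
# events loop back to DiscourseStart.
def run_discourse(inputs, handle):
    """inputs is a list of utterances: None means silence, 'unknown' means an
    unrecognized phrase, anything else is a recognized phrase. handle(phrase)
    returns True if the executed task exits the discourse."""
    events = ["DiscourseEntry"]
    for utterance in inputs:
        events.append("DiscourseStart")
        if utterance is None:
            events.append("Inactivity")          # silence past the timeout
        elif utterance == "unknown":
            events.append("RecognitionError")    # phrase not in the list
        else:
            if handle(utterance):                # task ends the discourse
                break
    return events

trace = run_discourse([None, "unknown", "operator"],
                      handle=lambda p: p == "operator")
print(trace)
# → ['DiscourseEntry', 'DiscourseStart', 'Inactivity',
#    'DiscourseStart', 'RecognitionError', 'DiscourseStart']
```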
Based on this behavior, the prompt structure in the Virtual Operator's Main Menu discourse can be simplified by defining a DiscourseStart event that prompts the user to name a department. The greeting "Welcome to XYZ Corporation" would remain in the DiscourseEntry event handler, since the user should be greeted only when the call is first received, not each time the discourse resets. Because the DiscourseStart event now prompts the user each time a voice command is expected, the duplicated prompts in the RecognitionError and Inactivity event handlers can be removed. To make these changes, the event handlers in the Main Menu discourse should be edited as specified in the table below.
Event Handler      Script
DiscourseEntry     vavm.SetParameter "InactivityTimeout", 5
                   vavm.TTSString "Welcome to XYZ Corporation"
DiscourseStart     vavm.TTSString "Please say the name of the department you wish to contact. To speak to an operator, say operator."
RecognitionError   vavm.TTSString "I'm sorry. I did not understand that response."
Inactivity         vavm.TTSString "I did not hear a response from you."
After the event handlers have been changed, the application should be compiled, published, and tested to see how the prompts are read to the user. The prompts can then be modified and the application republished to determine how the flow of the discourse proceeds.
•Other Event Handlers
The event handlers described above are only a few of those available to the application developer. The other handlers, with a description of each, are included in the table below:
Event Handler Description
ApplicationEntry Occurs when the application executes
ApplicationExit Occurs when the application terminates
BargeIn Occurs when a user interrupts the VA application while it is presenting output. The BargeIn feature can be turned on and off by setting the AllowBargeIn component-level parameter.
CallStart Occurs when a new call is received by the VA application
CallTerminated Occurs when a call terminates (i.e. the user hangs up the phone)
DiscourseEntry Occurs when a discourse is activated. (This event fires only once during the execution of a discourse)
DiscourseStart Occurs each time a discourse resets following an event or the execution of a topic.
FatalError Occurs when the VA platform experiences a fatal error; after the code in the Fatal Error event executes, the application will terminate.
Inactivity Occurs when no input is received from the user during a set period of time. (The period can be set using the InactivityTimeout component-level parameter.)
RecognitionError Occurs when the user speaks a phrase that is not recognized by the voice recognition engine.
RecognitionResult Occurs when the user speaks a phrase that is recognized by the voice recognition engine.
ScriptingError Occurs when an error arises within the script for a task, event handler, or resource.
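The distinction between DiscourseEntry and DiscourseStart in the table above can be illustrated with a short sketch. The following Python fragment is a hypothetical model, not part of the VA platform: it shows a dispatcher in which DiscourseEntry fires only once per execution of a discourse, while DiscourseStart fires on every reset.

```python
# Hypothetical sketch of how a discourse might dispatch the events in the
# table above. Event names match the table; the dispatch logic is illustrative.

class Discourse:
    def __init__(self, name):
        self.name = name
        self.handlers = {}   # event name -> handler script (a callable here)
        self.entered = False

    def on(self, event, handler):
        self.handlers[event] = handler

    def fire(self, event, *args):
        handler = self.handlers.get(event)
        return handler(*args) if handler else None

    def activate(self):
        # DiscourseEntry fires only once during the execution of a discourse...
        if not self.entered:
            self.entered = True
            self.fire("DiscourseEntry")
        # ...while DiscourseStart fires on every reset.
        self.fire("DiscourseStart")

log = []
main_menu = Discourse("MainMenu")
main_menu.on("DiscourseEntry", lambda: log.append("Welcome to XYZ Corporation"))
main_menu.on("DiscourseStart", lambda: log.append("Please say the name of the department"))
main_menu.on("RecognitionError", lambda: log.append("I did not understand"))

main_menu.activate()               # first activation: entry greeting + prompt
main_menu.fire("RecognitionError") # unrecognized phrase
main_menu.activate()               # discourse resets: prompt only, no second greeting
```

With this model, the greeting is heard once per call while the department prompt repeats on each reset, which matches the behavior described for the Main Menu discourse.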
Adding Additional Discourses
More complicated applications should have multiple discourses. For example, the next step in creating the VirtualOperator application may be to add a second discourse.
In creating discourses, as a general rule, each set of tasks that should have its own set of recognized phrases should be isolated in its own discourse. For example, the VirtualOperator will prompt a user to name a department and then transfer the call as specified. However, no provision has been made for employees who may be away from the office and wish to use the application for functions other than simply calling someone at the firm. An employee may, for example, wish to call the VirtualOperator and access his or her voice messages or email. This functionality needs to be isolated into separate discourses because it involves different prompts and commands than requesting a particular department.
For example, all callers—whether employees or customers—who dial into the VirtualOperator will be greeted with the prompt, "Please say the name of the department to which you wish to connect." When an employee calls in and wishes to access his or her messages, he or she simply has to say "It's me" or "personal" in the Main Menu, and will then be taken to a new discourse that will allow email and voice mail to be accessed.
•Selecting the New Discourse
To add a task to the original MainMenu discourse that will transfer internal users when they identify themselves, the following steps should be performed:
1. Add the following entry to the phrase list for the MenuResponses topic:
((It's me) | (personal)) <Transfer="internal">
2. Create a new task called "InternalCaller" and assign it the following trigger and action:
Trigger: rec.Transfer="internal"
Action: vavm.TTSString "Welcome back"
vavm.SelectDiscourse "VerifyInternalCaller"
The vavm's SelectDiscourse API call transfers the flow of control from the current discourse to the one named in the parameter. By adding this new task, the application will say "Welcome back" and switch to the VerifyInternalCaller discourse whenever a user says "It's me" or "personal" at the main menu prompt.
The vavm's SelectDiscourse method, when called, instructs the virtual machine as to which discourse to switch to after the current discourse has completed its execution; it does not interrupt the execution of the current discourse. For this reason, the order of the commands in the InternalCaller task's Action script could be reversed without changing the way the application executes:

vavm.SelectDiscourse "VerifyInternalCaller"
vavm.TTSString "Welcome back"
With this script, the caller would still hear the phrase "Welcome back" before the VerifyInternalCaller discourse began executing.

•Creating the New Discourse
In the example, now that a mechanism is in place for transferring the user to the VerifyInternalCaller discourse, that discourse needs to be created by following the same procedure used to create the initial Main Menu discourse:
1. Double click on VirtualOperator in the left-hand tree view to show the main screen.
2. Right click in the Discourses pane. A drop-down menu will open.
3. From the drop-down menu, select Insert New Discourse. A dialogue will open requesting the discourse's name.
4. Enter "VerifyInternalCaller" for the discourse name and click OK. The new discourse will be added to the list in the Discourses pane.

Since, for security reasons, it is not desirable to allow any user to log in and access messages, the virtual assistant needs to verify that the caller is an employee of the company or is otherwise authorized to access messages. This can be accomplished by having the user enter a four-digit PIN. The VerifyInternalCaller discourse can be used to prompt the user for the PIN, look up the PIN in a database, and (if the PIN is verified) transfer the caller to yet another discourse that will allow him or her to perform the functions restricted to internal callers. Continuing with the example, to implement this functionality, the VerifyInternalCaller discourse should be opened and a new DiscourseStart event handler should be added that prompts the user to enter his or her PIN. The prompt may appear as follows:

vavm.TTSString "Please enter your personal identification number"
Now, when VerifyInternalCaller executes, it will prompt the user for a PIN and wait for a response.
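The deferred behavior of SelectDiscourse described above can be modeled with a brief sketch. The Python below is an illustrative approximation only (the class and method names are hypothetical, not the vavm API): selecting a discourse merely records the destination, and the switch takes place only after the Action script has finished.

```python
# Illustrative sketch (not the actual vavm implementation) of the deferred
# discourse switch: SelectDiscourse only records the next discourse; the
# switch happens after the current script completes.

class VirtualMachine:
    def __init__(self):
        self.output = []
        self.next_discourse = None
        self.current = "MainMenu"

    def tts_string(self, text):
        self.output.append(text)

    def select_discourse(self, name):
        # Does not interrupt the running script; just notes where to go next.
        self.next_discourse = name

    def run_action(self, action):
        action(self)                 # execute the task's Action script to the end
        if self.next_discourse:      # switch only after the script completes
            self.current = self.next_discourse
            self.next_discourse = None

vavm = VirtualMachine()

# As the text explains, the order of these two calls does not matter:
def internal_caller_action(vm):
    vm.select_discourse("VerifyInternalCaller")
    vm.tts_string("Welcome back")

vavm.run_action(internal_caller_action)
```

Even though select_discourse is called first, "Welcome back" is still spoken before control moves to VerifyInternalCaller.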
•Using Typed Phrase List Variables
One way to handle the response to the PIN prompt would be to listen for a particular four-digit number and, when one is recognized, trigger a task that corresponds to the user who called in. This could be done by creating a new topic (named, for example, PINResponse) and adding the following entries to the topic's phrase list:
(one six nine three) <Caller="John Smith">.
(two four seven eight) <Caller="Nancy Jones">.
(nine two two six) <Caller="Tim Wilson">.
There are limitations to this method, however, that are readily apparent. First, a separate task for each employee would have to be created. More importantly, this method hardcodes employee names and PIN numbers, so the VirtualOperator application would have to be recompiled any time a new employee joins the firm or an existing employee wishes to change his or her PIN. A better way to implement the verification discourse would be to use a typed phrase list variable. Such a variable allows the application to listen for a particular type of voice input rather than a predefined phrase. In the example, the virtual assistant would listen for any four-digit number rather than a particular number. The virtual assistant could verify that four-digit number on the fly by looking it up in a database. The format for a Digit type variable is as follows:
[Digit, Tag, Length=Z]
Tag indicates the name of the rec object tag that should be set to the recognized set of digits, and Z indicates the number of digits expected. To listen for a four-digit number and store it in a rec tag named "PIN," the following typed variable expression would be used:

[Digit, PIN, Length=4]
The next step is to add this typed variable expression to a rule in the phrase list for the PINResponse topic and then use this rule in a regular entry. Such a phrase list would appear as follows:

NumberSpoken.
NumberSpoken := [Digit, PIN, Length=4].
Based on these entries, when the user speaks his or her PIN, the grammar will recognize a four-digit number, the rec object's PIN tag will be set to the spoken number, and the NumberSpoken rule will evaluate to true. It should be noted that the first entry, with NumberSpoken by itself, is needed because, even though the typed variable is defined, the virtual machine will not scan the list looking for triggers. The isolated NumberSpoken entry will force the scan.

•Responding to Typed Variables
Now that the phrase list recognizes a PIN, a task is needed to look it up in the database and validate it. Unlike the other task triggers used so far in the VirtualOperator, it is not known in advance what value the rec object's PIN tag will be set to when a PIN is spoken. For this reason, a task is defined with the following trigger:

rec.PIN <> ""

This task will execute anytime a four-digit number is spoken by the caller (that is, anytime the PIN tag is set to some value).
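As a rough illustration of the typed-variable mechanism, the following Python sketch simulates a [Digit, PIN, Length=4] entry: any spoken four-digit number sets the rec object's PIN tag, and the task trigger simply tests that the tag is non-empty. The digit-word matching here is a deliberate simplification of the real recognition grammar.

```python
# Hypothetical sketch of the typed-variable trigger described above: listen for
# any four-digit number, store it in a "PIN" tag on a rec-like object, and fire
# a task whenever the tag is non-empty (the rec.PIN <> "" trigger).

rec = {"PIN": ""}

DIGIT_WORDS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
               "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}

def recognize(utterance):
    """Simulate the [Digit, PIN, Length=4] grammar entry on a spoken phrase."""
    words = utterance.lower().split()
    if len(words) == 4 and all(w in DIGIT_WORDS for w in words):
        rec["PIN"] = "".join(DIGIT_WORDS[w] for w in words)

def pin_task_triggered():
    # Equivalent of the task trigger: rec.PIN <> ""
    return rec["PIN"] != ""

recognize("one six nine three")
```

Because the trigger tests only that the tag is non-empty, a single task handles every caller's PIN; the actual validation (a database lookup) happens inside the task's action rather than in the grammar.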
Using Resources
A resource allows topics and event handlers to reference components contained externally. These components include Phrase List and Script Modules, which a topic uses to import pre-defined grammar entries and scripts, and Prompt Groups, which topics and event handlers use to incorporate voice output defined externally. All resources are global, so once a resource is defined it can be used from within any discourse in the application. With resources, a developer can both increase the speed of application development (by reusing grammar and code components) and improve the sophistication of the voice interface (by using recorded and dynamic prompts).
Creating a Resource
The resources that are available in an application are listed in the "Resources" pane in the main application window. When a new application is created it will have no resources associated with it, so this pane will be empty. (Available resources can also be viewed in the left-hand application tree by clicking on the "ResourceView" tab at the bottom of the tree pane.)
To add a resource to an application, right click inside the "Resources" pane. A drop-down menu will open providing the following choices:
Insert New Phrase List Module
Insert Phrase List Module from File
Insert New Script Module
Insert Script Module from File
Insert New Prompt Group
Insert Prompt Group from File
From this menu, select the type of resource to add.
Three types of resources are available: phrase lists, script modules, and prompt groups.

Prompt Groups
A Prompt Group is the output counterpart of a Grammar. Using prompt groups, the developer can define a set of phrases that will be output to the caller as needed during the execution of a VA application.
One type of Prompt Group resource is a recorded prompt that will be played to the user. Instead of using the Text-to-Speech (TTS) facility to translate the text
"Welcome to XYZ Corporation," for example, a developer could record a live person speaking the greeting and play the resource when an incoming call is received. Although the TTS facility is essential for any dynamic spoken output (such as reading the user's email over the telephone), recorded resources can be used to make the static output smoother and more sophisticated.
Prompt Groups also allow dynamic output that combines various static pieces of output depending on the identity ofthe caller and the circumstances. Such a resource would select pre-recorded excerpts and assemble them to say, for example, "he has ten messages", "she has six messages", and "he has one message".
Unlike Grammars, which can be linked to individual topics within discourses, a Prompt Group is associated only with the application as a whole. Once a Prompt Group has been defined, it can be used in any topic or event handler in the application.
•Creating a New Prompt Group
To create a new prompt group, the following steps are performed:
1. Right click inside the "Resources" pane in the main application window. A drop-down menu will open.
2. From the drop-down menu, select Insert New Prompt Group. A dialogue box asking for the prompt group's name will appear.
3. In the dialogue, enter the name for the new prompt group. The name of the prompt can be anything the developer chooses, but it should be descriptive of the prompt's purpose. A prompt group that is used to output the number of messages in the user's email inbox, for example, might be named "Check_Email."
4. Click the OK button. The new prompt group will be added to the list of available Resources on the main menu and to the tree of available Resources in the left-hand tree-view pane. The new group will be created with a default set of properties, including Scripting Language and Selection Mode.

•Adding Prompts
When a group is first created, it does not contain any prompts. To add a prompt, perform the following steps:
1. In the Resources pane on the main menu, double click on the prompt group to which the prompt will be added. The prompt group view window will be opened in the main workspace. (Because no prompts have been defined, this window will be empty.)
2. Right click inside the prompt group view window. A drop down menu will open.
3. From the drop-down menu, select Insert New Prompt. A dialogue box will be displayed to ask for the prompt name.
4. In the dialogue, enter the name for the new prompt and click OK. A developer can give the prompt whatever name he or she wishes, but the name should be sufficiently descriptive to identify the prompt's purpose. The new prompt will be added to the list of prompts in the prompt group and to the Resources tree in the left-hand tree-view.
5. To change the properties of the prompt, double click on its name in the Prompts list. The Prompt Properties dialogue box will open, which is illustrated in Figure 16.
6. Modify the prompt properties and click OK.

•Prompt Properties
As illustrated in Figure 16, the Prompt Properties dialogue 259 is the primary interface for editing a prompt. The fields in this dialogue have the following meanings:
•Type 260 The type of the prompt can be either Simple or Expression:
• Simple prompts have only a Filename and/or Text string defined for their output; this output is static and cannot be changed during application execution. When the Simple type is selected for a prompt, the Expression and Properties text fields are grayed out on the dialogue. A prompt that says "Thank you for calling" is an example of a Simple prompt.
• Expression prompts allow for dynamic output. Rather than playing a predefined text string or sound file, Expression prompts allow the developer to specify an expression that will be evaluated when the prompt is called and determine what sound is output to the user. When the Expression type is selected, the Filename and Text fields are grayed out on the dialogue. A prompt that says, "You have X new messages" (where the number X is passed into the prompt by the application) is one example of an Expression prompt.
•Filename 262 The filename property contains the name of a sound file that will be played when the prompt is called. The property is used only for Simple prompts.
•Text 264 The text property contains a text string that will be converted to speech and played when the prompt is called. The property is used only for Simple prompts. A developer can define both a Filename and a Text property for a prompt. In such a case, the Text field will act as a backup. When the prompt is called, the system will attempt first to play the sound file indicated by Filename. If this file cannot be found or read, the system will revert to the Text field and play the specified string through the Text-to-Speech facility.
•Expression 266 The Expression property is used to set an expression that will be evaluated when the prompt is played. The property is valid only for prompts of the Expression type. The contents of this property are explained in more detail later in this section.
•Properties 268 Properties can be thought of as member variables or test conditions for a particular prompt. Their values are set by the application or resource that calls the prompt and are used to determine whether a particular prompt in the group will be selected for output. Setting and using Properties are explained in detail later in this section.
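The Filename-then-Text fallback described for Simple prompts can be sketched as follows. This Python fragment is illustrative only (the function names and file name are hypothetical, not the platform's API): the system plays the sound file if it can be found and otherwise renders the Text string through TTS.

```python
# Sketch of the Filename/Text fallback behavior described for Simple prompts:
# try the recorded sound file first, then fall back to TTS on the Text string.
import os

def play_simple_prompt(filename, text, play_file, speak_text):
    if filename and os.path.exists(filename):
        play_file(filename)          # preferred: the recorded prompt
    elif text:
        speak_text(text)             # backup: render the Text property via TTS

played = []
# "missing_greeting.wav" is a hypothetical file that does not exist, so the
# Text string is spoken instead.
play_simple_prompt("missing_greeting.wav", "Welcome to XYZ Corporation",
                   played.append, lambda t: played.append("TTS: " + t))
```

This is why defining both properties is recommended for recorded prompts: the application degrades gracefully to synthesized speech if the sound file is missing or unreadable.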
•Creating a Simple One-Prompt Resource
The most basic use of a resource is to play a single prompt when a particular event occurs. A developer, for example, could use a resource to play a greeting message when a caller first connects with the Virtual Assistant. The developer would create such a resource by using the following steps:
1. In the main SCE window, right click inside the Resource pane and select the Insert New Prompt Group option.
2. Enter the name "Greeting" for the prompt group.
3. Double click on the "Greeting" prompt group to open its view window.
4. Right click inside the main window and select Insert New Prompt from the drop-down menu. An input box will open.
5. Enter "G1" for the prompt's name and click OK. The prompt (with default values) will be added to the prompt tree for the Greeting group.
6. In the Greeting group tree, double click on "G1" to open the Prompt Properties dialogue and edit the prompt.
7. In the Prompt Properties dialogue, select Simple in the Type drop-down combo box.
8. In the Filename combo box, type the name of the sound file that contains the recorded greeting to be played.
9. In the Text combo box, type a back-up string (e.g. "Welcome to XYZ Incorporated") that will be translated to speech in case the file is corrupted or cannot be found by the application.
10. Select OK to save the new settings.
Because the same greeting will be played to all callers, there is no need for the developer to assign additional prompts to the group or add Properties that will determine when each prompt is played. After the steps listed above are performed, the Resource is ready to be used in the application.
To use the resource, the developer would add the following code to the application (most likely in the DiscourseEntry event of the start-up discourse):

vavm.PlayPrompt "Greeting"
•Creating a Basic Expression Prompt
Simple prompts that define a Filename and/or Text string are static, producing the same output each time they are played. Expression prompts allow developers to insert macros and pieces of script into their resources so that the output will be dynamically determined at run time. An example of such a resource is one that reports to the user how many new email messages are in his or her inbox. The output line will always be the same ("you have X new messages") except that the value of X will vary.
To create such a resource, a developer uses a prompt that defines an Expression rather than a sound filename or text string. An Expression consists of a series of macros, references to other resources, or both.

•Macros
A macro, which is always preceded by a percent sign (%), will be translated by the VA virtual machine when the resource is processed. The %silence(duration) macro, for example, will be translated to a duration-millisecond pause by the system. Similarly, the %string(output_string) macro instructs the virtual machine to translate the specified output string of text into speech. The following expression, when evaluated, instructs the application to count to three, with a one-second pause separating each number:
Expression:
%string(""one"") %silence(1000) %string(""two"") %silence(1000) %string(""three"")
When using a literal string value as a macro parameter, the string should be enclosed in two quotation marks (e.g. ""string""). The first quotation mark in each pair acts as an escape character.
The table below lists the complete set of macros available for use in resource expressions. Table 2: Resource Expression Macros
Macro                                        Description
%silence(String duration)                    Pauses voice/sound output for the number of milliseconds specified by duration.
%string(String output_string)                Translates the text specified by output_string to speech.
%istring(String output_string, String type)  Translates the text specified by output_string to speech, using intelligent formatting.
%num(String number_string)                   Translates the specified text string as a number. "99", for example, will be read to the user as "ninety-nine".
%play(String filename)                       Plays the sound file specified by filename.
%record(String filename)                     Records the user's voice input into the specified file. (This file will be saved in Sphere format.)
%phone(String number)                        Translates the text specified by number into speech in the format of a phone number. "5553322", for example, would be translated "five-five-five-three-three-two-two".
%date(Date output_date)                      Translates the Date variable specified by output_date into speech.
%idate(String datestring)                    Intelligently translates the text in datestring to date-formatted speech.
%time(Date output_time)                      Translates the Date variable specified by output_time into speech.
%itime(String timestring)                    Intelligently translates the text in timestring to time-formatted speech.

•Macro Arguments
Macros by themselves offer limited utility. The real power of a macro appears when it is combined with resource arguments, which allow data to be sent to the resource when it is called by the application. The syntax for referencing arguments is @x, where 'x' is the order in which the argument was submitted to the resource.
Using resources with parameters, the developer can create dynamic prompts for users. A VA application, for example, might define a resource named check_mail that informs the user how many email messages are in his or her box. The application can use the resource's arguments to pass in the number of messages available and the number unread. The call may appear as follows in the application:

num_new = 10
num_unread = 3
PlayPrompt "check_mail(@1,@2)", num_new, num_unread
To process this information, the prompt in the check_mail resource would have its
Expression field defined as follows:
Expression:
"%string(""You have"") %num(@1) %string(""messages in your box"") %num(@2) %string(""of which are unread"")"
When the application runs, the following text string will be read to the user:
"You have ten messages in your box, three of which are unread."
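A rough model of how such an Expression might be expanded is sketched below in Python. The macro parsing and number wording are deliberate simplifications (the real virtual machine produces audio, not a text string), but the @n argument substitution mirrors the check_mail example above.

```python
# Illustrative expansion of a resource Expression: %string/%num macros plus
# @n argument substitution. NUMBER_WORDS is a tiny stand-in for the platform's
# real number-to-speech facility.
import re

NUMBER_WORDS = {"3": "three", "10": "ten"}

def expand(expression, args):
    # Substitute @1, @2, ... with the supplied arguments, in order.
    for i, arg in enumerate(args, start=1):
        expression = expression.replace("@%d" % i, str(arg))
    # Evaluate each %macro(body) in sequence and collect the spoken pieces.
    parts = []
    for macro, body in re.findall(r'%(\w+)\(([^)]*)\)', expression):
        if macro == "string":
            parts.append(body.strip('"'))          # literal text, quotes removed
        elif macro == "num":
            parts.append(NUMBER_WORDS.get(body, body))
    return " ".join(parts)

spoken = expand('%string(""You have"") %num(@1) '
                '%string(""messages in your box"") %num(@2) '
                '%string(""of which are unread"")', [10, 3])
```

Passing different argument values to the same Expression yields different spoken output, which is the point of resource arguments.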
•Nested Resources
A resource expression can call another resource, providing a sophisticated method for building dynamic output. Rather than using the Text-to-Speech facilities to render the entire prompt, for example, a developer may use recorded prompts for the constant elements in the prompt and the TTS facilities for the dynamic data.
For a simple resource that informed the user of the time of day, the developer could record a human voice saying "the current time is," save it in a file named current_time.wav, and create a resource named current_time that had its file field set to "current_time.wav." The developer could then create a second resource named read_time that added the dynamic data and called the first resource. The expression field for the second resource would appear as follows:
Expression:
current_time %time(@1)
When calling another resource from within a prompt expression, the developer simply enters the name of the resource in the expression. Any text within the expression field that is not preceded by a % sign is assumed to be a resource. When the developer's application wanted to inform the user of the current time, it would play the resource with the following code:

PlayPrompt "read_time", Now
When called at 3:12 PM, the read_time resource would say the following to the user: "The current time is three twelve p.m."
The VBScript Now function returns a Date object specifying the current date and time according to the computer's system clock.
A resource expression can also specify arguments when calling another resource. A developer, for example, could call the read_time resource from within another resource using the following expression:
Expression:
read_time(@1)
•Creating Multiple Prompts within a Resource
Although a resource returns only one prompt to the system at a time, multiple prompts can be defined within a single resource (hence the name "Prompt Group" as the resource type). Using multiple prompts adds variety to the output of a VA application and allows the application's responses to be dynamically generated.
A developer, for example, may want to create a resource that plays a more detailed error message when the user is a customer than it does when the user is an employee. Such a resource would define two possible prompts that would be played back depending on the identity of the current user. The system determines which prompt to play back based on one or more Properties defined for each prompt in the prompt group.
To create this resource, the following steps should be followed:
1. Add a new prompt group to the application and give it a name such as "Menu_Error."
2. Insert a new prompt, giving it a name such as "insiderPrompt".
3. Open the Prompt Properties dialogue and set the prompt's type to Simple.
4. Enter the filename and text to be played to insiders. For example:
File:
InsiderPrompt.wav

Text:
Invalid menu choice
5. Under the two Properties text fields, click on the New button to add a new Property.
6. In the text field, enter insider=true. This sets a new Property for the prompt, indicating that the prompt should be played each time the Insider property is equal to true. (This property will be set by the application when it calls the resource.)
7. Click OK to save the changes.
8. Insert a second prompt, giving it a descriptive name such as "outsiderPrompt".
9. Open the Prompt Properties dialogue and set the prompt's type to Simple.
10. Enter the filename and text to be played to outsiders. For example:

File:
OutsidePrompt.wav

Text:
You have made an invalid selection. If you need more instructions, please say help.

11. Under the two Properties text fields, click on the New button to add a new Property.
12. In the text field, enter insider=false.
13. Click OK to save the changes.

After both prompts have been added to the resource, double click on the Menu_Error resource on the main screen and, as illustrated in Figure 17, a tree-representation of the prompt group 270 can be viewed. The tree-view feature provides a useful way for seeing the structure of a prompt group.
•Valid Prompt Properties
The name of a Property assigned for a prompt can be any word the developer chooses except the reserved words, such as "File," "Text," and "Expression." The value of each property, similarly, can be a boolean, integer, or string. The Menu_Error group uses boolean values (true/false), but strings could be used as well. For example, a similar prompt group could be created with status=""insider"" or status=""outsider"". The data type used for Property values will likely depend on how the resource is called from the application.

•Calling Resources with Multiple Prompts
To use a resource with multiple prompts, the developer should call it from within another resource and set the Property to the desired value. To call the Menu_Error resource, the developer would need to first create an Expression-type resource that calls Menu_Error and sets the Insider Property to true or false. This call (defined in the Expression field of the calling resource) has the following syntax:
Expression:
<resource>:[<Property>=<value>]
To call the Menu_Error resource, for example, the developer could create a prompt group resource named Choose_Error with the following settings:

P1
Type=Expression
Expression=Menu_Error:[Insider=@1]
When a menu-selection error occurred, the application would, in turn, use the following code to play the dynamic error message for a user who was an employee:

vavm.PlayPrompt "Choose_Error(@1)", true

When this code is executed, the Choose_Error resource will be invoked and passed an input parameter with the value true. Choose_Error will, in turn, set Menu_Error's Insider Property to the input value (true) and invoke the prompt group.
•Using Scripts to Assign Property Values
When calling another resource, an expression can also use a piece of scripting code to assign the value to the resource's Property. When doing so, the expression statement uses the following format:

Expression:
<resource>:[<Property>=<%Script_Expression%>]

Before the resource is called, the system will evaluate the <%Script_Expression%> and set the resource's Property to the returned value.
For example, the expression in the Choose_Error resource could call Menu_Error as follows:

P1
Type=Expression
Expression=Menu_Error:[Insider=<%cstr(@1)=""employee""%>]
When a menu-selection error occurred for an inside user, the application would use the following code to play the dynamic error message:

vavm.PlayPrompt "Choose_Error(@1)", "employee"
When this code is executed, the <%cstr(@1)=""employee""%> code will evaluate to true, which will be passed as an input parameter to Menu_Error. If Choose_Error were called with any parameter other than "employee" (e.g. "customer" or "visitor"), Insider would be set to false. The VBScript cstr() function is necessary because arguments to resources are passed as Variants. cstr() guarantees that the Variant is translated to a string value correctly for the comparison with "employee."
•How the Prompt Group Selects Prompts
When a prompt group is invoked, it uses the custom-defined properties to test each prompt and determine which one should be returned for output. To do so, it steps through its list of prompts, evaluates the properties for each prompt, and returns the first prompt whose properties all evaluate to true.
For example, when the Menu_Error prompt group is called with the Insider property set to false, the prompt group first checks "insiderPrompt." Because it has insider=true defined as its property (and Insider is equal to false), the prompt is not returned. The prompt group next checks "outsiderPrompt". Its only property is insider=false, which evaluates correctly, so the sound file "OutsidePrompt.wav" is played.

•Selecting Prompts with Multiple Properties
While the Menu_Error example is straightforward, the situation is more complicated when a prompt group has more than two prompts and each prompt has multiple Properties defined. Consider, for example, the following prompt group named Message_Report:

P1
Type=Simple
Text="She has one message"
Filename=
Gender=""female""
Number=""singular""

P2
Type=Simple
Text="She has two messages"
Filename=
Gender=""female""
Number=""plural""

P3
Type=Simple
Text="He has one message"
Filename=
Gender=""male""
Number=""singular""

P4
Type=Simple
Text="He has two messages"
Filename=
Gender=""male""

P5
Type=Simple
Text="The person has messages"
Filename=
Such a prompt group might be called from another resource with the following expression:

Expression=Message_Report:[Gender=@1][Number=@2]

If the resource were called with "male" and "singular" as parameters, Message_Report would be invoked and would step through each of its prompts looking for one whose Properties matched. P1 would be rejected because Gender was not "female"; P2 would be rejected because neither Gender nor Number matched. P3, since both its Properties evaluated true, would be selected and returned. Neither P4 nor P5 would be evaluated, since the prompt group stops searching once a matching prompt is found.

In the example above, a Number Property was not defined for P4, and neither a Number nor a Gender was defined for P5. These prompts act as default cases. P4 would be returned anytime Gender was equal to "male" and Number was not "singular" (P3 would intercept all Number=""singular"" cases). P5 would be returned in any other circumstances, such as when Gender was set to "unknown."
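The selection walk just described can be sketched in Python. The table of prompts mirrors Message_Report; the matching rule (return the first prompt whose defined Properties all match, with undefined Properties acting as wildcards) is an illustrative reconstruction of the behavior described, not the platform's actual code.

```python
# Sketch of sequential prompt selection: step through the prompts in order and
# return the first one whose defined Properties all match the values supplied
# by the caller. Prompts that omit a Property act as default cases.

MESSAGE_REPORT = [
    {"Text": "She has one message",  "Gender": "female", "Number": "singular"},
    {"Text": "She has two messages", "Gender": "female", "Number": "plural"},
    {"Text": "He has one message",   "Gender": "male",   "Number": "singular"},
    {"Text": "He has two messages",  "Gender": "male"},                # P4: no Number
    {"Text": "The person has messages"},                              # P5: no Properties
]

def select_prompt(prompts, **props):
    for prompt in prompts:
        conditions = {k: v for k, v in prompt.items() if k != "Text"}
        if all(props.get(k) == v for k, v in conditions.items()):
            return prompt["Text"]   # first full match wins; later prompts are skipped
    return None
```

Calling select_prompt with Gender="male" and Number="singular" returns P3's text; "male"/"plural" falls through to the P4 default, and an unknown gender falls through to P5.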
•Using Random Selection Mode
In its default configuration, a prompt group uses a Sequential mode to select which prompt to return. That is, it starts with the first prompt in the list and proceeds in order to the last. In most cases, this is the desired behavior, but a developer may wish to have the prompts tested randomly.

To change a prompt group's selection mode, use the following procedure:
1. Right click on the prompt group in the Resources pane on the main menu.
2. From the drop-down menu, select Prompt Group Properties. As shown in Figure 18, the Prompt Group Properties 272 dialogue box will open.
3. In the Selection combo box 274, choose "Random."
4. Press OK to put the changes into effect.
In Random selection mode, the prompt group checks its prompts' properties in a random order until a true condition is found. This mode is useful for creating resources that provide variety in their output.
For example, a resource that is played when a user makes an invalid menu selection might define several possible prompts that could be returned. Rather than saying "Invalid choice" each time, the application could say variously "that choice is incorrect," "you selected an invalid option," and "that choice is not available." The system would randomly select which prompt to use, bringing variety to the user's interaction with the application. Such a prompt group might be defined as follows:
P1
Type=Simple
Text="That choice is incorrect"
Filename=
P2
Type=Simple
Text="You selected an invalid option"
Filename=
P3
Type=Simple
Text="That choice is not available"
Filename=
In this case, no custom Properties were defined for the prompts. It should not matter which prompt is played so long as they are played in a random order. If desired, a developer can combine random selection mode with Properties, creating a prompt group that randomly searches its list of prompts until a valid one is found.
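Random selection mode can be sketched the same way. This hedged Python analogue shuffles the prompt list before testing Properties, so any matching prompt may be returned; with no Properties defined, every prompt is eligible:

```python
import random

def select_prompt_random(prompts, params, rng=random):
    """Test prompts in random order; return the text of the first match."""
    order = list(prompts)
    rng.shuffle(order)
    for prompt in order:
        props = prompt.get("properties", {})
        if all(params.get(k) == v for k, v in props.items()):
            return prompt["text"]
    return None

invalid_choice = [
    {"text": "That choice is incorrect", "properties": {}},
    {"text": "You selected an invalid option", "properties": {}},
    {"text": "That choice is not available", "properties": {}},
]

# Any of the three texts may be returned on a given call.
print(select_prompt_random(invalid_choice, {}))
```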
•Changing Prompt Order In the Prompt Group Properties dialogue, the Prompts text area contains a list of the prompts that have been defined in the prompt group and the order in which they will be evaluated when the group is in Sequential Selection mode. To change the order of a prompt, click on its name to highlight it, then click the Up or Down buttons to move the prompt through the list. Expression and Prompt Properties can be used to create sophisticated output.
For example, different prompts can be selected to create resources that use singular and plural nouns appropriately. The check_mail resource used as an example above would be adequate as long as there was more than one message in the user's inbox. If the box contained only a single new message, however, the returned prompt would sound awkward: "You have one messages in your box, one of which are new." This problem can be solved by using nested resources and test conditions. The recommended resources would appear as follows:
resource message_select {
Type=Prompt
out1 {
Text="message"
singular=true
}
out2 {
Text="messages"
singular=false
}
}
resource be_select {
Type=Prompt
out1 {
Text="is"
singular=true
}
out2 {
Text="are"
singular=false
}
}
resource num_messages {
Type=Prompt
P1 {
Expression="%string(""You have"") %num(@1) message_select:[singular=<%cint(@1)=1%>] %string(""in your box"") %num(@2) %string(""of which"") be_select:[singular=<%cint(@2)=1%>] %string(""unread"")"
}
}
A developer would use the following code to call these resources from an application: vavm.PlayPrompt "num_messages(@1, @2)", num, num_unread
The variable num would be set to the number of messages in the user's mailbox and num_unread set to the number of messages that are unread.
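The effect of the nested resources and %cint test conditions can be illustrated with a small Python analogue. The function name and string formatting here are inventions for illustration, not part of the VA resource language:

```python
# message_select and be_select each pick singular or plural wording based on
# whether the corresponding count equals 1, mirroring the nested resources.

def message_report(num, num_unread):
    msg = "message" if num == 1 else "messages"       # message_select
    be = "is" if num_unread == 1 else "are"           # be_select
    return f"You have {num} {msg} in your box, {num_unread} of which {be} unread"

print(message_report(5, 2))  # You have 5 messages in your box, 2 of which are unread
print(message_report(1, 1))  # You have 1 message in your box, 1 of which is unread
```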
Phrase List Modules
A phrase list module allows the definition and importation of sections of a grammar that will be accessible from within phrase lists anywhere in the application, that is, in the phrase lists for a discourse or topic. Using a phrase list module saves a developer from having to repeatedly define the same phrases in multiple lists.
•Defining Phrase List Modules
Like Prompt Groups, a Phrase List Module can be defined only globally. To create a new phrase list module, perform the following steps: 1. Open the main window for the application. 2. Right click inside the "Resources" pane. A drop-down menu will open.
3. From the drop down menu, select "Insert New Phrase List Module." A dialogue box will open, prompting for the name of the new module.
4. In the dialogue box, enter the name for the new phrase list module and click OK. The new module will be added to the list in the Resources pane. 5. Double click on the name of the new module in the Resources list. A new window will open containing the text of the phrase list. (Because the module has just been created, it will be empty when first opened.) 6. Edit the phrase list to add the entries to be accessible from other grammars in the application. •Format of a Phrase List Module
The format of a phrase list module is identical to that for regular phrase lists in a discourse or topic. In addition to specifying recognized phrases and rec object tags, a developer can define rules, use typed variables, and implement dynamic grammars. For example, one use of a phrase list is to recognize affirmative and negative voice commands from the user. In a large application, multiple discourses and topics may ask the user to answer questions by saying "yes" or "no." Without a phrase list module, the developer would have to define the phrases "yes" and "no" in the phrase list for every discourse or topic that listened for such responses. If the developer wished to make his or her application flexible and recognize variant responses such as "yeah" and "okay," the amount of coding required would increase substantially.
By using a phrase list module, the developer can perform the work of defining acceptable "yes" and "no" responses only once and then use the same grammar code repeatedly throughout the application. Such a phrase list module might be called YesNoEPL, and its text could appear as follows:
_YesNo_Yes := _YesNo_Yes_Words <Answer="Yes">.
_YesNo_No := _YesNo_No_Words <Answer="No">.
_YesNo_Yes_Words :=
( (yes ?(_YesNo_It_Is | _YesNo_Thats_Right))
| _YesNo_It_Is
| _YesNo_Thats_Right
| yup
| yeah
| okay
| sure
| (you got it) ).
_YesNo_It_Is := (it ?(sure | certainly) is).
_YesNo_Thats_Right := (?(that's | that_is) (right | correct)).
_YesNo_No_Words :=
( (no ?(_YesNo_It_Is_Not | _YesNo_Thats_Wrong | way))
| nope
| (absolutely not) ).
_YesNo_It_Is_Not := ((it isn't) | (it's not) | (it is not)).
_YesNo_Thats_Wrong := (?(that's | that_is) (wrong | incorrect | (not (correct | right)))).
As defined by this phrase list module, the system will recognize a positive response and set the rec object's Answer tag to "yes" no matter whether the caller says "yes", "yup", "yes that's right", or any of several other affirmatives. Similarly, the system will recognize a negative response no matter whether the caller says "no",
"nope", or "that's wrong" .
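The effect of the YesNoEPL grammar can be approximated in Python. The phrase sets below are a hand-picked subset of the utterances the grammar accepts, not a full parser for it:

```python
# Map spoken variants onto a single Answer tag, as the grammar's
# <Answer="Yes"> / <Answer="No"> annotations do.

YES_PHRASES = {"yes", "yup", "yeah", "okay", "sure", "you got it",
               "it is", "it sure is", "that's right", "that is correct"}
NO_PHRASES = {"no", "nope", "no way", "absolutely not",
              "it isn't", "it's not", "that's wrong", "that is incorrect"}

def classify_answer(utterance):
    """Return "Yes", "No", or None for an unrecognized utterance."""
    text = utterance.lower().strip()
    if text in YES_PHRASES or text.startswith("yes "):
        return "Yes"
    if text in NO_PHRASES or text.startswith("no "):
        return "No"
    return None

print(classify_answer("yes that's right"))  # Yes
print(classify_answer("nope"))              # No
print(classify_answer("maybe"))             # None
```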
•Referencing Phrase List Modules from Within a Discourse Once defined for an application, a phrase list module can be inserted into the phrase list of a topic simply by adding the name of the module followed by a period. To insert the YesNoEPL defined in the previous section, the developer would simply open the phrase list for the desired topic and add the following line:
YesNoEPL.
Thus, the topic's phrase list, in addition to whatever other phrases and rules the developer wishes to define, will recognize all the variants of "yes" and "no" defined in the YesNoEPL module.
•Inserting a Phrase List Module from A File
In addition to creating new phrase list modules, a developer can also import an existing module from a file. This allows modules that were created for other VA applications to be used in a new application. After developing a few full-featured applications, in fact, a VA developer should have a considerable library of phrase list modules on hand, allowing him or her to reuse the same sets of common responses such as affirmatives/negatives, email commands, and menu commands.
To be inserted into an application, a phrase list module should be stored in a .vaplm file. These files are generated by the VA Studio compiler each time a project is built. In most instances, there will be only one .vaplm file for each application, and it will share the application's name. The phrase list module file for the VirtualOperator project (VirtualOperator.vasp), for example, could be named VirtualOperator.vaplm.
To import a module from an existing application, perform the following steps:
1. Open the main window for the application.
2. Right click inside the "Resources" pane. A drop-down menu will open. 3. From the drop down menu, select "Insert Phrase List Module from File." A dialogue box will appear prompting for the name of the module and its filename.
4. In the dialogue box, type the name for the phrase list module. (This name will be used in the Resources list and in topics' phrase lists to reference the module.)
5. Type the filename in which the phrase list module is saved, or click on the Browse button to find the file using a standard Open File dialogue.
6. Click OK. The phrase list module will be added to the Resources list.
Once a phrase list module has been inserted from a file, it can be edited to add or remove grammar entries. To do so, double click on the name of the new module in the Resources list. An editor window will open containing the text of the phrase list. Script Modules
Like Phrase List Modules, Script Modules help the developer create applications rapidly. Instead of defining modules that can be used in a topic's phrase list, script modules allow developers to reuse the same piece of scripting code in multiple topics or event handlers.
Several discourses within an application, for example, may need to instantiate a user object and use it to read a caller's information from the VA database. Rather than having to retype the same code in each topic, the developer can implement the code as a public function in a single script module and reference it in each topic Action or event handler that needs the functionality. •Defining a Script Module
The process for defining a script module is very similar to that for a phrase list module: 1. Open the main window for the application.
2. Right click inside the "Resources" pane. A drop-down menu will open. 3. From the drop down menu, select "Insert New Script Module." A dialogue box will open, prompting for the name of the new module.
4. In the dialogue box, enter the name for the new script module and click OK. The new module will be added to the list in the Resources pane. 5. Double click on the name of the new module in the Resources list. A new window will open containing the text of the script. (Because the script has just been created, it will be empty when first opened.) 6. Edit the script to add the code to be accessible from topics and event handlers throughout the application. •Format of a Script Module
A script module is analogous to a C/C++ header file or an imported Java class. Within a script module, constants and public functions are defined and made available outside the module by using the vavm's Export API method. Once defined, these constants and functions can be referenced in the code of any topic or event handler that imports them using the vavm's External method.
When defining a script module, include the following elements:
• A public constant called MODULE_NAME that is set to the name of the script module
• In the module's main block, a call to vavm's Export method with MODULE_NAME as a parameter
• Code for the public subroutines and functions that will be called by the topics and event handlers that import the module.
The syntax of the vavm's Export method is as follows: vavm.Export(ModuleName) ModuleName is a string containing the module's name. Although ModuleName can be defined as a literal string, by convention VA applications define the module's name in a constant string called MODULE_NAME and use that constant in the Export method call.
For example, one type of functionality that might be repeated in several topics in an application is the sending of an email message using the user's SMTP account.
This functionality involves dozens of lines of code, and by isolating it in a script module the developer can reuse the same code in multiple topics. The example below shows a script module named "SendingEmail" that defines a public function (SendMailMessage) that will be used by all the topics in the application that need to send an email message.
Public Const MODULE_NAME = "SendingEmail"
' Properties
Private eMailServer
Private theUserName
Private toUserAddress
Private mailSubject
' Main
' Export the interface
vavm.Export(MODULE_NAME)
' Public Interface
Public Sub SendMailMessage(userName, address, subject, attachment)
    eMailServer = "mail.conita.com"
    vavm.SignalEvent 0, 3, "eMail Server: " & eMailServer
    theUserName = userName
    vavm.SignalEvent 0, 3, "userName: " & theUserName
    toUserAddress = address
    vavm.SignalEvent 0, 3, "toUserAddress: " & toUserAddress
    mailSubject = subject
    vavm.SignalEvent 0, 3, "Subject: " & mailSubject
    vavm.SignalEvent 0, 3, "Creating SMTPDO.Session.1 Object"
    Set oSMTP = CreateObject("SMTPDO.Session.1")
    vavm.SignalEvent 0, 3, "Opening Connection to Mail Server"
    oSMTP.Connect eMailServer
    vavm.SignalEvent 0, 3, "Attaching the Problem WAV File"
    oSMTP.Attach attachment
    vavm.SignalEvent 0, 3, "Sending Mail"
    oSMTP.Send theUserName, "ConitaCSR@conita.com", "ConitaCSR@conita.com", toUserAddress, 1, mailSubject, "EMAIL", "This is a test, an included file attached."
    vavm.WaitForPendingRequests
    vavm.SignalEvent 0, 3, "Closing Connection to Mail Server"
    oSMTP.Disconnect
    Set oSMTP = Nothing
    vavm.SignalEvent 0, 3, "Ending SendMail Sub"
End Sub
•Referencing a Script Module
Once a script module is defined in the application's Resources, the module can be used within the code of any topic or event handler. To do so, use the vavm's External API method to import the module. The syntax of External is as follows: vavm.External(ModuleName)
ModuleName is a string containing the name of the script module to be imported. Once External has been called, the script module's public subroutines and functions can be called just as if they were a part of the topic or event handler's code. In effect, the External method defines an object with the name of the module; any calls to that module's subroutines or functions will be made as if the developer were calling the methods of an object.
For example, the code from a topic's Action script that uses the SendingEmail script module defined previously is shown below:
Topic: EmergencyNotification
Trigger: rec.Action="Notify"
Action:
vavm.TTSString "Please say the message you would like me to send to the administrator."
vavm.RecordWaveform "RecordedMessage"
vavm.TTSString "I'm sending the email notification now."
vavm.External("SendingEmail")
SendingEmail.SendMailMessage "Administrator", "admin@conita.com", "A sound file containing the problem is attached.", RecordedMessage
vavm.TTSString "The message has been sent."
•External Modules and Namespaces Once a script module has been imported using the External API call, its variables, subroutines, and functions are available anywhere within the namespace in which it was created, just as if it had been instantiated as an object. For this reason, if External is used to load a module in one topic, the module will be available in any other topic in the application. A developer can therefore use an External API call in the ApplicationEntry event handler to make a script module's functions available globally throughout the application.
Advanced Agent Assistant Creation
The User Object (VAUserDO) •Instantiating a User Object
To instantiate a user object, a developer should use the vavm's CreateUserObj API call. This call returns a new User object with uninitialized values.
•Methods The method: void SetUser (String IdentString, String IdentMethod) sets the user contained in the object based on the criteria specified by IdentString. The method used depends on the value set in the IdentMethod parameter. The possible values for IdentMethod correspond to fields in the Users table of the VA database:
• "UserName"
• "FullName"
• "CallerID"
• "DNISNumber"
• "Namecode"
If the value in IdentString is found in the database field specified by IdentMethod, that user's information will be loaded into the User object. If no match is found, an error will be thrown.
The method: void Authenticate (String AuthString, String AuthMethod) authenticates the user contained in the object based on the criteria specified in the
AuthMethod parameter. The possible values for AuthMethod correspond to the following fields in the VA database:
• "Password": A password spoken and recognized by the speech engine
• "Passcode": A multi-digit passcode typed by the user on the phone's keypad
If the value in AuthString matches the value in the specified field for the user, then the user object is considered authenticated. If the AuthString does not match, an error is thrown. Until a user object is authenticated, it will return only Public values to the calling program.
The method: void setValue (String Name, String Value [, String Privilege] ) sets a keyword/value pair in the User object's memory cache. If the keyword Name already exists in the cache, its value will be updated to that specified by Value. The optional parameter Privilege can be used to specify whether the privilege for the newly created keyword/value pair should be "Public" or "Private." A "Public" value (such as a telephone number) is accessible to anyone who wishes it. A "Private" value (such as a PIN or account number) can be read only by a User object that has been successfully authenticated. Calling the setValue method will change the keyword/value pair only in the memory cache, not in the actual database record. To change a user's database record, the CommitValue or CommitValues method must be called.
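The cache-then-commit behavior of setValue, GetValue, and CommitValues, including the Public/Private privilege check, can be sketched in Python. This is an illustrative model, not the VAUserDO implementation; the dictionary standing in for the database is an invention for the example:

```python
class User:
    """Toy model: values are cached with a privilege and committed separately."""

    def __init__(self):
        self._cache = {}          # keyword -> (value, privilege)
        self._db = {}             # stands in for the user's database record
        self.authenticated = False

    def set_value(self, name, value, privilege="Public"):
        # Changes only the memory cache, never the database record directly.
        self._cache[name] = (value, privilege)

    def get_value(self, name):
        value, privilege = self._cache[name]
        if privilege == "Private" and not self.authenticated:
            raise PermissionError("authenticate before reading Private values")
        return value

    def commit_values(self):
        # Writes every cached keyword/value pair to the "database".
        for name, (value, privilege) in self._cache.items():
            if privilege == "Private" and not self.authenticated:
                raise PermissionError("authenticate before committing Private values")
            self._db[name] = value

u = User()
u.set_value("Phone", "555-0100")            # Public by default
u.set_value("PIN", "1234", "Private")
print(u.get_value("Phone"))                 # 555-0100
# u.get_value("PIN") would raise PermissionError until authenticated
u.authenticated = True
print(u.get_value("PIN"))                   # 1234
u.commit_values()                           # now the record is updated
```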
The method:
String GetValue (String Name)
returns a string containing the value for the user keyword specified by Name. Before a User object has been authenticated, its GetValue method will return only keywords with a Public privilege. To retrieve Private values, the User object must first successfully execute its Authenticate method.
The method: void DeleteValue (String Name) deletes the keyword/value pair specified by Name. To delete a keyword from a user's database record, the User object must be successfully authenticated.
The method: void CommitValue (BSTR Name) writes the value contained in the user keyword specified by Name to the database. For a value with "Private" privileges to be committed to the database, the User object must have been successfully authenticated.
The method: void CommitValues () writes all the keyword/values contained in the User object's memory cache to the database. For values with "Private" privileges to be committed to the database, the User object must have been successfully authenticated.
The method: void RefreshValue (String Name) reloads the value of the keyword specified by Name from the database into the User object's memory cache.
The method: void RefreshValues () reloads all user data values from the database into the User object's memory cache.
•Accessing User Object Data
In addition to using the GetValue method call, a developer can also treat a keyword as a member variable of the User object. The example below uses both techniques to retrieve the user's email name and email password from the User object:
On Error Resume Next
set UserInfo=vavm.CreateUserObj
UserInfo.SetUser inputName, "UserName"
UserInfo.Authenticate inputPassword, "Password"
if err then
    MsgBox("Could not create user object")
else
    emailusername = UserInfo.GetValue("EmailUserName")
    emailpassword = UserInfo.EmailPassword
end if
Using the SMTP Object (SMTPDO)
The SMTP Object allows VA applications to interface with a mail server and send email messages for the user. •Methods
The method: void Connect (String ServerName) establishes a connection to an SMTP server. This method must be called before a message can be sent. ServerName specifies the name of the SMTP server with which to connect and can be either in DNS name format (e.g. mail.bigmail.com) or IP Address format (e.g. 123.45.231.2).
The method: void Disconnect ( ) disconnects the object from the SMTP server.
The method: void attach (String Filename) attaches the document specified by Filename to the outgoing email message. This method can be called multiple times to attach more than one document to a message, and it should be called before the send method. When the attach method is called, the SMTP object will cache the filename internally. When the send method is called, the object will attach each of the documents in its cache to the message before sending it to the SMTP server.
The method: void send (String FromName, String FromAddress, String ReplyTo, String To, String Importance, String Subject, String VAMessageType, String Text) sends an email message through the SMTP server. The parameters for send are listed in the table below. All are string values.
Table 3: SMTP Object Send Method Parameters
FromName: A string containing the name to be displayed in the From field of the message.
FromAddress: The email address from which the message will be sent.
ReplyTo: The address that should be used by the recipient to reply to the message (in some cases, this may differ from the FromAddress).
To: The email address to which the message should be sent.
Importance: A string indicating the importance of the message. Standard SMTP values for Importance are "high", "low", and "normal".
Subject: The subject of the message.
VAMessageType: The type of the message.
Text: The text of the message to be sent.
•Using the SMTP Object (VBScript)
To use the SMTP object within a VBScript application, it should first be instantiated using the CreateObject function. The syntax for this function call is as follows: set <object_variable_name>=CreateObject("SMTPDO.Session.X")
Where X is an integer determined by the user. This call will invoke a new SMTP Object Session with the ID specified by X.
The following example shows how to use the SMTP object to send a basic email message:
set mySMTP=CreateObject("SMTPDO.Session.1")
mySMTP.Connect "smtp.bigmail.com"
messageText="This is the text of the sample message."
mySMTP.Send "John Smith", "jsmith@bigmail.com", "jsmith@bigmail.com", "bbrown@destination.com", "normal", "Test Message", "MessageType", messageText
mySMTP.Disconnect
set mySMTP = Nothing
The Recognition Object
The Recognition object is used by the VA virtual machine to return the results of a speech recognition operation to the VA application. For example, when the user speaks into the telephone, the system recognizes the speech and compares the results with the phrase list for the active grammar. If a match for the recognized speech is found in the phrase list, the speech recognition data is stored in the rec Recognition object, which is passed back to the application. A Recognition object is also returned by the vavm API RecognizeUtterance, which allows a developer to use the speech recognition facilities to recognize a sound file and translate it to text.
A Recognition object has no methods. Three member variables (listed in the table below) are used to store the speech recognition results. In addition, the user can define the custom tags that are set within grammars when particular phrases are recognized.
Table 4: Recognition Object Member Variables
confidence (Integer): A value from 0 to 100 indicating the relative level of confidence that the speech was correctly recognized.
resultstring (String): The text string generated by the recognition engine when the speech was recognized.
utterance (String): The name of the file in which the utterance was recorded.
•Using the Recognition Object
The example below uses the vavm.RecognizeUtterance API call to record a selection of the user's speech, recognize it, and read the text back:
vavm.TTSString "Please say the name of the department you would like to contact"
vavm.RecordWaveForm "department.wav", "RIFF"
vavm.TTSString "Thank you."
result=vavm.RecognizeUtterance("Departments", "department.wav")
if (result.confidence < 50) then
    vavm.TTSString "I'm sorry. I did not understand the name you gave."
else
    vavm.TTSString "Transferring you to " + result.resultstring
    Transfer.department=result.resultstring
    vavm.SelectDiscourse "Transfer"
end if
Using the Security Object (SecurityDO)
The Security object can be used by a VA to encrypt and decrypt text-file communications (such as email). It supports both standard encryption and high-security 64-bit algorithms.
•Methods
The method:
String Encrypt (String Text, String Key) returns a string containing the encrypted version of Text as encoded using the specified Key.
The method: String Decrypt (String EncryptedText, String Key) returns a string containing the decoded version of EncryptedText, as decoded using the specified Key.
The method:
String Base64Encrypt (String Text, String Key) returns a string containing the encrypted version of Text as encoded using the specified Key with Base-64 Encryption.
The method:
String Base64Decrypt (String EncryptedText, String Key) returns a string containing the decoded version of EncryptedText, as decoded using the specified Key with Base-64 Encryption.
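The patent does not disclose the cipher behind Encrypt/Decrypt, so the sketch below substitutes a toy XOR keystream with Base-64 encoding purely to illustrate the call pattern of a key-based encrypt/decrypt round trip. It is not secure and is not the SecurityDO algorithm:

```python
import base64
from itertools import cycle

def encrypt(text, key):
    """XOR the text with a repeating key, then Base-64 encode the result."""
    xored = bytes(b ^ k for b, k in zip(text.encode(), cycle(key.encode())))
    return base64.b64encode(xored).decode()

def decrypt(encrypted_text, key):
    """Reverse encrypt(): Base-64 decode, then XOR with the same key."""
    xored = base64.b64decode(encrypted_text)
    plain = bytes(b ^ k for b, k in zip(xored, cycle(key.encode())))
    return plain.decode()

secret = encrypt("meet at noon", "k3y")
print(decrypt(secret, "k3y"))  # meet at noon
```

As with the SecurityDO methods, the same key must be supplied to both calls for the round trip to succeed.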
Using the Audio Object (AudioDO)
The Audio object can be used to perform several audio utility functions for VA applications. These functions include the following:
• Generating a pron (pronunciation) string from a piece of text • Converting sound files from Sphere to Riff format (and vice versa)
• Concatenating sound files •Methods
•PronFactory Interface
The method: void Connect (String ServerName, Integer Port) connects to port Port of the speech recognition server specified by ServerName.
The method: void Disconnect () disconnects the Audio object from the server.
The method:
String GeneratePronFromName (String FirstName, String LastName)
concatenates FirstName and LastName and returns a Pron string. This method is useful for functions such as building dynamic address books.
The method:
String GeneratePron (String Text) returns a Pron string that can be used to recognize the specified text.
•Converter Interface
The method:
void SphereToRiff (String SphereFileName, String RiffFileName, Boolean ConvertTo8Bit)
converts the Sphere file contained in SphereFileName to Riff format and saves it in the file RiffFileName. If ConvertTo8Bit is true, the file will be converted to an 8-bit Riff format.
The method: void RiffToSphere (String RiffFileName, String SphereFileName) converts the Riff file contained in RiffFileName to Sphere format and saves it in the file SphereFileName.
•Concatenate Interface
The method: void Concatenate (String InputAFileName, String InputBFileName, String OutputFileName) reads the sound files in InputAFileName and InputBFileName, concatenates them, and stores the result in OutputFileName.
Using Dynamic Grammars
In one embodiment, VA applications use static grammar definitions. The phrase lists defined for these applications remain the same each time the application executes.
In an alternative embodiment, a developer will need to use grammars that change according to who is using the application or when the application is being used. A developer, for example, may wish to implement an address book feature that allows users to call in and request the addresses and/or telephone numbers for contacts.
For this type of application, static grammars are insufficient. For the application to process the request, "give me the phone number for John Brown," it would have to have "John Brown" in its phrase list. To function properly as an address book, in fact, each contact name would have to be added to a static phrase list. This static design would require that the application be recompiled each time a new contact was added to the address book.
Dynamic grammars provide a way around these problems. They allow an application to import grammar definitions both at start-up and on the fly during the course of its execution. Using dynamic grammars, an address book application can wait until a call is received and, once the user is identified, load a list of recognizable contact names from that user's database records.
•Understanding Pron Strings
The key components in dynamic grammars are pron strings (short for
"pronunciation strings"). These strings contain the data that instructs the speech recognition engine how to identify particular phrases. Although developers of a VA application will probably never have to construct a pron string manually, it is useful to understand the information they contain.
The standard format for a pron string has four separate elements: word_name pronunciation recognition_probability NL_statement
These four elements represent the following information:
• word_name: the text representation of the word to be recognized.
• pronunciation: a phonetic string indicating how the word is pronounced.
• recognition_probability: the probability that the word will be recognized.
• NL_statement: a command string that tells the speech recognition engine what to do when the word is recognized.
The pron string for the name "George" would appear as follows: george "jh ao r jh" 1 "{return(george)}"
This string instructs the speech recognition engine that when the phonetic pattern "jh ao r jh" is heard, it should return the text string "george" to the VA virtual machine.
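Assembling the four elements into a pron string is simple string formatting. A hypothetical Python helper (the function name is an invention; only the output format comes from the example above):

```python
def make_pron_string(word, phonemes, probability=1):
    """Build a pron string: word_name pronunciation recognition_probability NL_statement."""
    return f'{word} "{phonemes}" {probability} "{{return({word})}}"'

print(make_pron_string("george", "jh ao r jh"))
# george "jh ao r jh" 1 "{return(george)}"
```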
•Creating Pron Strings
It would be a tedious process if a VA application developer had to write pron strings manually for each non-standard word (such as a proper name) that had to be recognized. Fortunately, there are VA object methods that can be used to construct pron strings automatically.
One of the easiest ways to generate a pron string for a particular name is to use the AudioDO object's PronFactory interface. This interface's methods allow a developer to submit a person's name and receive back a formatted pron string that is ready for insertion into a dynamic grammar. The AudioDO object makes calls to a PronFactory server for the actual translation of the text to a pron string. This PronFactory server runs as a Windows NT service (and possibly on a separate machine from the VA Manager), so the AudioDO's Connect method should be called before calling the methods that generate pron strings. The Connect method has the following syntax: Connect (String ServerName, Integer Port)
Where ServerName is the name of the server on which the PronFactory is running and Port is the port on which it is listening for requests.
Once the AudioDO object is connected to the Pron Server, the GeneratePron method can be used to generate a new pron string for a word. The method has the following syntax:
String GeneratePron (String Text)
•Generating Pron Strings for Names Because one of the most common uses of dynamic grammars is to allow a VA application to recognize names, the AudioDO includes a GeneratePronFromName method that converts a first and last name into a single pron string:
String GeneratePronFromName (String FirstName, String LastName)
When this method is called, it will concatenate FirstName and LastName and generate a pron string for the full name. This pron string is actually more complicated than the simple example given for "George" above. The pron strings generated by GeneratePronFromName will recognize several different variant pronunciations of the name and also contain the commands necessary to set two variables, FirstName and LastName, in the rec object whenever the name is spoken. (Using the rec object with dynamic grammars is discussed later in this chapter.) For example, the VBScript code below instantiates an AudioDO object and uses it to generate a pron string for the name "Michael Harrison." In this case, the PronFactory server is running on the same machine as the application and is listening for requests on port 9999:
set myAudioObject=CreateObject("AudioDO.PronFactory.1")
myAudioObject.Connect "localhost", 9999
newPronName=myAudioObject.GeneratePronFromName("Michael", "Harrison")
•Adding Dynamic Entries to a Phrase List
Dynamic entries can be added to a grammar by using the Dynamic typed variable. Dynamic essentially creates a placeholder in the grammar that will be filled by dynamically generated entries at run time. The format of the typed variable is as follows:
[Dynamic, Name]
Name is the name used as an identifier for the dynamic grammar.
•Setting Rec Object Tags
If the AudioDO object's GeneratePronFromName method is used, the returned pron string will automatically contain instructions that set tags in the rec object. When the name in the pron string is recognized by the dynamic grammar, a rec object will be returned to the application with its FirstName and LastName tags set to the first and last name of the person identified. For example, the following illustrates how the Dynamic typed variable might be used. In creating an address book, the application needs to recognize the phrase "give me the address for" plus the desired contact name. The contact name, however, is unknown at the time of writing the application (since the user will periodically add names to and remove names from his database-stored address book), so the Dynamic placeholder is used instead of the contact name in the phrase list:
(give me the address for [Dynamic, Contact_Name]) <Action="Query_Address">
Thus, if the user says "give me the address for" plus a name that has been loaded into the Contact_Name dynamic grammar, a rec object will be returned with an Action tag set to "Query_Address" and FirstName and LastName tags set to the first and last name of the identified person.
The functionality for looking up the address could be handled by a task with the following trigger and action:
Trigger: rec.Action = "Query_Address"
Action:
TTSString "Looking up the name for " & rec.Contact_Name
' Add code here to look up address using SQL query
if found then
TTSString "The address for " & rec.FirstName & " " & rec.LastName & " is " & address
else
TTSString "I could not find " & rec.FirstName & " " & rec.LastName & " in your address book."
•Adding Entries to the Dynamic Grammar
Once a Dynamic typed variable has been added to a phrase list, new entries can be added to the dynamic grammar during application execution by using the vavm's LoadDynamicGrammarEntry API call. The format of this call is as follows:
LoadDynamicGrammarEntry (String GrammarName, String Entry)
GrammarName represents the name of the dynamic grammar to which the entry should be added. Entry is a pron string for the word that is being dynamically added.
To add the name "Michael Harrison" to the simple address book example used above, the following code could be added somewhere within the application (such as a DiscourseEntry event handler):
set myAudioObject=CreateObject("AudioDO.PronFactory.1")
myAudioObject.Connect "localhost",9999
newPronName=myAudioObject.GeneratePronFromName("Michael","Harrison")
myAudioObject.Disconnect
set myAudioObject=nothing
vavm.LoadDynamicGrammarEntry "Contact_Name", newPronName
This type of example is impractical because it hardwires the name Michael Harrison into the application and thus defeats the purpose of a dynamic grammar. A truly dynamic way to implement the address book would be to listen for a user's call, then read in a list of names from that user's contact database and add them to the dynamic grammar.
Because a pron string is a regular ASCII text string, it can also be stored in a database. By storing the pron string in the database, it can be loaded directly into the dynamic grammar when a user connection is made, thereby avoiding the overhead of converting contact names to pron strings each time a call is received by the application.
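As a sketch of this database-driven approach, a DiscourseEntry event handler might load the stored pron strings for the current user's contacts directly into the dynamic grammar. The data source name, table, and column names below are purely illustrative; a real application would use its own schema and its own means of obtaining the user's ID:

```vbscript
' Hypothetical sketch: load pre-generated pron strings for the current
' user's contacts into the Contact_Name dynamic grammar. The DSN,
' the "Contacts" table, and the "PronString" column are assumptions.
set oConn = CreateObject("ADODB.Connection")
oConn.Open "DSN=VADatabase"   ' illustrative data source name
set oRS = oConn.Execute("SELECT PronString FROM Contacts WHERE UserID=" & lUserID)
do while not oRS.EOF
    ' GrammarName first, then the pron-string entry, per the
    ' documented LoadDynamicGrammarEntry signature
    vavm.LoadDynamicGrammarEntry "Contact_Name", oRS("PronString")
    oRS.MoveNext
loop
oRS.Close
oConn.Close
```

Because the pron strings were generated and stored in advance, no connection to the Pron Factory server is needed at call time.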
Scripting APIs
In the preferred embodiment, the virtual assistant scripting application programming interfaces (APIs) are as follows:
•AnswerCall
•Syntax vavm.AnswerCall
•Description Answers a call that has been received by the VA Engine. This method is needed only in applications that use a manual call flow, and it should always come directly after a WaitForCall API call.
•See Also WaitForCall
•CallANI
•Syntax
ANIString=vavm.CallANI
•Description Returns a string containing the ANI value for the current call. (The ANI value indicates the phone number from which the call originated.)
•CallDNIS
•Syntax DNISString=vavm. CallDNIS
•Description Returns a string containing the DNIS value for the current call. (For a facility with multiple shared phone lines, the DNIS value indicates the number that was dialed by the caller to reach the business.)
•CallID •Syntax IDString=vavm.CallID
•Description Returns a string containing the unique ID assigned to the call by the VA system. The ID is particularly useful for logging and transcriptions.
•CallPort
•Syntax PortString=vavm.CallPort •Description
Returns a string containing the port on which the current call was received.
•CreateNameSpace •Syntax vavm.CreateNameSpace (String NameSpace)
•Description Creates a new name space within the code module. For more information on name spaces, refer to Chapter 5, "Extending the Basic NA Application."
•Parameters NameSpace: The name of the new namespace.
•DiscourseExit •Syntax vavm. DiscourseExit
•Description Used only within the code in the DiscourseExit event handler, this function forces a change in grammar from that of the current discourse to that of the new discourse.
•EnrollUtterance •Syntax PronString=vavm.EnrollUtterance (String UtteranceFile)
•Description Takes the specified sound file, recognizes it, and returns a pron string that can be loaded dynamically into a grammar. This method is useful for dynamically teaching the assistant new words during the execution of an application.
•Return Values Returns a Pron string.
•Parameters UtteranceFile: The name of the file to be recognized.
•See Also RecognizeUtterance(), LoadDynamicGrammarEntry()
•Export •Syntax vavm.Export (String ModuleName)
•Description When specified within a script module, allows the module's public variables, subroutines, and functions to be referenced from topics and event handlers (using the External API call).
•Parameters ModuleName: A string indicating the name by which the module will be referenced.
•Example The example below shows the code of a script module that uses the Export API to make its subroutines and public variables available to topics and event handlers. The module contains a single function, Chop, that receives a string as an input parameter, chops off any leading whitespace, and returns the modified string.
Public Const MODULE_NAME = "SMUtility"
vavm.Export (MODULE_NAME)
Public Function Chop (strInString as String) as String
Private chLeft as String
Private strInput as String
strInput = strInString
if len(strInput)<1 or strInput=" " or strInput=chr$(9) then
Chop=""
exit function
end if
chLeft=left$(strInput,1)
do while (chLeft=" " or chLeft=chr$(9)) and len(strInput)>1
strInput=right$(strInput,len(strInput)-1)
chLeft=left$(strInput,1)
loop
Chop=strInput
End Function
•See Also External()
•External •Syntax vavm.External (String ModuleName)
•Description
Imports an external script module for use in the code of a topic or event handler. The module to be imported must have been defined in the application's Resources list and made available using the Export call.
Once a script module has been made available using the External API, it can be called from any topic or event handler that shares the same namespace in which the call was made. It is common for VA developers to make all their External calls in an ApplicationEntry event handler so that the external modules will be available throughout the application.
•Parameters ModuleName: A string indicating the name of the external module to be referenced.
•Example The example below shows the VBScript code from a task that uses the External API call to access the Chop function that was defined in a script module named SMUtility.
vavm.External ("SMUtility")
strTemp=" hello world."
vavm.SignalEvent 0,1,"before: " & strTemp
strTemp=SMUtility.Chop(strTemp)
vavm.SignalEvent 0,1,"after: " & strTemp
•See Also Export()
•GetParameter •Syntax Value=vavm.GetParameter (String ParameterName)
•Description Returns the value of the specified component-level parameter.
•Return Values Returns a string containing the value of the specified parameter.
•Parameters ParameterName: The name of the parameter to be retrieved.
•See Also SetParameter()
•GetUserObj •Syntax Set Object=vavm.GetUserObj (String IdentString, String IdentMethod)
•Description Creates a new User Object that can be used to access user information in the database. The two parameters, IdentString and IdentMethod, are used to set the identity of the current user. IdentMethod should be assigned one of the method names listed in the table below. Each of these methods corresponds to a field in the Users table in the VA Database. The IdentString is the value that will be compared against the database field to determine the identity of the user.
If no match is found for IdentString using the specified IdentMethod, then an error will be thrown. Table 5: Identification Methods for the VAUserDO Object
Method Description
"UserName" The login name of the user.
"FullName" The full name of the user.
"CallerID" The telephone number from which the user calls. A developer can retrieve the ANI (CallerID) number for a particular call using the vavm.CallANI method and use this identification method to compare it against the value stored in the Users table.
"DNISNumber" The number dialed by the user to reach the VA. In an environment with multiple incoming lines dedicated to Virtual Assistants, each user can be given a different number to call to connect to the VA. Though these numbers will all connect to the same application, the DNISNumber identification method can be used to determine which user has called in.
Namecode
•Return Values Returns a User Object (VAUserDO). The object is returned unauthenticated; the developer will need to call the User Object's authenticate method. For more information, see Chapter 7, "Creating Advanced VA Applications."
•Examples The following code instantiates a VAUserDO object and sets its user to "Bill Smith":
Set oMyUser=vavm.GetUserObj ("Bill Smith", "FullName")
The following code instantiates a VAUserDO object and identifies the user by DNIS number:
strDNIS=vavm.CallDNIS
Set oMyUser=vavm.GetUserObj (strDNIS, "DNISNumber")
•LinkDiscourses
•Syntax vavm.LinkDiscourses (String SourceDiscourse, String DestDiscourse)
•Description Creates a symbolic link between one discourse and another. This linking can be used to allow dynamic discourse switching within an application.
•Parameters SourceDiscourse: The name that will be used as the discourse link.
DestDiscourse: The target discourse to which SourceDiscourse will point.
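For example, an application might route a generic link to a user-specific discourse once the caller has been identified. The discourse names here are purely illustrative:

```vbscript
' Hypothetical: point the generic "DMain" link at a discourse chosen
' for this particular caller, then switch control through the link.
vavm.LinkDiscourses "DMain", "DMainEnglish"
vavm.SelectDiscourse "DMain"
```

Repointing the link later (for another caller, say) changes where "DMain" resolves without modifying the code that selects it.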
•LoadDynamicGrammarEntry •Syntax vavm.LoadDynamicGrammarEntry (String GrammarName, String Entry)
•Description Loads a new dynamic grammar entry into the specified phrase list. For more information on Dynamic Grammars, see Chapter 7, "Creating Advanced VA
Applications."
•Parameters GrammarName: The name of the grammar or phrase list into which the new entry should be loaded.
Entry: A pron string containing the new grammar entry.
•Example: The following example uses the LoadDynamicGrammarEntry API to add the name "Michael Harrison" to the "Contact_Name" phrase list.
set myAudioObject=CreateObject("AudioDO.PronFactory.1")
myAudioObject.Connect "localhost",9999
newPronName=myAudioObject.GeneratePronFromName("Michael","Harrison")
myAudioObject.Disconnect
set myAudioObject=nothing
vavm.LoadDynamicGrammarEntry "Contact_Name", newPronName
•LoadDynamicGrammarFile •Syntax vavm.LoadDynamicGrammarFile (String GrammarName, String FileName)
•Description
Loads a file of dynamic grammar entries and adds them to the specified grammar. For more information on Dynamic Grammars, see Chapter 7,
"Creating Advanced VA Applications."
•Parameters GrammarName: The name of the grammar into which the new entries should be loaded.
FileName: The name of the file containing the new grammar entries.
•LogEntry
•Syntax vavm.LogEntry (String Entry [, long Level])
•Description Writes a text-string message to the debug file.
•Parameters Entry: A string containing the message to be written to the debug log.
Level: (Optional) The verbosity level of the message. The valid range of values for Level is 1 to 5, with 1 indicating a brief message and 5 indicating a very verbose message. (To shorten the length of log files, Administrators can set an option to filter out messages with a verbosity greater than a particular level.)
•PlaceCall
•Syntax vavm.PlaceCall (String PhoneNumber)
•Description Places a call to the specified number. When the call is connected, the assistant will resume execution.
Note: This method is used only when the VA application initiates contact with the user. When the user calls into the VA and then wishes to dial another number, the TransferCall method should be used. If the call cannot be successfully placed (e.g., no available phone line, the number dialed is busy), an error will be thrown.
•Parameters PhoneNumber: The phone number to be called.
•See Also
TransferCall()
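Because PlaceCall throws an error when the call cannot be completed, an application that initiates outbound calls will typically trap the error rather than let it halt the script. A minimal sketch (the phone number is illustrative):

```vbscript
' Hypothetical outbound-call sketch: trap a failed PlaceCall using
' standard VBScript error handling and log the failure.
On Error Resume Next
vavm.PlaceCall "8035551212"   ' illustrative number
if Err.Number <> 0 then
    vavm.LogEntry "PlaceCall failed: " & Err.Description, 1
    Err.Clear
else
    vavm.TTSString "Hello, this is your virtual assistant calling."
end if
On Error Goto 0
```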
•PlayPrompt
•Syntax vavm. PlayPrompt (String Expression)
•Description Plays the prompt resource specified in Expression. For more information on the PlayPrompt API, see Chapter 6, "Using Resources."
•Parameters Expression: A string indicating the prompt resource and/or text to be played. The expression can contain macros and parameters that will be interpreted by the virtual machine before the expression is processed.
Table 6: PlayPrompt Macro Definitions
Parameter Description
%silence(String duration) Pauses voice/sound output for the number of milliseconds specified by duration.
%string(String output_string) Translates the text specified by output_string to speech.
%istring(String output_string, String type) Translates the text specified by output_string to speech, using intelligent formatting. For a list of the recognized values for the output type, see the table in the TTSFile entry below.
%num(String number_string) Translates the specified text string as a number. "99", for example, will be read to the user as "ninety-nine".
%play(String filename) Plays the sound file specified by filename.
%record(String filename) Records the user's voice input into the specified file. (This file will be saved in Sphere format.)
%phone(String number) Translates the text specified by number into speech in the format of a telephone number. "5553322", for example, would be translated "five-five-five-three-three-two-two".
%date(Date output_date) Translates the Date variable specified by output_date into speech.
%idate(String datestring) Intelligently translates the text in datestring to date-formatted speech.
%time(Date output_time) Translates the Date variable specified by output_time into speech.
%itime(String timestring) Intelligently translates the text in timestring to time-formatted speech.
•PlayWaveform
•Syntax vavm.PlayWaveform (String FileName [, String Type])
•Description Plays the specified wave-form file.
•Parameters
FileName: The name of the wave-form file to be played.
Type: (Optional) Identifies the file's type. Currently, two types of sound files are supported: "Riff" (Microsoft RIFF format, .wav) and "Sphere" (Sphere format). If no type is specified, the method will attempt to detect the type automatically from the file.
•RecognizeUtterance •Syntax oRec=vavm.RecognizeUtterance (String Grammar, String UtteranceFile)
•Description
Takes the specified sound file and attempts to recognize it, using the phrase list and rules defined by the specified grammar. The method returns a Recognition object which can then be used by the application in a manner similar to the standard Rec object.
•Return Values
Returns a Recognition object with its parameters set according to the rules defined in the specified grammar.
•Parameters
Grammar: The name of the grammar to be used in recognizing the sound file.
UtteranceFile: The name of the sound file to be recognized.
•See Also EnrollUtterance()
•RecordWaveform
•Syntax Length=vavm.RecordWaveform (String FileName [, String Type])
•Description
Records voice input from the VUI into a sound file. The recording starts when the user begins speaking and terminates when the user pauses.
•Return Values RecordWaveform returns a long integer containing the length (in milliseconds) of the recorded clip.
•Parameters FileName: The name of the file in which the recording should be saved.
Type: (Optional) Identifies the file's type. Currently, two types of sound files can be specified: "RIFF" (Microsoft RIFF/wav format) and "SPHERE" (Sphere format). If no type is specified, the recording will be saved as a Sphere file.
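RecordWaveform pairs naturally with PlayWaveform for a record-and-confirm interaction. A brief sketch (the file name is illustrative; recall that `\` is VBScript integer division):

```vbscript
' Hypothetical memo-recording sketch: prompt, record in Sphere format,
' report the clip length, then play the recording back to the caller.
dim lLength
vavm.TTSString "Please record your memo after the tone."
lLength = vavm.RecordWaveform("memo.sph", "SPHERE")
vavm.TTSString "Your memo was " & (lLength \ 1000) & " seconds long."
vavm.PlayWaveform "memo.sph", "Sphere"
```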
•SelectDiscourse
•Syntax vavm. SelectDiscourse (String DiscourseName)
•Description Switches the flow of control to the indicated discourse.
•Parameters DiscourseName: The name of the discourse to be selected.
•SetInactivityTimeout •Syntax vavm.SetInactivityTimeout (long Timeout)
•Description Sets the amount of time that must pass without input from the VUI before an Inactivity event is fired.
•Parameters Timeout: A long integer indicating the timeout value in milliseconds.
•SetParameter •Syntax vavm.SetParameter (String ParameterName, String Value)
•Description
Sets a component-level parameter to the specified value. The available parameters are listed in the table below.
•Parameters ParameterName: The name of the parameter to be set.
Value: The value to be assigned to the parameter.
•See Also
GetParameter()
Table 7: Component-Level Parameters
Parameter Description
VAEngine.CoProcess
InitialVADFile
Port
PortStatus
TelephonyProgID
SpeechRecProgID The ProgID of the Recognition engine that will be used by the VA Application.
TTSProgID The ProgID of the Text-to-Speech server that will be used by the VA Application.
InactivityTimeout The period in milliseconds that must pass without input from the VUI before an Inactivity event is fired.
EnableDetailedLogging A boolean value indicating whether logging should be detailed (TRUE) or brief (FALSE).
•SetSpeechRecognitionParameter
•Syntax vavm.SetSpeechRecognitionParameter (String ParameterName, String Value)
•Description Sets the value of one of the configuration parameters for the speech recognition engine. The set of available parameters is listed in the table below.
•Parameters ParameterName: The name of the parameter to be set.
Value: The value to which the parameter should be set.
Table 8: Speech Recognition Parameters
Parameter Value Domain Description
UtteranceFileName string The base filename that will be used for saving temporary utterance files.
UtteranceDirectory Specifies the directory in which temporary utterance files will be written.
UsePlayfileDelay
RecognitionTimeout
rec.numNBest integer The number of possible matching words that will be returned by the speech recognition engine. For most VA applications, this parameter should always be set to 1.
rec.GenPartialResults
rec.ConfidenceRejectionThreshold integer (0-100) The minimum confidence level that the speech recognition engine must reach before a phrase is considered to be properly recognized.
config.ServerHostname
config.ServerPort
config.RecClientHostname
config.RecClientPort
lm.Addresses
CompilationServer
CompilationServerPort
•SetTelephonyParameter
•Syntax vavm.SetTelephonyParameter (String ParameterName, String Value)
•Description Sets the value of one of the configuration parameters for the telephony interface. The set of available telephony parameters is listed in the table below.
•Parameters ParameterName: The name of the parameter to be set.
Value: The value to which the parameter should be set. Table 9: Telephony Parameters
Parameter Value Domain Description
Audio.Provider "native"
Audio.dialogicAntares "TRUE", "FALSE"
Audio.dialogic.ISDN
Audio.dialogic.Lines
Audio.dialogic.SecondaryDevice "21-23"
Audio.dialogic.PooledSecondaryDevice "TRUE", "FALSE"
Audio.inputVolume integer
audio.OutputVolume integer
audio.Device
audio.BargeInSNR integer
client.AllowBargeIn "TRUE", "FALSE" Determines whether the barge-in feature is active or inactive. Barge-in interrupts the current output stream when new user voice input is received, allowing a caller to interrupt the VA's output.
client.KillPlaybackOnBargeIn "TRUE", "FALSE"
client.NormalizePrompt
client.NormalizePromptLevel
ep.ThresholdSnr integer
ep.StartSeconds float
ep.AdditionalStartSilence float
ep.EndSeconds float
•Shutdown
•Syntax vavm.Shutdown
•Description Shuts down the application. When Shutdown is called, the VAEngine process executing the application will be stopped.
•SignalEvent
•Syntax vavm.SignalEvent (long EventID, long Severity, String Description)
•Description Signals the VA Site Manager that an event has occurred. The Site Manager can be configured to take actions (such as shutting down the applications or paging the system administrator) when a specific event is signaled.
•Parameters EventID: The numeric ID corresponding to the event. A set of events has been defined by the SDE, which uses them internally to log messages and alert administrators. These events are listed in the table below. A developer can also create custom events using any Event ID he or she chooses, provided it is not already in use by the SDE.
Severity: A long integer indicating the severity of the event (see table below).
Description: A string describing the event (for use in logging).
Table 10: Event Severity Levels
Level Indicates
0 Success
1 Informational
2 Warning
3 Error
Table 11 : Predefined SDE Events
Event Descriptor Event ID
SDE_VAVM_ALREADY_STARTED Ox C0000001L
SDE_VAVM_ERROR_PARSΓNG_VADL_STRTNG 0xC0000002L
SDE_VAVM_ERROR_PARSiNG_VADL_FiLE 0xC0000003L
SDE_VAVM_EXTERNAL_MODULE_NOT_FOUND 0xC0000004L
SDE_VAVM_MODULE_NAME_ALREADY_EXPORT OxC0000005L ED
SDE_VAVM_START_DISCOURSE_NOT_SPECIFIED 0xC0000006L
SDE_VAVM_DISCOURSE_NOT_DEFINED 0xC0000007L
SDE_TTS_NOT_TNITIALIZED Ox C0000008L
SDE TTS NULL STRING Ox C0000009L Event Descriptor Event ID
SDE_TTS_UN NOWN_TYPE Ox C000000AL
_AUDIO_STREAM_NOT_DEFINED Ox C000000BL
SDE_AUDIO_STREAM_OVERFLOW OxCOOOOOOCL
SDE_TTS_RENDER_STRING_ERROR OxCOOOOOODL
SDE_TTS_SYNTHESIS_FAILED Ox C000000EL
SDE_WRITE_OPERATION_FAILED Ox COOOOOOFL
SDE_VAVM_ΓNTERNAL_ERROR Ox C0000010L
SDE_VAVM_GETSCRIPTINGHOST_FAILED OxCOOOOOl lL
SDE_VAVM_ΓNVALID_DISPID_FOR_NAMESPACE 0xC0000012L
SDE_VAVM_NAMESPACE_MEMBER_NOT_DEFΓNE Ox C0000013L
D
SDE_VAVM_ΓNVALID_DISPID_FOR_RECRESULT Ox C0000014L
SDE_VAVM_SCRIPT_ITEM_NAME_ALREADY_DEF 0x C000OO15L ΓNED
SDE_VAVM_SHUTDOWN Ox C0000016L
SDE_VAVM_REMOTE_HANGUP 0x C0000017L
SDE_VAVM_BARGE_ΓN Ox C0000018L
SDE_VAVM_NO_ACTIVE_CALL 0x C0000019L
SDE_VAVM_RECOGNITIONJERROR Ox C000001AL
SDE_NANM_RECOGNITION_UNRECOGNIZED Ox C000001BL
SDE_SPEECH_ΓNITIALIZE_ERROR Ox COOOOOlCL
SDE_SPEECH_UNTNITIALIZE_ERROR Ox C000001DL
SDE_SPEECH_GRAMMAR_ERROR Ox C000001EL
SDE_SPEECH_ABORT_ERROR Ox C000001FL
SDE_SPEECH_RECOGNITION_ERROR Ox C0000020L
SDE_SPEECH_SETPARAMETER_ERROR Ox C0000021L
SDE_TELEPHONY_TNITIALIZE_ERROR Ox C0000022L
SDE_TELEPHONY_UNEVΠTIALIZE_ERROR Ox C0000023L
SDE_TELEPHONY_ANSWER_CALLJERROR Ox C0000024L
SDE_TELEPHONY_TERMΓNATE_CALL_ERROR Ox C0000025L
SDE_TELEPHONY_PLACE_CALL_ERROR Ox C0000026L Event Descriptor Event ID
SDE_TELEPHONY_TRANSFER_CALL_ERROR Ox C0000027L SDE_TELEPHONY_RECORD_ERROR Ox C0000028L SDE_TELEPHONY_STOP_RECORD_ERROR Ox C0000029L SDE_TELEPHONY_PLAY_ERROR Ox C000002AL
SDΈ_TELEPHONY_SETPARAMETER_ERROR Ox C000002BL
SDE_TELEPHONY_CHANNEL_CLOSED Ox C000002CL
SDE_VAMANAGER_PROCESSID_OUT_OF_RANGE Ox C000002DL
SDE_VAMANAGER_DELETED_PROCESS Ox C000002EL SDE_NAMANAGER_PARAMETER_BUFFER_TO_S Ox C000002FL
MALL
SDE_VAMANAGER_PARAMETER_CHANGE Ox 40000030L
SDE_VAMANAGER_PERFMON_F AILED Ox C0000031L
SDE_INSUFFICIENT_MEMORY_RESOURCES Ox C0000032L
SDE_COUNTER_ARRAY_TO_SMALL Ox C0000033L
SDE_ΓNACTIVITY_TIMEOUT Ox C0000034L
SDE_VAVM_INITIALIZATION_ERROR Ox C0000035L
SDE_VAVM_TOPIC_NOT_FOUND Ox C0000036L
SDE_VAVM_STARTUP_ERROR Ox C0000037L
SDE_VAVM_STARTUP_COMPLETE Ox 00000038L
SDE_VAVM_NOT_IMPLEMENTED Ox C0000039L
SDE_VAVM_GRAMMAR_NOT_DEFINED_FOR_DIS Ox
COURSE C000003AL
SDE_VAVM_SCRIPTING_ERROR Ox C000003BL
SDE_TTS_ΓNITIALIZE_ERROR Ox C000003CL
SDE_BEGIN_CALL Ox 4000003DL
SDE_USER_LOGΓN Ox 4000003EL
SDE_END_CALL Ox 4000003FL
SDE_TTS_TOO_MANY_CHANNELS Ox 80000040L
SDE_TTS_SEND_FAILED Ox 80000041L
SDE_SYSTEM_ERROR Ox C0000042L
SDE DCOM ERROR Ox C0000043L Event Descriptor Event lD
SDE_TTS_ABORTED Ox 00000044L
SDE_TTS_CLIENT_CONNECTED Ox 40000045L
SDE_TTS_CLIENT_DISCONNECTED Ox 40000046L
SDE_VAMANAGER_PROCESS_TERMINATED Ox C0000047L
SDE_VAMANAGER_PROCESS_STARTING Ox 40000048L
SDE_VAMANAGER_PROCESS_FAILED_TO_START Ox C0000049L
SDE_TTS_UNKNOWN_STRING_TYPE Ox C000004AL
SDE_VAVM_PROMPT_RESOURCE_NOT_FOUND Ox C000004BL
SDE_VAVM_PROMPT_ΓNVALΓD_RESOURCE Ox C000004CL
SDE_VAVM_ΓNVALID_PROMPT_EXPRESSION Ox C000004DL
SDE_VAVM_ΓNVALID_PROMPT_RESOURCE Ox C000004EL SDE_VAMANAGER_WRITE_TO_LOG_FILE_FAILE Ox C000004FL
D.
SDE_VAMANAGER_ΓNVALID_COMPONENT_ID Ox C0000050L
SDE_VAMANAGER_PROCESS_NOT_FOUND 0x C0000051L
SDE_VAMANAGER_PROCESS_ALREADY_EXIST Ox C0000052L
SDE_VAMANAGER_CANNOT_DESTROY_RUNNΓN Ox C0000053L G_PROCESS
SDE_VAMANAGER_COMPONENT_NOT_FOUND Ox C0000054L
SDE_VASERVER_USER_ALREADY_SELECTED Ox C0000055L
SDE_VASERVER_USER_NOT_SELECTED Ox C0000056L
SDE_VASERVER_USER_NOT_FOUND Ox C0000057L
SDE_VASERVERJ SER_ALREADYJEXIST Ox C0000058L SDE_VASERVER_USER_UPDATE_FAILED Ox C0000059L SDE_VASERVER_USER_CREATION_F AILED Ox C000005AL
SDE_VASERVER_UNKNOWN_AUTH_METHOD Ox C000005BL
SDE_VASERVER_USER_NOT_AUTHENTICATED Ox C000005CL
SDE_VASERVER_UNKNOWN_IDENT_METHOD Ox C000005DL
SDE VASERVER DUP IDENT VALUE Ox C000005EL Event Descriptor Event ID
SDE_VASERVER_PARAMETER_CREATION_FAILE Ox C000005FL D
SDE_VASERVER_PARAMETER_UPDATE_FAILED Ox C0000060L
SDE_VASERVER_PARAMETER_BUFFER_TO_SMA Ox C0000061L LL
SDE_VASERVER_PERMISSION_DENIED Ox C0000062L
SDE_VASERVER_DATABASE_ERROR Ox C0000063L
SDE_VASERVER_RULE_CREATION_FAILED Ox C0000064L
SDE_VASERVER_RULEID_ALREADY_EXIST Ox C0000065L
SDE_VASERVER_RULEID_NOT_FOUND Ox C0000066L
SDE_VASERVER_USER_HAS_PENDING_REQUEST Ox C0000067L
SDE_POPMON_NOT_ΓNITIALIZED Ox C0000068L
SDE_POPMON_ALREADY_ΓNITIALIZED Ox C0000069L
SDE_POPMON_RULES_ENGINE_NOT_SPECIFIED Ox C000006AL
SDEJPOPMONJDUPLICATEJ ULE Ox C000006BL
SDEJPOPMONJUSERJRELEASED Ox 4000006CL
SDE_POPRULE_SYNTAX_ERROR Ox C000006DL
SDE_POPRULE_UNDEFINED_KEYWORD Ox C000006EL
SDE_POPRULE_WRONG_NUM_PARAMS Ox C000006FL
SDEJPOPRULE_EMPTY_RULE Ox C0000070L
SDE_SITEMANAGER_SELECTSITE_FAILED Ox 8000007IL
SDE_SITEMANAGER_ADDSITE_F AILED Ox 80000072L
SDE_SITEMANAGER_REMOVESITE_F AILED Ox 80000073L
SDE_SITEMANAGER_ENUMERATION_ERROR Ox 80000074L
SDE_SITEMANAGER_SELECTCOMPUTERJFAILED Ox 80000075L
SDE_SITEMANAGER_ADDCOMPUTER_F AILED Ox 80000076L
SDE_SITEMANAGER_REMOVECOMPUTER_FAILE Ox 80000077L
D
SDE_SITEMANAGER_SETPARAMETER_FAILED Ox 80000078L SDE_SITEMANAGER_CONNECT_FAILED Ox C0000079L SDE_SITEMANAGER_SETSITENOTIFYJFAILED Ox 8000007AL SDE_SITEMANAGER_SELECTPROCESS_FAILED Ox 8000007BL Event Descriptor Event lD
SDE_SITEMANAGER_ADDPROCESS_FAILED Ox 8000007CL
SDE_SITEMANAGER_PROCESSCOMMAND_FAILE Ox 8000007DL D
SDE_SITEMANAGER_CONNECTLOCAL_FAILED Ox C000007EL
SDE_SITEMANAGER_CONNECTREMOTE_FAILED Ox C000007FL
SDE_SITEMANAGER_ADDUSERNOTIFY_FAILED Ox 80000080L
SDE_SITEMANAGER_SETUP_FAILED Ox 8000008IL
SDE_SITEMANAGER_DATASOURCE_FAILURE Ox C0000082L
SDE_SITEMANAGER_CONNECTION_TIMEOUT Ox 80000083L
SDE_SITEMANAGER_COMPONENTMANAGER_FAl Ox 80000084L LURE
SDE_SITEMANAGER_VAMANAGER_CONNECTIO Ox 40000085L
N SDE_VASERVER_APP_NOT_FOUND Ox C0000086L SDE_VAPROCESS_NOT_STARTED Ox C0000087L SDE_UNKNOWN_PAGE_TYPE Ox 80000088L SDE_UNKNOWN_PAGE_KEYWORD Ox 80000089L SDE_PAGE_FILE_SYNTAX_ERROR Ox 8000008AL SDE_APPLICATION_GENERAL_SUCCESS Ox 0000008BL SDE_APPLICATION_GENERAL_INFORMATIONAL Ox 4000008CL SDE_APPLICATION_GENERAL_WARNTNG Ox 8000008DL SDE APPLICATION GENERAL ERROR Ox C000008EL
•SuspendRecognition
•Syntax vavm.SuspendRecognition (String GrammarName)
•Description Suspends speech recognition on the specified grammar. This call should be used when an application needs to complete a processing task before acting on new user commands or events. To end the suspension, use the ResumeRecognition API call.
•Parameters GrammarName: The name of the grammar to be suspended.
•See Also ResumeRecognition()
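A typical pattern brackets a long-running task with a suspend/resume pair. In this sketch the grammar name is illustrative, and ResumeRecognition is assumed to take the same single GrammarName argument as SuspendRecognition:

```vbscript
' Hypothetical: suspend recognition on the main grammar while a
' long-running lookup completes, then resume listening.
vavm.SuspendRecognition "GMainMenu"
vavm.TTSString "One moment while I look that up."
' ... perform the long-running query here ...
vavm.ResumeRecognition "GMainMenu"
```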
•TerminateCall
•Syntax vavm.TerminateCall
•Description
Terminates the current call. TerminateCall disconnects the phone line for the current call and resets the application to wait for a new incoming call.
•TranscriptionEntry
•Syntax vavm.TranscriptionEntry (long Operation, String StringArg1, String StringArg2, long IntArg1, long IntArg2)
•Description Writes an entry to the VA call-transcription facility.
•Parameters Operation: A long integer indicating the type of operation being performed. The valid values for Operation are the same as for the SignalEvent event types. For a complete list of defined events, see the SignalEvent entry above. A developer can define custom operation IDs if needed, provided that the ID has not already been assigned to a particular operation or event.
StringArg1, StringArg2: String values that will be written to the transcription. These values can indicate whatever the developer wishes.
IntArg1, IntArg2: Integer values that will be written to the transcription. These values can indicate whatever the developer wishes.
•TransferCall
•Syntax vavm.TransferCall (String PhoneNumber)
•Description Transfers the user's call to the specified phone number. When the transfer is complete, the application resets and awaits a new incoming call. If the transfer cannot be completed (e.g., the line is busy), a trappable error will be thrown.
•Parameters PhoneNumber: The phone number to which the caller will be transferred.
•See Also PlaceCall()
•TTSFile •Syntax vavm.TTSFile (String FileName [, String Type])
•Description Reads a file and converts its contents from text to speech.
•Parameters FileName: The name of the file to be converted to speech.
Type: (Optional) Identifies the type of text being converted. If omitted, the method will convert the file's text literally into speech. The recognized values for Type are listed in the table below.
Value Description
html Translates HTML files to speech, ignoring tags.
phone Reads numbers by each digit rather than the full value. "4532", for example, will be read "four-five-three-two" when the phone type is specified. (If no type were specified, it would be read "four thousand five hundred thirty-two.")
rtf Translates rich text files (.rtf) to speech, ignoring formatting information.
email.body
email.subject
email.from
email.date
•TTSString
•Syntax vavm.TTSString (String TextString [, String Type])
•Description Converts a string from text to speech.
•Parameters TextString: The string containing the text to be converted to speech.
Type: (Optional) Identifies the type of text being converted. If omitted, the method will convert the string's text literally into speech. For a list of the recognized values for Type, see the table in the TTSFile entry above.
•WaitForCall
•Syntax vavm.WaitForCall
•Description Pauses execution of a script until a new incoming call is received. This API is needed only in applications that use a manual call flow, and it should always be followed by an AnswerCall API call.
•Example The following code shows an application with manual call flow that suspends execution and waits for a new call.
vavm.SignalEvent 0,1,"Waiting for next call..."
vavm.WaitForCall
vavm.SignalEvent 0,1,"New call received."
vavm.AnswerCall
vavm.SelectDiscourse "DMainMenu"
•See Also AnswerCall
•WaitForPendingRequests
•Syntax vavm.WaitForPendingRequests
•Description Pauses execution of a script until all pending requests (such as voice output) are complete.
The above description of the preferred embodiments details many ways in which the present invention can provide its intended purposes. Programmers skilled in the art are able to produce workable computer programs to practice the teachings set forth above. While several preferred embodiments are described in detail hereinabove, it is apparent that various changes might be made without departing from the scope of the invention, which is set forth in the accompanying claims.

Claims

What is claimed is:
1. A virtual assistant engine for running a virtual assistant application, comprised of: an interpreter for parsing, storing in a computer memory, and executing source code for a virtual assistant application; a scripting object that provides methods and properties for creating a virtual assistant application; and an abstraction layer for interfacing with a speech recognition server, telephony hardware and a text to speech server, wherein the scripting object provides the interface between the abstraction layer and the virtual assistant application.
2. The virtual assistant engine of claim 1, wherein the interpreter is comprised of a parser for parsing the source code and storing the parsed source code in the computer memory.
3. The virtual assistant engine of claim 2, wherein the parser is constructed using the Purdue Compiler Constructor Tool Set.
4. The virtual assistant engine of claim 2, wherein the interpreter is further comprised of a state machine for executing the stored source code.
5. The virtual assistant engine of claim 4, wherein the state machine determines the tasks to be performed by the virtual assistant application responsive to input from the user.
6. The virtual assistant engine of claim 4, wherein the state machine manages a barge-in command received from the user.
7. The virtual assistant engine of claim 4, wherein the state machine manages external events responsive to output from the virtual assistant application, the output indicating that an external event has occurred, and is configured to cause the user to be notified of the occurrence of the external event.
8. The virtual assistant engine of claim 7, wherein the external event is selected from the group consisting of receipt of a telephone call, placement of a telephone call, receipt of an electronic message, a meeting reminder, a task reminder, a change in a database and a change in monitored information.
9. The virtual assistant engine of claim 4, wherein the interpreter is further comprised of: a scripting host object; and a scripting engine, whereby the scripting host object interfaces with the scripting engine.
10. The virtual assistant engine of claim 9, wherein the scripting engine executes scripts written in a scripting language selected from the group consisting of VBScript, JavaScript, Perl, Rexx and Python.
11. The virtual assistant engine of claim 9, wherein the interpreter is further comprised of a session object.
12. The virtual assistant engine of claim 11, wherein the session object manages telephone calls to and from a virtual assistant application user.
13. The virtual assistant engine of claim 11, wherein the session object is comprised of a call state manager for tracking the status of a telephone call to or from the virtual assistant application user.
14. The virtual assistant engine of claim 13, wherein the status ofthe telephone call is selected from the group consisting of connected, on hold and in conference.
15. The virtual assistant engine of claim 11, wherein the session object is further comprised of a call object for managing calls to the virtual assistant user from the virtual assistant and to the virtual assistant from the virtual assistant user.
16. The virtual assistant engine of claim 11, wherein the session object is configured to generate and store in the computer memory a log of information about a virtual assistant application user session.
17. The virtual assistant engine of claim 16, wherein the information log is comprised of call statistics information, the call statistics information being comprised of information about the length of the user session, the DNIS number and the ANI number.
18. The virtual assistant engine of claim 16, wherein the information log is comprised of call transcription information, the call transcription information being comprised of information about commands issued by the virtual assistant user and the responses from virtual assistant to the commands during the user session.
19. The virtual assistant engine of claim 16, wherein the information log is comprised of speech recognition information, the speech recognition information being comprised of latency information and speech recognition performance.
20. The virtual assistant engine of claim 19, wherein the latency information is comprised of information about the amount of time taken by the virtual assistant to respond to commands from the user during the user session.
21. The virtual assistant engine of claim 19, wherein the speech recognition performance information is comprised of information about speech recognition accuracy and the amount of processing time for speech recognition.
22. The virtual assistant engine of claim 16, wherein the information log is comprised of system information, the system information being comprised of the amount of time spent by the virtual assistant receiving input from the user, processing the input received from the user, and providing output to the user.
23. The virtual assistant engine of claim 11, wherein the interpreter is further comprised of a discourse manager.
24. The virtual assistant engine of claim 23, wherein the discourse manager activates the appropriate discourse responsive to input from the virtual assistant application user.
25. The virtual assistant engine of claim 24, wherein the discourse manager activates the appropriate grammar responsive to the active discourse.
26. The virtual assistant engine of claim 1, wherein the scripting object is configured to provide output to the user asynchronously.
27. The virtual assistant engine of claim 26, wherein the output is generated by rendering text into speech.
28. The virtual assistant engine of claim 26, wherein the output is generated by playing recorded prompts.
29. The virtual assistant engine of claim 26, wherein the scripting object is further comprised of a management interface.
30. The virtual assistant engine of claim 29, wherein the management interface is configured to generate and store in the computer memory a log of information about virtual assistant application errors that occurred during a virtual assistant user session.
31. The virtual assistant engine of claim 29, wherein the management interface is configured to enable the management and configuration of a virtual assistant system by a system administrator.
32. The virtual assistant engine of claim 29, wherein the scripting object is further comprised of an interface for managing dynamic grammars.
33. The virtual assistant engine of claim 32, wherein the management of dynamic grammars is comprised of: creating a user specific grammar when a virtual assistant application user session begins; storing the user specific grammar in the computer memory for use by a user during the user session; and deleting the user specific grammar from the computer memory when the user session ends.
34. The virtual assistant engine of claim 33, wherein the user specific grammar is generated from a user specified database.
35. The virtual assistant engine of claim 32, wherein the scripting object is further comprised of a state machine interface for controlling a state machine for managing external events.
36. The virtual assistant engine of claim 29, wherein the scripting object is further comprised of a call management interface for controlling a session object, wherein the session object manages telephone calls to and from a virtual assistant application user.
37. The virtual assistant engine of claim 1, wherein the abstraction layer is comprised of a speech recognition module, a telephony module and a text to speech module.
38. The virtual assistant engine of claim 37, wherein the speech recognition module is comprised of an interface between the scripting object and the speech recognition server.
39. The virtual assistant engine of claim 37, wherein the telephony module is comprised of an interface between the scripting object and the telephony hardware.
40. The virtual assistant engine of claim 39, wherein the telephony hardware is comprised of an adapter for allowing electronic communication between the virtual assistant application and the virtual assistant user, and a conference call adapter.
41. The virtual assistant engine of claim 37, wherein the text to speech module is comprised of an interface between the scripting object and the text to speech server.
PCT/US2001/006882 2000-03-06 2001-03-05 Virtual assistant engine WO2001067241A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001241965A AU2001241965A1 (en) 2000-03-06 2001-03-05 Virtual assistant engine

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US51923400A 2000-03-06 2000-03-06
US09/519,234 2000-03-06

Publications (1)

Publication Number Publication Date
WO2001067241A1 true WO2001067241A1 (en) 2001-09-13

Family

ID=24067432

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/006882 WO2001067241A1 (en) 2000-03-06 2001-03-05 Virtual assistant engine

Country Status (2)

Country Link
AU (1) AU2001241965A1 (en)
WO (1) WO2001067241A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6781962B1 (en) 2002-02-26 2004-08-24 Jetque Apparatus and method for voice message control
US7640550B1 (en) 2005-07-28 2009-12-29 Avaya Inc. Context sensitive contact broker
US8805688B2 (en) 2007-04-03 2014-08-12 Microsoft Corporation Communications using different modalities
US8983051B2 (en) 2007-04-03 2015-03-17 William F. Barton Outgoing call classification and disposition
US9396185B2 (en) 2006-10-31 2016-07-19 Scenera Mobile Technologies, Llc Method and apparatus for providing a contextual description of an object
CN108874374A (en) * 2018-05-25 2018-11-23 厦门雅基软件有限公司 A kind of script engine interface abstraction layer and its application method
US10217453B2 (en) 2016-10-14 2019-02-26 Soundhound, Inc. Virtual assistant configured by selection of wake-up phrase
US10338959B2 (en) 2015-07-13 2019-07-02 Microsoft Technology Licensing, Llc Task state tracking in systems and services
US10564815B2 (en) 2013-04-12 2020-02-18 Nant Holdings Ip, Llc Virtual teller systems and methods
US10635281B2 (en) 2016-02-12 2020-04-28 Microsoft Technology Licensing, Llc Natural language task completion platform authoring for third party experiences
US11017028B2 (en) 2018-10-03 2021-05-25 The Toronto-Dominion Bank Systems and methods for intelligent responses to queries based on trained processes

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5745675A (en) * 1996-04-23 1998-04-28 International Business Machines Corporation Object oriented framework mechanism for performing computer system diagnostics
US5809303A (en) * 1995-10-18 1998-09-15 Sun Microsystems, Inc. Device I/O monitoring mechanism for a computer operating system
US6028999A (en) * 1996-11-04 2000-02-22 International Business Machines Corporation System and method for non-sequential program statement execution with incomplete runtime information

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5809303A (en) * 1995-10-18 1998-09-15 Sun Microsystems, Inc. Device I/O monitoring mechanism for a computer operating system
US5745675A (en) * 1996-04-23 1998-04-28 International Business Machines Corporation Object oriented framework mechanism for performing computer system diagnostics
US6028999A (en) * 1996-11-04 2000-02-22 International Business Machines Corporation System and method for non-sequential program statement execution with incomplete runtime information

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6781962B1 (en) 2002-02-26 2004-08-24 Jetque Apparatus and method for voice message control
US7640550B1 (en) 2005-07-28 2009-12-29 Avaya Inc. Context sensitive contact broker
US9396185B2 (en) 2006-10-31 2016-07-19 Scenera Mobile Technologies, Llc Method and apparatus for providing a contextual description of an object
US8805688B2 (en) 2007-04-03 2014-08-12 Microsoft Corporation Communications using different modalities
US8983051B2 (en) 2007-04-03 2015-03-17 William F. Barton Outgoing call classification and disposition
US10564815B2 (en) 2013-04-12 2020-02-18 Nant Holdings Ip, Llc Virtual teller systems and methods
US11023107B2 (en) 2013-04-12 2021-06-01 Nant Holdings Ip, Llc Virtual teller systems and methods
US10338959B2 (en) 2015-07-13 2019-07-02 Microsoft Technology Licensing, Llc Task state tracking in systems and services
US10635281B2 (en) 2016-02-12 2020-04-28 Microsoft Technology Licensing, Llc Natural language task completion platform authoring for third party experiences
US10217453B2 (en) 2016-10-14 2019-02-26 Soundhound, Inc. Virtual assistant configured by selection of wake-up phrase
US10783872B2 (en) 2016-10-14 2020-09-22 Soundhound, Inc. Integration of third party virtual assistants
CN108874374A (en) * 2018-05-25 2018-11-23 厦门雅基软件有限公司 A kind of script engine interface abstraction layer and its application method
US11017028B2 (en) 2018-10-03 2021-05-25 The Toronto-Dominion Bank Systems and methods for intelligent responses to queries based on trained processes
US11928112B2 (en) 2018-10-03 2024-03-12 The Toronto-Dominion Bank Systems and methods for intelligent responses to queries based on trained processes

Also Published As

Publication number Publication date
AU2001241965A1 (en) 2001-09-17

Similar Documents

Publication Publication Date Title
US7286985B2 (en) Method and apparatus for preprocessing text-to-speech files in a voice XML application distribution system using industry specific, social and regional expression rules
US6466654B1 (en) Personal virtual assistant with semantic tagging
US7920678B2 (en) Personal virtual assistant
KR101560600B1 (en) Unified messaging state machine
US20110123006A1 (en) Method and Apparatus for Development, Deployment, and Maintenance of a Voice Software Application for Distribution to One or More Consumers
US7921214B2 (en) Switching between modalities in a speech application environment extended for interactive text exchanges
EP1486054B1 (en) System and method for providing a message-based communications infrastructure for automated call center operation
US5724406A (en) Call processing system and method for providing a variety of messaging services
US7406418B2 (en) Method and apparatus for reducing data traffic in a voice XML application distribution system through cache optimization
US20050033582A1 (en) Spoken language interface
WO2010129056A2 (en) System and method for speech processing and speech to text
US20090144131A1 (en) Advertising method and apparatus
WO2001067241A1 (en) Virtual assistant engine
US20030055649A1 (en) Methods for accessing information on personal computers using voice through landline or wireless phones
US7451086B2 (en) Method and apparatus for voice recognition
WO2000018100A2 (en) Interactive voice dialog application platform and methods for using the same
Gallivan et al. VoiceXML absentee system
WO2001075555A2 (en) Personal virtual assistant
Herman et al. Between laboratory and field trial: Experience with a communications services testbed
Cone Voice technologies in advanced computer systems
Ångström et al. Royal Institute of Technology, KTH Practical Voice over IP IMIT 2G1325
Amores et al. Implementation of a Natural Command Language Dialogue System

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP