WO2003091827A2 - System and method for creating voice applications - Google Patents

System and method for creating voice applications

Info

Publication number
WO2003091827A2
Authority
WO
WIPO (PCT)
Prior art keywords
client
server
format
voice
language
Prior art date
Application number
PCT/GB2002/001929
Other languages
English (en)
Other versions
WO2003091827A3 (fr)
Inventor
Emmanuel Rayner
Original Assignee
Fluency Voice Technology Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fluency Voice Technology Limited filed Critical Fluency Voice Technology Limited
Priority to AU2002253334A priority Critical patent/AU2002253334A1/en
Priority to PCT/GB2002/001929 priority patent/WO2003091827A2/fr
Publication of WO2003091827A2 publication Critical patent/WO2003091827A2/fr
Publication of WO2003091827A3 publication Critical patent/WO2003091827A3/fr

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/54 - Interprogram communication
    • G06F9/547 - Remote procedure calls [RPC]; Web services
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 - Arrangements for software engineering
    • G06F8/20 - Software design

Definitions

  • the present invention relates to the development and deployment of applications for access via a voice browser or similar in a client-server environment.
  • the WWW has moved beyond the mere provision of data, and is also widely used for performing transactions, such as on-line shopping, making travel reservations, and so on.
  • the WWW is based on pages presented in Hypertext Markup Language (HTML), which are accessed from a server over the Internet by a client using the hypertext transport protocol (HTTP).
  • the client, typically a conventional personal computer, normally runs browser software for this purpose, such as Microsoft Internet Explorer.
  • This type of client generally connects to the Internet through a modem/telephone link to an Internet Service Provider (ISP), or over a local area network (LAN) to an Internet gateway.
  • Modern desktop computers generally support a wide range of multimedia capabilities, including graphics, sound, animation, and so on, which can be exploited by WWW content.
  • WML Wireless Markup Language
  • WAP Wireless Application Protocol
  • a WWW server 110 provides content that is accessible over the Internet 120 (or any other suitable form of data connection, such as intranet, LAN, etc). Also shown in Figure 1 is a client system 130, which interacts with the WWW server 110 over the Internet 120. Client system 130 is further connected to a conventional telephone 150 via the public switched telephone network (PSTN) 140 (although mobile/cellular telephone networks, etc could also be used).
  • the client system 130 acts as an intermediary in the overall architecture between the WWW server 110 and the user of telephone 150, and is sometimes referred to as a voice browser.
  • the user of telephone 150 typically dials the number corresponding to client system 130, which may be implemented in known fashion by an interactive voice response (IVR) system.
  • the client system 130 accesses WWW server 110 in order to retrieve information for handling the call, converts the information into audio using a text to speech (TTS) capability, and then transmits this audio over the telephone network to the user at telephone 150.
  • client system 130 can also receive audio input from the user of telephone 150 and convert this into a form suitable for transmission back to WWW server 110.
  • Such audio input is generally in the form of dual tone multiple frequency (DTMF) key presses and/or spoken input.
  • client system 130 includes a speech recognition (Reco) system.
  • a typical caller transaction using the system of Figure 1 is likely to involve an audio dialogue between the caller and the client system 130, and/or an HTTP-based dialogue between the client system 130 and the WWW server 110.
  • the client system 130 may itself know to prompt the caller for this information, and then send a complete request to the WWW server 110.
  • the client system 130 may send an initial request without date/time information to the WWW server 110, and will then be instructed by the WWW server 110 to obtain these details from the caller. Accordingly, the client system 130 will collect the requested information and forward it to the WWW server 110, whereupon the desired response can be provided.
  • In general, the WWW server 110 must be specifically adapted to handle voice browsing, since for an audio telephone connection all graphics and the like will clearly be discarded.
  • the spoken interface also means that only a very limited amount of information can be presented to a caller in a reasonable time, compared to a normal HTML page.
  • the caller would normally be asked one question from the form at a time, rather than being presented with the whole form at once, as on a computer screen.
  • the WWW server 110 in the architecture of Figure 1 does not have to function as a conventional WWW server at all. Rather, in many implementations the server 110 is dedicated to voice applications, with user access only through an audio interface. Moreover, server 110 need not necessarily be connected to the Internet, but could be linked to the client system 130 via an intranet, extranet, or any other appropriate communications facility (which may simply be a point-to-point link, rather than a broader network).
  • VoiceXML is rapidly establishing itself as the de facto industry standard for interactive voice-enabled applications using an architecture such as shown in Figure 1.
  • VoiceXML is a scripting language based on (technically a schema of) the extensible mark-up language (XML), which in turn is a development of HTML.
  • XML itself is described in many books, for example "XML for the Worldwide Web" by Elizabeth Castro, Peachpit Press, 2001 (ISBN 0-201-71098-6), and "The XML Handbook" by Charles Goldfarb and Paul Prescod, Prentice Hall, 2000 (ISBN 0-13-014714-1).
  • VoiceXML provides a platform independent language for writing voice applications based on audio dialogs using TTS and digitised audio recordings for output, and speech and DTMF key recognition for input.
  • There are two main types of dialog: a form, which presents information and gathers input, and a menu, which offers choices of what to do next. (In practice most applications have been developed using only forms, since these can also be used to implement a menu structure.)
  • the VoiceXML code is downloaded from a server onto a client providing a VoiceXML browser to render the VoiceXML code to a caller
  • Probably the major attraction of VoiceXML is that it insulates the application writer from needing to know anything about the underlying telephony platform. VoiceXML also has the advantage of being specifically structured to support voice dialogs, although as a programming environment it does suffer from certain limitations and complexities, as will be described in more detail below. A copy of the formal VoiceXML specification plus other information about VoiceXML can be downloaded from www.voicexml.org.
  • Practical deployment systems are almost invariably implemented using dynamically generated VoiceXML, in which at least a portion of the relevant code is only created in response to a particular client request.
  • One typical reason for this is that the data required for a response (such as availability and pricing of a particular item) is almost always stored in a (separate) database system. If a user then requests information about such an item, the response is created on the fly by importing the current availability and pricing data from the database into the returned output, thereby ensuring that the user receives up-to-date information.
  • WWW servers typically use something like Perl-based CGI (common gateway interface) processes or Java servlets in order to handle dynamic page creation (Java is a trademark of Sun Microsystems Inc.). This is a much more challenging implementation framework than a static environment. Experience available to date suggests that in practice there are serious problems involved in building and maintaining dynamic VoiceXML applications in this context.
  • Typically, the client-side code is VoiceXML, while the server-side code is Java or Perl. It is therefore non-trivial to move functionality from one side to the other, since this normally requires the relevant code to be rewritten in a completely different language.
  • There are advantages and disadvantages in siting code on the server rather than the client (or vice versa), and the relative merits of these may change with time or circumstances, or simply be impossible to predict accurately in advance.
  • TellMe Network also supply the "Jumpstart Perl Server Package" (http://studio.tellme.com/downloads/VoiceXML-Server/Server.html), which makes it possible to write CGI-based VoiceXML applications in nearly pure Perl. This package will then generate VoiceXML code for execution on the client to perform simple speech input and output as required by the application. Note however that neither of these two packages provides any significant flexibility in terms of movement of code between the server and client in VoiceXML applications.
  • EP-A-1100013 describes a novel XML-based language, referred to as CML (conversational markup language) that can be used to generate multi-modal dialogues.
  • a single dialogue can be developed which can then be transformed as appropriate into HTML, WML, VoiceXML, and so on.
  • CML, however, is yet another specialised XML language, and this level of complexity and generality is not needed in many situations where a WWW server site is developed with a specific access mode in mind (e.g. by telephone), and the requirement is to optimise behaviour for this particular access mode.
  • WO 01/73755 describes a system for developing a specialised class of Web-based voice applications that use speech recognition.
  • a Web application with voice functionality can be written in standard script languages such as Jscript or Perlscript, with interface objects used by the script being provided as Active X objects.
  • the script and interface objects are then downloaded for execution on a client system having speech recognition capabilities.
  • the system bypasses VoiceXML altogether, but uses instead a "Teller" interface to process the application on the client. This approach is therefore limited to systems that support such an interface, in contrast to VoiceXML applications that are portable across a range of systems.
  • One embodiment of the invention provides a method of developing a voice application for a client-server environment. The server supports a high-level procedural language in which data objects have a first format, and the client supports a voice mark-up language in which data objects have a second format.
  • the method begins with the steps of writing the voice application in a high-level procedural language, and providing one or more annotations for the voice application.
  • the annotations are indicative of which part of the voice application is to be executed on the server, and which part of the voice application is to be executed on the client.
  • the part of the voice application to be performed on the client is then transformed from the high-level procedural language into the voice mark-up language supported by the client in accordance with the annotations, and the part of the voice application that is to be executed on the server is modified to associate data objects in the first format on the server with data objects in the second format on the client.
  • This approach allows an application to be readily developed as normal code in a high-level procedural language such as Java, without worrying about the details of the client-server interactions. Rather, these are controlled by the set of annotations, which can then be updated later as desired, without having to modify the original application.
  • This separation makes application development much easier, in that an application having the correct logic flow can first be developed in a familiar environment to perform the processing relevant to both server and client. In a subsequent phase, the annotations are added to determine where each part of the application is to be executed. Functionality can thus be moved from client to server, or vice versa, with only minimal changes that do not affect the application's control logic.
  • a set of speech functions that can be invoked from the voice application in a high-level procedural language.
  • these functions can be provided as a set of methods within a utility class.
  • the speech functions are necessarily to be performed on the client, and so any invocations of these speech functions are automatically transformed into the voice mark-up language supported by the client. This simplifies matters for the developer, in that there is no need to include specific annotations for such speech functions. In fact, if no annotations are provided at all, then only these minimum speech functions will be performed on the client, with all the remaining processing being performed on the server.
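  • By way of illustration only, a small SpeechJava-style fragment and a hypothetical annotation file are sketched below. SpeechIO and RecStructure are the utility classes described in this specification; the class, method and grammar names (GetDate, askForDate, date_grammar) and the annotation-file keyword are assumptions introduced for this example, not a prescribed syntax.

```java
// Illustrative sketch only; GetDate and its methods are invented for this example.
public class GetDate {

    public static void main(String[] args) {
        String date = askForDate();        // call redirected to a server-side proxy after compilation
        lookUpAvailability(date);          // ordinary server-side processing (third portion)
    }

    // Annotated method: on the server it is replaced by a proxy, while its body and
    // anything it calls are compiled into VoiceXML for execution on the client.
    static String askForDate() {
        SpeechIO.sayTTS("For which date would you like to travel?");
        RecStructure result = SpeechIO.recognise("date_grammar");
        return result.getSlotValueAsString("date");
    }

    static void lookUpAvailability(String date) {
        // server-only work, for example a database query (not shown)
    }
}

// A hypothetical annotations file for the above. The 'client' keyword is an assumption;
// the grammar declaration follows the form described later in this document:
//
//   client GetDate.askForDate
//   grammar date_grammar {date}
```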
  • the voice application can be divided into three portions: a first portion, which is to be executed on the client, and second and third portions, which are to be executed on the server.
  • the second portion comprises code that interacts directly with the first portion
  • the third portion comprises code that does not interact directly with the first portion.
  • the first portion is transformed from the high-level procedural language into the voice mark-up language supported by the client (typically VoiceXML).
  • the second portion is modified to associate data objects in a first format on the server with data objects in a second format on the client.
  • the third portion is generally not subject to modification (although may be slightly adapted, for example to conform to the particular web server configuration employed).
  • the annotations are used to explicitly identify functions that belong to the second portion of the voice application. If we regard the application functions as being arranged in an invocation hierarchy, then those above the annotated functions belong to the third portion (since the application must commence on the server), while those below the annotated functions belong to the first portion. More specifically, the latter can be identified automatically by determining the transitive closure of functions called by the annotated functions. It will be appreciated that by only having to identify the subset of functions that actually transfer control from the server to the client, the annotation task is considerably simplified.
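  • The following is a minimal sketch, in ordinary Java, of how such a transitive closure might be computed from a call graph. The class and method names are assumptions made for illustration; the compiler's actual internal representation is not prescribed here.

```java
import java.util.*;

// Sketch: given a call graph (caller -> callees), collect every method reachable from
// an annotated "transfer" method. These reachable methods form the client-side (first)
// portion; the annotated methods themselves form the second portion.
class CallGraphClosure {
    static Set<String> clientMethods(Map<String, Set<String>> callGraph,
                                     Set<String> annotated) {
        Set<String> closure = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(annotated);
        while (!work.isEmpty()) {
            for (String callee : callGraph.getOrDefault(work.pop(), Set.of())) {
                if (closure.add(callee)) {
                    work.push(callee);      // follow further calls made on the client
                }
            }
        }
        return closure;                     // everything below the annotated methods
    }
}
```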
  • the modification of methods in the second portion in this embodiment involves the replacement of the function by a corresponding proxy that is used to store information specifying the associations between data objects on the server and data objects on the client. (This may also involve a suitable adaptation of the invoking code in the third portion to call the proxy, rather than the original function). These associations are important to ensure that the server and client code behave as a coherent unit.
  • the proxy translates the arguments of its corresponding function into code in the voice mark-up language, and then transfers the code to the client for execution.
  • This facility represents one mechanism for handling dynamic data objects (i.e. those that are only specified at run-time), and so greatly extends the range of applications that can be developed using the above approach.
  • Another embodiment of the invention provides a method of developing a voice application for a client-server environment. Typically the server supports a high-level procedural language in which data objects have a first format, while the client supports a voice mark-up language in which data objects have a second format.
  • the method begins with writing the voice application in a high-level procedural language. A part of the voice application is to be executed on the server, and a part of the voice application is to be executed on the client.
  • the voice application is then compiled, and the part of the voice application that is to be executed on the client platform is transformed into the voice mark-up language, while the part of the voice application that is to be executed on the server is modified in order to associate data objects in the first format on the server with data objects in the second format on the client.
  • This approach allows all the application code to be written in a single high level procedural language.
  • this language is Java (or more specifically a subset of Java), but other languages such as C, C++ and so on could be used instead.
  • This has the advantage of generally being a much more familiar programming environment than a standard voice mark-up language, so it is easier for users to attract and retain developers and support staff with the requisite experience.
  • the voice mark-up language is VoiceXML.
  • conditional constructions and loop constructions in the high-level procedural language can be compiled into VoiceXML conditional subdialog calls and recursive VoiceXML subroutine calls, respectively.
  • the VoiceXML specification includes support for ECMAScript-compatible code.
  • functions that are to be executed on the client platform in the voice mark-up language and that do not directly call basic speech functions on the client are compiled into such ECMAScript-compatible code.
  • the motivation for this is that the performance of ECMAScript code on most VoiceXML platforms tends to be better than general VoiceXML.
  • the functions that directly call basic speech functions are retained in VoiceXML, since ECMAScript does not support this functionality.
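  • As an informal illustration of the loop case (this is not the compiler's literal output), a SpeechJava-style loop can be thought of as an equivalent recursive method, which then maps onto a recursive VoiceXML subroutine (subdialog) call. SpeechIO and RecStructure are the utility classes described herein; the grammar and slot names are invented for this example.

```java
class LoopTranslationSketch {

    // Original SpeechJava-style loop:
    static void confirmUntilYes() {
        boolean confirmed = false;
        while (!confirmed) {
            SpeechIO.sayTTS("Please say yes to confirm.");
            RecStructure r = SpeechIO.recognise("yes_no_grammar");
            confirmed = "yes".equals(r.getSlotValueAsString("answer"));
        }
    }

    // Recursive equivalent: each level of recursion corresponds to one recursive
    // subdialog call in the generated VoiceXML.
    static void confirmUntilYesRecursive() {
        SpeechIO.sayTTS("Please say yes to confirm.");
        RecStructure r = SpeechIO.recognise("yes_no_grammar");
        if (!"yes".equals(r.getSlotValueAsString("answer"))) {
            confirmUntilYesRecursive();
        }
    }
}
```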
  • Another embodiment of the invention provides a method of running a voice application in a client-server environment in which a server supports a high-level procedural language in which data objects have a first format, and a client supports a voice mark-up language in which data objects have a second format.
  • the method starts with commencing the voice application on the server, and the voice application then runs on the server until processing is to be transferred to the client.
  • code in the voice mark-up language is dynamically generated on the server from the voice application in a high-level procedural language. This dynamically generated code supports the transformation of server-side data objects from the first format into the second format.
  • the dynamically generated code in the voice mark-up language is then rendered from the server to the client for execution on the client.
  • the voice application is normally commenced in response to a request from the client, which may be received over any appropriate communications facility.
  • client request itself is typically generated in response to an incoming telephone call to the client.
  • the voice application comprises three portions: a first portion that is to be executed on the client, and which is transformed from the high-level procedural language into the voice mark-up language supported by the client; a second portion that is to be executed on the server to interact directly with the first portion, and which is modified to associate data objects in the first format on the server with data objects in the second format on the client; and a third portion that is to be executed on the server, but that does not interact directly with the first portion.
  • the second portion is responsible for dynamically generating code on the server in the voice mark-up language, the dynamically generated code supporting the transformation of the at least one data object from said first format into said second format.
  • This dynamically generated code is then combined with said first portion for rendering from the server to the client for execution on the client.
  • the first portion of the code may itself be dynamically generated at run-time, but in one preferred embodiment is previously generated through a compilation process. This improves performance by avoiding the need to have to generate the first portion of code in the voice mark-up language each time the voice application is run.
  • the second portion maintains a table indicating the association between data objects on the server in the first format and data objects on the client in the second format.
  • the updated version of the object received from the client can be matched to the original version on the server, and this original version then updated accordingly.
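  • A minimal sketch of such a correspondence table is given below. The class shape and method names are assumptions for illustration; the description above only requires that server-side objects can be matched to their client-side counterparts (identified here by client-side variable names) so that returned values can be applied to the right objects.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: associates a server-side Java object with the name of the client-side
// (VoiceXML/ECMAScript) variable that represents it, so that values returned in a
// <submit> can be routed back to the correct server-side object.
class CorrespondenceTable {
    private final Map<String, Object> clientNameToServerObject = new HashMap<>();

    void associate(String clientVariableName, Object serverObject) {
        clientNameToServerObject.put(clientVariableName, serverObject);
    }

    Object serverObjectFor(String clientVariableName) {
        return clientNameToServerObject.get(clientVariableName);
    }
}
```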
  • Another embodiment of the invention provides apparatus for developing a voice application for a client-server environment, in which a server supports a high-level procedural language in which data objects have a first format, and a client supports a voice mark-up language in which data objects have a second format.
  • the voice application is written in a high-level procedural language, and accompanied by one or more annotations indicative of which part of the voice application is to be executed on the server, and which part of the voice application is to be executed on the client.
  • the apparatus comprises: means for transforming the part of the voice application to be performed on the client from the high-level procedural language into the voice mark-up language supported by the client in accordance with the annotations; and means for modifying the part of the voice application that is to be executed on the server in order to associate data objects in the first format on the server with data objects in the second format on the client.
  • Another embodiment of the invention provides apparatus for developing a voice application for a client-server environment, in which a server supports a high-level procedural language in which data objects have a first format, and a client supports a voice mark-up language in which data objects have a second format.
  • the voice application is written in a high-level procedural language.
  • the apparatus comprises a compiler for performing a compilation process on the voice application and includes: means for transforming the part of the voice application that is to be executed on the client platform into the voice mark-up language as part of the compilation process; and means for modifying the part of the voice application that is to be executed on the server in order to associate data objects in the first format on the server with data objects in the second format on the client as part of the compilation process.
  • Another embodiment of the invention provides a server for running a voice application in a client-server environment, in which the server supports a high-level procedural language in which data objects have a first format and a client supports a voice mark-up language in which data objects have a second format.
  • the server includes an application server system for launching the voice application (which includes at least one data object in the first format).
  • the voice application then runs on the server until processing is to be transferred to the client.
  • the server further includes a dynamic compiler for generating code on the server in said voice mark-up language from the voice application, wherein the dynamically generated code supports the transformation of the data object from the first format into the second format, and a communications facility for rendering the dynamically generated code in the voice mark-up language from the server to the client for execution on the client.
  • Another embodiment of the invention provides a computer program for use in developing a voice application for a client-server environment, in which a server supports a high-level procedural language in which data objects have a first format, and a client supports a voice mark-up language in which data objects have a second format.
  • the voice application is written in a high-level procedural language and accompanied by one or more annotations indicative of which part of the voice application is to be executed on the server, and which part of the voice application is to be executed on the client.
  • the program comprises instructions to perform the steps of: transforming the part of the voice application to be performed on the client from the high-level procedural language into the voice mark-up language supported by the client in accordance with the annotations; and modifying the part of the voice application that is to be executed on the server in order to associate data objects in the first format on the server with data objects in the second format on the client.
  • Another embodiment of the invention provides a compiler for developing a voice application for a client-server environment, in which a server supports a high-level procedural language in which data objects have a first format, and a client supports a voice mark-up language in which data objects have a second format.
  • the voice application is written in a high-level procedural language
  • the compiler includes program instructions for performing a compilation process on the voice application.
  • the compilation process includes: transforming the part of the voice application that is to be executed on the client platform into the voice mark-up language; and modifying the part of the voice application that is to be executed on the server in order to associate data objects in the first format on the server with data objects in the second format on the client.
  • Another embodiment of the invention provides a computer program providing a platform for running a voice application in a client-server environment, in which a server supports a high-level procedural language in which data objects have a first format, and a client supports a voice mark-up language in which data objects have a second format.
  • the program includes instructions for commencing the voice application (which involves at least one data object in the first format) on the server, and running the voice application on the server until processing is to be transferred to the client.
  • the program instructions support the dynamic generation of code on the server in the voice mark-up language, wherein the dynamically generated code supports the transformation of the data object from the first format into the second format.
  • the program instructions then render the dynamically generated code in the voice mark-up language from the server to the client for execution on the client.
  • the above computer programs are executed on one or more machines.
  • execution includes interpreting, such as for some Java code, or rendering, such as for mark-up languages.
  • the computer programs may be preinstalled on disk storage of the relevant machines, or supplied as a computer program product.
  • Such a program product, which typically comprises the program instructions stored in/on a medium, may be downloaded over a network (such as the Internet), or supplied as a physical storage device, such as a CD ROM.
  • the program instructions are usually first copied into main memory (RAM) of the machine, and then executed by the processor(s) of the machine.
  • program instructions are normally copied and saved onto disk storage for the machine, and then are executed from this disk storage version, although they may also be executed directly from the CD ROM, etc. It will be appreciated that the apparatus and computer program/computer program product embodiments of the invention will generally benefit from the same preferred features as described above with reference to method embodiments of the invention.
  • the above approach greatly simplifies the process of developing and deploying interactive spoken dialogue applications implemented in a voice mark-up language, such as dynamic VoiceXML, for use in a client-server environment.
  • a compiler, a run-time environment, and other associated software modules are provided, which in the preferred embodiment enable specification of the application in a subset of Java equipped with some speech utility classes.
  • the application program in Java can then be compiled into either server-side or client-side code as desired. More particularly, the compilation process results in a mixture of Java and VoiceXML, with the Java running on the server side and the VoiceXML on the client side.
  • the distribution of code between the client side and the server side is controlled by a set of annotations, as set out in a file or other suitable facility.
  • the voice application can also be compiled to run in a Java-only environment (typically on a single system).
  • Figure 1 is a schematic illustration of the general use of a voice browser
  • Figure 2 is a schematic diagram illustrating the main components in the voice application development system of the present invention
  • Figure 3 is a simplified schematic diagram illustrating the main components involved in running a voice application in a single processor environment
  • Figure 4 is a flowchart illustrating the compilation of a voice application into dynamic VoiceXML
  • Figure 5 is a simplified schematic diagram illustrating the main components involved in running a voice application in a dynamic client-server VoiceXML environment
  • Figure 6 is a flowchart illustrating the steps performed in running a voice application in a dynamic client-server VoiceXML environment
  • Figures 7A-7K illustrate the communications between the components of Figure 5 in performing the steps of Figure 6; and Figure 8 is a flowchart providing an overview of the voice application development process.
  • Figure 2 depicts the main components of the voice application development environment as disclosed herein. It will be appreciated that the underlying motivation of this environment is to generally allow a voice application to be efficiently developed for use in a configuration such as shown in Figure 1.
  • the main components illustrated in Figure 2 are: (1) The SpeechJava language definition 10, including a utility class SpeechIO, which carries out basic speech input and output operations;
  • (2) a single-processor runtime environment 20, including a suitable implementation of the SpeechIO class, which enables execution of SpeechJava applications as normal Java programs;
  • (3) a static SpeechJava-to-VoiceXML compiler 30, which converts SpeechJava programs into equivalent static VoiceXML programs;
  • (4) a "dynamic compiler" 40, which converts a SpeechJava program, together with a small set of annotations, into a dynamic VoiceXML program comprising a standard Java program and a collection of static VoiceXML pages. The annotations specify which parts of the application are to be run on the server, and which parts on the client; and
  • (5) a dynamic VoiceXML runtime environment 50, implemented on top of the standard Tomcat gateway or a similar piece of software, which enables execution of code generated by the dynamic compiler.
  • (Tomcat is a freeware server technology available from http://jakarta.apache.org/.)
  • the dynamic VoiceXML environment 50 depends on the dynamic SpeechJava to VoiceXML compiler 40, which in turn depends on the static SpeechJava to VoiceXML compiler 30.
  • the single-processor environment depends only on the SpeechJava language definition 10.
  • SpeechJava represents a subset of the Java programming language, and in one embodiment contains at least the following constructs:
  • ... Switch statements; 10. Definitions of inner classes (these classes can contain definitions of data members, but not necessarily anything else); 11. At least the following types of expressions: a. arithmetic expressions; b. relational expressions; c. array element expressions; d. 'new' expressions.
  • SpeechJava language definition incorporates a newly defined utility class (“SpeechIO”) that provides low-level speech functions for application input and output.
  • the SpeechIO class contains methods for at least the following operations: a. speech recognition using a specified grammar; b. speech output using a recorded wavfile; c. speech output using a text-to-speech engine.
  • a further output utility class (“RecStructure”) is also provided that represents the result of performing a speech recognition operation (this can be incorporated into the SpeechIO class if desired).
  • SpeechJava programs can be developed and run in any normal Java environment equipped with an implementation of the SpeechIO class.
  • One easy way to do this is to implement a server which can carry out the basic speech input and output operations, and then realise the SpeechIO class as a client to this server (note that the client and server do not necessarily have to be on the same system, for example if remote method invocation is used).
  • the RecStructure class can be implemented as an extension of Hashtable or some similar class, and may be provided as an inner class for example of the SpeechIO class.
  • This is illustrated in Figure 3, where a SpeechJava application 301 runs in Java environment 305.
  • the SpeechJava application 301 calls methods in the SpeechIO class 302 in order to perform voice input/output operations.
  • (Figure 3 does not show the RecStructure class separately; rather, this is treated as a component of the SpeechIO class.)
  • the SpeechIO class 302 can be regarded in effect as a wrapper for functionality provided by the underlying voice platform 303. This platform is typically outside the Java environment 305, but can be accessed by suitable native language calls from the SpeechIO class 302. It will be appreciated of course that Figure 3 represents a standard architecture for writing conventional (non Web-based) voice applications in Java.
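  • Purely as an illustration of the arrangement of Figure 3, skeletons of the two utility classes might look as follows. The method names follow those used elsewhere in this description (recognise, sayWavfile, sayTTS, getSlotValueAsString); the bodies are placeholders rather than an actual voice platform binding.

```java
import java.util.Hashtable;

// Sketch of RecStructure: the result of a recognition call, associating int or
// String values with named slots (implemented here as an extension of Hashtable).
class RecStructure extends Hashtable<String, Object> {
    String getSlotValueAsString(String slotName) {
        Object value = get(slotName);
        return value == null ? null : value.toString();
    }
}

// Sketch of SpeechIO: a thin wrapper around the underlying voice platform 303.
class SpeechIO {
    static RecStructure recognise(String grammar) {
        // run speech recognition on the voice platform using the named grammar (not shown)
        return new RecStructure();
    }
    static void sayWavfile(String wavfile) {
        // play a recorded audio file on the voice platform (not shown)
    }
    static void sayTTS(String text) {
        // speak the text via a text-to-speech engine (not shown)
    }
}
```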
  • The static SpeechJava-to-VoiceXML compiler 30 converts SpeechJava programs into equivalent static VoiceXML programs.
  • the basic idea is to realise definitions of static SpeechJava methods as VoiceXML <form> elements, invocations of static SpeechJava methods as VoiceXML <subdialog> elements, and instances of SpeechJava inner classes as JavaScript (ECMAScript) objects. Most of this process can be carried out using standard techniques, as described in more detail below.
  • the first limitation can be overcome by noting that although VoiceXML syntax does not permit insertion of a <subdialog> element into an <if> element, it is nevertheless possible to make a <subdialog> call conditional (through a 'cond' attribute on the <subdialog> item). Two complications arise: a. The conditional will occur in a context where local variables are defined, and these variables may be referenced in the body of the conditional. It is thus necessary to pass the local variables into the new subdialogs sub_if and sub_else, return their new values on exiting the subdialogs, and use the returned values to update the local variables in the translated conditional. b. It may also happen that one or both of the branches of the conditional contains an occurrence of 'return'.
  • Compilation of SpeechJava into static VoiceXML is normally only possible for code that can be executed entirely on the client (this happens to be the case for the example application given in section 3 below). More generally, when at least some portions of the original Java code are to be run on the server, either for reasons of efficiency or because some server processing is necessary in the application, then a dynamic compiler must be used, as will now be described.
  • Compiler 40 transforms annotated SpeechJava programs into Java-based dynamic VoiceXML programs.
  • the resulting code from the compilation process comprises a Java program to be executed on the server, together with a set of one or more pieces of VoiceXML to be executed on the client.
  • Annotations are utilised to control which Java methods are to be executed on the server (hence remaining as Java), and which are to be executed on the client (hence being compiled into VoiceXML).
  • a developer can use the annotations to identify those methods that are desired to transfer processing from the server to the client.
  • the compiler knows that the SpeechIO methods must be implemented on the client (since only this has speech facilities), and so will automatically convert these into the appropriate VoiceXML, with the necessary transfer of control from the server.
  • the annotations therefore allow a developer to specify additional processing to be performed on the client. If no annotations are provided, then only the basic minimum of speech operations will be performed on the client.
  • Internalise the source code and the annotations (step 410). This transforms the code into representations of flow, method calls, and so on that are easier to work with. Note that internalisation per se is well-known in the art. Indeed, in the preferred embodiment, the internalisation is performed using a publicly available piece of freeware, namely the ANTLR parser-generator, together with the accompanying grammar for Java (see www.antlr.org).
  • Use the call graph and the annotations (step 430) to separate the method declarations into three groups: a. Normal server-side methods. These will stay as Java. b. Client-side methods. These will become VoiceXML. c. Server-side methods that transfer control to the client side. These will be replaced by special proxy methods (described below).
  • the separation into the three groups is performed based on the knowledge that the program starts on the server side, and remains there until a transfer is encountered. Such a transfer can either be explicitly identified in the annotations file, or else implicit (thus a call to a method in the SpeechIO class must cause a transfer to the client, since only the client supports the necessary audio input/output facilities). Note that each transfer is effective for the duration of a single called client method, after which processing is returned back to the server via a "submit" operation. 4. For each method in group (c) above, use the call graph to compute the transitive closure of the method under the invocation relation (step 440).
  • each call to the client comprises a single method
  • this called method may in turn invoke further methods that are also performed on the client (in the same manner as a conventional function or program stack).
  • Once these further methods have completed, we return to the originally called client method, and then back to the server.
  • This set of methods comprises the transitive closure of the called method.
  • Each method in the set is therefore translated into VoiceXML using the SpeechJava to static VoiceXML compiler, as previously described.
  • a corresponding "proxy" method is created (step 450) that performs the following actions: a. Call a utility method to translate the arguments of the group (c) method into a piece of VoiceXML code, and store a table associating server side objects with client side objects. b. Combine this piece of VoiceXML code with the VoiceXML code compiled in (4). c. Render out the combined VoiceXML to the client. d. Wait for a new submit from the client containing the returned information. e. Decode the returned information, and if necessary update server-side objects and/or compute a return value.
  • For each method in group (a) above, modify as follows (step 460): a. Replace calls to methods in group (c) above with calls to the corresponding proxy methods. b. Replace calls to SpeechIO primitives with calls to the corresponding methods that render out client-side code.
  • server side units are converted back to external Java syntax (step 470).
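  • The sketch below illustrates, in outline, the shape of a proxy method generated at step 450 for a single annotated method. The Gateway and TranslationUtility interfaces, and all names used, are assumptions introduced for this example; only the sequence of actions (a) to (e) is taken from the description above.

```java
// Hypothetical collaborators; the names are illustrative only.
interface Gateway {
    String renderAndWaitForSubmit(String voiceXmlPage);               // steps (c) and (d)
}
interface TranslationUtility {
    String translateArguments(Object... args);                        // step (a): also fills table 535
    String decodeStringResult(String submittedData, String slotName); // step (e)
}

class GeneratedProxySketch {
    // Proxy standing in, on the server, for an annotated method such as askForDate().
    static String askForDateProxy(Gateway gateway, TranslationUtility utility,
                                  String precompiledVoiceXml) {
        // (a) translate the method's arguments into VoiceXML, recording the
        //     server-object / client-object associations
        String dynamicVoiceXml = utility.translateArguments();
        // (b) combine with the statically compiled VoiceXML for the client-side methods
        String page = dynamicVoiceXml + precompiledVoiceXml;
        // (c) render the combined VoiceXML to the client and (d) wait for its submit
        String submitted = gateway.renderAndWaitForSubmit(page);
        // (e) decode the returned information into a Java return value
        return utility.decodeStringResult(submitted, "date");
    }
}
```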
  • Figure 5 illustrates the components of the client/server environment for execution of the compiled SpeechJava voice application programs in dynamic VoiceXML mode. Note that this environment includes both client-side and server-side processing.
  • Figure 5 illustrates a server platform 110 and a client platform 130 (mirroring the arrangement of Figure 1).
  • the client and server are connected by a link 501 that supports HTTP communications.
  • Link 501 may therefore be implemented over the Internet if so desired. It will be appreciated that any other appropriate communications facility or protocol could be utilised instead, provided they were appropriately supported by the various components.
  • the client and server platforms each comprise a standard desktop computer running the Windows NT operating system (Windows 2000), available from Microsoft Corporation.
  • the server side software 502 includes the Tomcat server implementation previously mentioned, as available from http://jakarta.apache.org, which creates new Tomcat servlets in response to incoming requests from clients.
  • the server includes the following main components:
  • the server side Java code 510 produced by the dynamic VoiceXML compiler. Note that this is running on a Java virtual machine 505 (i.e. a Java run-time environment).
  • the (pre-compiled) VoiceXML code 520 as produced by the dynamic VoiceXML compiler.
  • An HTTP gateway process 540, which incorporates a Gateway Server 541 and a Gateway Client.
  • a utility class 530 that is responsible for translating at run-time between server-side (Java) and client-side (VoiceXML) representations. For this purpose, utility class 530 maintains a correspondence table 535, which stores associations between objects on the client and on the server. Note that this utility class 530 may if desired be split into multiple classes, and in some embodiments may be incorporated into the gateway process 540.
  • the client system 130 includes a standard VoiceXML browser 550, which communicates with the server-side gateway process via HTTP.
  • the VoiceXML browser 550 is used to render the VoiceXML code 551, which is downloaded from the server.
  • browser 550 is implemented by the V-Builder program, available from Nuance Corporation (www.nuance.com).
  • client system 130 is then provided with a SoundBlaster audio card to provide audio input/output.
  • the production version of this embodiment (which unlike the development environment is telephony enabled and supports multiple sessions) utilises the Nuance Voice Webserver program to provide the VoiceXML browser 550, which renders the VoiceXML code 551.
  • Client side system 506 then incorporates suitable telephony interface software and hardware (not shown in Figure 5), as supported by the Nuance Voice Web Server product. (See http://www.nuance.com/products/voicexml.html for more details of these various Nuance products).
  • the client begins by sending a 'run' request 710 to the gateway process 540 on the server over link 501 (step 605, Figure 7A).
  • This request specifies the name of a particular desired application, and will normally be generated in response to an incoming call to client system 130. Typically this will be a conventional telephone call over a land or mobile network, although some clients may support Voice over Internet Protocol (VoIP) calls, or some other form of audio communication.
  • client system 130 may send a run request 710 as part of outbound call processing (as well as or instead of inbound call processing).
  • the particular application request sent to the server 110 may be dependent on one or more parameters such as the called number, the calling number, the time of day, and so on.
  • the gateway process 540 communicates with the server side software 502 in order to start executing the named application program as a new thread 510 (step 610, Figure 7B). It will be appreciated that the processing so far conforms (per se) to existing VoiceXML applications.
  • a proxy method is one that results in a call to be performed on the client. (Note that if there are no such proxy methods encountered, the processing effectively goes straight to step 670, described below).
  • the proxy method in the Java application 510 calls a utility method from the translation utility class 530 to translate the arguments 720 of the proxy method into a piece of VoiceXML code 730, which is then returned back to the proxy method (step 620, Figure 7D).
  • the utility method also stores a table 535 that contains the associations between server side objects in the Java application 510 and the corresponding client side objects in the returned VoiceXML code 730.
  • the proxy method combines the newly generated VoiceXML code 730 received back from the utility class with the appropriate portion 735 of the pre-compiled VoiceXML code 520.
  • This latter component represents a translation of the set of client methods called from this particular proxy method (step 625, Figure 7E). It will be understood therefore that much of the VoiceXML code for execution on the client can be determined (statically) in advance from the original Java code. However, some of the client VoiceXML code can only be generated dynamically at run-time, based for example on the particular request from the client or on particular information in a database.
  • the VoiceXML code 740 comprising the combination produced in the preceding step of the statically prepared VoiceXML code 735 with the dynamically created VoiceXML code 730, is passed to the HTTP gateway process 540, which then renders it out over communications link 501 to the client 130 (step 630, Figure 7F).
  • VoiceXML code 740 as downloaded to the client in Figure 7F corresponds to VoiceXML code 551 as illustrated in Figure 5).
  • This VoiceXML browser 550 receives the VoiceXML code 740 from the server 110, and starts to execute this received VoiceXML code 740 (step 640, Figure 7G). Meanwhile, the server-side process thread in the Java application 510 suspends.
  • the VoiceXML browser 550 client executes the received VoiceXML code 740 until it returns to the top level call of this code.
  • the VoiceXML code forming this top level call concludes with a <submit> element. This triggers a return to the server-side gateway process 540, passing back a return value or values 750 if appropriate (step 645, Figure 7H). This allows, for example, spoken information recognised during the call to be submitted back to the Java application 510 for processing on the server side.
  • the HTTP gateway process 540 on the server side 110 wakes up the relevant thread in Java application 510 (this thread having been suspended as part of step 640).
  • the gateway process 540 then passes the reawakened server thread the return information received from the client as part of the submit process of the preceding step (step 650, Figure 7I). Note that at this stage the returned information is still in the form of VoiceXML objects 760 (i.e. as received from the VoiceXML code on the client 130).
  • the thread in the Java application 510 that received the VoiceXML objects 760 from the gateway process calls a method (or methods) in the translation utility class 530 to translate these objects back into Java (step 655, Figure 7J).
  • the VoiceXML objects submitted back from the client are passed to the translation utility class 530, which uses the table created in step 620 above to decode the contents of these objects.
  • This content 770 can then be returned to the Java application 510 by updating existing server-side objects, creating new server-side objects, or creating a call return value (or some combination of these three actions).
  • the proxy method is now able to continue processing, and to eventually complete and return. This leads to the resumption of normal server-side processing of Java application 510 (step 660, Figure 7K).
  • the continued processing of the relevant thread of Java application 510 on the server 110 may lead to one or more further proxy methods being called, if there are more portions of code to be run on client 130. If this turns out to be the case (step 665), then processing returns to repeat stages (3) through (11), as just described. 13. Finally all the proxy methods in the relevant Java application 510 have been completed, and so the server-side program is ready to terminate. At this point, there is an outstanding HTTP request from the client (given the request/response model of client-server HTTP communications). Thus the gateway process 540 renders out a piece of null VoiceXML to the client 130 in order to formally satisfy this remaining request (step 670). This then allows this thread of the server Java application 510 to conclude. Likewise, the client 130 may conclude the call, or may perform additional processing associated with the call that does not involve server 110.
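  • The following sketch suggests one way the gateway process 540 might coordinate the suspended application thread with the HTTP request/response cycle of steps 605 to 670. It is a simplification under assumed names (a real deployment would use servlets and handle multiple sessions); only the suspend/resume behaviour is taken from the description above.

```java
import java.util.concurrent.SynchronousQueue;

// Sketch: hand-off between the application thread (running the Java program 510)
// and the HTTP request/response cycle with the VoiceXML browser.
class GatewayHandoffSketch {
    private final SynchronousQueue<String> pagesToClient = new SynchronousQueue<>();
    private final SynchronousQueue<String> submitsFromClient = new SynchronousQueue<>();

    // Called by a proxy method: render a VoiceXML page, then suspend until the client submits.
    String renderAndWaitForSubmit(String voiceXmlPage) throws InterruptedException {
        pagesToClient.put(voiceXmlPage);   // becomes the HTTP response to the outstanding request
        return submitsFromClient.take();   // the application thread suspends here
    }

    // Called when an HTTP request arrives from the VoiceXML browser
    // (the initial 'run' request carries no submitted data).
    String handleHttpRequest(String submittedData) throws InterruptedException {
        if (submittedData != null) {
            submitsFromClient.put(submittedData);   // wake the suspended application thread
        }
        // the next page produced by the application becomes the HTTP response;
        // once the application has finished, this would be a piece of null VoiceXML
        return pagesToClient.take();
    }
}
```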
  • FIG. 8 provides an overview of the voice application development process in this embodiment. This commences with writing the application in the SpeechJava format (step 810). Once the application has been written, it can be tested on a single processor Java platform provided with suitable audio facilities (step 820). This testing is useful to verify correct application behaviour, although can be omitted if desired.
  • the next step in Figure 8 is to compile the application into static VoiceXML (step 830). Note that this is only feasible if the entire application can potentially be run on the client. Thus once the complete application has been statically compiled into VoiceXML, it is possible to test the VoiceXML code on the client (step 840). This can be useful for understanding application behaviour and performance. Note however that the static compilation and associated testing can be omitted if desired; indeed, they must be omitted if the application includes operations that can only be performed on the server, since in this case the static compilation will not be viable.
  • step 850 the annotations are developed to control whether processing is to be performed on the server or on the client.
  • the annotations can be considered as optional, in that if no annotations are provided, the system defaults to performing only the basic speech input/output operations on the client.
  • the application can now be compiled into dynamic VoiceXML (step 860), using the annotations to control the location of the relevant processing, and finally installed onto the server system, ready for use (step 870).
  • step 860 could potentially be postponed until run-time (i.e. after the installation of step 870). This would allow the annotation file to be specified at run-time (perhaps for example dependent on the type of client that initiated the request to the server). However, in the preferred embodiment the compilation is done in advance. This avoids the repetition of having to perform substantially the same compilation process for each user request. Nevertheless, it is not possible to create all the VoiceXML code in advance; rather the generation of a certain proportion of the VoiceXML code, connected with data object transfer between the server and client, must be deferred until run-time (see step 620 of Figure 6).
  • a single server can then deploy these different versions, and determine which one to utilise in response to a given client request.
  • Section 2.1 presents one preferred definition of the SpeechJava language
  • Section 2.2 presents a single-processor environment for running SpeechJava programs
  • Section 2.3 describes in detail one particular SpeechJava to static VoiceXML compiler
  • Section 2.4 describes in detail one particular SpeechJava to dynamic VoiceXML compiler
  • Section 2.5 describes in detail one particular environment for the execution of SpeechJava programs that have been compiled into dynamic VoiceXML.
  • SpeechJava is a subset of Java equipped with an extra utility class called SpeechIO. Static methods are used in effect as though they were C-style functions, and inner classes as though they were C-style structs. Input and output are handled through the SpeechIO class.
  • SpeechJava contains the following constructs: 1. Definitions of top-level classes, of the form class <className> { <Body> }, where <className> is an identifier and <Body> is a list of definitions of static methods and/or inner classes.
  • A utility class RecStructure, extending Hashtable, which is intended to represent the results of calling recognition.
  • a RecStructure object associates int or String values with a set of String slots. It contains the following methods: a) String getSlotValueAsString (String slotName) Returns the value of a String-valued slot, or null if it has no value.
  • A utility class for low-level speech functions called SpeechIO, containing the following methods: a) RecStructure recognise(String grammar)
  • SpeechJava programs conforming to the above description can be developed and run in any normal Java environment equipped with an implementation of the SpeechIO and RecStructure classes.
  • One easy way to implement the RecStructure class (which can potentially be incorporated into the SpeechIO class) is as an extension of Hashtable.
  • the SpeechIO class can be realised by first implementing a server (corresponding to the voice platform 303 of Figure 3) that can carry out the basic speech input and output operations.
  • This server can be built on top of any standard speech recognition platform, for example a toolkit from Nuance Corporation (see above).
  • This server provides a minimal speech application, whose top-level loop reads messages from an input channel and acts on them in an appropriate way. Messages will be of the following three types:
  • the server sends a return message over an output channel containing either the recognition result, or a notification that recognition failed for some reason.
  • the SpeechIO class itself (302 in Figure 3) can be implemented as a client to the server described above. Each of the three SpeechIO methods then functions by sending an appropriate message to the server, and in the case of the Recognise message also waiting for a return value.
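  • A minimal sketch of the SpeechIO class realised as a client of such a speech server is given below. The socket transport and the line-oriented message format (RECOGNISE / SAY_WAVFILE / SAY_TTS) are assumptions made for illustration; the description above only requires the three message types and a return message for recognition.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

// Sketch: SpeechIO realised as a client of a separate speech server (voice platform 303).
class SpeechIOClient {
    private final BufferedReader in;
    private final PrintWriter out;

    SpeechIOClient(String host, int port) throws IOException {
        Socket socket = new Socket(host, port);
        in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
        out = new PrintWriter(socket.getOutputStream(), true);
    }

    // Ask the server to run recognition with the named grammar and wait for the result
    // (building a RecStructure from the returned line is not shown here).
    String recognise(String grammar) throws IOException {
        out.println("RECOGNISE " + grammar);
        return in.readLine();   // recognition result, or a notification that recognition failed
    }

    void sayWavfile(String wavfile) { out.println("SAY_WAVFILE " + wavfile); }

    void sayTTS(String text)        { out.println("SAY_TTS " + text); }
}
```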
  • This section describes one embodiment of a compiler that converts annotated SpeechJava programs conforming to the framework of Section 2.1 above into equivalent VoiceXML programs.
  • the annotations for indicating whether code is to be performed on the server or on the client are provided as a separate input file.
  • the compiler performs this conversion as a sequence of three main steps:
  • the first stage is performed in one embodiment using known methods.
  • the parsing of the Java code to internal form is done using a version of the freeware ANTLR parser-generator, together with the accompanying freeware ANTLR grammar for Java (www.antlr.org).
  • the annotations file has a simple structure and can be parsed using simple ad hoc methods.
  • the third stage above renders abstract VoiceXML into executable VoiceXML.
  • the compiled VoiceXML is initially in abstract or internalised form for easier manipulation (similar to the internalised form of Java produced at step 410 in Figure 4). It will be appreciated that techniques for converting back from this abstract form into a final executable form are well-known in the art, and accordingly will not be described in further detail herein.
  • VoiceXML, in contrast to Java, does not permit expressions to contain subdialog calls.
  • Java expressions in general translate into two components: a VoiceXML expression, and a list of zero or more statements which are executed before the statement in which the target expression appears. If this list is non-empty, this is generally because it contains a ⁇ subdialog> item.
  • Translate_expression consequently takes the following arguments:
  • Each recognition grammar used as an argument to an invocation of SpeechIO.recognise is the subject of a declaration using a line of the form grammar <grammar> {slot_1, slot_2, ..., slot_n}, where <grammar> is the name of the grammar as it appears in the invocation of SpeechIO.recognise, and slot_1, ..., slot_n are the names of the slots filled by <grammar>.
  • Each grammar declaration is translated into a <form> item corresponding to a recognition call involving the defined grammar.
  • the declaration above translates into a <form> with the following appearance:
  • definitions of inner classes are internalised and stored for use in translating other constituents, but are not directly translated into output VoiceXML.
  • a static method definition is translated into one or more <form> elements, as follows:
  • the body of the method is translated into abstract VoiceXML, potentially including one or more ancillary <form> elements resulting from translation of conditional or loop elements.
  • the list of current local variables is initialised to the list of formal parameters.
  • SpeechIO methods recognise, sayWavfile and sayTTS are translated as follows: recognise A method invocation of the form:
  • SpeechIO.recognise(grammar) is translated into a VoiceXML fragment of the form:
  • <subdialog src="#recognition_subdialog_for_grammar" name="subdialog_1"/> where subdialog_1 is a new subdialog identifier.
  • Declarations are translated into <var> items. If initial values are supplied, these are either translated into 'expr' attributes, or into assignments on the newly defined variables.
  • the following examples illustrate how the translation is carried out: int i;
  • method invocations are translated into <subdialog> items, but the form of the translation depends on whether the method invocation occurs in a 'statement' or an 'expression' context. If the method invocation appears as a statement, then it is directly translated as a <subdialog> item. If however the method invocation is part of an expression, it is translated in the output VoiceXML expression as the expression: subdialog_1.return_value where subdialog_1 is a newly generated identifier, and the <subdialog> item is added to the output list of 'extra statements' produced by the relevant call to translate_expression. In both cases, the list of actual parameters to the method invocation is translated using translate_expression, and the resulting output list of 'extra statements' is added to the current list of 'extra statements'.
  • the basic strategy is to define two new subdialogs (call them cond_sub_1 and cond_sub_2) that respectively encode the 'if' and 'else' branches of the conditional.
  • the compiler then recodes the conditional in terms of conditional subdialog calls by introducing a new variable whose value is set depending on the result of the conditional's test. This is followed by calls to cond_sub_1 and cond_sub_2 conditional on appropriate values of the branch variable.
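  • Expressed at the Java level (a sketch of the recoding strategy only; the actual output is VoiceXML, and the names branch_var_1, cond_sub_1 and cond_sub_2 stand for generated identifiers), the transformation looks roughly as follows:

    // Sketch of the recoding described above.
    class ConditionalRecodingSketch {
        static void original(int x) {
            if (x > 0) { doIfBranch(); } else { doElseBranch(); }
        }

        static void recoded(int x) {
            boolean branch_var_1 = (x > 0);       // new variable set from the conditional's test
            if (branch_var_1)  { cond_sub_1(); }  // becomes a conditional <subdialog> call
            if (!branch_var_1) { cond_sub_2(); }  // becomes a conditional <subdialog> call
        }

        // cond_sub_1 encodes the 'if' branch; cond_sub_2 encodes the 'else' branch.
        static void cond_sub_1() { doIfBranch(); }
        static void cond_sub_2() { doElseBranch(); }

        static void doIfBranch()   { /* illustrative branch body */ }
        static void doElseBranch() { /* illustrative branch body */ }
    }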
  • Assignment statements are translated as appropriate either into VoiceXML <assign> elements, or into ECMAScript (JavaScript) assign statements wrapped in <script> elements.
  • the <assign> element is used if the left-hand side of the assignment is a simple variable, and the <script> element otherwise.
  • Java numerical literals are translated as VoiceXML numerical literals.
  • Java string literals are translated as VoiceXML string literals.
  • the special Java constant 'null' is translated as the VoiceXML string "undefined”.
  • Java data member expressions of the form: class_instance.data_member_name are translated as ECMAScript object property references of the form: class_instance.data_member_name
  • Java array element expressions of the form: array_instance[index] are translated as ECMAScript array element expressions of the form: array_instance[index]
  • Java arithmetic operators '+', '-', '*', '/' and '%' are translated into the ECMAScript arithmetic operators of the same names.
  • Java string concatenation operator '+' is translated into the ECMAScript operator of the same name.
  • tmp_var_1 is the new temporary variable
  • the value associated with the structure_type key encodes the information that tmp_var_1 is an array of two objects of type String
  • the value associated with the identity_type key is a unique new tag.
  • a SpeechJava method cannot be translated into ECMAScript if it includes calls to SpeechIO primitives, since speech operations can only be carried out within a VoiceXML form, and JavaScript functions have no mechanism for calling VoiceXML forms.
  • the compiler carries out a static analysis of the input SpeechJava program to determine those method definitions that will be performed on the client side and that can be compiled into ECMAScript.
  • the compiler first constructs a call graph, in which the client side methods are nodes and method invocations are arcs. Methods are then labelled as being either 'VoiceXML' or 'ECMAScript', as follows:
  • Step (3) is repeated until a fixed point is reached. 5) All remaining methods are labelled 'ECMAScript'.
  • Methods labelled 'VoiceXML' are translated as described earlier in this section, and the remaining methods can be translated into ECMAScript.
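  • A sketch of this labelling pass is given below in Java. The call-graph representation (a map from caller to callees) and the initial seeding set are assumptions made for illustration; the loop corresponds to repeating step (3) until a fixed point is reached.

    import java.util.*;

    // Sketch only: methods that (directly or indirectly) invoke SpeechIO primitives must be
    // labelled 'VoiceXML'; everything else can be labelled 'ECMAScript'.
    class LabellingSketch {
        static Set<String> labelVoiceXML(Map<String, Set<String>> callGraph,
                                         Set<String> usesSpeechIOPrimitives) {
            Set<String> voiceXml = new HashSet<>(usesSpeechIOPrimitives);
            boolean changed = true;
            while (changed) {                       // repeat until a fixed point is reached
                changed = false;
                for (Map.Entry<String, Set<String>> entry : callGraph.entrySet()) {
                    String caller = entry.getKey();
                    if (voiceXml.contains(caller)) {
                        continue;
                    }
                    // A method that invokes a 'VoiceXML' method must itself be 'VoiceXML',
                    // since ECMAScript functions cannot call VoiceXML forms.
                    for (String callee : entry.getValue()) {
                        if (voiceXml.contains(callee)) {
                            voiceXml.add(caller);
                            changed = true;
                            break;
                        }
                    }
                }
            }
            return voiceXml;   // all remaining methods are labelled 'ECMAScript'
        }
    }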
  • This translation of Java into ECMAScript is reasonably straightforward (compared to the translation into VoiceXML per se), and in broad terms involves the translation of Java static methods into ECMAScript function definitions. The skilled person will then be able to map Java control primitives and operators into ECMAScript control primitives and operators without undue difficulty.
  • Java data structures can be mapped into the same ECMAScript data structures as in the VoiceXML case.
  • a table is utilised to keep track of whether each method is realised as VoiceXML code or as ECMAScript code.
  • This section describes in detail one embodiment of a compiler for implementing the method sketched in Section 1.4 above in order to transform annotated SpeechJava programs into Java-based dynamic VoiceXML programs.
  • the annotations specify which Java methods are to be executed on the server (hence remaining as Java), and which are to be executed on the client (hence being compiled into VoiceXML).
  • the convention used is that the annotations explicitly specify a set of zero or more methods that are to transfer processing to the client. The transitive closure of this set is then run on the client.
  • class is the name of a class
  • method is the name of a method in that class with arity arguments.
  • a typical line might be: execute_on_client get_a_number1.hello/0
  • the code produced by the compiler comprises a Java program to be executed on the server, together with a set of one or more pieces of VoiceXML to be executed on the client.
  • the top-level steps carried out by the compiler have already been described in Section 1.4 above (see also Figure 4). We now discuss each of these steps in more detail for one particular embodiment.
  • the Java code is internalised using the ANTLR parser for Java referred to in Section 2.3 above, and the annotation file is internalised using straightforward ad hoc methods. Methods listed in the annotations file are marked in a table as execute_on_client methods; these are the methods that are to transfer processing to the client.
  • the call graph is constructed by recursively traversing the internalised Java code and noting the following: a) Instances of invocations of method M_1 inside method definition M_2, for some M_1, M_2. In this case, an arc of the form calls(M_2, M_1) is added to the call graph; b) Instances of invocations of the form SpeechIO.recognise(G_1) inside method definition M_2, for some named grammar G_1. In this case, an arc of the form uses_grammar(M_2, G_1) is added to the call graph.
  • Define a relation server_side_call(M_1, M_2) so that server_side_call(M_1, M_2) holds iff calls(M_1, M_2) holds and M_2 is not an execute_on_client method, and let server_side_call* be the reflexive and transitive closure of server_side_call. Then we divide up the set of methods as follows:
  • the server-side methods consist of the set of all methods M such that server_side_call*(main, M) holds; these methods remain as Java code to be executed on the server.
  • the client-side methods consist of the remaining methods. These methods are to be translated into VoiceXML.
  • this transitive closure is translated into VoiceXML using the SpeechJava to static VoiceXML compiler, using the appropriate declaration for each of the grammars G found in step 2.
  • the result is written out to a file; call this file client_side_code(M).
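  • A sketch of the server/client partition described above is given below in Java; the call-graph representation is an assumption, but the computation follows the definition of server_side_call*: server-side methods are those reachable from main without passing through an execute_on_client method, and all other methods are client-side.

    import java.util.*;

    // Sketch only: computing the set of server-side methods.
    class PartitionSketch {
        static Set<String> serverSideMethods(Map<String, Set<String>> callGraph,
                                             Set<String> executeOnClient) {
            Set<String> serverSide = new HashSet<>();
            Deque<String> toVisit = new ArrayDeque<>();
            toVisit.push("main");
            while (!toVisit.isEmpty()) {
                String m = toVisit.pop();
                if (serverSide.add(m)) {            // reflexive: main itself is server-side
                    for (String callee : callGraph.getOrDefault(m, Collections.emptySet())) {
                        // server_side_call(m, callee) holds only if callee is not an
                        // execute_on_client method.
                        if (!executeOnClient.contains(callee)) {
                            toVisit.push(callee);
                        }
                    }
                }
            }
            return serverSide;   // the remaining methods are client-side
        }
    }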
  • For each execute_on_client method M, a corresponding "proxy" method M' is created.
  • Let the signature of the original method be: return_type M(type_0 arg_0, type_1 arg_1, ... type_n arg_n) where the return type of the method is return_type, the names of the arguments are arg_0 ... arg_n, and their types are type_0 ... type_n.
  • Let the name of the file that contains the client-side code for M be client_side_code(M).
  • If arg_i is of the primitive type int, arg_i' is the expression new Integer(arg_i). Otherwise, arg_i' is arg_i.
  • arg_i' should always be a proper object, as opposed to a primitive type.
  • Object[] args = {arg_0', arg_1', ... arg_n'};
    gatewayServer.convertArgsToVXML("return_type", "M", f_args, args);
    return gatewayServer.sendVXML("return_type", "client_side_code(M)");
  • gatewayServer.convertArgsToVXML takes as input the arguments of M, packaged as the Object array args, the names of the formal arguments, packaged as the String array f_args, the name of the return type, and the name of the method M itself. It uses this information to generate a piece of VoiceXML, which calls the code in client_side_code(M). In general, this involves creating client-side objects that correspond to the server-side objects in the arguments of M, so it also stores a table that associates each client-side object with its corresponding server-side object.
  • gatewayServer.sendVXML combines the VoiceXML code produced in step 1 with the precompiled client-side code in client_side_code(M), and renders this out to the client. It then waits for the client to perform a new submit which will contain the returned information. When this is received, it decodes it and if necessary updates server-side objects and/or computes a return value.
  • a call to SpeechIO.sayTTS would for example be replaced by a call to the proxy method execute_client_sayTTS, defined as follows:
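  • A minimal sketch of such a proxy, following the general pattern of the proxy bodies shown above, might be (the name of the generated VoiceXML file is an illustrative assumption):

    // Sketch only: a proxy that transfers a SpeechIO.sayTTS call to the client.
    void execute_client_sayTTS(String text) {
        String[] f_args = {"text"};
        Object[] args = {text};
        gatewayServer.convertArgsToVXML("void", "execute_client_sayTTS", f_args, args);
        gatewayServer.sendVXML("void", "execute_client_sayTTS.vxml");
    }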
  • the server-side code is subjected to a final transformation, which for each top-level class class does the following: 1. class is made to extend the class 'GatewayRunnable'.
  • 2. class is provided with a public method called 'run', as follows: public void run() { main_proxy(); gatewayServer.end(); }
  • server-side methods are then rendered out in standard Java syntax using a simple recursive descent algorithm.
  • the implementation includes a GatewayRunnable class and the Gateway.
  • Server-side programs are classes that implement the GatewayRunnable interface. Communication between the voice-browser client and the server program first goes to a Tomcat Servlet (see http://jakarta.apache.org/, as previously mentioned); communication between the Tomcat Servlet 543 and the server-side program 510 is through the Gateway process 540 (see Figure 5).
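  • A minimal sketch of what the GatewayRunnable interface might amount to is shown below; the setGatewayServer method is an illustrative assumption, included only to show how a program instance could be given its private GatewayServer, since the description itself only states that GatewayRunnable extends Runnable and that generated classes acquire a run method.

    // Sketch only, under the assumptions stated above.
    public interface GatewayRunnable extends Runnable {
        void setGatewayServer(GatewayServer gatewayServer);
    }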
  • the top-level modules in the Gateway are the following:
  • The GatewayClient class.
  • The FileClassLoader class.
  • The GatewayServer class.
  • The SharedMemory class.
  • Execution starts when the servlet accepts a URL request to run a new program, specified as a string program.
  • the servlet passes this request to the GatewayClient 542, which invokes the FileClassLoader to locate and run a class named program. Since this class extends GatewayRunnable, which in turn extends Runnable, program can be started as a separate thread.
  • the instance of program communicates with the voice-browser client through its private instance of GatewayServer; this communicates with the GatewayClient 542, which in turn communicates with the Tomcat Servlet 543.
  • the GatewayServer and GatewayClient pass information through an instance of the SharedMemory class.
  • the proximate interface between the instance of program and the Gateway consists of the two GatewayServer methods, convertArgsToVXML and sendVXML:
  • convertArgsToVXML takes the run-time information pertaining to the method invocation, and creates a small piece of VoiceXML that acts as a 'header' for the main piece of VoiceXML that has been produced and saved at compile-time.
  • the server-side Java objects are translated into ECMAScript counterparts, and the correspondence between them is saved in a server-side table.
  • sendVXML combines the header information produced by convertArgsToVXML with the pre-compiled VoiceXML, and renders it out to the client. It then waits for the next client request, which should contain updated versions of the objects that have been sent from the server, possibly together with a return value.
  • the server-side correspondence table is used to translate the data back into server- side form, update server-side objects where appropriate, and if necessary produce a server-side return value.
  • This section presents an illustrative example of a SpeechJava program and its translation into static and dynamic VoiceXML.
  • the program itself is presented in Section 3.1, and the static VoiceXML translation in Section 3.2.
  • Section 3.3 presents a dynamic VoiceXML translation.
  • Section 3.4 describes the run-time processing carried out by the dynamic VoiceXML program.
  • the example program is a Java class called find_largest_number, which contains four static methods.
  • the program uses text to speech (TTS) conversion to prompt the user for three numbers, finds the largest one, and speaks the result. The way in which this is done has been chosen to display many of the features of the SpeechJava language and its compilation into static and dynamic VoiceXML.
  • SpeechIO.sayTTS("Component " + i + " I heard " + numbers[i]); }
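  • The fragment above is taken from the example program; a minimal sketch of SpeechJava code in the same style is given below. The method bodies, prompt strings and the slot name "number" are illustrative reconstructions, and only the overall behaviour (prompt for three numbers via TTS and recognition, find the largest, speak the result) is taken from the description above.

    // Sketch only: SpeechJava in the style of the find_largest_number example.
    public class FindLargestNumberSketch {

        static int get_number(String prompt) {
            SpeechIO.sayTTS(prompt);
            RecStructure result = SpeechIO.recognise("NUMBER");
            return Integer.parseInt(result.getSlotValueAsString("number"));
        }

        static void get_number_array(int[] numbers, String[] prompts, int size) {
            for (int i = 0; i <= size; i++) {
                numbers[i] = get_number(prompts[i]);
                SpeechIO.sayTTS("Component " + i + " I heard " + numbers[i]);
            }
        }

        public static void main(String[] args) {
            int[] numbers = new int[3];
            String[] prompts = {"Say a number", "Say another number", "Say another number"};
            get_number_array(numbers, prompts, 2);
            int largest = numbers[0];
            for (int i = 1; i < numbers.length; i++) {
                if (numbers[i] > largest) {
                    largest = numbers[i];
                }
            }
            SpeechIO.sayTTS("The largest number was " + largest);
        }
    }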
  • the first branch involves only items whose translation can be included in the scope of an <if> form, so their translations can be left in place.
  • the output consists of a Java file, representing the code to be run on the server, and a VoiceXML file, representing the pre-compiled portion of the code to be run on the client.
  • the Java file is as follows, with comments as before in italics:
  • String[] f_args = {"numbers", "prompts", "size"};
  • Object[] args = {arg_0, arg_1, new Integer(arg_2)}; gatewayServer.convertArgsToVXML("void", "get_number_array", f_args, args); gatewayServer.sendVXML("void", "get_number_array_3.vxml"); }
  • the VoiceXML file get_number_array_3.vxml comprises the relevant subset of the file produced by the static VoiceXML compiler, presented in Section 3.2 above. Specifically, it contains the definitions of the five forms 'get_number_array', 'while_sub_ ', 'get_number', 'cond_sub_l' and 'recognition_subdialog_for_NUMBER'.
  • the dynamic VoiceXML program of Section 3.3 is executed in the runtime environment of Section 2.5.
  • the initial steps (all of which can be considered as routine) are omitted.
  • arg_0 an int array with three uninitialised elements.
  • arg_1 a String array with three elements, whose values are "Say a number", "Say another number" and
  • arg_2 an int with value 2.
  • f_args a String array whose three elements have the values "numbers", "prompts" and "size".
  • args an Object array whose three elements have the values arg_0, arg_1 and new Integer(2).
  • the invocation is: gatewayServer.convertArgsToVXML("void", "get_number_array", f_args, args)
  • the purpose of this call is to produce a "header" piece of VoiceXML that makes a <subdialog> invocation of a form.
  • This invocation returns a void value (the first argument); the VoiceXML <form> that is called is named "get_number_array" (the second argument); the names of the formal parameters for the call are in f_args (the third argument); and the run-time values of these parameters are in args (the fourth argument).
  • the following piece of VoiceXML is produced, with comments as before in italics:
  • variable 'references' is used to hold client-side translations of all the server-side objects that are passed to the call.
  • the code in convertArgsToVXML also constructs an association list which associates the Java objects passed to the call (here arg_0 and arg_l) with the corresponding indices in 'references'.
  • the list associates arg_0 (the int array) with references[2], and arg_1 (the String array) with references[1].
  • the following call to sendVXML combines the dynamically generated VoiceXML code immediately above with the pre-compiled VoiceXML fragment get_number_array_3.vxml described at the end of Section 3.3. It then sends the result through the Gateway to the client, and waits for the next Gateway request, which should be the result of the <submit> at the end of the dynamically generated VoiceXML fragment.
  • when the Gateway receives this new request, it passes the body of the <submit> back to the sendVXML method.
  • the user responded "seven", "five" and "ten".
  • the final step is to use the association list to unpack this information back to the original server-side data-structures. Since the list associates arg_0 (the int array) with element 2 of 'references', the numbers 7, 5 and 10 are correctly entered into elements 0, 1 and 2 respectively of the int array arg_0, thereby completing the call.
  • Section 4.1 discusses using a larger subset of Java
  • Section 4.2 discusses using procedural languages other than Java
  • Section 4.3 discusses using voice scripting languages other than VoiceXML.
  • Section 4.1.3 considers the question of whether there are any Java constructs for which there may not be a suitable translation into VoiceXML.
  • the basic strategy for translating non-static methods into VoiceXML is first to reduce them to functions, with one function for each method name. Since the architecture tags each generated VoiceXML object with the type of the Java object to which it is intended to correspond, it is then possible to use run-time dispatching to delegate a method invocation to the translation of the individual method appropriate to the object on which the method was invoked, together with its arguments.
  • MClass1 is created from the definitions of the method M in Class1 by adding the Class1 object o on which the method is invoked as an extra argument.
  • Direct references to data members of Class1 in the definition of the method M are then translated into corresponding references to the data members of o. For example, suppose Class1 has a String data member called 'message', and M is defined as follows:
  • MClass1 is then defined as:
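  • A sketch of this transformation for the 'message' example is given below; the body of M is an illustrative assumption.

    // Class1 has a String data member 'message' and a non-static method M.
    class Class1 {
        String message;

        void M(String suffix) {
            SpeechIO.sayTTS(message + suffix);   // direct reference to the data member
        }
    }

    // MClass1: the Class1 object o on which M is invoked is added as an extra argument,
    // and direct references to data members of Class1 become references to members of o.
    class NonStaticTranslationSketch {
        static void MClass1(Class1 o, String suffix) {
            SpeechIO.sayTTS(o.message + suffix);
        }
    }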
  • a partial treatment of exceptions can be implemented in a straightforward manner by translating them into a special type of return value. This would be similar to the treatment of local variables in conditionals and iterative loops described in Sections 2.3.8 and 2.3.10. This type of solution is expected to work adequately for user-generated exceptions, i.e. exceptions intentionally created using the 'throw' construction; 'throw' will just translate into a special kind of 'return' statement.
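  • As a rough Java-level illustration of the idea (the wrapper class and the method bodies are assumptions; in the actual translation this would be expressed in VoiceXML/ECMAScript), a user-generated 'throw' becomes a distinguished return value that the caller is expected to check:

    // Sketch only: 'throw' recoded as a special kind of return value.
    class ThrownValue {
        final String exceptionName;
        ThrownValue(String exceptionName) { this.exceptionName = exceptionName; }
    }

    class ExceptionTranslationSketch {
        // Conceptually, before translation:  if (n < 0) throw new IllegalArgumentException();
        // After translation, the throw becomes a distinguished return value.
        static Object checkedSqrt(double n) {
            if (n < 0) {
                return new ThrownValue("IllegalArgumentException");
            }
            return Double.valueOf(Math.sqrt(n));
        }
    }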
  • a generalised scheme for handling exceptions not generated by user code, such as an exception resulting from a division by zero, is somewhat more problematic, although partial solutions that are likely to be sufficient in most practical situations are again feasible.
  • SALT (Speech Application Language Tags)
  • a voice browser could be employed.
  • the user could connect to the client via a computer network rather than a conventional telephone network (using a facility such as Voice over the Internet).
  • voice browsers on other forms of client system, some potentially quite different from a standard interactive voice response system.
  • the client is a Personal Digital Assistant (PDA).
  • the PDA still functions as a VoiceXML client, and is therefore directly compatible with the approach described above.
  • this wide range in the nature of the potential client device underlines the usefulness of annotations, in that the optimum distribution of processing between the server and client will clearly vary according to the properties of the client for any given application environment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)
  • Computer And Data Communications (AREA)

Abstract

A method for developing and running voice applications in a client-server environment. The server platform supports a high-level procedural language such as Java, and the client platform supports a voice markup language such as VoiceXML. The method comprises first writing the voice application in the high-level procedural language and then providing one or more annotations for the voice application. The annotations serve to indicate which part of the voice application is to be executed on the server and which part of the voice application is to be executed on the client side. The latter part of the voice application (to be executed on the client side) is then transformed from the high-level procedural language into the voice markup language supported by the client, in accordance with the annotations. Most of this transformation can be performed statically in advance, but the remainder of the transformation is carried out dynamically at run time.
PCT/GB2002/001929 2002-04-26 2002-04-26 Systeme et procede pour creer des applications vocales WO2003091827A2 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2002253334A AU2002253334A1 (en) 2002-04-26 2002-04-26 A system and method for creating voice applications
PCT/GB2002/001929 WO2003091827A2 (fr) 2002-04-26 2002-04-26 Systeme et procede pour creer des applications vocales

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/GB2002/001929 WO2003091827A2 (fr) 2002-04-26 2002-04-26 Systeme et procede pour creer des applications vocales

Publications (2)

Publication Number Publication Date
WO2003091827A2 true WO2003091827A2 (fr) 2003-11-06
WO2003091827A3 WO2003091827A3 (fr) 2004-03-04

Family

ID=29266192

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2002/001929 WO2003091827A2 (fr) 2002-04-26 2002-04-26 Systeme et procede pour creer des applications vocales

Country Status (2)

Country Link
AU (1) AU2002253334A1 (fr)
WO (1) WO2003091827A2 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7653650B2 (en) 2005-12-13 2010-01-26 International Business Machines Corporation Apparatus, system, and method for synchronizing change histories in enterprise applications
US7885958B2 (en) 2006-02-27 2011-02-08 International Business Machines Corporation Method, apparatus and computer program product for organizing hierarchical information
US9330668B2 (en) 2005-12-20 2016-05-03 International Business Machines Corporation Sharing voice application processing via markup
CN111984305A (zh) * 2020-08-21 2020-11-24 腾讯科技(上海)有限公司 一种应用配置方法及装置、计算机设备

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6269336B1 (en) * 1998-07-24 2001-07-31 Motorola, Inc. Voice browser for interactive services and methods thereof

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6269336B1 (en) * 1998-07-24 2001-07-31 Motorola, Inc. Voice browser for interactive services and methods thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DANIELSEN P J: "THE PROMISE OF A VOICE-ENABLED WEB" COMPUTER, IEEE COMPUTER SOCIETY, LONG BEACH., CA, US, US, vol. 33, no. 8, 1 August 2000 (2000-08-01), pages 104-106, XP000987575 ISSN: 0018-9162 *
HARTMAN J D ET AL: "VoiceXML builder: a workbench for investigating voiced-based applications" 2001, PISCATAWAY, NJ, USA, IEEE, USA, October 2001 (2001-10), pages S2C-6, XP002265475 ISBN: 0-7803-6669-7 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7653650B2 (en) 2005-12-13 2010-01-26 International Business Machines Corporation Apparatus, system, and method for synchronizing change histories in enterprise applications
US9330668B2 (en) 2005-12-20 2016-05-03 International Business Machines Corporation Sharing voice application processing via markup
US7885958B2 (en) 2006-02-27 2011-02-08 International Business Machines Corporation Method, apparatus and computer program product for organizing hierarchical information
CN111984305A (zh) * 2020-08-21 2020-11-24 腾讯科技(上海)有限公司 一种应用配置方法及装置、计算机设备
CN111984305B (zh) * 2020-08-21 2023-08-08 腾讯科技(上海)有限公司 一种应用配置方法及装置、计算机设备

Also Published As

Publication number Publication date
AU2002253334A8 (en) 2003-11-10
WO2003091827A3 (fr) 2004-03-04
AU2002253334A1 (en) 2003-11-10

Similar Documents

Publication Publication Date Title
US7487440B2 (en) Reusable voiceXML dialog components, subdialogs and beans
JP4625198B2 (ja) 動的ウェブページコンテンツファイルからのサーバ側コード生成
US7120897B2 (en) User control objects for providing server-side code generation from a user-defined dynamic web page content file
US7711570B2 (en) Application abstraction with dialog purpose
US8229753B2 (en) Web server controls for web enabled recognition and/or audible prompting
US8024196B1 (en) Techniques for creating and translating voice applications
KR100431972B1 (ko) 통상의 계층 오브젝트를 사용한 효과적인 음성네비게이션용 뼈대 구조 시스템
US7844958B2 (en) System and method for creating target byte code
US7260535B2 (en) Web server controls for web enabled recognition and/or audible prompting for call controls
US7010796B1 (en) Methods and apparatus providing remote operation of an application programming interface
US7707547B2 (en) System and method for creating target byte code
US20030208640A1 (en) Distributed application proxy generator
US20050028085A1 (en) Dynamic generation of voice application information from a web server
CN106201862A (zh) web服务压力测试方法及装置
US20090144711A1 (en) System and method for common compiler services based on an open services gateway initiative architecture
US7174006B2 (en) Method and system of VoiceXML interpreting
US20050132323A1 (en) Systems and methods for generating applications that are automatically optimized for network performance
EP1002267A1 (fr) Procede et dispositif de generation statique et dynamique d'information sur une interface d'utilisateur
WO2003091827A2 (fr) Systeme et procede pour creer des applications vocales
US7826600B2 (en) Method and procedure for compiling and caching VoiceXML documents in a voice XML interpreter
US7266814B2 (en) Namespace based function invocation
Eberman et al. Building voiceXML browsers with openVXI
Turner Specifying and realising interactive voice services
Turner Formalizing graphical service descriptions using SDL
ES2373114T3 (es) Procedimiento para proporcionar un servicio de voz interactivo sobre una plataforma accesible a un terminal cliente, servicio de voz, programa informático y servidor correspondientes.

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP