WO2000017854A1 - Method and system of configuring a speech recognition system - Google Patents

Method and system of configuring a speech recognition system

Info

Publication number
WO2000017854A1
WO2000017854A1 (application PCT/EP1998/006030)
Authority
WO
WIPO (PCT)
Prior art keywords
network application
speech
state
application server
server
Prior art date
Application number
PCT/EP1998/006030
Other languages
German (de)
French (fr)
Inventor
Anthony Rodrigo
Original Assignee
Nokia Networks Oy
Priority date
Filing date
Publication date
Application filed by Nokia Networks Oy filed Critical Nokia Networks Oy
Priority to AU10253/99A priority Critical patent/AU1025399A/en
Priority to JP2000571437A priority patent/JP4067276B2/en
Priority to PCT/EP1998/006030 priority patent/WO2000017854A1/en
Priority to EP98952622A priority patent/EP1116373B1/en
Priority to AT98952622T priority patent/ATE239336T1/en
Priority to DE69814181T priority patent/DE69814181T2/en
Priority to ES98952622T priority patent/ES2198758T3/en
Publication of WO2000017854A1 publication Critical patent/WO2000017854A1/en
Priority to US09/809,808 priority patent/US7212970B2/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/04 Protocols specially adapted for terminals or networks with limited capabilities; specially adapted for terminal portability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/75 Indicating network or usage conditions on the user display
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/329 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/18 Information format or content conversion, e.g. adaptation by the network of the transmitted or received information for the purpose of wireless delivery to users or terminals

Definitions

  • the present invention relates to a speech control system and method for a telecommunication network, wherein a network application server is controlled on the basis of a speech command.
  • DSR distributed speech recognition
  • ASR automatic speech recognition
  • NAS network application server
  • MS subscriber terminal like a mobile station
  • the basic function of a distributed speech recognition system in the context of mobile applications is the ability of a mobile station to provide automatic speech recognition features with the help of a high power ASR engine or ASR server provided in the network. Therefore, the basic function of the mobile station is the transmission of an input speech command to this network ASR engine to perform the recognition tasks and return the results. The result can be a recognized word or command in text format. The mobile station can then use the text to perform the necessary functions.
  • Another function of such a system is to provide the mobile station with access to other application servers, i.e.
  • the mobile station transmits a speech signal (audio) to the ASR engine.
  • the ASR engine will perform speech recognition so as to obtain corresponding text commands. These text commands are returned to the mobile station.
  • the mobile station uses these text commands to control a corresponding network application server
  • NAS, which can be any server in a data network like the Internet that provides various services like WWW, email readers, voice mail and so on
  • since the ASR engine usually runs on a platform that can also run other applications or perform other tasks, it is possible to transfer other functions to the ASR engine, such as processing the obtained text command to ascertain the required operation and contact the relevant server. Then, it transmits the information retrieved from the contacted network application server back to the mobile station. In this situation, the mobile station receives a speech input, sends it to a network ASR engine which performs speech recognition, executes necessary functions based on the speech commands and sends the retrieved information or results to the mobile station.
  • the user might say "Call John Smith".
  • the ASR engine converts the speech into text and returns the text "Call John Smith" to the mobile station, where the application software in the mobile station then retrieves the number for John Smith and performs a calling operation.
  • the speech command at the mobile station might be "Racing Info".
  • the ASR engine converts the speech into text, and returns the text "Racing Info" to the mobile station.
  • the application software of the mobile station recognizes that the user wishes to access the network server that provides Horse Racing Information. Accordingly, the mobile station establishes a connection with the relevant server, retrieves the latest race results and displays the results on a display of the mobile station.
  • a speech command input to the mobile station might be "Read Email".
  • the ASR engine converts the speech into text and returns the text "Read Email" to the mobile station.
  • the application software of the mobile station recognizes that the user wishes to access the network server that provides access to the user's email box.
  • the mobile station sends a command to the ASR engine to establish a connection with the relevant email application server.
  • the ASR engine does not return the recognized speech, but further processes the converted speech.
  • the speech command was "Message 1"
  • the ASR engine receives the speech and translates it into a text command "Message 1" and transmits this text command to the email application server.
  • the email application server returns the text of Message 1 to the ASR engine.
  • the ASR engine will then transmit this text to the mobile station.
  • the dialog may continue with Message 2, 3 and so on, wherein each speech command from the user will be handled by the ASR engine, until the user issues an exit command or until a message is received from the mobile station to terminate the session.
  • the only function of the ASR engine is to convert speech into text and to send the results back to the mobile station for further processing. Therefore, the network application servers will receive commands directly from the mobile station.
  • the ASR engine itself processes the converted speech and directly accesses the relevant network application server in order to receive the results from the network application server and pass the results back to the mobile station.
  • the mobile station or the ASR engine is required to communicate with the network application server to issue user commands to the network application server and receive responses from the network application server.
  • the email application to be read supports commands such as set A {Message 1, Message 2, ..., Message N, Exit} at the top-level menu.
  • in case the user is already reading a message, the commands in this context are set B {Delete, Exit, Next Message}. Therefore, if the user is in the top-level menu and inputs a speech command other than those in the command set A, the network application server will respond with an error message. Even if the user issues a speech command from the command set B, this command will still be an erroneous command, since the context or state of the network application server is different.
  • context-irrelevant commands could as well be input into the mobile station due to noise and the like. All of these speech signals will be converted into text by the ASR engine and sent to the network application server, which will respond with error messages.
  • the above problem leads to a delay in the response of the ASR engine to an input speech message, since it has to wait for responses from the network application server. Accordingly, the overall response time at the mobile station will be increased, such that the user may repeat the command or change the command which increases the delays even further and leads to a poor performance of the system.
  • a speech control system for a telecommunication network comprising: loading means for loading a state definition information from a network application server, wherein said state definition information defines possible states of the network application server; determining means for determining a set of valid commands for said network application server on the basis of said state definition information; and checking means for checking a validity of a text command, obtained by converting an input speech command to be used for controlling said network application server, by comparing said text command with said determined set of valid commands.
  • a speech control method for a telecommunication network comprising the steps of: loading a state definition information from a network application server, wherein said state definition information defines possible states of the network application server; determining a set of valid commands for said network application server on the basis of said state definition information; and checking a validity of a text command, obtained by converting a speech command to be used for controlling said network application server, by comparing said text command with said determined set of valid commands. Accordingly, since a set of valid commands can be determined on the basis of a state definition information provided by the network application server, the validity of an obtained text command can be checked before transmitting the text command to the network application server. Thus, the transmission of erroneous text commands can be prevented, avoiding corresponding delays and wasted processing time at the network application server.
  • the loading means can be arranged to load a grammar and/or vocabulary information which specifies a total set of valid commands supported by the network application server, wherein the determining means can be arranged to determine said set of valid commands on the basis of said total set of valid commands and a state transition information included in said state definition information.
  • the speech control system can keep up with the actual states of the network application server by referring to state transition rules so as to limit the total set of valid commands to those commands which correspond to the actual state of the network application server.
  • the determining means can be arranged to cause the loading means to load a state-dependent grammar file defining a set of valid commands for a specific state of the network application server, when the determining means determines a state change on the basis of a state transition information included in the state definition information.
  • the network control system may comprise a speech recognition means for converting an input speech command received from a subscriber terminal into the text command to be supplied to the network application server.
  • a central speech control system can be provided in the network which can be accessed by individual subscriber terminals .
  • the speech control system may be implemented in a Wireless Telephony Application (WTA) server, wherein the WTA server may be arranged to receive the text command from a network speech recognition means for converting an input speech command received from a subscriber terminal into said text command.
  • WTA Wireless Telephony Application
  • the speech control system may be a subscriber terminal having an input means for inputting a speech command, a transmission means for transmitting the speech command to a speech recognition means of the telecommunication network, and a receiving means for receiving the text command from the speech recognition means, wherein the transmitting means is arranged to transmit the received text command to the network application server.
  • the validity check of the received text command is performed in the subscriber terminal, e.g. the mobile station, before it is transmitted to the network application server.
  • the processing time at the network application server can be reduced, as it will receive only valid commands .
  • the state definition information can be a data file such as a Wireless Markup Language (WML) file or a Hyper Text Markup Language (HTML) file.
  • WML Wireless Markup Language
  • HTML Hyper Text Markup Language
  • the state definition information may include a load instruction for loading the state-dependent grammar and/or vocabulary file.
  • the speech control system may use the load instruction directly for loading the specific set of valid commands in case a change of the state of the network application server is determined.
  • the state definition information can be provided by the network application server at a setup time of the server.
  • state definition information can be stored together with a command set info in a network server running on the hardware of the speech control system.
  • the speech control system may comprise a plurality of vendor-specific speech recognition means, wherein corresponding parameters for said plurality of vendor-specific speech recognition means are defined in the state definition information.
  • a universal speech control system can be obtained which is based on a hardware and software independent platform.
  • a required audio processing hardware and vendor-specific speech recognition means can be selected depending on the network application server.
  • Fig. 1 shows a block diagram of a telecommunication network comprising a speech control system according to the preferred embodiment of the present invention
  • Fig. 2 shows a flow diagram of a speech control method according to the preferred embodiment of the present invention
  • Fig. 3 shows a block diagram of a telecommunication network comprising a WAP-based speech control system according to the preferred embodiment of the present invention.
  • FIG. 1 A block diagram of a telecommunication network comprising the speech control system according to the preferred embodiment of the present invention is shown in Fig. 1.
  • a mobile station (MS) 1 is radio-connected to a base station subsystem (BSS) 2 which is connected to a telecommunication network 4 via a mobile switching center (MSC) 3.
  • the telecommunication network 4 may be a data network like the Internet which provides various services.
  • NAS network application server
  • ASR automatic speech recognition means or engine
  • the application has to be fine-tuned to a required context. This is done by specifying a vocabulary for the application and grammars that are valid in the context of the application.
  • the vocabulary is basically a set of words to be recognized by the ASR engine 6, e.g. words like Close, Read, Message, Orange, Pen, Chair, Exit, Open etc..
  • a means for specifying the grammar for a given application can be provided. This could be achieved by a rule-based grammar like the example given in the description below.
  • one public rule, <Command>, is specified, which may be spoken by a user.
  • the rule is a combination of the subrules <Action>, <Object> and <Polite>, wherein the square brackets around <Polite> indicate that it is optional. Therefore, the above grammar would support the following commands: "read message", "please read item and message" etc.
  • rule-based grammars are used to define all spoken input which the application is programmed to handle.
  • the rule-based grammar basically specifies all spoken commands (or command syntax) that are supported by an application.
  • the grammar file contains all commands which the email reader application will accept (e.g. Message 1, Message 2, ..., Message N, Exit, Delete and Next Message) .
  • the ASR engine 6 generally loads the associated grammar file before starting the speech recognition. Some applications may even have multiple grammar files to define different contexts of an application such as the network application server 5, wherein the ASR engine 6 is required to load the context-dependent grammar file at run time.
  • in the preferred embodiment, a grammar file, a vocabulary file and an application states definition file (ASD file) are defined. Therefore, each network application server 5 produces an ASD file, a grammar file and/or a vocabulary file.
  • the grammar file is adapted to the requirements of the ASR engine 6, wherein ASR engines 6 of different vendors may have different grammar file formats.
  • the ASD file is a file which describes all possible states of the application and how to jump between states, along with the valid commands for each state.
  • the ASD file provides a means for specifying the context-dependent grammar files and also a vocabulary file name. This is an important feature, since a given application may use different grammars and/or vocabularies depending on the context. If this information is loaded on-line to the ASR engine 6, the speech recognition and the overall response time can be improved remarkably due to the small set of valid commands and the resulting high recognition accuracy.
  • in case the ASD file is based on a syntax similar to HTML (Hyper Text Markup Language), it could be defined with tags such as <STATE = "Main Menu">, COMMANDS = <MSG>, NEXTSTATE = "Read" and <DIGITS> (a complete example ASD file is given in the description below).
  • an ⁇ ASD> tag identifies the file as a file type that provides the state definition of the network application server 5
  • an ⁇ APP> tag specifies the application name and a ⁇ STATE> tag defines a given state, i.e. the name of the state, the valid commands for this state, and with each command, the next state to which the application must jump is also defined.
  • a ⁇ STATE> tag is defined for each state of the network application.
  • the <GRAMMAR> tag provides a means of defining the commands and the syntax of the commands. According to the above file, the application has to jump to the state "Read" after the commands Message 1, 2, 3 ... N.
  • the <DIGITS> tag defines a specific grammar.
  • the ⁇ GRAMMAR> tag shows that the digits could be 1, 2, 3, 4 or 5.
  • the ASD file tells the ASR engine 6 or the mobile station 1 which commands are valid for a given context.
  • state transition rules are also provided in the ASD file.
  • using other tags which include a context-dependent grammar file, it would be possible to instruct the ASR engine 6 which grammar or vocabulary file is to be loaded. Thereby, a higher flexibility can be provided and the recognition can be made more accurate, since the ASR engine 6 is fine-tuned to the context of the network application server.
  • an example of such a tag is given in the description below.
  • Fig. 2 shows a flow diagram of an example for a speech recognition processing as performed in the preferred embodiment .
  • the ASR engine 6 loads a corresponding ASD file from the network application server 5 to be connected (S101).
  • the ASR engine is instructed to load a state-dependent grammar file, i.e. "Reademail.gmr", when the network application server 5 enters the state "Read".
  • the ASR engine 6 may load a general grammar file from the network application server 5 (S102).
  • valid text commands for speech recognition are then determined (S103).
  • in case of a state-dependent grammar file, the commands defined in the loaded grammar file are determined as valid commands for the speech recognition.
  • the valid commands are selected from the general grammar file in accordance with a corresponding information provided in the ASD file. Accordingly, only the determined valid commands are allowed in this state or at least until a different grammar file is loaded.
  • a speech command is received from the mobile station 1 (S104) and speech recognition is performed for the received speech command (S105).
  • the text command derived by the speech recognition processing from the received speech command is then checked against the determined valid text commands (S106).
  • if the text command is determined to be valid in step S107, it is supplied directly to the network application server 5 or to the mobile station 1 (S108). Otherwise, error messaging is performed so as to inform the mobile station 1 of the erroneous speech command (S109).
  • the ASR engine 6 refers to the state transition rules defined in the ASD file and determines whether the supplied command leads to a state change of the network application server 5 (S110). If no state change has been determined, the processing returns to step S104 in order to receive another speech command and perform speech recognition of the received speech command, if required.
  • if a state change has been determined, the processing returns to step S103 and the ASR engine 6 refers to the ASD file so as to determine a new set of valid text commands. This can be achieved either by loading a new state-dependent grammar file according to an instruction provided in the ASD file, or by selecting new valid commands from the general grammar file based on corresponding information in the ASD file.
  • thereafter, a new speech command is received in step S104 and speech recognition is continued in step S105.
  • the ASR engine 6 can load a new grammar file at run time. This means that the ASR engine 6 can be instructed to load only the grammar rules applicable to a particular state/context of the network application server 5 by referring to the ASD file. This greatly improves recognition accuracy and efficiency of the use of the network connections .
  • An implementation of the network application server 5 and its user interface may vary depending on the software and hardware platform used.
  • most network application servers 5 may provide an HTTP interface (i.e. HTML), a WAP (Wireless Application Protocol - WML) interface or a proprietary Application Interface (API).
  • HTTP interface i.e. HTML
  • WAP Wireless Application Protocol - WML
  • API Application Interface
  • if the ASD file is adapted to either WML (Wireless Markup Language) or HTML (Hyper Text Markup Language), it can be used as a universal definition file for application states or speech commands in any type of application running on a network application server 5.
  • WML Wireless Markup Language
  • HTML Hyper Text Markup Language
  • the ASR engine 6 would be able to build an internal representation of the relevant NAS application. This representation or model can then be used to keep the ASR engine 6 in synchronism with the application states of the network application server 5.
  • each network application server 5 which provides a speech recognition feature will have its speech-specific WML card(s) or HTML location.
  • the state definition information URL (Uniform Resource Locator) might be a file such as: //services.internal.net/dailynews/speechsettings
  • the speech control system, whether it is in the mobile station 1 or in a network server, needs to load this file from the given URL.
  • if the network application server 5 is actually an HTTP or WAP origin server, then the first WML card or HTML page sent by this server can include the above specific URL under a special tag. Thereby, the mobile station 1 can be informed that this application supports a speech control and that the file at this URL needs to be loaded in order to provide the speech recognition facility.
  • the ASD files could be sent on-line to the ASR engine 6, as a part of the standard HTML/WML scripts sent by the network application server 5.
  • the ASR engine 6 would interpret these scripts automatically and keep step with the network application server 5 so as to process the speech commands efficiently and perform functions such as on-line loading of grammar files and so on.
  • ASR engine 6 would directly refer to the URL specified in the LOADGRAMMAR tag so as to read the associated grammar file.
  • the ASD files are supplied by the network application server 5 to the ASR engine 6 at setup time, i.e. off-line. These ASD files must be produced in line with the HTML-like specification described above and will be stored along with a grammar file in a WWW server (e.g. www.asr.com) running on the hardware of the ASR engine 6.
  • a WWW server e.g. www.asr.com
  • at the beginning of an interaction between the ASR engine 6 and the network application server 5, the ASR engine 6 first loads the ASD file from the server www.asr.com and builds the internal state representation/model of the application of the network application server 5. Thereafter, the ASR engine can keep step with the states of the network application server 5, process speech commands efficiently and perform functions such as run-time loading of grammar files.
  • the LOADGRAMMAR tag includes the full URL which points to www.asr.com.
  • the functions of the ASD file can be extended even further.
  • flexibility in selecting the required audio processing hardware and a vendor-specific ASR engine 6 in dependence on the application of the network application server 5 can be provided.
  • a logical ASR engine can be connected to the vendor-specific physical ASR engine 6 based on the application requirements of the network application server 5, such that even custom hardware can be used for audio processing.
  • the corresponding optional parameters can be defined in the ASD file using additional tags .
  • in the following, the use of the ASD file in a WAP application is described, which may be employed by operators to enhance their existing service offerings.
  • the ASD file was used by the ASR server or engine 6 in order to perform a context-based speech recognition.
  • the ASD file is used by a different application server, i.e. the WTA (Wireless Telephony Application) server 7 in WAP, to perform similar tasks.
  • WTA Wireless Telephony Application
  • the WAP-enabled mobile station 1 may have the full WAP stack installed and runs the WAE (Wireless Application Environment) .
  • the WTA server 7 has the ability to control the services of the network 4, which is a standard mobile network in the present case.
  • the WTA server 7 acts as a principal content generator.
  • the content may be customized and downloaded to the client, which is the mobile station 1 running WAP software.
  • the WTA server 7 could also perform call control functions such as informing the mobile station 1 of incoming call details via WTA events .
  • a network-based ASR server 6 which enables an application to connect to the speech server based on parameters such as ID/address of the application, MSISDN, speech encoding type, grammar file ID (to select an appropriate grammar rule) and other optional parameters.
  • the ASR server 6 may have the ability to perform an outgoing call to a given MSISDN number.
  • the WTA server 7 checks the validity of the text and may also control the ASR server 6 to load grammar files etc.
  • Each network application server 5 having a speech interface provides an ASD file to the WTA server 7, along with a basic WML card deck, i.e. WML document, for that service.
  • the WTA server 7 loads the ASD file and may change the WML sent to the mobile station 1 based on the ASD file settings.
  • audio functions of the mobile station 1 and settings of the ASR server 6 are controlled in dependence on the application context.
  • the ASD file may define attributes such as an ASR engine to be used for an actual application, an encoding type supported by the ASR engine used by the actual speech-enabled application, a default grammar file (file name) to be used, a default vocabulary (file name or words) and states of the actual application, i.e. a menu hierarchy.
  • Each menu provides specifications for commands supported at the menu and corresponding NEXT states, new grammar rules and vocabularies, which may override previously set values, and parameters specifying whether the actual application requires a microphone or a speaker of the mobile station 1 to be on or off.
  • the service provider (or operator) provides a weather service to its mobile subscribers and offers the service over a speech interface.
  • the operator has installed the ASR server 6 in his network 4 and intends to use this ASR server 6 along with the WTA server 7 to provide the weather service with a speech interface .
  • the user of the mobile station 1 activates a weather menu being already primed to use the speech interface.
  • This request is sent by the WAE to the WTA server 7.
  • the WTA server 7 sends a deck of WML cards pre-loaded from the corresponding network application server 5 and relating to the weather service, to the mobile station 1.
  • the WAE software of the mobile station 1 goes to a listening mode in order to answer an incoming call from the ASR server 6 of the network 4.
  • the WTA server 7 sends a request for an ASR session to the ASR server 6, including an MSISDN, an allocated session ID with the WTA server 7, and also an ID of a grammar rule to be used (a minimal sketch of such a request is given at the end of this list).
  • the grammar rule name is derived from the ASD file pre-loaded from the corresponding network application server 5 for the weather service.
  • the ASR server 6 ensures the required resources, i.e. dialout ports and ASR sessions on the speech engine, are available and sends a confirmation to the WTA server 7. Subsequently, the ASR server 6 calls the MSISDN and the network 4 sends a call indication to the mobile station 1. The WAE software of the mobile station 1 automatically answers the call and a speech connection is established between the ASR server 6 and the mobile station 1. Actually, the above call signaling between the mobile station 1 and the ASR server 6 is performed via the WTA server 7.
  • the mobile station 1 deactivates its speaker and sends any audio input received via its microphone over the established speech connection.
  • the audio input may be coded by the WAE software according to a required format, i.e. PCM, CEP or the like.
  • the ASR server 6 converts the received audio input into text and sends the obtained text to the WTA server 7.
  • since the weather session was started, the WTA server 7 has loaded the corresponding ASD file and is now in a position to compare the received text with the valid context-dependent commands. If a valid command, e.g. "London UK", has been received, the WTA server 7 requests the WML/HTML for London UK from the network application server 5 providing the weather service. The network application server 5 responds with the requested weather report for London and the WTA server 7 supplies the WML card deck for London weather to the mobile station 1. In case the grammar rules or vocabulary are changed in the set of WML cards, the ASD file contains corresponding information and the WTA server 7 sends the new grammar rules or vocabulary to be used for the London weather to the ASR server 6. Thus, the ASR server 6 is primed to use the new grammar or vocabulary required for the new WML cards.
  • the text converted by the ASR server 6 from the speech commands received from the mobile station 1 is sent to the WTA server 7, which checks its validity. In case a valid command, e.g. "Heathrow", has been received, the WTA server 7 requests the weather info for London Heathrow, and the network application server 5 responds with the requested weather report. Then, the WML card deck for London Heathrow weather is supplied by the WTA server 7 to the mobile station 1.
  • the service provider (or operator) provides a voice mail service with a speech interface to its mobile subscribers .
  • the network application server 5 providing the voice mail service sends a new voice mail message to the WTA server 7.
  • the WTA server 7 sends a deck of WML cards pre-loaded from the network application server 5 and relating to the voice mail service to the mobile station 1.
  • the WAE software of the mobile station 1 goes to a listening mode in order to answer an incoming call from the ASR server 6 of the network 4.
  • the mobile station 1 sends to the WTA server 7 an ASR request which indicates that the user will employ the speech interface to the voice mail service.
  • the WTA server 7 instructs the network 4 to send any incoming call indications to the WTA server 7.
  • the WTA server 7 sends a request for an ASR session to the ASR server 6, including an MSISDN, an allocated session ID with the WTA server 7, and also an ID of a grammar rule to be used.
  • the grammar rule name is derived from the ASD file pre-loaded from the corresponding network application server 5 for the voice mail service.
  • the ASR server 6 ensures the required resources, i.e. dialout ports and ASR sessions on the speech engine, are available and sends a confirmation to the WTA server 7.
  • the ASR server 6 calls the MSISDN and the network 4 sends a call indication to the mobile station 1.
  • the WAE software of the mobile station 1 automatically answers the call and a speech connection is established between the ASR server 6 and the mobile station 1.
  • the mobile station 1 activates both its speaker and its microphone, and sends any audio input received via its microphone over the established speech connection.
  • the audio input may be coded by the WAE software according to a required format, i.e. PCM, CEP or the like.
  • the ASR server 6 converts the received audio input into text.
  • the WTA server 7 sends a command to call the given MSISDN to the network application server 5 providing the voice mail service, which then calls the MSISDN.
  • a multiparty call is set up, since the ASR server 6 requires a speech input at the mobile station 1 and the network application server 5 needs to send audio to the mobile station 1.
  • These two services are in different machines and may not have any API (Application Programming Interface) or connection with each other. Since both servers need to access the mobile station 1, a multiparty call setup is required, which is explained in the following.
  • the WTA server 7 receives a call indication for the MSISDN and sends a call indication event message to the mobile station 1 with special parameters to instruct an addition of the call to a multiparty call.
  • the mobile station 1 sends a call hold message to instruct the network 4 to hold call 1, i.e. the call from the ASR server 6 to the mobile station 1.
  • the mobile station 1 accepts call 2, i.e. the call from the network application server 5 to the mobile station 1, and a speech connection is established.
  • the mobile station 1 instructs the establishment of a multiparty call, i.e. with call 1 and 2, such that now both the ASR server 6 and the network application server 5 are connected to the mobile station 1.
  • since the voice mail session was started, the WTA server 7 has loaded the corresponding ASD file for voice mail and is now in a position to compare the received text with the valid context-dependent commands.
  • if a valid command has been received, the WTA server 7 requests the network application server 5 providing the voice mail service to play the message "Anthony". Accordingly, the network application server 5 performs playback of the message "Anthony".
  • a speech control system and method wherein a state definition information is loaded from a network application server.
  • the state definition information defines possible states of the network application server and is used for determining a set of valid commands of the network application server, such that a validity of a text command obtained by converting an input speech command can be checked by comparing said text command with said determined set of valid commands. Thereby, a transmission of erroneous text commands to the network application server can be prevented so as to reduce total processing time and response delays.
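As referenced in the ASR session request item above, the exchange between the WTA server 7 and the ASR server 6 and the subsequent per-state validity check can be pictured as a small structured message plus a lookup. The following Python sketch is only an assumed illustration: the class names, field names and the check_command helper are not taken from the patent or from any WAP/WTA specification.

from dataclasses import dataclass, field

@dataclass
class AsrSessionRequest:
    msisdn: str             # subscriber number the ASR server dials out to
    session_id: str         # session ID allocated by the WTA server
    grammar_rule_id: str    # derived from the ASD file of the target service
    encoding: str = "PCM"   # audio encoding supported by the ASR engine

@dataclass
class WtaSpeechSession:
    valid_commands: dict = field(default_factory=dict)  # state -> set of commands, from the ASD file
    state: str = "Main Menu"

    def check_command(self, text: str) -> bool:
        # Return True if the recognised text is valid in the current state.
        return text in self.valid_commands.get(self.state, set())

# Example: a weather service session primed for the top-level menu.
session = WtaSpeechSession(valid_commands={"Main Menu": {"London UK", "Exit"}})
request = AsrSessionRequest(msisdn="+358401234567", session_id="42", grammar_rule_id="weather_main")
print(session.check_command("London UK"))   # True -> request the London UK WML/HTML from the weather server
print(session.check_command("Heathrow"))    # False in this state -> handled as an erroneous command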

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Telephonic Communication Services (AREA)
  • Selective Calling Equipment (AREA)
  • Computer And Data Communications (AREA)
  • Exchange Systems With Centralized Control (AREA)

Abstract

A speech control system and method is described, wherein a state definition information is loaded from a network application server. The state definition information defines possible states of the network application server and is used for determining a set of valid commands of the network application server, such that a validity of a text command obtained by converting an input speech command can be checked by comparing said text command with said determined set of valid commands. Thereby, a transmission of erroneous text commands to the network application server can be prevented so as to reduce total processing time and response delays.

Description

METHOD AND SYSTEM OF CONFIGURING A SPEECH RECOGNITION SYSTEM
FIELD OF THE INVENTION
The present invention relates to a speech control system and method for a telecommunication network, wherein a network application server is controlled on the basis of a speech command.
BACKGROUND OF THE INVENTION
In distributed speech recognition (DSR) systems, the user may control an application on the basis of spoken control messages supplied to an automatic speech recognition (ASR) means or engine. The spoken control messages are converted by the ASR engine into text commands which are sent to the application running in a corresponding network application server (NAS) or to a subscriber terminal like a mobile station (MS) from which the spoken control messages have been received.
The basic function of a distributed speech recognition system in the context of mobile applications is the ability of a mobile station to provide automatic speech recognition features with the help of a high power ASR engine or ASR server provided in the network. Therefore, the basic function of the mobile station is the transmission of an input speech command to this network ASR engine to perform the recognition tasks and return the results. The result can be a recognized word or command in text format. The mobile station can then use the text to perform the necessary functions.
Another function of such a system is to provide the mobile station with access to other application servers, i.e.
Internet WWW (World Wide Web), email, voice mail and the like, via speech commands. Therefore, the user with such a type of mobile station is able to connect to these application servers and issue speech commands. To achieve this, the mobile station transmits a speech signal (audio) to the ASR engine. The ASR engine will perform speech recognition so as to obtain corresponding text commands. These text commands are returned to the mobile station. The mobile station then uses these text commands to control a corresponding network application server
(NAS) which can be any server in a data network like the
Internet that provides various services like WWW, email readers, voice mail and so on.
Since the ASR engine usually runs on a platform that can also run other applications or perform other tasks, it is possible to transfer other functions to the ASR engine, such as processing the obtained text command to ascertain the required operation and contact the relevant server. Then, it transmits the information retrieved from the contacted network application server back to the mobile station. In this situation, the mobile station receives a speech input, sends it to a network ASR engine which performs speech recognition, executes necessary functions based on the speech commands and sends the retrieved information or results to the mobile station.
In the following, examples for the above cases are described:
Example 1 :
The user might say "Call John Smith". In this case, the ASR engine converts the speech into text and returns the text "Call John Smith" to the mobile station, where the application software in the mobile station then retrieves the number for John Smith and performs a calling operation.
Example 2 :
The speech command at the mobile station might be "Racing Info". In this case, the ASR engine converts the speech into text, and returns the text "Racing Info" to the mobile station. Thus, the application software of the mobile station recognizes that the user wishes to access the network server that provides Horse Racing Information. Accordingly, the mobile station establishes a connection with the relevant server, retrieves the latest race results and displays the results on a display of the mobile station.
Example 3 :
A speech command input to the mobile station might be "Read
Email". In this case, the ASR engine converts the speech into text and returns the text "Read Email" to the mobile station. Thus, the application software of the mobile station recognizes that the user wishes to access the network server that provides access to the user's email box. In this case, the mobile station sends a command to the ASR engine to establish a connection with the relevant email application server. Now, the ASR engine does not return the recognized speech, but further processes the converted speech. In case the speech command was "Message 1", the ASR engine receives the speech and translates it into a text command "Message 1" and transmits this text command to the email application server. In turn, the email application server returns the text of Message 1 to the ASR engine. The ASR engine will then transmit this text to the mobile station. The dialog may continue with Message 2, 3 and so on, wherein each speech command from the user will be handled by the ASR engine, until the user issues an exit command or until a message is received from the mobile station to terminate the session.
In the above examples 1 and 2, the only function of the ASR engine is to convert speech into text and to send the results back to the mobile station for further processing. Therefore, the network application servers will receive commands directly from the mobile station. However, in the above example 3, the
ASR engine itself processes the converted speech and directly accesses the relevant network application server in order to receive the results from the network application server and pass the results back to the mobile station.
Thus, the mobile station or the ASR engine is required to communicate with the network application server to issue user commands to the network application server and receive responses from the network application server.
However, the following problem is encountered in either one of the cases. It is assumed that the email application to be read supports commands such as A {Message 1, Message 2 ... Message N and Exit} at the top level menu. In case the user is already reading a message, the commands in this context are B {Delete, Exit, Next Message}. Therefore, if the user is in the top level menu and inputs a speech command other than those in the command set A, the network application server will respond with an error message. Even if the user issues a speech command from the command set B, this command will still be an erroneous command, since the context or state of the network application server is different.
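To make the context dependence concrete, the two command sets can be pictured as a small table keyed by the application state. The following Python fragment is only an illustrative sketch; the state names and the reject behaviour are assumptions used for illustration, not part of the original disclosure.

# Illustrative sketch of the state-dependent command sets described above.
EMAIL_COMMANDS = {
    "Top Level Menu": {"Message 1", "Message 2", "Message N", "Exit"},   # command set A
    "Reading Message": {"Delete", "Exit", "Next Message"},               # command set B
}

def server_response(state: str, command: str) -> str:
    # Mimics the network application server: any command outside the
    # current state's command set is answered with an error message.
    if command in EMAIL_COMMANDS[state]:
        return f"OK: executing '{command}'"
    return f"ERROR: '{command}' is not valid in state '{state}'"

print(server_response("Top Level Menu", "Delete"))   # a set B command in a set A context -> error
print(server_response("Reading Message", "Delete"))  # accepted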
Moreover, context irrelevant commands could as well be input into the mobile station due to noise and the like. All of these speech signals will be converted into a text by the ASR engine and sent to the network application server which will respond with error messages.
As such scenarios may occur frequently, the processing of valid commands by the network application server will be delayed, since valuable network bandwidth and application server processor time are required for responding to such invalid commands.
Moreover, the above problem leads to a delay in the response of the ASR engine to an input speech message, since it has to wait for responses from the network application server. Accordingly, the overall response time at the mobile station will be increased, such that the user may repeat the command or change the command which increases the delays even further and leads to a poor performance of the system.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a speech control system and method having a reduced overall response time.
This object is achieved by a speech control system for a telecommunication network, comprising: loading means for loading a state definition information from a network application server, wherein said state definition information defines possible states of the network application server; determining means for determining a set of valid commands for said network application server on the basis of said state definition information; and checking means for checking a validity of a text command, obtained by converting an input speech command to be used for controlling said network application server, by comparing said text command with said determined set of valid commands.
Furthermore, the above object is achieved by a speech control method for a telecommunication network, comprising the steps of: loading a state definition information from a network application server, wherein said state definition information defines possible states of the network application server; determining a set of valid commands for said network application server on the basis of said state definition information; and checking a validity of a text command, obtained by converting a speech command to be used for controlling said network application server, by comparing said text command with said determined set of valid commands. Accordingly, since a set of valid commands can be determined on the basis of a state definition information provided by the network application server, the validity of an obtained text command can be checked before transmitting the text command to the network application server. Thus, the transmission of erroneous text commands can be prevented, avoiding corresponding delays and wasted processing time at the network application server.
Preferably, the loading means can be arranged to load a grammar and/or vocabulary information which specifies a total set of valid commands supported by the network application server, wherein the determining means can be arranged to determine said set of valid commands on the basis of said total set of valid commands and a state transition information included in said state definition information.
Thereby, the speech control system can keep up with the actual states of the network application server by referring to state transition rules so as to limit the total set of valid commands to those commands which correspond to the actual state of the network application server.
Alternatively, the determining means can be arranged to cause the loading means to load a state-dependent grammar file defining a set of valid commands for a specific state of the network application server, when the determining means determines a state change on the basis of a state transition information included in the state definition information.
Thus, only the set of valid commands applicable to a particular state of the network application server is loaded by referring to the state transition information. Thereby, accuracy can be improved and network connections can be used more efficiently. Preferably, the network control system may comprise a speech recognition means for converting an input speech command received from a subscriber terminal into the text command to be supplied to the network application server. Thus, a central speech control system can be provided in the network which can be accessed by individual subscriber terminals .
In case a Wireless Application Protocol (WAP) is used in a mobile network, the speech control system may be implemented in a Wireless Telephony Application (WTA) server, wherein the WTA server may be arranged to receive the text command from a network speech recognition means for converting an input speech command received from a subscriber terminal into said text command. Thereby, existing WTA applications can be enhanced with an optimized speech recognition.
As an alternative, the speech control system may be a subscriber terminal having an input means for inputting a speech command, a transmission means for transmitting the speech command to a speech recognition means of the telecommunication network, and a receiving means for receiving the text command from the speech recognition means, wherein the transmitting means is arranged to transmit the received text command to the network application server.
Thus, the validity check of the received text command is performed in the subscriber terminal, e.g. the mobile station, before it is transmitted to the network application server. Hence, the processing time at the network application server can be reduced, as it will receive only valid commands .
The state definition information can be a data file such as a Wireless Markup Language (WML) file or a Hyper Text Markup Language (HTML) file. This data file can be sent online to the speech control system as a part of the standard information sent by the network application server.
Furthermore, the state definition information may include a load instruction for loading the state-dependent grammar and/or vocabulary file. Thereby, the speech control system may use the load instruction directly for loading the specific set of valid commands in case a change of the state of the network application server is determined.
Preferably, the state definition information can be provided by the network application server at a setup time of the server.
Furthermore, the state definition information can be stored together with a command set info in a network server running on the hardware of the speech control system.
Preferably, the speech control system may comprise a plurality of vendor-specific speech recognition means, wherein corresponding parameters for said plurality of vendor-specific speech recognition means are defined in the state definition information. Thereby, a universal speech control system can be obtained which is based on a hardware and software independent platform. Thus, a required audio processing hardware and vendor-specific speech recognition means can be selected depending on the network application server.
Further preferred developments of the present invention are defined in the dependent claims.
BRIEF DESCRIPTION OF THE DRAWINGS
In the following, the invention will be described in greater detail on the basis of a preferred embodiment with references to the accompanying drawings, wherein:
Fig. 1 shows a block diagram of a telecommunication network comprising a speech control system according to the preferred embodiment of the present invention;
Fig. 2 shows a flow diagram of a speech control method according to the preferred embodiment of the present invention; and Fig. 3 shows a block diagram of a telecommunication network comprising a WAP-based speech control system according to the preferred embodiment of the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENT
A block diagram of a telecommunication network comprising the speech control system according to the preferred embodiment of the present invention is shown in Fig. 1. According to Fig. 1, a mobile station (MS) 1 is radio-connected to a base station subsystem (BSS) 2 which is connected to a telecommunication network 4 via a mobile switching center (MSC) 3. The telecommunication network 4 may be a data network like the Internet which provides various services.
Furthermore, a network application server (NAS) 5 is connected to the network 4 in order to provide a specific service on the basis of corresponding commands. Additionally, an automatic speech recognition means or engine (ASR) 6 is provided as a central means for enabling speech input at subscriber terminals like the mobile station 1.
To refine the recognition process and arrive at recognition rates with higher accuracy, language-specific features are employed in the ASR engine 6. To achieve a high accuracy of the speech recognition, the application has to be fine-tuned to a required context. This is done by specifying a vocabulary for the application and grammars that are valid in the context of the application. The vocabulary is basically a set of words to be recognized by the ASR engine 6, e.g. words like Close, Read, Message, Orange, Pen, Chair, Exit, Open etc. In the ASR engine 6, a means for specifying the grammar for a given application can be provided. This could be achieved by a rule-based grammar, for example:
public <Command> = [<Polite>] <Action> <Object> (and <Object>)*;
<Action> = read | next | delete;
<Object> = message | item;
<Polite> = please;
In the above rule-based grammar, one public rule, <Command>, is specified, which may be spoken by a user. The rule is a combination of the subrules <Action>, <Object> and <Polite>, wherein the square brackets around <Polite> indicate that it is optional. Therefore, the above grammar would support commands such as "read message", "please read item and message" etc.
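As a rough illustration of how such a rule expands into concrete utterances, the following sketch enumerates a bounded subset of the commands accepted by the <Command> rule above, limiting the (and <Object>)* repetition to at most one additional object. This toy expansion is an assumption for illustration only and does not reflect the grammar format of any particular ASR engine.

from itertools import product

POLITE = ["", "please "]               # [<Polite>] is optional
ACTIONS = ["read", "next", "delete"]   # <Action>
OBJECTS = ["message", "item"]          # <Object>

commands = set()
for polite, action, obj in product(POLITE, ACTIONS, OBJECTS):
    commands.add(f"{polite}{action} {obj}")                  # <Action> <Object>
    for extra in OBJECTS:                                    # one round of (and <Object>)*
        commands.add(f"{polite}{action} {obj} and {extra}")

print("read message" in commands)                   # True
print("please read item and message" in commands)   # True
print("open mailbox" in commands)                   # False, not covered by the rule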
In command-based ASR applications, rule-based grammars are used to define all spoken input which the application is programmed to handle. The rule-based grammar basically specifies all spoken commands (or command syntax) that are supported by an application. In case of an email reader, the grammar file contains all commands which the email reader application will accept (e.g. Message 1, Message 2, ..., Message N, Exit, Delete and Next Message) .
The ASR engine 6 generally loads the associated grammar file before starting the speech recognition. Some applications may even have multiple grammar files to define different contexts of an application such as the network application server 5, wherein the ASR engine 6 is required to load the context-dependent grammar file at run time.
In the preferred embodiment, a grammar file, a vocabulary file and an application states definition file (ASD file) are defined. Therefore, each network application server 5 produces an ASD file, a grammar file and/or a vocabulary file. The grammar file is adapted to the requirements of the ASR engine 6, wherein ASR engines 6 of different vendors may have different grammar file formats.
The ASD file is a file which describes all possible states of the application and how to jump between states, along with the valid commands for each state. Thus, the ASD file provides a means for specifying the context-dependent grammar files and also a vocabulary file name. This is an important feature, since a given application may use different grammars and/or vocabularies depending on the context. If this information is loaded on-line to the ASR engine 6, the speech recognition and the overall response time can be improved remarkably due to the small set of valid commands and the resulting high recognition accuracy.
In case the ASD file is based on a syntax similar to HTML (Hyper Text Markup Language), it could be defined as follows:
<ASD>
<APP = "Email Reader">
<STATE = "Main Menu", COMMANDS = <MSG>, NEXTSTATE="Read", <QUIT>, NEXTSTATE="">;
<STATE = "Read", COMMANDS = <NXT>, NEXTSTATE="Read", <PREV>, NEXTSTATE="Read", <QUIT>, NEXTSTATE="Main Menu">;
<GRAMMAR>
<MSG> = MESSAGE<DIGITS>
<NXT> = NEXT
<PREV> = PREVIOUS
<QUIT> = EXIT
<DIGITS> = 1|2|3|4|5;
</GRAMMAR>
</APP>
</ASD>
wherein an <ASD> tag identifies the file as a file type that provides the state definition of the network application server 5, an <APP> tag specifies the application name and a <STATE> tag defines a given state, i.e. the name of the state, the valid commands for this state and, with each command, the next state to which the application must jump. Such a <STATE> tag is defined for each state of the network application. The <GRAMMAR> tag provides a means of defining the commands and the syntax of the commands. According to the above file, the application has to jump to the state "Read" after the commands Message 1, Message 2, 3 ... N. The <DIGITS> tag defines a specific grammar. In the present case, the <GRAMMAR> tag shows that the digits could be 1, 2, 3, 4 or 5. After the command "Exit" the application is to be exited (which is denoted by a NULL state ("")). It is to be noted that the state is to be transferred to the "Main Menu" when an "Exit" command is issued in the "Read" state.
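Purely as an illustration, the state and command information carried by the above ASD example could be held internally by the ASR engine 6 roughly as follows; the Python data layout shown is an assumption made for this sketch and is not prescribed by the ASD format itself:

GRAMMAR = {                   # <GRAMMAR> section: command tag -> syntax
    "MSG": "MESSAGE<DIGITS>",
    "NXT": "NEXT",
    "PREV": "PREVIOUS",
    "QUIT": "EXIT",
    "DIGITS": "1|2|3|4|5",
}

STATES = {                    # <STATE> tags: state -> {command tag: next state}
    "Main Menu": {"MSG": "Read", "QUIT": ""},                        # "" denotes the NULL state
    "Read": {"NXT": "Read", "PREV": "Read", "QUIT": "Main Menu"},
}

def next_state(current_state, command_tag):
    # Return the next application state, or None if the command is not valid in this state.
    return STATES.get(current_state, {}).get(command_tag)

print(next_state("Main Menu", "MSG"))    # -> "Read"
print(next_state("Read", "QUIT"))        # -> "Main Menu"
print(next_state("Read", "MSG"))         # -> None (invalid in this state)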
Using this approach, the ASD file tells the ASR engine 6 or the mobile station 1 which commands are valid for a given context. In order for the mobile station 1 or the ASR engine 6 to keep up with the states of the network application server 5, state transition rules are also provided in the ASD file. Using other tags which include a context-dependent grammar file, it would be possible to instruct the ASR engine 6 which grammar or vocabulary file is to be loaded. Thereby, a higher flexibility can be provided and the recognition can be made more accurate, since the ASR engine 6 is fine-tuned to the context of the network application server. An example of such a tag is shown in the following:
<STATE="Read" LOADGRAMMAR="URL=ftp://hs.gh.com/Reademail.gmr" LOADVOCABULARY="URL=ftp://hs.gh.com/Reademail.vcb"
COMMANDS="Next", NEXTSTATE="Read", <PREV>, NEXTSTATE="Read", <QUIT>, NEXTSTATE="Main Menu">;
Fig. 2 shows a flow diagram of an example of the speech recognition processing as performed in the preferred embodiment.
Initially, the ASR engine 6 loads a corresponding ASD file from the network application server 5 to be connected (S101). In the loaded ASD file, the ASR engine is instructed to load a state-dependent grammar file, i.e. "Reademail.gmr", when the network application server 5 enters the state "Read". Alternatively, the ASR engine 6 may load a general grammar file from the network application server 5 (S102). Based on the grammar file, valid text commands for speech recognition are then determined (S103). In case of a state-dependent grammar file, the commands defined in the loaded grammar file are determined as valid commands for the speech recognition. In case of a general grammar file, the valid commands are selected from the general grammar file in accordance with corresponding information provided in the ASD file. Accordingly, only the determined valid commands are allowed in this state, or at least until a different grammar file is loaded.
Thereafter, a speech command is received from the mobile station 1 (S104) and speech recognition is performed for the received speech command (S105). The text command derived by the speech recognition processing from the received speech command is then checked against the determined valid text commands (S106).
In case a valid command is determined in step S107, the text command is supplied directly to the network application server 5 or to the mobile station 1 (S108). Otherwise, an error messaging is performed so as to inform the mobile station 1 of the erroneous speech command (S109).
Thereafter, the ASR engine 6 refers to the state transition rules defined in the ASD file and determines whether the supplied command leads to a state change of the network application server 5 (S110). If no state change has been determined, the processing returns to step S104 in order to receive another speech command and perform speech recognition of the other received speech command, if required.
If a state change has been determined, the processing returns to step S103 and the ASR engine 6 refers to the ASD file so as to determine a new set of valid text commands. This can be achieved either by loading a new state-dependent grammar file according to an instruction provided in the ASD file, or by selecting new valid commands from the general grammar file based on corresponding information in the ASD file.
Subsequently, a new speech command is received in step S104 and speech recognition is continued in step S105.
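The processing of Fig. 2 can be summarised by the following illustrative control-loop sketch; load_asd, load_grammar, receive_speech, recognize, send_to_application and send_error are hypothetical placeholders for engine-, terminal- and network-specific code and are not defined by the present description:

def speech_control_loop(nas_url, load_asd, load_grammar, receive_speech,
                        recognize, send_to_application, send_error):
    asd = load_asd(nas_url)                                  # S101: load the ASD file from the NAS
    state = asd.initial_state
    valid = load_grammar(asd.grammar_for(state))             # S102/S103: determine the valid text commands
    while state:                                             # the NULL state ("") ends the session
        speech = receive_speech()                            # S104: speech command from the mobile station
        text = recognize(speech)                             # S105: convert the speech command to text
        if text not in valid:                                # S106/S107: validity check
            send_error(text)                                 # S109: report the erroneous speech command
            continue                                         # back to S104
        send_to_application(text)                            # S108: forward the valid text command
        new_state = asd.next_state(state, text)              # S110: apply the state transition rules
        if new_state != state:                               # state change: back to S103
            state = new_state
            if state:
                valid = load_grammar(asd.grammar_for(state))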
An important aspect is that it is necessary for DSR-type applications to have a standard method of passing application-specific features to the ASR engine 6, since the ASR engine 6 is a general-purpose ASR resource and any network application should be able to use the ASR features by producing state definition and grammar files. Therefore, according to the preferred embodiment, the ASR engine 6 can load a new grammar file at run time. This means that the ASR engine 6 can be instructed to load only the grammar rules applicable to a particular state/context of the network application server 5 by referring to the ASD file. This greatly improves recognition accuracy and the efficiency of the use of the network connections.
An implementation of the network application server 5 and its user interface may vary depending on the software and hardware platform used. Most network application servers 5 may provide an HTTP interface (i.e. HTML), a WAP interface (Wireless Application Protocol - WML) or a proprietary Application Programming Interface (API). If the ASD file is adapted to either WML (Wireless Markup Language) or HTML (Hyper Text Markup Language), it can be used as a universal definition file for application states or speech commands in any type of application running on a network application server 5. Using this ASD information, the ASR engine 6 would be able to build an internal representation of the relevant NAS application. This representation or model can then be used to keep the ASR engine 6 in synchronism with the application states of the network application server 5.
Hence, each network application server 5 which provides a speech recognition feature will have its speech-specific WML card(s) or HTML location. As an example, for a dailynews service, the state definition information URL (Uniform Resource Locator) might be a file such as: //services.internal.net/dailynews/speechsettings
Therefore, the speech control system, whether it is in the mobile station 1 or in a network server, needs to load this file from the given URL.
Furthermore, if the network application server 5 is actually an HTTP or WAP origin server, then the first WML card or HTML page sent by this server can include the above specific URL under a special tag. Thereby, the mobile station 1 can be informed that this application supports a speech control and that the file at this URL needs to be loaded in order to provide the speech recognition facility.
Thus, the ASD files could be sent on-line to the ASR engine 6 as a part of the standard HTML/WML scripts sent by the network application server 5. The ASR engine 6 would interpret these scripts automatically and keep step with the network application server 5 so as to process the speech commands efficiently and perform functions such as on-line loading of grammar files and so on. In this case, the ASR engine 6 would directly refer to the URL specified in the LOADGRAMMAR tag so as to read the associated grammar file.
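As an illustration of such run-time loading, a grammar file referenced by a LOADGRAMMAR tag could be fetched roughly as follows; the parsing of the attribute value is an assumption made for the sake of the example:

from urllib.request import urlopen

def load_grammar_from_tag(loadgrammar_value):
    # loadgrammar_value is the LOADGRAMMAR attribute value,
    # e.g. "URL=ftp://hs.gh.com/Reademail.gmr"; the "URL=" prefix is
    # stripped and the referenced grammar file is fetched on-line.
    url = loadgrammar_value.split("=", 1)[1]
    with urlopen(url) as response:
        return response.read().decode("utf-8")

# e.g. grammar_text = load_grammar_from_tag("URL=http://www.asr.com/vmsvr/Grammar/vmail.gmr")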
For other non-WML/HTML applications of the network application server 5, the ASD files are supplied by the network application server 5 to the ASR engine 6 at setup time, i.e. off-line. These ASD files must be produced in line with the HTML-like specification described above and will be stored along with a grammar file in a WWW server (e.g. www.asr.com) running on the hardware of the ASR engine 6.
At the beginning of an interaction between the ASR engine 6 and the network application server 5, the ASR engine 6 first loads the ASD file from the server www.asr.com and builds the internal state representation/model of the application of the network application server 5. Thereafter, the ASR engine 6 can keep step with the states of the network application server 5, process speech commands efficiently and perform functions such as run-time loading of grammar files. In this case, the LOADGRAMMAR tag includes the full URL which points to www.asr.com.
If the" application of the network application server 5 is for example a "voice mail server" with an apparatus name vmsvr, then the following URL would be used for example:
"http: //www.asr . com/vmsvr/Grammar/vmail .gmr"
The above applications were based on the use of a single ASR engine 6 in the network 4. Therein, the ASR engine 6 is implemented on fixed hardware and software platforms. From the mobile station application's point of view, this universal ASR engine 6 handles the ASR requests and responds with the corresponding text commands.
However, in case the ASR engine 6 is based on a hardware- and software-independent platform such as Java with the JSAPI (Java Speech API, i.e. a standard API which is under development at present and provides a common API to ASR engines of disparate vendors), the functions of the ASD file can be extended even further. In this case, a flexibility of selecting a required audio processing hardware and a vendor-specific ASR engine 6 in dependence on the application of the network application server 5 can be provided. This means that a logical ASR engine can be connected to the vendor-specific physical ASR engine 6 based on the application requirements of the network application server 5, such that even custom hardware can be used for audio processing. The corresponding optional parameters can be defined in the ASD file using additional tags.
In the following, an example of an implementation of the ASD file in a WAP application is described, which may be used by operators to enhance their existing service offerings. In the previous examples, the ASD file was used by the ASR server or engine 6 in order to perform a context-based speech recognition. In this example, as shown in Fig. 3, the ASD file is used by a different application server, i.e. the WTA (Wireless Telephony Application) server 7 in WAP, to perform similar tasks. In this case, the use of WAP-enabled mobile phones or stations 1 is assumed.
The WAP-enabled mobile station 1 may have the full WAP stack installed and runs the WAE (Wireless Application Environment). The WTA server 7 has the ability to control the services of the network 4, which is a standard mobile network in the present case. The WTA server 7 acts as a principal content generator. The content may be customized and downloaded to the client, which is the mobile station 1 running a WAP software. The WTA server 7 could also perform call control functions such as informing the mobile station 1 of incoming call details via WTA events.
Furthermore, a network-based ASR server 6 is provided which enables an application to connect to the speech server based on parameters such as the ID/address of the application, the MSISDN, the speech encoding type, a grammar file ID (to select an appropriate grammar rule) and other optional parameters. Moreover, the ASR server 6 may have the ability to perform an outgoing call to a given MSISDN number, wherein the ASR server 6 extracts the received audio input having a PCM, CEP or other format, supplies the audio input to a speech recognition engine, obtains the recognized text, and sends the text to the ID/address of the calling application. The WTA server 7 then checks the validity of the text and may also control the ASR server 6 to load grammar files etc.
Each network application server 5 having a speech interface provides an ASD file to the WTA server 7, along with a basic WML card deck, i.e. WML document, for that service. The WTA server 7 loads the ASD file and may change the WML sent to the mobile station 1 based on the ASD file settings. Based on the ASD file, audio functions of the mobile station 1 and settings of the ASR server 6 are controlled in dependence on the application context.
In the present example, the ASD file may define attributes such as an ASR engine to be used for an actual application, an encoding type supported by the ASR engine used by the actual speech-enabled application, a default grammar file (file name) to be used, a default vocabulary (file name or words) and states of the actual application, i.e. a menu hierarchy. Each menu provides specifications for commands supported at the menu and corresponding NEXT states, new grammar rules and vocabularies, which may override previously set values, and parameters specifying whether the actual application requires a microphone or a speaker of the mobile station 1 to be on or off.
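Purely for illustration, such ASD settings for the weather service described below could be represented as follows; all field names, state names and file names are assumptions introduced for this sketch and are not prescribed by the present description:

WEATHER_ASD = {
    "asr_engine": "vendor-x-engine",          # ASR engine to be used for this application
    "encoding": "PCM",                        # encoding type supported by that engine
    "default_grammar": "weather.gmr",
    "default_vocabulary": "weather.vcb",
    "states": {                               # menu hierarchy of the application
        "CityMenu": {
            "commands": {"London UK": "AreaMenu"},      # command -> NEXT state
            "grammar": "cities.gmr",                    # overrides the default grammar
            "microphone": True,                         # terminal settings for this menu
            "speaker": False,
        },
        "AreaMenu": {
            "commands": {"Heathrow": "WeatherReport"},
            "grammar": "london_areas.gmr",
            "microphone": True,
            "speaker": False,
        },
    },
}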
In the following, the operation of the present WAP-based example will be described based on a weather service application and a voice mail service application.
Weather service application:
The service provider (or operator) provides a weather service to its mobile subscribers and offers the service over a speech interface. The operator has installed the ASR server 6 in his network 4 and intends to use this ASR server 6 along with the WTA server 7 to provide the weather service with a speech interface.
In this case, the user of the mobile station 1 activates a weather menu which is already primed to use the speech interface. This request is sent by the WAE to the WTA server 7. Then, the WTA server 7 sends a deck of WML cards pre-loaded from the corresponding network application server 5 and relating to the weather service, to the mobile station 1. At this point, the WAE software of the mobile station 1 goes to a listening mode in order to answer an incoming call from the ASR server 6 of the network 4. Thereafter, the WTA server 7 sends a request for an ASR session to the ASR server 6, including an MSISDN, an allocated session ID with the WTA server 7, and also an ID of a grammar rule to be used. The grammar rule name is derived from the ASD file pre-loaded from the corresponding network application server 5 for the weather service.
The ASR server 6 ensures the required resources, i.e. dialout ports and ASR sessions on the speech engine, are available and sends a confirmation to the WTA server 7. Subsequently, the ASR server 6 calls the MSISDN and the network 4 sends a call indication to the mobile station 1. The WAE software of the mobile station 1 automatically answers the call and a speech connection is established between the ASR server 6 and the mobile station 1. Actually, the above call signaling between the mobile station 1 and the ASR server 6 is performed via the WTA server 7.
In accordance with the application-dependent WML obtained from the WTA server 7, the mobile station 1 deactivates its speaker and sends any audio input received via its microphone over the established speech connection. The audio input may be coded by the WAE software according to a required format, i.e. PCM, CEP or the like. The ASR server 6 converts the received audio input into text and sends the obtained text to the WTA server 7.
Since the weather session was started, the WTA server 7 has loaded the corresponding ASD file and is now in a position to compare the received text with the valid context-dependent commands. If a valid command, i.e. "London UK", has been received, the WTA server 7 requests the WML/HTML for London UK from the network application server 5 providing the weather service. The network application server 5 responds with the requested weather report for London and the WTA server 7 supplies the WML card deck for London weather to the mobile station 1. In case the grammar rules or vocabulary are changed in the set of WML cards, the ASD file contains corresponding information and the WTA server 7 sends the new grammar rules or vocabulary to be used for the London weather to the ASR server 6. Thus, the ASR server 6 is primed to use the new grammar or vocabulary required for the new WML cards.
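The WTA-side handling just described can be sketched as follows, reusing the illustrative ASD structure shown earlier; request_wml, send_wml and prime_asr are hypothetical interfaces of the WTA server 7 towards the network application server 5, the mobile station 1 and the ASR server 6:

def handle_recognized_text(text, state, asd, request_wml, send_wml, prime_asr):
    commands = asd["states"][state]["commands"]
    if text not in commands:                  # invalid command for this context
        return state                          # stay in the current state
    next_state = commands[text]
    send_wml(request_wml(text))               # fetch WML/HTML from the NAS and push the card deck to the mobile station
    new_grammar = asd["states"].get(next_state, {}).get("grammar")
    if new_grammar:                           # the new cards require a new grammar or vocabulary
        prime_asr(new_grammar)                # prime the ASR server for the new context
    return next_state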
Thereafter, the text converted by the ASR server 6 from the speech commands received from the mobile station 1 is sent to the WTA server 7 which checks its validity. In case a valid command, i.e. "Heathrow", has been received, the WTA server 7 requests the weather info for London Heathrow, and the network application server 5 responds with the requested weather report. Then, the WML card deck for London Heathrow weather is supplied by the WTA server 7 to the mobile station 1.
Voice mail service application:
In this case, the service provider (or operator) provides a voice mail service with a speech interface to its mobile subscribers.
The network application server 5 providing the voice mail service sends a new voice mail message to the WTA server 7. Then, the WTA server 7 sends a deck of WML cards pre-loaded from the network application server 5 and relating to the voice mail service to the mobile station 1. At this point, the WAE software of the mobile station 1 goes to a listening mode in order to answer an incoming call from the ASR server 6 of the network 4. Then, the mobile station 1 sends to the WTA server 7 an ASR request which indicates that the user will employ the speech interface to the voice mail service. At this point, the WTA server 7 instructs the network 4 to send any incoming call indications to the WTA server 7.
Thereafter, the WTA server 7 sends a request for an ASR session to the ASR server 6, including an MSISDN, an allocated session ID with the WTA server 7, and also an ID of a grammar rule to be used. The grammar rule name is derived from the ASD file pre-loaded from the corresponding network application server 5 for the voice mail service. The ASR server 6 ensures the required resources, i.e. dialout ports and ASR sessions on the speech engine, are available and sends a confirmation to the WTA server 7. Subsequently, the ASR server 6 calls the MSISDN and the network 4 sends a call indication to the mobile station 1. The WAE software of the mobile station 1 automatically answers the call and a speech connection is established between the ASR server 6 and the mobile station 1.
In accordance with the application-dependent WML obtained from the WTA server 7, the mobile station 1 activates both its speaker and its microphone, and sends any audio input received via its microphone over the established speech connection. The audio input may be coded by the WAE software according to a required format, i.e. PCM, CEP or the like. The ASR server 6 converts the received audio input into text.
Now, the WTA server 7 sends a command to call the given MSISDN to the network application server 5 providing the voice mail service, which then calls the MSISDN. In this case, a multiparty call is set up, since the ASR server 6 requires a speech input at the mobile station 1 and the network application server 5 needs to send audio to the mobile station 1. These two services are in different machines and may not have any API (Application Programming Interface) or connection with each other. Since both servers need to access the mobile station 1, a multiparty call setup is required, which is explained in the following.
In the multiparty call setup, the WTA server 7 receives a call indication for the MSISDN and sends a call indication event message to the mobile station 1 with special parameters to instruct an addition of the call to a multiparty call. The mobile station 1 sends a call hold message to instruct the network 4 to hold call 1, i.e. the call from the ASR server 6 to the mobile station 1. Then, the mobile station 1 accepts call 2, i.e. the call from the network application server 5 to the mobile station 1, and a speech connection is established.
Thereafter, the mobile station 1 instructs the establishment of a multiparty call, i.e. with calls 1 and 2, such that now both the ASR server 6 and the network application server 5 are connected to the mobile station 1.
Since the voice mail session was started, the WTA server 7 has loaded the corresponding ASD file for voice mail and is now in a position to compare the received text with the valid context-dependent commands. If a valid command, i.e.
"Anthony", has been received, the WTA server 7 requests the network application server 5 providing the voice mail service to play the message "Anthony" . Accordingly, the network application server 5 performs playback of the message "Anthony" .
It should be understood that the above description and the accompanying drawings are only intended to illustrate the present invention. In particular, the present invention is not restricted to speech recognition or control systems for mobile phones, but can be used in any data network. Thus, the apparatus and method according to the invention may vary within the scope of the attached claims.
A speech control system and method is described, wherein a state definition information is loaded from a network application server. The state definition information defines possible states of the network application server and is used for determining a set of valid commands of the network application server, such that a validity of a text command obtained by converting an input speech command can be checked by comparing said text command with said determined set of valid commands. Thereby, a transmission of erroneous text commands to the network application server can be prevented so as to reduce total processing time and response delays.

Claims
1. Speech control system for a telecommunication network (4), comprising: a) loading means for loading a state definition information from a network application server (5), wherein said state definition information defines possible states of the network application server (5); b) determining means for determining a set of valid commands for said network application server (5) on the basis of said state definition information; and c) checking means for checking a validity of a text command, obtained by converting an input speech command to be used for controlling said network application server (5), by comparing said text command with said determined set of valid commands.
2. System according to claim 1, wherein said loading means is arranged to load a grammar and/or vocabulary information which specifies a total set of valid commands supported by said network application server, wherein said determining means is arranged to determine said set of valid commands on the basis of said total set of valid commands and a state transition information included in said state definition information.
3. System according to claim 1, wherein said determining means is arranged to cause said loading means to load a state-dependent grammar file defining a set of valid commands for a specific state of the network application server (5), when said determining means determines a state change on the basis of a state transition information included in said state definition information.
4. System according to any one of the preceding claims, wherein said speech control system comprises a speech recognition means (6) for converting an input speech command received from a subscriber terminal (1) into said text command to be supplied to said network application server
(5).
5. System according to any one of claims 1 to 3 , wherein said telecommunication network (4) is a mobile network and said speech control system is implemented in a Wireless Telephony
Application (WTA) server (7), and wherein said WTA server (7) is arranged to receive said text command from a network speech recognition means (6) for converting an input speech command received from a subscriber terminal (1) into said text command.
6. System according to any one of claims 1 to 3, wherein said speech control system comprises a subscriber terminal (1) having an input means for inputting a speech command, a transmitting means for transmitting said speech command to a speech recognition means (6) of said telecommunication network (4), and a receiving means for receiving said text command from the speech recognition means (6), wherein said transmitting means is arranged to transmit the received text command to said network application server (5).
7. System according to claim 3, wherein said state definition information includes a load instruction for loading the state-dependent grammar file.
8. System according to any of the preceding claims, wherein said state definition information is a data file.
9. System according to claim 8, wherein said data file is a WML file.
10. System according to claim 8, wherein said data file is a HTML file.
11. System according to claim 9 or 10, wherein said data file is sent on-line to said speech control system as a part of a standard information sent by said network application server (5).
12. System according to claim 1, wherein said state definition information is provided by said network application server (5) at a setup time.
13. System according to claim 4, wherein said state definition information is stored together with a command set information in a network server running on a hardware of said speech control system.
14. System according to claim 4 or 6, wherein said speech control system comprises a plurality of vendor-specific speech recognition means, and wherein corresponding parameters of said plurality of vendor-specific speech recognition means are defined in said state definition information.
15. Speech control method for a telecommunication network, comprising the steps of: a) loading a state definition information from a network application, wherein said state definition information defines possible states of said network application; b) determining a set of valid commands for said network application on the basis of said state definition information; and c) checking a validity of a text command, obtained by converting a speech command to be used for controlling said network application, by comparing said text command with said determined set of valid commands .
16. Method according to claim 15, further comprising the steps of loading a grammar and/or vocabulary information which specifies a total set of valid commands for said network application, wherein said determining step is performed on the basis of said total set of valid commands and a state transition information included in said state definition information.
17. Method according to claim 15, further comprising the step of loading a state-dependent grammar file defining a set of valid commands for a specific state of said network application, when a state change has been determined on the basis of the state transition information included in said state definition information.