UNIVERSAL VOICE OPERATED COMMAND AND CONTROL ENGINE
This application claims priority of U.S. Provisional Application Serial Number 60/053,621, filed July 24, 1997.
Technical Field
The present invention relates generally to software for a computer system which provides voice operated control of a graphical user interface such as a Microsoft WINDOWS based environment.
More specifically, the present invention relates to a system which takes standard Microsoft Speech Application Programming Interface compliant commands from a speech recognition software package and directs them to a target application in order to control that target application.
Background of the Invention
Generally, there are two applications of speech recognition technology in the computer speech recognition art: 1) dictation applications and 2) command and control applications. Dictation applications recognize the speech of a computer user in order to reproduce those words in a computer software application, such as a word processor. Therefore, a user may dictate a letter to the computer as opposed to manually typing it out. Command and control applications, on the other hand, recognize the speech of a computer user in order to operate the computer itself. Therefore, a user may issue commands to a computer, such as to execute a program, to save a file, or to change the font of the letter being dictated, rather than manually using a keyboard or mouse to issue the command. Speech recognition used to perform either of these two tasks can greatly increase the productivity of a computer user.
Voice recognition software, such as IBM's VOICETYPE or
Dragon Systems' NATURALLY SPEAKING, is well known in the art. Generally, voice recognition software, in conjunction with a computer system, converts an analog voice signal into digital data capable of interacting with a computer system. Software also exists which enables computer users to make event calls to the operating system that simulate actions by well-known computer input devices such as a mouse or a keyboard, thereby allowing computer users to interact with the graphical user interface via voice control.
However, no computer software application exists which allows a user to create new voice commands or change the behavior of current voice commands by using standard scripting languages. The ability to use standard scripting languages allows the creation of local and global variables which can be shared by different voice commands in order to create more comprehensive voice commands. Additionally, no computer software application exists which will: 1) continually monitor the state of an operating system or software application to be voice controlled, and the state of its subwindows, for a listing of voice commands which may be validly issued and 2) dynamically maintain the listing of voice commands as the state of the operating system or software to be controlled changes. The tracking of the state of the operating system or software application to be controlled also allows commands to be issued based on that state.
Summary of the Invention
The present invention is directed to computer software for converting spoken commands into commands capable of directing graphical user interfaces of any computer application for a particular computer system.
Specifically, the present invention comprises a means for receiving issued voice commands from a standard voice recognition system, a means for monitoring the state of a target application, a
means for determining active voice commands from the state of the target application, a means for determining whether the issued voice command is an active voice command, a means for associating each active voice command with a block of script code data, and a means for issuing the block of script code data associated with the issued voice command to the graphical user interface when the issued voice command is determined to be an active voice command.
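The means enumerated above can be illustrated with a minimal sketch. The class and data structures below are hypothetical illustrations, not the disclosed implementation, which targets Microsoft WINDOWS and SAPI:

```python
# Minimal sketch of the claimed engine (hypothetical names): active
# commands depend on the target application's state, and each active
# command is associated with a block of script code.

class CommandEngine:
    def __init__(self, state_to_commands, command_to_script):
        # state_to_commands: app state -> set of valid command phrases
        # command_to_script: command phrase -> block of script code
        self.state_to_commands = state_to_commands
        self.command_to_script = command_to_script

    def handle(self, app_state, spoken_command):
        """Return the script block for a command, or None if the
        command is not active in the application's current state."""
        active = self.state_to_commands.get(app_state, set())
        if spoken_command not in active:
            return None            # invalid in this state: ignore it
        return self.command_to_script[spoken_command]

engine = CommandEngine(
    state_to_commands={"FontDialog": {"set bold", "close dialog"}},
    command_to_script={"set bold": "SendKeys '%b'",
                       "close dialog": "SendKeys '{ESC}'"},
)
assert engine.handle("FontDialog", "set bold") == "SendKeys '%b'"
assert engine.handle("MainWindow", "set bold") is None
```

The key design point the claims describe is that validity is decided per application state, so a command is simply ignored when its dialog is not open.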
Further, the present invention is an improvement over current technology based on its methodology of processing events within the accompanying operating system and its ability to adapt to virtually any Microsoft WINDOWS based application. The present invention accomplishes this by using industry-standard scripting languages. Additionally, the present invention monitors target applications to determine which voice commands may be validly issued based on the target application state. The present invention does this in order to prevent invalid commands from being issued to the target application, regardless of whether the original application was intended to be controlled through voice commands.
Other advantages and aspects of the present invention will become apparent upon reading the following description of the drawings and detailed description of the invention.
Brief Description of the Drawings
Figure 1 is a chart showing the interrelation of the present invention with external components;
Figure 2 is a chart displaying the internal operation of the invention;
Figure 3 is a plan view of a desktop icon according to the present invention;
Figure 4 is a plan view of a configuration dialog according to the present invention;
Figure 5 is a plan view of a voice command dialog according to the present invention;
Figure 6 is a chart showing the contents of a directive module data file according to the present invention;
Figure 7 is a plan view of a directive module editor according to the present invention.
Detailed Description
While this invention is susceptible of embodiment in many different forms, there is shown in the drawings and will herein be described in detail a preferred embodiment of the invention with the understanding that the present disclosure is to be considered as an example of the principles of the present invention and is not intended to limit the broad aspect of the invention to the embodiments illustrated. Referring to Figure 1, generally, a Universal Voice Operated
Command and Control Engine 10 consists of two primary components: a Voice Control Engine ("VCE") 12 and a Directive Module ("DM") 14. In order for a user to control a target application 16 by a voice command, the user issues the voice command which is then passed to Voice Control software 18 running on a computer 20 with a microphone 22 connected. The Voice Control software 18 then recognizes the speech pattern and passes the result to the Universal Voice Operated Command and Control Engine 10. In the present invention, the Voice Control software 18 can be any commercially available voice recognition software which adheres to Microsoft's
Speech Application Programming Interface ("SAPI") standards, such as Dragon Systems' NATURALLY SPEAKING and IBM's VOICETYPE. SAPI compliance means that a user's choice of speech recognition software is transparent to the VCE 12, provided the vendor of the speech recognition software adheres properly to the SAPI standards.
As will be explained in further detail below, the VCE 12 then
determines if the command is proper by querying the target application 16 to determine its present state. If the voice command is proper, the Universal Voice Operated Command and Control Engine 10 executes a section of computer code contained within the DM 14 corresponding to the particular target application 16 in order to control the target application 16 to execute the voice command. The computer code that is executed is preferably written in Visual Basic for Applications ("VBA") code, although other scripting languages such as JavaScript can be used.
Voice Control Engine
Referring to Figure 2, the VCE 12 is the run-time software for receiving information from the Voice Control software 18, dispatching synthetic keyboard and mouse messages to the Operating System ("OS") or target application 16, and monitoring the current state of the target application 16.
Specifically, the VCE 12 interfaces with the Voice Recognition software 18 through a standard SAPI interface 24 with a SAPI controller 26. The SAPI controller 26 comprises interface code to initialize a conversation with the SAPI interface 24, to monitor a success state of the SAPI interface 24, and to produce notification callbacks from the SAPI interface 24. The success state of the SAPI interface 24 indicates whether a command transmitted through the SAPI interface 24 was successful. The SAPI controller 26 also receives updated active voice commands from an Active Voice Command Updater 28 as the state of the target application 16 changes. Once a conversation is initialized by the SAPI controller 26, the SAPI controller 26 receives commands from the Voice Recognition software 18 and compares the voice command received from the SAPI interface 24 to the commands provided by the Active Voice Command Updater 28. If the received voice command matches a command provided by the Active Voice Command Updater 28, then it is a presently valid
voice command, and the SAPI controller 26 passes the command to a Recognized Voice Command Handler 30. The Recognized Voice Command Handler 30 then forwards the command to a DM Parser and Indexer 32 if the voice command is "simple" and contains no variable data. However, if the voice command contains variable data, such as
"Set font size to {number}," the Recognized Voice Command Handler 30 preprocesses the voice command before passing it to the DM Parser and Indexer 32. If the received voice command does not match a command provided by the Active Voice Command Updater 28, then it is not a presently valid voice command, and the SAPI controller 26 can either take no action or return a message to the SAPI Interface 24 that the command was not valid.
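For illustration only, the preprocessing of a command template containing variable data, such as "Set font size to {number}," might resemble the following sketch. The function name and the regular-expression approach are assumptions for this example, not the disclosed implementation:

```python
import re

# Hedged sketch: match a spoken utterance against a command template
# with a variable slot, capturing the variable data for the handler.

def match_template(template, spoken):
    """Return a dict of captured variables if the utterance matches
    the template, else None."""
    pattern = re.escape(template)
    # turn the escaped "\{name\}" slot back into a digit capture group
    pattern = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>\\d+)", pattern)
    m = re.fullmatch(pattern, spoken, flags=re.IGNORECASE)
    return m.groupdict() if m else None

assert match_template("Set font size to {number}",
                      "set font size to 14") == {"number": "14"}
assert match_template("Set font size to {number}", "save file") is None
```

A "simple" command with no variable slot would bypass this step and be forwarded directly, as the text describes.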
The DM Parser and Indexer 32 then takes the voice command and, using the DM 14 corresponding to the active application, determines the section of scripting code corresponding to the voice command received, and issues the scripting code to the Message Dispatcher 34. The Message Dispatcher 34 then issues pseudo mouse and/or keyboard messages to the Operating System or target application 16. The Message Dispatcher 34 accomplishes this through standard WIN32 API calls such as SendMessage() or PostMessage().
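An illustrative sketch of the dispatching idea follows. This is not the actual Message Dispatcher 34: on Windows the final step would be a WIN32 call such as SendMessage() or PostMessage(), whereas here the synthetic events are merely collected in a list so the flow can be shown:

```python
# Standard WIN32 message codes for a key press and a left click.
WM_KEYDOWN, WM_LBUTTONDOWN = 0x0100, 0x0201

class MessageDispatcher:
    """Collects synthetic keyboard/mouse messages; a real dispatcher
    would post them to the OS or target window instead."""

    def __init__(self):
        self.sent = []   # stand-in for messages posted to the OS

    def send_key(self, hwnd, vk_code):
        self.sent.append((hwnd, WM_KEYDOWN, vk_code))

    def click(self, hwnd, x, y):
        self.sent.append((hwnd, WM_LBUTTONDOWN, (x, y)))

dispatcher = MessageDispatcher()
dispatcher.send_key(hwnd=42, vk_code=0x42)   # 'B' key to window 42
dispatcher.click(hwnd=42, x=10, y=20)
assert dispatcher.sent[0] == (42, WM_KEYDOWN, 0x42)
```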
The DM Parser and Indexer 32 also passes the list of the active voice commands to the Active Voice Command Updater 28 when the DM Parser and Indexer 32 receives an updated list of active voice commands from a Current Application State Monitor 36. The Current Application State Monitor 36 continuously polls the active target program
16 to determine its state and maintains the list of active voice commands. The Current Application State Monitor 36 does this by determining which dialog and form is open in the target application 16 at the present time and sending the commands that may be validly issued to the DM Parser and Indexer 32.
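The polling idea behind the Current Application State Monitor 36 can be sketched as follows; the function signature and callback style are assumptions for illustration only:

```python
# One polling step: query which dialog/form is open and, when the
# state has changed, push the new list of validly issuable commands.

def poll_state(get_open_dialog, commands_by_dialog, last_state, on_change):
    """Run a single polling step; the real monitor runs continuously."""
    state = get_open_dialog()
    if state != last_state:
        on_change(commands_by_dialog.get(state, []))
    return state

updates = []
state = poll_state(lambda: "PrintDialog",
                   {"PrintDialog": ["print", "cancel"]},
                   last_state=None,
                   on_change=updates.append)
assert state == "PrintDialog"
assert updates == [["print", "cancel"]]
```

Because updates are pushed only on a state change, the downstream components are not flooded with identical command lists on every poll.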
A DM Loader/Container 38 holds DMs 14 for parsing and
indexing by the DM Parser and Indexer 32, and loads, or unloads, DMs 14 whenever the DM Loader/Container 38 recognizes a new target application 16 has loaded, or unloaded, for which a DM 14 is available. The DM Loader/Container 38 retrieves the DMs 14 from an electronic media storage 40, such as a hard drive, of the computer 20.
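For illustration, the load/unload behavior of the DM Loader/Container 38 might be sketched as below; the class and method names are hypothetical, and the in-memory dictionary stands in for the electronic media storage 40:

```python
# Hedged sketch: load a Directive Module when a target application
# for which a DM is available starts, and unload it on exit.

class DMLoader:
    def __init__(self, available_dms):
        self.available_dms = available_dms   # app name -> DM data
        self.loaded = {}                     # currently loaded DMs

    def on_app_started(self, app_name):
        if app_name in self.available_dms:
            self.loaded[app_name] = self.available_dms[app_name]

    def on_app_exited(self, app_name):
        self.loaded.pop(app_name, None)

loader = DMLoader({"notepad": {"commands": ["save file"]}})
loader.on_app_started("notepad")
assert "notepad" in loader.loaded
loader.on_app_exited("notepad")
assert loader.loaded == {}
```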
As has been described, the VCE 12 has no controls which are visible to the user of the computer 20. However, as on the Windows 95 platform, a small icon 42 is preferably visible in the system tray 44, as shown in Figure 3, to indicate to the user that the VCE 12 has been loaded. Additionally, the icon may be clicked with the left mouse button to show a small menu of possible command options, such as: an option for showing a User Interface 46 for displaying the list of active commands, a stop voice commands option, a start voice commands option, an option to open a User-Definable Variables dialog 48 (Figure 4), an option for online help, and an option to unload the VCE 12.
The User-Definable Variables dialog 48, as shown in Figure 4, includes an option to require confirmation of every voice command, a user definable confirmation statement for confirmation, an option for showing the list of active commands dialog every time the VCE 12 initially loads, an option to open the Voice Recognition software's 18 recognition parameters window, an option to run an SAPI microphone variable adjustment dialog, an option to set the maker of the Voice Recognition software 18, and a listing of installed DMs 14 and means for disabling any one of the installed DMs 14. Finally, as explained above, the VCE's 12 User Interface 46 dialog, as shown in Figure 5, contains a list of active commands for which voice commands may be validly issued. The User Interface 46 dialog also has provisions to select an active command with a mouse or keyboard in order to issue the command, rather than issue it by voice. Through a pull-down menu and/or toolbar there is provided: an option to sort the list of commands alphabetically or by most used commands
first, an option to keep the dialog visible even when the dialog is not the active window, a smart locate function for the dialog, and an option to hide the window. In the smart locate function, the dialog attempts to locate itself to a portion of the screen where it does not block the user's view.
Directive Module Format
As explained above, each DM 14 is a separate data file which contains information about a respective target application. The layout of a DM 14 is shown in Figure 6. The DM 14 contains Header data, Form Template data, and Script Code data. The Header data contains information such as the creation date and version of the DM 14, data about the target application 16, such as target application name, version, target application executable file location, etc., a pointer to the beginning location of the Form Template data, and a pointer to the beginning location of the Script Code data. Within the Form Template data, information about an application's dialogs and forms is stored for use by the VBA script code, such as locations of command buttons, contents of list boxes, etc. within each individual form or dialog. Within the Script Code section is the scripting code to perform the voice commands within the target application 16. The script code is passed to the target application through standard WIN32 API calls, such as SendMessage() or PostMessage(). Additionally, the most preferred scripting language is VBA; however, JavaScript, or any other scripting language, could be implemented.
Directive Module Editor
Additionally, as shown in Figure 7, there is provided a Directive Module Editor ("DME") 100 application. The DME 100 is not needed in order to implement the function of the VCE 12, but is used as a development tool in order to edit DMs 14. The DME 100 includes the ability to load DMs 14 for the purpose of adding voice commands to, or removing voice commands from, a DM 14, together with the scripting code associated with that
command. The DME 100 preferably comprises: a command window 102 in which available commands are listed, a form template window 104 in which the form or dialog box of the target application to be controlled is shown, and a scripting code window 106. A user may then edit DMs 14 to include more complex commands or sets of commands to add more functionality to a single voice command.
While the specific embodiments have been illustrated and described, numerous modifications come to mind without significantly departing from the spirit of the invention, and the scope of protection is only limited by the scope of the accompanying Claims.