GB2371669A - Control of apparatus by artificial speech recognition - Google Patents

Control of apparatus by artificial speech recognition

Info

Publication number
GB2371669A
Authority
GB
United Kingdom
Prior art keywords
asr
control system
operator
control
command
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0116304A
Other versions
GB0116304D0 (en)
GB2371669B (en)
Inventor
Robert William Series
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
20 20 Speech Ltd
Original Assignee
20 20 Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 20 20 Speech Ltd filed Critical 20 20 Speech Ltd
Priority to GB0116304A priority Critical patent/GB2371669B/en
Publication of GB0116304D0 publication Critical patent/GB0116304D0/en
Publication of GB2371669A publication Critical patent/GB2371669A/en
Application granted granted Critical
Publication of GB2371669B publication Critical patent/GB2371669B/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A control system comprises an Automatic Speech Recognition (ASR) module (14) for processing and acting upon verbally-issued control commands, a speech generation module (20), and a detection module (12) for detecting operation of manual controls of an apparatus and applying control signals to the speech generation module, in which, upon detection of operation of a manual control to perform a specific control action, the detection module applies signals to the speech generation module to cause the speech generation module to generate a speech output that contains an utterance that would, if issued by an operator, cause the ASR module to perform the specific control action.

Description

Control of apparatus by artificial speech recognition
This invention relates to control of apparatus by artificial speech recognition. Use of automatic speech recognition (ASR) to control apparatus is becoming more widespread. For example, ASR has been employed to enable an operator to control functions of aircraft, vehicles, and industrial plant, amongst many other applications. For the most part, the intention is to make the operator's task less arduous by supplementing (yet often not supplanting) conventional manually operated controls.
As ASR-based control systems become ever more complex, so the task of learning to use them has become ever more difficult. In many cases, an operator cannot refer to written material to learn the commands of an ASR system because their close attention is required elsewhere (for example, in maintaining control of a vehicle). One problem is that an ASR system has a repertoire of phrases that an operator must learn to issue without assistance, much like a command-line interface in a computer system.
A number of approaches have been adopted with a view to assisting a user in the use of such a system. The simplest is to configure the ASR system so that many phrases lead to equivalent actions. For example, 'radio station 3', 'radio program 3', 'radio tune station 3' or 'radio 3' may all result in the same action. More complex systems include provision of various degrees of what is commonly termed natural language processing.
At its simplest, this may rely on spotting certain words or phrases in what the user says. More complex systems extract words or phrases and attempt to relate them to the desired actions, possibly with reference to the existing state of the equipment. However, the added complexity of such systems can lead to the user becoming confused, especially if he uses an idiom or word which is not in the system's vocabulary (perhaps use of the word 'wireless' in place of 'radio').
Another approach is to equip the system with additional commands that a user can issue to request help, such as 'how do I use the radio', or to include a specific help button to allow the user to ask for spoken instruction on the correct phrase. However, these strategies rely on the user explicitly requesting help and knowing that such a facility exists.
Further complications arise since the equipment may be used by large numbers of people, possibly speaking different languages. For example, a car may be privately owned and used mainly by the owner, or it may be a hire car used by many people who may speak several different languages. (For example, in Belgium the driver may speak French, Flemish, or English.)
An aim of this invention is to provide an ASR-based control system that assists an operator in the task of learning to use the system.
From a first aspect, this invention provides an ASR-based control system operative to complement manual controls in operation of apparatus, the control system comprising an ASR module for processing and acting upon verbally-issued control commands, a speech generation module, and a detection module for detecting operation of manual controls of an apparatus and for applying control signals to the speech generation module, in which, upon detection of operation of a manual control to perform a specific control action, the detection module applies signals to the speech generation module to cause the speech generation module to generate a speech output that contains an utterance that would, if issued by an operator, cause the ASR module to perform the specific control action.
Such a system tells the operator the words that should be said in order to substitute a verbal instruction for a specific manual action. This is an effective way in which to learn, because the operator is always provided with information that has direct contextual relevance to the manual action that the operator has just performed. As is well-known, it is much more effective to provide instruction that is contextually relevant than simply to provide a list of instructions to be learned by rote.
Advantageously, the system is configured to conditionally cease to generate speech output in respect of one or more control actions. For example, this may be applied once an operator has learned a repertoire of verbal commands. This ensures that the benefit of the system will not become an irritation for an experienced operator. Moreover, it is likely that an operator will learn the most commonly used verbal commands sooner than those that are used less frequently. Therefore, it is particularly advantageous to provide a system configurable such that for selected command actions (a subset of the complete repertoire) a speech output will not be generated. Most advantageously, the system is arranged such that on receipt of a specific verbal command a speech output will not be generated for one or more specific control actions.
In a development of the arrangement described in the last-preceding paragraph, the system is configured to apply an analysis to the pattern of use of manual and vocal commands issued by an operator and to generate or suppress a speech output on the basis of that analysis. The analysis may be based on heuristic rules that relate to patterns of an operator's behaviour. For example, the system may infer that specific behaviour indicates that an operator does not wish to use vocal commands in respect of a specific control action. This could, for instance, be inferred after an operator has been informed of the appropriate utterance to perform a command action a specific number of times, yet continues to use a manual control to perform the control action.
A system embodying the invention can advantageously determine the identity of a specific operator and tailor its operation accordingly. The identity may be explicitly entered by the user, or be determined by analysis of the user's voice against stored templates of known users.
In applications in which it is likely that more than one language may be spoken by any individual user, the system may change the language used by the recognition system and the prompting system to match the language being used by the user. For example, commands issued by a user may be processed by a language identification module to determine the language being spoken. Language identification may be carried out either as a separate process or combined with the ASR system. Alternatively, the system may determine the language from analysis of speech that is not explicitly spoken as a command to the system.
In order that the system does not respond to utterances that are not issued as commands, for example, utterances that are part of a conversation, it may be provided with an alerting control operable by a user to alert the system to expect a command utterance.
This alerting control may be a manually-operable control (a so-called "press-to-talk" button). It might alternatively or additionally be incorporated into the ASR function of the control system; for example, the user may be required to utter an attention keyword in advance of a command utterance.
A control system according to this aspect of the invention may be incorporated into a conveyance to control one or more of its functional components. The conveyance might be a land vehicle (water, road or rail-borne, for example) or an aircraft.
From a second aspect, this invention provides a method of operating an ASR-based control system in which an operator can perform a control action by issuing a verbal command or by operation of a manual control, the method comprising detecting a control function performed by a manual control and generating a speech output that includes an utterance that, if issued by an operator, would carry out the control action under voice control.
In such a method, the speech output is typically generated only in respect of a subset of a repertoire of command actions. The said subset may be selected in response to an operator's interaction with the control system, the intention being to avoid generating a speech output in an instance in which it is not required. For example, speech output might not be generated in respect of command actions that are inferred to be known to an operator of the system. Alternatively or additionally, speech output might not be generated in respect of command actions in response to a request by an operator. Since different operators may have different preferences, the method advantageously further includes a step of identifying an operator of the system and modifying its operation accordingly.
In order that utterances made incidentally (for example, as part of a conversation) are not acted upon, in a method embodying the invention an utterance may be interpreted as being a command only if it is made after or during the operation of an alerting control. In an enhancement, a method embodying the invention may include identification of a case in which an operator has unsuccessfully attempted to issue a voice command, and generating a speech output containing an utterance that, if spoken by the operator, would issue the command. To help in distinguishing between a failed attempt to use a command and an incidental utterance, only an utterance that is not recognised as a command, made after or during operation of an alerting control, may be considered to be indicative of an unsuccessful attempt to issue a voice command.
Embodiments of this aspect of the invention may advantageously be applied to control of one or more functional systems in a conveyance, such as a land vehicle or an aircraft.
An embodiment of the invention will now be described in detail, by way of example and with reference to the accompanying drawings, in which:
Figure 1 is a schematic diagram of a system embodying the invention; and
Figure 2 shows diagrammatically the interior of a motor car that incorporates a control system embodying the invention.
With reference first to Figure 1, a control system embodying the invention comprises a control module 10. The control module 10 is configured to control operation of objects (not shown). For example, where the control system is installed in a vehicle, the controlled objects may include such equipment as windscreen wipers, lights, an electronic gearbox controller, and so forth. Where the control system is installed in an industrial plant, the controlled objects may include such items as valves, pumps, conveyors and so forth. The control system might also be installed to control a piece of computer software, in which case, the controlled objects might be program functions or configurations.
The internal nature of the control module 10 is not limited by the invention. For example, the control module might be a software module executing on a computer system. It might equally well be a hardware system incorporating components such as relays and solid-state switches.
The control module 10 acts to modify operation of controlled apparatus based upon input signals. These input signals can be received from two sources: a manual input module 12 and an automatic speech recognition (ASR) module 14.
The ASR module 14 receives audio input signals from an input system 16 that will typically include a microphone for detecting vocal utterances of an operator. The audio
input signals are analysed and classified by the ASR module 14 to determine whether they include any of a repertoire of vocal commands that can be issued by an operator in order to perform a control function. Internally, the ASR module 14 may be configured in a manner most suitable for the task in hand. At the present time, this is most likely to include a recognition engine based on a hidden Markov model, but this may not be the case, for example if more appropriate recognition technologies become available in future. The ASR system may optionally include a press-to-talk switch 18 which the user may press to indicate the start of a voice command, or hold depressed for the full duration of the command.
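By way of illustration only, the behaviour of the ASR module just described might be sketched along the following lines. This is a minimal sketch: the class name, method names and the phrase-to-action mapping are assumptions introduced for the example, and a real recogniser would classify audio (for instance with an HMM-based engine) rather than match already-transcribed text.

```python
# Minimal, illustrative sketch of the ASR module described above.
# Class and method names are assumptions, not taken from the patent.

class ASRModule:
    def __init__(self, repertoire, control_module, press_to_talk=None):
        # repertoire maps a command phrase to a control action identifier,
        # e.g. {"wipers on": "WIPERS_ON", "window down": "WINDOW_DOWN"}
        self.repertoire = repertoire
        self.control_module = control_module
        self.press_to_talk = press_to_talk  # optional switch with an is_pressed() method

    def on_utterance(self, text):
        """Classify a transcribed utterance and forward any matching control action."""
        if self.press_to_talk is not None and not self.press_to_talk.is_pressed():
            return None  # ignore speech not preceded by the alerting control
        action = self.repertoire.get(text.strip().lower())
        if action is not None:
            self.control_module.perform(action)  # carry out the associated control action
        return action
```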
When the ASR module 14 recognises a command within an utterance of an operator, it sends control signals to the control module 10 instructing the control module 10 to carry out control functions to implement the control action associated with the recognised command. The ASR module 14 is further configured to recognise control utterances that modify operation of the control system itself. Upon recognition of such utterances, control signals may be sent to other components of the system, in particular a speech generation stage 20 that will be described below.
The manual input module 12 provides manual controls with which an operator can interact manually in order to control operation of the controlled apparatus. These controls might be embodied in hardware, for example, as switches, pushbuttons and so forth; or they could be virtual controls appearing in a computer graphical user interface, amongst many other possibilities.
The manual input module 12 operates by interpreting an operator's interaction with the controls, and sending control signals to the control module 10 instructing it to carry out control actions associated with the operator's actions. In addition, the manual input module 12 sends control signals that identify the command action that has been performed to a speech generation module 20.
The internal structure of the speech generation module can take many forms. It might contain recordings of utterances corresponding to the command actions, but this is likely to be applicable only to systems that have a relatively small number of commands in their repertoire. More typically, the speech generation module will include a speech
synthesiser that can generate essentially arbitrary utterances, or a system that can concatenate appropriate pre-recorded utterances. In any case, signals generated by the speech generation module 20 are reproduced by a reproduction system 22 that will typically include a loudspeaker in order that an operator can hear them.
Upon receipt of the control signals from the manual input module 12, the speech generation module 20 identifies the command function associated with the signals, and then applies rules to determine what action it should take. For example, the signal received from the manual input module 12 may include a numerical identifier that can be used to index a look-up table contained within a memory unit of the speech generation module, the look-up table containing encoded speech that can be reproduced by a speech synthesiser.
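As an illustration of that look-up step, a minimal sketch follows; the numerical identifiers, the phrases and the speak() interface are assumptions made for the example, and a real implementation would hold encoded speech rather than plain text.

```python
# Illustrative sketch of the look-up table in the speech generation module.
# Identifiers, phrases and the synthesiser interface are assumptions.

PROMPT_TABLE = {
    1: "wipers on",     # identifier sent by the manual input module for the wiper control
    2: "lights on",
    3: "window down",
}

class SpeechGenerationModule:
    def __init__(self, synthesiser, prompt_table=PROMPT_TABLE):
        self.synthesiser = synthesiser    # anything exposing a speak(text) method
        self.prompt_table = prompt_table

    def on_manual_action(self, action_id):
        """Look up the utterance associated with a manual action and reproduce it."""
        phrase = self.prompt_table.get(action_id)
        if phrase is not None:
            self.synthesiser.speak(phrase)  # reproduced via the loudspeaker
```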
In a most basic mode of operation, the action taken is to generate a speech output that corresponds to an utterance that could be issued by an operator verbally in order to have performed the control action. Thus, every time the operator uses a manual control, the system issues an audible prompt to inform the operator of the utterance that would have effected the control action under voice control.
In a more sophisticated mode of operation, the action taken by the speech generation module 20 is first to determine whether any audible output should be issued. In this embodiment, the speech generation module 20 associates a flag indicator with each command function in the repertoire. If the flag has an affirmative value, then an audible output is issued. If the flag has a negative value, then no audible output is issued.
The value of the flag can be determined by various rules. The following is an example of one such set of rules.
All flags may initially be set to the affirmative value. Then, the value can be changed in response to a variety of events:
* The operator may issue a control utterance promptly after operating the manual control to indicate that a verbal prompt is no longer required for the associated control action. The control utterance is detected by the ASR module 14, which then sends a suitable signal to the speech generation module 20. The speech generation module 20 sets the control flag for the command action to the negative value.
* The speech generation module 20 may maintain a count of the number of times that a manual control is used. After a threshold value is reached, an assumption is made that the operator prefers to use the manual control for the control action concerned, and that no further prompts are required. The speech generation module 20 then sets the control flag for the command action to the negative value.
* In the event that the ASR module 14 determines that an operator has made a verbal utterance that is not recognised as a verbal command within the repertoire and subsequently uses a manual command to perform a control action, it is assumed that the operator has mistakenly issued an incorrect verbal command. (Optionally, such an assumption may not be made until this has happened several times, or until the user has pressed the press-to-talk switch 18 and no valid command has been spoken or detected.) The speech generation module 20 then sets the control flag for the command action to the affirmative value in order that, the next time the operator uses the manual control, the operator is prompted with the correct verbal utterance.
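The rules just listed might be encoded roughly as follows. This is a sketch only, assuming a hypothetical PromptPolicy class; the threshold value mirrors the figure of five prompts given as an example later in the description.

```python
# Sketch of the per-command prompt flags and the example rules listed above.
# Class name, method names and the threshold value are illustrative assumptions.

MANUAL_USE_THRESHOLD = 5  # after this many prompted manual uses, stop prompting

class PromptPolicy:
    def __init__(self, action_ids):
        self.prompt_enabled = {a: True for a in action_ids}  # all flags start affirmative
        self.manual_count = {a: 0 for a in action_ids}
        self.last_utterance_unrecognised = False

    def on_suppress_command(self, action_id):
        # Operator issued the control utterance (e.g. "manual") after the manual action.
        self.prompt_enabled[action_id] = False

    def on_unrecognised_utterance(self):
        # The ASR module heard something command-like that matched nothing in the repertoire.
        self.last_utterance_unrecognised = True

    def on_manual_action(self, action_id):
        """Return True if an audible prompt should be issued for this manual action."""
        self.manual_count[action_id] += 1
        if self.last_utterance_unrecognised:
            # Assume a failed voice command: restore the flag so the correct phrase is given.
            self.prompt_enabled[action_id] = True
            self.last_utterance_unrecognised = False
        elif self.manual_count[action_id] > MANUAL_USE_THRESHOLD:
            # The operator evidently prefers the manual control for this action.
            self.prompt_enabled[action_id] = False
        return self.prompt_enabled[action_id]
```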
Whether or not a particular audible prompt is given is a decision that must be taken with knowledge of the identity and preferences of the operator concerned. Therefore, it is preferable that the system can identify the operator in some manner. For example, the system may prompt an operator to identify himself or herself vocally by saying their name. Alternatively, the system may identify an operator by identifying a token used by the operator such as, for example, a car key. Alternatively, the system may identify the user by comparison of his or her voice with appropriately stored templates or models. In such an implementation, the system may store models or templates of new speakers for future use. A system may use a combination of any of the above-described means, or yet further means for identifying a user.
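One way of combining those identification routes might look like the following; the function name, the profile structure and the match_voice helper are hypothetical, introduced purely to illustrate the fallback order.

```python
# Illustrative sketch of operator identification using a token, a voice match,
# and enrolment of unknown speakers. All names and structures are assumptions.

def identify_operator(token_id, voice_sample, profiles, match_voice):
    """Return an operator profile: prefer a token, then a voice match, else enrol."""
    if token_id is not None and token_id in profiles:
        return profiles[token_id]                   # e.g. a coded car key
    if voice_sample is not None:
        best = match_voice(voice_sample, profiles)  # compare against stored templates
        if best is not None:
            return best
    new_profile = {"name": None, "prompt_flags": {}}  # unknown speaker: store for future use
    profiles["speaker-%d" % (len(profiles) + 1)] = new_profile
    return new_profile
```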
In addition to determining the identity of the speaker, the system may determine the preferred language of the user. This may be through a manual setting, or by analysis of the user's speech. For example, the ASR module 14 may be configured to recognise both
French and English phrases. When a valid command is recognised, the ASR module 14 outputs both control information to the control module 10 to specify the desired action and control information to the speech generation module 20 to indicate the language to be used for future prompts. Alternatively, the language detection may be carried out by a separate language detection module.
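A sketch of how that might work is given below; the bilingual repertoire (including the French phrases), the action identifiers and the set_prompt_language interface are all assumptions made for illustration.

```python
# Illustrative sketch: recognising a bilingual repertoire and switching the
# language used for future prompts. Phrases and interfaces are assumptions.

REPERTOIRE = {
    "wipers on":               ("WIPERS_ON", "en"),
    "essuie-glaces en marche": ("WIPERS_ON", "fr"),    # same action, French phrase
    "window down":             ("WINDOW_DOWN", "en"),
    "baisser la vitre":        ("WINDOW_DOWN", "fr"),
}

def handle_command(text, control_module, speech_generation_module):
    """Perform the recognised action and record the language for future prompts."""
    entry = REPERTOIRE.get(text.strip().lower())
    if entry is None:
        return False
    action, language = entry
    control_module.perform(action)                           # control information to module 10
    speech_generation_module.set_prompt_language(language)   # control information to module 20
    return True
```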
Figure 2 illustrates a more specific example of the system described above, as applied to a motor car.
A control system embodying the invention might be applied to control operation of many devices in a car, including, but not limited to, lights, windscreen wipers, electric window or mirror adjustment, and automatic gearbox control.
Within the car, a set of conventional controls is provided including a control for windscreen wipers 30, a light switch 32, mirror adjustment 34, window controls 36 and an automatic gearbox control 38. Each of these controls incorporates electrical switches that can open or close input circuits to an electrical system unit 40.
In this embodiment, the various modules described with reference to Figure 1 are incorporated into the system unit 40 that can be placed in a convenient location within the vehicle. The manual input module 12 receives signals from the manual controls 30 to 36. The control module 10 within the system unit 40 acts to control the various pieces of equipment within the car in response to signals it receives from the manual input module 12.
A microphone 42 is provided within the passenger cabin of the car, positioned to receive utterances made by the driver of the vehicle. The microphone 42 is connected to the system unit 40 such that signals from the microphone 42 are fed as an input to the ASR module 14 within the unit 40. As described above, the ASR module 14 can apply control signals to the control module 10.
Within the vehicle cabin is a loudspeaker 44. This may be a component of an in-car entertainment installation, or may be a loudspeaker dedicated for use with the control system. The loudspeaker reproduces signals generated by the speech generation module 20 within the system unit 40.
Consider now how a driver may interact with the vehicle. An experienced driver can control equipment on the vehicle by issuing verbal commands that are picked up by the microphone 42. For example, to open a window, the driver can say "open driver's side window", and the control system will respond accordingly.
On the other hand, a driver that is a novice operator of the car can use the manual controls with which they will be familiar. Now, say that the driver switches on the windscreen wipers to normal speed using the manual control 30. This action is detected by the manual input module 12, which sends a signal to the control module 10, which in turn switches on the windscreen wipers.
The manual input module 12 also sends a signal to the speech generation module 20 that identifies the manual action that the driver has taken. The first step is to check whether the particular action is flagged for audible prompting. If it is, the speech generation module looks up and reproduces the encoded speech that corresponds to that action. For example, if the verbal instruction would have been "wipers on", that is the phrase that will be produced by the speech generation module.
Now, suppose that the driver prefers to use manual control of the automatic gearbox. The driver moves the gearbox control 38, and the control system responds by reproducing the utterance "gearbox in drive". The driver then immediately says a control word, for example "manual". The ASR module 14 sends a control signal to the speech generation module 20 that identifies the manual command action that was performed, and that instructs the speech generation module to flag the command as being exempt from verbal prompting. There may optionally be a further control word, for example "prompt", that an operator can issue to turn voice prompting back on for a particular control action.
It may be that the driver does not know the control word necessary to suppress verbal prompts, yet wishes, for example, to operate the lights only by manual control. Rather than continuing to issue voice prompts indefinitely, the speech generation module 20 will only issue the prompt a predetermined number of times (e.g. five). After that, the command action will be flagged as being exempt from verbal prompting.
The ASR module may detect that the driver has made an utterance that has the general form of a command, for example a short utterance, yet this is not recognised as an actual command. If this is followed by a manual command action then it may be inferred that the operator has attempted to use a vocal command unsuccessfully. For example, the driver may say "open window" instead of the proper utterance "window down" and then use the manual control 36 to operate the window. This will present a problem if the flag for the command action in the speech generation module is set to the negative value, since the driver will not be prompted with the correct command.
If this is the case, the ASR module will send a control signal to the speech generation module (which will also receive a signal from the manual control module). In response to this, the speech generation module will set the flag for the command action to affirmative, and issue a prompt.
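Tying this back to the earlier sketches, the scenario just described might play out roughly as follows; the action identifier and the behaviour of the hypothetical PromptPolicy class are assumptions carried over from the sketch above.

```python
# Hypothetical walk-through of the "open window" scenario, reusing the
# PromptPolicy sketch introduced earlier. Purely illustrative.

policy = PromptPolicy(action_ids=["WINDOW_DOWN"])
policy.on_suppress_command("WINDOW_DOWN")    # prompting was previously turned off
policy.on_unrecognised_utterance()           # driver says "open window": not in the repertoire
if policy.on_manual_action("WINDOW_DOWN"):   # driver then uses the manual window control 36
    print("window down")                     # flag restored, so the correct phrase is prompted
```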

Claims (29)

  1. An ASR-based control system operative to complement manual controls in operation of apparatus, the control system comprising an ASR module for processing and acting upon verbally-issued control commands, a speech generation module, and a detection module for detecting operation of manual controls of an apparatus and for applying control signals to the speech generation module, in which, upon detection of operation of a manual control to perform a specific control action, the detection module applies signals to the speech generation module to cause the speech generation module to generate a speech output that contains an utterance that would, if issued by an operator, cause the ASR module to perform the specific control action.
  2. An ASR-based control system according to claim 1 configured to conditionally cease to generate speech output in respect of one or more control actions.
  3. An ASR-based control system according to claim 2 configurable such that, for a subset of a complete repertoire of control actions, a speech output will not be generated.
  4. An ASR-based control system according to claim 1 or claim 2 arranged such that on receipt of a specific verbal command the system is configured such that a speech output will not be generated for one or more specific control action.
  5. An ASR-based control system according to any one of claims 2 to 4 configured to apply an analysis to the pattern of use of manual and vocal commands issued by an operator and to generate or suppress a speech output on the basis of that analysis.
  6. An ASR-based control system according to claim 5 in which the analysis may be based on heuristic rules that relate to patterns of an operator's behaviour.
  7. An ASR-based control system according to claim 6 which system infers that specific behaviour may indicate that an operator does not wish to issue a vocal command in respect of a specific control action.
  8. An ASR-based control system according to claim 7 in which a speech output for a given control action is generated only a predetermined maximum number of times.
  9. An ASR-based control system according to any preceding claim operative to determine the identity of a specific operator and tailor its operation accordingly.
  10. An ASR-based control system according to any preceding claim operative to determine the language being used by the operator and tailor its operation accordingly.
  11. An ASR-based control system according to any preceding claim comprising an alerting control operable by a user to alert the system to expect a command utterance.
  12. An ASR-based control system according to claim 11 in which the alerting control is a manually-operable control.
  13. An ASR-based control system according to claim 11 or claim 12 in which the alerting control is incorporated into an ASR function of the system.
  14. An ASR-based control system operative to complement manual controls in operation of apparatus substantially as herein described with reference to the accompanying drawings.
  15. An ASR-based control system according to any preceding claim installed in a conveyance for controlling operation of one or more functional components of the conveyance.
  16. An ASR-based control system according to claim 15 in which the conveyance is a land vehicle.
  17. An ASR-based control system according to claim 15 in which the conveyance is an aircraft.
  18. A method of operating an ASR-based control system in which an operator can perform a control action by issuing a verbal command or by operation of a manual control, the method comprising detecting a control action performed by a manually operated control and generating a speech output that includes an utterance that, if issued by an operator, would carry out the control action under voice control.
  19. A method according to claim 18 in which the speech output is generated only in respect of a subset of a repertoire of command actions.
  20. A method according to claim 19 in which the said subset is selected in response to an operator's interaction with the control system.
  21. A method according to claim 20 in which speech output is not generated in respect of command actions that can be inferred to be known to an operator of the system.
  22. A method according to claim 20 or claim 21 in which speech output is not generated in respect of command actions in response to a request by an operator.
  23. A method according to any one of claims 18 to 22 further including a step of identifying an operator of the system and modifying its operation accordingly.
  24. A method according to any one of claims 18 to 23 in which an utterance is interpreted as being a command only if it is made after or during the operation of an alerting control.
  25. A method according to any one of claims 18 to 24 including determining whether an operator has unsuccessfully attempted to issue a voice command, and generating a speech output containing an utterance that, if spoken by the operator, would issue the command.
  26. A method according to claim 25 in which an utterance being made after operation of an alerting control that is not recognised as a command is indicative of an unsuccessful attempt to issue a voice command.
  27. A method of operating an ASR-based control system substantially as herein described with reference to the accompanying drawings.
  28. A method of controlling one or more functional systems in a conveyance according to any one of claims 18 to 27.
  29. A method according to claim 28 in which the conveyance is a land vehicle or an aircraft.
Amendments to the claims have been filed as follows
    1. An ASR-based control system operative to complement manual controls in operation of apparatus, the control system comprising an ASR module for processing and acting upon verbally-issued control commands, a speech generation module, and a detection module for detecting operation of manual controls of an apparatus and for applying control signals to the speech generation module, in which, upon detection of operation of a manual control to perform a specific control action, the detection module applies signals to the speech generation module to cause the speech generation module to generate a speech output that contains an utterance that would, if issued by an operator, cause the ASR module to perform the specific control action.
    2. An ASR-based control system according to claim 1 configured to conditionally cease to generate speech output in respect of one or more control actions.
    3. An ASR-based control system according to claim 2 configurable such that, for a subset of a complete repertoire of control actions, a speech output will not be generated.
    4. An ASR-based control system according to claim 1 or claim 2 arranged such that on receipt of a specific verbal command the system is configured such that a speech output will not be generated for one or more specific control action.
    5. An ASR-based control system according to any one of claims 2 to 4 configured to apply an analysis to the pattern of use of manual and vocal commands issued by an operator and to generate or suppress a speech output on the basis of that analysis.
    6. An ASR-based control system according to claim 5 in which the analysis is based on heuristic rules that relate to patterns of an operator's behaviour.
    7. An ASR-based control system according to claim 6 which system infers that specific behaviour indicates that an operator does not wish to issue a vocal command in respect of a specific control action.
    8. An ASR-based control system according to claim 7 in which a speech output for a given control action is generated only a predetermined maximum number of times.
    9. An ASR-based control system according to any preceding claim operative to determine the identity of a specific operator and tailor its operation accordingly.
    10. An ASR-based control system according to any preceding claim operative to determine the language being used by the operator and tailor its operation accordingly.
    11. An ASR-based control system according to any preceding claim comprising an alerting control operable by a user to alert the system to expect a command utterance.
    12. An ASR-based control system according to claim 11 in which the alerting control is a manually-operable control.
    13. An ASR-based control system according to claim 11 or claim 12 in which the alerting control is incorporated into an ASR function of the system.
    14. An ASR-based control system operative to complement manual controls in operation of apparatus substantially as herein described with reference to the accompanying drawings.
    15. An ASR based control system according to any preceding claim installed in a conveyance for controlling operation of one or more functional components of the conveyance.
    16. An ASR based control system according to claim 15 in which the conveyance is a land vehicle.
    17. An ASR based control system according to claim 15 in which the conveyance is an aircraft.
    18. A method of operating an ASR-based control system in which an operator can perform a control action by issuing a verbal command or by operation of a manual control, the method comprising detecting a control action performed by a manually operated control and generating a speech output that includes an utterance that, if issued by an operator, would carry out the control action under voice control.
    19. A method according to claim 18 in which the speech output is generated only in respect of a subset of a repertoire of command actions.
    20. A method according to claim 19 in which the said subset is selected in response to an operator's interaction with the control system.
    21. A method according to claim 20 in which speech output is not generated in respect of command actions that can be inferred to be known to an operator of the system.
    22. A method according to claim 20 or claim 21 in which speech output is not generated in respect of command actions in response to a request by an operator.
    23. A method according to any one of claims 18 to 22 further including a step of identifying an operator of the system and modifying its operation accordingly.
    24. A method according to any one of claims 18 to 23 in which an utterance is interpreted as being a command only if it is made after or during the operation of an alerting control.
    25. A method according to any one of claims 18 to 24 including determining whether an operator has unsuccessfully attempted to issue a voice command, and generating a speech output containing an utterance that, if spoken by the operator, would issue the command.
    26. A method according to claim 25 in which an utterance being made after operation of an alerting control that is not recognised as a command is indicative of an unsuccessful attempt to issue a voice command.
    27. A method of operating an ASR-based control system substantially as herein described with reference to the accompanying drawings.
    28. A method of controlling one or more functional systems in a conveyance according to any one of claims 18 to 27.
GB0116304A 2001-07-03 2001-07-03 Control of apparatus by artificial speech recognition Expired - Fee Related GB2371669B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0116304A GB2371669B (en) 2001-07-03 2001-07-03 Control of apparatus by artificial speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0116304A GB2371669B (en) 2001-07-03 2001-07-03 Control of apparatus by artificial speech recognition

Publications (3)

Publication Number Publication Date
GB0116304D0 GB0116304D0 (en) 2001-08-29
GB2371669A true GB2371669A (en) 2002-07-31
GB2371669B GB2371669B (en) 2002-12-11

Family

ID=9917883

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0116304A Expired - Fee Related GB2371669B (en) 2001-07-03 2001-07-03 Control of apparatus by artificial speech recognition

Country Status (1)

Country Link
GB (1) GB2371669B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4520576A (en) * 1983-09-06 1985-06-04 Whirlpool Corporation Conversational voice command control system for home appliance
WO2000029936A1 (en) * 1998-11-12 2000-05-25 Microsoft Corporation Speech recognition system with changing grammars and grammar help command

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2497603B (en) * 2012-03-23 2014-08-20 Jaguar Land Rover Ltd Windscreen clearing system for a vehicle
US9290160B2 (en) 2012-03-23 2016-03-22 Jaguar Land Rover Limited Windscreen clearing system for a vehicle
DE102014214428A1 (en) * 2014-07-23 2016-01-28 Bayerische Motoren Werke Aktiengesellschaft Improvement of speech recognition in a vehicle
CN106104676A (en) * 2014-07-23 2016-11-09 宝马股份公司 The improvement of the speech recognition in vehicle
US20220355664A1 (en) * 2021-05-04 2022-11-10 Hyundai Motor Company Vehicle having voice recognition system and method of controlling the same
DE102023125228A1 (en) * 2023-09-18 2025-03-20 Bayerische Motoren Werke Aktiengesellschaft Voice-controlled dialogue system for a motor vehicle

Also Published As

Publication number Publication date
GB0116304D0 (en) 2001-08-29
GB2371669B (en) 2002-12-11

Similar Documents

Publication Publication Date Title
US6839670B1 (en) Process for automatic control of one or more devices by voice commands or by real-time voice dialog and apparatus for carrying out this process
US6298324B1 (en) Speech recognition system with changing grammars and grammar help command
EP1562180B1 (en) Speech dialogue system and method for controlling an electronic device
US4797924A (en) Vehicle voice recognition method and apparatus
US6587824B1 (en) Selective speaker adaptation for an in-vehicle speech recognition system
US8688451B2 (en) Distinguishing out-of-vocabulary speech from in-vocabulary speech
US7761204B2 (en) Multi-modal data input
CN206595039U (en) A kind of interactive system for vehicle-mounted voice
US20070150287A1 (en) Method for driving a dialog system
US7991618B2 (en) Method and device for outputting information and/or status messages, using speech
US20070073543A1 (en) Supported method for speech dialogue used to operate vehicle functions
EP1768103B1 (en) Device in which selection is activated by voice and method in which selection is activated by voice
CN113228167B (en) Voice control method and device
CN111354359A (en) Vehicle voice control method, device, equipment, system and medium
JPH11119792A (en) Device control device with voice recognition function and voice recognition device
JPH11126092A (en) Voice recognition device and vehicle voice recognition device
JP2018116130A (en) In-vehicle voice processing unit and in-vehicle voice processing method
KR20220073513A (en) Dialogue system, vehicle and method for controlling dialogue system
CN113879235A (en) Method, system, equipment and storage medium for multi-screen control of automobile
GB2371669A (en) Control of apparatus by artificial speech recognition
CN114516341A (en) User interaction method and system and vehicle
JPH0673800U (en) Voice recognizer
US12236947B2 (en) Flexible-format voice command
JP2004354722A (en) Voice recognition device
JP2004301875A (en) Voice recognition device

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20100703