GB2149155A

GB2149155A - A method of communicating with a computer, and a user-operated input device for a computer

Info

Publication number: GB2149155A
Application number: GB08329413A
Authority: GB
Inventors: Brian Fine
Original assignee: Fine & Curtis Ltd
Current assignee: Fine & Curtis Ltd
Priority date: 1983-11-03
Filing date: 1983-11-03
Publication date: 1985-06-05

Abstract

The items on a menu 12 of computer inputs 14 available for selection by the user are indicated by the input device one by one by a continuously repeated series of corresponding outputs indicated on a speech generation unit or a VDU, for example, controlled by a control unit. In the Figure the computer is controlling the state of a machine. The available instructions are "ON" and "OFF". When the user says "ON" or "OFF" into a microphone in chorus with the device, as at times 16 and 18, the corresponding input is applied to the computer, which turns the machine on or off. If the user says a word other than one of the words in the menu, no input is applied to the computer. At time 20 the user says "STOP", but the computer does not switch the machine off. This discrimination is effected by an analyser analysing the output of the microphone and providing signals to the control unit.

Description

SPECIFICATION

A method of communicating with a computer and a user-operated input device for a computer

The present invention relates to a method of communicating with a computer of the type in which an output device indicates to a user a set of inputs to the computer available for selection by the user. The invention also relates to a user operated input device for a computer, operating according to the method of the invention.

A computer system using an input method of this type is known as a menu driven system, and the indicated set of inputs available for selection is known as a menu.

Computer systems commonly use a keyboard to allow users to enter data and instructions. The use of a keyboard is convenient for many, but not all tasks and users. For instance, a user may be unable to operate a full keyboard because of a handicap, or may have both hands occupied performing another task, or may be away from the keyboard.

Methods and apparatus for speech recognition have been developed to enable users to communicate with a computer. The sound of the user's voice is detected and analysed by breaking it down into units of sound, called phonemes, which are the same as, or recognisably close to, reference phonemes with which the apparatus has been preprogrammed. For instance, the set of phonemes will include vowel sounds, consonant sounds and the sound of letter groups such as "th", "oi" etc. Further analysis then takes place to link sets of phonemes together to form words.

It has been found to be extremely difficult for a machine to detect the end of one word and the beginning of the next, and even more difficult for a machine to break a sound into its component phonemes. These problems have been overcome with some success by using large, fast computers to perform the analysis, according to complicated algorithms.

Without the use of such computers recognition is often unreliable, because ambiguities occur, or sounds at variance with the reference phonemes become unrecognisable. However, apparatus based on large computers is far too expensive for use in many situations in which a speech recognition facility would be advantageous, for instance in the home of a disabled person.

Considerable effort has been spent in attempting to produce a speech recognition system which can recognize general speech, that is a large vocabulary of words spoken by any speaker. However, there are many situations in which recognition of only a small vocabulary is necessary, and in which recognition must be reliable but need only be crude.

That is, the apparatus must be able to recognize a word which is not identical to, but which, to the human ear, is clearly intended to be a particular word from the small vocabulary. A need exists for a method and apparatus which can be used to communicate with a computer in these situations.

The present invention provides a method of communicating with a computer, of the type in which an output device indicates to a user a set of inputs to the computer available for selection by the user, wherein the available inputs are indicated by a continuously repeated series of corresponding outputs, and an input corresponding to the contemporaneous output is applied to the computer upon detection of a predetermined sound made by the user.

The expression "predetermined sound" is used for convenience to mean a sound which complies with predetermined criteria which may be one or more of, for example, a predetermined minimum volume, presence of a predetermined frequency, presence of a predetermined phoneme, and so on.

The invention further provides a user operated input device for a computer, comprising control means operable to control an output device to indicate to the user a set of inputs to the computer available for selection by the user, in a continuously repeated series, a microphone for detecting sounds made by the user, and an analyser analysing signals from the microphone and supplying an indicating signal to the control means when a predetermined sound is detected, wherein, on receipt of the said indicating signal, the control means applies to the computer an input corresponding to the contemporaneous output of the output device.

Preferably, the available inputs are indicated one by one, and for each available input there is a corresponding, predetermined sound which is a word.

Since the invention relies on the sense of rhythm of the user to enable him to speak in chorus with the computer, thereby selecting the contemporaneously indicated input, many of the problems of speech recognition disappear. For example, computing power is not needed to detect the beginning of a word, because the beginning of an output on the output device can be taken as the beginning of a spoken word (if any). Furthermore, reliance on the user's sense of rhythm means that an input can be selected using a word pronounced in a wide variety of ways, as will be seen from the following description.

The method according to the invention and one embodiment of an input device according to the invention will be described by way of example with reference to the accompanying drawings in which: Figs. 1 and 2 are timing diagrams indicating a sequence of events when the method is in use, and Fig. 3 is a block diagram of an embodiment of the input device of the invention.

Fig. 1 shows a timing diagram of a sequence of operations when the method of the invention is being used to communicate a series of the simple instructions "ON" and "OFF" to a computer, which is being used to control a machine. For reference, a time axis 10 is shown extending horizontally across Fig. 1. Sequences of operations and actions are shown below the axis 10, and vertical alignment, in the figure, of events in those sequences indicates the simultaneous occurrence of those events.

The menu 12 of available instructions comprises, in this example, the instructions "ON" and "OFF" which are spoken by the input device one by one, in a continuously repeated series indicated by the line of boxes 14 containing the instructions. Speech of the user during the period shown comprises the words "ON", "OFF" and "STOP" spoken at the times indicated on the time axis 10 by the numerals 16, 18 and 20 respectively.

When a word spoken by the user is detected, and found to be recognisable as corresponding to the menu item 14 which is contemporaneously being spoken by the input device, the input corresponding to that menu item 14, in this case the instruction "ON" or "OFF", is applied to the computer, as indicated at 22. That is, at time 16, the user speaks the word "ON" in time with the input device, and so the instruction "ON" is applied to the computer. The computer accordingly switches on, at time 16, the machine which it is controlling, and which is initially "OFF".

The state of the machine is indicated at 24.

Similarly, at the later time 18 the user speaks the word "OFF" in time with the input device, the instruction "OFF" is applied to the computer and the machine is switched off.

If the user speaks a word other than the predetermined word, that is, in this example, if the user speaks a word which is not recognisable as the contemporaneous menu item being spoken by the input device, no instruction is applied to the computer, and the state of the machine is not changed. This is indicated at time 20, when the speaker says "STOP" while the input device is saying "OFF". The machine remains in its "ON" state.
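
The selection rule described above for Fig. 1 can be sketched as a short simulation. This is only an illustrative sketch: the names `Utterance` and `run_menu`, the division of time into numbered output slots, and the slot numbers used in the example are assumptions made for illustration and do not appear in the specification. The word attributed to the user is taken to be whatever an analyser has already recognised.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    slot: int   # index of the output slot during which the user spoke (assumed)
    word: str   # word recognised by a hypothetical analyser


def run_menu(menu, utterances, n_slots, initial_state="OFF"):
    """Cycle through `menu` for `n_slots` slots.

    Returns the machine state recorded at the end of each slot.
    """
    spoken = {u.slot: u.word for u in utterances}
    state = initial_state
    history = []
    for slot in range(n_slots):
        current = menu[slot % len(menu)]   # item the device is outputting now
        if spoken.get(slot) == current:    # user spoke in chorus with the device
            state = current                # corresponding input is applied
        history.append(state)              # any other word leaves the state alone
    return history


# Assumed timings: "ON" in slot 0, "STOP" in slot 1, "OFF" in slot 3.
history = run_menu(
    ["ON", "OFF"],
    [Utterance(0, "ON"), Utterance(1, "STOP"), Utterance(3, "OFF")],
    n_slots=4,
)
print(history)
```

In this sketch the word "STOP", spoken while the device is saying "OFF", changes nothing, which mirrors the behaviour described for time 20.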

Fig. 2 indicates how the method of the invention can be used to communicate a more complicated instruction. Vertical alignment in Fig. 2 does not indicate the coincidence of two events. In this example, the computer to which the instructions are communicated controls several types of apparatus in several different rooms in an office block. Each time an instruction is communicated to the computer, in the manner described above, the input device begins to speak a new menu.

That is, the input device is shown initially speaking the menu 26. Each time an instruction is received, as discussed below and indicated in the drawing in the lines labelled "USER", the input device changes from speaking one menu to speaking the menu shown next below in Fig. 2. In this way, a composite instruction, made up of one instruction from each menu, can be communicated.

In this case, the first menu 26 merely contains the words "ON" and "OFF" which indicate to the user that the input device can be activated to communicate a new instruction by the user speaking the word "ON" at the appropriate time. On receipt of the instruction "ON" the input device begins to speak the second menu 28, which comprises the names of the rooms in which the apparatus controlled by the computer is installed.

Apparatus is installed in the technical department, the administration department, the post room and the security room, indicated in the menu 28 by "TECH", "ADMIN", "POST" and "SECURITY" respectively. The apparatus in the post room comprises lighting ("LIGHT"), ventilation ("VENT"), audio equipment ("AUDIO"), video equipment ("VIDEO") and heating ("HEAT"). The menu 30 enables the user to select one of these pieces of apparatus. The heating is selected in Fig. 2.

The menu 32 then allows the user to instruct the heating to be turned up ("WARMER") or down ("COOLER") or switched on ("START") or off ("STOP"). Thus, it can be seen from Fig. 2, and from an understanding of Fig. 1 and its associated description, that in the sequence of events of Fig. 2, the user instructs the computer to turn up the temperature of the heating in the post room.

Preferably, every menu includes the word "OFF", which can be selected to abort a partially entered instruction. The input device then reverts to speaking the menu 26. When a complete instruction has been entered, after the menu 32, the input device reverts to speaking the "ON" and "OFF" menu, shown at 34.
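
The progression through the menus 26 to 32 can be sketched as a small state machine. The sketch below is a simplification under stated assumptions: timing is ignored, so each entry in `selections` stands for a word already accepted as spoken in time with the device; a single fixed chain of menus is used, although the specification lists the apparatus of the post room only; and the names `MENUS` and `enter_instruction` are hypothetical.

```python
# Menus taken from the description of Fig. 2; "OFF" is added to every menu,
# as the text says is preferable, so that a partial instruction can be aborted.
MENUS = [
    ["ON", "OFF"],                                        # menu 26
    ["TECH", "ADMIN", "POST", "SECURITY", "OFF"],         # menu 28
    ["LIGHT", "VENT", "AUDIO", "VIDEO", "HEAT", "OFF"],   # menu 30 (post room)
    ["WARMER", "COOLER", "START", "STOP", "OFF"],         # menu 32
]


def enter_instruction(selections, menus=MENUS):
    """Return the first complete composite instruction, or None if none is completed."""
    level = 0      # index of the menu currently being spoken
    chosen = []    # items selected so far from menus 28 onwards
    for word in selections:
        if word not in menus[level]:
            continue                 # not an item of the current menu: ignored
        if word == "OFF":
            level, chosen = 0, []    # abort and revert to the first menu
            continue
        if level > 0:
            chosen.append(word)      # "ON" itself only activates the device
        level += 1
        if level == len(menus):
            return chosen            # complete composite instruction
    return None


print(enter_instruction(["ON", "POST", "HEAT", "WARMER"]))
```

The printed result corresponds to the instruction traced in Fig. 2, namely turning up the heating in the post room.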

One embodiment of an input device 36 according to the invention is shown schematically in Fig. 3, allowing a user to communicate with a computer 38. The user operated input device 36 comprises a speech generation unit 40 which drives an external loudspeaker 42. A control circuit 44 controls the speech generation unit 40 to speak a menu through the loudspeaker 42, as described above, to indicate a set of inputs to the computer available for selection by the user.

The computer 38 instructs the control circuit 44 as to the inputs available for selection, over the two-way bus 45. The menu items are spoken one by one in a continuously repeated series.

A microphone 46 responds to the speech of the user and supplies a signal to an analyser 47. The analyser 47 analyses the signal to determine when the microphone 46 has detected a word determined by the control unit and corresponding to the menu item contemporaneously being spoken by the device 36.

Upon such detection, the analyser sends a signal to the control unit 44 and on receipt of this signal the control unit 44 applies the input selected by the user to the computer 38, over the bus 45. The analyser 47 may be a comparator comparing the output from the microphone 46 with a signal from the control unit 44.

The input device has been described as giving a speech output, which is convenient for many situations. However, the input device could control other types of output device, for instance the menu could be displayed on the screen of a cathode ray tube.

An alternative adaptation of the method, which is not indicated in the drawings, allows the speed of the conversation with the computer to be increased. This adaptation is preferably used when the output of the device is visual rather than verbal. Each menu item indicates more than one of the available inputs so that the number of menu items is reduced, as is the time taken to indicate the whole available set of inputs. The number of analysers in the input device must be increased so that there are as many analysers as there are inputs indicated in one menu item.

When a menu item is displayed, each analyser analyses the signal received from the microphone to determine whether a word has been detected corresponding to a respective one of the inputs indicated in the contemporaneous menu item.

For example, each menu item could indicate three of the available inputs. Three analysers would each detect the predetermined sound corresponding to a respective one of the three inputs. When the menu item changes, the three analysers would begin to detect new sounds corresponding to the contents of the new menu item.

This method allows a more natural conversation to take place with the computer, because of the choice of inputs available in each menu item. In principle there is no limit to the number of inputs which can be indicated on each menu item, so long as sufficient analysers are available. However, the analysis can be performed more easily, and consequently more cheaply, if only a small number of inputs are indicated at one time, with corresponding predetermined sounds which are easily distinguished. The analysers are then only required to perform a crude analysis.
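
A minimal sketch of this adaptation follows, assuming that each analyser can be modelled as a simple word-equality check and that the word heard has already been extracted from the microphone signal. The names `group_inputs` and `select` are hypothetical, and the post room apparatus names are reused purely as sample inputs.

```python
def group_inputs(inputs, per_item):
    """Split the available inputs into menu items of up to `per_item` inputs each."""
    return [inputs[i:i + per_item] for i in range(0, len(inputs), per_item)]


def select(menu_items, slot, heard):
    """Return the input selected during output slot `slot`, or None."""
    item = menu_items[slot % len(menu_items)]   # item currently displayed
    # One stand-in "analyser" per input in the current item; each one
    # looks only for the sound assigned to its own input.
    analysers = [lambda word, target=target: word == target for target in item]
    for target, analyser in zip(item, analysers):
        if analyser(heard):
            return target
    return None


items = group_inputs(["LIGHT", "VENT", "AUDIO", "VIDEO", "HEAT"], per_item=3)
print(items)
print(select(items, 0, "VENT"))
```

With three inputs per item, the five inputs are covered in two menu items instead of five, which is the reduction in cycle time the passage describes.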

The input device has been described as detecting speech of the user, and as recognising that a spoken word corresponds to a menu item. However, it is not necessary to use speech recognition more complicated than detection of a sound uttered at a particular time, for instance coincident with the beginning of the output of a menu item. If the menu is a regular series of outputs, the user then only needs a sense of rhythm to be able to select a menu item and hence communicate with the computer. It may be necessary to prevent sounds below a threshold intensity being taken into account, or to use a directional microphone, so that background noise does not cause spurious selections.
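
The crude, timing-only form of detection can be sketched as follows. The sketch assumes that the microphone signal has been reduced to a list of sound levels in arbitrary units, that every menu output occupies the same number of samples, and that a sound counts only if it reaches a threshold within a short window at the start of an output. The function name and all parameters are hypothetical.

```python
def select_by_timing(levels, slot_len, menu, threshold, window):
    """Return the menu items selected by sounds made at the start of an output.

    levels    -- sound level per sample (arbitrary units, assumed)
    slot_len  -- samples occupied by each menu output
    threshold -- minimum level taken into account, to reject background noise
    window    -- samples after the start of an output in which a sound counts
    """
    selected = []
    for start in range(0, len(levels), slot_len):
        slot = start // slot_len
        onset = levels[start:start + window]
        if any(level >= threshold for level in onset):
            selected.append(menu[slot % len(menu)])
    return selected


levels = [
    0.1, 0.1, 0.1, 0.1,   # "ON":  silence
    0.9, 0.8, 0.1, 0.1,   # "OFF": loud sound at the start, selected
    0.2, 0.1, 0.9, 0.1,   # "ON":  loud sound too late, ignored
    0.1, 0.6, 0.1, 0.1,   # "OFF": sound within the window, selected
]
result = select_by_timing(levels, slot_len=4, menu=["ON", "OFF"], threshold=0.5, window=2)
print(result)
```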

An alternative device could use more advanced speech recognition, by comparing the input device spoken sound with the user spoken sound to produce a measure of the fit, that is of the similarity between the two sounds. If the fit is better than a pre-determined threshold, the sound can be accepted as selecting the contemporaneous menu item.

Clearly, speech recognition at any level of complexity can be used in this way, for instance by comparing the frequency or volume envelopes of the sounds. This comparison can be performed by analogue circuitry, to any level of fit required.
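
One possible measure of fit between two volume envelopes is sketched below. The specification does not prescribe a particular measure; cosine similarity is used here purely as an assumption, the envelopes are taken to be equal-length lists of non-negative samples, and the names `fit` and `accepts` and the default threshold are hypothetical. The text contemplates analogue circuitry for this comparison, so the code is a digital illustration only.

```python
import math


def fit(reference, candidate):
    """Cosine similarity between two equal-length envelopes (1.0 = same shape)."""
    if len(reference) != len(candidate):
        raise ValueError("envelopes must have the same length")
    dot = sum(r * c for r, c in zip(reference, candidate))
    norm = math.sqrt(sum(r * r for r in reference)) * math.sqrt(sum(c * c for c in candidate))
    return dot / norm if norm else 0.0


def accepts(reference, candidate, threshold=0.9):
    """Accept the user's sound if its fit to the device's sound reaches the threshold."""
    return fit(reference, candidate) >= threshold


device = [0, 1, 3, 1, 0]   # envelope of the word spoken by the device (made-up values)
print(accepts(device, [0, 2, 5, 2, 0]))   # similar shape, louder
print(accepts(device, [3, 1, 0, 0, 0]))   # different shape
```

Because the measure is normalised, a louder or quieter rendering of the same envelope shape is still accepted, while a differently shaped sound is rejected.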

Alternatively the output of the device could be at a different frequency for each menu item, so that an item could be selected by singing in tune (and in time) with the input device.
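
A pitch test of this kind might be sketched as follows, assuming that the fundamental frequency of the user's voice has already been estimated by some means not shown here. The tolerance of 50 cents (half a semitone) and the name `in_tune` are assumptions made for illustration.

```python
import math


def in_tune(user_hz, device_hz, tolerance_cents=50.0):
    """True if the user's pitch lies within the tolerance of the device's pitch."""
    # 1200 cents per octave; the interval in cents is 1200 * log2 of the frequency ratio.
    cents = 1200.0 * math.log2(user_hz / device_hz)
    return abs(cents) <= tolerance_cents


print(in_tune(445.0, 440.0))    # about 20 cents sharp
print(in_tune(466.16, 440.0))   # about a semitone sharp
```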

Any natural or artificial language can be used in the device after suitable adaptation of the control circuit 44.

The method and apparatus of the invention are equally applicable in communicating with a general purpose computer or a dedicated computer, that is a computer dedicated to operate a particular device or set of devices.

Data or instructions can be communicated, depending only on the contents of the menus used.

The control unit 44, the analyser 47 and the speech generator 40 are described as separate units. These units may comprise a small computer, for instance they may be microprocessor based devices. Alternatively, suitable programming of the computer 38 would enable the functions of these units to be carried out by the computer 38, receiving input directly from the microphone 46 and providing output directly to the loudspeaker 42.

The device could be used for teaching pronunciation. The menu would consist of one word at a time, continuously repeated until the user correctly pronounces it in time with the device. A new menu word would then be given for pronunciation. The computer would select the words to be pronounced and could keep a record of the user's performance, or provide a score.

To be successful in teaching pronunciation, it would be desirable for the device to carry out speech recognition of a quite advanced kind, in order for it to be able to detect nuances of pronunciation. Using the present invention, high quality speech recognition can be performed by using the analyser to compare the signal from the microphone with a set of reference phonemes preprogrammed into a memory of the analyser. However, not all of the problems normally associated with high quality recognition are encountered, because the apparatus knows when to expect a word to begin, and knows which word to expect. Accordingly, the problem of finding breaks between words does not arise, and the complexity of analysis techniques is greatly reduced.
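
The teaching procedure described above, in which each word is repeated until it is pronounced correctly and a record is kept, can be sketched as a simple loop. The judgement of correctness is passed in as a function because the specification leaves the recognition technique open; the names `drill` and `is_correct`, and the use of plain string comparison in the example, are assumptions for illustration.

```python
def drill(words, attempts, is_correct):
    """Repeat each word until it is pronounced correctly.

    attempts   -- the user's utterances, one per repetition of the current word
    is_correct -- hypothetical judge taking (expected word, utterance)
    Returns a record mapping each completed word to the repetitions it needed.
    """
    record = {}
    remaining = iter(attempts)
    for word in words:
        tries = 0
        for said in remaining:
            tries += 1
            if is_correct(word, said):
                record[word] = tries
                break                  # move on to the next menu word
        else:
            break                      # the user stopped before succeeding
    return record


score = drill(["cat", "dog"], ["cot", "cat", "dog"], lambda expected, said: expected == said)
print(score)
```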

Claims

1. A method of communicating with a computer, of the type in which an output device indicates to a user a set of inputs to the computer available for selection by the user, wherein the available inputs are indicated by a continuously repeated series of corresponding outputs, and an input corresponding to the contemporaneous output is applied to the computer upon detection of a predetermined sound made by the user.

2. A method according to claim 1, wherein for each available input there is a corresponding, predetermined sound.

3. A method according to any preceding claim, wherein, on detection of the predetermined sound, the set of inputs available for selection is changed.

4. A method according to any preceding claim, wherein the available inputs are indicated one by one.

5. A method according to claim 4, wherein the available inputs are indicated verbally by speech generation apparatus.

6. A method according to any of claims 1 to 4, wherein the available inputs are indicated visually on a cathode ray tube.

7. A method according to any preceding claim, wherein the predetermined sound is a predetermined word.

8. A method according to any of claims 1 to 6, wherein the predetermined sound has a predetermined pitch.

9. A method according to any of claims 1 to 6, wherein the predetermined sound has a predetermined frequency envelope.

10. A method according to any of claims 1 to 6, wherein the predetermined sound has a predetermined volume envelope.

11. A method according to any preceding claim, wherein the predetermined sound has a volume in a predetermined range.

12. A method of communicating with a computer, substantially as described above with reference to Figs. 1 and 2 of the accompanying drawings.

13. A user operated input device for a computer, comprising control means operable to control an output device to indicate to the user a set of inputs to the computer available for selection by the user, in a continuously repeated series, a microphone for detecting sounds made by the user, and an analyser analysing signals from the microphone and supplying an indicating signal to the control means when a predetermined sound is detected, wherein, on receipt of the said indicating signal, the control means applies to the computer an input corresponding to the contemporaneous output of the output device.

14. A device according to claim 13, wherein there is a respective, predetermined sound corresponding to each available input.

15. A device according to claim 13 or 14, wherein the available inputs are indicated one by one.

16. A device according to claim 15, further comprising an output device, and wherein the output device comprises speech generation apparatus.

17. A device according to any of claims 13 to 15, further comprising an output device, and wherein the output device comprises means for generating a visual display on a cathode ray tube.

18. A device according to any of claims 13 to 17, wherein the predetermined sound is a word.

19. A device according to any of claims 13 to 17, wherein the predetermined sound has a predetermined pitch.

20. A device according to any of claims 13 to 17, wherein the predetermined sound has a predetermined volume envelope.

21. A device according to any of claims 13 to 17, wherein the predetermined sound has a predetermined frequency envelope.

22. A device according to any of claims 13 to 21, wherein the predetermined sound has a volume in a predetermined range.

23. A user operated input device for a computer, substantially as described above with reference to Fig. 3 of the accompanying drawings.