GB2231700A - Speech recognition - Google Patents

Speech recognition

Info

Publication number
GB2231700A
GB2231700A (application GB9010577A)
Authority
GB
United Kingdom
Prior art keywords
speech
speaker
recognition apparatus
speech recognition
information signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB9010577A
Other versions
GB2231700B (en)
GB9010577D0 (en)
Inventor
Michael Robinson Taylor
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Smiths Group PLC
Original Assignee
Smiths Group PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Smiths Group PLC
Publication of GB9010577D0
Publication of GB2231700A
Application granted
Publication of GB2231700B
Anticipated expiration
Expired - Lifetime (current status)


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/20 — Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech

Description

SPEECH RECOGNITION APPARATUS AND METHODS

This invention relates to speech recognition apparatus and methods.
Speech recognition apparatus operates by comparing speech information from the speaker with information in a store representing a reference vocabulary. If the words spoken by the speaker are closely similar to the spectral-temporal or acoustic-phonetic information in the store, this yields a high rate of matching. The reference vocabulary may be established from information derived from many different speakers in different circumstances, and it can be modified to characterise it more closely to the speech patterns of one particular speaker. This can result in accurate and reliable speech recognition when the speaker's voice is similar to that used to produce the reference vocabulary.
However, under some environmental conditions, the speaker's voice can be modified sufficiently for recognition to be made unreliable. In particular, if the speaker is influenced by linear acceleration forces, such as high g-forces in an aircraft, or by vibration or stress, this can alter his speech patterns sufficiently to reduce the ability of the speech recognition apparatus to identify the words spoken. Attempts have been made to overcome this problem, such as described in GB 2186726. This proposes measuring the acceleration or other environmental influences and modifying the stored reference templates or word models in the vocabulary by dynamic adaptation, to anticipate the way in which speech will be influenced by the environmental influences. In this way, the stored information after adaptation will bear a closer resemblance to the actual speech, such as speech influenced by acceleration. This arrangement, however, requires considerable processing capacity and can lead to delay in recognition.
It is an object of the present invention to provide speech recognition apparatus and methods that can be used to improve recognition when the speech is subject to environmental influences.
According to one aspect of the present invention there is provided speech recognition apparatus including means for deriving speech information signals from speech made by a speaker, means for sensing an environmental influence on the speaker of the kind that modifies speech sound made by the speaker, means for determining when voicing of speech occurs, means for reducing the spectral tilt of the speech information signals during voicing and when the sensed environmental influence is sufficient to cause the speaker to increase the mean fundamental excitation frequency of his speech, the reduction in speech spectral tilt being such as to compensate at least in part for this increase in mean fundamental excitation frequency, and means for comparing the speech information signals after any such reduction in speech spectral tilt with stored speech information signals.
The means for sensing an environmental influence may include an acceleration sensor, a vibration sensor and/or a noise sensor. The means for determining when voicing occurs may include a device responsive to movement of the vocal folds. The device responsive to movement of the vocal folds may be a laryngograph.
The means for reducing spectral tilt may be located intermediate the means for deriving the speech information signals and a spectral analysis unit which is arranged to produce output signals representative of the frequency bands within which the sound falls. The means for reducing the spectral tilt is preferably arranged to increase the reduction in speech spectral tilt when the environmental influence on the speaker increases. The apparatus may include means to perform sub-set selection on the stored speech information signals in accordance with words previously recognised. The apparatus may include means to perform active word selection on the stored speech information signals in accordance with mode data. The means for deriving speech information signals preferably includes a microphone.
The apparatus may be arranged to provide an output in accordance with identified words to control operation of aircraft equipment.
According to another aspect of the present invention there is provided a method of speech recognition including the steps of deriving speech information signals in accordance with speech made by a speaker, sensing environmental influences on the speaker of the kind that modify speech sounds made by the speaker, determining when the speech sounds are voiced, reducing spectral tilt of the speech information signals when both voicing is sensed and when the sensed environmental influences are sufficient to cause the speaker to increase the mean excitation frequency of his speech, the reduction in speech spectral tilt being such as to compensate at least in part for this increase in excitation frequency, and comparing the speech information signals after any such reduction in spectral tilt with stored speech information signals.
- The reduction in speech spectral tilt is preferably greater for increasing sensed environmental influences.
According to a further aspect of the present invention, there is provided apparatus for performing a method according to the other aspect of the invention.
Speech recognition apparatus for an aircraft, and its method of operation, in accordance with the present invention, will now be described, by way of example, with reference to the accompanying drawings, in which:
Figure 1 shows the apparatus schematically; Figure 2 illustrates operation of a part of previous apparatus; and Figure 3 illustrates operation of a part of the apparatus of the invention.
The speech recognition apparatus includes a processing unit 10 that receives input signals from a microphone 1, a laryngograph 2, environmental sensors 3 and a databus 4.
The microphone is located close to the speaker's mouth so as to detect his speech sounds. The laryngograph, which may be of a kind described in GB 2193024, is secured to the speaker's neck to sense movement of the vocal folds and thereby provide an output signal indicative of voiced speech sounds. The environmental sensors 3 are located where they will respond to substantially the same environmental influences as those to which the speaker is subjected. More particularly, the sensors 3 may include an acceleration sensor responsive to g-forces on the speaker, a vibration sensor and a noise sensor.
Signals from the microphone 1 are first supplied to a filter unit 11 in the processing unit 10; the filter unit also receives inputs from the laryngograph 2 and the sensors 3, and its operation will be described later. The output of the filter unit is supplied to a spectral analysis unit 12 which produces output signals in accordance with the frequency bands within which the sound falls. These output signals are supplied to an optional spectral correction and noise adaptation unit 13 which improves the signal-to-noise ratio or eliminates, or marks, those signals that can only have arisen from noise rather than speech. Output signals from the unit 13 are supplied to one input of a comparator or pattern matching unit 14. The other input to the pattern matching unit 14 is taken from a vocabulary store 30, which is described in greater detail below. The pattern matching unit 14 compares the spectral-temporal (frequency-time) patterns derived from the microphone 1 with the stored vocabulary and produces an output on line 15 in accordance with the word which is the best fit or has the highest probability of being the sound received by the microphone 1.
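The chain just described — spectral analysis of the filtered signal, then pattern matching against a stored vocabulary — can be sketched in miniature. This is an illustrative toy, not the patent's implementation: the band-energy analysis and nearest-template matcher merely stand in for units 12 and 14, and all names, signal lengths and the pure-tone "words" are invented for the example.

```python
import numpy as np

SAMPLE_RATE = 8000

def tone(freq, n=256):
    """Synthetic test 'speech': a pure tone (illustration only)."""
    return np.sin(2 * np.pi * freq * np.arange(n) / SAMPLE_RATE)

def band_energies(frame, n_bands=8):
    """Crude stand-in for the spectral analysis unit: split the FFT
    magnitude spectrum into n_bands bands and sum each band's energy."""
    spectrum = np.abs(np.fft.rfft(frame))
    return np.array([b.sum() for b in np.array_split(spectrum, n_bands)])

def best_match(pattern, vocabulary):
    """Stand-in for the pattern matching unit: return the stored word
    whose template is nearest (Euclidean distance) to the input pattern."""
    return min(vocabulary, key=lambda w: np.linalg.norm(pattern - vocabulary[w]))

# Reference vocabulary: templates 'recorded' under quiet, 1g conditions.
vocabulary = {"climb": band_energies(tone(500)),
              "descend": band_energies(tone(2000))}

word = best_match(band_energies(tone(500)), vocabulary)
```

A real system would compare whole frame sequences with dynamic time warping or word models rather than single band-energy vectors, but the data flow is the same.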
The output on line 15 is supplied to the input of a post recognition processing unit 16 which performs various tasks on the string of word outputs from the pattern matching unit 14 as discussed in greater detail later. The post recognition processing unit 16 has three outputs. One output is provided on line 18 as a feedback channel to an indicator 21. This may be an audible or visual indicator perceivable by the speaker which either confirms his spoken commands, as recognised by the units 14 and 16, or requests repetition of all or part of the command, where an unsatisfactory recognition is achieved. The second output is provided on line 19 to a word sub-set selection unit 32 forming a part of the vocabulary store 30, the operation of which is described in detail below.
The third output is provided on line 20 as the system command signal to the remote terminal 17. The system command signal is produced when the unit 10 identifies a spoken command with sufficient probability and may, for example, be used to effect operation of external equipment via the databus 4.
The store 30 includes a reference vocabulary 31 in the form of pattern templates or word models of the spectral-temporal pattern or state descriptions of different words. This vocabulary is established by the speaker speaking a list of words under normal environmental circumstances of no vibration, no noise and 1g. The sounds made are entered in the vocabulary 31 and labelled with the associated word. The total vocabulary 31 may be further reduced by an optional sub-set selection at 32, under control of signals on line 19, in accordance with words previously spoken and recognised.
Following sub-set selection, the vocabulary is further subjected to active word selection at 33 in response to mode data on line 34 from the remote terminal 17 which is derived from information supplied to the remote terminal on the databus 4. For example, in an aircraft, the mode data may indicate whether the aircraft is landing or taking off, or is in mid flight.
Alternatively, for example, if a radio channel had already been selected via a spoken command, the probability of reselection will be small so the words associated with selection of that radio channel can be excluded from the vocabulary at 33. Poor correlation with selected, active templates could be used to invoke re-processing of the speech on a wider syntax.
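The two vocabulary-narrowing stages above — sub-set selection at 32 from previously recognised words, and active word selection at 33 from mode data — amount to filtering the template store before matching. A minimal sketch, in which the templates, follow table and mode table are all hypothetical placeholders:

```python
def subset_select(vocabulary, last_word, follow_table):
    """Sub-set selection (unit 32): keep only words that the command
    syntax allows to follow the previously recognised word."""
    if last_word is None:
        return dict(vocabulary)
    allowed = follow_table.get(last_word, set(vocabulary))
    return {w: t for w, t in vocabulary.items() if w in allowed}

def active_select(vocabulary, mode, mode_table):
    """Active word selection (unit 33): keep only words valid for the
    current flight mode supplied over the databus."""
    allowed = mode_table.get(mode, set(vocabulary))
    return {w: t for w, t in vocabulary.items() if w in allowed}

# Hypothetical data: integer 'templates' and invented tables.
vocabulary = {"select": 0, "channel": 1, "gear": 2, "down": 3}
follow_table = {"select": {"channel", "gear"}, "gear": {"down"}}
mode_table = {"landing": {"gear", "down", "select"}}

active = active_select(subset_select(vocabulary, "select", follow_table),
                       "landing", mode_table)
```

After "select" has been recognised in landing mode, only "gear" survives both filters, so the pattern matcher has far fewer templates to score.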
The tasks performed by the unit 16 are as follows:
1. Grammar parsing and word spotting techniques are used to detect errors and recover words that have not been identified;

2. Identification of the template string or word model sequence of words which best fits the contextual information at the time. Since particular strings of words are more likely than others to be spoken during particular environmental circumstances, this can be used to improve the identification of the particular command spoken by the user; and

3. Following the final identification, the processing unit 16 may generate signals for use in bringing up to date the vocabulary sub-set selection performed by 32. These signals are supplied to the vocabulary store 30 via line 19.
It is well known that speech can be influenced by the environmental conditions to which the user is subjected. For example, high acceleration can subject a speaker's thorax and throat to high pressures, making speech difficult and unintelligible to conventional speech recognition apparatus. Similarly, high vibration also alters the ability of the speech articulators and airstream mechanism to function normally and thereby corrupts the speech. This is described in 'Effects of Low Frequency Whole-Body Sinusoidal Vibration on Speech', Michael R. Taylor, Proc. I.O.A. Vol 11 Part 5 (1989), pages 151 to 158, and in 'Studies in Automatic Speech Recognition and its Application in Aerospace', Chapter 5, PhD thesis by Michael R. Taylor. It has been found that in high noise environments a speaker will automatically alter his speech in a way that is not simply an amplitude increase. Conditions of high stress, such as caused by tiredness, high work load or impending danger, also influence the speech patterns of the speaker. The alterations produced in speech by these different environmental conditions are complex and, to provide compensation in a speech recognition apparatus, would require large processing capacity. It has, however, been found that these environmental conditions produce a universal effect on speech of a certain kind. More particularly, all these environmental conditions lead to an increase in the mean fundamental excitation frequency of voiced speech, that is, speech produced by movement of the vocal folds, which in turn causes an upward tilt in the voiced speech spectrum.
In conventional speech recognition apparatus, it is common practice to employ a pre-emphasis filter which acts to boost the upper frequencies of the speech signal prior to supply to any pattern matching functions. The performance of such a filter is illustrated in Figure 2. By contrast, the filter unit 11 of the present invention operates in the opposite sense, so as to reduce the mean frequency of the speech input, that is, its spectral tilt, under certain circumstances, as illustrated in Figure 3. This is achieved by attenuating higher frequencies by progressively greater amounts. Figure 3 illustrates a family of three curves A to C, although in practice a considerably larger number of curves would be used. The performance curve is selected according to the amount and nature of the environmental influences on the speaker. For example, under high g-force acceleration and with high noise present, the filter unit 11 might have the performance characteristic illustrated by curve A, whereas for a lower acceleration and with less noise, curve C would be used.
The spectral tilt correction function is only employed when the environmental influences are sufficiently great to affect the speech and when this speech is voiced. In normal circumstances of low environmental influences, the filter unit 11 takes a neutral (flat) characteristic, or a conventional characteristic as shown in Figure 2, where the speech spectrum is modified for both voiced and unvoiced speech.
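One simple way to realise a tilt-reducing filter of this general kind is a first-order de-emphasis (one-pole low-pass) whose coefficient plays the role of selecting among curves A to C: the greater the environmental severity, the stronger the high-frequency attenuation. This is an illustrative stand-in, not the patent's actual filter characteristic; the severity scale in [0, 1] and the coefficient mapping are assumptions.

```python
import numpy as np

def tilt_correction(frame, severity, voiced):
    """Reduce spectral tilt of a voiced frame under environmental
    influence using y[n] = (1 - a)*x[n] + a*y[n-1], a one-pole
    low-pass with unity DC gain. A larger `a` (chosen from
    `severity`, an assumed 0..1 scale) attenuates high frequencies
    more, mimicking the move from curve C towards curve A."""
    frame = np.asarray(frame, dtype=float)
    if not voiced or severity <= 0.0:
        return frame.copy()              # neutral (flat) characteristic
    a = min(0.9, float(severity))        # crude 'curve selection'
    out = np.empty(len(frame))
    prev = 0.0
    for i, x in enumerate(frame):
        prev = (1.0 - a) * x + a * prev  # one-pole low-pass recursion
        out[i] = prev
    return out
```

Passing a low-frequency tone through the filter leaves its energy nearly intact, while a high-frequency tone is strongly attenuated — the downward tilt the description calls for; unvoiced frames pass through untouched.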
Voiced speech is detected in the above example by means of a laryngograph, but other devices responsive to vocal fold movement could be used instead. Alternatively, voiced speech could be identified by analysis of the speech signals from the microphone. A suitable analysis technique is described in 'Theory and Applications of Digital Signal Processing', L. R. Rabiner and B. Gold, Prentice-Hall Inc., 1975, pages 681 to 687. Modification of the mean frequency of the speech input signal can be achieved relatively simply, without considerable processing capability, yet has been found to lead to a significant increase in the recognition rate of voiced speech under adverse environmental conditions.
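A microphone-only voicing decision of the kind alluded to can be sketched with two classic cues: frame energy and zero-crossing rate (voiced speech tends to show high energy and a low zero-crossing rate). This is a simplistic stand-in for the analysis techniques in the cited literature, and the thresholds are arbitrary assumptions.

```python
import numpy as np

def is_voiced(frame, energy_thresh=0.01, zcr_thresh=0.25):
    """Crude voiced/unvoiced decision in lieu of a laryngograph.
    Silence fails the energy test; fricatives and noise, which
    change sign often, fail the zero-crossing-rate test."""
    frame = np.asarray(frame, dtype=float)
    energy = float(np.mean(frame ** 2))
    signs = np.signbit(frame).astype(np.int8)
    zcr = np.count_nonzero(np.diff(signs)) / max(len(frame) - 1, 1)
    return energy > energy_thresh and zcr < zcr_thresh
```

A strong low-frequency tone is classed as voiced, while white noise (high ZCR) and silence (low energy) are not; a practical detector would smooth the decision across frames.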
Although the above system has been described for producing command signals, such as for controlling equipment, a similar system may be used in a speech communication system. In such an alternative arrangement, the line 20, instead of carrying command signals, would carry speech signals in respect of the identified words and phrases. It will be appreciated that the various steps in the operation of the recognition system need not be carried out by discrete units, but could be performed by steps in the programming of one or more computers or processing units.

Claims (18)

  1. Speech recognition apparatus including means for deriving speech information signals from speech made by a speaker, means for sensing an environmental influence on the speaker of the kind that modifies speech sound made by the speaker, means for determining when voicing of speech occurs, means for reducing the spectral tilt of the speech information signals during voicing and when the sensed environmental influence is sufficient to cause the speaker to increase the mean fundamental excitation frequency of his speech, the said reduction in speech spectral tilt being such as to compensate at least in part for this increase in mean fundamental excitation frequency, and means for comparing the speech information signals after any such reduction in speech spectral tilt with stored speech information signals.
  2. Speech recognition apparatus according to Claim 1, wherein the means for sensing an environmental influence includes an acceleration sensor.

  3. Speech recognition apparatus according to Claim 1 or 2, wherein the means for sensing an environmental influence includes a vibration sensor.

  4. Speech recognition apparatus according to any one of the preceding claims, wherein the means for sensing an environmental influence includes a noise sensor.

  5. Speech recognition apparatus according to any one of the preceding claims, wherein the means for determining when voicing occurs includes a device responsive to movement of the vocal folds.

  6. Speech recognition apparatus according to Claim 5, wherein the device responsive to movement of the vocal folds is a laryngograph.

  7. Speech recognition apparatus according to any one of the preceding claims, wherein the means for reducing spectral tilt is located intermediate the means for deriving speech information signals and a spectral analysis unit which is arranged to produce output signals representative of the frequency bands within which the sound falls.

  8. Speech recognition apparatus according to any one of the preceding claims, wherein the means for reducing the spectral tilt is arranged to increase the reduction in speech spectral tilt when the environmental influence on the speaker increases.

  9. Speech recognition apparatus according to any one of the preceding claims, including means to perform sub-set selection on the stored speech information signals in accordance with words previously recognised.
  10. Speech recognition apparatus according to any one of the preceding claims, including means to perform active word selection on the stored speech information signals in accordance with mode data.

  11. Speech recognition apparatus according to any one of the preceding claims, wherein the means for deriving speech information signals includes a microphone.
  12. Speech recognition apparatus according to any one of the preceding claims, wherein the apparatus is arranged to provide an output in accordance with identified words to control operation of aircraft equipment.

  13. Speech recognition apparatus substantially as hereinbefore described with reference to the accompanying drawings.

  14. A method of speech recognition including the steps of deriving speech information signals in accordance with speech made by a speaker, sensing environmental influences on the speaker of the kind that modify speech sounds made by the speaker, determining when the speech sounds are voiced, reducing spectral tilt of the speech information signals when both voicing is sensed and when the sensed environmental influences are sufficient to cause the speaker to increase the mean excitation frequency of his speech, the said reduction in speech spectral tilt being such as to compensate at least in part for this increase in excitation frequency, and comparing the speech information signals after any such reduction in spectral tilt with stored speech information signals.
  15. A method according to Claim 14, wherein the reduction in speech spectral tilt is greater for increasing sensed environmental influences.
  16. A method of speech recognition substantially as hereinbefore described with reference to the accompanying drawings.

  17. Apparatus for performing a method according to any one of Claims 14 to 16.
  18. Any novel feature or combination of features as hereinbefore described.
    Published 1990 at The Patent Office, State House, 66/71 High Holborn, London WC1R 4TP. Further copies may be obtained from The Patent Office, Sales Branch, St Mary Cray, Orpington, Kent BR5 3RD. Printed by Multiplex techniques ltd, St Mary Cray, Kent, Con. 1/87
GB9010577A 1989-05-16 1990-05-11 Speech recognition apparatus and methods Expired - Lifetime GB2231700B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB898911153A GB8911153D0 (en) 1989-05-16 1989-05-16 Speech recognition apparatus and methods

Publications (3)

Publication Number Publication Date
GB9010577D0 GB9010577D0 (en) 1990-07-04
GB2231700A true GB2231700A (en) 1990-11-21
GB2231700B GB2231700B (en) 1993-07-07

Family

ID=10656773

Family Applications (2)

Application Number Title Priority Date Filing Date
GB898911153A Pending GB8911153D0 (en) 1989-05-16 1989-05-16 Speech recognition apparatus and methods
GB9010577A Expired - Lifetime GB2231700B (en) 1989-05-16 1990-05-11 Speech recognition apparatus and methods

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GB898911153A Pending GB8911153D0 (en) 1989-05-16 1989-05-16 Speech recognition apparatus and methods

Country Status (4)

Country Link
JP (1) JPH03208099A (en)
DE (1) DE4015381A1 (en)
FR (1) FR2647248A1 (en)
GB (2) GB8911153D0 (en)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5400409A (en) * 1992-12-23 1995-03-21 Daimler-Benz Ag Noise-reduction method for noise-affected voice channels
DE4307688A1 (en) * 1993-03-11 1994-09-15 Daimler Benz Ag Method of noise reduction for disturbed voice channels
DE19712632A1 (en) * 1997-03-26 1998-10-01 Thomson Brandt Gmbh Method and device for remote voice control of devices
US6983242B1 (en) * 2000-08-21 2006-01-03 Mindspeed Technologies, Inc. Method for robust classification in speech coding

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3766124D1 (en) * 1986-02-15 1990-12-20 Smiths Industries Plc METHOD AND DEVICE FOR VOICE PROCESSING.

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0421341A2 (en) * 1989-10-04 1991-04-10 Matsushita Electric Industrial Co., Ltd. Speech recognizer
EP0421341A3 (en) * 1989-10-04 1992-07-29 Matsushita Electric Industrial Co., Ltd. Speech recognizer
US5361324A (en) * 1989-10-04 1994-11-01 Matsushita Electric Industrial Co., Ltd. Lombard effect compensation using a frequency shift
EP0833304A2 (en) * 1996-09-30 1998-04-01 Microsoft Corporation Prosodic databases holding fundamental frequency templates for use in speech synthesis
EP0833304A3 (en) * 1996-09-30 1999-03-24 Microsoft Corporation Prosodic databases holding fundamental frequency templates for use in speech synthesis

Also Published As

Publication number Publication date
DE4015381A1 (en) 1990-11-22
GB2231700B (en) 1993-07-07
JPH03208099A (en) 1991-09-11
GB9010577D0 (en) 1990-07-04
FR2647248A1 (en) 1990-11-23
GB8911153D0 (en) 1989-09-20

Similar Documents

Publication Publication Date Title
US6553342B1 (en) Tone based speech recognition
US5131043A (en) Method of and apparatus for speech recognition wherein decisions are made based on phonemes
US6912499B1 (en) Method and apparatus for training a multilingual speech model set
US7593849B2 (en) Normalization of speech accent
EP0799471B1 (en) Information processing system
JPH0422276B2 (en)
JP2001503154A (en) Hidden Markov Speech Model Fitting Method in Speech Recognition System
WO2000003386A1 (en) Language independent speech recognition
US5142585A (en) Speech processing apparatus and methods
JPS6247320B2 (en)
JPH0876785A (en) Voice recognition device
JPH09230885A (en) Pattern position decision method and device therefor
JPH09230888A (en) Method and device for pattern matching
AU672696B2 (en) Method of speech recognition
GB2231700A (en) Speech recognition
Hansen et al. Robust speech recognition training via duration and spectral-based stress token generation
US5751898A (en) Speech recognition method and apparatus for use therein
US5732393A (en) Voice recognition device using linear predictive coding
US20020095282A1 (en) Method for online adaptation of pronunciation dictionaries
Holmes et al. Why have HMMs been so successful for automatic speech recognition and how might they be improved
Sharma et al. Speech recognition of Punjabi numerals using synergic HMM and DTW approach
Gao et al. A real-time Chinese speech recognition system with unlimited vocabulary
EP0269233A1 (en) Speech recognition apparatus and methods
JP3100180B2 (en) Voice recognition method
JPH0736477A (en) Pattern matching system

Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)
PE20 Patent expired after termination of 20 years

Expiry date: 20100510