GB2231698A - Speech recognition - Google Patents

Speech recognition

Info

Publication number
GB2231698A
GB2231698A (application GB9010291A)
Authority
GB
United Kingdom
Prior art keywords
word
features
phrase
words
vocabulary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB9010291A
Other versions
GB2231698B (en)
GB9010291D0 (en)
Inventor
Ian Bickerton
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Smiths Group PLC
Original Assignee
Smiths Group PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Smiths Group PLC filed Critical Smiths Group PLC
Publication of GB9010291D0 publication Critical patent/GB9010291D0/en
Publication of GB2231698A publication Critical patent/GB2231698A/en
Application granted granted Critical
Publication of GB2231698B publication Critical patent/GB2231698B/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks

Description

SPEECH RECOGNITION

This invention relates to speech recognition apparatus and methods.

In complex equipment having multiple functions it can be useful to be able to control the equipment by spoken commands. This is also useful where the user's hands are occupied with other tasks or where the user is disabled and unable to use his hands to operate conventional mechanical switches and controls.
Programming of speech recognition apparatus is achieved by reading out a list of words or phrases to be entered into a reference vocabulary. The speech sounds are broken down into spectral components and stored as spectral-temporal word models or templates.
When an unknown word is subsequently spoken, it too is broken down into its spectral components and these are compared with the reference vocabulary by means of a suitable algorithm such as the Hidden Semi-Markov Model. The reference vocabulary is preferably established by multiple repetitions of the same word in different circumstances and by different people. This introduces a spread or broadening of the word models so that there is a higher probability that when the same word is subsequently spoken it will be identified against that word model. However, it can result in overlap of similar word models, leading to a greater probability of an incorrect identification.
The use of neural nets has also been proposed but these are not suitable for identification of continuous speech.
The ability to achieve accurate identification of spoken words is made more difficult in adverse circumstances such as with high background noise or when the speaker is subject to stress.
It is an object of the invention to provide speech recognition methods that can be used to improve the recognition of speech sounds.
According to one aspect of the present invention there is provided a method of speech recognition comprising the steps of supplying speech signals in respect of a plurality of known words or phrases to a neural net, arranging for the neural net to identify the features of each word or phrase that discriminate them from others of said words or phrases, supplying information in respect of these discriminative features together with information identifying the word or phrase with which they are associated to store means to build up a reference vocabulary, and subsequently comparing speech signals in respect of an unknown one of said words or phrases with discriminative features in said vocabulary store so as to identify the word or phrase.
The method preferably includes the steps of speaking each known word or phrase a plurality of times and temporally aligning the examples of each word to produce the speech signals that are supplied to the neural net. The features of each word or phrase that discriminate them from others of said words or phrases may, for example, be spectral features or linear predictive coefficients. The comparison of speech signals in respect of an unknown word or phrase with the reference vocabulary of discriminative features is preferably carried out by a Hidden Semi-Markov Model technique. The reference vocabulary in the store means may contain dynamic time warping templates of the discriminative features. Syntax restriction on the reference vocabulary is preferably performed according to the syntax of previously identified words.
According to another aspect of the present invention there is provided apparatus for performing the method of the above one aspect of the invention.
Speech recognition apparatus and its method of operation, in accordance with the present invention, will now be described, by way of example, with reference to the accompanying drawings, in which:
Figure 1 shows the apparatus schematically; Figure 2 illustrates steps in the method; and Figure 3 illustrates a step in the method.
The speech recognition apparatus is indicated generally by the numeral 1 and receives speech input signals from a microphone 2 which may for example be mounted in the oxygen mask of an aircraft pilot. Output signals representative of identified words are supplied by the apparatus 1 to a feedback device 3 and to a utilisation device 4. The feedback device 3 may be a visual display or an audible device arranged to inform the speaker of the words as identified by the apparatus 1. The utilisation device 4 may be arranged to control a function of the aircraft equipment in response to a spoken command recognised by the utilisation device from the output signals of the apparatus.
Signals from the microphone 2 are supplied to a pre-amplifier 10 which includes a pre-emphasis stage 11 that produces a flat long-term average speech spectrum to ensure that all the frequency channel outputs occupy a similar dynamic range, the characteristic being nominally flat up to 1 kHz. A switch 12 can be set to give either a 3 or 6 dB/octave lift at higher frequencies. The pre-amplifier 10 also includes an anti-aliasing filter 21 in the form of an 8th order Butterworth low-pass filter with a 3 dB cut-off frequency set at 4 kHz.
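By way of illustration only, a pre-emphasis stage of this kind can be sketched as a first-order difference filter; the coefficient used here (0.95) and the NumPy implementation are illustrative assumptions, not details taken from the specification.

```python
import numpy as np

def pre_emphasis(signal: np.ndarray, alpha: float = 0.95) -> np.ndarray:
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    With alpha near 1 this gives an approximately 6 dB/octave lift at
    high frequencies, flattening the long-term average speech spectrum.
    The first sample is passed through unchanged.
    """
    return np.concatenate(([signal[0]], signal[1:] - alpha * signal[:-1]))
```

A lower alpha would correspond to the gentler 3 dB/octave setting of the switch 12; the mapping from alpha to dB/octave is approximate.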
The output from the pre-amplifier 10 is fed via an analogue-to-digital converter 13 to a digital filterbank 14. The filterbank 14 has nineteen channels implemented as assembly software in a TMS32010 microprocessor and is based on the JSRU Channel Vocoder described by Holmes, J.N in IEE Proc., Vol 127, Pt.F, No.1, Feb 1980. The filterbank 14 has uneven channel spacing corresponding approximately with the critical bands of auditory perception in the range 250-4000Hz. The responses of adjacent channels cross at approximately 3dB below their peak. At the centre of a channel the attenuation of a neighbouring channel is approximately lldB.
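A rough sketch of such a frame analysis follows; the logarithmic channel spacing between 250 Hz and 4 kHz is a stand-in for the JSRU vocoder's critical-band design, whose channel edges and 3 dB crossover behaviour differ in detail.

```python
import numpy as np

def channel_energies(frame, sample_rate=8000, n_channels=19,
                     f_lo=250.0, f_hi=4000.0):
    """Return one energy value per filterbank channel for one frame.

    Channel edges are spaced logarithmically as a crude approximation
    to critical-band spacing; real vocoder channels overlap, whereas
    these rectangular bands do not.
    """
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])
```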
Signals from the filterbank 14 are supplied to an integration and noise marking unit 15 which incorporates a noise marking algorithm of the kind described by J.S. Bridle et al., 'A noise compensating spectrum distance measure applied to automatic speech recognition', Proc. Inst. Acoust., Windermere, Nov. 1984. Adaptive noise cancellation techniques to reduce periodic noise may be implemented by the unit 15, which can be useful in reducing, for example, periodic helicopter noise.
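A much-simplified stand-in for noise marking is sketched below: it merely flags the channels whose energy rises sufficiently above a noise estimate, whereas the noise-compensating distance measure of Bridle et al. is considerably more subtle. The 3 dB margin is an illustrative assumption.

```python
import numpy as np

def noise_mark(channel_energies, noise_estimate, margin_db=3.0):
    """Return a boolean mask per channel: True where the channel energy
    exceeds the noise estimate by the margin (speech present), False
    where the channel is judged to be dominated by noise.
    """
    margin = 10.0 ** (margin_db / 10.0)  # dB margin as a power ratio
    return np.asarray(channel_energies) > margin * np.asarray(noise_estimate)
```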
The output of the noise marking unit 15 is supplied to a pattern matching unit 16 which performs the various pattern matching algorithms. The pattern matching unit 16 is connected with a vocabulary store 17 which contains Markov models in respect of discriminative features of each word or phrase in the reference vocabulary. The discriminative features are entered to the vocabulary in the manner shown in Figures 2 and 3.
Firstly, isolated examples of each of the words or phrases to be entered in the reference vocabulary are recorded. This is repeated so that multiple examples of each word or phrase are available. Next, the individual recorded utterances are temporally aligned to the median of the utterances by means of dynamic programming. This removes the temporal variations in natural speech, where the same word can be spoken at different speaking rates. The median word is selected as that of average duration, or by using some other distance metric which places the word in the middle of the group of words. For example, if the reference vocabulary comprises the digits "zero" to "nine", all the training repetitions of each number, after the dynamic programming, will have the same time duration.
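The alignment step can be sketched as follows. This toy dynamic-programming routine uses scalar frames and a simple three-way step pattern, both simplifying assumptions; real use would warp sequences of spectral vectors.

```python
import numpy as np

def dtw_align(utterance, median):
    """Time-align one utterance to a median template by dynamic
    programming, returning a warped copy with the median's duration."""
    n, m = len(utterance), len(median)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(utterance[i - 1] - median[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1],
                                 cost[i - 1, j - 1])
    # Backtrack: record, for each median frame, a matching utterance frame.
    i, j, path = n, m, {}
    while i > 0 or j > 0:
        path[j - 1] = i - 1
        steps = [(cost[i - 1, j - 1], i - 1, j - 1),
                 (cost[i - 1, j], i - 1, j),
                 (cost[i, j - 1], i, j - 1)]
        _, i, j = min(s for s in steps if s[1] >= 0 and s[2] >= 0)
    return np.array([utterance[path[j]] for j in range(m)])
```

Applying this to every training repetition of a word yields a set of equal-length examples, as the description requires.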
The time-aligned set of training words is now presented to a neural net. The neural net structure may be single or multiple layered with any conventional error back-propagation learning strategy. The neural net is arranged to learn the discriminative spectral features of the vocabulary, that is, those features of one word which discriminate it from other words in the vocabulary. An example of this is illustrated in Figure 3 which shows, on the left-hand side, the spectral-temporal analysis of the spoken digit "one". The right-hand side of Figure 3 shows those features of the digit "one" which discriminate it from the digits "zero", "two", "three" and so on.
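The discriminative training can be illustrated with a single logistic unit trained by error back-propagation; the single-layer structure and the data layout (one feature vector per training utterance, label 1 for the target word) are simplifying assumptions rather than the patent's design.

```python
import numpy as np

def discriminative_weights(X, y, epochs=500, lr=0.5):
    """Train one logistic unit by gradient descent (back-propagation in
    its single-layer form) and return its weights; features receiving
    large-magnitude weights are those that discriminate the target word
    (y=1) from the other words (y=0).
    """
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        z = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid activation
        grad = z - y                             # error signal (cross-entropy)
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w
```

Here feature 0 of the test data perfectly separates the two classes while feature 1 is uninformative, so the learned weight vector concentrates on feature 0, mirroring Figure 3's selection of discriminative regions.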
These discriminative features are then transferred to a conventional algorithm which is able to overcome the temporal variability of natural speech. In this example the Hidden Semi-Markov Model (HSMM) is used. The discriminative features identified by the neural net are integrated with the HSMM parameters for storage in the store 17.
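An explicit-duration (semi-Markov) word score can be sketched for a toy left-to-right model as follows; the unit-variance Gaussian emissions and the duration tables are illustrative assumptions, standing in for parameters that would in practice be derived from the training data and the neural net's features.

```python
import numpy as np

def hsmm_score(obs, state_means, dur_logprob):
    """Best log-probability of `obs` under a left-to-right HSMM.

    State s emits frames around state_means[s] (unit variance assumed)
    and dwells for d frames with log-probability dur_logprob[s][d-1];
    states are visited strictly in order, each exactly once.
    """
    obs = np.asarray(obs, dtype=float)
    T, S = len(obs), len(state_means)
    NEG = -np.inf
    best = np.full((T + 1, S), NEG)  # best[t][s]: obs[:t] ends state s
    def emit(s, t0, t1):
        return -0.5 * np.sum((obs[t0:t1] - state_means[s]) ** 2)
    for s in range(S):
        for t in range(1, T + 1):
            for d in range(1, min(t, len(dur_logprob[s])) + 1):
                if s == 0:
                    prev = 0.0 if t - d == 0 else NEG
                else:
                    prev = best[t - d][s - 1]
                cand = prev + dur_logprob[s][d - 1] + emit(s, t - d, t)
                best[t][s] = max(best[t][s], cand)
    return best[T][S - 1]
```

Scoring an unknown utterance against every word model in the store and picking the highest score would complete the recognition step.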
In this way, the store 17 contains a model of each word or phrase in the vocabulary which takes into account the confusability of that word with other words in the vocabulary. The enrolment procedure for subsequent pattern matching is thereby improved.
The discriminative features used to identify each word need not necessarily be spectral features but could be linear predictive coefficients or any other feature of the speech signal.
The word models in the store may be Dynamic Time Warping (DTW) templates in order to take account of temporal variability, with the neural net distance metric summed across the word. A syntax unit 18, connected between the vocabulary store 17 and the pattern matching unit 16, may be used to perform conventional syntax restriction on the stored vocabulary with which the speech is compared, according to the syntax of previously identified words.
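The syntax restriction amounts to narrowing the active vocabulary after each identified word, which can be sketched as a simple word-pair grammar lookup; the cockpit-command grammar shown is entirely hypothetical.

```python
def allowed_next(previous_word, syntax):
    """Return the words the syntax permits after `previous_word`;
    an unknown predecessor permits nothing in this simple scheme."""
    return syntax.get(previous_word, [])

# Hypothetical command grammar for illustration only:
syntax = {"select": ["radio", "display"],
          "radio": ["one", "two"],
          "display": ["map", "fuel"]}
```

The pattern matching unit would then score the unknown utterance only against the models returned by `allowed_next`, reducing both computation and the chance of confusion.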
The method enables recognition of continuous speech using a neural net enrolment process, with the improved recognition performance this can achieve but without excessive processing capacity.

Claims (1)

  1. A method of speech recognition comprising the steps of supplying speech signals in respect of a plurality of known words or phrases to a neural net, arranging for the neural net to identify the features of each word or phrase that discriminate them from others of said words or phrases, supplying information in respect of these discriminative features together with information identifying the word or phrase with which they are associated to store means to build up a reference vocabulary, and subsequently comparing speech signals in respect of an unknown one of said words or phrases with discriminative features in said vocabulary store so as to identify the word or phrase.
  2. A method according to Claim 1, including the steps of speaking each known word or phrase a plurality of times and temporally aligning the examples of each word to produce the speech signals that are supplied to the neural net.
  3. A method according to Claim 1 or 2, wherein the features of each word or phrase that discriminate them from others of said words or phrases are spectral features.
  4. A method according to Claim 1 or 2, wherein the features of each word or phrase that discriminate them from others of said words or phrases are linear predictive coefficients.
  5. A method according to any one of the preceding claims, wherein the comparison of speech signals in respect of an unknown word or phrase with the reference vocabulary of discriminative features is carried out by a Hidden Semi-Markov Model technique.
  6. A method according to any one of the preceding claims, wherein the reference vocabulary in the store means contains dynamic time warping templates of the discriminative features.
  7. A method according to any one of the preceding claims, wherein syntax restriction on the reference vocabulary is performed according to the syntax of previously identified words.
  8. A method of speech recognition substantially as hereinbefore described with reference to the accompanying drawings.

  9. Apparatus for performing a method according to any one of the preceding claims.
  10. Speech recognition apparatus substantially as hereinbefore described with reference to the accompanying drawings.
  11. Any novel feature or combination of features as hereinbefore described.
    Published 1990 at The Patent Office, State House, 66/71 High Holborn, London WC1R 4TP. Further copies may be obtained from The Patent Office, Sales Branch, St Mary Cray, Orpington, Kent BR5 3RD. Printed by Multiplex techniques ltd, St Mary Cray, Kent, Con. 1/87
GB9010291A 1989-05-18 1990-05-08 Speech recognition Expired - Lifetime GB2231698B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB898911461A GB8911461D0 (en) 1989-05-18 1989-05-18 Temperature adaptors

Publications (3)

Publication Number Publication Date
GB9010291D0 GB9010291D0 (en) 1990-06-27
GB2231698A true GB2231698A (en) 1990-11-21
GB2231698B GB2231698B (en) 1993-07-28

Family

ID=10656978

Family Applications (2)

Application Number Title Priority Date Filing Date
GB898911461A Pending GB8911461D0 (en) 1989-05-18 1989-05-18 Temperature adaptors
GB9010291A Expired - Lifetime GB2231698B (en) 1989-05-18 1990-05-08 Speech recognition

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GB898911461A Pending GB8911461D0 (en) 1989-05-18 1989-05-18 Temperature adaptors

Country Status (4)

Country Link
JP (1) JPH0315898A (en)
DE (1) DE4012337A1 (en)
FR (1) FR2647249B1 (en)
GB (2) GB8911461D0 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2258311A (en) * 1991-07-27 1993-02-03 Nigel Andrew Dodd Monitoring a plurality of parameters
FR2695246A1 (en) * 1992-08-27 1994-03-04 Gold Star Electronics Speech recognition system.
EP0618566A1 (en) * 1993-03-29 1994-10-05 Alcatel SEL Aktiengesellschaft Noise reduction for speech recognition
EP0623914A1 (en) * 1993-05-05 1994-11-09 CSELT Centro Studi e Laboratori Telecomunicazioni S.p.A. Speaker independent isolated word recognition system using neural networks
JP3078279B2 (en) 1998-05-07 2000-08-21 クセルト−セントロ・ステユデイ・エ・ラボラトリ・テレコミニカチオーニ・エツセ・ピー・アー Method and apparatus for speech recognition using neural network and Markov model recognition technology
EP2919429A4 (en) * 2012-12-04 2015-12-09 Zte Corp Mobile terminal with built-in voice short message search function and search method therefor

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19839466A1 (en) 1998-08-29 2000-03-09 Volkswagen Ag Method and control device for operating technical equipment of a vehicle

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2258311A (en) * 1991-07-27 1993-02-03 Nigel Andrew Dodd Monitoring a plurality of parameters
GB2258311B (en) * 1991-07-27 1995-08-30 Nigel Andrew Dodd Apparatus and method for monitoring
FR2695246A1 (en) * 1992-08-27 1994-03-04 Gold Star Electronics Speech recognition system.
EP0618566A1 (en) * 1993-03-29 1994-10-05 Alcatel SEL Aktiengesellschaft Noise reduction for speech recognition
US5583968A (en) * 1993-03-29 1996-12-10 Alcatel N.V. Noise reduction for speech recognition
EP0623914A1 (en) * 1993-05-05 1994-11-09 CSELT Centro Studi e Laboratori Telecomunicazioni S.p.A. Speaker independent isolated word recognition system using neural networks
US5566270A (en) * 1993-05-05 1996-10-15 Cselt-Centro Studi E Laboratori Telecomunicazioni S.P.A. Speaker independent isolated word recognition system using neural networks
JP3078279B2 (en) 1998-05-07 2000-08-21 クセルト−セントロ・ステユデイ・エ・ラボラトリ・テレコミニカチオーニ・エツセ・ピー・アー Method and apparatus for speech recognition using neural network and Markov model recognition technology
EP2919429A4 (en) * 2012-12-04 2015-12-09 Zte Corp Mobile terminal with built-in voice short message search function and search method therefor
US9992321B2 (en) 2012-12-04 2018-06-05 Zte Corporation Mobile terminal with a built-in voice message searching function and corresponding searching method

Also Published As

Publication number Publication date
DE4012337A1 (en) 1990-11-22
FR2647249B1 (en) 1993-07-09
GB8911461D0 (en) 1989-07-05
JPH0315898A (en) 1991-01-24
GB2231698B (en) 1993-07-28
GB9010291D0 (en) 1990-06-27
FR2647249A1 (en) 1990-11-23

Similar Documents

Publication Publication Date Title
Junqua et al. Robustness in automatic speech recognition: fundamentals and applications
US5791904A (en) Speech training aid
US5228087A (en) Speech recognition apparatus and methods
Rosenberg Automatic speaker verification: A review
CN112466326B (en) Voice emotion feature extraction method based on transducer model encoder
US4661915A (en) Allophone vocoder
US5278911A (en) Speech recognition using a neural net
JPH05232984A (en) Reference pattern forming method for voice analysis
US4424415A (en) Formant tracker
Karlsson et al. Speaker verification with elicited speaking styles in the VeriVox project
EP1005019A3 (en) Segment-based similarity measurement method for speech recognition
GB2231698A (en) Speech recognition
GB2230370A (en) Speech recognition
Cheng et al. Performance evaluation of front-end processing for speech recognition systems
CA1232686A (en) Speech recognition
Sawai et al. Spotting Japanese CV-syllables and phonemes using the time-delay neural networks
Naik et al. Evaluation of a high performance speaker verification system for access control
Ainsworth Mechanisms of selective feature adaptation
US6871177B1 (en) Pattern recognition with criterion for output from selected model to trigger succeeding models
Chistovich et al. Identification of one-and two-formant steady-state vowels: A model and experiments
JPH0449955B2 (en)
Wilpon et al. A modified K‐means clustering algorithm for use in speaker‐independent isolated word recognition
JPH07210197A (en) Method of identifying speaker
Gazdag A method of decoding speech
SU1675936A1 (en) Method for verification of speaker

Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)
PE20 Patent expired after termination of 20 years

Expiry date: 20100507