GB2137791A - Noise Compensating Spectral Distance Processor - Google Patents


Info

Publication number
GB2137791A
GB2137791A GB08233119A GB8233119A GB2137791A GB 2137791 A GB2137791 A GB 2137791A GB 08233119 A GB08233119 A GB 08233119A GB 8233119 A GB8233119 A GB 8233119A GB 2137791 A GB2137791 A GB 2137791A
Authority
GB
United Kingdom
Prior art keywords
noise
spectrum
spectra
distance
template
Prior art date
Legal status
Granted
Application number
GB08233119A
Other versions
GB2137791B (en)
Inventor
John Scott Bridle
Richard Martin Chamberlain
Current Assignee
UK Secretary of State for Defence
Original Assignee
UK Secretary of State for Defence
Priority date
Filing date
Publication date
Application filed by UK Secretary of State for Defence filed Critical UK Secretary of State for Defence
Priority to GB08233119A priority Critical patent/GB2137791B/en
Publication of GB2137791A publication Critical patent/GB2137791A/en
Application granted granted Critical
Publication of GB2137791B publication Critical patent/GB2137791B/en
Expired legal-status Critical Current


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/20: Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Noise Elimination (AREA)

Abstract

A spectral distance processor for preparing an input speech spectrum and a template spectrum for comparison, as for example in pattern matching by spectral distance computation, has means for masking the input spectrum (x(f)) with an input noise spectrum estimate (m(f)), means for masking the template spectrum (y(f)) with a template noise spectrum estimate (n(f)) to give masked spectra (X(f), Y(f)), and means for marking samples of each masked spectrum with a noise mark (MI, MT), for example 1 (noise) or 0 (speech), dependent upon whether the sample is estimated to be due to speech or noise. Such noise marked spectra may then be used in spectral distance pattern recognition algorithms. The noise mark may be used to adjust normalisation applied to the spectra before comparison, or to recognise that a spectral distance may be due to noise by substituting a default distance for the actual distance should the greater of the masked spectrum samples be marked as noise.

Description

SPECIFICATION

Noise Compensating Spectral Distance Processor

This invention relates to spectral distance processors and in particular to spectral distance processors for comparing spectra taken from speech in the presence of background noise.
Speech can be represented as a sequence of spectra which are measures of power at various frequencies. In a speech recognition system spectra from unknown input words are compared with spectra from known templates or references.
An important practical problem in automatic speech recognition is dealing with interfering noise, such as background noise, non-speech sounds made by a speaker and intrusive sounds of short duration such as a door slamming. In general input and template spectra will be obtained in different noise environments, which compounds the problem of comparison.
In order to provide speech recognition in the presence of noise the technique of noise masking has been proposed. The basis of the technique is to mask those parts of the spectrum which are thought to be due to noise and to leave unchanged those parts of the spectrum estimated to be speech. Both input and template spectra are masked with respect to a spectrum made up of maximum values of an input noise spectrum estimate and a template noise spectrum estimate.
In this way spectral distance between input and template may be calculated as though input and template speech signals were obtained in the same noise background.
Unfortunately known masking techniques have a number of drawbacks. In particular the presence of a high noise level in one spectrum can be cross-coupled to mask speech signals in the other. Four spectra are required in the spectral distance calculations, making any implementation extremely computation intensive and limiting the practicality of the technique for automated speech recognition.
According to the present invention a spectral distance processor for preparing an input spectrum and a template spectrum for comparison includes: means for masking the input spectrum with respect to an input noise spectrum estimate, means for masking the template spectrum with respect to a template noise spectrum estimate, and means for marking samples of each masked spectrum dependent upon whether the sample is due to noise or speech.
The masked spectra may be used for spectral distance calculations in accordance with known and documented principles. Advantageously the spectra may be normalised before distance calculations are performed.
In a preferred form of the present invention, where the greater of the masked spectral samples is marked to be due to noise a default noise distance is assigned in place of the distance between the two masked spectra.
In an alternative form, each spectral sample is marked with a weighting, dependent upon the likelihood of that sample being due to signal and not noise.
A spectral distance processor in accordance with the present invention is advantageously included in a speech recognition system.
In order that features and advantages of the present invention may be appreciated, examples will now be described with reference to the accompanying diagrammatic drawings, of which:

Figure 1 represents prior art noise masking;
Figure 2 represents prior art noise masking;
Figure 3 represents noise masking in accordance with the present invention; and
Figure 4 represents noise masking in accordance with the present invention.
In the examples considered, the two spectra for comparison are referred to as the input and template spectra and their log power spectra are denoted by x(f) and y(f) respectively, where f is frequency. Estimates of the spectra of the background noise in the input and template are denoted by m(f) and n(f) respectively. In the figures the spectra are drawn as continuous functions, but in practice we would typically be dealing with the outputs from a bank of band-pass filter analysis channels.
In order that the background to the present invention may be appreciated examples of prior art noise masking will now be considered. A detailed account has been given by D. H. Klatt in "A digital filter bank for spectral matching", (Proc Int Conf Acoustics, Speech and Signal Processing, pp 573-576, April 1976).
Figure 1 illustrates the prior art procedure for the case of two identical underlying spectra in different noise backgrounds.
From the two noise estimates a composite noise spectrum mask is calculated by:

N(f) = max(m(f), n(f))

The input and template spectra are then masked by the composite noise spectrum to produce the modified spectra:

X(f) = max(x(f), N(f))
Y(f) = max(y(f), N(f))

The intention is to make new input and template spectra which appear to have the same noise background, so that they can be compared directly using the standard distance calculation.
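By way of illustration, the prior-art composite-mask procedure may be sketched as follows (a minimal Python sketch; the function name and the channel values are hypothetical, with log powers represented as one plain number per filter-bank channel):

```python
def prior_art_mask(x, y, m, n):
    """Prior-art masking: both spectra are masked by the composite
    noise spectrum, channel by channel."""
    N = [max(mi, ni) for mi, ni in zip(m, n)]   # N(f) = max(m(f), n(f))
    X = [max(xi, Ni) for xi, Ni in zip(x, N)]   # X(f) = max(x(f), N(f))
    Y = [max(yi, Ni) for yi, Ni in zip(y, N)]   # Y(f) = max(y(f), N(f))
    return X, Y

# Two channels: the input's high noise estimate (9) buries the template's
# speech value (8) in channel 0, illustrating the cross-coupling problem.
X, Y = prior_art_mask(x=[10, 3], y=[8, 6], m=[9, 1], n=[2, 1])
# X == [10, 3], Y == [9, 6]
```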
Figure 1 shows that the method has indeed produced two similar spectra for comparison, X(f), Y(f).
There are problems with the above method. A theoretical problem is that, because one spectrum is effectively masked with the noise estimate of the other, meaningful differences between the two spectra which were previously apparent may be hidden. For instance, if there is high background noise in one signal, then the masking of the other signal may lessen the difference seen in the data. This can happen because the level of power in the two spectra is different, even though it is only the shape of the two spectra that we want to compare. A practical problem is that the calculation of the noise-masked distance requires four spectra, and the spectrum distance is the most computation-intensive operation in a pattern-matching speech recogniser.
In another example of the method (Figure 2), the technique fails to provide spectra suitable for comparison (X2(f), Y2(f)) since a high level of noise in the input spectrum (m2(f)) is coupled via the noise mask N2(f) to the masked template Y2(f).
In accordance with the present invention an input spectrum x(f) (Fig. 3) is masked with an estimate of input noise m(f) to give a masked input X(f) such that:

X(f) = max(x(f), m(f))

A template spectrum y(f) is similarly masked with noise estimate n(f) to give a masked template Y(f) such that:

Y(f) = max(y(f), n(f))

It will be appreciated that if background noise is stationary then masking will have little effect.
The masking will however be useful in fluctuating or high noise level conditions. It will further be appreciated that cross-coupling of noise via the masking process cannot occur.
During the masking operations noise marks MI and MT are associated with the masked spectra X(f), Y(f) respectively, according to whether each value arose from noise (noise mark 1) or speech (noise mark 0), and are taken into account during spectral distance calculations on X(f) and Y(f).
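The per-spectrum masking and marking may be sketched as follows (hypothetical Python; mark 1 denotes noise and 0 denotes speech, as in the description):

```python
def mask_and_mark(spec, noise):
    """Mask a spectrum with its OWN noise estimate and attach a one-bit
    noise mark per channel: 1 where the masked value came from the noise
    estimate, 0 where the original speech value survived."""
    masked, marks = [], []
    for s, nz in zip(spec, noise):
        if s > nz:
            masked.append(s)
            marks.append(0)   # speech dominates this channel
        else:
            masked.append(nz)
            marks.append(1)   # noise floor replaces the value
    return masked, marks

X, MI = mask_and_mark([10, 3], [9, 5])
# X == [10, 5], MI == [0, 1]: channel 1 is raised to the noise floor and marked.
```

Because each spectrum is masked only with its own noise estimate, no cross-coupling between input and template can occur.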
The way in which masked spectra X(f) and Y(f) may be compared will now be described graphically with reference to Fig. 4.
The input 40 and template 41 spectra are plotted on the same axes; the parts of the spectra that are considered to be noise (noise mark 1) are drawn in dashed lines, while the solid lines represent the parts of the spectra that are thought to be speech. It will be appreciated that the noise spectra are no longer required for this distance calculation.
The usual distance function is denoted by F(X - Y), e.g. F(X - Y) = (X - Y)^2.
A spectral distance calculation, modified to include information about the noise, may now be performed as follows. If, at any frequency channel, the larger of X(f) and Y(f) is due to noise, as in Regions 1 and 3 of Figure 4, then the channel distance D is given by:

D = D* (a)

where D* is a default noise distance. In this case nothing can be deduced about the difference between the two spectra at this frequency channel. Instead of assigning a zero value (which would denote a perfect match) to the distance for such a channel, D is given the non-zero value D*. In this way a perfect match can only be found between spectra that are identical after normalisation, and not from a comparison of two spectra that are just noise.
If the larger of X(f) and Y(f) is due to signal, as in Regions 2, 4 and 5 of Figure 4, then the channel distance is given by:

D = F(X(f) - Y(f)) (b)

It will be realised that this equation uses all the available information from the channel because, even if the lower level is due to noise, the difference between the two underlying signal levels must be at least that given by (b). In the special case where the higher level is due to signal, the lower level is due to noise and the value of D from (b) is less than D*, the value D* is assigned as the distance for that channel. The distance between the two spectra can now be found by adding together the values of D from (a) or (b) for each channel.
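Rules (a) and (b), including the D* floor for the mixed case, might be sketched as follows (hypothetical Python; F is taken as the squared difference and the value of D* is an arbitrary illustrative default):

```python
def channel_distance(xc, yc, mx, my, d_star):
    """Apply rules (a)/(b): return the default distance D* if the larger
    value is marked noise; otherwise the squared difference, floored at
    D* when only the smaller value is marked noise."""
    hi_mark, lo_mark = (mx, my) if xc >= yc else (my, mx)
    if hi_mark == 1:                  # rule (a): larger value is noise
        return d_star
    d = (xc - yc) ** 2                # rule (b): F(X - Y) = (X - Y)^2
    return max(d, d_star) if lo_mark == 1 else d

def spectrum_distance(X, Y, MX, MY, d_star=1.0):
    """Sum the channel distances over all filter-bank channels."""
    return sum(channel_distance(xc, yc, mx, my, d_star)
               for xc, yc, mx, my in zip(X, Y, MX, MY))
```

Only the masked values and one mark bit per channel are needed, in line with the hardware argument in the text.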
This algorithm may be implemented simply in hardware since, after the spectra have been marked as signal or noise, all that has to be stored is the marking decision for each channel, which requires only one bit. Thus the noise compensation is not just part of the acoustic analysis but also an integral part of the distance calculation.
Before the spectrum distance is calculated various spectral normalisations may advantageously be applied. Usually an amplitude normalisation is carried out by subtracting a proportion of the means from the two spectra.
However, even the amplitude normalisation may be adversely affected by the background, in that the estimates of the mean may be distorted by the noise in some channels. The most comprehensive method of applying the amplitude normalisation in the present invention is to calculate the estimate of the mean from those channels which are considered to be speech in both the template and the input. This involves a significant amount of computation, which can be reduced considerably by instead subtracting a proportion of the peak channel level, which should be due to speech.
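The speech-only mean normalisation could be sketched as follows (hypothetical Python; the proportion alpha and the plain arithmetic mean are illustrative assumptions, not specified in the text):

```python
def amplitude_normalise(X, MX, MY, alpha=1.0):
    """Subtract a proportion alpha of the mean level, the mean being
    estimated only from channels marked as speech (mark 0) in BOTH
    the input and the template."""
    speech = [xc for xc, mx, my in zip(X, MX, MY) if mx == 0 and my == 0]
    mean = sum(speech) / len(speech) if speech else 0.0
    # Cheaper alternative mentioned in the text: use a proportion of the
    # peak channel level, e.g. max(X), which should be due to speech.
    return [xc - alpha * mean for xc in X]
```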
A generalised form of the present invention will now be described in which each masked spectrum sample is marked with a weight, which is an indication of the likelihood that the channel value is due to signal rather than noise.
The weight is found by comparing the channel value with the noise estimate. Letting the weights for the input and template be denoted by WX(f) and WY(f), these weights are associated with the masked spectra and can be used in the various spectral normalisations. By making this extension to the invention the implementation now requires more than one bit for the weight for each channel.
The spectrum distance calculation is then modified so that the channel distance is weighted between the normal distance and the default noise distance:

D = W(f) F(X(f) - Y(f)) + (1 - W(f)) D* (c)

where W(f) is the weight of the higher-valued channel, that is W(f) = WX(f) if X(f) > Y(f), or W(f) = WY(f) if X(f) < Y(f). Again the spectrum distance is just the sum over all channels of the values of D from (c).
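Equation (c) may be sketched as follows (hypothetical Python; the distance function F and the numeric values are illustrative):

```python
def weighted_channel_distance(xc, yc, wx, wy, F, d_star):
    """Equation (c): blend the normal distance F(X - Y) and the default
    noise distance D* using the weight of the higher-valued channel."""
    w = wx if xc > yc else wy          # W(f) = WX(f) or W(f) = WY(f)
    return w * F(xc - yc) + (1.0 - w) * d_star

# A channel whose higher value is fairly likely to be speech (w = 0.75):
d = weighted_channel_distance(8, 5, 0.75, 0.2, lambda v: v * v, d_star=4.0)
# 0.75 * 9 + 0.25 * 4 = 7.75
```

Note that rules (a) and (b) are the special case of weights restricted to 0 and 1.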
In this way the distance calculation is now continuous as the noise level rises, since the weights adjust gradually to changes in the estimates of the noise level. However, the distance is still discontinuous when the input and template values are nearly equal and their weights are different. This can be simply solved by introducing a slight change so that, when the channel values of the input and template are nearly equal, both WX(f) and WY(f) are used in calculating D.
Though this continuous version of the method does not require a hard decision about the signal, it does require extra storage for the weights and more computation to use them. A compromise can be made by using just a few bits for the weights; two bits, for instance, may be adequate. This retains the advantages of the continuous version of the method without adding too much to the storage and computation requirements.
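The few-bit compromise amounts to uniform quantisation of the weights, which might look like the following (hypothetical Python; the uniform-step scheme is an illustrative assumption):

```python
def quantise_weight(w, bits=2):
    """Quantise a weight in [0, 1] to 2**bits - 1 uniform steps, so each
    channel's weight can be stored in only a few bits."""
    levels = (1 << bits) - 1           # e.g. 3 steps for 2 bits
    return round(w * levels) / levels

# With 2 bits the representable weights are 0, 1/3, 2/3 and 1.
```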
CLAIMS (Filed on 18/11/83)

The matter for which the applicant seeks protection is:

1. A spectral distance processor for preparing an input spectrum and a template spectrum for comparison including means for masking the input spectrum with respect to an input noise spectrum estimate, means for masking the template spectrum with respect to a template noise spectrum estimate, and means for marking samples of each masked spectrum dependent upon whether the sample is due to noise or speech.
2. A spectral distance processor as claimed in claim 1 and including means for performing spectral distance calculations to compare the masked spectra.
3. A spectral distance processor as claimed in claim 2 and including means for normalising the spectra before comparison.
4. A spectral distance processor as claimed in claim 2 or claim 3 and including means for assigning a default distance in place of the calculated distance whenever the greater of the masked spectrum samples is marked as due to noise.
5. A spectral distance processor as claimed in any preceding claim and including a single bit of store for storing the noise mark associated with each sample.
6. A spectral distance processor as claimed in any of claims 1 to 4 and wherein the noise mark associated with each sample is a weighting dependent upon the likelihood of that sample being due to speech and not noise.
7. A spectral distance processor as claimed in claim 6 and including storage bits for storing a noise mark weighting associated with each sample and wherein the number of storage bits is less than the number of bits required to fully specify the weighting.
8. A speech recognition system including a spectral distance processor as claimed in any preceding claim.
9. A speech recognition system as claimed in claim 8 and including a plurality of frequency restricted channels, each channel having a spectral distance processor as claimed in any of claims 1 to 7.
10. A speech recognition system as claimed in claim 9 and including means for normalising the spectra in each channel with respect to an estimate of the mean from those channels which are marked to be speech in both input and template spectra.
11. A spectral distance processor substantially as herein described with reference to Figs. 3 and 4 of the drawings.
GB08233119A 1982-11-19 1982-11-19 Noise compensating spectral distance processor Expired GB2137791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB08233119A GB2137791B (en) 1982-11-19 1982-11-19 Noise compensating spectral distance processor


Publications (2)

Publication Number Publication Date
GB2137791A true GB2137791A (en) 1984-10-10
GB2137791B GB2137791B (en) 1986-02-26

Family

ID=10534385


Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0216118A2 (en) * 1985-08-26 1987-04-01 International Standard Electric Corporation New York Noise compensation in speech recognition apparatus
WO1987003995A1 (en) * 1985-12-20 1987-07-02 Bayerische Motoren Werke Aktiengesellschaft Process for speech recognition in a noisy environment
GB2186726A (en) * 1986-02-15 1987-08-19 Smith Ind Plc Speech recognition apparatus
EP0240329A2 (en) * 1986-04-04 1987-10-07 National Research Development Corporation Noise compensation in speech recognition
EP0240330A2 (en) * 1986-04-04 1987-10-07 National Research Development Corporation Noise compensation in speech recognition
EP0255529A1 (en) * 1986-01-06 1988-02-10 Motorola, Inc. Frame comparison method for word recognition in high noise environments
GB2196460A (en) * 1986-10-03 1988-04-27 Ricoh Kk Voice recognition
US4860360A (en) * 1987-04-06 1989-08-22 Gte Laboratories Incorporated Method of evaluating speech
US4918732A (en) * 1986-01-06 1990-04-17 Motorola, Inc. Frame comparison method for word recognition in high noise environments
WO1991011696A1 (en) * 1990-02-02 1991-08-08 Motorola, Inc. Method and apparatus for recognizing command words in noisy environments
EP0458615A2 (en) * 1990-05-22 1991-11-27 Nec Corporation Speech recognition method with noise reduction and a system therefor
WO1994017515A1 (en) * 1993-01-29 1994-08-04 Telefonaktiebolaget Lm Ericsson Method and apparatus for encoding/decoding of background sounds
WO1994028542A1 (en) * 1993-05-26 1994-12-08 Telefonaktiebolaget Lm Ericsson Discriminating between stationary and non-stationary signals
WO1995012879A1 (en) * 1993-11-02 1995-05-11 Telefonaktiebolaget Lm Ericsson Discriminating between stationary and non-stationary signals
US5781640A (en) * 1995-06-07 1998-07-14 Nicolino, Jr.; Sam J. Adaptive noise transformation system
GB2330677A (en) * 1997-10-21 1999-04-28 Lothar Rosenbaum Phonetic control apparatus
WO2000075918A1 (en) * 1999-06-07 2000-12-14 Telefonaktiebolaget Lm Ericsson (Publ) Weighted spectral distance calculator
GB2374967B (en) * 2001-04-17 2005-06-01 Symbol Technologies Inc Arrangement for and method of establishing a logical relationship among peripherals in a wireless local area network




Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)
PE20 Patent expired after termination of 20 years

Effective date: 20021118