WO1991003042A1 - A method and an apparatus for classification of a mixed speech and noise signal - Google Patents
A method and an apparatus for classification of a mixed speech and noise signal
- Publication number
- WO1991003042A1 PCT/DK1990/000214 DK9000214W WO9103042A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- speech
- envelopes
- synchronism
- noise
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims description 29
- 238000013528 artificial neural network Methods 0.000 claims description 3
- 230000009183 running Effects 0.000 claims description 3
- 230000000875 corresponding effect Effects 0.000 description 6
- 238000001914 filtration Methods 0.000 description 4
- 230000005540 biological transmission Effects 0.000 description 3
- 238000001514 detection method Methods 0.000 description 3
- 230000000737 periodic effect Effects 0.000 description 3
- 238000012545 processing Methods 0.000 description 3
- 230000001629 suppression Effects 0.000 description 3
- 230000002596 correlated effect Effects 0.000 description 2
- 230000005534 acoustic noise Effects 0.000 description 1
- 230000003044 adaptive effect Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000015556 catabolic process Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000006731 degradation reaction Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000002708 enhancing effect Effects 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000008447 perception Effects 0.000 description 1
- 230000029058 respiratory gaseous exchange Effects 0.000 description 1
- 230000003595 spectral effect Effects 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
For classification of a mixed speech and noise signal (101) the signal is divided into separate, frequency limited subsignals (103), each of which contains at least two harmonic frequencies of the speech signal. The envelopes (105) of the subsignals (103) are formed, as well as a measure (107) of synchronism between the envelopes (105). The synchronism measure (107) is compared with a threshold value for classification of the mixed signal as being significantly or insignificantly affected by the speech signal. The classification takes place with an unprecedented frequency and can therefore form the basis for a considerably more precise estimate of the noise signal than before, in particular when the noise has a speech-like nature.
Description
A method and an apparatus for classification of a mixed speech and noise signal
The invention concerns a method and an apparatus for classification of a mixed speech and noise signal as being significantly or insignificantly affected by the speech signal.
The time intervals where the mixed signal is insignificantly affected by the speech signal may be used for forming a running estimate of the noise signal with known methods, it being possible to suppress the noise on the basis of this estimate.
The invention may be used in electroacoustic systems for transmission and signal processing of speech signals (e.g. mobile telephones, speech recognition systems and hearing aids), where it is endeavoured to eliminate or reduce degradation of speech quality, speech recognition and speech perception caused by background noise, using noise suppressing and/or speech enhancing methods.
Electroacoustic systems for transmission and signal processing of speech signals exist in numerous types and for many different purposes. The rapid development in the field of digital electronics, including particularly digital signal processors, has made it possible to employ a plurality of methods that were not practically useful before for removing or suppressing, in real time, the background noise, which occurs either acoustically simultaneously with the speech signal (e.g. in a helicopter cockpit where machine and rotor noise affects the acoustic communication from the pilot) or as an equivalent electric signal in the transmission system itself.
Such methods are known from the literature and are called noise suppression or speech enhancement methods. Among these may be mentioned adaptive filtering and spectral subtraction; see e.g. (1) and (7). By improving the signal/noise ratio (the ratio of speech signal magnitude to noise magnitude), these methods are to counteract the degradation of the reception and intelligibility of the transmitted speech signal caused by the noise. Several of the known methods are based on a running estimate of the statistical characteristics of the background noise, e.g. intensity and frequency content. With a speech or pause detector, time segments with and without speech signal are identified, and in the segments exclusively containing background noise (speech pauses) the characteristics of the noise may be estimated by suitable signal analysis. Assuming a certain stationarity of the background noise, this estimate may be used for adjusting the noise suppression or speech enhancement method until the next time the noise can be estimated.
Several methods are described in the literature for distinguishing between voiced speech, unvoiced speech, and pauses, both without and with background noise. See e.g. (4), (5) and (8). (9) includes, inter alia, a survey of the most important methods which have been used for classification of speech, in particular in connection with speech recognition systems.
In particular two of the known principles should be mentioned: the energy histogram and valley detector principles. In a noise suppression method (3), use of the valley detector method is reported for pointing out the time intervals in which a mixed speech and noise signal exclusively consists of background noise (i.e. corresponding to pauses in the speech signal). In the invention described there, the method is incorporated in a type of feedback loop acting on the individual frequency bands of the output signal, with the purpose of widening the field of use of the speech/noise detector.
However, none of the known speech and pause detectors are particularly robust when the speech signal is subjected to e.g. considerable reverberation, or when the background noise is added at a poor signal/noise ratio (less than 0 dB) or has a speech-like nature, i.e. resembles the speech signal from one or more speakers. In these cases the detection will be less certain with known methods. It has been attempted to reduce this problem by using a priori knowledge about the speech and noise signals. It has thus been utilized in (1) and (2) that the amplitude fluctuations in speech and noise differ in certain cases. When, however, the noise is speech-like, this difference will be marginal.
So far, no speech detector has been developed which can operate reliably both with a poor signal/noise ratio and with speech-like noise. The object of the present invention is therefore to provide a method and an apparatus where this problem is solved.
This object is achieved by the method stated in claim 1 and the apparatus stated in claim 8, involving detection of the time segments in a mixed speech and noise signal which are dominated by the speech signal. This is to be understood in combination with the well-known fact, described below, that a speech signal includes a plurality of time segments where the speech signal contributes only insignificantly to the mixed signal. Such segments are not just speech pauses (between words and sentences, breathing), but in particular also very short intervals, typically within a word, where the speech signal assumes so small a value that it contributes only insignificantly to the mixed signal. These segments are detected, and on this basis it is possible to update parameters for the background noise. This is done with unprecedented frequency and can therefore form the basis for a considerably more precise estimate of the background noise.
In a speech signal the energy can assume relatively great values in short time intervals, corresponding to some of the voiced sounds (e.g. the open vowels) as well as some of the consonants (the fricatives and the plosives). Therefore, the signal/noise ratio will be relatively great in time segments containing these speech sounds, and these segments are thus particularly useful for detecting presence of speech in background noise. The reason why the energy is great in the mentioned speech sounds is the following:
1) A vowel may be described as a (quasi)periodic time signal which in terms of frequency consists of a fundamental frequency and its harmonics, whereby the speech energy simultaneously occurs in a larger frequency range.
2) A fricative and/or a plosive may be described as a short, noise-like time signal where the energy simultaneously occurs in a wide frequency range.
In the preferred embodiment of the invention the frequency range of the speech signal is suitably divided into a plurality of frequency bands, and it thus applies that for each of the two types of speech sounds the energy occurs with a certain simultaneousness between the frequency bands. Further, it is special to the vowels that, since the difference between two consecutive harmonic frequencies is always equal to the fundamental frequency of the speech signal, the envelope of a frequency restricted subsignal containing two or more consecutive harmonic frequencies will always be periodic and substantially synchronous with the fundamental frequency, because the envelope represents a beat signal with a frequency equal to the difference between the two harmonics, which is precisely equal to the fundamental frequency. Since it is the same frequency, viz. the fundamental frequency of the speech signal, that causes the beat signal detected by envelope formation in all the subsignals, the envelopes of the subsignals will be substantially synchronous or correlated with each other.
In order that this envelope, which is periodic with the fundamental frequency, can always be produced, it is necessary that each subsignal has a frequency band width which always comprises at least two harmonic frequencies. This is obtained with a band width of at least twice the fundamental frequency. If the fundamental frequency is e.g. 220 Hz, the band width must be at least 440 Hz.
It is well-known from the literature, see e.g. (3), to examine a mixed speech and noise signal by division into time intervals and by splitting into a number of subsignals by means of a filter bank consisting of bandpass filters. However, in contrast to the previously described methods, this is done in a particular manner in the present invention, since the invention realizes a filter bank consisting of bandpass filters with a band width which is especially dependent upon general characteristics of the speech signal, as well as a detector utilizing the correlation between the envelopes of the subsignals. Moreover, and still in contrast to the previously described methods, the aim of the present invention is not to point out the time intervals in the mixed speech and noise signal which consist of noise only (i.e. corresponding to pauses in the speech signal), but to point out the intervals which are dominated by the speech signal.
The invention will be explained more fully by the following description of a preferred embodiment with reference to the drawing, in which
fig. 1 is a block diagram schematically showing an apparatus according to the invention,
fig. 2 shows an example of an input signal consisting of a portion of a speech signal without noise, and how this signal is processed in the apparatus in fig. 1,
fig. 2A shows the input signal,
fig. 2B shows the frequency limited subsignals originating from filtering of the input signal,
fig. 2C shows the envelope signals corresponding to the subsignals in fig. 2B,
fig. 2D shows the synchronism signal from the synchronism detector as well as a threshold value with which it is compared, and
fig. 2E shows the final classification signal from the threshold detector.
In fig. 1 an electric input signal 101 consisting of a speech signal mixed with a noise signal (traffic noise, cafeteria noise, speech from other persons or the like) is passed to a filter bank 102 consisting of a plurality of optionally overlapping bandpass filters with increasing center frequency and covering in combination the entire frequency range of the speech signal or part thereof. Each bandpass filter has a band width greater than twice the greatest expected value of the fundamental frequency of the speech signal, so that a subsignal 103 comprising at least two consecutive harmonics of the fundamental frequency can pass through each bandpass filter.
The subsignals are passed to their respective envelope detectors 104, which form the time envelopes 105 of the subsignals 103, e.g. by means of rectification, squaring or analytic signals, optionally followed by low-pass filtering. This signal processing, which following bandpass filtering of the input signal generates and utilizes the envelopes of the bandpass filtered subsignals, is known in other connections from the acoustic/audiological field, see e.g. (6).
The envelope signals are passed to a synchronism detector 106, which produces a measure of synchronism between the envelope signals 105 for a time segment of the signals. The time course of the computed synchronism then has the shape of a staircase curve and is called the synchronism signal 107.
The principle of the synchronism detector 106 may e.g. be based on correlation, an artificial neural network or another computing method applied to all or a subset of the envelope signals 105. For example, a correlation can be computed by first computing the product sum of the signal values for any pair of signals, i.e. the envelope signals from two adjacent bandpass filters, and then summing all the computed product sums.
Finally, the synchronism signal 107 is passed to a threshold detector 108, where the synchronism signal 107 is compared with a threshold value. If the synchronism signal 107 is greater than the threshold value, the time segment in question is classified as being dominated by speech, and the classification signal 109 is set to the binary value 1. If not, the classification signal 109 is set to the binary value 0.
The overall function of the synchronism detector 106 and the threshold detector 108 may also be implemented by means of either a trained, a self-organizing or other artificial neural network using the envelope signals 105 as input signals and forming the desired classification signal 109 as output signal for classification of the mixed signal.
Presence of a noise signal affects the classification more or less depending upon the characteristics of the noise signal. If the noise signal is stochastic, speech-like noise, the speech detection will by and large not be affected even with a very small signal/noise ratio. If, on the other hand, the noise signal has an inherent modulation like a speech signal, or if it is a real speech signal from one or more persons, the interplay between the actual signal/noise ratio and the construction of the threshold detector 108 will be of decisive importance. When e.g. the threshold detector 108 is arranged such that the threshold value 210, with a given time constant, adaptively adjusts itself to a given fraction of the size of the synchronism signal 107, then advantageously only the dominating speech signal will be detected. Removal of the lowest frequency components of the synchronism signal provides the additional advantage that a continuous noise signal consisting of harmonic frequency components (e.g. acoustic noise from a rotating machine) will not erroneously be classified as being a speech signal.
Fig. 2 shows an example of how a given input signal 201 is processed in the apparatus in fig. 1. To illustrate the fundamental principle of the invention, the input signal 201 is shown in fig. 2A as a short speech signal without noise, consisting first of a (voiced) vowel and then of an unvoiced fricative. Fig. 2B shows the frequency limited subsignals 203 formed in the filter bank 102. Fig. 2C illustrates the envelope signals 205 formed by the envelope detectors 104 from the subsignals 203 in fig. 2B. At the vowel, the envelope signals 205 in several frequency bands are shown to be correlated with each other and modulated with a frequency corresponding to the fundamental frequency. At the fricative, the envelope signals 205 show that short-term energy is present simultaneously in several frequency bands. Fig. 2D shows the synchronism signal 207 computed by the synchronism detector 106 as well as the threshold value 210 with which it is compared. Finally, fig. 2E shows the obtained classification signal 209.
An apparatus according to the invention may be implemented either in analog or digital hardware or in software or in combinations thereof.
References:
(1) US Patent No. 4 025 721
(2) US Patent No. 4 185 168
(3) US Patent No. 4 630 304
(4) Cox B.V. and Timothy L.M.K. 1980. Nonparametric Rank-Order Statistics Applied to Robust Voiced-Unvoiced-Silence Classification. IEEE Trans. ASSP 28, 5, 550-561.
(5) Gordos G. 1983. Speech Detection in Severe Noise. Proc. 11th ICA, 91-94.
(6) Houtgast T. and Steeneken H.J.M. 1973. The modulation transfer function in room acoustics as a predictor of speech intelligibility. Acustica, 28, 66-73.
(7) Lim J.S. 1986. Speech Enhancement. Proc. ICASSP, 3135-3142.
(8) McAulay R.J. and Malpass M.L. 1980. Speech Enhancement Using a Soft-Decision Noise Suppression Filter. IEEE Trans. ASSP 28, 2, 137-145.
(9) Savoji M.H. 1989. A robust algorithm for accurate endpointing of speech signals. Speech Comm. 8, 45-60.
Claims
1. A method of classifying, in a selected time interval, a mixed speech and noise signal (101, 201) as being significantly or insignificantly affected by the speech signal, where the mixed signal is divided into a plurality of separate, frequency limited subsignals (103, 203), c h a r a c t e r i z e d in that
- each subsignal (103, 203) comprises at least two harmonic frequencies for a fundamental frequency of the speech signal,
- the time envelope (105, 205) is generated for the subsignals (103, 203),
- a measure (107, 207) of synchronism between these envelopes (105, 205) is generated, and
- this measure (107, 207) is compared with a threshold value (210).
2. A method according to claim 1, c h a r a c t e r i z e d in that the mixed signal is divided into a plurality of time intervals in which the signal is classified successively.
3. A method according to claim 1, c h a r a c t e r i z e d in that the selected time interval is a running time window.
4. A method according to claims 1-3, c h a r a c t e r i z e d in that all envelopes are used for generating the measure (107, 207) of synchronism between the envelopes (105, 205).
5. A method according to claims 1-3, c h a r a c t e r i z e d in that one or more subsets of the envelopes (105, 205) are used for generating the measure (107, 207) of synchronism between the envelopes (105, 205).
6. A method according to claims 1-5, c h a r a c t e r i z e d in that the generation of the measure (107, 207) of synchronism between the envelopes (105, 205) is based on a correlation computation.
7. A method according to claims 1-5, c h a r a c t e r i z e d in that the envelopes (105, 205) are passed as input signals to an artificial neural network which classifies the signal.
8. An apparatus for classification of a mixed speech and noise signal (101, 201), comprising filter means each of which permits passage of a subsignal (103, 203), c h a r a c t e r i z e d in that
each subsignal (103, 203) contains at least two harmonic frequencies for a fundamental frequency of the speech signal, and that the apparatus moreover comprises
- means (104) for generating the time envelopes (105, 205) of the subsignals,
- means (106) for generating a measure (107, 207) of synchronism between these envelopes, as well as
- means (108) for comparing the synchronism signal (107, 207) with a given threshold value (210).
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DK406189A DK406189A (en) | 1989-08-18 | 1989-08-18 | METHOD AND APPARATUS FOR CLASSIFYING A MIXED SPEECH AND NOISE SIGNAL |
DK4061/89 | 1989-08-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1991003042A1 true WO1991003042A1 (en) | 1991-03-07 |
Family
ID=8129776
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/DK1990/000214 WO1991003042A1 (en) | 1989-08-18 | 1990-08-17 | A method and an apparatus for classification of a mixed speech and noise signal |
Country Status (2)
Country | Link |
---|---|
DK (1) | DK406189A (en) |
WO (1) | WO1991003042A1 (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5406635A (en) * | 1992-02-14 | 1995-04-11 | Nokia Mobile Phones, Ltd. | Noise attenuation system |
WO2000005923A1 (en) * | 1998-07-24 | 2000-02-03 | Siemens Audiologische Technik Gmbh | Hearing aid having an improved speech intelligibility by means of frequency selective signal processing, and a method for operating such a hearing aid |
WO2001047335A2 (en) | 2001-04-11 | 2001-07-05 | Phonak Ag | Method for the elimination of noise signal components in an input signal for an auditory system, use of said method and a hearing aid |
WO2005086536A1 (en) * | 2004-03-02 | 2005-09-15 | Oticon A/S | Method for noise reduction in an audio device and hearing aid with means for reducing noise |
EP2533550A1 (en) | 2011-06-06 | 2012-12-12 | Oticon A/s | Diminishing tinnitus loudness by hearing instrument treatment |
EP2560410A1 (en) | 2011-08-15 | 2013-02-20 | Oticon A/s | Control of output modulation in a hearing instrument |
EP2563045A1 (en) | 2011-08-23 | 2013-02-27 | Oticon A/s | A method and a binaural listening system for maximizing a better ear effect |
EP2563044A1 (en) | 2011-08-23 | 2013-02-27 | Oticon A/s | A method, a listening device and a listening system for maximizing a better ear effect |
EP2613567A1 (en) | 2012-01-03 | 2013-07-10 | Oticon A/S | A method of improving a long term feedback path estimate in a listening device |
EP2663094A1 (en) | 2012-05-09 | 2013-11-13 | Oticon A/s | Methods and apparatus for processing audio signals |
EP2677770A1 (en) | 2012-06-21 | 2013-12-25 | Oticon A/s | Hearing aid comprising a feedback alarm |
EP2840810A2 (en) | 2013-04-24 | 2015-02-25 | Oticon A/s | A hearing assistance device with a low-power mode |
EP2849462A1 (en) | 2013-09-17 | 2015-03-18 | Oticon A/s | A hearing assistance device comprising an input transducer system |
US9344817B2 (en) | 2000-01-20 | 2016-05-17 | Starkey Laboratories, Inc. | Hearing aid systems |
EP3068146B1 (en) | 2015-03-13 | 2017-10-11 | Sivantos Pte. Ltd. | Method for operating a hearing device and hearing device |
EP3048813B1 (en) | 2015-01-22 | 2018-03-14 | Sivantos Pte. Ltd. | Method and device for suppressing noise based on inter-subband correlation |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4277645A (en) * | 1980-01-25 | 1981-07-07 | Bell Telephone Laboratories, Incorporated | Multiple variable threshold speech detector |
US4382164A (en) * | 1980-01-25 | 1983-05-03 | Bell Telephone Laboratories, Incorporated | Signal stretcher for envelope generator |
DE2649259C2 (en) * | 1976-10-29 | 1983-06-09 | Felten & Guilleaume Fernmeldeanlagen GmbH, 8500 Nürnberg | Method for the automatic detection of disturbed telephone speech |
US4630304A (en) * | 1985-07-01 | 1986-12-16 | Motorola, Inc. | Automatic background noise estimator for a noise suppression system |
US4696039A (en) * | 1983-10-13 | 1987-09-22 | Texas Instruments Incorporated | Speech analysis/synthesis system with silence suppression |
1989
- 1989-08-18 DK DK406189A patent/DK406189A/en unknown
1990
- 1990-08-17 WO PCT/DK1990/000214 patent/WO1991003042A1/en unknown
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE2649259C2 (en) * | 1976-10-29 | 1983-06-09 | Felten & Guilleaume Fernmeldeanlagen GmbH, 8500 Nürnberg | Method for the automatic detection of disturbed telephone speech |
US4277645A (en) * | 1980-01-25 | 1981-07-07 | Bell Telephone Laboratories, Incorporated | Multiple variable threshold speech detector |
US4382164A (en) * | 1980-01-25 | 1983-05-03 | Bell Telephone Laboratories, Incorporated | Signal stretcher for envelope generator |
US4696039A (en) * | 1983-10-13 | 1987-09-22 | Texas Instruments Incorporated | Speech analysis/synthesis system with silence suppression |
US4630304A (en) * | 1985-07-01 | 1986-12-16 | Motorola, Inc. | Automatic background noise estimator for a noise suppression system |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU666161B2 (en) * | 1992-02-14 | 1996-02-01 | Nokia Mobile Phones Limited | Noise attenuation system for voice signals |
US5406635A (en) * | 1992-02-14 | 1995-04-11 | Nokia Mobile Phones, Ltd. | Noise attenuation system |
WO2000005923A1 (en) * | 1998-07-24 | 2000-02-03 | Siemens Audiologische Technik Gmbh | Hearing aid having an improved speech intelligibility by means of frequency selective signal processing, and a method for operating such a hearing aid |
US6768801B1 (en) | 1998-07-24 | 2004-07-27 | Siemens Aktiengesellschaft | Hearing aid having improved speech intelligibility due to frequency-selective signal processing, and method for operating same |
US9344817B2 (en) | 2000-01-20 | 2016-05-17 | Starkey Laboratories, Inc. | Hearing aid systems |
US9357317B2 (en) | 2000-01-20 | 2016-05-31 | Starkey Laboratories, Inc. | Hearing aid systems |
WO2001047335A2 (en) | 2001-04-11 | 2001-07-05 | Phonak Ag | Method for the elimination of noise signal components in an input signal for an auditory system, use of said method and a hearing aid |
WO2005086536A1 (en) * | 2004-03-02 | 2005-09-15 | Oticon A/S | Method for noise reduction in an audio device and hearing aid with means for reducing noise |
US7489789B2 (en) | 2004-03-02 | 2009-02-10 | Oticon A/S | Method for noise reduction in an audio device and hearing aid with means for reducing noise |
EP2533550A1 (en) | 2011-06-06 | 2012-12-12 | Oticon A/s | Diminishing tinnitus loudness by hearing instrument treatment |
EP2560410A1 (en) | 2011-08-15 | 2013-02-20 | Oticon A/s | Control of output modulation in a hearing instrument |
EP2563044A1 (en) | 2011-08-23 | 2013-02-27 | Oticon A/s | A method, a listening device and a listening system for maximizing a better ear effect |
EP2563045A1 (en) | 2011-08-23 | 2013-02-27 | Oticon A/s | A method and a binaural listening system for maximizing a better ear effect |
EP2613567A1 (en) | 2012-01-03 | 2013-07-10 | Oticon A/S | A method of improving a long term feedback path estimate in a listening device |
EP2663094A1 (en) | 2012-05-09 | 2013-11-13 | Oticon A/s | Methods and apparatus for processing audio signals |
EP2677770A1 (en) | 2012-06-21 | 2013-12-25 | Oticon A/s | Hearing aid comprising a feedback alarm |
EP2840810A2 (en) | 2013-04-24 | 2015-02-25 | Oticon A/s | A hearing assistance device with a low-power mode |
EP2849462A1 (en) | 2013-09-17 | 2015-03-18 | Oticon A/s | A hearing assistance device comprising an input transducer system |
US9538296B2 (en) | 2013-09-17 | 2017-01-03 | Oticon A/S | Hearing assistance device comprising an input transducer system |
US10182298B2 (en) | 2013-09-17 | 2019-01-15 | Oticon A/S | Hearing assistance device comprising an input transducer system |
EP3048813B1 (en) | 2015-01-22 | 2018-03-14 | Sivantos Pte. Ltd. | Method and device for suppressing noise based on inter-subband correlation |
EP3068146B1 (en) | 2015-03-13 | 2017-10-11 | Sivantos Pte. Ltd. | Method for operating a hearing device and hearing device |
Also Published As
Publication number | Publication date |
---|---|
DK406189A (en) | 1991-02-19 |
DK406189D0 (en) | 1989-08-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO1991003042A1 (en) | A method and an apparatus for classification of a mixed speech and noise signal | |
Schroeder | Vocoders: Analysis and synthesis of speech | |
McAulay et al. | Speech enhancement using a soft-decision noise suppression filter | |
Ibrahim | Preprocessing technique in automatic speech recognition for human computer interaction: an overview | |
US5749067A (en) | Voice activity detector | |
Holmes | The JSRU channel vocoder | |
US20050108004A1 (en) | Voice activity detector based on spectral flatness of input signal | |
Kleinschmidt | Methods for capturing spectro-temporal modulations in automatic speech recognition | |
JPH05346797A (en) | Voiced sound discriminating method | |
CN110390957A (en) | Method and apparatus for speech detection | |
Mourjopoulos et al. | Modelling and enhancement of reverberant speech using an envelope convolution method | |
de-La-Calle-Silos et al. | Synchrony-based feature extraction for robust automatic speech recognition | |
Shoba et al. | Adaptive energy threshold for monaural speech separation | |
US3405237A (en) | Apparatus for determining the periodicity and aperiodicity of a complex wave | |
Brown et al. | A neural oscillator sound separator for missing data speech recognition | |
Kawamura et al. | A noise reduction method based on linear prediction analysis | |
Lin | A New Frequency Coverage Metric and a New Subband Encoding Model, with an Application in Pitch Estimation. | |
Ganapathy et al. | Comparison of modulation features for phoneme recognition | |
Muhsina et al. | Signal enhancement of source separation techniques | |
Logeshwari et al. | A survey on single channel speech separation | |
Radfar et al. | MPTRACKER: A new Multi-Pitch detection and separation algorithm for mixed speech signals | |
Whitmal et al. | Wavelet-based noise reduction | |
Nikhil et al. | Impact of ERB and bark scales on perceptual distortion based near-end speech enhancement | |
de Cheveigné | A mixed speech F0 estimation algorithm. | |
Kingsbury et al. | Improving ASR performance for reverberant speech |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AK | Designated states | Kind code of ref document: A1; Designated state(s): JP US |
 | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FR GB IT LU NL SE |