GB2426166A - Voice activity detector - Google Patents

Voice activity detector

Info

Publication number
GB2426166A
Authority
GB
United Kingdom
Prior art keywords
noise
voice activity
likelihood ratio
speech
estimate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0509415A
Other versions
GB2426166B (en)
GB0509415D0 (en)
Inventor
Jabloun Firas
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Europe Ltd
Original Assignee
Toshiba Research Europe Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Research Europe Ltd
Priority to GB0509415A (GB2426166B)
Publication of GB0509415D0
Priority to EP06252433A (EP1722357A3)
Priority to US11/429,308 (US7596496B2)
Priority to PCT/JP2006/309624 (WO2006121180A2)
Priority to CN200680000377.0A (CN101080765A)
Priority to JP2007546958A (JP2008534989A)
Publication of GB2426166A
Application granted
Publication of GB2426166B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals

Abstract

A voice activity detection method comprising the steps of (a) Estimating in a noise power estimator the noise power within a signal having a speech component and a noise component, and (b) Calculating a likelihood ratio for the presence of speech in the signal from the estimated power of noise signals from step (a) and a complex Gaussian statistical model. The advantage of this technique is that it does not require a previously calculated noise power estimate (obtained from a previously calculated likelihood ratio) and therefore any errors are not re-iterated in a feedback loop and consequently do not increase.

Description

Voice Activity Detection Apparatus and Method
The present invention relates to signal processing and in particular to a voice activity detection method and a voice activity detector.
Speech signals that are transmitted by speech communication devices will often be corrupted to some extent by noise which interferes with and degrades the performance of coding, detection and recognition algorithms.
A variety of different voice activity detectors and detection methods have been developed in order to detect speech periods in input signals which comprise both speech and noise components. Such devices and methods have application in areas such as speech coding, speech enhancement and speech recognition.
The simplest form of voice activity detection is an energy based method in which the power of an input signal is assessed in order to determine if speech is present (i.e. an increase in energy indicates the presence of speech). Such a technique works well where the signal to noise ratio is high but becomes increasingly unreliable in the presence of noisy signals.
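The energy based approach described above can be sketched in a few lines. This is only an illustration: the frame layout, the function names and the threshold of -20 dB are assumptions made for the example and are not taken from the source.

```python
import math

def frame_energy_db(frame):
    """Mean power of one frame of samples, in dB (a small floor avoids log(0))."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)

def energy_vad(frames, threshold_db):
    """Flag a frame as speech when its energy exceeds threshold_db (hypothetical value)."""
    return [frame_energy_db(f) > threshold_db for f in frames]

quiet = [0.01, -0.01, 0.01, -0.01]   # mean power 1e-4, roughly -40 dB
loud = [0.5, -0.5, 0.5, -0.5]        # mean power 0.25, roughly -6 dB
decisions = energy_vad([quiet, loud], threshold_db=-20.0)
```

A fixed threshold of this kind is what becomes unreliable as the noise level rises towards the speech level.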
A voice activity detection method based on the use of a statistical model is described in "A Statistical Model Based Voice Activity Detection" by Sohn et al [IEEE Signal Processing Letters Vol 6, No 1, January 1999]. The statistical model described uses a model for noise and speech to calculate a likelihood ratio (LR) statistic (where LR = [probability speech is present]/[probability speech is absent]). The LR statistic so calculated is then compared to a threshold value in order to decide whether the speech signal (or section thereof) under analysis contains speech.
The Sohn et al technique was modified in "Improved Voice Activity Detection Based on a Smoothed Statistical Likelihood Ratio" by Cho et al, In Proceedings of ICASSP, Salt Lake City, USA, vol. 2, pp 737-740, May 2001. The modified version of the technique proposes the use of a smoothed likelihood ratio (SLR) in order to alleviate detection errors that might otherwise be encountered at speech offset regions.
In order to calculate LR (or SLR) the above statistical methods both require the use of an existing noise power estimate. This noise estimate is obtained using the LR/SLR calculated during previous iterations of the analysis frames.
There thus exists a feedback mechanism within the above described statistical methods in which the likelihood ratio is calculated using an existing noise estimate which is in turn calculated using a previously derived likelihood ratio value. Such a feedback mechanism can result in an accumulation of errors which impacts upon the overall performance of the system.
As noted above the likelihood ratio that is calculated is compared to a threshold value in order to decide if speech is present. However, the likelihood ratios calculated in the above techniques can vary over the order of 60dB or more. If there are large variations in the noise in the input signal then the threshold value may become an inaccurate indicator of the presence of speech and system performance may decrease.
It is therefore an object of the present invention to provide a voice activity detection method and apparatus that substantially overcomes or mitigates the above mentioned problems with the prior art.
According to a first aspect of the present invention there is provided a voice activity detection method comprising the steps of (a) Estimating in a noise power estimator the noise power within a signal having a speech component and a noise component (b) Calculating a likelihood ratio for the presence of speech in the signal from the estimated power of noise signals from step (a) and a complex Gaussian statistical model.
The present invention proposes a voice activity detection method based on a statistical model wherein an independent noise estimation component is used to provide the model with a noise estimate. Since the noise estimation is now independent of the calculation of the likelihood ratio there is no longer a feedback loop between the noise estimation and the LR calculation.
The noise estimation may be conveniently performed by a quantile based noise estimation method (see for example "Quantile Based Noise Estimation for Spectral Subtraction and Wiener Filtering" by Stahl, Fischer and Bippus, pp. 1875-1878, vol. 3, ICASSP 2000; see also "Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics", by Martin in IEEE Trans. Speech and Audio Processing, Vol. 9, No. 5, July 2001, pp. 504-512). However, any suitable noise estimation technique may be used.
Preferably the noise estimation value is further processed by smoothing the estimated value by a first order recursive function.
Conventional quantile based noise estimation methods require that a signal is analysed over K+1 frequency bands and T time frames for each time frame. This can be computationally expensive and so conveniently only a subset of the K+1 frequencies may be updated at any one time frame. The noise estimate at the remaining frequencies may be derived by interpolation from those values that have been updated.
It is noted that the threshold value against which the presence of speech is assessed is crucial to the overall performance of a voice activity detector. As noted above the calculated likelihood ratio can actually vary over many dBs and so preferably the parameter should be set such that it is robust to changes in the input speech dynamic range and/or the noise conditions.
Conveniently the calculated likelihood ratio can be restricted/compressed using a non-linear function to a pre-determined interval (e.g. between zero and one). By compressing the likelihood ratio in this way the effects of variations in the SNR are mitigated and the performance of the voice detector is improved.
Conveniently the likelihood ratio may be restricted to the range zero-to-one by the following function

$$\tilde{\Psi}(t) = 1 - \min\left(1, e^{-\Psi(t)}\right)$$

where $\Psi(t)$ is the smoothed likelihood ratio for frame t.
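A minimal sketch of such a restricting function follows, assuming the compression takes the form 1 - min(1, exp(-psi)) with psi being the smoothed log-domain likelihood ratio for the frame. The function name `compress_lr` is hypothetical.

```python
import math

def compress_lr(psi):
    """Restrict a smoothed log likelihood ratio psi to the interval [0, 1].

    Computes 1 - min(1, exp(-psi)). For psi <= 0 the exponential is at
    least 1, so the result is exactly 0; returning early also avoids
    overflow in exp() for very negative psi.
    """
    if psi <= 0.0:
        return 0.0
    return 1.0 - math.exp(-psi)

samples = [compress_lr(p) for p in (-1000.0, 0.0, math.log(2.0), 50.0)]
```

Whatever the dynamic range of the input, the output stays between zero and one, so a single threshold in that interval can be used across different SNR conditions.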
According to a second aspect of the present invention there is provided a voice activity detection method comprising the steps of (a) estimating the noise power within a signal having a speech component and a noise component (b) calculating a likelihood ratio for the presence of speech in the signal from the estimated power of noise signals from step (a) and a complex Gaussian statistical model (c) updating the noise power estimate based on the likelihood ratio calculated in step (b) wherein the likelihood ratio is restricted using a non-linear function to a predetermined interval.
In the voice activity methods of the first and second aspects of the present invention the likelihood ratio that is calculated is compared to a pre-defined threshold value in order to determine the presence or absence of speech.
Conveniently in both aspects of the invention the noisy speech signal under analysis is transformed from the time domain to the frequency domain via a Fast Fourier Transform step.
In both the first and second aspects of the present invention the likelihood ratio (LR) of the kth spectral bin may be defined as

$$\Lambda_k = \frac{P(X_k \mid H_{1,k})}{P(X_k \mid H_{0,k})} = \frac{1}{1+\xi_k} \exp\left(\frac{\gamma_k \xi_k}{1+\xi_k}\right)$$

where hypothesis $H_0$ represents the absence of speech; hypothesis $H_1$ represents the presence of speech; $\gamma_k$ and $\xi_k$, the a posteriori and a priori signal-to-noise ratios (SNR) respectively, are defined as $\gamma_k = |X_k|^2 / \lambda_{N,k}$ and $\xi_k = \lambda_{S,k} / \lambda_{N,k}$; and $\lambda_{N,k}$ and $\lambda_{S,k}$ are the noise and speech variances at frequency index k respectively.
Conveniently the likelihood ratio may be smoothed in the log domain using a first order recursive system in order to improve performance. In such cases the smoothed likelihood ratio may be calculated as

$$\Psi_k(t) = \kappa\,\Psi_k(t-1) + (1-\kappa)\log\Lambda_k(t)$$

where $\kappa$ is a smoothing factor and t is the time frame index.
The geometric mean of the smoothed likelihood ratio can conveniently be computed as

$$\Psi(t) = \frac{1}{K}\sum_{k=0}^{K-1}\Psi_k(t)$$

and $\Psi(t)$ is used to determine the presence of speech. [Note: Depending on the noise characteristics certain frequency bands can be eliminated from the above summation].
In a third aspect of the present invention which corresponds to the first aspect of the invention there is provided a voice activity detector comprising a likelihood ratio calculator for calculating a likelihood ratio for the presence of speech in a noisy signal using an estimate of the noise power in the noisy signal and a complex Gaussian statistical model wherein the noise power estimate is calculated independently of the VAD.
In a fourth aspect of the present invention which corresponds to the second aspect of the invention there is provided a voice activity detector comprising a likelihood ratio calculator for calculating a likelihood ratio for the presence of speech in a noisy signal using an estimate of the noise power in the noisy signal and a complex Gaussian statistical model wherein the likelihood ratio is used to update the noise estimate within the detector and wherein the likelihood ratio is restricted using a non-linear function to a predetermined interval.
In a further aspect of the present invention there is provided a voice activity detection system comprising a voice activity detector according to the third aspect of the present invention or a voice activity detector configured to implement the first aspect of the present invention and a noise estimator for providing a noise estimate to the voice activity detector for a signal including a noise component and a speech component.
The skilled person will recognise that the above-described detectors and methods may be embodied as processor control code, for example on a carrier medium such as a disk, CD- or DVD-ROM, programmed memory such as read only memory (firmware), or on a data carrier such as an optical or electrical signal carrier.
These and other aspects of the invention will now be further described, by way of example only, with reference to the accompanying figures in which:
Figure 1 shows a schematic illustration of a prior art voice activity detector;
Figure 2 shows a schematic illustration of a voice activity detector according to the present invention;
Figure 3 shows a plot of signal power versus frequency for a noisy speech signal;
Figure 4 shows a frequency versus time plot for a signal over T time frames;
Figure 5 shows power spectrum values of a particular frequency bin versus time;
Figure 6 shows accuracy of speech recognition versus signal-to-noise values for a signal comprising German speech;
Figure 7 shows accuracy of speech recognition versus signal-to-noise values for a signal comprising UK English speech.
In the statistical model used in the present invention (and also described in Cho et al) a voice activity decision is made by testing two hypotheses, H0 and H1 where H0 indicates the absence of speech and H1 indicates the presence of speech.
The statistical model assumes that each spectral component of the speech and noise has a complex Gaussian distribution in which noise is additive and uncorrelated with the speech. Based on this assumption the conditional probability density functions (PDF) of a noisy spectral component $X_k$, given $H_{0,k}$ and $H_{1,k}$, are as follows:

$$P(X_k \mid H_{0,k}) = \frac{1}{\pi\lambda_{N,k}} \exp\left(-\frac{|X_k|^2}{\lambda_{N,k}}\right) \tag{1}$$

and

$$P(X_k \mid H_{1,k}) = \frac{1}{\pi(\lambda_{N,k}+\lambda_{S,k})} \exp\left(-\frac{|X_k|^2}{\lambda_{N,k}+\lambda_{S,k}}\right) \tag{2}$$

where $\lambda_{N,k}$ and $\lambda_{S,k}$ are the noise and speech variances at frequency index k respectively.
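The ratio of the two densities reduces to a compact closed form. The short derivation below is a sketch that assumes the standard complex Gaussian densities, with $\lambda_{N,k}$ and $\lambda_{S,k}$ denoting the noise and speech variances, and uses the shorthand $\gamma_k = |X_k|^2/\lambda_{N,k}$ and $\xi_k = \lambda_{S,k}/\lambda_{N,k}$.

```latex
\begin{aligned}
\frac{P(X_k \mid H_{1,k})}{P(X_k \mid H_{0,k})}
  &= \frac{\lambda_{N,k}}{\lambda_{N,k}+\lambda_{S,k}}
     \exp\left(\frac{|X_k|^2}{\lambda_{N,k}}
               - \frac{|X_k|^2}{\lambda_{N,k}+\lambda_{S,k}}\right) \\
  &= \frac{1}{1+\xi_k}
     \exp\left(\gamma_k - \frac{\gamma_k}{1+\xi_k}\right) \\
  &= \frac{1}{1+\xi_k}
     \exp\left(\frac{\gamma_k\,\xi_k}{1+\xi_k}\right)
\end{aligned}
```

The factors of $\pi$ cancel, and the only quantities needed per bin are the observed power and the two variances.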
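The likelihood ratio (LR) of the kth spectral bin is then defined as

$$\Lambda_k = \frac{P(X_k \mid H_{1,k})}{P(X_k \mid H_{0,k})} = \frac{1}{1+\xi_k} \exp\left(\frac{\gamma_k \xi_k}{1+\xi_k}\right) \tag{3}$$

where $\gamma_k$ and $\xi_k$, the a posteriori and a priori signal-to-noise ratios (SNR) respectively, are defined as

$$\gamma_k = \frac{|X_k|^2}{\lambda_{N,k}} \tag{4}$$

and

$$\xi_k = \frac{\lambda_{S,k}}{\lambda_{N,k}} \tag{5}$$

In the prior art the noise variance, $\lambda_{N,k}$, is derived through noise adaptation in which the variance of the noise spectrum of the kth spectral component in the t-th frame is updated in a recursive way as

$$\lambda_{N,k}^{(t)} = \eta\,\lambda_{N,k}^{(t-1)} + (1-\eta)\,E\left(|N_k^{(t)}|^2 \mid X_k^{(t)}\right) \tag{6}$$

where $\eta$ is a smoothing factor. The expected noise power spectrum $E(|N_k^{(t)}|^2 \mid X_k^{(t)})$ is estimated by means of a soft decision technique as

$$E\left(|N_k^{(t)}|^2 \mid X_k^{(t)}\right) = |X_k^{(t)}|^2\,P(H_{0,k} \mid X_k^{(t)}) + \lambda_{N,k}^{(t-1)}\,P(H_{1,k} \mid X_k^{(t)}) \tag{7}$$

where $P(H_{1,k} \mid X_k^{(t)}) = 1 - P(H_{0,k} \mid X_k^{(t)})$ and $P(H_{0,k} \mid X_k^{(t)})$ is calculated as follows:

$$P(H_{0,k} \mid X_k^{(t)}) = \frac{1}{1 + \dfrac{P(H_{1,k})}{P(H_{0,k})}\,\Lambda_k^{(t)}} \tag{8}$$

It is thus noted that the noise variance calculated in Equation (6) utilises (in Eq. 7) PDF values for the presence and absence of speech. The PDF calculations, in turn, indirectly use values for $\lambda_{N,k}$ (see Equation (2)).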
The unknown a priori speech absence probability (which can also be upper and lower bounded by user predefined limits) can be written as follows

$$P(H_{0,k}^{(t)}) = \beta\,P(H_{0,k}^{(t-1)}) + (1-\beta)\,P(H_{0,k}^{(t)} \mid X_k^{(t)}) \tag{9}$$

It is therefore clear that a feedback mechanism exists in the method described according to the prior art which can lead to an accumulation of errors.
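The feedback described above can be made concrete with a small single-bin sketch. It assumes the soft decision update takes the usual form (posterior speech absence probability from the likelihood ratio, expected noise power as a weighted mix of the observed power and the previous estimate, then first order recursion). The function names, the smoothing factor 0.95 and the fixed prior ratio of 1 are illustrative assumptions, and the prior probability is held fixed rather than adapted.

```python
import math

def log_lr(x_power, noise_var, speech_var):
    """Log of the per-bin likelihood ratio for the complex Gaussian model."""
    gamma = x_power / noise_var        # a posteriori SNR
    xi = speech_var / noise_var        # a priori SNR
    return gamma * xi / (1.0 + xi) - math.log1p(xi)

def soft_decision_update(x_power, noise_var, speech_var,
                         eta=0.95, prior_ratio=1.0):
    """One prior-art style noise variance update for a single bin.

    eta and prior_ratio = P(H1)/P(H0) are illustrative values only.
    """
    # Clamp the exponent so exp() cannot overflow for very strong speech.
    lr = math.exp(min(log_lr(x_power, noise_var, speech_var), 700.0))
    p_h0 = 1.0 / (1.0 + prior_ratio * lr)                  # speech absence posterior
    expected = x_power * p_h0 + noise_var * (1.0 - p_h0)   # expected noise power
    return eta * noise_var + (1.0 - eta) * expected        # recursive update

# True noise power is 1.0 but the running estimate is 0.1: a noise-only
# frame is scored as likely speech, so the estimate is barely corrected.
stuck = soft_decision_update(x_power=1.0, noise_var=0.1, speech_var=0.9)
# With a correct estimate the same frame leaves it unchanged.
matched = soft_decision_update(x_power=1.0, noise_var=1.0, speech_var=1.0)
```

In the first call the underestimated noise variance inflates the likelihood ratio, which in turn suppresses the correction to the noise variance. This is the error accumulation that an independent noise estimator is meant to avoid.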
The above discussion is represented schematically in Figure 1 in which a Voice Activity Detector 1 according to the prior art comprises a Likelihood Ratio calculation component 3 and also a noise estimation component 5. The output 7 of the LR component feeds into the noise estimation component 5 and the output 9 of the noise estimation component feeds into the LR component.
The voice activity detection method of the first (and third) aspect(s) of the present invention is represented schematically in Figure 2 in which a Voice Activity Detector 11 comprises a LR component 13. An independent noise estimation component 15 feeds noise estimates 17 into the LR component in order to derive the likelihood ratio.
The voice activity detector according to the first and third aspects of the present invention estimates the noise variance $\lambda_{N,k}$ externally using a suitable technique. For example a quantile based noise estimation approach (as described in more detail below) may be used to estimate the noise variance.
The voice activity detector according to the second and fourth aspects of the present invention processes the likelihood ratio derived in a LR component using a non-linear function in order to restrict the values of the ratio to a predetermined interval.
The speech variance is then estimated in the present invention as

$$\lambda_{S,k}^{(t)} = \beta\,\lambda_{S,k}^{(t-1)} + (1-\beta)\max\left(|X_k^{(t)}|^2 - \lambda_{N,k},\ 0\right) \tag{10}$$

wherein $\beta$ is the speech variance forgetting factor.
The likelihood ratio can then be calculated as described with reference to Equations (1)- (5). Speech presence or absence is then calculated by comparing the LR to a threshold value.
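The calculation with an externally supplied noise estimate can be sketched as follows. The sketch assumes the speech variance is tracked recursively from the excess of the observed power over the noise variance, and that the likelihood ratio has the closed form 1/(1+xi) * exp(gamma*xi/(1+xi)), evaluated in the log domain for numerical safety. The function names and the forgetting factor 0.98 are hypothetical, and the noise variances are assumed to be strictly positive.

```python
import math

def update_speech_var(prev_speech_var, x_power, noise_var, beta=0.98):
    """Recursive speech variance estimate for one bin (beta is illustrative)."""
    return beta * prev_speech_var + (1.0 - beta) * max(x_power - noise_var, 0.0)

def log_likelihood_ratio(x_power, noise_var, speech_var):
    """Log likelihood ratio for one bin under the complex Gaussian model."""
    gamma = x_power / noise_var        # a posteriori SNR
    xi = speech_var / noise_var        # a priori SNR
    return gamma * xi / (1.0 + xi) - math.log1p(xi)

def lr_step(x_powers, noise_vars, prev_speech_vars, beta=0.98):
    """Process one frame: update speech variances, then compute log LRs."""
    speech_vars = [update_speech_var(s, x, n, beta)
                   for s, x, n in zip(prev_speech_vars, x_powers, noise_vars)]
    log_lrs = [log_likelihood_ratio(x, n, s)
               for x, n, s in zip(x_powers, noise_vars, speech_vars)]
    return log_lrs, speech_vars

noise_vars = [1.0, 1.0]   # would come from an independent noise estimator
log_lrs, speech_vars = lr_step([1.0, 51.0], noise_vars, [0.0, 0.0])
```

Here the first bin sits at the noise level and yields a log likelihood ratio of zero, while the second bin, well above the noise, yields a large positive value. The noise variances are inputs only and are never modified by the likelihood ratio.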
It is noted that in all aspects of the present invention the performance of the voice activity detector may be improved by smoothing the likelihood ratio in the log domain using a first order recursive system wherein

$$\Psi_k(t) = \kappa\,\Psi_k(t-1) + (1-\kappa)\log\Lambda_k(t) \tag{11}$$

where t is the time frame index and $\kappa$ is a smoothing factor. The geometric mean of the smoothed likelihood ratio (SLR) (equivalent to the arithmetic mean in the log domain) may then be calculated as

$$\Psi(t) = \frac{1}{K}\sum_{k=0}^{K-1}\Psi_k(t) \tag{12}$$

$\Psi(t)$ can then be used to detect speech presence or absence as before by comparison with a threshold value.
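The smoothing and averaging steps can be sketched as below, assuming per-bin log likelihood ratios are already available for each frame. The function names, the smoothing factor 0.5 and the input values are chosen only to keep the arithmetic easy to follow.

```python
def smooth_log_lr(prev_psi, log_lrs, kappa=0.9):
    """First order recursive smoothing of per-bin log likelihood ratios."""
    return [kappa * p + (1.0 - kappa) * l for p, l in zip(prev_psi, log_lrs)]

def mean_slr(psi):
    """Arithmetic mean over bins in the log domain (geometric mean of the LRs)."""
    return sum(psi) / len(psi)

psi = [0.0, 0.0]                              # initial smoothed values, two bins
for frame_log_lrs in ([4.0, 2.0], [4.0, 2.0]):
    psi = smooth_log_lr(psi, frame_log_lrs, kappa=0.5)
score = mean_slr(psi)                         # compared against a threshold
```

After two identical frames the smoothed values have moved three quarters of the way towards the input, which shows how the recursion delays both onsets and offsets.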
The threshold value against which the LR and SLR are compared to determine the presence of speech is crucial to the behaviour and performance of the Voice Activity Detector. The value chosen for the parameter (for example by simulation experiments) should be robust to changes in the input speech dynamic range and/or the noise conditions. Usually, this parameter has to be adjusted whenever the SNR values change.
However, as noted above the LR/SLR may vary across many dBs and it can therefore be difficult to set the parameter at a suitable value.
In order to mitigate changes in the SNR the LR/SLR calculated in the first and third aspects of the present invention may be further processed by a non-linear function in order to restrict the values for the likelihood ratio to a particular interval, e.g. between zero (0) and one (1). By compressing the likelihood ratio in this way the effects of noise variances can be reduced and system performance increased. It is noted that this restrictive function corresponds to the second aspect of the present invention but may also be used in conjunction with the first aspect of the present invention.
An example of a function suitable for restricting the likelihood ratio value to the [0,1] interval is

$$\tilde{\Psi}(t) = 1 - \min\left(1, e^{-\Psi(t)}\right) \tag{13}$$

In the first aspect of the present invention the noise estimate is derived externally to the likelihood ratio calculation. One method of deriving such an estimate is by a quantile based noise estimation (QBNE) approach.
A QBNE approach estimates the noise power spectrum continuously (i.e. even during periods of speech activity) by utilising the assumption that the speech signal is not stationary and will not occupy the same frequency band permanently. The noise signal on the other hand is assumed to be slowly varying compared to the speech signal such that it can be considered relatively constant for several consecutive analysis frames (time periods).
Working under the above assumptions it is possible to sort the noisy signal (in order to build sorted buffers) for each frequency band under consideration over a period of time and to retrieve a noise estimate from the so constructed buffers.
The QBNE approach is illustrated in Figures 3 to 5.
Figure 3 shows a plot of signal power (power spectrum) versus frequency for a noise signal 18 and a speech signal at two different times, t1 and t2 (in the Figure the speech signal at time t1 is labelled 19 and at time t2 it is labelled 20). It can be seen that the speech signal does not occupy the same frequencies at each time and so the noise, at a particular frequency, can be estimated when speech does not occupy that particular frequency band. In the Figure, for example, the noise at frequencies f1 and f2 can be estimated at time t1 and the noise at frequencies f3 and f4 can be estimated at time t2.
For a noisy signal, X(k, t) is the power spectrum of the noisy signal where k is the frequency bin index and t is the time (frame) index. If the past and the future T/2 frames are stored in a buffer then for frame t, these T frames X(k, t) can be sorted at each frequency bin in ascending order such that

$$X(k, t_0) \le X(k, t_1) \le \dots \le X(k, t_{T-1}) \tag{14}$$

where $t_j \in [t - T/2,\ t + T/2 - 1]$.
The above equation is illustrated in Figures 4 and 5. Turning to Figure 4 a frequency versus time plot is shown for a number of time frames (for the sake of clarity only 5 of the total T frames are shown). Depending on the particular application thirty time frames may be stored in the buffer (i.e. T=30). At each frame the power spectrum of the signal is a vector represented by the vertical boxes (21, 23, 25, 27, 29).
For a particular frequency, k, (illustrated by the horizontal box 31 in Figure 4) the power spectrum values over a window of T frames may be stored in a FIFO buffer as illustrated in Figure 5. The stored frames can then be sorted in ascending order (as described in relation to Equation 14 above) using any fast sorting technique.
The noise estimate, $\hat{N}(k, t)$, for the kth frequency may be taken as the qth quantile of the values sorted in the buffer. In other words,

$$\hat{N}(k,t) = X(k, t_{\lfloor qT \rfloor}) \tag{15}$$

where $0 < q < 1$ and $\lfloor\ \rfloor$ denotes rounding down to the nearest integer.
The noise estimate may be worked out for each frequency band.
In calculating a noise estimate it is assumed that, for T frames, one particular frequency will be occupied by a speech component for at most 50% of the time. Therefore, if q is set equal to 0.5 then the median value will be selected as the noise estimate. It is thought that the median quantile value will give better performance than other quantile values as it is less vulnerable to outlying variations.
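The quantile selection can be sketched as follows, assuming the buffer is held as a list of T frames, each a list of power values per frequency bin. The function name `qbne` and the sample values are hypothetical.

```python
import math

def qbne(power_buffer, q=0.5):
    """Quantile based noise estimate per frequency bin.

    power_buffer: list of T frames, each a list of power values per bin.
    For each bin the T values are sorted in ascending order and the entry
    at index floor(q * T) is returned (q = 0.5 picks the median region).
    """
    num_frames = len(power_buffer)
    index = min(math.floor(q * num_frames), num_frames - 1)
    num_bins = len(power_buffer[0])
    estimate = []
    for k in range(num_bins):
        ordered = sorted(frame[k] for frame in power_buffer)
        estimate.append(ordered[index])
    return estimate

# Four frames, two bins; each bin carries speech energy in one frame only.
buffer_frames = [
    [1.0, 2.0],
    [9.0, 2.2],   # speech burst in bin 0
    [1.2, 8.0],   # speech burst in bin 1
    [0.8, 1.8],
]
noise = qbne(buffer_frames, q=0.5)
```

Because each bin is occupied by speech in only a minority of the buffered frames, the selected values stay near the noise floor and the bursts are ignored.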
The QBNE derived noise estimate can be improved by smoothing the value obtained from Equation 15 above using a first order recursive function, wherein

$$\bar{N}(k,t) = \rho(k,t)\,\bar{N}(k,t-1) + \left(1-\rho(k,t)\right)\hat{N}(k,t) \tag{16}$$

where $\hat{N}$ is the noise estimate derived in Equation 15 above, $\bar{N}$ is the smoothed noise estimate and $\rho(k,t)$ is a frequency dependent smoothing parameter which is updated at every frame t according to the signal-to-noise ratio (SNR).
The instantaneous SNR may be defined as the ratio between the input noisy speech spectrum and the current QBNE noise estimate $\hat{N}(k,t)$, i.e.

$$\gamma(k,t) = \frac{X(k,t)}{\hat{N}(k,t)} \tag{17}$$

Alternatively, the noise estimate from the previous frame, $\bar{N}(k,t-1)$, may also be used such that

$$\gamma(k,t) = \frac{X(k,t)}{\bar{N}(k,t-1)} \tag{18}$$

In either case the smoothing parameter may be obtained as

$$\rho(k,t) = \frac{\gamma(k,t)}{\gamma(k,t)+\mu} \tag{19}$$

where $\mu$ is a parameter that controls the sensitivity to the QBNE estimate.
It is noted that as the SNR increases it should be arranged that the QBNE noise estimate for a particular frequency should have little effect on an updated noise estimate. On the other hand, if the SNR is low, i.e. noise dominates a given frame at a given frequency, then the QBNE estimate from one frame to the next will become more reliable and consequently a current noise estimate should have a larger effect on an updated estimate. The parameter $\mu$ of Equation (19) controls the sensitivity to the QBNE estimate. If $\mu \to 0$ then the smoothing parameter $\rho(k, t) \to 1$ and the QBNE estimate $\hat{N}(k, t)$ will have little effect on the noise estimate. If $\mu \to \infty$, on the other hand, then $\hat{N}(k, t)$ will dominate the estimate at each frame.
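The SNR dependent smoothing can be sketched for a single bin as below. The sketch assumes the instantaneous SNR is the observed power divided by the current QBNE estimate and that the smoothing parameter is SNR / (SNR + mu). The function name and the value mu = 1 are hypothetical, and the QBNE estimate is assumed to be strictly positive.

```python
def smooth_noise(prev_smoothed, qbne_estimate, x_power, mu=1.0):
    """SNR dependent first order smoothing of a QBNE noise estimate (one bin)."""
    snr = x_power / qbne_estimate     # instantaneous SNR
    rho = snr / (snr + mu)            # approaches 1 as the SNR grows
    return rho * prev_smoothed + (1.0 - rho) * qbne_estimate

low_snr = smooth_noise(prev_smoothed=2.0, qbne_estimate=1.0, x_power=1.0)
high_snr = smooth_noise(prev_smoothed=2.0, qbne_estimate=1.0, x_power=99.0)
```

At low SNR the new QBNE value pulls the smoothed estimate halfway towards it, whereas at high SNR the previous smoothed estimate is almost entirely retained.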
It is noted that conventional speech analysis systems often analyse input signals in more than one hundred frequency bands. If the neighbouring 30 frames are also stored and analysed in order to derive the noise estimate then it may become computationally prohibitively expensive to maintain and update a noise estimate at every frequency for every frame.
The noise estimate may therefore only be updated over a sub-set of the total frequency bands under analysis. For example, if there are 10 frequency bands then for a first frame t the noise estimate may only be calculated and updated for the odd frequency bands (1,3,5,7,9). During the next frame t', the noise estimate may be calculated and updated for the even frequency bands (2,4,6,8,10).
For frame t, the noise estimate on the even frequency bands may be estimated by interpolation from the odd frequency values. For frame t', the noise estimate on the odd frequency bands may be estimated by interpolation from the even frequency values.
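The alternating update scheme can be sketched as follows. The source does not specify the interpolation method, so linear interpolation between the nearest updated neighbours is an assumption here, as is copying the nearest updated value at the band edges. Indices are zero-based, the function name is hypothetical, and at least one band is assumed to have been updated.

```python
def interpolate_missing(values, updated):
    """Fill bands not in `updated` from their nearest updated neighbours.

    values:  per-band noise estimates (entries outside `updated` are stale).
    updated: set of band indices refreshed in the current frame (non-empty).
    """
    refreshed = sorted(updated)
    out = list(values)
    for k in range(len(values)):
        if k in updated:
            continue
        left = max((i for i in refreshed if i < k), default=None)
        right = min((i for i in refreshed if i > k), default=None)
        if left is None:
            out[k] = values[right]            # below the first updated band
        elif right is None:
            out[k] = values[left]             # above the last updated band
        else:
            w = (k - left) / (right - left)   # linear interpolation weight
            out[k] = (1.0 - w) * values[left] + w * values[right]
    return out

# Six bands; this frame refreshed bands 0, 2 and 4 only.
stale = [1.0, 0.0, 3.0, 0.0, 5.0, 0.0]
filled = interpolate_missing(stale, updated={0, 2, 4})
```

On the next frame the roles swap: bands 1, 3 and 5 are refreshed and the others are interpolated, so sorting work per frame is roughly halved.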
A voice activity detector according to aspects of the present invention was evaluated against a conventional detector for both German and UK English speech utterances. The VAD was used to detect the start and end points of the utterances for speech recognition purposes.
In a first experiment car noise was artificially added to a first data set at different signal- to-noise ratios. Speech signals were padded with silent periods at the start and end of the utterances.
Figure 6 shows the speech recognition accuracy results of the first experiment for the German data set. The solid line, marked "FA", represents recognition results corresponding with accurate endpoints obtained via forced alignment.
Line X in Figure 6 shows results using a prior art voice activity detector (internal noise estimation and no compression of likelihood ratio), line Y shows results for a voice activity detector which calculates a likelihood ratio which is then smoothed and compressed as detailed above (i.e. a voice activity detector according to the second and fourth aspects of the present invention) and Line Z shows the results for a voice activity detector which utilises an independent noise estimator (i.e. a voice activity detector according to the first and third aspects of the present invention).
It can be seen that the voice activity detectors according to aspects of the present invention outperform the prior art detector, especially at low SNR levels.
Furthermore, it can also be seen that the use of an external noise estimate (line Z) further enhances the performance of the voice activity detector when compared to the version which smoothes and compresses the likelihood ratio (line Y).
Figure 7 shows the results of a similar evaluation this time performed with an English language data set. As for the German utterance the results according to aspects of the present invention are an improvement over the prior art system.
A further performance evaluation is shown in Table 1 below for two further data sets, C and D, which were recorded in a second experiment conducted in a car.
Once again evaluation has been performed for both UK English and German and it can be seen that a voice activity detector according to the present invention which uses an independent noise estimation outperforms the prior art system. For German utterances the recognition error rate is reduced by around 30% and for UK English the reduction is around 25%.
Voice activity detector               German                    UK English
                                      Data set C   Data set D   Data set C   Data set D
COMPARISON                            94.1         92.7         92.4         88.3
PRIOR ART                             86.1         80.4         83.6         78.5
VAD WITH COMPRESSION OF LR            90.3         82.4         88.7         83.4
VAD WITH EXTERNAL NOISE ESTIMATION    90.5         85.9         87.7         84.0
TABLE 1
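The quoted error rate reductions can be checked against Table 1 with a short calculation. This assumes the table entries are recognition accuracies in percent, so that the error rate is 100 minus the entry; the table itself does not state its units. The dictionary keys and function name are hypothetical.

```python
# Accuracy values copied from Table 1 (assumed to be percentages).
prior_art = {"German C": 86.1, "German D": 80.4,
             "UK English C": 83.6, "UK English D": 78.5}
external_noise = {"German C": 90.5, "German D": 85.9,
                  "UK English C": 87.7, "UK English D": 84.0}

def relative_error_reduction(acc_old, acc_new):
    """Relative reduction in error rate, in percent, given two accuracies."""
    err_old = 100.0 - acc_old
    err_new = 100.0 - acc_new
    return 100.0 * (err_old - err_new) / err_old

reductions = {name: relative_error_reduction(prior_art[name], external_noise[name])
              for name in prior_art}
```

Under that assumption the reductions come out at roughly 32% and 28% for the German sets and about 25% for both UK English sets, in line with the figures of around 30% and around 25% stated in the text.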

Claims (19)

1. A voice activity detection method comprising the steps of (a) Estimating in a noise power estimator the noise power within a signal having a speech component and a noise component (b) Calculating a likelihood ratio for the presence of speech in the signal from the estimated power of noise signals from step (a) and a complex Gaussian statistical model.
2. A voice activity detection method as claimed in claim 1 wherein the likelihood ratio in step (b) is restricted using a non-linear function to a predetermined interval.
3. A voice activity detection method as claimed in claim 2 wherein the likelihood ratio is restricted by the function $\tilde{\Psi}(t) = 1 - \min\left(1, e^{-\Psi(t)}\right)$ where $\Psi(t)$ is the likelihood ratio.
4. A voice activity detection method as claimed in any preceding claim wherein the noise power estimator uses a quantile based estimation method to estimate the noise power.
5. A voice activity detection method as claimed in claim 4 wherein the noise power estimate is smoothed using a first order recursive function.
6. A voice activity detection method as claimed in any preceding claim wherein the signal is analysed over K+1 frequency bands and for each time frame the noise power estimate is only updated over a sub-set of the K+1 frequency bands.
7. A voice activity detection method as claimed in claim 6 wherein the noise estimate is updated over all K+1 frequency bands by interpolation from the sub-set of updated frequency bands.
8. A voice activity detection method comprising the steps of (a) estimating the noise power within a signal having a speech component and a noise component (b) calculating a likelihood ratio for the presence of speech in the signal from the estimated power of noise signals from step (a) and a complex Gaussian statistical model (c) updating the noise power estimate based on the likelihood ratio calculated in step (b) wherein the likelihood ratio is restricted using a non-linear function to a predetermined interval.
9. A voice activity detection method as claimed in any preceding claim wherein the likelihood ratio is compared to a threshold value in order to detect the presence or absence of speech.
10. A voice activity detection method as claimed in any preceding claim wherein the likelihood ratio is determined by the following equation $\Lambda_k = \frac{P(X_k \mid H_{1,k})}{P(X_k \mid H_{0,k})} = \frac{1}{1+\xi_k} \exp\left(\frac{\gamma_k \xi_k}{1+\xi_k}\right)$ wherein hypothesis $H_0$ represents the absence of speech; hypothesis $H_1$ represents the presence of speech; $\lambda_{N,k}$ and $\lambda_{S,k}$ are the noise and speech variances at frequency index k respectively; and $\gamma_k$ and $\xi_k$ are defined as $\gamma_k = |X_k|^2 / \lambda_{N,k}$ and $\xi_k = \lambda_{S,k} / \lambda_{N,k}$.
11. A voice activity detection method as claimed in claim 10 wherein a smoothed likelihood ratio is calculated by the following equation $\Psi_k(t) = \kappa\,\Psi_k(t-1) + (1-\kappa)\log\Lambda_k(t)$ where $\kappa$ is a smoothing factor and t is the time frame index.
12. A voice activity detection method as claimed in claim 11 wherein the geometric mean of the smoothed likelihood ratio is calculated as $\Psi(t) = \frac{1}{K}\sum_{k=0}^{K-1}\Psi_k(t)$ and $\Psi(t)$ is used to determine the presence of speech.
13. A voice activity detector comprising a likelihood ratio calculator for calculating a likelihood ratio for the presence of speech in a noisy signal using an estimate of the noise power in the noisy signal and a complex Gaussian statistical model wherein the noise power estimate is calculated independently of the VAD.
14. A voice activity detector comprising a likelihood ratio calculator for calculating a likelihood ratio for the presence of speech in a noisy signal using an estimate of the noise power in the noisy signal and a complex Gaussian statistical model wherein the likelihood ratio is used to update the noise estimate within the detector and wherein the likelihood ratio is restricted using a non-linear function to a predetermined interval.
15. Processor control code to, when running, implement the method of any one of claims 1 to 12.
16. A carrier carrying the processor control code of claim 15.
17. Processor control code to, when running, implement the voice activity detector of either of claims 13 or 14.
18. A carrier carrying the processor control code of claim 17.
19. A voice activity detection system comprising a voice activity detector according to claim 13 or a voice activity detector configured to implement the method of any of claims 1 to 7 and a noise estimator for providing a noise estimate to the voice activity detector for a signal including a noise component and a speech component.
GB0509415A 2005-05-09 2005-05-09 Voice activity detection apparatus and method Expired - Fee Related GB2426166B (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
GB0509415A GB2426166B (en) 2005-05-09 2005-05-09 Voice activity detection apparatus and method
EP06252433A EP1722357A3 (en) 2005-05-09 2006-05-08 Voice activity detection apparatus and method
US11/429,308 US7596496B2 (en) 2005-05-09 2006-05-08 Voice activity detection apparatus and method
CN200680000377.0A CN101080765A (en) 2005-05-09 2006-05-09 Voice activity detection apparatus and method
PCT/JP2006/309624 WO2006121180A2 (en) 2005-05-09 2006-05-09 Voice activity detection apparatus and method
JP2007546958A JP2008534989A (en) 2005-05-09 2006-05-09 Voice activity detection apparatus and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0509415A GB2426166B (en) 2005-05-09 2005-05-09 Voice activity detection apparatus and method

Publications (3)

Publication Number Publication Date
GB0509415D0 GB0509415D0 (en) 2005-06-15
GB2426166A true GB2426166A (en) 2006-11-15
GB2426166B GB2426166B (en) 2007-10-17

Family

ID=34685294

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0509415A Expired - Fee Related GB2426166B (en) 2005-05-09 2005-05-09 Voice activity detection apparatus and method

Country Status (6)

Country Link
US (1) US7596496B2 (en)
EP (1) EP1722357A3 (en)
JP (1) JP2008534989A (en)
CN (1) CN101080765A (en)
GB (1) GB2426166B (en)
WO (1) WO2006121180A2 (en)

Families Citing this family (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2031583B1 (en) * 2007-08-31 2010-01-06 Harman Becker Automotive Systems GmbH Fast estimation of spectral noise power density for speech signal enhancement
US20090150144A1 (en) * 2007-12-10 2009-06-11 Qnx Software Systems (Wavemakers), Inc. Robust voice detector for receive-side automatic gain control
KR101317813B1 (en) * 2008-03-31 2013-10-15 (주)트란소노 Procedure for processing noisy speech signals, and apparatus and program therefor
KR101335417B1 (en) * 2008-03-31 2013-12-05 (주)트란소노 Procedure for processing noisy speech signals, and apparatus and program therefor
CN101853666B (en) * 2009-03-30 2012-04-04 华为技术有限公司 Speech enhancement method and device
KR101581883B1 (en) * 2009-04-30 2016-01-11 삼성전자주식회사 Apparatus for detecting voice using motion information and method thereof
CN102405463B (en) * 2009-04-30 2015-07-29 三星电子株式会社 Utilize the user view reasoning device and method of multi-modal information
JP5411936B2 (en) * 2009-07-21 2014-02-12 日本電信電話株式会社 Speech signal section estimation apparatus, speech signal section estimation method, program thereof, and recording medium
DK3493205T3 (en) * 2010-12-24 2021-04-19 Huawei Tech Co Ltd METHOD AND DEVICE FOR ADAPTIVE DETECTION OF VOICE ACTIVITY IN AN AUDIO INPUT SIGNAL
US8650029B2 (en) * 2011-02-25 2014-02-11 Microsoft Corporation Leveraging speech recognizer feedback for voice activity detection
JP5643686B2 (en) * 2011-03-11 2014-12-17 株式会社東芝 Voice discrimination device, voice discrimination method, and voice discrimination program
US20120245927A1 (en) * 2011-03-21 2012-09-27 On Semiconductor Trading Ltd. System and method for monaural audio processing based preserving speech information
US20130090926A1 (en) * 2011-09-16 2013-04-11 Qualcomm Incorporated Mobile device context information using speech detection
US9754608B2 (en) * 2012-03-06 2017-09-05 Nippon Telegraph And Telephone Corporation Noise estimation apparatus, noise estimation method, noise estimation program, and recording medium
US9258653B2 (en) 2012-03-21 2016-02-09 Semiconductor Components Industries, Llc Method and system for parameter based adaptation of clock speeds to listening devices and audio applications
US20130317821A1 (en) * 2012-05-24 2013-11-28 Qualcomm Incorporated Sparse signal detection with mismatched models
CA2804120C (en) 2013-01-29 2020-03-31 Her Majesty The Queen In Right Of Canada As Represented By The Minister Of National Defence Vehicle noise detectability calculator
FR3002679B1 (en) * 2013-02-28 2016-07-22 Parrot Method for denoising an audio signal by a variable spectral gain algorithm with dynamically modulable hardness
US9275638B2 (en) * 2013-03-12 2016-03-01 Google Technology Holdings LLC Method and apparatus for training a voice recognition model database
CN103730124A (en) * 2013-12-31 2014-04-16 上海交通大学无锡研究院 Noise robustness endpoint detection method based on likelihood ratio test
CN104269180B (en) * 2014-09-29 2018-04-13 华南理工大学 A kind of quasi- clean speech building method for speech quality objective assessment
CN105810201B (en) * 2014-12-31 2019-07-02 展讯通信(上海)有限公司 Voice activity detection method and its system
WO2016135741A1 (en) * 2015-02-26 2016-09-01 Indian Institute Of Technology Bombay A method and system for suppressing noise in speech signals in hearing aids and speech communication devices
CN105513614B (en) * 2015-12-03 2019-05-03 广东顺德中山大学卡内基梅隆大学国际联合研究院 Voiced-region detection method based on a Gamma statistical distribution model of the noise power spectrum
CN105575406A (en) * 2016-01-07 2016-05-11 深圳市音加密科技有限公司 Noise robustness detection method based on likelihood ratio test
CN110070883B (en) * 2016-01-14 2023-07-28 深圳市韶音科技有限公司 Speech enhancement method
CN105869658B (en) * 2016-04-01 2019-08-27 金陵科技学院 A kind of sound end detecting method using nonlinear characteristic
US20170365249A1 (en) * 2016-06-21 2017-12-21 Apple Inc. System and method of performing automatic speech recognition using end-pointing markers generated using accelerometer-based voice activity detector
US10224053B2 (en) * 2017-03-24 2019-03-05 Hyundai Motor Company Audio signal quality enhancement based on quantitative SNR analysis and adaptive Wiener filtering
US10339962B2 (en) 2017-04-11 2019-07-02 Texas Instruments Incorporated Methods and apparatus for low cost voice activity detector
CA3067233A1 (en) 2017-06-21 2018-12-27 Monsanto Technology Llc Automated systems for removing tissue samples from seeds, and related methods
CN109754823A (en) * 2019-02-26 2019-05-14 维沃移动通信有限公司 A kind of voice activity detection method, mobile terminal
US11170760B2 (en) * 2019-06-21 2021-11-09 Robert Bosch Gmbh Detecting speech activity in real-time in audio signal
CN112489692A (en) * 2020-11-03 2021-03-12 北京捷通华声科技股份有限公司 Voice endpoint detection method and device
CN113470621B (en) * 2021-08-23 2023-10-24 杭州网易智企科技有限公司 Voice detection method, device, medium and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040122667A1 (en) * 2002-12-24 2004-06-24 Mi-Suk Lee Voice activity detector and voice activity detection method using complex laplacian model
JP2005249816A (en) * 2004-03-01 2005-09-15 Internatl Business Mach Corp <Ibm> Device, method and program for signal enhancement, and device, method and program for speech recognition

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE69831991T2 (en) * 1997-03-25 2006-07-27 Koninklijke Philips Electronics N.V. Method and device for speech detection
US6349278B1 (en) * 1999-08-04 2002-02-19 Ericsson Inc. Soft decision signal estimation
US20040064314A1 (en) * 2002-09-27 2004-04-01 Aubert Nicolas De Saint Methods and apparatus for speech end-point detection
CA2420129A1 (en) * 2003-02-17 2004-08-17 Catena Networks, Canada, Inc. A method for robustly detecting voice activity
JP4497911B2 (en) * 2003-12-16 2010-07-07 キヤノン株式会社 Signal detection apparatus and method, and program

Also Published As

Publication number Publication date
JP2008534989A (en) 2008-08-28
US7596496B2 (en) 2009-09-29
CN101080765A (en) 2007-11-28
EP1722357A2 (en) 2006-11-15
GB2426166B (en) 2007-10-17
EP1722357A3 (en) 2008-11-05
WO2006121180A2 (en) 2006-11-16
GB0509415D0 (en) 2005-06-15
WO2006121180A3 (en) 2007-05-18
US20060253283A1 (en) 2006-11-09

Similar Documents

Publication Publication Date Title
GB2426166A (en) Voice activity detector
CN108831499B (en) Speech enhancement method using speech existence probability
Cho et al. Improved voice activity detection based on a smoothed statistical likelihood ratio
KR101141033B1 (en) Noise variance estimator for speech enhancement
KR20010075343A (en) Noise suppression for low bitrate speech coder
US11114105B2 (en) Estimation of background noise in audio signals
US7526428B2 (en) System and method for noise cancellation with noise ramp tracking
Martin et al. Single‐Channel Speech Presence Probability Estimation and Noise Tracking
KR20160116440A (en) SNR Estimation Apparatus and Method of Voice Recognition System
KR100784456B1 (en) Voice Enhancement System using GMM
GB2426167A (en) Quantile based noise estimation
CN113838476B (en) Noise estimation method and device for noisy speech
Górriz et al. Generalized LRT-based voice activity detector
KR100798056B1 (en) Speech processing method for speech enhancement in highly nonstationary noise environments
KR101993003B1 (en) Apparatus and method for noise reduction
Erkelens et al. Fast noise tracking based on recursive smoothing of MMSE noise power estimates
Deng et al. Likelihood ratio sign test for voice activity detection
GB2437868A (en) Estimating noise power spectrum, sorting time frames, calculating the quantile and interpolating values over all remaining frequencies
Chen et al. A MDCT-based click noise reduction method for MPEG-4 AAC codec
CN115527550A (en) Single-microphone subband domain noise reduction method and system
Sharmida et al. A robust observation model for automatic speech recognition with Adaptive Thresholding
Singh et al. Sigmoid based Adaptive Noise Estimation Method for Speech Intelligibility Improvement
Górriz et al. Effective speech/pause discrimination using an integrated bispectrum likelihood ratio test
Yang et al. De-Noising Using Dual Threshold Function For Speaker Recognition At Low SNR
Teja et al. Noise Estimation based on Entropy without using VAD for Speech Enhancement

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20230509