WO2008074893A1

WO2008074893A1 - On-line learning method and system for speech denoising

Info

Publication number: WO2008074893A1
Application number: PCT/EP2007/064515
Authority: WO
Inventors: Vladimir Braquet
Original assignee: France Telecom
Priority date: 2006-12-21
Filing date: 2007-12-21
Publication date: 2008-06-26
Also published as: EP1936608A1

Abstract

The system produces an estimator of noise which is an unknown function of physical quantities, without requiring said function to be linear. A microphone (21) is arranged for capturing a sound, and means (22, 24, 29) are arranged for associating with said sound, by a common index, a first vector value of the physical quantities taken at the same time as the sound is captured. A generator (23) and a shift register (29) are arranged for storing the captured sounds and the associated first vector values, incrementing the value of the index each time a sound is captured. When said index value is or has been incremented a number of times at least equal to a first integer (L) greater than one, then each time said index value is incremented, the generator (23) generates a current sequence of coefficients for a linear combination of functions which satisfy Mercer conditions, wherein the first argument of the function weighted by a given coefficient is the first vector value whose index value corresponds to the rank of that coefficient in the sequence, and sets the values of the coefficients so that each of the last captured sounds, in a quantity equal to said first integer, is substantially equal to an occurrence of the linear combination wherein a second argument of the functions is the first vector value whose index value is associated with that sound. The estimator is produced by the generator (23) performing the linear combination resulting from the current generated sequence when a next captured sound is not pure noise.

Description

On-line learning method and system for speech denoising.

This invention relates to digital signal processing and in particular to a method of processing a signal such as a speech signal, for example for use in the cancellation of non-stationary noise resulting from external events that can be measured or otherwise reported.

Akinori Ito, Takashi Kanayama, Motoyuki Suzuki and Shozo Makino show an example of the usefulness of such methods and systems in their article entitled "Internal Noise Suppression for Speech Recognition by Small Robots", published on pages 2685-2688 of INTERSPEECH 2005. To suppress unstable noise, they must predict the spectrum of the noise frame by frame. To achieve this, they constructed a neural network that predicts the spectrum of the internal noise from the status of the joints. At least 10 000 samples are required for the learning stage, which points to slow convergence; this can be problematic, particularly as databases become huge.

There are many other situations in which speech denoising is useful, such as the examples described in international patent application WO2006/032760. There, one or more noise reduction filters are calculated from an estimated power spectral density (PSD) of the noise. The estimation of the PSD is not per se an object of WO2006/032760.

UK patent application GB2406487 discloses a modified affine projection algorithm for non-stationary signals. The affine projection algorithm (APA) offers fast convergence and seems well adapted to filtering an echo which is correlated with the speech signal. The problem with the APA is that it is not applicable to filtering noise which is uncorrelated with the speech and which is a function of changing physical quantities other than the voice, particularly when said function is non-linear.

To avoid the problems of the prior art, an object of the invention is a method or an apparatus for generating an estimator of noise which is an unknown function of physical quantities, without requiring said function to be linear, and with fast convergence. In particular, the method according to the invention comprises steps of capturing a sound and associating with the sound, by a common index, a first vector value of the physical quantities inducing the sound, at the same time as the sound is captured. The step of capturing is repeated, incrementing the value of the index each time said step is repeated. When said index value is or has been incremented a number of times at least equal to a first integer (L) greater than one, then each time said index value is incremented, the method further comprises the steps of:

- generating (107) a current sequence of coefficients for a linear combination of functions which satisfy Mercer conditions, wherein the first argument of the function weighted by a given coefficient is the first vector value of the capturing step whose index value corresponds to the rank of that coefficient in the sequence; and

- setting the values of the coefficients so that each of the last captured sounds, in a quantity equal to said first integer, is substantially equal to an occurrence of the linear combination wherein a second argument of the functions is the first vector value of the capturing step whose index value is associated with that sound.

The linear combination resulting from the current generated sequence is performed to produce the estimator when a next captured sound is not pure noise.

Preferred modes of implementation of the invention are now described with reference to the drawings, wherein:

- figure 1 is a schematic representation of a device according to the invention;

- figure 2 presents steps of a first method implementation according to the invention;

- figure 3 presents steps of a second method implementation according to the invention.

Figure 1 is a schematic representation of a device according to the invention for the restitution of a signal s which is emitted in a noisy environment. The signal s is, for instance, a voice or other sound that needs to be cleaned of noise before it is exploited.

The device comprises a microphone 21 for capturing sound and a de-noising module 20 for providing an estimate ŝ of the signal s. To this end, a first contact of a switch 22 between the microphone 21 and the de-noising module 20 is arranged to connect the microphone 21 to the de-noising module 20 when the signal s is present, so as to supply the de-noising module 20 with a received signal r which in that case comprises the signal s and a noise y. The detection of the presence of the signal s is not an object of the invention; it can be performed by a voice activity detection (VAD) system, a camera detecting a person, or any other system, for example simply a button.

When the signal s is not present, said first contact of the switch 22 is arranged to normally connect the microphone 21 to an estimator generator 23, so as to supply the estimator generator 23 with the noise y for as long as the signal s is not present. The estimator generator is arranged according to the invention for providing an estimator {α_i, x_i, κ} that can be used by the de-noising module 20 for subtracting an estimation of noise ŷ from the signal r so as to provide the estimate ŝ of the signal s. For that purpose, a second contact of the switch 22 is arranged for connecting the estimator generator 23 to the de-noising module 20 at the same time as the first contact of the switch 22 connects the microphone 21 to the de-noising module 20. In the normal state, the second contact of the switch 22 is arranged to loop the estimator generator 23 on itself so as to adapt said estimator in real time according to the received noise y.

The estimator gives an estimation of noise that is a function of data collected in a vector x through an input 24 of the device. The value of each component of the vector is given, for instance, by a sensor 25, 26, 27, 28 connected to the input 24. Four sensors are represented here, but it will be easily understood that the invention can be implemented with any number of sensors, more or fewer than four, including a single sensor, in which case the vector x is simply a scalar x. The data can be of any type suitable for estimating noise that results from an event measurable by such data and liable to create or contribute to the noise received by the microphone 21. For non-limiting illustration purposes only, the data can be an angle of a moving arm of a robot, a speed or acceleration, the power consumed by a motor, or a sound captured by another microphone.

A third contact of the switch 22 is arranged for connecting the input 24 to the de-noising module 20 when the first contact connects the microphone 21 to the de-noising module 20. In that way, a real-time estimation of noise ŷ can be calculated with the help of the estimator, so that the de-noising module 20 can elaborate the estimate ŝ in a way similar, but not necessarily identical, to the one taught in WO2006/032760.
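The routing performed by the switch 22 can be sketched in Python as follows. This is a hypothetical simplification: `process_frame`, `predict` and `learn` are illustrative names, the presence flag is assumed to come from some external detector, and the de-noising is reduced to a plain time-domain subtraction ŝ = r - ŷ, whereas the module 20 may instead work in the spectral domain as in WO2006/032760.

```python
def process_frame(speech_present, r, x, predict, learn):
    """Route one captured sample the way switch 22 does (hypothetical simplification).

    speech_present: flag from a VAD, camera, button or other detector (assumed external).
    r: captured sample; it is pure noise y when no speech is present.
    x: sensor vector read on input 24 at the same instant.
    predict, learn: hypothetical callables wrapping estimator generator 23.
    """
    if speech_present:
        # Contacts towards de-noising module 20: s_hat = r - y_hat, y_hat = f(x).
        return r - predict(x)
    # No speech: r is pure noise, so the pair (x, r) feeds the on-line learning.
    learn(x, r)
    return None


# Usage with a dummy constant predictor and a list collecting training pairs.
training_pairs = []
s_hat = process_frame(True, 1.0, (0.2,), lambda x: 0.25,
                      lambda x, y: training_pairs.append((x, y)))
```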

When not connecting the input 24 to the de-noising module 20, the third contact of the switch 22 is arranged to connect the input 24 to a shift register 29. An output of each cell is arranged to be connected to the estimator generator 23 when said cell receives, from a preceding cell or from the input 24, a value x_i of the vector, wherein an index i is comprised between 1 and n, 1 for the oldest value and n for the last one received through the input 24. The manner of shifting the values in the register is not essential for the method according to the invention; it can be driven by a clock of the device, in a manner usually known in the art for sampling, or triggered every time a new value is detected. A useful feature of the invention is that a noise is sampled at the same time as a new value x_n of the vector shifts the preceding ones in the register.

The estimator generator 23 is arranged for starting a process of constructing the estimator when receiving from the shift register 29 a predetermined number L of values x_i with index i less than or equal to n and greater than n-L. The process is executed by running the steps, explained below, of a method implementing the invention.
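A minimal sketch of the bookkeeping done by the shift register 29 and the estimator generator 23, assuming one sensor vector and one noise sample per index value. The class `SampleRegister` and its methods are hypothetical names. All past vectors are kept because each x_i remains an argument of the estimator, while the window returned is the set of indexes i with n-L < i ≤ n from which the construction of the estimator starts.

```python
class SampleRegister:
    """Hypothetical stand-in for shift register 29 plus the stored noise samples."""

    def __init__(self, L):
        if L < 2:
            raise ValueError("L must be an integer greater than one")
        self.L = L
        self.xs = []  # x_1 ... x_n, all kept as kernel arguments of the estimator
        self.ys = []  # y_1 ... y_n, each sampled at the same instant as x_i

    @property
    def n(self):
        return len(self.xs)

    def push(self, x, y):
        """Step 101: store a new sensor vector and its simultaneous noise sample."""
        self.xs.append(tuple(x))
        self.ys.append(float(y))

    def ready(self):
        """Test of step 102: at least L samples have been received."""
        return self.n >= self.L

    def window(self):
        """1-based indexes i with n - L < i <= n (empty until ready)."""
        if not self.ready():
            return []
        return list(range(self.n - self.L + 1, self.n + 1))


reg = SampleRegister(L=3)
for t in range(4):
    reg.push((0.1 * t, 1.0), 0.01 * t)
```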

Referring now to figure 2, the number L is predetermined by setting its value in an initialization step 100. The determination of said value per se is not within the scope of the invention; it can result from theoretical considerations or, more practically, from testing the device, a user successively trying different values of L until an acceptable estimation of the signal s by the de-noising module 20 is achieved.

The estimator {α_i, x_i, κ} is provided by the estimator generator 23 for calculating an estimate ŷ of noise in the form:

$$\hat{y} := f_n(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i(n)\, \kappa(\mathbf{x}_i, \mathbf{x})$$
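The estimator form above can be evaluated as in the following Python sketch. It assumes a Gaussian kernel of width σ = 1 (one of the kernels discussed in this description), and the coefficient and sensor values are invented for illustration only; the function names are hypothetical.

```python
import math


def gaussian_kernel(xi, xj, sigma=1.0):
    """Gaussian kernel exp(-||xi - xj||^2 / (2 sigma^2)) on real vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def estimate_noise(alphas, centers, x, kernel=gaussian_kernel):
    """Return f_n(x) = sum_i alpha_i * kappa(x_i, x)."""
    return sum(a * kernel(xi, x) for a, xi in zip(alphas, centers))


# Hypothetical example: two stored sensor vectors (e.g. joint angle, motor speed).
centers = [(0.0, 1.0), (0.5, 0.2)]
alphas = [0.8, -0.3]
y_hat = estimate_noise(alphas, centers, (0.0, 1.0))
```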

The noise estimation function f relates to the vector x by a linear combination of expressions of a kernel function κ, each applied to the current vector x and to a past value x_i of it. The kernel function κ can be any function that satisfies the Mercer conditions. In mathematics, the Mercer conditions are satisfied when, for any number n of complex values a_i and of vectors x_i with real components, the double sum

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i\, \overline{a_j}\, \kappa(\mathbf{x}_i, \mathbf{x}_j)$$

is a non-negative real value. It is easy to check that, for example, the Gaussian function

$$\kappa(\mathbf{x}_i, \mathbf{x}_j) = e^{-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2}}$$

satisfies the Mercer conditions.

This Gaussian function can therefore be used for implementing the invention. Other kernels satisfying the Mercer conditions are known and can also be used, according to the best-suited solution in the context in which the device is exploited. Here is a non-limiting list for illustrative purposes only:

- a polynomial kernel of the form $\kappa(\mathbf{x}_i, \mathbf{x}_j) = (1 + \mathbf{x}_i \cdot \mathbf{x}_j)^{d}$, d being a positive integer;

- an exponential kernel of the form $\kappa(\mathbf{x}_i, \mathbf{x}_j) = e^{-\beta \|\mathbf{x}_i - \mathbf{x}_j\|}$, with β > 0;

- a sigmoidal kernel of the form $\kappa(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\beta_1\, \mathbf{x}_i \cdot \mathbf{x}_j + \beta_0)$, for suitable values of β_1 and β_0.
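The kernels listed above, and a numerical check of the Mercer conditions on a finite set of points, can be sketched as follows. The check attempts a Cholesky factorization of the Gram matrix [κ(x_i, x_j)], which succeeds (up to a small jitter) only when the double sum of the Mercer conditions is non-negative for every choice of coefficients; it is a test on sample points, not a proof. The parameter names d, beta, beta1, beta0, their default values and the function names are assumptions of this sketch. The sigmoidal kernel is left out of the check because it is positive semi-definite only for some parameter values.

```python
import math


def gaussian_kernel(xi, xj, sigma=1.0):
    """exp(-||xi - xj||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2.0 * sigma ** 2))


def polynomial_kernel(xi, xj, d=2):
    """(1 + xi . xj)^d, d an assumed positive integer degree."""
    return (1.0 + sum(a * b for a, b in zip(xi, xj))) ** d


def exponential_kernel(xi, xj, beta=1.0):
    """exp(-beta ||xi - xj||), beta > 0 (assumed parametrisation)."""
    return math.exp(-beta * math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj))))


def sigmoid_kernel(xi, xj, beta1=1.0, beta0=0.0):
    """tanh(beta1 xi . xj + beta0); positive semi-definite only for some parameters."""
    return math.tanh(beta1 * sum(a * b for a, b in zip(xi, xj)) + beta0)


def gram(kernel, points):
    """Gram matrix [kappa(x_i, x_j)] over a finite set of points."""
    return [[kernel(p, q) for q in points] for p in points]


def is_psd(K, jitter=1e-10):
    """Numerical Mercer check: True if a Cholesky factorization of K + jitter*I exists."""
    n = len(K)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                s += jitter  # tolerate rounding on singular but PSD matrices
                if s <= 0.0:
                    return False
                C[i][i] = math.sqrt(s)
            else:
                C[i][j] = s / C[j][j]
    return True


points = [(0.0, 1.0), (0.5, -0.2), (2.0, 0.3)]
for k in (gaussian_kernel, polynomial_kernel, exponential_kernel):
    print(k.__name__, is_psd(gram(k, points)))
```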

The coefficients α_i are updated in real time by a loop of steps 101 to 108, wherein step 101 is triggered again for each new received value of data x, considered as a supplementary last received data x_n, at the same time as a received noise y, considered as the last received noise y_n. Each time step 101 is executed, a supplementary coefficient α_n is created with a value initialized to zero. The loop is executed for each value of n, said value being initialized in step 100 and incremented in step 103 or 108 to be ready for a following execution of the loop. Step 102 tests whether the number n is greater than or equal to L, so as to furnish coefficients α_i for a total number n of received data at least equal to L. As long as the number n is less than L, step 103 branches back to step 101.

Considering an index i comprised between n-L+1 and n, the L last coefficients α_i of the estimator are calculated in step 107 by using the following formula:

where D_{y_j}(n) is a distance separating a noise y_j from an estimation of that noise made with the estimation function f using the coefficients α_m(n-1) with the values they had when executing step 101. The noise y_j is the one which was or is measured at the time when n was or is equal to j in step 101.

The set of coefficients α_i(n) on the left side of the assignment symbol ":=" holds the coefficients generated by the current execution of the loop with rank n, whereas the coefficients α_i(n-1) on the right side are those initialized to zero or generated by a preceding execution of the loop with rank n-1.

In step 107, [K⁻¹(n)]_{n-i, n-j} is the coefficient on line n-i, column n-j of the inverse of the kernel matrix K(n) = [K_{h,k}(n)], 0 ≤ h, k ≤ L-1, generated in step 104. Any method known in the art can be used for obtaining the inverse of the kernel matrix. In step 104, executed before step 107 in case of a positive response to the test of step 102, the kernel matrix for the loop of rank n is generated by the formula:

$$K_{h,k}(n) := \kappa(\mathbf{x}_{n-h}, \mathbf{x}_{n-k}) + \zeta_{h,k}, \qquad 0 \le h, k \le L-1 \tag{3}$$

where ζ_{h,k} is a matrix regularization parameter. In other words, it is equal to zero when the index h differs from the index k, and equal to a constant when the two indexes h and k are equal. The regularization parameter ensures that the matrix K(n) has an inverse.
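The construction of the kernel matrix of formula (3) and its inversion can be sketched as below, assuming a Gaussian kernel and a Gauss-Jordan inversion (any inversion method of the art would do). The example uses two identical sensor vectors to show why the regularization parameter ζ matters: with ζ = 0 the matrix is singular, while a small positive constant on the diagonal makes it invertible. The function names are hypothetical.

```python
import math


def gaussian_kernel(xi, xj, sigma=1.0):
    """exp(-||xi - xj||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2.0 * sigma ** 2))


def kernel_matrix(window, zeta, kernel=gaussian_kernel):
    """Formula (3): K_{h,k} = kappa(x_{n-h}, x_{n-k}) + zeta*[h == k]; window[h] is x_{n-h}."""
    L = len(window)
    return [[kernel(window[h], window[k]) + (zeta if h == k else 0.0)
             for k in range(L)] for h in range(L)]


def invert(A, tol=1e-12):
    """Gauss-Jordan inversion with partial pivoting; raises ValueError if A is singular."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < tol:
            raise ValueError("kernel matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


window = [(0.0,), (0.0,), (1.0,)]   # x_n, x_{n-1}, x_{n-2}, with x_n equal to x_{n-1}
try:
    invert(kernel_matrix(window, zeta=0.0))
    singular = False
except ValueError:
    singular = True
K = kernel_matrix(window, zeta=0.01)
K_inv = invert(K)
```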

The parameters p and μ(n) improve the efficiency of the method and are explained later. Without these parameters, or equivalently with p and μ(n) constants respectively equal to 0 and to 1, the formulae used in step 107 are similar to the following ones:

$$D_{y_j}(n) := y_j - \sum_{m=1}^{n-1} \alpha_m(n-1)\, \kappa(\mathbf{x}_m, \mathbf{x}_j) \tag{5}$$

Mathematical considerations show that setting the coefficients α_i according to formula (4) implies that, for every j in the range n-L+1 to n:

$$y_j = \sum_{i=1}^{n} \alpha_i(n)\, \kappa(\mathbf{x}_i, \mathbf{x}_j) \tag{6}$$

It is interesting to note from formula (5) that, in a following execution of the loop for a new value of noise y_{n+1}, equation (6) has the effect that, for every j in the range n-L+1 to n, the distance D_{y_j}(n+1) is equal to zero. The only distance which differs from zero is D_{y_{n+1}}(n+1), which is given by:

$$D_{y_{n+1}}(n+1) = y_{n+1} - \sum_{i=1}^{n} \alpha_i(n)\, \kappa(\mathbf{x}_i, \mathbf{x}_{n+1}) \tag{7}$$

Because the kernel function κ satisfies the Mercer conditions, it can be shown that the greater L is, the faster the distance D_{y_n}(n) decreases; in other words, the faster the method converges.
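The following Python sketch illustrates the loop of steps 101 to 107 in the simplified case p = 0. Formula (4) itself is not reproduced in this text, so the update used here, α_i(n) = α_i(n-1) + μ(n) Σ_j [K⁻¹(n)]_{n-i, n-j} D_{y_j}(n) for n-L < i ≤ n, is an assumption. It is, however, consistent with formulas (3), (5) and (6): with μ(n) = 1 and ζ = 0, the updated estimator reproduces exactly the L last noise samples. The product with K⁻¹(n) is computed through a linear solve rather than an explicit inverse. All names and data are hypothetical.

```python
import math


def gaussian_kernel(xi, xj, sigma=1.0):
    """exp(-||xi - xj||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def update_window(alphas, xs, ys, L, mu=1.0, zeta=0.0, kernel=gaussian_kernel):
    """Hypothetical reading of step 107 with p = 0.

    alphas holds alpha_1..alpha_n with alpha_n already appended as 0 (step 101);
    xs and ys hold x_1..x_n and y_1..y_n. Returns the list of alpha_i(n).
    """
    n = len(xs)
    # d[k] = D_{y_{n-k}}(n) = y_{n-k} - f_{n-1}(x_{n-k}), k = 0..L-1 (formula (5))
    d = [ys[n - 1 - k] - sum(a * kernel(xm, xs[n - 1 - k]) for a, xm in zip(alphas, xs))
         for k in range(L)]
    # Kernel matrix (3): K_{h,k} = kappa(x_{n-h}, x_{n-k}) + zeta*[h == k]
    K = [[kernel(xs[n - 1 - h], xs[n - 1 - k]) + (zeta if h == k else 0.0)
          for k in range(L)] for h in range(L)]
    c = solve(K, d)  # c_h = sum_k [K^-1]_{h,k} d_k
    new = list(alphas)
    for h in range(L):
        new[n - 1 - h] += mu * c[h]  # only alpha_{n-L+1}..alpha_n change
    return new


# Hypothetical on-line run: scalar sensor values and noise samples.
xs = [(0.0,), (1.0,), (2.0,), (3.0,)]
ys = [0.5, -0.2, 0.9, 0.1]
L = 3
alphas = []
for n in range(1, len(xs) + 1):
    alphas.append(0.0)      # step 101: new coefficient set to zero
    if n >= L:              # step 102
        alphas = update_window(alphas, xs[:n], ys[:n], L)
```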

Advantageously, the method comprises a step 105 wherein the coefficients α_i are divided by (1+p), p being a regularization parameter or forgetting factor having a value greater than zero. Thus, each time step 105 is executed after step 102, the coefficients α_i decrease, but in such a manner as to preserve the ratio between the coefficients of any pair. Because they have been divided many times, rather old coefficients, that is, those with the smaller indexes, become immaterial after a sufficient number of executions of the loop. These coefficients can then be swapped out of the memory of the device executing the method, saving both storage and computing time. For instance, in figure 2, the coefficients α_i with i less than n-L+1 are specifically divided by (1+p) in step 105. The coefficients α_i with i greater than n-L are divided by (1+p) in step 107 according to formulae (1) and (2). In what follows, for implementations without step 105, the regularization parameter p is understood to be zero wherever it appears in a formula, which is the same as its not being present.
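The effect of step 105 can be illustrated with the following sketch. Dividing all coefficients by (1+p) leaves every ratio between two coefficients unchanged, and coefficients older than the window of L values whose magnitude falls below a threshold are dropped together with their vectors x_i. The threshold and the function name are hypothetical: the text above only says that such coefficients become immaterial, without giving a criterion. For brevity, this sketch divides all coefficients at once, whereas figure 2 divides the window coefficients inside step 107.

```python
def forget_and_prune(centers, alphas, p, L, threshold=0.0):
    """Divide every alpha_i by (1 + p), then drop negligible coefficients older than the window.

    centers[i] is x_{i+1} and alphas[i] is alpha_{i+1}; threshold is a hypothetical
    pruning criterion. The L most recent coefficients are always kept.
    """
    if p < 0.0:
        raise ValueError("the forgetting factor p must be non-negative")
    scaled = [a / (1.0 + p) for a in alphas]
    n = len(scaled)
    keep = [i for i in range(n) if i >= n - L or abs(scaled[i]) >= threshold]
    return [centers[i] for i in keep], [scaled[i] for i in keep]


centers, alphas = forget_and_prune(["x1", "x2", "x3", "x4"], [0.001, 2.0, 4.0, 8.0],
                                   p=1.0, L=2, threshold=0.01)
```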

Advantageously also, the formula (1) used in step 107 comprises a step size μ(n), which in this case is initialized in step 100 to a value μ(0). The step size μ(n) can be a constant equal to μ(0) for every execution of step 107, or can vary with n. In the latter case, the method further comprises a step 106 wherein the step size μ(n) is set to a value μ but limited by a minimum μ_min and a maximum μ_max. The values μ_min and μ_max are set in initialization step 100, respectively to a value greater than or equal to zero and to a value preferably less than or equal to one. A possible formula for achieving this is: μ(n) := min(max(μ, μ_min), μ_max)
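Steps 106 and 205 limit the step size with the min/max formula given above. A direct transcription, with a hypothetical function name, default bounds of 0 and 1, and a sanity check on the bounds, is:

```python
def clamp_step_size(mu_value, mu_min=0.0, mu_max=1.0):
    """mu(n) := min(max(mu, mu_min), mu_max); mu_value is the updated, unclipped mu."""
    if not 0.0 <= mu_min <= mu_max:
        raise ValueError("expected 0 <= mu_min <= mu_max")
    return min(max(mu_value, mu_min), mu_max)


mu_n = clamp_step_size(1.7)  # an oversized update is limited to mu_max
```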

Before setting μ(n) in step 106, the value μ is updated by the formula:

In the formula, a prediction error e_n^j is given by:

where a function f_n of the vector x_n is given by the formula:

We see here that, for example, when j=0, the prediction error e_n^0 is the difference between the last received value of noise y_n and a value which would have been predicted or estimated from the n-1 preceding measures with the help of the function f_n. We also see that not only is the last value of noise compared with an estimation of noise resulting from the currently available function, but each of the L'-1 preceding received values of noise y_{n-j} is also compared with a value of noise which would have been estimated by the last available function f_n.
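The prediction errors described above can be sketched as below, under the assumption (the formula is not reproduced in this text) that e_n^j = y_{n-j} - f_n(x_{n-j}) for j = 0, ..., L'-1, f_n being whatever estimator is currently available. The function name and the example data are hypothetical.

```python
def prediction_errors(f, xs, ys, L_prime):
    """Return [e_n^0, ..., e_n^{L'-1}] with e_n^j = y_{n-j} - f(x_{n-j}) (assumed form).

    xs, ys hold x_1..x_n and y_1..y_n; f is the currently available estimator f_n.
    """
    n = len(xs)
    if not 1 <= L_prime <= n:
        raise ValueError("L' must lie between 1 and the number of samples")
    return [ys[n - 1 - j] - f(xs[n - 1 - j]) for j in range(L_prime)]


errors = prediction_errors(lambda x: 2.0 * x, [1.0, 2.0, 3.0], [2.5, 4.0, 7.0], L_prime=3)
```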

An adaptive step size cost parameter γ, an adaptive step size cost order L', an adaptive step size cost insensitivity ε' and an adaptive step size recursive parameter η are, respectively, a positive real number greater than or equal to one, an integer that may be less than, equal to or greater than L, a positive real number close to zero, and a positive real number less than two; they can be set in step 100 when step 106 is implemented. When the value of the cost parameter γ is equal to one, the updating of the value μ is independent of the prediction error. When, furthermore, the adaptive step size recursive parameter η and the regularization factor p are respectively equal to one and zero, a simple expression of the value μ is given by:

In the expression of μ , components β_m of a weight gradient are updated according to the formula:

For every index i comprised in the range n-L+1 to n, the value of a gradient Δ_i is given by the formula:


A second mode of implementation of the method according to the invention is now described with reference to figure 3, wherein steps 100 to 103 are similar to those of the first mode of implementation previously described with reference to figure 2.

Considering an index i comprised between n-L+1 and n, the L last coefficients α_i of the estimator are calculated in step 207 by using the following formula:

where λ⁺_{n-i}(n) and λ⁻_{n-i}(n) are Lagrange multipliers such that the distance between each noise y_j and its estimation is at most ε for every j in the range n-L+1 to n; in other words:

$$-\varepsilon \le y_j - \sum_{m=1}^{n} \alpha_m(n)\, \kappa(\mathbf{x}_m, \mathbf{x}_j) \le \varepsilon$$

More precisely, the Lagrange multipliers are calculated according to the following sequence. A step 204 preceding step 206 is similar to the previously described step 104, in that a kernel matrix K(n) = [κ(x_{n-h}, x_{n-k})], 0 ≤ h, k ≤ L-1, is calculated, having L×L coefficients, the coefficient of rank h, k being equal to the kernel function of the vectors x_{n-h} and x_{n-k}.

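Furthermore, a quadratic matrix Q(n) having 2L×2L coefficients is constructed, with coefficients given by:

$$Q_{h,k}(n) = Q_{h+L,\,k+L}(n) = \kappa(\mathbf{x}_{n-h}, \mathbf{x}_{n-k}) + \zeta_{h,k}, \qquad 0 \le h, k \le L-1$$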

A linear vector p(n), having L components p_k and L components p_{k+L}, is given by the formula:

where δ_{k,n-j} is equal to 1 when k = n-j and 0 otherwise.

The matrix Q(n) and the linear vector p(n) are then input into a quadratic programming library arranged to output the values of λ⁺_{n-j} and λ⁻_{n-j} in the form of a vector Λ having 2L non-negative components such that:

$$\Lambda(n) = \arg\max_{\Lambda \ge 0} \left\{ -\tfrac{1}{2}\, \Lambda^{T} Q(n)\, \Lambda + \Lambda^{T} p(n) \right\}$$

Any quadratic programming library arranged for calculating such an argument of a maximum is suitable, for example the GQP library available at http://www.gnu.org/software/gsl.
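As an alternative to a quadratic programming library, the multipliers can be approximated by projected gradient ascent, as in the following sketch. The off-diagonal blocks of Q(n) and the exact form of p(n) are not fully reproduced in this text; the sketch assumes the standard ε-insensitive dual, Q(n) = [[K + ζI, -K], [-K, K + ζI]] and p(n) = [D - ε; -D - ε] with D_h = D_{y_{n-h}}(n). Under that assumption and with ζ = 0, the Karush-Kuhn-Tucker conditions give |D_h - Σ_k K_{h,k}(λ⁺_k - λ⁻_k)| ≤ ε, i.e. the tube constraint above. The differences λ⁺_h - λ⁻_h, multiplied by μ(n), would then be added to α_{n-h}, in line with claims 3 and 4. Names and data are hypothetical.

```python
import math


def gaussian_kernel(xi, xj, sigma=0.5):
    """exp(-||xi - xj||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2.0 * sigma ** 2))


def solve_multipliers(K, D, eps, zeta=0.0, iters=5000):
    """Maximise -1/2 Lam^T Q Lam + Lam^T p over Lam >= 0 by projected gradient ascent.

    Assumed layout: Q = [[K + zeta I, -K], [-K, K + zeta I]], p = [D - eps, -D - eps],
    where D[h] = D_{y_{n-h}}(n). Returns (lam_plus, lam_minus), each of length L.
    """
    L = len(D)
    Q = [[0.0] * (2 * L) for _ in range(2 * L)]
    for h in range(L):
        for k in range(L):
            z = zeta if h == k else 0.0
            Q[h][k] = Q[h + L][k + L] = K[h][k] + z
            Q[h][k + L] = Q[h + L][k] = -K[h][k]
    p = [d - eps for d in D] + [-d - eps for d in D]
    # Step 1/B, B being a Gershgorin bound on the largest eigenvalue of Q.
    step = 1.0 / max(sum(abs(v) for v in row) for row in Q)
    lam = [0.0] * (2 * L)
    for _ in range(iters):
        grad = [p[a] - sum(Q[a][b] * lam[b] for b in range(2 * L)) for a in range(2 * L)]
        lam = [max(0.0, v + step * g) for v, g in zip(lam, grad)]  # project onto Lam >= 0
    return lam[:L], lam[L:]


window = [(0.0,), (1.0,), (2.0,)]           # x_n, x_{n-1}, x_{n-2}
K = [[gaussian_kernel(a, b) for b in window] for a in window]
D = [1.0, -0.5, 0.3]                        # residuals of the previous estimator
lam_plus, lam_minus = solve_multipliers(K, D, eps=0.1)
delta = [a - b for a, b in zip(lam_plus, lam_minus)]  # times mu(n), added to alpha_{n-h}
```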

In the formula used in step 207, the step size μ(n) is not necessary. It can be a constant, which when equal to 1 is the same as not being present. A variable step size improves the method. In that case, the method further comprises a step 205 wherein the step size μ(n) is set to a value μ but limited by a minimum μ_min and a maximum μ_max. The values μ_min and μ_max are set in initialization step 100, respectively to a value greater than or equal to zero and to a value preferably less than or equal to one. A possible formula for achieving this is: μ(n) := min(max(μ, μ_min), μ_max)

Before setting μ(n) in step 205, the value μ is updated by the formula:

In the formula, a prediction error e_n^j is given by:

where a function f_n of the vector x_n is given by the formula:

We see here that, for example, when j=0, the prediction error e_n^0 is the difference between the last received value of noise y_n and a value which would have been predicted or estimated from the n-1 preceding measures with the help of the function f_n. We also see that not only is the last value of noise compared with an estimation of noise resulting from the currently available function, but each of the L'-1 preceding received values of noise y_{n-j} is also compared with a value of noise which would have been estimated by the last available function f_n.

An adaptive step size cost parameter γ, an adaptive step size cost order L', an adaptive step size cost insensitivity ε' and an adaptive step size recursive parameter η are, respectively, a positive real number greater than or equal to one, an integer that may be less than, equal to or greater than L, a positive real number close to zero, and a positive real number less than two; they can be set in step 100 when step 205 is implemented. When the value of the cost parameter γ is equal to one, the updating of the value μ is independent of the prediction error value except for its sign. When, furthermore, the adaptive step size recursive parameter η is equal to one, a simple expression of the value μ is given by:

In other words, the present invention relates to a method for producing an estimator of noise which is an unknown function of physical quantities, without requiring said function to be linear, comprising steps of:

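- capturing (101) a sound and associating with said sound, by a common index, a first vector value of said physical quantities at the same time as the sound is captured;

- repeating (102) said step of capturing, incrementing the value of the index each time said step is repeated, and, when said index value is or has been incremented a number of times at least equal to a first integer (L) greater than one, each time said index value is incremented:

- generating (107) a current sequence of coefficients for a linear combination of functions which satisfy Mercer conditions, wherein the first argument of the function weighted by a given coefficient is the first vector value of the capturing step whose index value corresponds to the rank of that coefficient in the sequence;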

- setting the values of the coefficients so that each of the last captured sounds, in a quantity equal to said first integer, is substantially equal to an occurrence of the linear combination wherein a second argument of the functions is the first vector value of the capturing step whose index value is associated with that sound;

- performing the linear combination resulting from the current generated sequence to produce the estimator when a next captured sound is not pure noise.

Claims


1. Method for producing an estimator of noise f, said noise y being an unknown function of physical quantities x, said estimator of noise being a linear combination of kernel functions κ which satisfy Mercer conditions, said method comprising the steps of:

- capturing (101) a sound (y_n) and associating to said sound a vector value (x_n) of said physical quantities using a common index n for the time of capture;

- repeating (102) said step of capturing, incrementing the value of the index each time said step is repeated, and when said index value has been incremented a number of times at least equal to a first integer L greater than one, the step of repeating the step of capturing further comprising the steps of:

- defining an estimator of noise f_n for the current index value using a sequence of coefficients (α_i) for a linear combination of the kernel functions κ according to:

$$f_n(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i(n)\, \kappa(\mathbf{x}_i, \mathbf{x})$$

wherein a first argument of one of said kernel functions is the vector value x_i from the capturing step of rank i, and x is a vector value of the physical quantities;

- setting the values of the coefficients (α_i) so that:

$$\left| y_j - \sum_{i=1}^{n} \alpha_i(n)\, \kappa(\mathbf{x}_i, \mathbf{x}_j) \right| \le \varepsilon$$

for the L last captured sounds y_j, with j ∈ {n-L+1, ..., n}, ε being a fixed value;

- using the current estimator of noise f_n when a next captured sound is not pure noise.

2. Method according to Claim 1 wherein when a previous sequence was generated in relation with a preceding capturing step, the values of the coefficients are set for the current sequence being at a minimum distance of the previous sequence according to a predetermined metric associated with sequences.

3. Method according to Claim 1 or 2 wherein one of the last captured sounds is considered substantially equal to said occurrence of the linear combination when a difference between the said one sound and the occurrence is in a sufficiently small interval comprising zero and wherein the values of the (L) more recently generated coefficients comprise a difference (λ⁺ − λ⁻) between two Lagrange multipliers, a first one and a second one corresponding respectively to a positive limit and to a negative limit of said small interval.

4. Method according to Claim 3 wherein said difference between two Lagrange multipliers is multiplied by a step size which is updated according to said Lagrange multipliers.

5. Method according to Claim 1 or 2 wherein one of the last captured sounds is considered substantially equal to said occurrence of the linear combination when a difference between the said one sound and the occurrence is equal to zero and wherein the values of the (L) more recently generated coefficients comprise a difference between a last captured sound and the linear combination associated with a preceding value of said common index.

6. Method according to Claim 5 wherein said difference between the last captured sound and the linear combination is multiplied by a step size which is updated according to another difference which is between the last captured sound and a previous occurrence of the linear combination.

7. Method according to any one of the preceding Claims wherein said coefficients are multiplied by a forgetting factor having a value less than one each time said common index is incremented.

8. System for producing an estimator of noise, said noise being an unknown function of physical quantities, said estimator of noise being a linear combination of kernel functions κ which satisfy Mercer conditions, said system comprising: - a microphone (21) arranged for capturing a sound and means (22, 24, 29) arranged for associating to said sound a vector value of said physical quantities using a common index for the time of capture; - a generator (23) and a shift register (29) arranged for storing captured sounds and associating said vector value while incrementing the index value each time a sound is captured, the generator and the shift register being further arranged to, when said index value has been incremented a number of times at least equal to a first integer (L) greater than one:

- define an estimator of noise f_n for the current index value using a sequence of coefficients (α_i) for a linear combination of the kernel functions κ according to:

$$f_n(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i(n)\, \kappa(\mathbf{x}_i, \mathbf{x})$$

wherein a first argument of one of said kernel functions is the vector value x_i from the capturing step of rank i, and x is a vector value of the physical quantities;

- set the values of the coefficients (α_i) so that:

$$\left| y_j - \sum_{i=1}^{n} \alpha_i(n)\, \kappa(\mathbf{x}_i, \mathbf{x}_j) \right| \le \varepsilon$$

for the L last captured sounds y_j, with j ∈ {n-L+1, ..., n}, ε being a fixed value, the generator and the shift register being further arranged to provide an estimator of noise using the current estimator of noise f_n when a next captured sound is not pure noise.

9. System according to Claim 8 wherein the generator (23) is arranged for setting the values of the coefficients of the current sequence being at a minimum distance of a previous sequence according to a predetermined metric associated with sequences.

10. System according to Claim 8 or 9 wherein one of the last captured sounds is considered substantially equal to said occurrence of the linear combination when a difference between the said one sound and the occurrence is in a sufficiently small interval comprising zero and wherein the values of the (L) more recently generated coefficients comprise a difference (λ⁺ − λ⁻) between two Lagrange multipliers, a first one and a second one corresponding respectively to a positive limit and to a negative limit of said small interval.

11. Method according to Claim 8 or 9 wherein one of the last captured sounds is considered substantially equal to said occurrence of the linear combination when a difference between the said one sound and the occurrence is equal to zero and wherein the values of the (L) more recently generated coefficients comprise a difference between a last captured sound and the linear combination associated with a preceding value of said common index.

12. A computer program providing computer-executable instructions which, when loaded onto a computer, cause the computer to perform the method according to any one of Claims 1 to 7.

13. A medium bearing the computer program according to Claim 12.