EP1936608A1

EP1936608A1 - On-line learning method and system for speech denoising

Info

Publication number: EP1936608A1
Application number: EP06301278A
Authority: EP
Inventors: Vladimir Braquet (France Telecom Japan)
Original assignee: France Telecom SA
Current assignee: Orange SA
Priority date: 2006-12-21
Filing date: 2006-12-21
Publication date: 2008-06-25
Also published as: WO2008074893A1

Abstract

The system produces an estimator of noise, the noise being an unknown function of physical quantities that need not be linear. A microphone (21) is arranged for capturing a sound, and means (22, 24, 29) are arranged for associating a first vector value of the physical quantities with said sound by a common index at the same time as the sound is captured. A generator (23) and a shift register (29) are arranged for storing captured sounds and the associated first vector values, incrementing a value of the index each time a sound is captured. Once said index value has been incremented a number of times at least equal to a first integer (L) greater than one, then each time said index value is incremented the generator (23) generates a current sequence of coefficients for a linear combination of functions which satisfy Mercer conditions, wherein a first argument of each function is the first vector value whose index value corresponds to the rank of the coefficient in the sequence. The generator sets the values of the coefficients so that each of the last L captured sounds is substantially equal to an occurrence of the linear combination in which the second argument of the functions is the first vector value having the index value associated with that sound. The estimator is produced by the generator (23) performing the linear combination resulting from the current generated sequence when a next captured sound is not pure noise.

Description

This invention relates to digital signal processing and in particular to a method of processing a signal such as a speech signal, for example for use in cancellation of non-stationary noise resulting from external events that can be measured or informed.
Akinori Ito, Takashi Kanayama, Motoyuki Suzuki and Shozo Makino show an example of the usefulness of such methods and systems in their article entitled "Internal Noise Suppression for Speech Recognition by Small Robots", published on pages 2685-2688 of INTERSPEECH 2005. To suppress unstable noise, they must predict the spectrum of the noise frame by frame. To achieve this, they constructed a neural network that predicts the spectrum of the internal noise from the status of the joints. At least 10 000 samples are required for the learning stage, which points to slow convergence; this can be problematic, particularly as databases become huge.
There are many other situations wherein speech denoising is useful, such as the examples described in international patent application WO2006/032760. There, one or more noise reduction filters are calculated from an estimated power spectral density (PSD) of the noise. The estimation of the PSD is not per se an object of WO2006/032760.
UK patent application GB2406487 discloses a modified affine projection algorithm for non-stationary signals. The affine projection algorithm (APA) converges fast and seems to be well adapted for filtering an echo which is correlated with the speech signal. The problem with the APA is that it is not applicable for filtering noise which is uncorrelated with the speech and which is a function of changing physical values other than voice, particularly when said function is non-linear.
To avoid the problems of the prior art, an object of the invention is a method or an apparatus for generating an estimator of noise which is an unknown function of physical quantities, without necessity for said function to be linear, and which does so with fast convergence.
Particularly the method according to the invention comprises steps of capturing a sound and associating with the sound, by a common index, a first vector value of the physical quantities inducing the sound at the same time as the sound is captured. The capturing step is repeated, incrementing a value of the index each time said step is repeated. Once said index value has been incremented a number of times at least equal to a first integer (L) greater than one, each time said index value is incremented, the method further comprises the steps of:

generating (107) a current sequence of coefficients for a linear combination of functions which satisfy Mercer conditions wherein a first argument of one of the functions is the first vector value of the one of the capturing steps having the index value corresponding to a rank of the coefficient in the sequence and
setting values of the coefficients so that each of the last L captured sounds is substantially equal to an occurrence of the linear combination wherein a second argument of the functions is the first vector value of the capturing step having the index value associated with that sound.

Preferred modes of implementation of the invention are now described with reference to the drawings wherein:

figure 1 is a schematic representation of a device according to the invention;
figure 2 presents steps of a first method implementation according to the invention;
figure 3 presents steps of a second method implementation according to the invention.

Figure 1 is a schematic representation of a device according to the invention for restitution of a signal s which is emitted in a noisy environment. The signal s is for instance a sound of voice type or other that needs to be cleaned from noise for exploitation purpose.
The device comprises a microphone 21 for capturing sound and a de-noising module 20 for providing an estimate ^s of the signal s. To that end, a first contact of a switch 22 between the microphone 21 and the de-noising module 20 is arranged to connect the microphone 21 with the de-noising module 20 when the signal s is present, so as to supply the de-noising module 20 with a received signal r which comprises in that case the signal s and a noise y. The detection of the presence of the signal s is not an object of the invention; it can be realized by a voice activity detection (VAD) system, a camera detecting a person, or any other system, for example simply a button.
When the signal s is not present, said first contact of the switch 22 is arranged to normally connect the microphone 21 to an estimator generator 23 so as to supply the estimator generator 23 with the noise y so long as the signal s is not present. The estimator generator is arranged according to the invention for providing an estimator {α_i, x _i, K} that can be used by the de-noising module 20 for subtracting an estimation of noise ^y from the received signal r so as to provide the estimate ^s of the signal s. For that purpose, a second contact of the switch 22 is arranged for connecting the estimator generator 23 to the de-noising module 20 at the same time as the first contact of the switch 22 connects the microphone 21 to the de-noising module 20. In a normal state, the second contact of the switch 22 is arranged to loop the estimator generator 23 on itself so as to adapt said estimator in real time according to the received noise y.
The estimator is provided for giving an estimation of noise that is a function of data which are collected in a vector x through an input 24 of the device. The value of each component of the vector is given for instance by a sensor 25, 26, 27, 28 connected to the input 24. Four sensors are represented here, but it will be easily understood that the invention can be implemented with any number of sensors, more or fewer than four, including a single sensor, in which case the vector x is simply a scalar x. Any type of data is suitable provided it allows an estimation of noise resulting from an event measurable by such data and liable to create or to contribute to the noise received by the microphone 21. For non-limitative illustration purposes only, the data can be an angle of a moving arm of a robot, a speed or acceleration, the power consumed by a motor, or a sound captured by another microphone.
A third contact of the switch 22 is arranged for connecting the input 24 to the de-noising module when the first contact is connecting the microphone 21 to the de-noising module. In that way, a real-time estimation of noise ^y can be calculated with the help of the estimator, so that the de-noising module can elaborate the estimate ^s in a way similar to, but not necessarily the same as, the one taught in WO2006/032760.
When not connecting the input 24 to the de-noising module 20, the third contact of the switch 22 is arranged to connect the input 24 to a shift register 29. An output of each cell is arranged to be connected to the estimator generator 23 when said cell receives, from a preceding cell or from the input 24, a vector value x _i, wherein an index i is comprised between 1 and n, 1 for the oldest value and n for the last one received through the input 24. The manner of shifting the values in the register is not essential to the method according to the invention; it can be driven by a clock of the device, in a manner usually known in the art for sampling, or triggered every time a new value is detected. A useful feature of the invention is that a noise is sampled at the same time as a new vector value x _n shifts the preceding ones in the register.
The estimator generator 23 is arranged for starting a process of constructing the estimator when receiving from the shift register 29 a predetermined number L of values x _i with their index i less than or equal to n and greater than n-L. The process is executed by running the steps, now explained, of a method implementing the invention.
Referring now to figure 2, the number L is predetermined by setting its value in an initialization step 100. The determination of said value per se is not in the scope of the invention; it can result from theoretical considerations or, more practically, from testing the device, with a user successively providing different values of L until an acceptable result is achieved on the estimation of the signal s by the de-noising module 20.
The estimator {α_i, x _i, K} is provided by the estimator generator 23 for calculating an estimate ^y of noise in the form: $\hat{y} := f(\vec{x}) = \sum_{i=1}^{n} α_{i} K(\vec{x}_{i}, \vec{x})$
The noise estimation function f relates to the vector x by a linear combination of expressions of a kernel function K when applied each to the current vector x and to a past value x _i of it.
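As an illustration only, the evaluation of f can be sketched in a few lines of Python. The sketch assumes a Gaussian kernel with a hypothetical width `sigma`; the names `gaussian_kernel`, `estimate_noise`, `alphas` and `past_vectors`, as well as the numeric values, are chosen here for the example and do not come from the application.

```python
import math

def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian kernel exp(-||u - v||^2 / (2 sigma^2)); sigma is an assumed width."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

def estimate_noise(alphas, past_vectors, x, kernel=gaussian_kernel):
    """Evaluate f(x) = sum_i alpha_i * K(x_i, x) over the stored vectors x_i."""
    return sum(a * kernel(x_i, x) for a, x_i in zip(alphas, past_vectors))

# Hypothetical estimator holding two past sensor vectors and their coefficients.
alphas = [0.5, -0.2]
past_vectors = [(0.0, 0.0), (1.0, 0.0)]
y_hat = estimate_noise(alphas, past_vectors, (0.0, 0.0))
```

The kernel is passed as a parameter so that any other function satisfying the Mercer conditions could be substituted without changing the evaluation.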
Said kernel function K to be used is any function that satisfies the Mercer condition. In mathematical terms, the Mercer conditions are satisfied when, for any number n of complex values a_i, a_j and of real-valued vectors x _i, x _j, the sum $\sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_{i} a_{j}^{*} K(\vec{x}_{i}, \vec{x}_{j})$ gives a non-negative real value. It is easy to check that, for example, the Gaussian function $K(\vec{x}_{i}, \vec{x}_{j}) = e^{-\frac{‖\vec{x}_{i} - \vec{x}_{j}‖^{2}}{2 σ^{2}}}$ satisfies the Mercer conditions. Therefore this Gaussian function can be used for implementing the invention. Other kernels satisfying the Mercer conditions are known and can also be used, according to the solution best suited to the context in which the device is exploited. Here is a non-limitative list for illustrative purposes only:

a polynomial kernel in the form of $K(\vec{x}_{i}, \vec{x}_{j}) = (1 + \vec{x}_{i} \cdot \vec{x}_{j})^{q}$
an exponential kernel in the form of $K(\vec{x}_{i}, \vec{x}_{j}) = e^{-\frac{‖\vec{x}_{i} - \vec{x}_{j}‖}{2 β_{0}}}$
a sigmoidal kernel in the form of $K(\vec{x}_{i}, \vec{x}_{j}) = \tanh(ξ_{0} \, \vec{x}_{i} \cdot \vec{x}_{j} + β_{0})$
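The kernels listed above can be written down directly. The following Python sketch is illustrative only: the default parameter values (`q=2`, `beta0=1.0`, `xi0=1.0`), the sample points and the weights are assumptions made for the example. The helper `mercer_sum` evaluates the double sum of the Mercer condition for real weights, and is applied here to the polynomial and exponential kernels on one small data set; a single non-negative value is of course only a spot check, not a proof.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(u, v, q=2):
    return (1.0 + dot(u, v)) ** q

def exponential_kernel(u, v, beta0=1.0):
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return math.exp(-distance / (2.0 * beta0))

def sigmoidal_kernel(u, v, xi0=1.0, beta0=0.0):
    return math.tanh(xi0 * dot(u, v) + beta0)

def mercer_sum(kernel, points, weights):
    """Double sum over i and j of a_i * a_j * K(x_i, x_j), for real weights a_i."""
    return sum(a_i * a_j * kernel(x_i, x_j)
               for a_i, x_i in zip(weights, points)
               for a_j, x_j in zip(weights, points))

# Hypothetical sample points and weights used only for the spot check.
points = [(0.0, 1.0), (1.0, 0.5), (-0.5, 2.0)]
weights = [1.0, -2.0, 0.5]
poly_value = mercer_sum(polynomial_kernel, points, weights)
exp_value = mercer_sum(exponential_kernel, points, weights)
```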

The coefficients α_i are updated in real time by a loop of steps 101 to 108, wherein step 101 is triggered again for each newly received value of data x, considered as a supplementary last received data x _n, at the same time as a received noise y, considered as the last received noise y_n. Each time step 101 is executed, a supplementary coefficient α_n is created with a value initialized to zero. The loop is executed for each value of n, said value being initialized in step 100 and incremented in step 103 or 108 to be ready for a following execution of the loop. Step 102 tests whether the number n is greater than or equal to L, so that coefficients α_i are furnished only once the total number n of received data is at least equal to L. So long as the number n is less than L, step 103 branches back to step 101.
Considering an index i comprised between n-L+1 and n, the L last coefficients α_i of the estimator are calculated in step 107 by using the following formula: $\{α_{i}(n)\}_{i=n-L+1}^{n} := \{\frac{α_{i}(n-1)}{1+ρ} + μ(n) \sum_{j=n-L+1}^{n} χ_{n-i,n-j}^{-1}(n) \cdot Dy_{j}(n)\}_{i=n-L+1}^{n}$ (1)
Wherein Dy_j(n) is a distance separating a noise y_j from an estimation of that noise made with the estimation function f, using the coefficients α_m(n-1) with the values they had when step 101 was executed. The noise y_j is the one which was or is measured at the time when n was or is equal to j in step 101. $Dy_{j}(n) := (y_{j} - \sum_{m=1}^{n-1} \frac{α_{m}(n-1)}{1+ρ} K(\vec{x}_{m}, \vec{x}_{j}))$ (2)
The set of coefficients α_i(n) on the left side of the setting symbol ":=" denotes the coefficients generated by the current execution of the loop with rank n, whereas the coefficients α_i(n-1) on the right side are those initialized to zero and/or generated by a preceding execution of the loop with rank n-1.
In step 107, $χ_{n-i,n-j}^{-1}(n)$ is the coefficient on row n-i, column n-j of the inverse of a kernel matrix $\{χ_{h,k}(n)\}_{h,k=0}^{L-1}$ generated in step 104. Any method known in the art can be used for obtaining the inverse of the kernel matrix. In step 104, executed before step 107 in case of a positive response to the test of step 102, the kernel matrix for the loop of rank n is generated by the formula: $\{χ_{h,k}(n)\}_{h,k=0}^{L-1} := K(\vec{x}_{n-h}, \vec{x}_{n-k}) + ζ_{h,k}$
Wherein ζ_h,k is a matrix regularization parameter. In other words, it is a value that is equal to zero when the index h is different from the index k and equal to a constant when the two indexes h and k are equal. The regularization parameter ensures that the matrix $\{χ_{h,k}(n)\}_{h,k=0}^{L-1}$ has an inverse.
The parameters ρ and µ(n) are for improving the efficiency of the method and will be explained later. Without said parameters or, equivalently, with ρ and µ(n) held constant at 0 and 1 respectively, the formulae used in step 107 reduce to the following ones: $\{α_{i}(n)\}_{i=n-L+1}^{n} := \{α_{i}(n-1) + \sum_{j=n-L+1}^{n} χ_{n-i,n-j}^{-1}(n) \cdot Dy_{j}(n)\}_{i=n-L+1}^{n}$ (4)
$Dy_{j}(n) := (y_{j} - \sum_{m=1}^{n-1} α_{m}(n-1) \cdot K(\vec{x}_{m}, \vec{x}_{j}))$ (5)
Mathematical considerations show that setting the coefficients α_i according to formula (4) implies that for every j in the range of n-L+1 to n: $y_{j} = \sum_{i=1}^{n} α_{i}(n) \cdot K(\vec{x}_{i}, \vec{x}_{j})$ (6)
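The property stated above can be checked numerically. The following Python sketch is one possible reading of steps 101, 102, 104 and 107 with ρ = 0 and µ = 1, under several assumptions that are not in the application: a Gaussian kernel of width `sigma`, a very small regularization constant `zeta` (so that equality holds only up to a small tolerance), zero-based list indices, and a linear solve by Gaussian elimination in place of an explicit matrix inverse. The names `loop_pass` and `solve` and the test signal are hypothetical.

```python
import math

def gaussian_kernel(u, v, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2.0 * sigma ** 2))

def solve(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def predict(alphas, xs, x, kernel=gaussian_kernel):
    return sum(a * kernel(x_m, x) for a, x_m in zip(alphas, xs))

def loop_pass(alphas, xs, ys, L, zeta=1e-9, kernel=gaussian_kernel):
    """One pass for the newest sample; xs and ys already include it."""
    alphas = alphas + [0.0]                 # step 101: new coefficient starts at zero
    n = len(xs)
    if n < L:                               # step 102: wait until L samples exist
        return alphas
    window = list(range(n - L, n))          # the L most recent samples
    dy = [ys[j] - predict(alphas, xs, xs[j], kernel) for j in window]
    chi = [[kernel(xs[h], xs[k]) + (zeta if h == k else 0.0) for k in window]
           for h in window]                 # step 104: regularized kernel matrix
    for i, delta in zip(window, solve(chi, dy)):
        alphas[i] += delta                  # step 107 with rho = 0, mu = 1
    return alphas

# Hypothetical scalar sensor values and noise samples.
xs = [(0.0,), (0.5,), (1.0,), (1.5,), (2.0,)]
ys = [math.sin(x[0]) for x in xs]
L = 3
alphas = []
for n in range(1, len(xs) + 1):
    alphas = loop_pass(alphas, xs[:n], ys[:n], L)
residuals = [ys[j] - predict(alphas, xs, xs[j]) for j in range(len(xs) - L, len(xs))]
```

With `zeta` close to zero the residuals on the last L samples vanish up to rounding, which is the behaviour described for the unregularized case; a larger `zeta` trades exact fitting for a better conditioned matrix.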
It is interesting to note from formula (5) that, when a following loop is executed for a new noise value y_{n+1}, equation (6) has the effect that for every j in the range of n-L+1 to n, the distance Dy_j(n+1) is equal to zero. The only distance which is different from zero is Dy_{n+1}(n+1), which is given by: $Dy_{n+1}(n+1) := (y_{n+1} - \sum_{m=1}^{n} α_{m}(n) \cdot K(\vec{x}_{m}, \vec{x}_{n+1}))$
Because the kernel function K satisfies the Mercer condition, it can be shown that the greater L is, the faster the distance Dy_n(n) decreases, in other words the faster the method converges.
Advantageously, the method comprises a step 105 wherein the coefficients α_i are divided by (1+ρ), with a regularization parameter or forgetting factor ρ having a value greater than zero. Thus, each time step 105 is executed after step 102, the coefficients α_i decrease, but in such a manner as to preserve the ratio between the coefficients of any pair. Having been divided many times, the older coefficients, that is those with the smaller indexes, become immaterial after a sufficient number of executions of the loop. These coefficients can then be swapped out of the memory of the device executing the method, saving both storage and computing time. For instance in figure 2, the coefficients α_i with i less than n-L+1 are specifically divided by (1+ρ) in step 105. The coefficients α_i with i greater than n-L are divided by (1+ρ) in step 107 according to formulae (1) and (2). In the following, every time the regularization parameter ρ is present in a formula, it will be understood that for implementations without step 105 ρ is zero, which is the same as ρ not being present.
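The effect of the repeated division can be illustrated with a short Python sketch. The coefficient values, the value ρ = 0.1 and the number of passes are hypothetical, and for simplicity every coefficient is divided on every pass.

```python
def apply_forgetting(alphas, rho):
    """Divide every coefficient by (1 + rho), as in step 105."""
    return [a / (1.0 + rho) for a in alphas]

# Hypothetical coefficients and forgetting factor.
alphas = [0.8, -0.4]
rho = 0.1
for _ in range(100):
    alphas = apply_forgetting(alphas, rho)

# The ratio between the two coefficients is unchanged,
# while their magnitude has shrunk by a factor (1 + rho) ** 100.
ratio = alphas[0] / alphas[1]
```

After enough passes the old coefficients fall below any chosen tolerance, at which point an implementation could drop them; the choice of that tolerance is left open here.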
Advantageously also, the formula (1) used in step 107 comprises a step size µ(n), which is initialized in step 100 to a value µ(0). The value of the step size µ(n) can be a constant equal to µ(0) for every execution of step 107, or can vary with n. In the latter case the method further comprises a step 106 wherein the step size µ(n) is set to a value µ̃ limited by a minimum µ_min and a maximum µ_max. The values µ_min and µ_max are set in initialization step 100, respectively to a value greater than or equal to zero and to a value preferably less than or equal to one. A possible formula for achieving that is: $μ(n) := \min(\max(\tilde{μ}, μ_{\min}), μ_{\max})$
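The limiting formula is a plain clamp. A minimal Python sketch, with the function name `clamp_step_size` and the default bounds 0 and 1 chosen here as assumptions consistent with the ranges stated above:

```python
def clamp_step_size(mu_tilde, mu_min=0.0, mu_max=1.0):
    """Return mu(n) = min(max(mu_tilde, mu_min), mu_max)."""
    return min(max(mu_tilde, mu_min), mu_max)

# Hypothetical candidate values of the unclamped step size.
too_large = clamp_step_size(1.7)
too_small = clamp_step_size(-0.3)
in_range = clamp_step_size(0.4)
```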
Before setting µ(n) in step 106, the value µ̃ is updated by the formula: $\tilde{μ} := μ(n-1) + \frac{η}{1+ρ} \sum_{j=0}^{Lʹ-1} sign(e_{n}^{j}) \, {|e_{n}^{j}|}_{εʹ}^{γ-1} (\sum_{m=1}^{n-1} β_{m} K(\vec{x}_{m}, \vec{x}_{n-j}))$
In the formula, a prediction error is given by: $e_{n}^{j} : = y_{n - j} - f_{n} ({\vec{x}}_{n - j})$
Wherein a function f _n of the vector x _n-j is given by the formula: $f_{n}(\vec{x}_{n-j}) := \sum_{i=1}^{n-1} α_{i} K(\vec{x}_{i}, \vec{x}_{n-j})$
We see here that, for example when j=0, the prediction error $e_{n}^{0}$ is the difference between the last received value of noise $y_{n}$ and a value which would have been predicted or estimated from the n-1 preceding measures with the help of the function f _n. We see also here that not only is the last value of noise compared with an estimation of noise resulting from the currently available function, but every one of the L'-1 preceding received values of noise y_n-j is also compared with a value of noise which would have been estimated by the last available function f_n.
An adaptive size cost parameter γ, an adaptive step size cost order L', an adaptive step size cost insensitivity ε' and an adaptive step size recursive parameter η are respectively a positive real number greater than one, an integer less than, equal to or greater than L, a positive real number near zero and a positive real number less than two, which can be set in step 100 when step 106 is present. When the value of the size cost parameter γ is equal to one, we see that the updating of the value µ̃ is independent of the prediction error. When furthermore the values of the adaptive step size recursive parameter η and of the regularization factor ρ are respectively equal to one and zero, a simple expression of the value µ̃ is given by: $\tilde{μ} := μ_{n-1} + \sum_{j=0}^{Lʹ-1} (\sum_{m=1}^{n-1} β_{m} K(\vec{x}_{m}, \vec{x}_{n-j}))$
In the expression of µ̃, components β_m of a weight gradient are updated according to the formula: ${\{β_{i}\}}_{i = 1}^{n} : = \frac{1}{1 + ρ} {\{β_{i}\}}_{i = 1}^{n - 1} + {\{Δ_{i}\}}_{i = n - L + 1}^{n}$
For every index i comprised in the range of n-L+1 to n, the value of a gradient Δ _i is given by the formula: $Δ_{i} := \sum_{k=0}^{L-1} χ_{n-i,k}^{-1}(n) (y_{n-k} - \sum_{m=1}^{n-1} (α_{m} + μ_{n-1} β_{m}) \frac{K(\vec{x}_{m}, \vec{x}_{n-k})}{1+ρ})$
A second mode of implementation of the method according to the invention is now described in reference to figure 3 wherein steps 100 to 103 are similar to the ones of the first mode of implementation previously described in reference to figure 2.
Considering an index i comprised between n-L+1 and n, the L last coefficients α_i of the estimator are calculated in step 207 by using the following formula: $\{α_{i}(n)\}_{i=n-L+1}^{n} := \{\frac{α_{i}(n-1)}{1+ρ} + μ(n) \frac{λ_{n-i}^{+}(n) - λ_{n-i}^{-}(n)}{1+ρ}\}_{i=n-L+1}^{n}$ (1')
Wherein $λ_{n-i}^{+}(n)$ and $λ_{n-i}^{-}(n)$ are Lagrange multipliers chosen such that the distance Dy_j(n) is at most ε in absolute value for every j in the range of n-L+1 to n, in other words: $-ε \leq (y_{j} - \sum_{m=1}^{n-1} α_{m}(n-1) \cdot K(\vec{x}_{m}, \vec{x}_{j})) \leq ε$
More precisely, the Lagrange multipliers are calculated according to the following sequence.
A step 204 preceding step 206 is similar to previously described step 104 in that a kernel matrix $\{K(\vec{x}_{n-h}, \vec{x}_{n-k})\}_{h,k=0}^{L-1}$ is calculated, having LxL coefficients, each coefficient of rank h, k being equal to the kernel function of vectors x _n-h and x _n-k. Furthermore a quadratic matrix Q(n) having 2Lx2L coefficients is constructed, given for h and k in the range of 0 to L-1 by: $Q_{h,k} = Q_{h+L,k+L} = \frac{1}{1+ρ} K(\vec{x}_{n-h}, \vec{x}_{n-k}) + ζ_{h,k}$
$Q_{h+L,k} = Q_{h,k+L} = \frac{-1}{1+ρ} K(\vec{x}_{n-h}, \vec{x}_{n-k})$
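The block structure of Q(n) may be easier to see in code. The following Python sketch assembles the 2Lx2L matrix from an already computed LxL kernel matrix; the function name `build_q`, the scalar `zeta` standing for the diagonal constant of ζ_h,k, and the numeric values are assumptions made for the example.

```python
def build_q(kernel_matrix, rho=0.0, zeta=0.0):
    """Assemble the 2L x 2L matrix Q from an L x L kernel matrix.

    Diagonal blocks hold  K / (1 + rho) plus zeta on the main diagonal;
    off-diagonal blocks hold -K / (1 + rho).
    """
    size = len(kernel_matrix)
    q = [[0.0] * (2 * size) for _ in range(2 * size)]
    for h in range(size):
        for k in range(size):
            scaled = kernel_matrix[h][k] / (1.0 + rho)
            regularization = zeta if h == k else 0.0
            q[h][k] = q[h + size][k + size] = scaled + regularization
            q[h + size][k] = q[h][k + size] = -scaled
    return q

# Hypothetical 2 x 2 kernel matrix (L = 2).
kernel_matrix = [[1.0, 0.5],
                 [0.5, 1.0]]
q = build_q(kernel_matrix, rho=1.0, zeta=0.1)
```

Because the kernel matrix is symmetric, the assembled matrix is symmetric as well, which is what a quadratic programming routine typically expects.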
A linear vector p(n) having L components p_k and L components p_k+L is given by the formula: $\begin{pmatrix} p_{k} \\ p_{k+L} \end{pmatrix} := \begin{pmatrix} δ_{k,n-j} (y_{j} - \sum_{m=1}^{n-1} α_{m}(n-1) \cdot K(\vec{x}_{m}, \vec{x}_{j}) - ε) \\ δ_{k,n-j} (-y_{j} - \sum_{m=1}^{n-1} α_{m}(n-1) \cdot K(\vec{x}_{m}, \vec{x}_{j}) - ε) \end{pmatrix}$
Wherein δ_k,n-j=1 when k=n-j and 0 otherwise.
The matrix Q(n) and the linear vector p(n) are then input to a quadratic programming library arranged to output the values of $λ_{n-i}^{+}$ and $λ_{n-i}^{-}$ in the form of a vector Λ having 2L positive components such that: $Λ^{T}(n) = \{λ_{n-i}^{+}, λ_{n-i}^{-}\}_{i=n-L+1}^{n} := \arg\max(-\frac{1}{2} Λ^{T} Q(n) Λ + Λ^{T} p(n))$
Any quadratic programming library arranged for calculating such an argument of a maximum value is suitable, for example the GQP library available on http://www.gnu.org/software/gsl.
In formula (1'), the step size µ(n) is not required. It can be a constant which, when equal to 1, is the same as not being present. A variable step size improves the method. In that case the method further comprises a step 205 wherein the step size µ(n) is set to a value µ̃ limited by a minimum µ_min and a maximum µ_max. The values µ_min and µ_max are set in initialization step 100, respectively to a value greater than or equal to zero and to a value preferably less than or equal to one. A possible formula for achieving that is: $μ(n) := \min(\max(\tilde{μ}, μ_{\min}), μ_{\max})$
Before setting µ(n) in step 205, the value µ̃ is updated by the formula: $\tilde{μ} := μ(n-1) + \frac{η}{1+ρ} \sum_{j=0}^{Lʹ-1} sign(e_{n}^{j}) \, {|e_{n}^{j}|}_{εʹ}^{γ-1} (\sum_{i=n-L}^{n-1} (λ_{n-i}^{+} - λ_{n-i}^{-}) K(\vec{x}_{i}, \vec{x}_{n-j}))$
In the formula, a prediction error $e_{n}^{j}$ is given by: $e_{n}^{j} := y_{n-j} - f_{n}(\vec{x}_{n-j})$
Wherein a function f _n of the vector x _n-j is given by the formula: $f_{n}(\vec{x}_{n-j}) := \sum_{i=1}^{n-1} α_{i}(n-1) K(\vec{x}_{i}, \vec{x}_{n-j})$
We see here that, for example when j=0, the prediction error $e_{n}^{0}$ is the difference between the last received value of noise y_n and a value which would have been predicted or estimated from the n-1 preceding measures with the help of the function f _n. We see also here that not only is the last value of noise compared with an estimation of noise resulting from the currently available function, but every one of the L'-1 preceding received values of noise y_n-j is also compared with a value of noise which would have been estimated by the last available function f _n.
An adaptive size cost parameter γ, an adaptive step size cost order L', an adaptive step size cost insensitivity ε' and an adaptive step size recursive parameter η are respectively a positive real number greater than one, an integer less than, equal to or greater than L, a positive real number near zero and a positive real number less than two, which can be set in step 100 when step 205 is present. When the value of the size cost parameter γ is equal to one, we see that the updating of the value µ̃ is independent of the prediction error value except for its sign. When furthermore the value of the adaptive step size recursive parameter η is equal to one, a simple expression of the value µ̃ is given by: $\tilde{μ} := μ(n-1) + \frac{1}{1+ρ} \sum_{j=0}^{Lʹ-1} sign(e_{n}^{j}) (\sum_{i=n-L}^{n-1} (λ_{n-i}^{+} - λ_{n-i}^{-}) K(\vec{x}_{i}, \vec{x}_{n-j}))$

Claims

Method for producing an estimator of noise which is an unknown function of physical quantities without necessity for said function to be linear comprising steps of:
- capturing (101) a sound and associating to said sound by a common index a first vector value of said physical quantities at the same time as the sound is captured;

- repeating (102) said step of capturing, incrementing a value of the index each time said step is repeated, and when said index value is or has been incremented a number of times at least equal to a first integer (L) greater than one, each time said index value is incremented:
- generating (107) a current sequence of coefficients for a linear combination of functions which satisfy Mercer conditions wherein a first argument of one of the functions is the first vector value of the one of the capturing steps having the index value corresponding to a rank of the coefficient in the sequence and

- setting values of the coefficients so as to a quantity of the last captured sounds equal to said first integer be substantially equal each to an occurrence of the linear combination wherein a second argument of the functions is the first vector value of the capturing step having the index value associated to the sound;

- performing the linear combination resulting from the current generated sequence to produce the estimator when a next captured sound is not pure noise.
Method according to Claim 1 wherein when a previous sequence was generated in relation with a preceding capturing step, the values of the coefficients are set for the current sequence being at a minimum distance of the previous sequence according to a predetermined metric associated with sequences.
Method according to Claim 1 or 2 wherein one of the last captured sounds is considered substantially equal to said occurrence of the linear combination when a difference between the said one sound and the occurrence is in a sufficiently small interval comprising zero and wherein the values of the (L) more recently generated coefficients comprise a difference $(λ_{i}^{+} - λ_{i}^{-})$ between two Lagrange multipliers, a first one and a second one corresponding respectively to a positive limit and to a negative limit of said small interval.
Method according to Claim 3 wherein said difference between two Lagrange multipliers is multiplied by a step size which is updated according to said Lagrange multipliers.
Method according to Claim 1 or 2 wherein one of the last captured sounds is considered substantially equal to said occurrence of the linear combination when a difference between the said one sound and the occurrence is equal to zero and wherein the values of the (L) more recently generated coefficients comprise a difference between a last captured sound and the linear combination associated with a preceding value of said common index.
Method according to Claim 5 wherein said difference between the last captured sound and the linear combination is multiplied by a step size which is updated according to another difference which is between the last captured sound and a previous occurrence of the linear combination.
Method according to any one of the preceding Claims wherein said coefficients are multiplied by a forgetting factor having a value less than one each time said common index is incremented.
System for producing an estimator of noise which is an unknown function of physical quantities without necessity for said function to be linear comprising:
- a microphone (21) arranged for capturing a sound and means (22, 24, 29) arranged for associating to said sound by a common index a first vector value of said physical quantities at the same time as the sound is captured;

- a generator (23) and a shift register (29) arranged for storing captured sounds and associated said first vector value by incrementing a value of the index each time a sound is captured, and when said index value is or has been incremented a number of times at least equal to a first integer (L) greater than one, each time said index value is incremented:
the generator (23) is arranged for generating a current sequence of coefficients for a linear combination of functions which satisfy Mercer conditions wherein a first argument of one of the functions is the one of first vector values having the index value corresponding to a rank of the coefficient in the sequence and

for setting values of the coefficients so as to a quantity of the last captured sounds equal to said first integer be substantially equal each to an occurrence of the linear combination wherein a second argument of the functions is another one of first vector values having the index value associated to the sound;

- so as to produce the estimator when a next captured sound is not pure noise by performing the linear combination resulting from the current generated sequence.
System according to Claim 8 wherein the generator (23) is arranged for setting the values of the coefficients of the current sequence being at a minimum distance of a previous sequence according to a predetermined metric associated with sequences.
System according to Claim 8 or 9 wherein one of the last captured sounds is considered substantially equal to said occurrence of the linear combination when a difference between the said one sound and the occurrence is in a sufficiently small interval comprising zero and wherein the values of the (L) more recently generated coefficients comprise a difference $(λ_{i}^{+} - λ_{i}^{-})$ between two Lagrange multipliers, a first one and a second one corresponding respectively to a positive limit and to a negative limit of said small interval.
System according to Claim 8 or 9 wherein one of the last captured sounds is considered substantially equal to said occurrence of the linear combination when a difference between the said one sound and the occurrence is equal to zero and wherein the values of the (L) more recently generated coefficients comprise a difference between a last captured sound and the linear combination associated with a preceding value of said common index.