WO2008074893A1 - On-line learning method and system for speech denoising - Google Patents

Info

Publication number
WO2008074893A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
value
noise
linear combination
coefficients
Prior art date
Application number
PCT/EP2007/064515
Other languages
French (fr)
Inventor
Vladimir Braquet
Original Assignee
France Telecom
Priority date
Filing date
Publication date
Application filed by France Telecom
Publication of WO2008074893A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering

Definitions

  • the L last coefficients α_i of the estimator are calculated in step 207 by using the following formula:
  • λ⁺_{n-i}(n) and λ⁻_{n-i}(n) are Lagrange multipliers such that the distance Dy_j(n) is less than ε for every j in the range of n-L+1 to n, in other words:
  • step 204 preceding step 206 is similar to previously described step 104 in that a
  • coefficient of rank h, k being equal to a kernel function of vectors x_{n-h} and x_{n-k}.
  • quadratic matrix Q(n) is constructed for having 2L×2L coefficients given by:
  • the matrix Q(n) and the linear vector p(n) are then input into a quadratic programming library that is arranged to output the values of λ⁺_{n-i} and λ⁻_{n-i} in the form of a vector Λ having 2L positive components such that:
  • Λ(n) = argmax_Λ { -½ Λᵀ Q(n) Λ + Λᵀ p(n) }
  • Any quadratic programming library arranged for calculating such an argument of a maximum value is suitable, for example the GQP library available at http://www.gnu.org/software/gsl.
  • the step size μ(n) is not necessary. It can be a constant which, when equal to 1, is the same as not being present.
  • a variable step size improves the method.
  • the method further comprises a step 205 wherein the step size μ(n) is set to a value μ but limited by a minimum μ_min and a maximum μ_max.
  • the values μ_min and μ_max are set in initialization step 100 respectively to a value greater than or equal to zero and to a value preferably less than or equal to one.
  • the prediction error e_n^0 is the difference between the last received value of noise y_n and a value which would have been predicted or estimated from the n-1 preceding measures with the help of the function f_n.
  • not only is a last value of noise compared with an estimation of noise resulting from the currently available function, but every one of the L'-1 preceding received values of noise y is compared with a value of noise which would have been estimated by the last available function f_n.
  • An adaptive size cost parameter γ, an adaptive step size cost order L', an adaptive step size cost insensitivity ε' and an adaptive step size recursive parameter η are respectively a positive real number greater than one, an integer less than, equal to or greater than L, a positive real number near to zero and a positive real number less than two, which can be set in step 100 in case of step 106 existing.
  • when the value of the size cost parameter γ is equal to one, we see that the updating of the value μ is independent of the prediction error value except for its sign.
  • a simple expression of the value μ is given by:
  • the present method relates to a method for producing an estimator of noise which is an unknown function of physical quantities without necessity for said function to be linear comprising steps of:
  • - capturing (101) a sound and associating to said sound by a common index a first vector value of said physical quantities at the same time as the sound is captured; - repeating (102) said step of capturing, incrementing a value of the index each time said step is repeated, and when said index value is or has been incremented a number of times at least equal to a first integer (L) greater than one, each time said index value is incremented:

Abstract

The system produces an estimator of noise which is an unknown function of physical quantities without necessity for said function to be linear. A microphone (21) is arranged for capturing a sound and means (22, 24, 29) are arranged for associating a first vector value of the physical quantities to said sound by a common index at the same time as the sound is captured. A generator (23) and a shift register (29) are arranged for storing captured sounds and associated said first vector value by incrementing a value of the index each time a sound is captured. When said index value is or has been incremented a number of times at least equal to a first integer (L) greater than one, each time said index value is incremented, the generator (23) generates a current sequence of coefficients for a linear combination of functions which satisfy Mercer conditions wherein a first argument of one of the functions is the one of first vector values having the index value corresponding to a rank of the coefficient in the sequence and for setting values of the coefficients so as to a quantity of the last captured sounds equal to said first integer be substantially equal each to an occurrence of the linear combination wherein a second argument of the functions is another one of first vector values having the index value associated to the sound. The estimator is produced by the generator (23) performing the linear combination resulting from the current generated sequence when a next captured sound is not pure noise.

Description

On-line learning method and system for speech denoising.
This invention relates to digital signal processing and in particular to a method of processing a signal such as a speech signal, for example for use in cancellation of non-stationary noise resulting from external events that can be measured or informed.
Akinori Ito, Takashi Kanayama, Motoyuki Suzuki and Shozo Makino show an example of the usefulness of such methods and systems in their article entitled "Internal Noise Suppression for Speech Recognition by Small Robots", published on pages 2685-2688 of INTERSPEECH 2005. To suppress unstable noise, they must predict the spectrum of the noise frame by frame. To achieve this, they constructed a neural network that predicts the spectrum of the internal noise from the status of the joints. At least 10 000 samples are required for the learning stage, which points to a slow convergence that can be problematic, particularly as databases become huge.
There are many other situations wherein speech denoising is useful, such as the examples described in international patent application WO2006/032760. There, one or more noise reduction filters are calculated from an estimated power spectral density (PSD) of the noise. The estimation of the PSD is not per se an object of WO2006/032760.
UK patent application GB2406487 discloses a modified affine projection algorithm for non-stationary signals. The affine projection algorithm (APA) presents fast convergence features and seems to be well adapted for filtering an echo which is correlated with the speech signal. The problem of the APA is that it is not applicable for filtering noise which is uncorrelated with the speech and which is a function of changing physical values other than voice, particularly when said function is non-linear.
To avoid the problems of the prior art, an object of the invention is a method or an apparatus for generating an estimator of noise which is an unknown function of physical quantities, without necessity for said function to be linear, and with the benefit of fast convergence. In particular, the method according to the invention comprises steps of capturing a sound and associating with the sound, by a common index, a first vector value of physical quantities inducing the sound at the same time as the sound is captured. The step of capturing is repeated, incrementing a value of the index each time said step is repeated. When said index value is or has been incremented a number of times at least equal to a first integer (L) greater than one, each time said index value is incremented, the method further comprises the steps of:
- generating (107) a current sequence of coefficients for a linear combination of functions which satisfy Mercer conditions wherein a first argument of one of the functions is the first vector value of the one of the capturing steps having the index value corresponding to a rank of the coefficient in the sequence and
- setting values of the coefficients so that a number of the last captured sounds equal to said first integer are each substantially equal to an occurrence of the linear combination wherein a second argument of the functions is the first vector value of the capturing step having the index value associated with the sound.
The linear combination resulting from the current generated sequence is performed to produce the estimator when a next captured sound is not pure noise.
Preferred modes of implementation of the invention are now described with reference to the drawings wherein:
- figure 1 is a schematic representation of a device according to the invention;
- figure 2 presents steps of a first method implementation according to the invention;
- figure 3 presents steps of a second method implementation according to the invention.
Figure 1 is a schematic representation of a device according to the invention for restitution of a signal s which is emitted in a noisy environment. The signal s is for instance a sound of voice type or other that needs to be cleaned from noise for exploitation purpose.
The device comprises a microphone 21 for capturing sound and a de-noising module 20 for providing an estimate ŝ of the signal s. To that end, a first contact of a switch 22 between the microphone 21 and the de-noising module 20 is arranged to connect the microphone 21 with the de-noising module 20 when the signal s is present, so as to supply the de-noising module 20 with a received signal r which comprises in that case the signal s and a noise y. The detection of the presence of the signal s is not an object of the invention; it can be realized by a voice activity detection (VAD) system, a camera detecting a person, or any other system such as, for example, simply a button.
When the signal s is not present, said first contact of the switch 22 is arranged to normally connect the microphone 21 to an estimator generator 23, so as to supply the estimator generator 23 with the noise y so long as the signal s is not present. The estimator generator is arranged according to the invention for providing an estimator {α_i, x_i, K} that can be used by the de-noising module 20 for subtracting an estimation of noise ŷ from the signal r so as to provide the estimate ŝ of the signal s. For that purpose, a second contact of the switch 22 is arranged for connecting the estimator generator 23 to the de-noising module 20 at the same time as the first contact of the switch 22 is connecting the microphone 21 to the de-noising module 20. In a normal state, the second contact of the switch 22 is arranged to loop the estimator generator 23 on itself so as to adapt said estimator in real time according to the received noise y.
The estimator is provided for giving an estimation of noise that is a function of data which are collected in a vector x through an input 24 of the device. The value of each component of the vector is given for instance by a sensor 25, 26, 27, 28 connected to the input 24. Here four sensors are represented, but it will be easily understood that the invention can be implemented with any number of sensors, more or less than four, including a single sensor, in which case the vector x is simply a scalar x. The type of data is any that is suitable for an estimation of noise resulting from an event measurable by such data and liable to create or to contribute to the noise received by the microphone 21. For non-limitative illustration purposes only, the data can be an angle of a moving arm of a robot, a speed or acceleration, the power consumed by a motor, or a sound captured by another microphone.
A third contact of the switch 22 is arranged for connecting the input 24 to the de-noising module when the first contact is connecting the microphone 21 to the de-noising module. In that way, a real time estimation of noise ŷ can be calculated with the help of the estimator, so that the de-noising module can elaborate the estimate ŝ in a way similar to, but not necessarily the same as, the one taught in WO2006/032760.
When not connecting the input 24 to the de-noising module 20, the third contact of the switch 22 is arranged to connect the input 24 to a shift register 29. An output of each cell is arranged to be connected to the estimator generator 23 when said cell receives, from a preceding cell or from the input 24, a value x_i of the vector, wherein an index i lies between 1 and n, 1 for the oldest value and n for the last one received through the input 24. The manner of shifting the values in the register is not essential for the method according to the invention; it can be by means of a clock of the device, in a manner usually known in the art for sampling, or every time a new value is detected. A useful feature of the invention is that a noise is sampled at the same time as a new value x_n of the vector shifts the preceding ones in the register.
The estimator generator 23 is arranged for starting a process of constructing the estimator when receiving from the shift register 29 a predetermined number L of values x_i with their index i less than or equal to n and greater than n-L. The process is executed by running the steps, now explained, of a method implementing the invention.
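For non-limitative illustration only, the routing performed by the switch 22 can be sketched in Python as follows. The class name `DenoisingDevice`, the bounded register length and the plain subtraction performed when the signal is present are assumptions made for the sketch; the description leaves the actual de-noising filter to the de-noising module 20.

```python
from collections import deque


class DenoisingDevice:
    """Hypothetical model of the routing done by switch 22 (illustrative names)."""

    def __init__(self, register_length):
        # Shift register 29: keeps the most recent sensor vectors x_i.
        self.register = deque(maxlen=register_length)
        # Noise samples y_i captured at the same instants as the x_i.
        self.noise = deque(maxlen=register_length)

    def process(self, sound, x, signal_present, estimate_noise):
        """Route one (sound, sensor vector) pair.

        estimate_noise is any callable mapping x to a noise estimate.
        """
        if signal_present:
            # Microphone 21 and input 24 feed the de-noising module 20:
            # subtract the estimated noise from the received signal r.
            return sound - estimate_noise(x)
        # Normal state: the sound is pure noise y, stored together with x.
        self.register.append(x)
        self.noise.append(sound)
        return None


device = DenoisingDevice(register_length=3)
device.process(0.5, (1.0,), False, lambda x: 0.0)           # learning phase
cleaned = device.process(1.5, (1.0,), True, lambda x: 0.5)  # de-noising phase
```

Storing the noise sample and the sensor vector in the same branch reflects the requirement that both share a common index.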
Referring now to figure 2, the number L is predetermined by setting its value in an initialization step 100. The determination of said value per se is not within the scope of the invention; it can result from theoretical considerations or, more practically, from testing of the device by a user who successively provides different values of L until an acceptable result is achieved on the estimation of the signal s by the de-noising module 20.
The estimator {α_i, x_i, K} is provided by the estimator generator 23 for calculating an estimate ŷ of noise in the form:
$$\hat{y} := f_n(x) = \sum_{i=1}^{n} \alpha_i \, K(x_i, x)$$
The noise estimation function f relates to the vector x by a linear combination of expressions of a kernel function K, each applied to the current vector x and to a past value x_i of it. Said kernel function K to be used is any function that satisfies the Mercer condition. Mathematical sciences define that the Mercer conditions are satisfied when, for any number n of complex values a_i or a_j and of vectors x_i or x_j with real values, the sum
$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i \, \bar{a}_j \, K(x_i, x_j)$$
gives a non-negative real value. We can easily check that, for example, the Gaussian function
$$K(x_i, x_j) = e^{-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}}$$
satisfies the Mercer conditions.
Therefore this Gaussian function can be used for implementing the invention. Other kernels satisfying the Mercer conditions are known and can also be used, according to the solution best suited to the context of exploitation of the device. Here is a non-limitative list for illustrative purposes only:
- a polynomial kernel in the form of K(x_i, x_j) = (1 + x_i · x_j)^d
- an exponential kernel in the form of κ(x; , x J = e β"
- a sigmoidal kernel in the form of K(x_i, x_j) = tanh(α_0 x_i · x_j + β_0).
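For illustration only, the kernel expansion of the estimator and two of the kernels named above can be sketched in Python as follows. The function names, the default parameters `sigma` and `degree`, and the numerical values are assumptions chosen for the sketch. The helper `quadratic_form` evaluates the double sum of the Mercer condition for real coefficients.

```python
import math


def dot(u, v):
    """Scalar product of two vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian kernel exp(-||u - v||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def polynomial_kernel(u, v, degree=2):
    """Polynomial kernel (1 + u.v)^degree; the exponent is an assumed parameter."""
    return (1.0 + dot(u, v)) ** degree


def estimate(alphas, xs, x, kernel):
    """Noise estimate f_n(x) = sum_i alpha_i * K(x_i, x)."""
    return sum(a * kernel(xi, x) for a, xi in zip(alphas, xs))


def quadratic_form(coeffs, xs, kernel):
    """sum_i sum_j a_i a_j K(x_i, x_j) for real a_i (non-negative for a Mercer kernel)."""
    return sum(ai * aj * kernel(xi, xj)
               for ai, xi in zip(coeffs, xs)
               for aj, xj in zip(coeffs, xs))


xs = [(0.0,), (1.0,), (2.5,)]
q = quadratic_form([1.0, -2.0, 1.0], xs, gaussian_kernel)
y_hat = estimate([2.0], [(0.0,)], (0.0,), gaussian_kernel)
```

A single numerical check of this kind does not prove the Mercer condition; it only illustrates the quantity that must stay non-negative.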
The coefficients α_i are updated in real time by a loop of steps 101 to 108, wherein step 101 is triggered again for each newly received value of data x, considered as a supplementary last received data x_n, at the same time as a received noise y, considered as the last received noise y_n. Each time step 101 is executed, a supplementary coefficient α_n is created with a value initialized to zero. The loop is executed for each value of n, said value being initialized in step 100 and incremented in step 103 or 108 to be ready for a following execution of the loop. Step 102 tests whether the number n is greater than or equal to L, so as to furnish coefficients α_i for a total number n of received data at least equal to L. So long as the number n is less than L, step 103 branches to step 101.
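The control flow of steps 101 to 103 can be sketched in Python as follows, for illustration only. The function `run_loop` and its `update` callback are hypothetical names. The callback stands for steps 104 to 107 and is left to the caller.

```python
def run_loop(samples, L, update):
    """Skeleton of the loop of figure 2.

    samples: iterable of (x_n, y_n) pairs captured at the same time.
    update:  callable standing for steps 104 to 107, run once n >= L.
    """
    xs, ys, alphas = [], [], []
    for x, y in samples:
        # Step 101: store the new pair and create alpha_n initialised to zero.
        xs.append(x)
        ys.append(y)
        alphas.append(0.0)
        n = len(xs)
        # Step 102: update only once at least L samples have been received.
        if n >= L:
            update(xs, ys, alphas, L)
        # Otherwise (step 103) simply wait for the next sample.
    return alphas


calls = []
result = run_loop([((0.0,), 1.0)] * 5, 3,
                  lambda xs, ys, alphas, L: calls.append(len(xs)))
```

Here the index n is taken as the current number of stored samples, which plays the role of the increment made in step 103 or 108.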
Considering an index i between n-L+1 and n, the L last coefficients α_i of the estimator are calculated in step 107 by using the following formula:
Figure imgf000007_0002
Wherein Dy_j(n) is a distance separating a noise y_j from an estimation of that noise made with the estimation function f using the coefficients α_m(n-1), with the values they had when executing step 101. The noise y_j is the one which was or is measured at the time when n was or is equal to j in step 101.
Figure imgf000008_0001
The set of coefficients α_i(n) on the left side of the setting symbol ":=" denotes the coefficients generated by the current execution of the loop with rank n, whereas the coefficients α_i(n-1) on the right side are those initialized to zero and/or generated by a preceding execution of the loop with rank n-1.
In step 107, the term [Z⁻¹(n)]_{n-i, n-j} is the coefficient on line n-i, column n-j of the inverse of a kernel matrix Z(n), with coefficients Z_{h,k}(n) for h and k ranging from 0 to L-1, generated in step 104. Any method known in the art can be used for obtaining the inverse of the kernel matrix. In step 104, executed before step 107 in case of a positive response to the test of step 102, the kernel matrix for the loop of rank n is generated by the formula:
$$Z_{h,k}(n) := K(x_{n-h}, x_{n-k}) + \zeta_{h,k}, \qquad h, k = 0, \dots, L-1 \qquad (3)$$
Wherein ζ_{h,k} is a matrix regularization parameter. In other words, it is a value that is equal to zero when the index h is different from the index k and is equal to a constant when the two indexes h and k are equal. The regularization parameter ensures that the matrix Z(n) has an inverse.
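For illustration only, the construction of the regularized kernel matrix of step 104 can be sketched in Python as follows. The function name `kernel_matrix`, the scalar argument `zeta` (the constant placed on the diagonal) and the Gaussian kernel with unit width are assumptions of the sketch.

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian kernel, used here only as an example of a Mercer kernel."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_matrix(xs, L, kernel, zeta=1e-3):
    """Z[h][k] = K(x_{n-h}, x_{n-k}) + zeta if h == k, for h, k = 0 .. L-1.

    xs holds x_1 .. x_n in order of arrival, so x_{n-h} is xs[n - 1 - h].
    """
    n = len(xs)
    window = [xs[n - 1 - h] for h in range(L)]  # x_n, x_{n-1}, ..., x_{n-L+1}
    return [[kernel(window[h], window[k]) + (zeta if h == k else 0.0)
             for k in range(L)]
            for h in range(L)]


Z = kernel_matrix([(0.0,), (1.0,), (2.0,)], 2, gaussian_kernel, zeta=0.5)
```

Adding the constant only on the diagonal keeps the matrix symmetric while moving it away from singularity, for instance when two sensor vectors of the window coincide.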
The parameters p and μn are for improving the efficiency of the method and will be explained later. Without said parameters or with p and μn constants respectively equal to 0 and to 1 which is the same, the formulae used in step 107 are similar to the following ones:
Figure imgf000008_0003
$$Dy_j(n) := y_j - \sum_{m=1}^{n} \alpha_m(n-1) \, K(x_m, x_j) \qquad (5)$$
Mathematical considerations show that setting the coefficients α_i according to formula (4) implies that, for every j in the range of n-L+1 to n:
$$y_j = \sum_{i=1}^{n} \alpha_i(n) \, K(x_i, x_j) \qquad (6)$$
It is interesting to note from formula (5) that, on execution of a following loop for a new value of noise y_{n+1}, equation (6) has the effect that, for every j in the range of n-L+1 to n, the distance Dy_j(n+1) is equal to zero. The only distance which is different from zero is Dy_{n+1}(n+1), which is given by:
$$Dy_{n+1}(n+1) = y_{n+1} - \sum_{m=1}^{n} \alpha_m(n) \, K(x_m, x_{n+1}) \qquad (7)$$
Because the kernel function K satisfies the Mercer condition, it can be shown that the greater L is, the faster the distance Dy_n(n) decreases; in other words, the faster the method converges.
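For illustration only, the simplified update (with p equal to 0 and the step size equal to 1) can be sketched in Python as follows. The sketch assumes that each of the L last coefficients is increased by the corresponding component of the inverse kernel matrix applied to the distances of formula (5). This assumed form is consistent with the property of equation (6), but it is not taken from the formula image itself. The names `solve` and `update` are hypothetical. The linear system is solved directly rather than by forming the inverse.

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    sol = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(M[r][c] * sol[c] for c in range(r + 1, m))
        sol[r] = (M[r][m] - s) / M[r][r]
    return sol


def update(xs, ys, alphas, L, kernel=gaussian_kernel, zeta=0.0):
    """One pass of the assumed simplified update, modifying alphas in place."""
    n = len(xs)
    idx = [n - 1 - h for h in range(L)]  # list positions of x_n .. x_{n-L+1}
    # Distances Dy_j(n): error of the previous coefficients on the window.
    dist = [ys[j] - sum(a * kernel(xm, xs[j]) for a, xm in zip(alphas, xs))
            for j in idx]
    # Regularised kernel matrix restricted to the window.
    Z = [[kernel(xs[i], xs[j]) + (zeta if i == j else 0.0) for j in idx]
         for i in idx]
    # Applying the inverse of Z to the distances amounts to solving Z c = dist.
    for i, c in zip(idx, solve(Z, dist)):
        alphas[i] += c
    return alphas


xs = [(0.0,), (1.0,), (2.0,), (3.5,)]
ys = [0.2, -0.4, 1.0, 0.3]
alphas = update(xs, ys, [0.1, 0.0, 0.0, 0.0], L=3)
fitted = [sum(a * gaussian_kernel(xm, xs[j]) for a, xm in zip(alphas, xs))
          for j in (1, 2, 3)]
```

With `zeta` equal to zero the estimator reproduces the L last noise samples exactly; a positive `zeta` makes the equality only approximate, in line with the wording "substantially equal".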
Advantageously, the method comprises a step 105 wherein the coefficients α_i are divided by (1+p), with a regularization parameter or forgetting factor p having a value greater than zero. Thus, each time step 105 is executed after step 102, the coefficients α_i decrease, but in such a manner as to preserve the ratio between the coefficients of any pair. Because they are divided many times, rather old coefficients, that is those with the smaller indexes, become immaterial after a sufficient number of executions of the loop. These coefficients can then be swapped out of the memory of the device executing the method, saving both storage and computing time. For instance in figure 2, the coefficients α_i with i less than n-L+1 are specifically divided by (1+p) in step 105. The coefficients α_i with i greater than n-L are divided by (1+p) in step 107 according to formulae (1) and (2). In the following, every time the regularization parameter p is present in a formula, it will be understood that p is null, which is the same as not being present, for implementations without step 105.
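The effect of step 105 can be sketched in Python as follows, for illustration only. The function name `forget_old` and the numerical values are assumptions. Only the coefficients older than the L last ones are divided here, the L last ones being handled in step 107.

```python
def forget_old(alphas, L, p):
    """Divide the coefficients alpha_i with i < n-L+1 by (1 + p), in place."""
    n = len(alphas)
    for pos in range(max(0, n - L)):
        alphas[pos] /= (1.0 + p)
    return alphas


# Ratios between the old coefficients are preserved by the division.
once = forget_old([4.0, 2.0, 1.0, 1.0], L=2, p=1.0)

# After many passes an old coefficient becomes immaterial.
decayed = [1.0, 0.0]
for _ in range(100):
    forget_old(decayed, L=1, p=0.1)
```

A device could drop a coefficient from memory once its magnitude falls below a chosen threshold; the choice of such a threshold is not specified in the description.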
Advantageously also, formula (1) used in step 107 comprises a step size μ(n), which in this case is initialized in step 100 to a value μ(0). The value of the step size μ(n) can be a constant equal to μ(0) for every execution of step 107, or can be varied according to n. In the latter case the method further comprises a step 106 wherein the step size μ(n) is set to a value μ but limited by a minimum μ_min and a maximum μ_max. The values μ_min and μ_max are set in initialization step 100, respectively to a value greater than or equal to zero and to a value preferably less than or equal to one. A possible formula for achieving that is:
μ(n) := min(max(μ, μ_min), μ_max)
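The limitation of the step size in step 106 can be sketched in Python as follows. The function name `clamp_step_size` and the default bounds of 0 and 1 are illustrative assumptions taken from the preferred ranges stated above.

```python
def clamp_step_size(mu, mu_min=0.0, mu_max=1.0):
    """Return mu limited to the interval [mu_min, mu_max]."""
    return min(max(mu, mu_min), mu_max)


too_large = clamp_step_size(1.7)
too_small = clamp_step_size(-0.2)
unchanged = clamp_step_size(0.3)
```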
Before setting μ(n) in step 106, the value μ is updated by the formula:
Figure imgf000010_0001
In the formula, a prediction error is given by:
Figure imgf000010_0002
Wherein a function f_n of the vector x_n is given by the formula:
Figure imgf000010_0003
We see here that, for example when j=0, the prediction error e_n^0 is the difference between the last received value of noise y_n and a value which would have been predicted or estimated from the n-1 preceding measures with the help of the function f_n. We see also that not only is the last value of noise compared with an estimation of noise resulting from the currently available function, but each of the L'-1 preceding received values of noise y_{n-j} is also compared with a value of noise which would have been estimated by the last available function f_n.
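As an illustration of this comparison, the sketch below computes one error per value of j, under the assumption that each error is the received value y_{n-j} minus the value given by the latest available estimator at x_{n-j}, and that the estimator is a linear combination of kernel functions as defined in the claims. The Gaussian kernel, its width and all function names are choices made for this sketch only; the exact formulas of the description are in figures that are not reproduced here.

```python
import math


def gaussian_kernel(u, v, width=1.0):
    """A kernel satisfying the Mercer condition (illustrative choice)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-squared_distance / (2.0 * width ** 2))


def estimate(alphas, xs, x, kernel=gaussian_kernel):
    """Linear combination of kernel functions evaluated at x."""
    return sum(a * kernel(x_i, x) for a, x_i in zip(alphas, xs))


def prediction_errors(alphas, xs, ys, l_prime, kernel=gaussian_kernel):
    """Errors for j = 0 .. L'-1, assumed here to be y_{n-j} - f(x_{n-j})."""
    last = len(ys) - 1
    return [
        ys[last - j] - estimate(alphas, xs, xs[last - j], kernel)
        for j in range(l_prime)
    ]


# Hypothetical measurements: three vectors of physical quantities and noises.
xs = [(0.0,), (1.0,), (2.0,)]
ys = [0.5, -0.2, 0.1]

errors_untrained = prediction_errors([0.0, 0.0, 0.0], xs, ys, l_prime=2)
errors_one_coeff = prediction_errors([0.0, 0.0, 0.1], xs, ys, l_prime=2)
```

With all coefficients at zero the errors are simply the last received values of noise; with a single non-zero coefficient on the latest sample, the error for j=0 vanishes because the kernel of a vector with itself equals one.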
An adaptive size cost parameter γ, an adaptive step size cost order L', an adaptive step size cost insensitivity ε' and an adaptive step size recursive parameter η are respectively a positive real number greater than one, an integer less than, equal to or greater than L, a positive real number near zero and a positive real number less than two, which can be set in step 100 when step 106 is present. When the value of the size cost parameter γ is equal to one, the updating of the value μ is independent of the prediction error. When furthermore the values of the adaptive step size recursive parameter η and of the regularization factor p are respectively equal to one and zero, a simple expression of the value μ is given by:
Figure imgf000011_0001
In the expression of μ , components βm of a weight gradient are updated according to the formula:
Figure imgf000011_0002
For every index i in the range n-L+1 to n, the value of a gradient Δ_i is given by the formula:
Δ_i =
Figure imgf000011_0003
A second mode of implementation of the method according to the invention is now described with reference to figure 3, wherein steps 100 to 103 are similar to those of the first mode of implementation previously described with reference to figure 2.
Considering an index i between n-L+1 and n, the L last coefficients α_i of the estimator are calculated in step 207 by using the following formula:
Figure imgf000011_0004
Wherein λ⁺_{n-i}(n) and λ⁻_{n-i}(n) are Lagrange multipliers such that the distance Dy_j(n) is less than ε for every j in the range n-L+1 to n, in other words:
\[ -\varepsilon \le y_j - \sum_{m=1}^{n} \alpha_m(n)\, K(x_m, x_j) \le \varepsilon \]
More precisely, the Lagrange multipliers are calculated according to the following sequence. A step 204 preceding step 206 is similar to previously described step 104 in that a kernel matrix [K(x_{n-h}, x_{n-k})], with h = 0, ..., L-1 and k = 0, ..., L-1, is calculated so as to have L×L coefficients, each coefficient of rank (h, k) being equal to the kernel function of the vectors x_{n-h} and x_{n-k}.
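The construction of the L×L kernel matrix can be sketched as follows. The Gaussian kernel, its width and the function names are assumptions of this sketch; any kernel function satisfying the Mercer condition could be substituted.

```python
import math


def gaussian_kernel(u, v, width=1.0):
    """A kernel satisfying the Mercer condition (illustrative choice)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-squared_distance / (2.0 * width ** 2))


def kernel_matrix(xs, L, kernel=gaussian_kernel):
    """Matrix whose coefficient (h, k) is kernel(x_{n-h}, x_{n-k}).

    xs holds the vectors x_1 .. x_n in capture order, so x_{n-h} is the
    element at position len(xs) - 1 - h.
    """
    if L > len(xs):
        raise ValueError("at least L vectors are required")
    last = len(xs) - 1
    return [
        [kernel(xs[last - h], xs[last - k]) for k in range(L)]
        for h in range(L)
    ]


# Hypothetical vectors of physical quantities, most recent last.
xs = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 1.5)]
K = kernel_matrix(xs, L=3)
```

For a symmetric kernel such as this one the resulting matrix is symmetric, and with the Gaussian kernel every diagonal coefficient equals one.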
Furthermore, a quadratic matrix Q(n) having 2L×2L coefficients is constructed, given by:
Qh,k - Qh+LML - iXn-k r~ζh,k
Figure imgf000012_0001
Figure imgf000012_0002
A linear vector p(n) having L components p_k and L components p_{k+L} is given by the formula:
Figure imgf000012_0003
Wherein when k=n-j and 0 otherwise.
The matrix Q(n) and the linear vector p(n) are then input into a quadratic programming library arranged to output the values of the Lagrange multipliers λ⁺ and λ⁻ in the form of a vector Λ having 2L positive components such that:
Aτ(n) = {λlιa;JM+ι + Aτp(n)jλ
Figure imgf000012_0004
Any quadratic programming library arranged for calculating such an argument of a maximum value is suitable, for example the GQP library available at http://www.gnu.org/software/gsl.
In formula (T), the step size μ(n) is not necessary. It can be a constant which, when equal to 1, is the same as not being present. A variable step size improves the method. In that case the method further comprises a step 205 wherein the step size μ(n) is set to a value μ limited by a minimum μmin and a maximum μmax. The values μmin and μmax are set in initialization step 100 respectively to a value greater than or equal to zero and to a value preferably less than or equal to one. A possible formula for achieving this is: μ(n) := min(max(μ, μmin), μmax)
Before setting μ(n) in step 205, the value μ is updated by the formula:
Figure imgf000013_0001
In the formula, a prediction error e_n^j is given by:
Wherein a function f_n of the vector x_{n-j} is given by the formula:
Figure imgf000013_0002
We see here that, for example when j=0, the prediction error e_n^0 is the difference between the last received value of noise y_n and a value which would have been predicted or estimated from the n-1 preceding measures with the help of the function f_n. We see also that not only is the last value of noise compared with an estimation of noise resulting from the currently available function, but each of the L'-1 preceding received values of noise y_{n-j} is also compared with a value of noise which would have been estimated by the last available function f_n.
An adaptive size cost parameter γ, an adaptive step size cost order L', an adaptive step size cost insensitivity ε' and an adaptive step size recursive parameter η are respectively a positive real number greater than one, an integer less than, equal to or greater than L, a positive real number near zero and a positive real number less than two, which can be set in step 100 when step 205 is present. When the value of the size cost parameter γ is equal to one, the updating of the value μ is independent of the prediction error value except for its sign. When furthermore the value of the adaptive step size recursive parameter η is equal to one, a simple expression of the value μ is given by:
Figure imgf000014_0001
In other words, the present method relates to a method for producing an estimator of noise which is an unknown function of physical quantities without necessity for said function to be linear comprising steps of:
- capturing (101) a sound and associating to said sound, by a common index, a first vector value of said physical quantities at the same time as the sound is captured;
- repeating (102) said step of capturing, incrementing a value of the index each time said step is repeated, and when said index value is or has been incremented a number of times at least equal to a first integer (L) greater than one, each time said index value is incremented:
- generating (107) a current sequence of coefficients for a linear combination of functions which satisfy Mercer conditions, wherein a first argument of one of the functions is the first vector value of the one of the capturing steps having the index value corresponding to a rank of the coefficient in the sequence, and
- setting values of the coefficients so that each of a number of the last captured sounds equal to said first integer is substantially equal to an occurrence of the linear combination wherein a second argument of the functions is the first vector value of the capturing step having the index value associated to the sound;
- performing the linear combination resulting from the current generated sequence to produce the estimator when a next captured sound is not pure noise.
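The steps listed above can be sketched as a small on-line loop. This is a simplified illustration under stated assumptions, not the formulas (1) and (2) of the description: at each capture, the coefficients older than the L last ones are kept fixed, and the L last coefficients are obtained by solving a linear system so that the linear combination reproduces the L last captured sounds exactly. The class name, the Gaussian kernel and the sample data are hypothetical, and the forgetting factor and the step size are left out.

```python
import math


def gaussian_kernel(u, v, width=1.0):
    """A kernel satisfying the Mercer condition (illustrative choice)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2.0 * width ** 2))


def solve_linear(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


class OnlineNoiseEstimator:
    """Keeps one coefficient per captured sample and refits the L newest."""

    def __init__(self, L, kernel=gaussian_kernel):
        self.L, self.kernel = L, kernel
        self.xs, self.ys, self.alphas = [], [], []

    def estimate(self, x):
        """Linear combination of kernel functions evaluated at x."""
        return sum(a * self.kernel(x_i, x) for a, x_i in zip(self.alphas, self.xs))

    def capture(self, x, y):
        """Store a pure-noise sample (x, y) and update the L last coefficients."""
        self.xs.append(x)
        self.ys.append(y)
        self.alphas.append(0.0)
        n = len(self.ys)
        if n < self.L:
            return
        recent = range(n - self.L, n)

        def older_part(point):
            # Contribution of the coefficients that are no longer updated.
            return sum(self.alphas[i] * self.kernel(self.xs[i], point)
                       for i in range(n - self.L))

        rhs = [self.ys[j] - older_part(self.xs[j]) for j in recent]
        gram = [[self.kernel(self.xs[i], self.xs[j]) for i in recent] for j in recent]
        self.alphas[n - self.L:] = solve_linear(gram, rhs)


# Hypothetical data: one physical quantity per sample, noise depending on it.
est = OnlineNoiseEstimator(L=3)
for t in range(6):
    est.capture((float(t),), math.sin(t))

residuals = [abs(est.ys[j] - est.estimate(est.xs[j])) for j in range(3, 6)]
noise_guess = est.estimate((2.5,))  # used when the next sound is not pure noise
```

In this sketch the residuals on the three last captured sounds are zero up to rounding, which corresponds to the case where the tolerance ε is null. The vectors of physical quantities must be distinct for the linear system to be solvable with this kernel.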

Claims

1. Method for producing an estimator of noise f, said noise y being an unknown function of physical quantities x, said estimator of noise being a linear combination of kernel functions K which satisfy Mercer conditions, said method comprising the steps of:
- capturing (101) a sound (y_n) and associating to said sound a vector value (x_n) of said physical quantities using a common index n for the time of capture;
- repeating (102) said step of capturing, incrementing the value of the index each time said step is repeated, and when said index value has been incremented a number of times at least equal to a first integer L greater than one, the step of repeating the step of capturing further comprising the steps of:
- defining an estimator of noise f_n for the current index value using a sequence of coefficients (α_i) for a linear combination of the kernel functions K according to:
Figure imgf000015_0001
wherein a first argument of one of said kernel functions is the vector value x_i from the capturing step of rank i, and x a vector value of the physical quantities;
- setting the values of the coefficients (α_i) so as to:
\[ \left| y_j - \sum_{i=1}^{n} \alpha_i(n)\, K(x_i, x_j) \right| \le \varepsilon \]
for the L last captured sounds y_j, with j ∈ {n-L+1,...,n}, ε being a fixed value,
- using the current estimator of noise f_n when a next captured sound is not pure noise.
2. Method according to Claim 1 wherein when a previous sequence was generated in relation with a preceding capturing step, the values of the coefficients are set for the current sequence being at a minimum distance of the previous sequence according to a predetermined metric associated with sequences.
3. Method according to Claim 1 or 2 wherein one of the last captured sounds is considered substantially equal to said occurrence of the linear combination when a difference between the said one sound and the occurrence is in a sufficiently small interval comprising zero and wherein the values of the (L) more recently generated coefficients comprise a difference (λ⁺ − λ⁻) between two Lagrange multipliers, a first one and a second one corresponding respectively to a positive limit and to a negative limit of said small interval.
4. Method according to Claim 3 wherein said difference between two Lagrange multipliers is multiplied by a step size which is updated according to said Lagrange multipliers.
5. Method according to Claim 1 or 2 wherein one of the last captured sounds is considered substantially equal to said occurrence of the linear combination when a difference between the said one sound and the occurrence is equal to zero and wherein the values of the (L) more recently generated coefficients comprise a difference between a last captured sound and the linear combination associated with a preceding value of said common index.
6. Method according to Claim 5 wherein said difference between the last captured sound and the linear combination is multiplied by a step size which is updated according to another difference which is between the last captured sound and a previous occurrence of the linear combination.
7. Method according to anyone of the preceding Claims wherein said coefficients are multiplied by a forgetting factor having a value less than one each time said common index is incremented.
8. System for producing an estimator of noise, said noise being an unknown function of physical quantities, said estimator of noise being a linear combination of kernel functions K which satisfy Mercer conditions, said system comprising:
- a microphone (21) arranged for capturing a sound and means (22, 24, 29) arranged for associating to said sound a vector value of said physical quantities using a common index for the time of capture;
- a generator (23) and a shift register (29) arranged for storing captured sounds and associating said vector value while incrementing the index value each time a sound is captured, the generator and the shift register being further arranged to, when said index value has been incremented a number of times at least equal to a first integer (L) greater than one:
- define an estimator of noise f_n for the current index value using a sequence of coefficients (α_i) for a linear combination of the kernel functions K according to:
wherein a first argument of one of said kernel functions is the vector value x_i from the capturing step of rank i, and x a vector value of the physical quantities;
- setting the values of the coefficients (α_i) so as to:
Figure imgf000017_0001
for the L last captured sounds y_j, with j ∈ {n-L+1,...,n}, ε being a fixed value, the generator and the shift register being further arranged to provide an estimator of noise using the current estimator of noise f_n when a next captured sound is not pure noise.
9. System according to Claim 8 wherein the generator (23) is arranged for setting the values of the coefficients of the current sequence being at a minimum distance of a previous sequence according to a predetermined metric associated with sequences.
10. System according to Claim 8 or 9 wherein one of the last captured sounds is considered substantially equal to said occurrence of the linear combination when a difference between the said one sound and the occurrence is in a sufficiently small interval comprising zero and wherein the values of the (L) more recently generated coefficients comprise a difference (λ⁺ − λ⁻) between two Lagrange multipliers, a first one and a second one corresponding respectively to a positive limit and to a negative limit of said small interval.
11. Method according to Claim 8 or 9 wherein one of the last captured sounds is considered substantially equal to said occurrence of the linear combination when a difference between the said one sound and the occurrence is equal to zero and wherein the values of the (L) more recently generated coefficients comprise a difference between a last captured sound and the linear combination associated with a preceding value of said common index.
12. A computer program providing computer executable instructions, which when loaded onto a computer causes the computer to perform the method according to Claims 1 to 7.
13. A medium bearing the computer program according to Claim 12.
PCT/EP2007/064515 2006-12-21 2007-12-21 On-line learning method and system for speech denoising WO2008074893A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP06301278A EP1936608A1 (en) 2006-12-21 2006-12-21 On-line learning method and system for speech denoising
EP06301278.5 2006-12-21

Publications (1)

Publication Number Publication Date
WO2008074893A1 true WO2008074893A1 (en) 2008-06-26

Family

ID=37891750

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2007/064515 WO2008074893A1 (en) 2006-12-21 2007-12-21 On-line learning method and system for speech denoising

Country Status (2)

Country Link
EP (1) EP1936608A1 (en)
WO (1) WO2008074893A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000014731A1 (en) * 1998-09-09 2000-03-16 Ericsson Inc. Apparatus and method for transmitting an improved voice signal over a communications device located in a vehicle with adaptive vibration noise cancellation
US20050187763A1 (en) * 2004-02-23 2005-08-25 General Motors Corporation Dynamic tuning of hands-free algorithm for noise and driving conditions

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A. ITO ET AL.: "Internal Noise Suppression for Speech Recognition by Small Robots", PROC. EUROSPEECH / INTERSPEECH, 4 September 2005 (2005-09-04), Lisbon, Portugal, pages 2685 - 2688, XP002430339 *
SCHÖLKOPF: "Statistical Learning and Kernel Methods", TECHNICAL REPORT MICROSOFT RESEARCH MSR TR, no. MSR-TR-2000-23, 29 February 2000 (2000-02-29), pages 1 - 8, XP002413484 *

Also Published As

Publication number Publication date
EP1936608A1 (en) 2008-06-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07858121

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07858121

Country of ref document: EP

Kind code of ref document: A1