EP2309499B1 - Method for optimised filtering of non-stationary interference captured by a multi-microphone audio device, in particular a hands-free telephone device for an automobile. - Google Patents


Info

Publication number
EP2309499B1
EP2309499B1 (application EP10167065A)
Authority
EP
European Patent Office
Prior art keywords
noise
signal
speech
microphones
incidence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
EP10167065A
Other languages
German (de)
French (fr)
Other versions
EP2309499A1 (en)
Inventor
Guillaume Vitte
Julie Seris
Guillaume Pinot
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Parrot SA
Original Assignee
Parrot SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Parrot SA filed Critical Parrot SA
Publication of EP2309499A1 publication Critical patent/EP2309499A1/en
Application granted granted Critical
Publication of EP2309499B1 publication Critical patent/EP2309499B1/en

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 — Noise filtering
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04R — LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 — Circuits for transducers, loudspeakers or microphones
    • H04R3/005 — Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 — Noise filtering
    • G10L21/0216 — Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 — Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 — Microphone arrays; Beamforming
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04R — LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2499/00 — Aspects covered by H04R or H04S not otherwise provided for in their subgroups
    • H04R2499/10 — General applications
    • H04R2499/13 — Acoustic transducers and sound field adaptation in vehicles

Definitions

  • The invention relates to the processing of speech in a noisy environment.
  • These devices include a sensitive microphone that picks up not only the user's voice but also the surrounding noise, a disruptive element that can, in some cases, render the speaker's words incomprehensible. The same applies when speech recognition techniques are to be implemented, since it is very difficult to perform pattern recognition on words drowned in a high noise level.
  • Some of these devices use several microphones, usually two, and take the average of the picked-up signals, or apply other more complex operations, to obtain a signal with a lower interference level.
  • A so-called beamforming technique makes it possible to create, by software means, a directivity that improves the signal-to-noise ratio, but its performance is very limited when only two microphones are used (in practice, such a method is considered to give good results only with an array of at least eight microphones).
  • Conventional techniques are mainly suited to filtering diffuse, stationary noise coming from the surroundings of the device and found at comparable levels in the signals picked up by the two microphones.
  • A non-stationary or "transient" noise, that is, a noise evolving unpredictably over time, will not be discriminated from speech and will therefore not be attenuated.
  • A difficulty in filtering these non-stationary noises lies in the fact that their temporal and spatial characteristics are very close to those of speech: hence the difficulty, on the one hand, of estimating the presence of speech (since the speaker does not talk all the time) and, on the other hand, of extracting the useful speech signal in a very noisy environment such as the passenger compartment of a motor vehicle.
  • Once selected, this reference will be used, on the one hand, to compute a probability of absence/presence of speech and, on the other hand, to denoise the signal picked up by the microphones.
  • In general terms, the invention concerns a method of denoising a noisy acoustic signal picked up by two microphones of a multi-microphone audio device operating in a noisy environment, in particular a hands-free telephone device for a motor vehicle.
  • The noisy acoustic signal includes a useful speech component originating from a directional speech source and a parasitic noise component, that noise component itself including a directional non-stationary lateral-noise component.
  • Such a method is for example disclosed by I. Cohen and B. Berdugo, Two-Channel Signal Detection and Speech Enhancement Based on the Transient Beam-to-Reference Ratio, Proc. ICASSP 2003, Hong Kong, pp. 233-236, Apr. 2003 .
  • Figure 1 is a block diagram showing the different modules and functions implemented by the method of the invention, as well as their interactions.
  • The method of the invention is implemented by software means, which can be broken down and schematized as a number of blocks 10 to 36 illustrated in Figure 1.
  • The signal to be denoised is derived from a plurality of signals picked up by an array of microphones (which, in the minimum configuration, can simply be an array of two microphones, as in the example shown) arranged in a predetermined configuration.
  • These two microphones can, for example, be installed on the ceiling of a car interior, about 5 cm from each other, with the main lobe of their directivity pattern oriented towards the driver. This direction, considered known a priori, will be designated the direction of incidence of the useful signal.
  • "Lateral noise" will denote a directional non-stationary noise whose direction of incidence is remote from that of the useful signal, and "preferred cone" the direction or angular sector of space in which the useful-signal source (the speech of the speaker) is located relative to the microphone array. When a sound source manifests itself outside the preferred cone, it is a lateral noise, which the method seeks to attenuate.
  • The noisy signals x 1 ( n ) and x 2 ( n ) picked up by the two microphones are transposed into the frequency domain (blocks 10) by a short-term Fourier transform (computed by FFT), the results being denoted X 1 ( k, l ) and X 2 ( k, l ) respectively, where k is the index of the frequency band and l the index of the time frame.
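The transposition into the frequency domain described above can be sketched as follows; the frame length, hop size and window below are illustrative choices, not values fixed by the patent.

```python
# Sketch: short-term Fourier transform of the two microphone signals,
# yielding X1(k, l) and X2(k, l) (band index k, frame index l).
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Return X[k, l]: frequency band k, time frame l."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[l * hop : l * hop + frame_len] * window
                       for l in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)   # shape: (frame_len//2 + 1, n_frames)

# Two simulated microphone signals x1(n), x2(n)
rng = np.random.default_rng(0)
x1 = rng.standard_normal(4096)
x2 = rng.standard_normal(4096)
X1, X2 = stft(x1), stft(x2)
```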
  • The signals from the two microphones are also applied to a module 12 implementing a predictive LMS algorithm, schematized by block 14, giving, after calculation of a short-term Fourier transform (block 16), a signal Y ( k, l ) that will be used to calculate a first noise reference Ref 1 ( k, l ), produced by a block 18 essentially on a spatial-coherence criterion.
  • Another noise reference Ref 2 ( k, l ) is calculated by a block 20, essentially on an angular-blocking criterion, from the signals X 1 ( k, l ) and X 2 ( k, l ) obtained directly, in the frequency domain, from the signals x 1 ( n ) and x 2 ( n ).
  • A block 22 selects one or the other of the noise references Ref 1 ( k, l ) or Ref 2 ( k, l ) as a function of the result of a calculation of the angle of incidence of the signals, performed by block 24 from the signals X 1 ( k, l ) and X 2 ( k, l ).
  • The chosen noise reference Ref ( k, l ) is used as the reference noise channel of a block 26 that calculates a probability of speech absence, operating on a noisy signal X ( k, l ) resulting from a combination, performed by block 28, of the two signals X 1 ( k, l ) and X 2 ( k, l ).
  • Block 26 also takes into account the respective pseudo-stationary noise components of the reference noise channel and of the noisy signal, components estimated by blocks 30 and 32.
  • The result q ( k, l ) of the speech-absence probability calculation and the noisy signal X ( k, l ) are input to an OM-LSA gain-calculation algorithm (block 34), whose result is subjected (block 36) to an inverse Fourier transform (iFFT) to obtain, in the time domain, an estimate of the denoised speech signal.
  • For this purpose, the system uses a predictive filter 14 of the LMS ( Least Mean Squares ) type, having as inputs the signals x 1 ( n ) and x 2 ( n ) picked up by the pair of microphones. Let y ( n ) denote the output of the LMS filter and e ( n ) the prediction error.
  • This predictive filter is used to predict, from x 2 ( n ), the speech component in x 1 ( n ). Indeed, being more spatially coherent, speech will be better predicted by the adaptive filter than noise.
  • E ( k, l ), X 1 ( k, l ) and Y ( k, l ) are the short-term Fourier transforms (STFTs) of e ( n ), x 1 ( n ) and y ( n ).
  • Ref 1 ( k, l ) = X 1 ( k, l ) − |Y ( k, l )| · X 1 ( k, l ) / |X 1 ( k, l )| (subtraction of the predicted magnitude carrying the phase of X 1, i.e. with compensation of the phase shift)
  • This noise channel integrates part of the non-stationary noises, in particular those that have a low statistical correlation and are not predictable between the two microphones.
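The predictive-filtering step and the phase-compensated subtraction that yields the first noise reference might be sketched as follows; this is a minimal time-domain LMS, and the filter order and step size are illustrative assumptions, not the patent's tuned values.

```python
# Sketch: LMS filter predicts the spatially coherent (speech-like) part
# of x1(n) from x2(n); Ref1 then subtracts the predicted magnitude |Y|
# using the phase of X1 (phase-shift compensation).
import numpy as np

def lms_predict(x_ref, x_target, order=16, mu=0.01):
    """Predict x_target from x_ref with an LMS adaptive filter."""
    w = np.zeros(order)
    y = np.zeros(len(x_target))
    for n in range(order, len(x_target)):
        u = x_ref[n - order:n][::-1]   # most recent reference samples first
        y[n] = w @ u                   # prediction
        e = x_target[n] - y[n]         # prediction error
        w += mu * e * u                # LMS weight update
    return y

def noise_ref1(X1, Y, eps=1e-12):
    """Ref1(k,l) = X1 - |Y| * X1/|X1| (phase taken from X1)."""
    return X1 - np.abs(Y) * X1 / (np.abs(X1) + eps)
```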
  • The angle of incidence θ S of the speech is assumed known, for example defined as the angle between the perpendicular bisector of the pair of microphones and the reference direction corresponding to the useful speech source.
  • Any lateral noises are thus allowed through, while the speech signal is spatially blocked.
  • This selection involves an estimation of the angle of incidence θ ( k, l ) of the signals.
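One common way to estimate an angle of incidence with two microphones is from the inter-microphone phase difference; the sketch below assumes the 5 cm spacing mentioned above and illustrative sampling parameters. The patent's own estimator works by partitioning space into angular sectors, so this is only an indicative stand-in.

```python
# Sketch: per-band angle-of-incidence estimate from the phase difference
# between the two microphone spectra (d = mic spacing, c = sound speed;
# fs and n_fft are illustrative assumptions).
import numpy as np

def incidence_angle(X1, X2, k, fs=16000, n_fft=256, d=0.05, c=343.0):
    """Angle (rad) of the dominant source in band k, from the phase delay."""
    f = k * fs / n_fft                        # band centre frequency (Hz)
    phase = np.angle(X2[k] * np.conj(X1[k]))  # inter-microphone phase shift
    tau = phase / (2 * np.pi * f)             # time-delay estimate (s)
    return np.arcsin(np.clip(tau * c / d, -1.0, 1.0))
```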
  • By default, the reference noise channel Ref ( k, l ) is calculated by spatial coherence, which makes it possible to integrate non-stationary, non-directional noises.
  • Otherwise, the reference noise channel Ref ( k, l ) is calculated by a different method, spatial blocking, so as to introduce efficiently into this channel powerful, non-stationary directional noises.
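The spatial-blocking idea behind the second noise reference can be illustrated with a delay-and-subtract null steered at the speech direction: a signal arriving from θ S cancels, while lateral noise passes. The geometry and sampling values below are assumptions for the sketch, not values taken from the patent.

```python
# Sketch: second noise reference by spatial blocking. Aligning the
# assumed speech direction theta_s on the two channels and subtracting
# cancels speech, letting lateral (off-cone) noise through.
import numpy as np

def noise_ref2(X1, X2, theta_s=0.0, fs=16000, n_fft=256, d=0.05, c=343.0):
    """Ref2(k, l) = X1 - steer(k) * X2, with a null in direction theta_s."""
    k = np.arange(X1.shape[0])[:, None]   # frequency-band index, column
    f = k * fs / n_fft
    tau = d * np.sin(theta_s) / c         # inter-mic delay of the speech
    steer = np.exp(2j * np.pi * f * tau)  # phase-alignment factor
    return X1 - steer * X2
```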
  • The signals X n ( k, l ) (the STFTs of the signals picked up by the microphones) can be combined with one another by a simple pre-filtering technique.
  • A beamforming of the Delay-and-Sum type is applied to obtain a partially denoised combined signal X ( k, l ):
  • X ( k, l ) = ½ [ X 1 ( k, l ) + d 2 ( k ) · X 2 ( k, l ) ], where d 2 ( k ) compensates, in band k, the phase shift of the useful signal between the two microphones.
  • In the configuration considered here, the angle θ S is zero and a simple average of the two microphone signals is taken. It should also be noted that, the number of microphones being limited, this processing provides only a slight improvement in the signal-to-noise ratio, of the order of only 1 dB.
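The Delay-and-Sum combination can be sketched as below; with θ S = 0 the steering term is 1 and the combination reduces to the simple average noted above. Sampling and geometry parameters are illustrative assumptions.

```python
# Sketch: Delay-and-Sum pre-filtering producing the partially denoised
# combined signal X(k, l) from the two microphone spectra.
import numpy as np

def delay_and_sum(X1, X2, theta_s=0.0, fs=16000, n_fft=256, d=0.05, c=343.0):
    """X(k, l) = 0.5 * (X1 + d2(k) * X2), steered towards theta_s."""
    k = np.arange(X1.shape[0])[:, None]
    f = k * fs / n_fft
    tau = d * np.sin(theta_s) / c
    d2 = np.exp(2j * np.pi * f * tau)   # aligns microphone 2 onto microphone 1
    return 0.5 * (X1 + d2 * X2)
```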
  • The purpose of this step is to calculate an estimate of the pseudo-stationary noise component present in the noise reference Ref ( k, l ) (block 30) and, in the same way, of the pseudo-stationary noise component present in the signal to be denoised X ( k, l ) (block 32).
  • The estimation of the pseudo-stationary noise component is indeed a fairly well-solved conventional problem.
  • Various methods are effective and usable for this purpose, in particular the Minima Controlled Recursive Averaging (MCRA) algorithm for estimating the energy of the pseudo-stationary noise component, as described by I. Cohen and B. Berdugo, Noise Estimation by Minima Controlled Recursive Averaging for Robust Speech Enhancement, IEEE Signal Processing Letters, Vol. 9, No. 1, pp. 12-15, Jan. 2002.
  • the operator S is an estimate of the instantaneous energy
  • the operator M is an estimate of the pseudo-stationary energy (estimate made by the blocks 30 and 32).
  • S - M provides an estimate of the transient parts of the analyzed signal, also called transients.
  • the two signals analyzed here are the combined noisy signal X ( k, l ) and the signal of the reference noise channel Ref ( k, l ).
  • The numerator will therefore highlight both speech and noise transients, while the denominator will extract only the noise transients present in the reference noise channel.
  • In the presence of speech, the ratio Ω ( k, l ) will tend towards a high limit Ω max ( k ), whereas conversely, in the absence of speech but in the presence of non-stationary noise, this ratio will approach the low limit Ω min ( k ), where k is the frequency band. This makes it possible to differentiate speech from non-stationary noises.
  • The threshold constants applied to X and Ref in this algorithm are in fact transient-detection thresholds.
  • These thresholds and the limits Ω min ( k ) and Ω max ( k ) are all chosen to correspond to typical situations, close to reality.
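The transient-ratio logic above can be sketched as follows; the detection thresholds and the limits Ω min, Ω max are illustrative placeholders, not the patent's tuned values.

```python
# Sketch: speech-absence probability from the transient ratio. S is the
# instantaneous energy, M the pseudo-stationary energy, so S - M isolates
# transients; the ratio of transients in the combined signal to transients
# in the noise reference separates speech from non-stationary noise.
import numpy as np

def speech_absence_prob(S_X, M_X, S_ref, M_ref,
                        gamma_X=1.5, gamma_ref=1.5,
                        omega_min=1.0, omega_max=8.0):
    trans_X = np.maximum(S_X - M_X, 0.0)         # speech + noise transients
    trans_ref = np.maximum(S_ref - M_ref, 0.0)   # noise transients only
    omega = trans_X / np.maximum(trans_ref, 1e-12)
    # No transient detected on either channel -> treat as speech absence
    omega = np.where((S_X < gamma_X * M_X) & (S_ref < gamma_ref * M_ref),
                     omega_min, omega)
    # Map omega in [omega_min, omega_max] onto a presence value in [0, 1]
    presence = np.clip((omega - omega_min) / (omega_max - omega_min), 0.0, 1.0)
    return 1.0 - presence                        # probability speech is ABSENT
```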
  • The probability q ( k, l ) of absence of speech calculated at block 26 will be used as an input parameter to a denoising technique that is known per se. It has the advantage of making it possible to identify the periods of absence of speech even in the presence of a non-stationary noise, whether directional or not.
  • The probability of speech absence is a crucial estimator for the proper functioning of a denoising structure such as the one used here, because it underlies both a good noise estimate and the calculation of an efficient denoising gain.
  • The OM-LSA ( Optimally-Modified Log-Spectral Amplitude ) algorithm improves the calculation of the LSA ( Log-Spectral Amplitude ) gain to be applied by weighting it by the conditional probability of presence of speech.
  • The probability q ( k, l ) modulates the forgetting factor in the estimation of the noise, which is updated more rapidly on the noisy signal X ( k, l ) when the probability of absence of speech is high; this mechanism conditions the quality of the noise estimate.
  • The probability q ( k, l ) therefore plays a large role in determining the gain G OM-LSA ( k, l ).
  • When the conditional probability of presence of speech is zero, the gain is equal to G min and maximum noise reduction is applied: if, for example, an attenuation of 20 dB is chosen for G min , the previously detected non-stationary noises are attenuated by 20 dB.
  • The last step is to apply an inverse fast Fourier transform (iFFT) to the denoised spectrum so as to obtain, in the time domain, the desired denoised speech signal.
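The final gain-and-resynthesis stage might be sketched as below; `G_lsa` stands in for the full OM-LSA gain computation, and the frame parameters are illustrative assumptions rather than the patent's values.

```python
# Sketch: per-band, per-frame gain floored at G_min and weighted by the
# speech-presence probability (1 - q), followed by inverse FFT with
# overlap-add to return to the time domain.
import numpy as np

def apply_gain_and_istft(X, q, G_lsa, g_min=10 ** (-20 / 20), hop=128):
    G = (G_lsa ** (1.0 - q)) * (g_min ** q)   # G = G_min when absence certain
    S_hat = G * X                             # denoised spectrum
    frames = np.fft.irfft(S_hat, axis=0)      # back to the time domain
    frame_len, n_frames = frames.shape
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    for l in range(n_frames):                 # overlap-add synthesis
        out[l * hop : l * hop + frame_len] += frames[:, l]
    return out
```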


Abstract

The method involves combining signals picked up by two microphones to make a noisy combined signal, and calculating a probability that speech is absent from the noisy combined signal on the basis of respective spectral energy levels of the noisy combined signal and of a referent noise signal. Noise is selectively reduced by applying variable gain that is specific to each frequency band and to each time frame on the basis of the probability that speech is absent and on the basis of the noisy combined signal.

Description

The invention relates to the processing of speech in a noisy environment.

It concerns in particular, but not exclusively, the processing of speech signals picked up by telephony devices for motor vehicles.

These devices include a sensitive microphone ("mic") that picks up not only the user's voice but also the surrounding noise; this noise is a disruptive element that can, in some cases, render the speaker's words incomprehensible. The same applies when speech recognition techniques are to be implemented, since it is very difficult to perform pattern recognition on words drowned in a high noise level.

This difficulty related to surrounding noise is particularly severe in the case of hands-free devices. In particular, the large distance between the microphone and the speaker results in a high relative noise level, which makes it difficult to extract the useful signal embedded in the noise. Moreover, the very noisy environment typical of the automobile has non-stationary spectral characteristics, that is, characteristics which change unpredictably with the driving conditions: driving over damaged or cobbled road surfaces, car radio in operation, etc.

Some of these devices use several microphones, generally two, and take the average of the picked-up signals, or apply other more complex operations, to obtain a signal with a lower level of interference. In particular, a so-called beamforming technique makes it possible to create, by software means, a directivity that improves the signal-to-noise ratio, but the performance of this technique is very limited when only two microphones are used (in practice, such a method is considered to give good results only with an array of at least eight microphones).

Moreover, conventional techniques are mainly suited to filtering diffuse, stationary noise coming from the surroundings of the device and found at comparable levels in the signals picked up by the two microphones.

On the other hand, a non-stationary or "transient" noise, that is, a noise evolving unpredictably over time, will not be discriminated from speech and will therefore not be attenuated.

Yet in an automotive environment such non-stationary, directional noises are very common: a horn blast, a passing scooter, a car overtaking, etc.

A difficulty in filtering these non-stationary noises lies in the fact that their temporal and spatial characteristics are very close to those of speech; hence the difficulty, on the one hand, of estimating the presence of speech (since the speaker does not talk all the time) and, on the other hand, of extracting the useful speech signal in a very noisy environment such as the passenger compartment of a motor vehicle.

One of the aims of the present invention is to propose a multi-microphone hands-free device, in particular a system using only two microphones, making it possible:

  • to distinguish non-stationary noises from speech effectively; and
  • to adapt the denoising to the presence and characteristics of the detected non-stationary noises, without altering any speech that may be present, so as to process the noisy signal in the most effective manner.

The starting point of the invention is to combine (i) a spatial-coherence analysis of the signal picked up by the two microphones with (ii) an analysis of the direction of incidence of these signals. The invention rests on two observations, namely that:

  • speech generally exhibits higher spatial coherence than noise; and, moreover, that
  • the direction of incidence of speech is generally well defined and may be assumed known (in the case of a motor vehicle, it is defined by the position of the driver, towards whom the microphone is turned).

These two properties will be used to compute two noise references by different methods:

  • a first noise reference computed as a function of the spatial coherence of the picked-up signals; such a reference is useful insofar as it incorporates weakly directional non-stationary noises (hiccups in the engine hum, etc.); and
  • a second noise reference computed as a function of the main direction of incidence of the signals; this characteristic can indeed be determined when an array of several microphones (at least two) is used, leading to a noise reference that incorporates mainly directional non-stationary noises (horn blasts, a passing scooter, a car overtaking, etc.).

These two noise references will be used alternately according to the nature of the noise present, as a function of the direction of incidence of the signals:

  • in general, the first noise reference (the one computed by spatial coherence) will be used by default;
  • on the other hand, when the main direction of incidence of the signal is remote from that of the useful signal (the direction of the speaker, assumed known a priori), that is, in the presence of a fairly powerful directional noise, the second noise reference will be used, so as to introduce the powerful directional non-stationary noises predominantly into it.

Once the noise reference has thus been selected, it will be used, on the one hand, to compute a probability of absence/presence of speech and, on the other hand, to denoise the signal picked up by the microphones.

More precisely, the invention relates, in general terms, to a method of denoising a noisy acoustic signal picked up by two microphones of a multi-microphone audio device operating in a noisy environment, in particular a hands-free telephone device for a motor vehicle. The noisy acoustic signal includes a useful speech component originating from a directional speech source and a parasitic noise component, that noise component itself including a directional non-stationary lateral-noise component.

Such a method is disclosed, for example, by I. Cohen and B. Berdugo, Two-Channel Signal Detection and Speech Enhancement Based on the Transient Beam-to-Reference Ratio, Proc. ICASSP 2003, Hong Kong, pp. 233-236, Apr. 2003.

De façon caractéristique de l'invention, ce procédé comporte, dans le domaine fréquentiel pour une pluralité de bandes de fréquences définies pour des trames temporelles successives de signal, les étapes de traitement du signal suivantes :

  1. a) calcul d'une première référence de bruit par analyse de cohérence spatiale des signaux captés les deux microphones, ce calcul comprenant un filtrage linéaire prédictif appliqué aux signaux captés par les deux microphones et comprenant une soustraction avec compensation du déphasage entre le signal capté et le signal de sortie du filtre prédictif ;
  2. b) calcul d'une seconde référence de bruit par analyse des directions d'incidence des signaux captés par les deux microphones, ce calcul comprenant le blocage spatial des composantes des signaux captés dont la direction d'incidence est située à l'intérieur d'un cône de référence défini de part et d'autre d'une direction prédéterminée d'incidence du signal utile ;
  3. c) estimation d'une direction principale d'incidence des signaux captés par les deux microphones ;
  4. d) sélection comme signal de bruit référent de l'une ou l'autre des références de bruit calculées aux étapes a) et b), en fonction de la direction principale estimée à l'étape c) ;
  5. e) combinaison des signaux captés par les deux microphones en un signal combiné bruité ;
  6. f) calcul d'une probabilité d'absence de parole dans le signal combiné bruité, à partir des niveaux respectifs d'énergie spectrale du signal combiné bruité et du signal de bruit référent ;
  7. g) à partir de la probabilité d'absence de parole calculée à l'étape f) et du signal combiné bruité, réduction sélective du bruit par application d'un gain variable propre à chaque bande de fréquences et à chaque trame temporelle.
In a characteristic manner of the invention, this method comprises, in the frequency domain for a plurality of defined frequency bands. for successive time frames of signal, the following signal processing steps:
  1. a) calculating a first noise reference by spatial coherence analysis of the signals picked up by the two microphones, this calculation comprising a predictive linear filtering applied to the signals picked up by the two microphones and comprising a subtraction with compensation of the phase shift between the signal picked up and the output signal of the predictive filter;
  2. b) calculating a second noise reference by analyzing the directions of incidence of the signals picked up by the two microphones, this calculation comprising the spatial blocking of the components of the signals picked up whose direction of incidence is located inside a reference cone defined on either side of a predetermined direction of incidence of the useful signal;
  3. c) estimating a principal direction of incidence of the signals picked up by the two microphones;
  4. d) selecting, as the reference noise signal, one or the other of the noise references calculated in steps a) and b), depending on the principal direction estimated in step c);
  5. e) combining the signals picked up by the two microphones into a noisy combined signal;
  6. f) calculating a probability of absence of speech in the noisy combined signal from the respective spectral energy levels of the noisy combined signal and the reference noise signal;
  7. g) from the probability of absence of speech calculated in step f) and the noisy combined signal, selective noise reduction by applying a variable gain specific to each frequency band and each time frame.

According to various advantageous subsidiary features:
  • the predictive linear filtering comprises the application of a least mean squares (LMS) linear prediction algorithm;
  • the estimation of the main direction of incidence of step c) comprises the following successive sub-steps: c1) partitioning the space into a plurality of angular sectors; c2) for each sector, evaluating an incidence-direction estimator from the signals picked up by the two microphones; and c3) from the estimator values calculated in step c2), estimating said main direction of incidence;
  • the selection of step d) is a selection of the second noise reference as the reference noise signal if the main direction estimated in step c) lies outside a reference cone defined on either side of a predetermined direction of incidence of the useful signal;
  • the combination of step e) comprises prefiltering of the fixed beamforming type;
  • the speech absence probability calculation of step f) comprises the estimation of the respective pseudo-stationary noise components contained in the noisy combined signal and in the reference noise signal, the probability of absence of speech also being calculated from these respective pseudo-stationary noise components;
  • the selective noise reduction of step g) is a processing by application of an optimized modified log-spectral amplitude (OM-LSA) gain.

An example of implementation of the method of the invention will now be described with reference to the appended figure.

Figure 1 is a block diagram showing the various modules and functions implemented by the method of the invention, as well as their interactions.

The method of the invention is implemented by software means, which can be broken down and represented schematically by a number of blocks 10 to 36 illustrated in Figure 1.

These processes are implemented in the form of appropriate algorithms executed by a microcontroller or a digital signal processor. Although, for clarity of the description, these various processes are presented as separate modules, they share common elements and correspond in practice to a plurality of functions executed overall by the same software.

The signal to be denoised comes from a plurality of signals picked up by an array of microphones (which, in the minimal configuration, may simply be an array of two microphones, as in the example shown) arranged in a predetermined configuration. In practice, these two microphones may, for example, be installed on the ceiling light of a car cabin, about 5 cm apart, with the main lobe of their directivity pattern oriented toward the driver. This direction, considered known a priori, will be called the direction of incidence of the useful signal.

"Lateral noise" will denote a directional non-stationary noise whose direction of incidence is remote from that of the useful signal, and "preferred cone" will denote the direction or angular sector of space in which the useful signal source (the speech of the speaker) is located relative to the microphone array. When a sound source manifests itself outside the preferred cone, it is therefore a lateral noise, which the system seeks to attenuate.
As illustrated in Figure 1, the noisy signals x1(n) and x2(n) picked up by the two microphones are transposed into the frequency domain (blocks 10) by a short-term Fourier transform (FFT) computation, whose results are denoted X1(k,l) and X2(k,l) respectively, k being the index of the frequency band and l the index of the time frame. The signals from the two microphones are also applied to a module 12 implementing a predictive LMS algorithm, represented schematically by block 14, which gives, after computation of a short-term Fourier transform (block 16), a signal Y(k,l) that will be used to compute a first noise reference Ref1(k,l), carried out by a block 18 essentially on a spatial coherence criterion.

Another noise reference Ref2(k,l) is computed by a block 20, essentially on an angular blocking criterion, from the signals X1(k,l) and X2(k,l) obtained directly, in the frequency domain, from the signals x1(n) and x2(n).

A block 22 selects one or the other of the noise references Ref1(k,l) or Ref2(k,l) according to the result of a computation of the angle of incidence of the signals, performed by block 24 from the signals X1(k,l) and X2(k,l). The chosen noise reference, Ref(k,l), is used as the reference noise channel by a block 26 that computes a probability of absence of speech on a noisy signal X(k,l) resulting from a combination, performed by block 28, of the two signals X1(k,l) and X2(k,l). Block 26 also takes into account the respective pseudo-stationary noise components of the reference noise channel and of the noisy signal, components estimated by blocks 30 and 32.

The result q(k,l) of the speech absence probability computation and the noisy signal X(k,l) are applied as inputs to an OM-LSA gain control algorithm (block 34), whose result Ŝ(k,l) is subjected (block 36) to an inverse Fourier transform (iFFT) to obtain, in the time domain, an estimate ŝ(t) of the denoised speech signal.

Each of the processing steps will now be described in detail.

Fourier transform of the signals picked up by the microphones (blocks 10)

The time-domain signal xn(t) from each of the N microphones (N = 2 in the example shown) is digitized, split into frames of T time points, windowed by a Hanning-type window, and the short-term fast Fourier transform (FFT) Xn(k,l) is then computed for each of these signals:

$$X_n(k,l) = a_n \, d_n(k) \, S(k,l) + V_n(k,l)$$

with:

$$d_n(k) = e^{-i 2\pi f_k \tau_n}$$

l being the index of the time frame,
k being the index of the frequency band,
fk being the center frequency of the frequency band indexed by k,
S(k,l) designating the useful signal source,
an and τn designating the attenuation and the delay experienced by the useful signal picked up at microphone n, and
Vn(k,l) designating the noise picked up by microphone n.
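The framing, windowing and short-term FFT step can be sketched as follows (a minimal sketch in Python/NumPy; the frame length, overlap and sampling rate are illustrative choices, not values fixed by the patent, and `stft_frames` is a hypothetical helper name):

```python
import numpy as np

def stft_frames(x, frame_len=256, hop=128):
    """Split x into frames of frame_len points, apply a Hanning window,
    and compute the FFT of each frame. Returns X[l, k] with l the time
    frame index and k the frequency band index."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    X = np.empty((n_frames, frame_len // 2 + 1), dtype=complex)
    for l in range(n_frames):
        X[l] = np.fft.rfft(x[l * hop : l * hop + frame_len] * win)
    return X

# Toy two-microphone capture: a 440 Hz tone, slightly delayed at mic 2
fs = 8000
t = np.arange(fs) / fs
x1 = np.sin(2 * np.pi * 440 * t)
x2 = np.roll(x1, 2)
X1, X2 = stft_frames(x1), stft_frames(x2)
```

The tone then shows up in the frequency band whose center frequency fk is nearest 440 Hz, identically indexed in both microphone spectrograms.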

Calculation of a first noise reference by spatial coherence (block 12)

The fundamental idea on which the invention rests is that, in a telecommunications environment, speech is a signal emitted by a well-localized source, relatively close to the microphones and captured almost entirely along a direct path. Conversely, stationary and non-stationary noises, which come mainly from the user's surroundings, can be attributed to distant sources, large in number and exhibiting a lower statistical correlation between the two microphones than speech.

In a telecommunications environment, speech is therefore more spatially coherent than noise.

Starting from this principle, it is possible to exploit the spatial coherence property to build a reference noise channel that is richer and better suited than with a beamformer. To this end, the system uses a predictive filter 14 of LMS (Least Mean Squares) type having as inputs the signals x1(n) and x2(n) picked up by the pair of microphones. Let y(n) denote the output of the LMS and e(n) the prediction error.

This predictive filter is used to predict, from x2(n), the speech component present in x1(n). Indeed, being more spatially coherent, speech is better predicted by the adaptive filter than noise.

A first possibility consists in taking, for the reference noise channel, the Fourier transform of the prediction error:

$$E(k,l) = X_1(k,l) - Y(k,l)$$

E(k,l), X1(k,l) and Y(k,l) being the respective short-term Fourier transforms (STFTs) of e(n), x1(n) and y(n).

In practice, however, a certain phase shift is observed between X1(k,l) and Y(k,l), due to imperfect convergence of the LMS algorithm, which prevents good discrimination between speech and noise.

To overcome this defect, it is possible to define the first reference noise signal Ref1(k,l) by:

$$\mathrm{Ref}_1(k,l) = X_1(k,l) - \frac{X_1(k,l)}{\left|X_1(k,l)\right|}\,\left|Y(k,l)\right|$$

Unlike many conventional noise estimation methods, no stationarity hypothesis about the noise is used to compute this first reference noise channel Ref1(k,l). One of the advantages is therefore that this noise channel captures part of the non-stationary noises, in particular those that have a low statistical correlation and are not predictable between the two microphones.
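As a rough illustration of this spatial-coherence reference, the sketch below uses a normalized LMS filter to predict, from the second microphone, the coherent (speech-like) component of the first, then forms the phase-compensated reference of the formula above. The filter length, step size and NLMS variant are illustrative assumptions, not values prescribed by the patent:

```python
import numpy as np

def lms_predict(x_ref, x_in, n_taps=16, mu=0.05):
    """Normalized LMS: predict x_ref(n) from past samples of x_in(n).
    Returns the prediction y(n)."""
    w = np.zeros(n_taps)
    y = np.zeros(len(x_ref))
    for n in range(n_taps, len(x_ref)):
        u = x_in[n - n_taps:n][::-1]           # most recent sample first
        y[n] = w @ u
        e = x_ref[n] - y[n]                    # prediction error
        w += mu * e * u / (u @ u + 1e-8)       # NLMS update
    return y

rng = np.random.default_rng(0)
n = np.arange(4000)
speech = np.sin(2 * np.pi * 0.05 * n)                 # coherent across the two mics
x1 = speech + 0.5 * rng.standard_normal(len(n))       # mic 1: speech + its own noise
x2 = np.roll(speech, 1) + 0.5 * rng.standard_normal(len(n))

y = lms_predict(x1, x2)                               # predictable (mostly speech) part
win = np.hanning(256)
X1 = np.fft.rfft(x1[2000:2256] * win)
Y = np.fft.rfft(y[2000:2256] * win)
# First noise reference: subtract |Y| along the phase of X1
ref1 = X1 - X1 / (np.abs(X1) + 1e-12) * np.abs(Y)
```

At the speech frequency the reference is strongly attenuated compared with X1, while poorly predicted (noise) bins pass through largely unchanged.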

Calculation of a second noise reference by spatial blocking (block 20)

In a telecommunications environment, it is possible to encounter noises whose source is well localized and relatively close to the microphones. These are generally fairly powerful occasional noises (a passing scooter, a car overtaking, etc.), which can be troublesome.

The assumptions used for computing the first reference noise channel do not hold for this type of noise; on the other hand, these noises have the particularity of having a well-defined direction of incidence, distinct from the direction of incidence of speech.

To exploit this property, it will be assumed that the angle of incidence θS of the speech is known, for example defined as the angle between the perpendicular bisector of the pair of microphones and the reference direction corresponding to the useful speech source.

More precisely, the space is partitioned into angular sectors, each of which corresponds to a direction defined by an angle θj, j ∈ [1, M], with for example M = 19, giving the collection of angles {-90°, -80°, ..., 0°, ..., +80°, +90°}. It should be noted that there is no link between the number N of microphones and the number M of angles tested: for example, it is entirely possible to test M = 19 angles with a single pair of microphones (N = 2).

A partition {A, I} of the angles θj into "allowed" and "forbidden" angles is defined: the angles θa ∈ A are "allowed" in that they correspond to signals coming from a preferred cone centered on θS, while the angles θi ∈ I are "forbidden" in that they correspond to unwanted lateral noises.

The second reference noise channel Ref2(k,l) is defined as follows:

$$\mathrm{Ref}_2(k,l) = \frac{1}{|A|}\sum_{\theta_a \in A}\left[X_1(k,l) - X_2(k,l) \times e^{\,i 2\pi f_k d \sin(\theta_a)/c}\right]$$

X1(k,l) being the STFT of the signal recorded by microphone 1,
X2(k,l) being the STFT of the signal recorded by microphone 2,
fk being the center frequency of the frequency band k,
l being the frame index,
d being the distance between the two microphones,
c being the speed of sound, and
|A| being the number of "allowed" angles of the preferred cone.

In each term of this sum, the signal of microphone 2, phase-shifted by an angle θa belonging to A (the sub-collection of "allowed" angles), is subtracted from the signal of microphone 1. Thus, in each term, the signals having an "allowed" propagation direction θa are spatially blocked. This spatial blocking is performed for all the allowed angles.

In this second reference noise channel Ref2(k,l), any lateral noises (directional non-stationary noises) are thus allowed to pass, while the speech signal is spatially blocked.
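The spatial blocking above can be sketched per frequency band as follows (the microphone spacing, sampling rate and set of allowed angles are illustrative assumptions; `ref2_bin` and `mic2` are hypothetical helper names):

```python
import numpy as np

C, D, FS, FRAME_LEN = 343.0, 0.05, 8000, 256   # speed of sound, 5 cm spacing

def ref2_bin(X1, X2, k, allowed_deg):
    """Second noise reference for one band k and one frame: for each
    'allowed' angle, subtract from X1 the correspondingly phase-shifted
    X2 (nulling sources from that direction), then average the terms."""
    fk = k * FS / FRAME_LEN
    terms = [X1 - X2 * np.exp(1j * 2 * np.pi * fk * D * np.sin(np.deg2rad(a)) / C)
             for a in allowed_deg]
    return sum(terms) / len(allowed_deg)

def mic2(X1, theta_deg, fk):
    """Simulate the second microphone: same source, delayed by the
    inter-mic propagation time for a source at theta_deg."""
    tau = D * np.sin(np.deg2rad(theta_deg)) / C
    return X1 * np.exp(-1j * 2 * np.pi * fk * tau)

k = 20
fk = k * FS / FRAME_LEN                        # 625 Hz
cone = [-10, 0, 10]                            # example preferred cone
front = abs(ref2_bin(1 + 0j, mic2(1 + 0j, 0.0, fk), k, cone))   # speech direction
side = abs(ref2_bin(1 + 0j, mic2(1 + 0j, 60.0, fk), k, cone))   # lateral noise
```

A source inside the preferred cone is almost entirely blocked (front ≈ 0), while a lateral source leaks into the reference (side ≫ front), which is exactly what this channel is meant to capture.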

Choice of the noise reference as a function of the direction of incidence of the signals (blocks 22 and 24)

This selection involves an estimation of the angle of incidence θ̂(k,l) of the signals:

$$\hat{\theta}(k,l) = \underset{\theta_j,\; j \in [1,M]}{\mathrm{argmax}}\; P_{1,2}^{\theta_j}(k,l)$$

with:

$$P_{1,2}^{\theta_j}(k,l) = E\!\left[X_1(k,l)\,\overline{X_2(k,l)}\,e^{-i 2\pi f_k \tau_j}\right]$$

and:

$$\tau_j = \frac{d}{c}\sin\theta_j$$

The selected reference noise channel Ref(k,l) will depend on the detection of an "allowed" or "forbidden" angle for the frame l and the frequency band k:

  • if θ̂(k,l) is "allowed" (θ̂(k,l) ∈ A), then Ref(k,l) = Ref1(k,l);
  • if θ̂(k,l) is "forbidden" (θ̂(k,l) ∈ I), then Ref(k,l) = Ref2(k,l);
  • if θ̂(k,l) is not defined, then Ref(k,l) = Ref1(k,l).

Thus, in the case of a detected "allowed" angle, or in the absence of directional signals at the input of the microphones, the reference noise channel Ref(k,l) is computed by spatial coherence, which makes it possible to capture weakly directional non-stationary noises.

On the other hand, if a "forbidden" angle is detected, this means that a fairly powerful directional noise is present. In this case, the reference noise channel Ref(k,l) is computed by a different method, by spatial blocking, so as to introduce these powerful directional non-stationary noises efficiently into the channel.
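A minimal sketch of the direction estimation and the reference selection, for a single (k, l) cell. The estimator here takes the real part of a single cross-spectrum snapshot in place of the expectation E[·], and the angular grid and cone bounds are illustrative assumptions:

```python
import numpy as np

C, D, FS, FRAME_LEN = 343.0, 0.05, 8000, 256
ANGLES = np.arange(-90, 91, 10)          # M = 19 tested directions
ALLOWED = set(range(-20, 21, 10))        # example preferred cone around 0 deg

def estimate_angle(X1, X2, k):
    """argmax over the tested angles of the steered cross-spectrum
    P(theta_j) = Re[X1 * conj(X2) * exp(-i 2 pi f_k tau_j)]."""
    fk = k * FS / FRAME_LEN
    taus = D * np.sin(np.deg2rad(ANGLES)) / C
    scores = np.real(X1 * np.conj(X2) * np.exp(-1j * 2 * np.pi * fk * taus))
    return int(ANGLES[np.argmax(scores)])

def select_ref(theta_hat, ref1, ref2):
    """Allowed (or undetected) angle -> spatial-coherence reference;
    forbidden angle -> spatial-blocking reference."""
    if theta_hat is None or theta_hat in ALLOWED:
        return ref1
    return ref2

# A lateral source at 60 deg: X2 lags X1 by the inter-mic delay
k = 20
fk = k * FS / FRAME_LEN
tau_s = D * np.sin(np.deg2rad(60.0)) / C
X1 = 1.0 + 0.0j
X2 = X1 * np.exp(-1j * 2 * np.pi * fk * tau_s)
theta_hat = estimate_angle(X1, X2, k)
```

With the source at 60°, the estimator lands on the forbidden angle and the spatial-blocking reference is selected.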

Constitution of a partially denoised combined signal (block 28)

The signals Xn(k,l) (the STFTs of the signals picked up by the microphones) can be combined with one another by a simple beamforming prefiltering technique of the Delay and Sum type, which is applied to obtain a partially denoised combined signal X(k,l):

$$X(k,l) = \frac{1}{2}\left[X_1(k,l) + \overline{d_2(k)}\,X_2(k,l)\right]$$

with:

$$d_2(k) = e^{\,i 2\pi f_k \tau_s}, \quad \tau_s = \frac{d}{c}\sin\theta_s$$

When the system in question comprises, as in the present example, two microphones whose perpendicular bisector passes through the source, the angle θS is zero and a simple average is taken over the two microphones. It should also be noted that, in practice, the number of microphones being limited, this processing provides only a small improvement in the signal-to-noise ratio, of the order of only 1 dB.
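The Delay and Sum prefilter reduces, for θS = 0, to a plain average of the two microphones; a sketch (constants as in the earlier examples, illustrative):

```python
import numpy as np

C, D, FS, FRAME_LEN = 343.0, 0.05, 8000, 256

def delay_and_sum(X1, X2, k, theta_s_deg=0.0):
    """Re-align mic 2 on the speech direction theta_s, then average:
    X = (X1 + conj(d2) * X2) / 2 with d2 = exp(i 2 pi f_k tau_s)."""
    fk = k * FS / FRAME_LEN
    tau_s = D * np.sin(np.deg2rad(theta_s_deg)) / C
    d2 = np.exp(1j * 2 * np.pi * fk * tau_s)
    return 0.5 * (X1 + np.conj(d2) * X2)

# Coherent speech adds in phase; independent noise averages down
rng = np.random.default_rng(1)
S = np.ones(1000, dtype=complex)                  # speech, identical at both mics
N1 = 0.5 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))
N2 = 0.5 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))
X = delay_and_sum(S + N1, S + N2, 20)
noise_power_in = np.mean(np.abs(N1) ** 2)
noise_power_out = np.mean(np.abs(X - S) ** 2)
```

With fully uncorrelated noise this halves the residual noise power (about 3 dB); as the text notes, real cabin noise is partly correlated between the microphones, which is why the measured gain is only of the order of 1 dB.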

Estimation of the pseudo-stationary noise (blocks 30 and 32)

The purpose of this step is to compute an estimate of the pseudo-stationary noise component present in the noise reference Ref(k,l) (block 30) and, in the same way, of the pseudo-stationary noise component present in the signal to be denoised X(k,l) (block 32). There are very many publications on this subject, the estimation of the pseudo-stationary noise component being a classic problem that is fairly well solved. Various methods are effective and usable for this purpose, notably a minima-controlled recursive averaging (MCRA) algorithm for estimating the energy of the pseudo-stationary noise component, such as the one described by I. Cohen and B. Berdugo, Noise Estimation by Minima Controlled Recursive Averaging for Robust Speech Enhancement, IEEE Signal Processing Letters, Vol. 9, No. 1, pp. 12-15, Jan. 2002.
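As a rough sketch of what blocks 30 and 32 compute, the function below tracks a pseudo-stationary energy by recursive averaging gated by a running minimum, in the spirit of MCRA; it is a deliberately simplified stand-in, not the exact algorithm of Cohen and Berdugo, and all the constants are illustrative:

```python
import numpy as np

def pseudo_stationary_energy(p, alpha=0.95, win=16, ratio=5.0):
    """For a sequence p[l] of instantaneous energies in one frequency band,
    return M[l], an estimate of the pseudo-stationary (noise) energy.
    The noise estimate is only updated when the smoothed energy stays
    close to its recent minimum, i.e. when speech is deemed absent."""
    m = np.empty(len(p), dtype=float)
    s = float(p[0])          # smoothed energy
    noise = float(p[0])      # current noise estimate
    history = []
    for l in range(len(p)):
        s = alpha * s + (1 - alpha) * p[l]
        history = (history + [s])[-win:]       # running window of smoothed values
        if s < ratio * min(history):           # speech-absence gate
            noise = alpha * noise + (1 - alpha) * p[l]
        m[l] = noise
    return m

p = np.ones(200)       # stationary noise floor at energy 1
p[100:110] = 50.0      # a burst of "speech" energy
m = pseudo_stationary_energy(p)
```

The estimate tracks the floor of energy 1 and essentially ignores the speech burst, which is the behavior required of the two pseudo-stationary estimators.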

Calculation of the probability of absence of speech (block 26)

An effective and recognized method for estimating the probability of absence of speech in a noisy environment is that of the transient ratio, described by I. Cohen and B. Berdugo, Two-Channel Signal Detection and Speech Enhancement Based on the Transient Beam-to-Reference Ratio, Proc. ICASSP 2003, Hong Kong, pp. 233-236, Apr. 2003. The transient ratio is defined as follows:

$$\Omega(k,l) = \frac{S\left[X(k,l)\right] - M\left[X(k,l)\right]}{S\left[\mathrm{Ref}(k,l)\right] - M\left[\mathrm{Ref}(k,l)\right]}$$

X(k,l) being the partially denoised combined signal,
Ref(k,l) being the reference noise channel computed in the previous section,
k being the frequency band, and
l being the time frame.

The operator S is an estimate of the instantaneous energy, and the operator M is an estimate of the pseudo-stationary energy (the estimate performed by blocks 30 and 32). S - M provides an estimate of the transient parts of the analyzed signal, also called transients.

The two signals analyzed here are the combined noisy signal X(k,l) and the signal of the reference noise channel Ref(k,l). The numerator will therefore highlight both speech and noise transients, while the denominator will extract only the noise transients found in the reference noise channel.

Thus, in the presence of speech but in the absence of non-stationary noise, the ratio Ω(k,l) tends towards a high limit Ωmax(k), whereas conversely, in the absence of speech but in the presence of non-stationary noise, this ratio approaches the low limit Ωmin(k), k being the frequency band. This makes it possible to differentiate between speech and non-stationary noises.

In the general case, we have:

Ωmin(k) ≤ Ω(k,l) ≤ Ωmax(k)

The probability of speech absence, denoted here q(k,l), is calculated as follows.

For each frame l and each frequency band k:

  • i) Calculation of S[X(k,l)], S[Ref(k,l)], M[X(k,l)] and M[Ref(k,l)];

  • ii) If S[X(k,l)] ≥ αX·M[X(k,l)], speech is likely to be present and the analysis continues at step (iii); otherwise, speech is absent: then q(k,l) = 1;
  • iii) If S[Ref(k,l)] ≥ αRef·M[Ref(k,l)], transient noise is likely to be present and the analysis continues at step (iv); otherwise, the transients found in X(k,l) are all speech transients: then q(k,l) = 0;
  • iv) Calculation of the ratio:
Ω(k,l) = ( S[X(k,l)] − M[X(k,l)] ) / ( S[Ref(k,l)] − M[Ref(k,l)] );
  • v) Determination of the probability of speech absence:
q(k,l) = max( min( (Ωmax(k) − Ω(k,l)) / (Ωmax(k) − Ωmin(k)), 1 ), 0 ).

The constants αX and αRef used in this algorithm are in fact detection thresholds for the transient parts. The parameters αX, αRef, as well as Ωmin(k) and Ωmax(k), are all chosen so as to match typical, realistic situations.
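Steps i) to v) above can be sketched per frame as follows. This is a minimal illustration in which the energy estimates S and M are assumed to be precomputed; the threshold and bound values (alpha_x, alpha_ref, omega_min, omega_max) are placeholder choices, not the ones used in the patent.

```python
import numpy as np

def speech_absence_probability(s_x, m_x, s_ref, m_ref,
                               alpha_x=1.5, alpha_ref=1.5,
                               omega_min=0.1, omega_max=10.0):
    """Per-band speech-absence probability from the transient ratio.

    s_x, m_x: instantaneous / pseudo-stationary energy of the combined signal X
    s_ref, m_ref: same estimates for the referent noise channel Ref
    (all arrays indexed by frequency band k, for one frame l).
    """
    q = np.empty_like(s_x)
    for k in range(len(s_x)):
        if s_x[k] < alpha_x * m_x[k]:
            q[k] = 1.0  # step ii: no transient in X -> speech absent
        elif s_ref[k] < alpha_ref * m_ref[k]:
            q[k] = 0.0  # step iii: transients in X but not in Ref -> speech
        else:
            # step iv: transient ratio; step v: map it into [0, 1]
            omega = (s_x[k] - m_x[k]) / (s_ref[k] - m_ref[k])
            q[k] = max(min((omega_max - omega) / (omega_max - omega_min), 1.0), 0.0)
    return q
```

A band whose transient appears only in the combined signal yields q = 0 (speech), while a band whose transient is fully explained by the noise reference yields q close to 1.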

Noise reduction by applying an OM-LSA gain (block 34)

The probability q(k,l) of speech absence calculated at block 26 is used as an input parameter to a denoising technique that is known per se. It has the advantage of identifying periods of speech absence even in the presence of non-stationary noise, whether weakly directional or directional. The probability of speech absence is a crucial estimator for the proper operation of the denoising structure used here, because it underlies both the correct estimation of the noise and the calculation of an effective denoising gain.

An OM-LSA (Optimally Modified Log-Spectral Amplitude) denoising method can advantageously be used, such as the one described by: I. Cohen, "Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator", IEEE Signal Processing Letters, Vol. 9, No. 4, April 2002.

Essentially, the application of a gain called the "LSA gain" (Log-Spectral Amplitude) minimizes the mean squared distance between the logarithm of the amplitude of the estimated signal and the logarithm of the amplitude of the original speech signal. This second criterion proves superior to the first because the chosen distance better matches the behavior of the human ear and therefore gives qualitatively better results. In all cases, the essential idea is to reduce the energy of the heavily noise-corrupted frequency components by applying a low gain to them, while leaving intact (by applying a gain equal to 1) those that are only slightly corrupted, or not at all.

The OM-LSA (Optimally Modified Log-Spectral Amplitude) algorithm improves the calculation of the LSA gain to be applied by weighting it by the conditional probability of speech presence.

In this method, the probability of speech absence comes into play at two important points, for the estimation of the noise energy and for the calculation of the final gain, and the probability q(k,l) is used at both levels. If λ̂Noise(k,l) denotes the estimate of the noise power spectral density, this estimate is given by:

λ̂Noise(k,l) = αNoise(k,l) · λ̂Noise(k,l−1) + (1 − αNoise(k,l)) · |X(k,l)|²

with: αNoise(k,l) = αB + (1 − αB) · pspa(k,l)

It may be noted here that the probability q(k,l) modulates the forgetting factor in the noise estimation, which is updated more rapidly on the noisy signal X(k,l) when the probability of speech absence is high, this mechanism entirely conditioning the quality of λ̂Noise(k,l).
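As a rough sketch, this recursive noise estimate can be written as follows; alpha_b is an assumed smoothing constant, and pspa is taken here as the speech-presence probability 1 − q(k,l), consistent with the remark that the estimate is refreshed faster when speech absence is likely.

```python
import numpy as np

def update_noise_psd(lambda_prev, x_frame, q, alpha_b=0.9):
    """One recursive update of the noise PSD estimate for frame l.

    lambda_prev: estimate at frame l-1, per frequency band k
    x_frame: spectrum X(k,l) of the noisy combined signal
    q: speech-absence probability q(k,l) per band
    """
    p = 1.0 - q                                  # assumed p_spa(k,l)
    alpha_noise = alpha_b + (1.0 - alpha_b) * p  # per-band forgetting factor
    return alpha_noise * lambda_prev + (1.0 - alpha_noise) * np.abs(x_frame) ** 2
```

When q = 1 the factor drops to alpha_b and the estimate tracks |X(k,l)|² quickly; when q = 0 the factor equals 1 and the previous estimate is kept unchanged.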

The denoising gain GOM-LSA(k,l) is given by:

GOM-LSA(k,l) = GH1(k,l)^(1 − q(k,l)) · Gmin^q(k,l)

GH1(k,l) being a denoising gain (whose calculation depends on the noise estimate λ̂Noise) described in the aforementioned article by Cohen, and
Gmin being a constant corresponding to the denoising applied when speech is considered absent.

It should be noted that the probability q(k,l) plays a major role here in determining the gain GOM-LSA(k,l). In particular, when this probability is equal to 1, the gain is equal to Gmin and maximum noise reduction is applied: if, for example, a value of 20 dB is chosen for Gmin, the previously detected non-stationary noises are attenuated by 20 dB.

The denoised signal Ŝ(k,l) at the output of block 34 is given by:

Ŝ(k,l) = GOM-LSA(k,l) · X(k,l)
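The gain computation and its application can be sketched as below; g_h1 is assumed to have been obtained beforehand (e.g. by Cohen's LSA estimator), and the −20 dB floor is the example value quoted in the text.

```python
import numpy as np

def apply_om_lsa(x, g_h1, q, g_min_db=-20.0):
    """Apply the OM-LSA gain G_H1^(1-q) * G_min^q to the noisy spectrum X(k,l)."""
    g_min = 10.0 ** (g_min_db / 20.0)       # -20 dB -> 0.1 in amplitude
    gain = (g_h1 ** (1.0 - q)) * (g_min ** q)
    return gain * x                         # denoised spectrum S^(k,l)
```

With q = 1 the gain collapses to g_min (maximum attenuation); with q = 0 it reduces to the plain LSA gain g_h1.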

It will be noted that such a denoising structure ordinarily produces an unnatural and aggressive result on non-stationary noises, which are confused with the useful speech. One of the major advantages of the invention is, on the contrary, that it effectively eliminates these non-stationary noises.

Moreover, in an advantageous variant, it is possible to use in the expressions given above a hybrid probability of speech absence qhybrid(k,l), calculated from q(k,l) and from another probability of speech absence qstd(k,l), evaluated for example according to the method described in WO 2007/099222 A1 (Parrot SA). We then have:

qhybrid(k,l) = max( q(k,l), qstd(k,l) )

Temporal reconstruction of the signal (block 36)

The last step consists in applying to the signal Ŝ(k,l) an inverse fast Fourier transform (iFFT) to obtain the desired denoised speech signal ŝ(t) in the time domain.
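For illustration, this reconstruction can be sketched as an inverse FFT per frame followed by overlap-add; the frame length and hop size are assumed values, as the text does not specify the analysis windowing.

```python
import numpy as np

def reconstruct(frames, hop):
    """Inverse-FFT each denoised spectrum S^(k,l) and overlap-add into s^(t).

    frames: 2-D complex array of shape (n_frames, n_fft)
    hop: frame advance in samples (assumed)
    """
    n_frames, n_fft = frames.shape
    out = np.zeros(hop * (n_frames - 1) + n_fft)
    for l, spectrum in enumerate(frames):
        out[l * hop : l * hop + n_fft] += np.fft.ifft(spectrum).real
    return out
```

With a single frame the function simply inverts the FFT, recovering the time-domain frame exactly.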

Claims (7)

  1. Method of denoising a noisy acoustic signal picked up by two microphones of a multi-microphone audio device operating in noisy surroundings, notably a "hands-free" telephonic device for motor vehicle,
    the noisy acoustic signal comprising a useful speech component arising from a directional speech source and a spurious noise component, this noise component itself including a directional non-stationary lateral noise component,
    method characterized in that it comprises, in the frequency domain for a plurality of bands of frequencies defined for successive temporal signal frames, the following signal processing steps:
    a) calculation (18) of a first noise reference by spatial coherence analysis of the signals picked up by the two microphones, this calculation comprising a predictive linear filtering applied to the signals picked up by the two microphones and comprising a subtraction with compensation of the phase shift between the signal picked up and the output signal of the predictive filter;
    b) calculation (20) of a second noise reference by analysis of the directions of incidence of the signals picked up by the two microphones, this calculation comprising the spatial blocking of the components of the picked up signals whose direction of incidence is situated inside a reference cone defined on either side of a predetermined direction of incidence of the useful signal;
    c) estimation (24) of a principal direction of incidence (θ(k,l)) of the signals picked up by the two microphones;
    d) selection (22) as referent noise signal (Ref(k,l)) of one or the other of the noise references calculated in steps a) and b), as a function of the principal direction estimated in step c);
    e) combination (28) of the signals picked up by the two microphones into a noisy combined signal (X(k,l));
    f) calculation (26) of a probability of absence of speech (q(k,l)) in the noisy combined signal, on the basis of the respective levels of spectral energy of the noisy combined signal (X(k,l)) and of the referent noise signal (Ref(k,l));
    g) on the basis of the probability of absence of speech (q(k,l)) calculated in step f) and of the noisy combined signal (X(k,l)), selective reduction of the noise (34) by application of a variable gain specific to each band of frequencies and to each temporal frame.
  2. Method of Claim 1, in which the predictive filtering comprises the application of a linear prediction algorithm of LMS least mean squares type.
  3. Method of Claim 1, in which the estimation (24) of the principal direction of incidence of step c) comprises the following successive sub-steps:
    c1) partition of the space into a plurality of angular sectors;
    c2) for each sector, evaluation of an estimator of direction of incidence on the basis of the two signals picked up by the two corresponding microphones; and
    c3) on the basis of the estimator values calculated in step c2), estimation of the said principal direction of incidence.
  4. Method of Claim 1, in which the selection (22) of step d) is a selection of the second noise reference as referent noise signal if the principal direction estimated in step c) is situated outside of a reference cone defined on either side of a predetermined direction of incidence of the useful signal.
  5. Method of Claim 1, in which the combination (28) of step e) comprises a prefiltering of fixed beamforming type.
  6. Method of Claim 1, in which the calculation (26) of probability of absence of speech of step f) comprises the estimation (30, 32) of respective pseudo-stationary noise components contained in the noisy combined signal and in the referent noise signal, the probability of absence of speech (q(k,l)) being calculated on the basis also of these respective pseudo-stationary noise components.
  7. Method of Claim 1, in which the selective reduction of the noise (34) of step g) is a processing by application of a gain having optimized modified log-spectral amplitude OM-LSA.
EP10167065A 2009-09-22 2010-06-23 Method for optimised filtering of non-stationary interference captured by a multi-microphone audio device, in particular a hands-free telephone device for an automobile. Active EP2309499B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
FR0956506A FR2950461B1 (en) 2009-09-22 2009-09-22 METHOD OF OPTIMIZED FILTERING OF NON-STATIONARY NOISE RECEIVED BY A MULTI-MICROPHONE AUDIO DEVICE, IN PARTICULAR A "HANDS-FREE" TELEPHONE DEVICE FOR A MOTOR VEHICLE

Publications (2)

Publication Number Publication Date
EP2309499A1 EP2309499A1 (en) 2011-04-13
EP2309499B1 true EP2309499B1 (en) 2011-10-19

Family

ID=42061020

Family Applications (1)

Application Number Title Priority Date Filing Date
EP10167065A Active EP2309499B1 (en) 2009-09-22 2010-06-23 Method for optimised filtering of non-stationary interference captured by a multi-microphone audio device, in particular a hands-free telephone device for an automobile.

Country Status (5)

Country Link
US (1) US8195246B2 (en)
EP (1) EP2309499B1 (en)
AT (1) ATE529860T1 (en)
ES (1) ES2375844T3 (en)
FR (1) FR2950461B1 (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2948484B1 (en) * 2009-07-23 2011-07-29 Parrot METHOD FOR FILTERING NON-STATIONARY SIDE NOISES FOR A MULTI-MICROPHONE AUDIO DEVICE, IN PARTICULAR A "HANDS-FREE" TELEPHONE DEVICE FOR A MOTOR VEHICLE
JP2011191668A (en) * 2010-03-16 2011-09-29 Sony Corp Sound processing device, sound processing method and program
DK2395506T3 (en) * 2010-06-09 2012-09-10 Siemens Medical Instr Pte Ltd Acoustic signal processing method and system for suppressing interference and noise in binaural microphone configurations
JP5594133B2 (en) * 2010-12-28 2014-09-24 ソニー株式会社 Audio signal processing apparatus, audio signal processing method, and program
US9626982B2 (en) * 2011-02-15 2017-04-18 Voiceage Corporation Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a CELP codec
FR2976710B1 (en) * 2011-06-20 2013-07-05 Parrot DEBRISING METHOD FOR MULTI-MICROPHONE AUDIO EQUIPMENT, IN PARTICULAR FOR A HANDS-FREE TELEPHONY SYSTEM
GB2493327B (en) * 2011-07-05 2018-06-06 Skype Processing audio signals
US9467775B2 (en) 2011-09-02 2016-10-11 Gn Netcom A/S Method and a system for noise suppressing an audio signal
GB2495130B (en) 2011-09-30 2018-10-24 Skype Processing audio signals
GB2495278A (en) 2011-09-30 2013-04-10 Skype Processing received signals from a range of receiving angles to reduce interference
GB2495472B (en) 2011-09-30 2019-07-03 Skype Processing audio signals
GB2495129B (en) 2011-09-30 2017-07-19 Skype Processing signals
GB2495128B (en) 2011-09-30 2018-04-04 Skype Processing signals
GB2495131A (en) 2011-09-30 2013-04-03 Skype A mobile device includes a received-signal beamformer that adapts to motion of the mobile device
GB2496660B (en) 2011-11-18 2014-06-04 Skype Processing audio signals
GB201120392D0 (en) 2011-11-25 2012-01-11 Skype Ltd Processing signals
ES2687617T3 (en) 2014-02-14 2018-10-26 Telefonaktiebolaget Lm Ericsson (Publ) Comfort noise generation
JP6260504B2 (en) * 2014-02-27 2018-01-17 株式会社Jvcケンウッド Audio signal processing apparatus, audio signal processing method, and audio signal processing program
DE112016007079T5 (en) * 2016-07-21 2019-04-04 Mitsubishi Electric Corporation NOISE REDUCTION DEVICE, ECHO LOCKING DEVICE, ANORGAL NOISE DETECTION DEVICE AND ANTI-TORCH DISPOSAL PROCEDURE
US10366701B1 (en) * 2016-08-27 2019-07-30 QoSound, Inc. Adaptive multi-microphone beamforming
US10462567B2 (en) * 2016-10-11 2019-10-29 Ford Global Technologies, Llc Responding to HVAC-induced vehicle microphone buffeting
DE102017212980B4 (en) * 2017-07-27 2023-01-19 Volkswagen Aktiengesellschaft Method for compensating for noise in a hands-free device in a motor vehicle and hands-free device
US11195540B2 (en) * 2019-01-28 2021-12-07 Cirrus Logic, Inc. Methods and apparatus for an adaptive blocking matrix
CN111640428B (en) * 2020-05-29 2023-10-20 阿波罗智联(北京)科技有限公司 Voice recognition method, device, equipment and medium
CN111933103B (en) * 2020-09-08 2024-01-05 亿咖通(湖北)技术有限公司 Active noise reduction system for vehicle, active noise reduction method and computer storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7103541B2 (en) * 2002-06-27 2006-09-05 Microsoft Corporation Microphone array signal enhancement using mixture models
JP3925734B2 (en) * 2003-03-17 2007-06-06 財団法人名古屋産業科学研究所 Target sound detection method, signal input delay time detection method, and sound signal processing apparatus
KR20060113714A (en) * 2003-11-24 2006-11-02 코닌클리케 필립스 일렉트로닉스 엔.브이. Adaptive beamformer with robustness against uncorrelated noise
DE102004005998B3 (en) * 2004-02-06 2005-05-25 Ruwisch, Dietmar, Dr. Separating sound signals involves Fourier transformation, inverse transformation using filter function dependent on angle of incidence with maximum at preferred angle and combined with frequency spectrum by multiplication
JP4157581B2 (en) * 2004-12-03 2008-10-01 本田技研工業株式会社 Voice recognition device
FR2898209B1 (en) 2006-03-01 2008-12-12 Parrot Sa METHOD FOR DEBRUCTING AN AUDIO SIGNAL

Also Published As

Publication number Publication date
ES2375844T3 (en) 2012-03-06
FR2950461A1 (en) 2011-03-25
US20110070926A1 (en) 2011-03-24
FR2950461B1 (en) 2011-10-21
ATE529860T1 (en) 2011-11-15
US8195246B2 (en) 2012-06-05
EP2309499A1 (en) 2011-04-13

Similar Documents

Publication Publication Date Title
EP2309499B1 (en) Method for optimised filtering of non-stationary interference captured by a multi-microphone audio device, in particular a hands-free telephone device for an automobile.
EP2293594B1 (en) Method for filtering lateral non stationary noise for a multi-microphone audio device
EP2538409B1 (en) Noise reduction method for multi-microphone audio equipment, in particular for a hands-free telephony system
EP2430825B1 (en) Method for selecting a microphone among a plurality of microphones in a speech processing system such as a hands-free telephone device operating in a noisy environment
EP2680262B1 (en) Method for suppressing noise in an acoustic signal for a multi-microphone audio device operating in a noisy environment
EP1830349B1 (en) Method of noise reduction of an audio signal
EP2772916B1 (en) Method for suppressing noise in an audio signal by an algorithm with variable spectral gain with dynamically adaptive strength
CA2436318C (en) Noise reduction method and device
EP2530673B1 (en) Audio device with suppression of noise in a voice signal using a fractional delay filter
EP2057835B1 (en) Method of reducing the residual acoustic echo after echo removal in a hands-free device
FR2909773A1 (en) MULTIVOYAL PASSIVE RADAR PROCESSING METHOD OF FM OPPORTUNITY SIGNAL.
FR2897733A1 (en) Echo discriminating and attenuating method for hierarchical coder-decoder, involves attenuating echoes based on initial processing in discriminated low energy zone, and inhibiting attenuation of echoes in false alarm zone
FR2883656A1 (en) CONTINUOUS SPEECH TREATMENT USING HETEROGENEOUS AND ADAPTED TRANSFER FUNCTION
FR3012928A1 (en) MODIFIERS BASED ON EXTERNALLY ESTIMATED SNR FOR INTERNAL MMSE CALCULATIONS
KR20110068637A (en) Method and apparatus for removing a noise signal from input signal in a noisy environment
FR3012929A1 (en) SPEECH PROBABILITY PRESENCE MODIFIER IMPROVING NOISE REMOVAL PERFORMANCE BASED ON LOG-MMSE
EP3192073B1 (en) Discrimination and attenuation of pre-echoes in a digital audio signal
FR2906070A1 (en) Electronic voice signal preprocessing system for hands free mobile telephone, has non coherent filtering stage filtering output of coherent filtering stage such that signal is successively subjected to coherent and non coherent filterings
EP2515300B1 (en) Method and system for noise reduction
FR2906071A1 (en) Electronic filter e.g. non-coherent filter, for e.g. hands-free mobile phone in vehicle, has control section limiting calibration gain such that variation threshold of calibration gain does not exceed preset variation threshold
FR3113537A1 (en) Method and electronic device for reducing multi-channel noise in an audio signal comprising a voice part, associated computer program product
EP1605440B1 (en) Method for signal source separation from a mixture signal
Mai et al. Optimal Bayesian Speech Enhancement by Parametric Joint Detection and Estimation
FR3106691A1 (en) SPEECH CONVERSION THROUGH STATISTICAL LEARNING WITH COMPLEX MODELING OF TEMPORAL CHANGES
FR2878399A1 (en) Soundproofing device for e.g. global system for mobile communication system, of e.g. car, has units to apply weight function to coherence between signals captured by microphones, to limit voice signal distortion and suppress estimated noise

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20100623

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME RS

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RIC1 Information provided on ipc code assigned before grant

Ipc: H04B 1/12 20060101ALI20110401BHEP

Ipc: G10L 21/02 20060101AFI20110401BHEP

Ipc: H04R 3/00 20060101ALI20110401BHEP

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602010000284

Country of ref document: DE

Effective date: 20111229

REG Reference to a national code

Ref country code: NL

Ref legal event code: T3

REG Reference to a national code

Ref country code: ES

Ref legal event code: FG2A

Ref document number: 2375844

Country of ref document: ES

Kind code of ref document: T3

Effective date: 20120306

LTIE Lt: invalidation of european patent or patent extension

Effective date: 20111019

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 529860

Country of ref document: AT

Kind code of ref document: T

Effective date: 20111019

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20120119

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20120219

REG Reference to a national code

Ref country code: IE

Ref legal event code: FD4D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20120120

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20120220

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20120119

Ref country code: IE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

26N No opposition filed

Effective date: 20120720

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602010000284

Country of ref document: DE

Effective date: 20120720

BERE Be: lapsed

Owner name: PARROT

Effective date: 20120630

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: ES

Payment date: 20120628

Year of fee payment: 3

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

Ref country code: MC

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20120630

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20120630

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111019

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20120623

REG Reference to a national code

Ref country code: ES

Ref legal event code: FD2A

Effective date: 20140707

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100623

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20130624

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20140630

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20140630

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 6

REG Reference to a national code

Ref country code: DE

Ref legal event code: R081

Ref document number: 602010000284

Country of ref document: DE

Owner name: PARROT AUTOMOTIVE, FR

Free format text: FORMER OWNER: PARROT, PARIS, FR

REG Reference to a national code

Ref country code: GB

Ref legal event code: 732E

Free format text: REGISTERED BETWEEN 20151029 AND 20151104

REG Reference to a national code

Ref country code: FR

Ref legal event code: TP

Owner name: PARROT AUTOMOTIVE, FR

Effective date: 20151201

REG Reference to a national code

Ref country code: NL

Ref legal event code: PD

Owner name: PARROT AUTOMOTIVE; FR

Free format text: DETAILS ASSIGNMENT: CHANGE OF OWNER(S), TRANSFER; FORMER OWNER NAME: PARROT

Effective date: 20151102

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 7

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 8

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 9

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: NL

Payment date: 20180625

Year of fee payment: 9

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: IT

Payment date: 20180622

Year of fee payment: 9

REG Reference to a national code

Ref country code: NL

Ref legal event code: MM

Effective date: 20190701

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20190623

Ref country code: NL

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20190701

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20230523

Year of fee payment: 14

Ref country code: DE

Payment date: 20230523

Year of fee payment: 14

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20230523

Year of fee payment: 14