EP3828886A1 - Method and system for separating the voice component and the noise component in an audio flow - Google Patents
Method and system for separating the voice component and the noise component in an audio flow
- Publication number
- EP3828886A1 (application EP20209511.3A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- signal
- module
- voice
- generate
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
Definitions
- the invention relates to a method and a system for separating, in real time in an audio stream, the part of the stream associated with a voice or speech, from another part of the stream containing the noise.
- the invention finds its application in a context where one or more people are talking in a noisy environment (hubbub, engine noise, ventilation, etc.).
- the speech signal superimposed on the noisy signals is digitized into an audio stream by a sound sensor.
- the invention also relates to a method and a system for enhancing a voice signal in real time in an audio stream, based on a method for separating audio sources in deferred time.
- the patent application US 20190066713 discloses a method of obtaining, by a device, a combined sound signal for combined signals from multiple sound sources in an area in which a person is located.
- the processing implemented uses deep neural networks.
- An example of a method for separating several voices in an audio signal comprises the steps described below and not shown for reasons of simplification.
- the incoming audio signal is denoted by X ; it has length L.
- the signal is transmitted to an encoder M 1 which transforms X into a tensor X (1) of dimensions F ⁇ T where T is a divisor of L and F a number of filters given by the designer.
- the encoder M 1 consists of a 1D Convolution with F filters. The coefficients of the convolution kernels are adjusted during a learning phase.
- the tensor is transmitted on the one hand to a multiplier for future use and on the other hand to a separation module.
- the separation module is divided into two submodules M 2 and M 4 .
- the first submodule M 2 transforms the tensor X (1) into a tensor X (2) of dimensions F ⁇ T.
- the first submodule M 2 consists of a normalization layer, a 1x1 convolution and a stack of 1D-Conv modules known from the prior art and whose parameters are set during a learning phase.
- the second submodule M 4 transforms X (2) into an X (4) tensor of dimensions 2F ⁇ T.
- the second submodule M 4 chains a non-linearity, a 1x1 convolution and a sigmoid function.
- the coefficients of the 1x1 convolution are set during a learning phase.
- X (1) is concatenated to itself to form a tensor of dimensions 2F x T which is multiplied by X (4) to form X (5) .
- the module M 5 takes as input X (5) and outputs two signals of length L by means of a 1D deconvolution, the parameters of which are adjusted during a learning phase.
- the numerical parameters defining the processing of the different modules are obtained in a prior learning phase on a database.
- the figure 1 illustrates an application to the separation of signals of different types, by separating the voice channel and the noise channel.
- data is represented in the form of tensors.
- the data is modified by a succession of modules.
- the data is projected into an abstract space generally defined by its dimensions.
- the present invention implements the following processing steps:
- the signal (input stream X ) is split into N frames of length L, with X N the nth frame.
- the method carries out the following processing steps:
- the frame X N is encoded by a network of 1D convolutions.
- the result is a tensor X N (1) of dimensions F x T, with F the number of filters given by the designer,
- the result X N (1) is then transformed by a module M 2, 101.
- the result X N (2) is a tensor of dimensions F x T.
- the module M 4 estimates, 103, from X N (2), a tensor X N (4) of dimensions 2F x T.
- X N (1) is concatenated to itself, 104, to form a tensor of dimensions 2F x T which is multiplied by X N (4) to form X N (5).
- the module M 5 produces from X N (5) a tensor of dimensions 2 x T, 105, from which two outputs of dimensions 1 x T, X N,0 and X N,1, are obtained, which are respectively the voice channel and the noise channel.
- One of the objectives of the present invention is to provide a method and a device making it possible to separate, in real time, voices from the background noise in an audio stream, or to denoise the voice in an audio stream, in particular by taking into account information from previous frames. This improves performance and reduces processing latency.
- the method thus enables the propagation of “global information” on the signal, its updating and its use from frame to frame.
- for frame 0, I 0 is set arbitrarily; for example, it is identically zero.
- the figure 2 illustrates an example of a device allowing the implementation of the method according to the invention.
- the signal from which it is necessary to extract (separate) the voice(s) from the noise contained in the audio stream is received on an audio sensor 10.
- the audio sensor is connected to a set of equipment or Hardware modules 20 configured to separate the voice from the noise which will be detailed in figure 3 .
- the figure 3 illustrates a first variant embodiment for separating a voice from the noise in an audio signal, the processing being carried out at the level of the assembly 20. This separation is carried out in real time. Modules similar to the diagram of the figure 1 bear the same references.
- the assembly also includes a module M 3 , the function of which is detailed below.
- the audio signal received on the sensor is, during a first step, separated into N frames X 1, ..., X N.
- Each frame X N is associated with a tensor I N which is of constant dimension, independent of the index of the frame.
- the method updates the value of the tensor I N from frame to frame and jointly uses X N and I N to estimate X N,0 and X N,1.
- the frame X N is transmitted to a first module M 1, 100, which generates a signal X N (1).
- the tensor I N-1 obtained during the previous step for the processing of the frame X N-1 is transmitted to a module M 3, 201.
- M 3 generates a tensor I N, 202, which will be used during the processing of the frame X N+1.
- the encoder M 3 takes as input a signal X N (2), 203, the result of the transformation of the signal X N (1) by a module M 2, and performs the concatenation of X N (2) and I N in order to generate a signal X N (3) of dimensions 2F x T: M 3 : (X N (2), I N-1) -> (X N (3), I N).
- the signal X N (3), 204, is transmitted to a module M 4 in order to generate a signal X N (4), which is combined, 104, with the signal X N (1); the signal resulting from the combination is decoded by a decoder M 5, 105, in order to generate a first voice signal X N,0 and a second noise signal X N,1.
- the steps implemented by the method according to the invention are as follows:
- I N has dimension F x F
- the method and the device according to the invention allow real-time separation of the voice from the noise in an audio signal received on a sensor, without degrading the parameters specific to the voice.
- the numerical parameters defining the processing of the different modules are set in a prior learning phase on a database.
- the invention allows real-time operation with a controllable latency/quality trade-off, does not degrade an audio signal which contains no noise, and makes it possible to enhance the noise in a signal containing no speech (voice).
- the method makes it possible in particular to preprocess the speech audio signal to improve the quality of downstream voice processing/enhancement blocks (compression, analysis).
Abstract
The invention relates to a method and a system for separating, in real time, the voice component and the noise component in an audio stream.
Description
The invention relates to a method and a system for separating, in real time in an audio stream, the part of the stream associated with a voice or speech from another part of the stream containing the noise.
The invention finds its application in a context where one or more people are talking in a noisy environment (hubbub, engine noise, ventilation, etc.). The speech signal superimposed on the noise signals is digitized into an audio stream by a sound sensor.
The invention also relates to a method and a system for enhancing a voice signal in real time in an audio stream, based on a method for separating audio sources in deferred time.
The state of the art known to the applicant is divided into two categories: the so-called conventional approaches and the approaches made possible by artificial intelligence, known as "deep learning".
Among "deep learning" methods, some approaches deal directly with the problem of voice/background-noise separation, while others relate to signal/signal or voice/voice separation.
The patent application US 20190066713 discloses a method of obtaining, by a device, a combined sound signal for combined signals from multiple sound sources in an area in which a person is located. The processing implemented uses deep neural networks.
An example of a method for separating several voices in an audio signal according to the prior art comprises the steps described below, not shown for reasons of simplification. The incoming audio signal is denoted by X ; it has length L. The signal is transmitted to an encoder M 1 which transforms X into a tensor X (1) of dimensions F × T, where T is a divisor of L and F a number of filters given by the designer. The encoder M 1 consists of a 1D convolution with F filters; the coefficients of the convolution kernels are adjusted during a learning phase. The tensor is transmitted on the one hand to a multiplier for later use and on the other hand to a separation module. The separation module is divided into two submodules M 2 and M 4. The first submodule M 2 transforms the tensor X (1) into a tensor X (2) of dimensions F × T.
The first submodule M 2 consists of a normalization layer, a 1x1 convolution and a stack of 1D-Conv modules known from the prior art and whose parameters are set during a learning phase.
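As an illustration, the encoder stage described above can be sketched as a strided 1D convolution. This is a minimal sketch, not the patented implementation: the kernel size K, the stride, and the random weights are assumptions; the text only requires that the output be an F × T tensor with T a divisor of L and F chosen by the designer.

```python
import numpy as np

# Illustrative sketch of a 1D-convolution encoder mapping a frame x of
# length L to a tensor of shape (F, T). Kernel size K and stride K are
# assumptions made for this example.
def encode(x, weights, stride):
    F, K = weights.shape              # F filters, each of length K
    L = x.shape[0]
    T = (L - K) // stride + 1
    out = np.empty((F, T))
    for t in range(T):
        window = x[t * stride : t * stride + K]
        out[:, t] = weights @ window  # one output column per time step
    return out

rng = np.random.default_rng(0)
L, F, K = 64, 8, 4
x = rng.standard_normal(L)
W = rng.standard_normal((F, K))       # in the patent, learned during training
X1 = encode(x, W, stride=K)
print(X1.shape)                       # (8, 16): F x T with T = L / K
```

With stride equal to the kernel size, T = L / K is automatically a divisor of L, matching the constraint stated in the description.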
The second submodule M 4 transforms X (2) into a tensor X (4) of dimensions 2F × T. To do this, the second submodule M 4 chains a non-linearity, a 1x1 convolution and a sigmoid function. The coefficients of the 1x1 convolution are set during a learning phase.
X (1) is concatenated to itself to form a tensor of dimensions 2F x T, which is multiplied by X (4) to form X (5).
The module M 5 takes X (5) as input and outputs two signals of length L by means of a 1D deconvolution, the parameters of which are adjusted during a learning phase.
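The concatenation, masking and decoding steps above can be sketched as follows. All shapes, the random mask, and the overlap-free stride-K deconvolution are assumptions for illustration; in the patent, the mask comes from the trained submodule M 4 and the decoder basis from the trained module M 5.

```python
import numpy as np

# X1 stands for the (F, T) encoder output, X4 for the (2F, T) sigmoid mask.
F, T, K = 8, 16, 4
rng = np.random.default_rng(1)
X1 = rng.standard_normal((F, T))
X4 = 1.0 / (1.0 + np.exp(-rng.standard_normal((2 * F, T))))  # values in (0, 1)

X5 = np.concatenate([X1, X1], axis=0) * X4   # (2F, T): one masked copy per source

# Decoder sketch: each masked (F, T) half is mapped back to a length-L
# signal; a plain non-overlapping stride-K placement is assumed here.
D = rng.standard_normal((F, K))              # decoder basis (stand-in for M 5)
def decode(y):                               # y: (F, T) -> signal of length T * K
    return (y.T @ D).reshape(-1)

voice = decode(X5[:F])                       # first output of length L = T * K
noise = decode(X5[F:])                       # second output of length L
print(voice.shape, noise.shape)              # (64,) (64,)
```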
The numerical parameters defining the processing of the different modules are obtained in a prior learning phase on a database.
By replacing one of the voices with noise, it is immediate to use the methods described in the state of the art to separate the voice from the background noise in an audio signal and, by keeping only the output containing the voice signal, to enhance the voice of a noisy signal.
Figure 1 illustrates an application of such processing to the separation of signals of different types, by separating the voice channel and the noise channel.
As described, the state of the art does not directly allow real-time processing of an audio stream.
In the technical field of "deep learning", data are represented in the form of tensors. The data are modified by a succession of modules. At the output of each module, the data are projected into an abstract space generally defined by its dimensions.
To do this, the present invention implements the following processing:
The signal (input stream X) is split into N frames of length L, with X N the nth frame. The method carries out the following processing: the frame X N is encoded by a network of 1D convolutions, 100. The result is a tensor X N (1) of dimensions F × T, with F the number of filters given by the designer and T a divisor of L depending on the size of the filters. The result X N (1) is then transformed by a module M 2, 101, into a tensor X N (2) of dimensions F × T. The module M 4 estimates, 103, from X N (2), a tensor X N (4) of dimensions 2F × T. X N (1) is concatenated to itself, 104, to form a tensor of dimensions 2F × T, which is multiplied by X N (4) to form X N (5). The module M 5 produces from X N (5) a tensor of dimensions 2 × T, 105, from which two outputs of dimensions 1 × T, X N,0 and X N,1, are obtained: respectively the voice channel and the noise channel.
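The initial frame-splitting step can be sketched in a few lines (the frame length and stream values are illustrative; the description leaves L to the designer):

```python
import numpy as np

# Split the input stream X into N contiguous frames of length L.
stream = np.arange(12.0)         # toy input stream (12 samples)
L = 4                            # assumed frame length
frames = stream.reshape(-1, L)   # N = 3 frames; frames[n] is X_n
print(frames.shape)              # (3, 4)
```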
These steps are repeated on each new frame. The parameters are learned from a sound database. The drawback of this method is that it does not use the information from the previous frames to process the current frame. This notably leads to degraded quality and high processing latency, due to the duration of the frames.
One of the objectives of the present invention is to provide a method and a device making it possible to separate, in real time, voices from the background noise in an audio stream, or to denoise the voice in an audio stream, in particular by taking into account information from previous frames. This improves performance and reduces processing latency. The method thus enables the propagation of "global information" on the signal, its updating and its use from frame to frame.
The invention relates to a method for separating, in real time, the voice from the noise in an audio signal received on a receiver equipped with an audio sensor, characterized in that it comprises at least the following steps:
- the received audio stream is separated into N frames X N,
- each frame X N is associated with a tensor I N containing information on the whole audio stream,
- the frame X N is transmitted to a first module M 1 which generates a signal X N (1),
- the tensor I N-1 obtained during the previous step for the processing of the frame X N-1 is transmitted to a module M 3,
- the module M 3 takes as input a signal X N (2), the result of the transformation of the signal X N (1) by a module M 2, and generates a signal X N (3) and the tensor I N,
- the signal X N (3) is transmitted to a module M 4 in order to generate a signal X N (4), which is combined with the signal X N (1),
- the signal resulting from the combination is decoded by a decoder M 5 in order to generate a first voice signal X N,0 and a second signal X N,1.
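A minimal sketch of this frame-by-frame wiring follows. The bodies of the modules m1 through m5 are hypothetical stand-ins (the text fixes only their input/output shapes); what follows the description is the data flow, with the tensor I carried from one frame to the next and I 0 initialized to zero.

```python
import numpy as np

F, T = 4, 8
rng = np.random.default_rng(2)

def m1(frame):  return rng.standard_normal((F, T))       # encoder (stand-in)
def m2(x1):     return np.tanh(x1)                        # separation stage (stand-in)
def m3(x2, i_prev):                                       # state update + concat
    i_new = 0.5 * i_prev + 0.5 * (x2 @ x2.T)              # illustrative update only
    return np.concatenate([x2, np.tanh(i_new @ x2)], axis=0), i_new
def m4(x3):     return 1 / (1 + np.exp(-x3))              # mask of shape (2F, T)
def m5(x5):     return x5[:F].sum(axis=0), x5[F:].sum(axis=0)  # decoder (stand-in)

I = np.zeros((F, F))                                      # I_0 identically zero
for frame in np.split(rng.standard_normal(10 * T), 10):   # N = 10 frames
    X1 = m1(frame)
    X3, I = m3(m2(X1), I)                                 # I carried to next frame
    X5 = np.concatenate([X1, X1], axis=0) * m4(X3)
    voice, noise = m5(X5)
print(voice.shape, noise.shape, I.shape)                  # (8,) (8,) (4, 4)
```

The point of the sketch is that, unlike the prior-art pipeline, each frame's processing reads and rewrites a fixed-size state I, so information propagates across frames at no extra latency.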
To process a frame N, it is assumed that frame N-1 has been processed previously and that the quantities resulting from this processing have been stored. For frame 0, I 0 is set arbitrarily; for example, it is identically zero.
The invention also relates to a device for separating the voice from the noise in an audio signal received on a receiver equipped with an audio sensor, characterized in that it comprises at least the following elements:
- a first module M 1 receiving frames of a signal containing voice and noise,
- the first module has an output connected to a second module M 2 configured to generate a signal transmitted to a third module M 3, which receives the tensor value I N-1 associated with the previous frame X N-1 in order to generate a tensor I N associated with the current frame and a signal X N (3); the module M 3, inserted between the module M 2 and the module M 4, takes as input a tensor whose dimensions match those output by the module M 2 and outputs a tensor whose dimensions match those taken as input by the module M 4; an additional input I N-1 is supplied to the module M 3 for the processing of frame number N, and the module M 3 supplies the tensor I N as an additional output,
- a module M 4 which combines the signal X N (4) with the signal X N (1),
- a decoder M 5 configured to generate a first voice signal X N,0 and a second noise signal X N,1 from the combined signal.
Other characteristics, details and advantages of the invention will become apparent on reading the description made with reference to the appended drawings, given by way of non-limiting example, which represent, respectively:
- [Fig. 1], an illustration of the prior art,
- [Fig. 2], an example of a system allowing the implementation of the method according to the invention,
- [Fig. 3], an illustration of the steps implemented by the method according to the invention.
Figure 2 illustrates an example of a device allowing the implementation of the method according to the invention.
The signal from which the voice(s) must be extracted (separated) from the noise contained in the audio stream is received on an audio sensor 10. The audio sensor is connected to a set of equipment or hardware modules 20 configured to separate the voice from the noise, detailed with reference to figure 3.
Figure 3 illustrates a first variant embodiment for separating a voice from the noise in an audio signal, the processing being carried out at the level of the assembly 20. This separation is carried out in real time. Modules similar to those in the diagram of figure 1 bear the same references. The assembly also includes a module M 3, the function of which is detailed below.
The audio signal received on the sensor is, during a first step, separated into N frames X 1, ..., X N. Each frame X N is associated with a tensor I N which is of constant dimension, independent of the index of the frame. The method updates the value of the tensor I N from frame to frame and jointly uses X N and I N to estimate X N,0 and X N,1.
The frame X N is transmitted to a first module M 1, 100, which generates a signal X N (1). The tensor I N-1 obtained during the previous step for the processing of the frame X N-1 is transmitted to a module M 3, 201.
M 3 generates a tensor I N, 202, which will be used during the processing of the frame X N+1.
The encoder M 3 takes as input a signal X N (2), 203, the result of the transformation of the signal X N (1) by a module M 2, and performs the concatenation of X N (2) and I N in order to generate a signal X N (3) of dimensions 2F × T: M 3 : (X N (2), I N-1) -> (X N (3), I N).
The signal X N (3), 204, is transmitted to a module M 4 in order to generate a signal X N (4), which is combined, 104, with the signal X N (1); the signal resulting from the combination is decoded by a decoder M 5, 105, in order to generate a first voice signal X N,0 and a second noise signal X N,1.
In one embodiment, the steps implemented by the method according to the invention are as follows:
For all N, I N has dimensions F × F.
- A N is an F × F tensor defined by
- I N = I N-1 + λ(A N - I N-1), with λ a gain factor between 0 and 1 given by the user,
- B N = Softmax(I N-1). The softmax function is classic in machine learning: to a vector of K numbers (v 1, ..., v K) it associates a vector of K numbers (w 1, ..., w K) with, for all i, w i = exp(v i) / Σ k exp(v k). To compute B N, the softmax function is applied independently to all the rows of I N-1,
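The state update and row-wise softmax described above can be written out directly. Only λ's range (between 0 and 1) and the F × F shapes come from the text; the numeric values below are illustrative.

```python
import numpy as np

def softmax_rows(M):
    # Numerically stable softmax applied independently to each row of M.
    e = np.exp(M - M.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def update_state(I_prev, A_N, lam=0.3):
    B_N = softmax_rows(I_prev)           # B_N = Softmax(I_{N-1}), row by row
    I_N = I_prev + lam * (A_N - I_prev)  # exponential moving average toward A_N
    return I_N, B_N

F = 3
I0 = np.zeros((F, F))                    # frame 0: I_0 identically zero
A1 = np.eye(F)                           # illustrative A_N
I1, B1 = update_state(I0, A1, lam=0.3)
print(np.allclose(B1, np.full((F, F), 1 / 3)))  # softmax of zero rows is uniform
print(np.allclose(I1, 0.3 * np.eye(F)))         # I_1 = I_0 + 0.3 * (A_1 - I_0)
```

The update is a convex combination of the old state and the current statistic, so λ directly controls the latency/quality trade-off mentioned below: small λ gives a long memory, large λ tracks the current frame.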
The method and the device according to the invention allow real-time separation of the voice from the noise in an audio signal received on a sensor, without degrading the parameters specific to the voice.
The numerical parameters defining the processing of the different modules are set in a prior learning phase on a database.
The invention allows real-time operation with a controllable latency/quality trade-off, does not degrade an audio signal which contains no noise, and makes it possible to enhance the noise in a signal containing no speech (voice).
The method makes it possible in particular to preprocess the speech audio signal to improve the quality of downstream voice processing/enhancement blocks (compression, analysis).
The addition of a module M 3 in the processing chain makes it possible to improve the quality of the frame-by-frame strategy implemented for real-time processing.
Claims (2)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR1913283A FR3103619B1 (en) | 2019-11-27 | 2019-11-27 | METHOD AND SYSTEM FOR SEPARATE IN AN AUDIO STREAM THE VOICE COMPONENT AND THE NOISE COMPONENT |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3828886A1 true EP3828886A1 (en) | 2021-06-02 |
Family
ID=70918486
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20209511.3A Pending EP3828886A1 (en) | 2019-11-27 | 2020-11-24 | Method and system for separating the voice component and the noise component in an audio flow |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP3828886A1 (en) |
FR (1) | FR3103619B1 (en) |
SG (1) | SG10202011769TA (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190066713A1 (en) | 2016-06-14 | 2019-02-28 | The Trustees Of Columbia University In The City Of New York | Systems and methods for speech separation and neural decoding of attentional selection in multi-speaker environments |
- 2019
- 2019-11-27 FR FR1913283A patent/FR3103619B1/en active Active
- 2020
- 2020-11-24 EP EP20209511.3A patent/EP3828886A1/en active Pending
- 2020-11-26 SG SG10202011769TA patent/SG10202011769TA/en unknown
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190066713A1 (en) | 2016-06-14 | 2019-02-28 | The Trustees Of Columbia University In The City Of New York | Systems and methods for speech separation and neural decoding of attentional selection in multi-speaker environments |
Non-Patent Citations (3)
Title |
---|
MIMILAKIS STYLIANOS IOANNIS ET AL: "A recurrent encoder-decoder approach with skip-filtering connections for monaural singing voice separation", 2017 IEEE 27TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), IEEE, 25 September 2017 (2017-09-25), pages 1 - 6, XP033263882, DOI: 10.1109/MLSP.2017.8168117 * |
MIMILAKIS STYLIANOS IOANNIS ET AL., "A recurrent encoder-decoder approach with skip-filtering connections for monaural singing voice separation", 25 September 2017 (2017-09-25), pages 1 - 6
STEPHENSON CORY ET AL: "Monaural speaker separation using source-contrastive estimation", 2017 IEEE INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), IEEE, 3 October 2017 (2017-10-03), pages 1 - 6, XP033257056, DOI: 10.1109/SIPS.2017.8110005 * |
Also Published As
Publication number | Publication date |
---|---|
FR3103619B1 (en) | 2022-06-24 |
FR3103619A1 (en) | 2021-05-28 |
SG10202011769TA (en) | 2021-06-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0919096B1 (en) | Method for cancelling multichannel acoustic echo and multichannel acoustic echo canceller | |
EP2005420B1 (en) | Device and method for encoding by principal component analysis a multichannel audio signal | |
CA2436318C (en) | Noise reduction method and device | |
EP2691952B1 (en) | Allocation, by sub-bands, of bits for quantifying spatial information parameters for parametric encoding | |
EP2772916B1 (en) | Method for suppressing noise in an audio signal by an algorithm with variable spectral gain with dynamically adaptive strength | |
FR2808917A1 (en) | METHOD AND DEVICE FOR VOICE RECOGNITION IN FLUATING NOISE LEVEL ENVIRONMENTS | |
EP0998166A1 (en) | Device for audio processing,receiver and method for filtering the wanted signal and reproducing it in presence of ambient noise | |
EP1395981B1 (en) | Device and method for processing an audio signal | |
WO2017103418A1 (en) | Adaptive channel-reduction processing for encoding a multi-channel audio signal | |
CA3053032A1 (en) | Method and apparatus for dynamic modifying of the timbre of the voice by frequency shift of the formants of a spectral envelope | |
FR2690551A1 (en) | Quantization method of a predictor filter for a very low rate vocoder. | |
EP3025514B1 (en) | Sound spatialization with room effect | |
EP3025342B1 (en) | Method for suppressing the late reverberation of an audible signal | |
EP3828886A1 (en) | Method and system for separating the voice component and the noise component in an audio flow | |
FR3060830A1 (en) | SUB-BAND PROCESSING OF REAL AMBASSIC CONTENT FOR PERFECTIONAL DECODING | |
WO2023165946A1 (en) | Optimised encoding and decoding of an audio signal using a neural network-based autoencoder | |
FR2817694A1 (en) | SPATIAL LOWERING METHOD AND DEVICE FOR DARK AREAS OF AN IMAGE | |
EP2515300A1 (en) | Method and System for noise reduction | |
FR3085784A1 (en) | DEVICE FOR ENHANCING SPEECH BY IMPLEMENTING A NETWORK OF NEURONES IN THE TIME DOMAIN | |
EP4315328A1 (en) | Estimating an optimized mask for processing acquired sound data | |
US20220405547A1 (en) | Residual normalization for improved neural network classifications | |
EP0812070B1 (en) | Method and device for compression encoding of a digital signal | |
WO2013053631A1 (en) | Method and device for separating signals by iterative spatial filtering | |
EP1605440A1 (en) | Method for signal source separation from a mixture signal | |
FR2667472A1 (en) | REAL-TIME BINARY SEGMENTATION DEVICE OF DIGITAL IMAGES. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20211109 |
|
RBV | Designated contracting states (corrected) |
Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
17Q | First examination report despatched |
Effective date: 20230306 |
|
P01 | Opt-out of the competence of the unified patent court (upc) registered |
Effective date: 20230517 |