EP3040989A1 - Improved method of separation and computer program product - Google Patents


Info

Publication number
EP3040989A1
Authority
EP
European Patent Office
Prior art keywords
matrix
rev
contribution
spectrogram
term
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP15198713.8A
Other languages
German (de)
French (fr)
Other versions
EP3040989B1 (en
Inventor
Romain Hennequin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Audionamix
Original Assignee
Audionamix
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Audionamix
Priority to US14/984,089 (US9711165B2)
Publication of EP3040989A1
Application granted
Publication of EP3040989B1
Current legal status: not in force
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/0308 Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech

Definitions

  • the present invention relates to methods for separating a plurality of contributions in a mixture acoustic signal, and in particular to separating a vocal contribution from a background musical contribution in a mixture acoustic signal.
  • a soundtrack of a song includes a vocal contribution (the lyrics sung by one or more singers) and a musical contribution (accompanying music played by one or more instruments).
  • a soundtrack of a film includes a vocal contribution (dialogue between actors) superimposed on a musical contribution (sound effects and/or background music).
  • a vocal component results from the superposition of the dry voice (called the pure voice in what follows), which corresponds to the recording of the sound emitted by the singer that propagated directly to the recording microphone, and of the reverberation, which corresponds to the recording of the sound emitted by the singer that propagated indirectly to the recording microphone, that is, by one or more reflections off the walls of the recording room.
  • the reverberation, made up of the echoes of the pure voice at a given instant, spreads over a time interval that can be long (for example three seconds).
  • at a given instant, the vocal contribution results from the superposition of the pure voice at that instant and the various echoes of the pure voice from previous instants.
  • the type of algorithm proposed by Duong et al. applies only to multichannel signals and does not allow correct extraction of the reverberation effects found in music.
  • in the case of a vocal component, the reverberation affecting it is spread across the different components obtained after separation.
  • the separated vocal component loses its richness and the accompanying musical component is of poor quality.
  • reverberation can be caused by the conditions under which the sound is recorded, but can also be added artificially during post-production of the soundtrack, mainly for aesthetic reasons.
  • the invention therefore aims to overcome this problem.
  • the invention therefore relates to a separation method and a program product according to the claims.
  • the separation method 100 takes a time-domain mixture acoustic signal w(t) as input and delivers a vocal acoustic signal y(t) and a musical acoustic signal z(t).
  • the signals are all acoustic signals, so the qualifier "acoustic" is omitted in what follows.
  • these signals are time-domain signals: they depend on time t.
  • the mixture acoustic signal is a source soundtrack, or at least an excerpt from a soundtrack.
  • the mixture acoustic signal w(t) comprises a first, so-called specific, contribution and a second, so-called accompaniment, contribution.
  • the first contribution is a vocal contribution and corresponds to words sung by a singer.
  • the second contribution is a musical contribution and corresponds to the musical accompaniment of the singer.
  • the vocal acoustic signal y(t) corresponds to the vocal contribution alone, isolated from the rest of the mixture signal w(t), and the musical acoustic signal z(t) corresponds to the musical contribution alone, isolated from the rest of the mixture signal w(t).
  • the pure vocal signal x(t) is the free-field signal, and the impulse response r(t) is characteristic of the acoustic environment of the recording.
  • the first step 110 of the method 100 consists of sampling the mixture signal w(t) and computing a spectrogram V of it.
  • a spectrogram is defined as the absolute value (or the square of the absolute value) of the short-time Fourier transform of a sampled signal.
  • other time-frequency transforms are possible, such as a constant-Q transform, or a short-time Fourier transform followed by frequency filtering (using a Mel-scale or Bark-scale filter bank, for example).
  • for each time sampling step, the spectrogram comprises a frequency frame that indicates, for each frequency sampling step, the instantaneous power of the signal.
  • the spectrogram V is therefore an F × U matrix of positive real numbers.
  • U is the total number of frames into which the duration of the mixture signal w(t) is subdivided.
  • F is the total number of frequency sampling steps, generally between 200 and 2000.
  • the method 100 then comprises a first part in which the vocal signal is treated as a pure vocal signal, without reverberation.
  • the modeling spectrogram of the mixture signal is the sum of the spectrogram of the vocal signal $V^{y}$ and the spectrogram of the musical signal $V^{z}$.
  • $V^{y}$ is the spectrogram of the signal y(t), considered unaffected by reverberation.
  • this is ultimately the usual model used in decomposition methods based on non-negative matrix factorization.
  • $\hat{a}$ denotes an estimate of the quantity $a$.
  • $W_{F_0}$ is a matrix of harmonic atoms, which is predefined and specific to vocal signals.
  • $H_{F_0}$ is an activation matrix indicating, at each instant, which harmonic atoms of the matrix $W_{F_0}$ are activated.
  • $W_K$ is a matrix of filtering atoms.
  • $H_K$ is an activation matrix indicating, at each instant, which filtering atoms of the matrix $W_K$ are activated.
  • the operator $\odot$ denotes the element-wise product of two matrices (also called the Hadamard product).
  • the first part of the method then consists in estimating the matrices $H_{F_0}$, $W_K$, $H_K$, $W_R$ and $H_R$.
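  • the cost function is: $C = \sum_{f,t} d\left(V_{ft} \mid \hat{V}^{y}_{ft} + \hat{V}^{z}_{ft}\right)$
  • beta-divergence is defined by: $d_\beta(a \mid b) = \begin{cases} \dfrac{1}{\beta(\beta-1)}\left(a^{\beta} + (\beta-1)\,b^{\beta} - \beta\,a\,b^{\beta-1}\right) & \beta \in \mathbb{R}\setminus\{0,1\} \\ a\log\dfrac{a}{b} - a + b & \beta = 1 \\ \dfrac{a}{b} - \log\dfrac{a}{b} - 1 & \beta = 0 \end{cases}$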
  • in step 120, the cost function C is minimized so as to determine the optimal value of each parameter of each matrix.
  • this minimization is performed iteratively, with multiplicative update rules that are applied successively to each of the parameters of the matrices $H_{F_0}$, $W_K$, $H_K$, $W_R$ and $H_R$.
  • the update rules are derived, for example, from the gradient (that is, the partial derivative) of the cost function C with respect to each parameter. More precisely, the gradient of the cost function with respect to the parameter considered is written as a difference between two positive terms, and the corresponding update rule multiplies the parameter by the ratio of these two terms.
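  • the update rules are as follows, where $\hat{V}^{\odot(\cdot)}$ denotes element-wise exponentiation and the fractions are element-wise divisions:
    $$\begin{aligned}
    H_{F_0} &\leftarrow H_{F_0} \odot \frac{W_{F_0}^{T}\left((W_K H_K) \odot V \odot \hat{V}^{\odot(\beta-2)}\right)}{W_{F_0}^{T}\left((W_K H_K) \odot \hat{V}^{\odot(\beta-1)}\right)} \\
    H_K &\leftarrow H_K \odot \frac{W_K^{T}\left((W_{F_0} H_{F_0}) \odot V \odot \hat{V}^{\odot(\beta-2)}\right)}{W_K^{T}\left((W_{F_0} H_{F_0}) \odot \hat{V}^{\odot(\beta-1)}\right)} \\
    W_K &\leftarrow W_K \odot \frac{\left((W_{F_0} H_{F_0}) \odot V \odot \hat{V}^{\odot(\beta-2)}\right) H_K^{T}}{\left((W_{F_0} H_{F_0}) \odot \hat{V}^{\odot(\beta-1)}\right) H_K^{T}} \\
    H_R &\leftarrow H_R \odot \frac{W_R^{T}\left(V \odot \hat{V}^{\odot(\beta-2)}\right)}{W_R^{T}\,\hat{V}^{\odot(\beta-1)}} \\
    W_R &\leftarrow W_R \odot \frac{\left(V \odot \hat{V}^{\odot(\beta-2)}\right) H_R^{T}}{\hat{V}^{\odot(\beta-1)}\, H_R^{T}}
    \end{aligned}$$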
  • in step 130, the matrix $H_{F_0}$ is constrained using a tracking algorithm, such as the Viterbi algorithm, which selects for each time step the frequency step with maximum power that is not too far in frequency from the power maxima selected for the previous time steps.
  • in step 140, the coefficients of the matrix $H_{F_0}$ that lie at a frequency distance greater than a reference distance are set to 0.
  • in the second part of the method, the vocal signal is considered to be affected by reverberation.
  • the first part of the method provides initial values for the parameters that are then estimated by successive iterations in the second part of the method. Other ways of defining the initial values of these parameters are conceivable.
  • $*_t$ denotes a row-by-row convolution operator along the time axis, as made explicit by the right-hand side of the equation above.
  • the reverberation matrix R has T time steps (each of the same length as a time sampling step of the mixture signal) and F frequency sampling steps.
  • T is predetermined by the user and is generally between 20 and 200, for example 100.
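  • the spectrogram of the pure signal is modeled as: $\hat{V}^{x} = (W_{F_0} H_{F_0}) \odot (W_K H_K)$
  • the cost function is: $C = \sum_{f,t} d\left(V_{ft} \mid \hat{V}^{\mathrm{rev},y}_{ft} + \hat{V}^{z}_{ft}\right)$
  • for $\beta = 0$, the divergence is: $d(a \mid b) = \dfrac{a}{b} - \log\dfrac{a}{b} - 1$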
  • the cost function of the second part is similar to that used in the first part.
  • in step 220, the cost function C is minimized so as to determine the optimal value of each parameter of each matrix, in particular the parameters of the reverberation matrix.
  • the update rules are developed from the partial derivative of the cost function C with respect to each relevant parameter. They therefore depend on the form chosen for the cost function, in particular the divergence used in this cost function. The rules above are therefore specific to the use of beta-divergence.
  • the update rule of the reverberation matrix R is general in the sense that it does not depend on the model chosen for the spectrogram $V^{x}$ of the pure signal or for the spectrogram $V^{z}$ of the background sound.
  • the iterations start from the matrix $H'_{F_0}$ determined in the first part of the method. Since the update rules are multiplicative, the coefficients of the matrix $H_{F_0}$ initially set to 0 remain at 0 during the minimization of the cost function in the second part of the method.
  • in step 230, suitable conventional processing (in particular Wiener-type filtering) is applied to the above spectrograms to obtain the spectrograms of interest $V^{x}$ and $V^{z}$. Then, in step 240, the inverse of the transformation of step 110 is applied to these spectrograms to obtain the output signals: the pure vocal signal x(t) and the musical signal z(t).
  • these acoustic signals are monophonic signals.
  • as a variant, these signals are stereophonic. More generally, they are multichannel. The person skilled in the art knows how to adapt to stereophonic or multichannel signals the processing presented for monophonic signals.
  • the preferred embodiment relates to a specific component, or component of interest, that is a vocal component.
  • the modeling of the reverberation of a component is general and applies to any type of component.
  • the background sound component can also be affected by reverberation.
  • any type of non-negative non-reverberated sound spectrograms may also be used instead of those used above.
  • the mixture comprises two components. Generalization to any number of components is straightforward.
  • SDR: signal-to-distortion ratio
  • SAR: signal-to-artifact ratio
  • SIR: signal-to-interference ratio
  • the method according to the invention therefore improves the results obtained, whichever criterion is used to evaluate them.
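
The multiplicative update rules of step 120 listed above can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the patented implementation: the music spectrogram is assumed to be modeled as `W_R @ H_R` (inferred from the matrix names), the matrix sizes and random data are hypothetical, and the function names `model`, `is_cost` and `update_step` are introduced here only for the example. The default `beta = 0` corresponds to the Itakura-Saito divergence.

```python
import numpy as np

EPS = 1e-12  # small constant guarding divisions and logarithms

def model(W_F0, H_F0, W_K, H_K, W_R, H_R):
    """Model spectrogram: source-filter voice term plus an assumed music term W_R @ H_R."""
    return (W_F0 @ H_F0) * (W_K @ H_K) + W_R @ H_R + EPS

def is_cost(V, V_hat):
    """Itakura-Saito cost (beta = 0): sum over (f, t) of a/b - log(a/b) - 1."""
    ratio = (V + EPS) / V_hat
    return float(np.sum(ratio - np.log(ratio) - 1.0))

def update_step(V, W_F0, H_F0, W_K, H_K, W_R, H_R, beta=0.0):
    """Apply each multiplicative rule once; W_F0 is predefined and left untouched."""
    def split(V_hat):
        # Numerator and denominator factors shared by every rule.
        return V * V_hat ** (beta - 2), V_hat ** (beta - 1)

    num, den = split(model(W_F0, H_F0, W_K, H_K, W_R, H_R))
    filt = W_K @ H_K
    H_F0 = H_F0 * (W_F0.T @ (filt * num)) / (W_F0.T @ (filt * den) + EPS)

    num, den = split(model(W_F0, H_F0, W_K, H_K, W_R, H_R))
    src = W_F0 @ H_F0
    H_K = H_K * (W_K.T @ (src * num)) / (W_K.T @ (src * den) + EPS)

    num, den = split(model(W_F0, H_F0, W_K, H_K, W_R, H_R))
    W_K = W_K * ((src * num) @ H_K.T) / ((src * den) @ H_K.T + EPS)

    num, den = split(model(W_F0, H_F0, W_K, H_K, W_R, H_R))
    H_R = H_R * (W_R.T @ num) / (W_R.T @ den + EPS)

    num, den = split(model(W_F0, H_F0, W_K, H_K, W_R, H_R))
    W_R = W_R * (num @ H_R.T) / (den @ H_R.T + EPS)
    return H_F0, W_K, H_K, W_R, H_R

# Hypothetical sizes and random non-negative data, for illustration only.
rng = np.random.default_rng(0)
F, U, N_F0, K, K_R = 20, 30, 6, 4, 5
V = rng.random((F, U)) + 0.1
W_F0 = rng.random((F, N_F0)) + 0.1
H_F0 = rng.random((N_F0, U)) + 0.1
H_F0[0, :] = 0.0  # coefficients set to 0 stay at 0 under multiplicative updates
W_K, H_K = rng.random((F, K)) + 0.1, rng.random((K, U)) + 0.1
W_R, H_R = rng.random((F, K_R)) + 0.1, rng.random((K_R, U)) + 0.1

cost_before = is_cost(V, model(W_F0, H_F0, W_K, H_K, W_R, H_R))
for _ in range(50):
    H_F0, W_K, H_K, W_R, H_R = update_step(V, W_F0, H_F0, W_K, H_K, W_R, H_R)
cost_after = is_cost(V, model(W_F0, H_F0, W_K, H_K, W_R, H_R))
print(cost_before, cost_after)
```

Because every rule multiplies the current value by a non-negative ratio, the matrices stay non-negative and any coefficient of `H_F0` forced to 0 (as in step 140) remains 0 in later iterations.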

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Stereophonic System (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

A method of separating, in a mixture signal w(t), a pure specific contribution x(t) and a background sound contribution z(t), using a modeling spectrogram $\hat{V}$ of the mixture signal corresponding to the sum of a spectrogram of a reverberated specific contribution $\hat{V}^{\mathrm{rev},y}$ and a spectrogram of the background sound contribution $\hat{V}^{z}$, the spectrogram of the reverberated specific contribution depending on the spectrogram of the pure contribution $\hat{V}^{x}$ according to the model:

$$\hat{V}^{\mathrm{rev},y}_{f,t} = \sum_{\tau=1}^{T} \hat{V}^{x}_{f,t-\tau+1}\, R_{f,\tau}$$

where R is a reverberation matrix, f is a frequency step, t is a time step, and τ is an integer between 1 and T; and minimizing a cost function (C) between the spectrogram of the mixture signal and the modeling spectrogram of the mixture signal.

Description

The present invention relates to methods for separating a plurality of contributions in a mixture acoustic signal and, in particular, to separating a vocal contribution from a background musical contribution in a mixture acoustic signal.

The soundtrack of a song comprises a vocal contribution (the lyrics sung by one or more singers) and a musical contribution (the accompaniment played by one or more instruments).

The soundtrack of a film comprises a vocal contribution (the dialogue between actors) superimposed on a musical contribution (sound effects and/or background music).

Separation algorithms are known that separate the vocal contribution from the musical contribution in an original soundtrack.

For example, the article by Jean-Louis Durrieu et al., "An iterative approach to monaural musical mixture de-soloing," in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, April 2009, pp. 105-108, discloses an underdetermined source separation algorithm based on a decomposition into non-negative matrices, which separates the vocal contribution from the background sound contribution.

However, the known separation algorithms do not correctly take into account the reverberation affecting the components of the mixture.

In the particular case of a vocal component, the component results from the superposition of the dry voice (called the pure voice in what follows), which corresponds to the recording of the sound emitted by the singer that propagated directly to the recording microphone, and of the reverberation, which corresponds to the recording of the sound emitted by the singer that propagated indirectly to the recording microphone, that is, by one or more reflections off the walls of the recording room. The reverberation, made up of the echoes of the pure voice at a given instant, spreads over a time interval that can be long (for example three seconds). In other words, at a given instant, the vocal contribution results from the superposition of the pure voice at that instant and the various echoes of the pure voice from previous instants.

Yet the known separation algorithms do not take into account the long-term effects of the reverberation affecting a component of the mixture. The paper by Ngoc Q. K. Duong, Emmanuel Vincent, and Remi Gribonval, "Underdetermined reverberant audio source separation using a full-rank spatial covariance model," IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 7, pp. 1830-1840, Sept 2010, addresses the instantaneous spatial diffusion effects of reverberation, but does not model memory effects, that is, the delay between the recording of a sound and the recording of the echoes associated with that sound. The type of algorithm proposed by that paper therefore applies only to multichannel signals and does not allow correct extraction of the reverberation effects found in music. In the case of a vocal component, the reverberation affecting this component is spread across the different components obtained after separation. The separated vocal component loses its richness and the accompanying musical component is of poor quality.

It should be noted that reverberation can be caused by the conditions under which the sound is recorded, but can also be added artificially during post-production of the soundtrack, mainly for aesthetic reasons.

There is therefore a need for a method for separating contributions in a mixture where these contributions include a reverberation of the corresponding pure sound signal. More particularly, there is a need to separate a pure vocal contribution affected by reverberation from a background musical contribution in a sound signal.

The invention therefore aims to overcome this problem.

The invention therefore relates to a separation method and a program product according to the claims.

The invention will be better understood on reading the following description of a particular embodiment, given solely by way of illustrative and non-limiting example, with reference to the appended drawings, in which:
  • Figure 1 is a block diagram of the steps of the separation method according to the invention; and
  • Figures 2 and 3 are graphs resulting from tests that compare, according to known standard criteria, the results of implementing the method of Figure 1.

Referring to Figure 1, the separation method 100 takes a time-domain mixture acoustic signal w(t) as input and delivers a vocal acoustic signal y(t) and a musical acoustic signal z(t).

The signals are all acoustic signals, so the qualifier "acoustic" is omitted in what follows.

These signals are time-domain signals: they depend on time t.

The mixture acoustic signal is a source soundtrack, or at least an excerpt from a soundtrack.

The mixture acoustic signal w(t) comprises a first, so-called specific, contribution and a second, so-called accompaniment, contribution.

In the present description, the first contribution is a vocal contribution and corresponds to lyrics sung by a singer.

The second contribution is a musical contribution and corresponds to the musical accompaniment of the singer.

The vocal acoustic signal y(t) corresponds to the vocal contribution alone, isolated from the rest of the mixture signal w(t), and the musical acoustic signal z(t) corresponds to the musical contribution alone, isolated from the rest of the mixture signal w(t).

In the present embodiment, only the vocal contribution is considered to be reverberated.

The reverberation is modeled as follows:

$$y(t) = r(t) * x(t)$$

where x(t) is the pure vocal signal, that is, the sound signal generated by the singer that propagated directly to the recording microphone; r(t) is an impulse response, which is a distribution giving the amplitude of the echoes for each arrival time of the corresponding echo at the recording microphone; and * denotes the convolution product.

The pure vocal signal x(t) is the free-field signal, and the impulse response r(t) is characteristic of the acoustic environment of the recording.
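
As a minimal sketch of this time-domain model, the following NumPy example convolves a dry signal with an impulse response. The signal `x`, the impulse response `r` and the function name `reverberate` are hypothetical values chosen for illustration, not data from the patent.

```python
import numpy as np

def reverberate(x: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Return y = r * x (discrete convolution), truncated to the length of x."""
    return np.convolve(x, r)[: len(x)]

# Hypothetical dry signal: a single unit click followed by silence.
x = np.zeros(8)
x[0] = 1.0

# Hypothetical impulse response: direct path plus two decaying echoes.
r = np.array([1.0, 0.0, 0.5, 0.0, 0.25])

y = reverberate(x, r)
print(y)
```

With a unit click as input, the output simply reproduces the impulse response: the direct sound arrives first and each echo follows at its own delay and amplitude.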

In the time-frequency domain, for non-negative spectrograms, this reverberation model can be approximated, as proposed by Rita Singh, Bhiksha Raj, and Paris Smaragdis, "Latent-variable decomposition based dereverberation of monaural and multi-channel signals," in IEEE International Conference on Audio and Speech Signal Processing, Dallas, Texas, USA, March 2010, by:

$$V^{\mathrm{rev},y}_{f,t} = \sum_{\tau=1}^{T} V^{x}_{f,t-\tau+1}\, R_{f,\tau}$$

where $V^{\mathrm{rev},y}$ is the spectrogram of the signal $y(t)$, considered to be affected by reverberation, $V^{x}$ is the spectrogram of the signal $x(t)$, and $R$ is an $F \times T$ reverberation matrix corresponding to the spectrogram of the impulse response $r(t)$, with $F$ the frequency dimension and $T$ the time dimension of $R$.
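The row-wise sum above can be sketched in NumPy as follows. This is a minimal illustration rather than the patent's implementation: the function name `reverberate_spectrogram` is hypothetical, indices are 0-based (so `tau = 0` stands for $\tau = 1$), and frames before the start of the signal are assumed to be zero.

```python
import numpy as np

def reverberate_spectrogram(V_x, R):
    """Compute V_rev[f, t] = sum_tau V_x[f, t - tau] * R[f, tau] (0-based tau).

    V_x: (F, U) non-negative dry spectrogram.
    R:   (F, T) non-negative reverberation matrix.
    Frames with a negative time index are treated as zero (assumption).
    """
    F, U = V_x.shape
    T = R.shape[1]
    V_rev = np.zeros((F, U))
    for tau in range(min(T, U)):
        # Delay the dry spectrogram by tau frames and weight it, per frequency
        # row, by column tau of the reverberation matrix.
        V_rev[:, tau:] += V_x[:, :U - tau] * R[:, tau:tau + 1]
    return V_rev

# One frequency row, three frames, a two-tap reverberation.
print(reverberate_spectrogram(np.array([[1.0, 2.0, 3.0]]), np.array([[1.0, 0.5]])))
# [[1.  2.5 4. ]]
```

Each frequency row is processed independently, which is why the same loop serves all $F$ rows at once through broadcasting.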

The first step 110 of the method 100 consists of sampling the mixture signal $w(t)$ and computing a spectrogram $V$ of the mixture signal $w(t)$. In general, a spectrogram is defined as the absolute value (or the squared absolute value) of the short-time Fourier transform of a sampled signal. Other time-frequency transforms can be used, such as a constant-Q transform, or a short-time Fourier transform followed by frequency filtering (using a Mel-scale or Bark-scale filter bank, for example).

For each time sampling step, the spectrogram comprises a frequency frame giving, for each frequency sampling step, the instantaneous power of the signal.

The spectrogram $V$ is therefore an $F \times U$ matrix of positive real numbers.

$U$ is the total number of frames that subdivide the duration of the mixture signal $w(t)$. $F$ is the total number of frequency sampling steps, which is generally between 200 and 2000.
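A spectrogram of this kind can be computed as sketched below. The frame length, hop size, Hann window and the function name `power_spectrogram` are illustrative assumptions, not values taken from the description; the squared magnitude is used here, which is one of the two definitions mentioned above.

```python
import numpy as np

def power_spectrogram(w, frame_len=1024, hop=256):
    """Return the squared-magnitude STFT of a 1-D signal as an (F, U) array.

    frame_len and hop are illustrative choices (assumptions).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(w) - frame_len) // hop  # U: frames that fit entirely in w
    frames = np.stack(
        [w[u * hop:u * hop + frame_len] * window for u in range(n_frames)],
        axis=1,
    )  # shape (frame_len, U)
    # One-sided FFT of each frame, then squared modulus: F = frame_len // 2 + 1.
    return np.abs(np.fft.rfft(frames, axis=0)) ** 2

# Example: one second of a 440 Hz tone sampled at 16 kHz.
fs = 16000
w = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
V = power_spectrogram(w)
print(V.shape)  # (513, 59)
```

With these settings $F = 513$, which falls in the 200 to 2000 range quoted above.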

The method 100 then comprises a first part in which the vocal signal is treated as a pure vocal signal, without reverberation.

In this first part, the model spectrogram of the mixture signal $\hat{V}$ is the sum of the spectrogram of the vocal signal $\hat{V}^{y}$ and the spectrogram of the musical signal $\hat{V}^{z}$. $\hat{V}^{y}$ is the spectrogram of the signal $y(t)$, considered here as unaffected by reverberation. This is the usual model in decomposition methods based on non-negative matrix factorization.

Note that $\hat{a}$ denotes an estimate of the quantity $a$.

Thus, in the steps of the first part of the method 100, the aim is to estimate the two output spectrograms whose sum best approximates (sign $\approx$ in the following expression) the spectrogram of the mixture:

$$V \approx \hat{V} = \hat{V}^{y} + \hat{V}^{z}$$

The model of the vocal signal is based on a source/filter voice production model, as proposed by Jean-Louis Durrieu et al., "An iterative approach to monaural musical mixture de-soloing," in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, April 2009, pp. 105-108:

$$\hat{V}^{y} = (W_{F_0} H_{F_0}) \odot (W_{K} H_{K})$$

The first term of this model is the source of the voice, which corresponds to the excitation of the vocal cords: $W_{F_0}$ is a matrix of harmonic atoms, which is predefined and specific to vocal signals; $H_{F_0}$ is an activation matrix indicating, at each instant, which harmonic atoms of the matrix $W_{F_0}$ are activated.

The second term of this model is the filter of the voice, which corresponds to the filtering performed by the vocal tract: $W_{K}$ is a matrix of filter atoms; $H_{K}$ is an activation matrix indicating, at each instant, which filter atoms of the matrix $W_{K}$ are activated.

The operator $\odot$ denotes the element-wise multiplication of two matrices (also called the Hadamard product).

The model of the musical signal is based on a generic non-negative matrix factorization model:

$$\hat{V}^{z} = W_{R} H_{R}$$

The columns of $W_{R}$ can be seen as elementary spectral patterns, and $H_{R}$ as a matrix of activations of these elementary patterns over time.

The first part of the method then consists in estimating the matrices $H_{F_0}$, $W_{K}$, $H_{K}$, $W_{R}$ and $H_{R}$.

In order to estimate the parameters of these matrices, a cost function $C$, based on an element-wise divergence $d$, is used:

$$C = D\left(V \,\middle|\, \hat{V}^{y} + \hat{V}^{z}\right) = \sum_{f,t} d\left(V_{ft} \,\middle|\, \hat{V}^{y}_{ft} + \hat{V}^{z}_{ft}\right)$$

In the presently contemplated embodiment, the Itakura-Saito divergence, well known to those skilled in the art, is used. It is obtained by setting the parameter of the beta-divergence to $\beta = 0$ and is therefore expressed as:

$$d(a \mid b) = \frac{a}{b} - \log\frac{a}{b} - 1$$

As a reminder, the beta-divergence is defined by:

$$d_{\beta}(a \mid b) = \begin{cases} \dfrac{1}{\beta(\beta-1)}\left(a^{\beta} + (\beta-1)\,b^{\beta} - \beta\, a\, b^{\beta-1}\right), & \beta \in \mathbb{R} \setminus \{0, 1\} \\[2ex] a \log\dfrac{a}{b} - a + b, & \beta = 1 \\[2ex] \dfrac{a}{b} - \log\dfrac{a}{b} - 1, & \beta = 0 \end{cases}$$

where $a$ and $b$ are two positive real scalars.
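The three cases of this definition can be checked numerically with a short scalar function. This is a minimal sketch; the function name `beta_divergence` is an illustrative choice.

```python
import math

def beta_divergence(a, b, beta):
    """Scalar beta-divergence d_beta(a | b) for positive reals a and b."""
    if beta == 0:  # Itakura-Saito divergence
        return a / b - math.log(a / b) - 1
    if beta == 1:  # generalized Kullback-Leibler divergence
        return a * math.log(a / b) - a + b
    return (a**beta + (beta - 1) * b**beta - beta * a * b**(beta - 1)) / (
        beta * (beta - 1)
    )

print(beta_divergence(3.0, 1.0, 2))  # 2.0, i.e. (a - b)**2 / 2
print(beta_divergence(2.0, 2.0, 0))  # 0.0: the divergence vanishes when a == b
```

For $\beta = 2$ the general formula reduces to half the squared difference, and the general case tends to the Itakura-Saito expression as $\beta$ approaches 0.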

In step 120, the cost function $C$ is minimized in order to determine the optimal value of each parameter of each matrix. This minimization is performed iteratively, with multiplicative update rules that are applied successively to each of the parameters of the matrices $H_{F_0}$, $W_{K}$, $H_{K}$, $W_{R}$ and $H_{R}$.

These update rules are derived, for example, from the gradient (that is, the partial derivative) of the cost function $C$ with respect to each parameter. More precisely, the gradient of the cost function with respect to the parameter considered is written as a difference between two positive terms, and the corresponding update rule multiplies the parameter considered by the ratio of these two terms.

This ensures in particular that the parameters remain non-negative at each update and become constant when the gradient of the cost function with respect to the parameter considered tends to zero.

In this way, the parameters evolve towards a local minimum.
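As a worked illustration of this construction, consider the parameter matrix $H_{R}$ under the beta-divergence and the first-part model $\hat{V} = (W_{F_0} H_{F_0}) \odot (W_{K} H_{K}) + W_{R} H_{R}$. The symbols $G^{+}$ and $G^{-}$ are introduced here for illustration only. Differentiating the beta-divergence with respect to its second argument gives

$$\frac{\partial\, d_{\beta}(a \mid b)}{\partial b} = b^{\beta-1} - a\, b^{\beta-2},$$

so that, by the chain rule,

$$\nabla_{H_{R}} C = \underbrace{W_{R}^{T}\, \hat{V}^{\odot(\beta-1)}}_{G^{+}} \;-\; \underbrace{W_{R}^{T}\left(V \odot \hat{V}^{\odot(\beta-2)}\right)}_{G^{-}}.$$

Both $G^{+}$ and $G^{-}$ are non-negative, and the multiplicative rule reads $H_{R} \leftarrow H_{R} \odot G^{-} / G^{+}$ (element-wise division). Where the gradient vanishes, $G^{+} = G^{-}$ and the ratio equals 1, so the parameter no longer changes.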

The update rules are thus as follows:

$$H_{F_0} \leftarrow H_{F_0} \odot \frac{W_{F_0}^{T}\left((W_{K} H_{K}) \odot V \odot \hat{V}^{\odot(\beta-2)}\right)}{W_{F_0}^{T}\left((W_{K} H_{K}) \odot \hat{V}^{\odot(\beta-1)}\right)}$$

$$H_{K} \leftarrow H_{K} \odot \frac{W_{K}^{T}\left((W_{F_0} H_{F_0}) \odot V \odot \hat{V}^{\odot(\beta-2)}\right)}{W_{K}^{T}\left((W_{F_0} H_{F_0}) \odot \hat{V}^{\odot(\beta-1)}\right)}$$

$$W_{K} \leftarrow W_{K} \odot \frac{\left((W_{F_0} H_{F_0}) \odot V \odot \hat{V}^{\odot(\beta-2)}\right) H_{K}^{T}}{\left((W_{F_0} H_{F_0}) \odot \hat{V}^{\odot(\beta-1)}\right) H_{K}^{T}}$$

$$H_{R} \leftarrow H_{R} \odot \frac{W_{R}^{T}\left(V \odot \hat{V}^{\odot(\beta-2)}\right)}{W_{R}^{T}\, \hat{V}^{\odot(\beta-1)}}$$

$$W_{R} \leftarrow W_{R} \odot \frac{\left(V \odot \hat{V}^{\odot(\beta-2)}\right) H_{R}^{T}}{\hat{V}^{\odot(\beta-1)}\, H_{R}^{T}}$$

where $\odot$ is the element-wise product between matrices (or vectors); $(\cdot)^{\odot(\cdot)}$ is the element-wise exponentiation of a matrix by a scalar; and $(\cdot)^{T}$ is the transpose of a matrix.
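The five rules above can be sketched as one NumPy update pass. This is an illustrative sketch only: the function names, the small constant `eps` added to avoid division by zero, the recomputation of $\hat{V}$ before every update, and the reading of the fraction bar as element-wise division are assumptions, not details stated in the description.

```python
import numpy as np

def mixture_model(W_F0, H_F0, W_K, H_K, W_R, H_R, eps=1e-12):
    """V_hat = (W_F0 H_F0) * (W_K H_K) + W_R H_R, with a small floor eps."""
    return (W_F0 @ H_F0) * (W_K @ H_K) + W_R @ H_R + eps

def update_pass(V, W_F0, H_F0, W_K, H_K, W_R, H_R, beta=0.0, eps=1e-12):
    """Apply the five multiplicative updates once, in the order of the text.

    W_F0 is predefined and left unchanged. '*' and '/' are element-wise,
    '@' is the matrix product. beta=0 gives the Itakura-Saito case.
    """
    V_hat = mixture_model(W_F0, H_F0, W_K, H_K, W_R, H_R)
    filt = W_K @ H_K
    H_F0 = H_F0 * (W_F0.T @ (filt * V * V_hat ** (beta - 2))) / (
        W_F0.T @ (filt * V_hat ** (beta - 1)) + eps)

    V_hat = mixture_model(W_F0, H_F0, W_K, H_K, W_R, H_R)
    src = W_F0 @ H_F0
    H_K = H_K * (W_K.T @ (src * V * V_hat ** (beta - 2))) / (
        W_K.T @ (src * V_hat ** (beta - 1)) + eps)

    V_hat = mixture_model(W_F0, H_F0, W_K, H_K, W_R, H_R)
    W_K = W_K * ((src * V * V_hat ** (beta - 2)) @ H_K.T) / (
        (src * V_hat ** (beta - 1)) @ H_K.T + eps)

    V_hat = mixture_model(W_F0, H_F0, W_K, H_K, W_R, H_R)
    H_R = H_R * (W_R.T @ (V * V_hat ** (beta - 2))) / (
        W_R.T @ (V_hat ** (beta - 1)) + eps)

    V_hat = mixture_model(W_F0, H_F0, W_K, H_K, W_R, H_R)
    W_R = W_R * ((V * V_hat ** (beta - 2)) @ H_R.T) / (
        (V_hat ** (beta - 1)) @ H_R.T + eps)
    return H_F0, W_K, H_K, W_R, H_R
```

In use, `update_pass` would be called repeatedly, feeding the returned matrices back in, until a stopping criterion is met. Because every update is a multiplication by a non-negative ratio, entries that start at zero stay at zero.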

For this first part, all the parameters are initialized with randomly chosen non-negative values.

Then, in step 130, the matrix $H_{F_0}$ is constrained using a tracking algorithm, such as the Viterbi tracking algorithm, in order to select, for each time step, the frequency step with maximum power, without straying too far in frequency from the power maxima selected for the preceding time steps.

Then, in step 140, the coefficients of the matrix $H_{F_0}$ that lie at a frequency distance greater than a reference distance are set to 0.

A matrix $H'_{F_0}$ is obtained.
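Steps 130 and 140 can be illustrated with a simple dynamic-programming tracker followed by a band mask. This is a sketch under stated assumptions, not the tracking procedure of the patent: the functions `track_f0` and `restrict_to_track`, the log-activation score, the linear `jump_penalty`, and measuring distance in atom indices around the tracked path are all hypothetical choices.

```python
import numpy as np

def track_f0(H_F0, jump_penalty=1.0):
    """Viterbi-style path through H_F0 (N atoms x U frames).

    Maximizes the summed log-activation minus jump_penalty * |index jump|
    between consecutive frames (illustrative scoring, an assumption).
    """
    N, U = H_F0.shape
    log_act = np.log(H_F0 + 1e-12)
    idx = np.arange(N)
    trans = -jump_penalty * np.abs(idx[:, None] - idx[None, :])  # trans[i, j]: i -> j
    score = log_act[:, 0].copy()
    back = np.zeros((N, U), dtype=int)
    for t in range(1, U):
        cand = score[:, None] + trans          # cand[i, j]: best score ending in i, moving to j
        back[:, t] = np.argmax(cand, axis=0)   # best predecessor of each atom j
        score = cand.max(axis=0) + log_act[:, t]
    path = np.zeros(U, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(U - 1, 0, -1):              # backtrack from the last frame
        path[t - 1] = back[path[t], t]
    return path

def restrict_to_track(H_F0, path, max_dist=2):
    """Set to 0 every coefficient farther than max_dist atoms from the path."""
    idx = np.arange(H_F0.shape[0])[:, None]
    mask = np.abs(idx - path[None, :]) <= max_dist
    return np.where(mask, H_F0, 0.0)

H = np.full((6, 4), 0.01)
H[2, 0] = H[2, 1] = H[3, 2] = H[3, 3] = 1.0
path = track_f0(H)
print(path.tolist())  # [2, 2, 3, 3]
H_prime = restrict_to_track(H, path, max_dist=1)
```

The masked matrix plays the role of $H'_{F_0}$: with multiplicative updates, the zeroed coefficients remain zero in any later iteration.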

In the second part of the method 100, the vocal signal is considered to be affected by reverberation. Note that the first part of the method provides initial values for the parameters that are estimated by successive iterations in the second part of the method. Other ways of defining the initial values of these parameters are possible.

In this second part, the model of the vocal signal considered as reverberated, $\hat{V}^{\mathrm{rev},y}$, as a function of the pure vocal signal $\hat{V}^{x}$, is written:

$$\hat{V}^{\mathrm{rev},y}_{f,t} = \left[\hat{V}^{x} *_{t} R\right]_{f,t} = \sum_{\tau=1}^{T} \hat{V}^{x}_{f,t-\tau+1}\, R_{f,\tau}$$

where $*_{t}$ denotes a row-by-row convolution operator, as made explicit by the right-hand side of the equation above.

The reverberation matrix $R$ comprises $T$ time steps (each of the same duration as a sampling step of the mixture signal) and $F$ frequency sampling steps. $T$ is predetermined by the user and is generally between 20 and 200, for example 100.

Moreover, as above, the spectrogram $\hat{V}^{x}$ of the pure signal is modeled by:

$$\hat{V}^{x} = (W_{F_0} H_{F_0}) \odot (W_{K} H_{K})$$

The second part of the method then consists in estimating the matrices $H_{F_0}$, $W_{K}$, $H_{K}$, $W_{R}$, $H_{R}$ and $R$ that approximate the spectrogram of the mixture $V$:

$$V \approx \hat{V}^{\mathrm{rev}} = \hat{V}^{\mathrm{rev},y} + \hat{V}^{z}$$

In order to estimate the parameters of these matrices, a cost function $C$, based on an element-wise divergence $d$, is used:

$$C = D\left(V \,\middle|\, \hat{V}^{\mathrm{rev},y} + \hat{V}^{z}\right) = \sum_{f,t} d\left(V_{ft} \,\middle|\, \hat{V}^{\mathrm{rev},y}_{ft} + \hat{V}^{z}_{ft}\right)$$

In the presently contemplated embodiment, the Itakura-Saito divergence, well known to those skilled in the art, is used. It is obtained by setting the parameter of the beta-divergence to $\beta = 0$ and is therefore expressed as:

$$d(a \mid b) = \frac{a}{b} - \log\frac{a}{b} - 1$$

Advantageously, the cost function of the second part is similar to that used in the first part.

In step 220, the cost function $C$ is then minimized in order to determine the optimal value of each parameter of each matrix, in particular the parameters of the reverberation matrix.

This minimization is performed iteratively with multiplicative update rules, which are applied successively to each of the parameters of the matrices. For the matrices of the vocal component including reverberation:

$$R \leftarrow R \odot \frac{\left(V \odot \hat{V}^{\mathrm{rev}\,\odot(\beta-2)}\right) *_{t} \hat{V}^{x}}{\hat{V}^{\mathrm{rev}\,\odot(\beta-1)} *_{t} \hat{V}^{x}}$$

$$H_{F_0} \leftarrow H_{F_0} \odot \frac{W_{F_0}^{T}\left((W_{K} H_{K}) \odot \left(R *_{t} \left(V \odot \hat{V}^{\mathrm{rev}\,\odot(\beta-2)}\right)\right)\right)}{W_{F_0}^{T}\left((W_{K} H_{K}) \odot \left(R *_{t} \hat{V}^{\mathrm{rev}\,\odot(\beta-1)}\right)\right)}$$

$$H_{K} \leftarrow H_{K} \odot \frac{W_{K}^{T}\left((W_{F_0} H_{F_0}) \odot \left(R *_{t} \left(V \odot \hat{V}^{\mathrm{rev}\,\odot(\beta-2)}\right)\right)\right)}{W_{K}^{T}\left((W_{F_0} H_{F_0}) \odot \left(R *_{t} \hat{V}^{\mathrm{rev}\,\odot(\beta-1)}\right)\right)}$$

$$W_{K} \leftarrow W_{K} \odot \frac{\left((W_{F_0} H_{F_0}) \odot \left(R *_{t} \left(V \odot \hat{V}^{\mathrm{rev}\,\odot(\beta-2)}\right)\right)\right) H_{K}^{T}}{\left((W_{F_0} H_{F_0}) \odot \left(R *_{t} \hat{V}^{\mathrm{rev}\,\odot(\beta-1)}\right)\right) H_{K}^{T}}$$

where $*_{t}$ denotes the row-by-row convolution operator as defined above.

For the musical background component, as in the first part of the method:

$$H_{R} \leftarrow H_{R} \odot \frac{W_{R}^{T}\left(V \odot \hat{V}^{\mathrm{rev}\,\odot(\beta-2)}\right)}{W_{R}^{T}\, \hat{V}^{\mathrm{rev}\,\odot(\beta-1)}}$$

$$W_{R} \leftarrow W_{R} \odot \frac{\left(V \odot \hat{V}^{\mathrm{rev}\,\odot(\beta-2)}\right) H_{R}^{T}}{\hat{V}^{\mathrm{rev}\,\odot(\beta-1)}\, H_{R}^{T}}$$
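The update of the reverberation matrix $R$ can be sketched as follows. The code is derived directly from the gradient of the cost with respect to $R_{f,\tau}$ under the model $\hat{V}^{\mathrm{rev}} = \hat{V}^{x} *_{t} R + \hat{V}^{z}$, which yields a sum over frames at a fixed lag. Interpreting the $*_{t}$ of the $R$ update in that way is an assumption, as are the function names, the 0-based indices and the constant `eps`.

```python
import numpy as np

def conv_rows(V_x, R):
    """Row-wise convolution: out[f, t] = sum_tau V_x[f, t - tau] * R[f, tau]."""
    F, U = V_x.shape
    out = np.zeros((F, U))
    for tau in range(min(R.shape[1], U)):
        out[:, tau:] += V_x[:, :U - tau] * R[:, tau:tau + 1]
    return out

def corr_rows(A, V_x, T):
    """Row-wise correlation: out[f, tau] = sum_t A[f, t] * V_x[f, t - tau]."""
    F, U = A.shape
    out = np.zeros((F, T))
    for tau in range(min(T, U)):
        out[:, tau] = np.sum(A[:, tau:] * V_x[:, :U - tau], axis=1)
    return out

def update_R(V, V_x, V_z, R, beta=0.0, eps=1e-12):
    """One multiplicative update of R with V_x and V_z held fixed."""
    V_rev = conv_rows(V_x, R) + V_z + eps
    num = corr_rows(V * V_rev ** (beta - 2), V_x, R.shape[1])
    den = corr_rows(V_rev ** (beta - 1), V_x, R.shape[1]) + eps
    return R * num / den
```

Since this rule only needs $\hat{V}^{x}$ and $\hat{V}^{z}$ as arrays, the same function applies whatever model produced them.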

As indicated above, the update rules are derived from the partial derivative of the cost function $C$ with respect to each relevant parameter. They therefore depend on the form chosen for the cost function, in particular the divergence used in it. The rules above are thus specific to the use of a beta-divergence.

Note that, since each of these rules results from a partial derivative with respect to a specific parameter, the update rule for the reverberation matrix $R$ is general, in the sense that it does not depend on the model chosen for the spectrogram $\hat{V}^{x}$ of the pure signal or for the background spectrogram $\hat{V}^{z}$.

As regards the matrix $H_{F_0}$, the iterations start from the matrix $H'_{F_0}$ determined in the first part of the method. Note that, since the update rules are multiplicative, the coefficients of the matrix $H_{F_0}$ initially set to 0 remain at 0 throughout the minimization of the cost function in the second part of the method.

The other parameters of the model, in particular those of the spectrogram of the reverberated specific contribution $\hat{V}^{\mathrm{rev},y}$, are initialized with random values.

When the distance between the mixture spectrogram $V$ and the estimated spectrogram $\hat{V} = \hat{V}^{\mathrm{rev},y} + \hat{V}^{z}$ falls below a predetermined threshold, or when a maximum number of iterations fixed in advance is reached, the method exits the iteration loop, and the values obtained for the matrices $R$, $H_{F_0}$, $W_{K}$, $H_{K}$, $W_{R}$ and $H_{R}$ are the final values.

In step 230, suitable conventional processing (in particular Wiener-type filtering) is applied to the preceding spectrograms to obtain, in particular, the spectrograms of interest $\hat{V}^{x}$ and $\hat{V}^{z}$. Then, in step 240, the inverse of the transform of step 110 is applied to these spectrograms to obtain the output signals: the pure vocal signal $x(t)$ and the musical signal $z(t)$.
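A Wiener-type filtering step of this kind is commonly implemented as a soft mask on the complex short-time Fourier transform of the mixture. The sketch below is a generic illustration, not the processing specified by the patent: the function name `wiener_separate` is hypothetical, the model spectrograms are assumed to be power spectrograms, and which components enter the mask (for example the dry or the reverberated voice estimate) is left to the caller because the text does not specify it.

```python
import numpy as np

def wiener_separate(X, model_spectrograms, eps=1e-12):
    """Split the complex mixture STFT X between components in proportion
    to their non-negative model spectrograms (all of the same shape as X)."""
    total = sum(model_spectrograms) + eps
    return [X * (V_c / total) for V_c in model_spectrograms]

# Toy example with two components on a 4 x 5 time-frequency grid.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
V_voice = rng.random((4, 5)) + 0.1
V_music = rng.random((4, 5)) + 0.1
Y_voice, Y_music = wiener_separate(X, [V_voice, V_music])
```

The masks are non-negative and sum to one in each time-frequency bin, so the separated components add back up to the mixture, and each keeps the phase of the mixture.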

In the embodiments described here in detail, these acoustic signals are monophonic signals. In a variant, these signals are stereophonic. More generally still, they are multichannel. Those skilled in the art know how to adapt to stereophonic or multichannel signals the processing presented for monophonic signals.

The preferred embodiment relates to a specific component, or component of interest, that is a vocal component. However, the modeling of the reverberation of a component is general and applies to any type of component. In particular, the background sound component can also be affected by reverberation.

In addition, any type of non-negative model of the spectrograms of the non-reverberated sounds can be used in place of those used above.

Moreover, in the embodiment presented above, the mixture comprises two components. The generalization to any number of components is straightforward.

Comparative tests were carried out to compare the results of implementing the present method:

  • the first method is a separation based on an NMF-type method, without any modeling of the reverberation;
  • the second method is a separation according to the method described above, that is, including a model of the reverberation of the vocal signal; and
  • the third method is a theoretical mathematical limit.

In order to quantify the results obtained with the different methods, standard indicators from the field of source separation were computed. These indicators are the Signal to Distortion Ratio (SDR), which corresponds to a quantitative test; the Signal to Artifact Ratio (SAR), which corresponds to the artifacts in the separated components; and the Signal to Interference Ratio (SIR), which corresponds to the residual interference between the separated components.

The results are presented in Figure 2 for the vocal signal and in Figure 3 for the musical signal.

The method according to the invention therefore improves the results obtained, regardless of how they are analyzed.

Claims (12)

A method (100) for separating, in an acoustic mixture signal $w(t)$, a pure specific contribution, affected by reverberation, and a background sound contribution,
characterized in that it consists in separating the pure specific contribution $x(t)$ and the background sound contribution $z(t)$,
by using a model spectrogram of the acoustic mixture signal $\hat{V}^{\mathrm{rev}}$ corresponding to the sum of a spectrogram of a reverberated specific contribution $\hat{V}^{\mathrm{rev},y}$ and a spectrogram of the background sound contribution $\hat{V}^{z}$, the spectrogram of the reverberated specific contribution depending on the spectrogram of the pure specific contribution $\hat{V}^{x}$ according to the model:

$$\hat{V}^{\mathrm{rev},y}_{f,t} = \sum_{\tau=1}^{T} \hat{V}^{x}_{f,t-\tau+1}\, R_{f,\tau}$$

where $R$ is an $F \times T$ reverberation matrix, $f$ is a frequency index, $t$ is a time index, and $\tau$ is an integer between 1 and $T$; and
by iteratively computing an estimate of the spectrogram of the background sound contribution $\hat{V}^{z}$, of the spectrogram of the pure specific contribution $\hat{V}^{x}$ and of the reverberation matrix $R$ so as to minimize a cost function ($C$) between a spectrogram of the mixture signal $V$ and the model spectrogram of the mixture signal $\hat{V}^{\mathrm{rev}}$.
Method according to claim 1, characterized in that the cost function ($C$) uses a divergence ($d$) between the spectrogram of the mixture signal and the model spectrogram of the mixture signal, in particular the divergence known as the beta-divergence, defined by:

$$d_{\beta}(a \mid b) = \begin{cases} \dfrac{1}{\beta(\beta-1)}\left(a^{\beta} + (\beta-1)\,b^{\beta} - \beta\, a\, b^{\beta-1}\right), & \beta \in \mathbb{R} \setminus \{0, 1\} \\[2ex] a \log\dfrac{a}{b} - a + b, & \beta = 1 \\[2ex] \dfrac{a}{b} - \log\dfrac{a}{b} - 1, & \beta = 0 \end{cases}$$

where $a$ and $b$ are two positive real scalars.
Method according to claim 2, characterized in that the minimization of the cost function uses, to obtain an estimate of the reverberation matrix, multiplicative update rules of the type:

$$R \leftarrow R \odot \frac{\left(V \odot \hat{V}^{\mathrm{rev}\,\odot(\beta-2)}\right) *_{t} \hat{V}^{x}}{\hat{V}^{\mathrm{rev}\,\odot(\beta-1)} *_{t} \hat{V}^{x}}$$

with $\hat{V}^{\mathrm{rev}} = \hat{V}^{\mathrm{rev},y} + \hat{V}^{z}$; and where $\odot$ is the element-wise product between matrices (or vectors); $(\cdot)^{\odot(\cdot)}$ is the element-wise exponentiation of a matrix by a scalar; and $*_{t}$ is a temporal convolution operator between two matrices defined by

$$\left[A *_{t} B\right]_{f,t} = \sum_{\tau=t}^{T} A_{f,\tau}\, B_{f,\tau-t+1}.$$
Method according to any one of the preceding claims, characterized in that, the pure specific contribution being a vocal contribution, the spectrogram of the pure specific contribution $\hat{V}^{x}$ is modeled by:

$$\hat{V}^{x} = (W_{F_0} H_{F_0}) \odot (W_K H_K)$$

where $W_{F_0}$ is a predefined matrix of harmonic atoms, $H_{F_0}$ is an activation matrix of the harmonic atoms of the matrix $W_{F_0}$, $W_K$ is a matrix of filtering atoms, $H_K$ is an activation matrix of the filtering atoms of the matrix $W_K$, and where $\odot$ is an operator corresponding to the term-by-term product between matrices.
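The vocal model of this claim is a term-by-term product of two matrix products, which can be read as a harmonic source shaped by a spectral filter. A minimal sketch in plain Python follows; the helper names `matmul`, `hadamard` and `voice_spectrogram` and the toy matrices in the usage note are illustrative assumptions.

```python
def matmul(A, B):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def hadamard(A, B):
    """Term-by-term product of two matrices of the same shape."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def voice_spectrogram(W_F0, H_F0, W_K, H_K):
    """V_x = (W_F0 H_F0) (.) (W_K H_K)."""
    return hadamard(matmul(W_F0, H_F0), matmul(W_K, H_K))
```

With `W_F0 = [[1.0, 0.0], [0.0, 1.0]]`, `H_F0 = [[1.0, 2.0], [3.0, 4.0]]`, `W_K = [[2.0], [0.5]]` and `H_K = [[1.0, 1.0]]`, the harmonic part is `[[1.0, 2.0], [3.0, 4.0]]`, the filter part is `[[2.0, 2.0], [0.5, 0.5]]`, and the modeled spectrogram is their element-wise product.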
Method according to claim 3 and claim 4, characterized in that the minimization of the cost function uses multiplicative update rules of the type:

$$H_{F_0} \leftarrow H_{F_0} \odot \frac{W_{F_0}^{T}\left((W_K H_K) \odot \left(R *_t \left(V \odot (\hat{V}^{\mathrm{rev}})^{\odot(\beta-2)}\right)\right)\right)}{W_{F_0}^{T}\left((W_K H_K) \odot \left(R *_t (\hat{V}^{\mathrm{rev}})^{\odot(\beta-1)}\right)\right)}$$

$$H_K \leftarrow H_K \odot \frac{W_K^{T}\left((W_{F_0} H_{F_0}) \odot \left(R *_t \left(V \odot (\hat{V}^{\mathrm{rev}})^{\odot(\beta-2)}\right)\right)\right)}{W_K^{T}\left((W_{F_0} H_{F_0}) \odot \left(R *_t (\hat{V}^{\mathrm{rev}})^{\odot(\beta-1)}\right)\right)}$$

$$W_K \leftarrow W_K \odot \frac{\left((W_{F_0} H_{F_0}) \odot \left(R *_t \left(V \odot (\hat{V}^{\mathrm{rev}})^{\odot(\beta-2)}\right)\right)\right) H_K^{T}}{\left((W_{F_0} H_{F_0}) \odot \left(R *_t (\hat{V}^{\mathrm{rev}})^{\odot(\beta-1)}\right)\right) H_K^{T}}$$

with $\hat{V}^{\mathrm{rev}} = \hat{V}^{\mathrm{rev},y} + \hat{V}^{z}$; and where $\odot$ is an operator corresponding to the term-by-term product between matrices (or vectors); $.^{\odot(.)}$ is an operator corresponding to the term-by-term exponentiation of a matrix by a scalar; $(.)^{T}$ is the transpose of a matrix; and $*_t$ is a temporal convolution operator between two matrices defined by:

$$[A *_t B]_{f,t} = \sum_{\tau=t}^{T} A_{f,\tau}\,B_{f,\tau-t+1}.$$
Method according to any one of the preceding claims, characterized in that the spectrogram of the background sound contribution $\hat{V}^{z}$ is modeled by a non-negative matrix factorization:

$$\hat{V}^{z} = W_R H_R$$

where $W_R$ is a matrix of elementary spectral models and $H_R$ is an activation matrix of the elementary spectral models of the matrix $W_R$.
Method according to claim 2 and claim 6, characterized in that the minimization of the cost function uses multiplicative update rules of the type:

$$H_R \leftarrow H_R \odot \frac{W_R^{T}\left(V \odot (\hat{V}^{\mathrm{rev}})^{\odot(\beta-2)}\right)}{W_R^{T}\,(\hat{V}^{\mathrm{rev}})^{\odot(\beta-1)}}$$

$$W_R \leftarrow W_R \odot \frac{\left(V \odot (\hat{V}^{\mathrm{rev}})^{\odot(\beta-2)}\right) H_R^{T}}{(\hat{V}^{\mathrm{rev}})^{\odot(\beta-1)}\,H_R^{T}}$$

with $\hat{V}^{\mathrm{rev}} = \hat{V}^{\mathrm{rev},y} + \hat{V}^{z}$; and where $\odot$ is an operator corresponding to the term-by-term product between matrices (or vectors); $.^{\odot(.)}$ is an operator corresponding to the term-by-term exponentiation of a matrix by a scalar; and $(.)^{T}$ is the transpose of a matrix.
Method according to any one of the preceding claims, characterized in that, the separation of the pure specific contribution $x(t)$ and the background sound contribution $z(t)$ using a modeling spectrogram of the acoustic mixing signal $\hat{V}^{\mathrm{rev}}$ constituting a second part of the method, the method comprises a first part consisting in separating, in the acoustic mixing signal $w(t)$, a specific contribution and a background sound contribution without taking reverberation into account, initialization parameters among the parameters obtained at the end of the first part of the method being used as initial values of the corresponding parameters in the spectrogram of the reverberated specific contribution $\hat{V}^{\mathrm{rev},y}$ of the second part of the method.
Method according to claim 8, characterized in that the first part comprises the minimization of a cost function using an algorithm similar to the one used in the second part.
Method according to claim 2 and claim 9, characterized in that, for the minimization of the cost function, the first part of the method uses multiplicative update rules of the type:

$$H_{F_0} \leftarrow H_{F_0} \odot \frac{W_{F_0}^{T}\left((W_K H_K) \odot V \odot \hat{V}^{\odot(\beta-2)}\right)}{W_{F_0}^{T}\left((W_K H_K) \odot \hat{V}^{\odot(\beta-1)}\right)}$$

$$H_K \leftarrow H_K \odot \frac{W_K^{T}\left((W_{F_0} H_{F_0}) \odot V \odot \hat{V}^{\odot(\beta-2)}\right)}{W_K^{T}\left((W_{F_0} H_{F_0}) \odot \hat{V}^{\odot(\beta-1)}\right)}$$

$$W_K \leftarrow W_K \odot \frac{\left((W_{F_0} H_{F_0}) \odot V \odot \hat{V}^{\odot(\beta-2)}\right) H_K^{T}}{\left((W_{F_0} H_{F_0}) \odot \hat{V}^{\odot(\beta-1)}\right) H_K^{T}}$$

$$H_R \leftarrow H_R \odot \frac{W_R^{T}\left(V \odot \hat{V}^{\odot(\beta-2)}\right)}{W_R^{T}\,\hat{V}^{\odot(\beta-1)}}$$

$$W_R \leftarrow W_R \odot \frac{\left(V \odot \hat{V}^{\odot(\beta-2)}\right) H_R^{T}}{\hat{V}^{\odot(\beta-1)}\,H_R^{T}}$$

with $\hat{V} = \hat{V}^{x} + \hat{V}^{z}$, $\hat{V}^{z} = W_R H_R$, and $\hat{V}^{x} = (W_{F_0} H_{F_0}) \odot (W_K H_K)$; where $W_R$ is a matrix of elementary spectral models and $H_R$ is an activation matrix of the elementary spectral models of the matrix $W_R$; where $W_{F_0}$ is a predefined matrix of harmonic atoms, $H_{F_0}$ is an activation matrix of the harmonic atoms of the matrix $W_{F_0}$, $W_K$ is a matrix of filtering atoms, and $H_K$ is an activation matrix of the filtering atoms of the matrix $W_K$; and where $\odot$ is an operator corresponding to the term-by-term product between matrices (or vectors); $.^{\odot(.)}$ is an operator corresponding to the term-by-term exponentiation of a matrix by a scalar; and $(.)^{T}$ is the transpose of a matrix.
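The five update rules of the first (reverberation-free) part can be chained into one iteration. The sketch below is plain Python under stated assumptions: parameters are held in a dictionary with hypothetical keys, the modeling spectrogram is recomputed after each update, the order of the updates is arbitrary, and a small `eps` guards the divisions. The matrix of harmonic atoms `W_F0` is predefined and therefore never updated.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def hadamard(A, B):
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def model(p):
    """V_hat = (W_F0 H_F0) (.) (W_K H_K) + W_R H_R."""
    voice = hadamard(matmul(p["W_F0"], p["H_F0"]), matmul(p["W_K"], p["H_K"]))
    back = matmul(p["W_R"], p["H_R"])
    return [[x + z for x, z in zip(rx, rz)] for rx, rz in zip(voice, back)]


def ratio_terms(p, V, beta):
    """Return V (.) V_hat^(beta-2) and V_hat^(beta-1) for the current parameters."""
    V_hat = model(p)
    num = [[v * vh ** (beta - 2) for v, vh in zip(rv, rh)] for rv, rh in zip(V, V_hat)]
    den = [[vh ** (beta - 1) for vh in rh] for rh in V_hat]
    return num, den


def scale(X, N, D, eps=1e-12):
    """Multiplicative step X <- X (.) N / D, element by element."""
    return [[x * n / (d + eps) for x, n, d in zip(rx, rn, rd)] for rx, rn, rd in zip(X, N, D)]


def first_part_iteration(p, V, beta):
    """Apply the H_F0, H_K, W_K, H_R and W_R updates once, in that order."""
    p = dict(p)
    num, den = ratio_terms(p, V, beta)
    filt, Wt = matmul(p["W_K"], p["H_K"]), transpose(p["W_F0"])
    p["H_F0"] = scale(p["H_F0"], matmul(Wt, hadamard(filt, num)), matmul(Wt, hadamard(filt, den)))
    num, den = ratio_terms(p, V, beta)
    src, Wt = matmul(p["W_F0"], p["H_F0"]), transpose(p["W_K"])
    p["H_K"] = scale(p["H_K"], matmul(Wt, hadamard(src, num)), matmul(Wt, hadamard(src, den)))
    num, den = ratio_terms(p, V, beta)
    Ht = transpose(p["H_K"])
    p["W_K"] = scale(p["W_K"], matmul(hadamard(src, num), Ht), matmul(hadamard(src, den), Ht))
    num, den = ratio_terms(p, V, beta)
    Wt = transpose(p["W_R"])
    p["H_R"] = scale(p["H_R"], matmul(Wt, num), matmul(Wt, den))
    num, den = ratio_terms(p, V, beta)
    Ht = transpose(p["H_R"])
    p["W_R"] = scale(p["W_R"], matmul(num, Ht), matmul(den, Ht))
    return p
```

In practice such an iteration would be repeated until the cost stops decreasing; the stopping criterion is not specified in this claim and is left to the caller here.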
Method according to any one of claims 8 to 10, characterized in that it comprises, in the first part of the method, following the minimization of the cost function, the application of an algorithm for tracking the maximum of power in the activation matrix of the specific contribution $H_{F_0}$, said algorithm preferably being of the Viterbi type, followed by setting to zero all the terms of the activation matrix of the specific contribution $H_{F_0}$ that are too far from the maximum of power found, the terms of the activation matrix of the specific contribution $H_{F_0}$ constituting the initialization parameters used as initial values of the corresponding parameters in the spectrogram of the reverberated specific contribution $\hat{V}^{\mathrm{rev},y}$ of the second part of the method, the other parameters of the spectrogram of the reverberated specific contribution $\hat{V}^{\mathrm{rev},y}$ being initialized with random values.
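The tracking and zeroing step of this claim can be sketched as a Viterbi search over the rows of the activation matrix, followed by a mask around the tracked path. The claim does not specify the scores or the distance threshold, so everything numeric here is an assumption: the path score is the sum of log-activations minus a penalty proportional to the jump between consecutive rows (`jump_penalty`), and "too far" means more than `width` rows away from the path. The function names are hypothetical.

```python
import math


def track_max_power(H, jump_penalty=1.0, floor=1e-12):
    """Viterbi path through H (rows = harmonic atoms, columns = frames)
    maximizing sum_t log(H[s_t][t]) - jump_penalty * |s_t - s_(t-1)|."""
    n_states, n_frames = len(H), len(H[0])
    score = [math.log(H[s][0] + floor) for s in range(n_states)]
    back = []
    for t in range(1, n_frames):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor state for state s at frame t.
            best = max(range(n_states), key=lambda q: score[q] - jump_penalty * abs(s - q))
            ptr.append(best)
            new_score.append(
                score[best] - jump_penalty * abs(s - best) + math.log(H[s][t] + floor)
            )
        score = new_score
        back.append(ptr)
    # Backtrack from the best final state.
    path = [max(range(n_states), key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def zero_far_from_path(H, path, width=1):
    """Set to zero every activation more than `width` rows away from the path."""
    return [
        [h if abs(s - path[t]) <= width else 0.0 for t, h in enumerate(row)]
        for s, row in enumerate(H)
    ]
```

Because multiplicative updates keep zero entries at zero, an activation matrix masked this way and reused as the initial value in the second part stays restricted to the neighborhood of the tracked path.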
Computer program product, characterized in that it comprises instructions suitable for being stored in the memory of a computer and executed by the processor of said computer in order to implement a separation method according to any one of the preceding claims.
EP15198713.8A 2014-12-31 2015-12-09 Improved method of separation and computer program product Not-in-force EP3040989B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/984,089 US9711165B2 (en) 2014-12-31 2015-12-30 Process and associated system for separating a specified audio component affected by reverberation and an audio background component from an audio mixture signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
FR1463482A FR3031225B1 (en) 2014-12-31 2014-12-31 IMPROVED SEPARATION METHOD AND COMPUTER PROGRAM PRODUCT

Publications (2)

Publication Number Publication Date
EP3040989A1 (en) 2016-07-06
EP3040989B1 (en) 2018-10-17

Family

ID=53541694

Family Applications (1)

Application Number Title Priority Date Filing Date
EP15198713.8A Not-in-force EP3040989B1 (en) 2014-12-31 2015-12-09 Improved method of separation and computer program product

Country Status (3)

Country Link
US (1) US9711165B2 (en)
EP (1) EP3040989B1 (en)
FR (1) FR3031225B1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3013885B1 (en) * 2013-11-28 2017-03-24 Audionamix METHOD AND SYSTEM FOR SEPARATING SPECIFIC CONTRIBUTIONS AND SOUND BACKGROUND IN ACOUSTIC MIXING SIGNAL
EP3507993B1 (en) 2016-08-31 2020-11-25 Dolby Laboratories Licensing Corporation Source separation for reverberant environment
EP3324407A1 (en) * 2016-11-17 2018-05-23 Fraunhofer Gesellschaft zur Förderung der Angewand Apparatus and method for decomposing an audio signal using a ratio as a separation characteristic
EP3324406A1 (en) 2016-11-17 2018-05-23 Fraunhofer Gesellschaft zur Förderung der Angewand Apparatus and method for decomposing an audio signal using a variable threshold
EP3573058B1 (en) * 2018-05-23 2021-02-24 Harman Becker Automotive Systems GmbH Dry sound and ambient sound separation
US11546689B2 (en) * 2020-10-02 2023-01-03 Ford Global Technologies, Llc Systems and methods for audio processing

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5195652B2 (en) * 2008-06-11 2013-05-08 ソニー株式会社 Signal processing apparatus, signal processing method, and program
US20130282372A1 (en) * 2012-04-23 2013-10-24 Qualcomm Incorporated Systems and methods for audio signal processing
US9549253B2 (en) * 2012-09-26 2017-01-17 Foundation for Research and Technology—Hellas (FORTH) Institute of Computer Science (ICS) Sound source localization and isolation apparatuses, methods and systems

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
JEAN-LOUIS DURRIEU ET AL.: "An iterative approach to monaural musical mixture de-soloing", INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP),TAIPEI, TAIWAN, April 2009 (2009-04-01), pages 105 - 108, XP031459177
JEAN-LOUIS DURRIEU ET AL: "An iterative approach to monaural musical mixture de-soloing", ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 2009. ICASSP 2009. IEEE INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 19 April 2009 (2009-04-19), pages 105 - 108, XP031459177, ISBN: 978-1-4244-2353-8 *
NGOC Q. K. DUONG; EMMANUEL VINCENT; REMI GRIBONVAL: "Underdetermined reverberant audio source separation using a full-rank spatial covariance model", IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, vol. 18, no. 7, September 2010 (2010-09-01), pages 1830 - 1840
RITA SINGH ET AL: "Latent-variable decomposition based dereverberation of monaural and multi-channel signals", ACOUSTICS SPEECH AND SIGNAL PROCESSING (ICASSP), 2010 IEEE INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 14 March 2010 (2010-03-14), pages 1914 - 1917, XP031697281, ISBN: 978-1-4244-4295-9 *
RITA SINGH; BHIKSHA RAJ; PARIS SMARAGDIS: "Latent-variable decomposition based dereverberation of monaural and multi-channel signals", IEEE INTERNATIONAL CONFERENCE ON AUDIO AND SPEECH SIGNAL PROCESSING, DALLAS, TEXAS, USA, March 2010 (2010-03-01)

Also Published As

Publication number Publication date
US20160189731A1 (en) 2016-06-30
FR3031225A1 (en) 2016-07-01
EP3040989B1 (en) 2018-10-17
FR3031225B1 (en) 2018-02-02
US9711165B2 (en) 2017-07-18

Similar Documents

Publication Publication Date Title
EP3040989B1 (en) Improved method of separation and computer program product
Kilgour et al. Fréchet Audio Distance: A Metric for Evaluating Music Enhancement Algorithms
Kinoshita et al. A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research
Smaragdis Convolutive speech bases and their application to supervised speech separation
Mandel et al. Model-based expectation-maximization source separation and localization
WO2005106852A1 (en) Improved voice signal conversion method and system
US10614827B1 (en) System and method for speech enhancement using dynamic noise profile estimation
US20140037095A1 (en) System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain
Zhang et al. Real and imaginary modulation spectral subtraction for speech enhancement
KR20130108391A (en) Method, apparatus and machine-readable storage medium for decomposing a multichannel audio signal
WO2005106853A1 (en) Method and system for the quick conversion of a voice signal
Fitzgerald et al. Projet—spatial audio separation using projections
Wisdom et al. Enhancement and recognition of reverberant and noisy speech by extending its coherence
JP5580585B2 (en) Signal analysis apparatus, signal analysis method, and signal analysis program
WO2004088633A1 (en) Method for analyzing fundamental frequency information and voice conversion method and system implementing said analysis method
Islam et al. Supervised single channel speech enhancement based on stationary wavelet transforms and non-negative matrix factorization with concatenated framing process and subband smooth ratio mask
Chen et al. A dual-stream deep attractor network with multi-domain learning for speech dereverberation and separation
FR3013885A1 (en) METHOD AND SYSTEM FOR SEPARATING SPECIFIC CONTRIBUTIONS AND SOUND BACKGROUND IN ACOUSTIC MIXING SIGNAL
Mirsamadi et al. Multichannel speech dereverberation based on convolutive nonnegative tensor factorization for ASR applications.
EP3025342B1 (en) Method for suppressing the late reverberation of an audible signal
Zheng et al. Noise-robust blind reverberation time estimation using noise-aware time–frequency masking
Li et al. Jointly Optimizing Activation Coefficients of Convolutive NMF Using DNN for Speech Separation.
Gaultier Design and evaluation of sparse models and algorithms for audio inverse problems
Valin et al. To dereverb or not to dereverb? Perceptual studies on real-time dereverberation targets
Liu et al. Speech enhancement of instantaneous amplitude and phase for applications in noisy reverberant environments

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

17P Request for examination filed

Effective date: 20161208

RBV Designated contracting states (corrected)

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

INTG Intention to grant announced

Effective date: 20180629

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Free format text: LANGUAGE OF EP DOCUMENT: FRENCH

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602015018225

Country of ref document: DE

Ref country code: AT

Ref legal event code: REF

Ref document number: 1054923

Country of ref document: AT

Kind code of ref document: T

Effective date: 20181115

REG Reference to a national code

Ref country code: NL

Ref legal event code: MP

Effective date: 20181017

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG4D

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 1054923

Country of ref document: AT

Kind code of ref document: T

Effective date: 20181017

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190117

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190217

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190117

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190118

Ref country code: RS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: AL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190217

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602015018225

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181209

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

REG Reference to a national code

Ref country code: IE

Ref legal event code: MM4A

26N No opposition filed

Effective date: 20190718

REG Reference to a national code

Ref country code: BE

Ref legal event code: MM

Effective date: 20181231

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181209

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181231

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181231

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181231

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20191210

Year of fee payment: 5

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20191203

Year of fee payment: 5

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MK

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181017

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO

Effective date: 20151209

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20181017

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 602015018225

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20201231

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20210701

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20211117

Year of fee payment: 7

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20221209

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20221209