EP2901447B1 - Method and device for separating signals by minimum variance spatial filtering under linear constraint - Google Patents

Method and device for separating signals by minimum variance spatial filtering under linear constraint

Info

Publication number
EP2901447B1
EP2901447B1 EP13770877.2A EP13770877A EP2901447B1 EP 2901447 B1 EP2901447 B1 EP 2901447B1 EP 13770877 A EP13770877 A EP 13770877A EP 2901447 B1 EP2901447 B1 EP 2901447B1
Authority
EP
European Patent Office
Prior art keywords
signal
particular source
mixed
mixed signal
signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Not-in-force
Application number
EP13770877.2A
Other languages
German (de)
French (fr)
Other versions
EP2901447A1 (en
Inventor
Sylvain Marchand
Stanislaw GORLOW
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Centre National de la Recherche Scientifique CNRS
Universite des Sciences et Tech (Bordeaux 1)
Original Assignee
Centre National de la Recherche Scientifique CNRS
Universite des Sciences et Tech (Bordeaux 1)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Centre National de la Recherche Scientifique CNRS, Universite des Sciences et Tech (Bordeaux 1)
Publication of EP2901447A1
Application granted
Publication of EP2901447B1
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/018 Audio watermarking, i.e. embedding inaudible data in the audio signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/028 Voice signal separating using properties of sound source
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/0308 Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques

Definitions

  • the present invention relates to a method for separating some of the source signals comprising a digital audio overall signal.
  • the invention also relates to a device for implementing this method.
  • Signal mixing consists of summing several signals, called source signals, to obtain one or more composite signals, called mixed signals.
  • the mixing may consist of a simple step of adding the source signals or may also include signal filtering steps before and / or after the addition.
  • the source signals can be mixed differently to form two mixed signals corresponding to the two tracks or channels (left and right) of a stereo signal.
  • Separation of sources consists of estimating source signals from the observation of a certain number of different mixed signals formed from these same source signals.
  • the objective is generally to enhance, or if possible to extract completely, one or more target source signals.
  • the separation of sources is particularly difficult in so-called "under-determined" cases, in which the number of mixed signals is smaller than the number of source signals present in them. Extraction is then very difficult or even impossible, because the mixed signals carry little information compared with the source signals.
  • Music signals on audio compact disc are a particularly representative example, because only two stereo channels (i.e. two mixed signals, left and right), which are generally highly redundant, are available for a potentially large number of source signals.
  • blind separation is the most general form, in which no information on the source signals nor on the nature of the mixed signals is known a priori.
  • A number of assumptions are then made about these source signals and the mixed signals, for example that the source signals are statistically independent.
  • the parameters of a separation system are estimated by maximizing a criterion based on these assumptions (for example, maximizing the independence of the signals obtained by the separation device).
  • this method is generally used in cases where many mixed signals are available (at least as many as source signals) and is therefore not applicable to under-determined cases, in which the number of mixed signals is less than the number of source signals.
  • Computational auditory scene analysis generally consists of modeling the source signals as partials, but the mixed signal is not explicitly decomposed. This method relies on the mechanisms of the human auditory system to separate the source signals in the same way as the ear does. Examples include: D.P.W. Ellis, Using knowledge to organize sound: The prediction-driven approach to computational auditory scene analysis, and its application to speech/non-speech mixture (Speech Communication, 27(3), pp. 281-298, 1999); D. Godsmark and G.J. Brown, A blackboard architecture for computational auditory scene analysis (Speech Communication, 27(3), pp. 351-366, 1999); and T. Kinoshita, S. Sakai, and H. Tanaka, Musical sound source identification based on frequency component adaptation (In Proc. IJCAI Workshop on CASA, pp. 18-24, 1999). However, computational auditory scene analysis currently yields insufficient results in terms of the quality of the separated source signals.
  • Another form of separation relies on a decomposition of the mixture over a basis of suitable functions.
  • Another method of source separation is “informed” source separation: information relating to one or more source signals is transmitted with the mixed signal to the decoder.
  • the decoder is then able, from algorithms and information, to at least partially separate at least one source signal of the mixed signal.
  • An example of informed source separation is described by M. Parvaix and L. Girin (IEEE Trans. Audio Speech Lang. Process., Volume 19, pages 1721-1733, August 2011).
  • the information transmitted to the decoder indicates in particular the two predominant source signals in the mixed signal, for different frequency zones.
  • such a method is not always suitable when there are more than two source signals contributing simultaneously in the same frequency zone of the mixed signal: in this case, at least one source signal is neglected, thus creating a "spectral hole" in the reconstitution of said source signal.
  • An object of the present invention is therefore to provide a method for separating source signals included in one or more mixed signals, more effectively.
  • the representative values may be the temporal, spectral or spectro-temporal distribution of the particular source signal, or the temporal, spectral or spectro-temporal contribution of the particular source signal in the mixed signal.
  • the representative values of the source signals can thus be expressed as the modulus of the amplitude or as normalized power (that is to say as energy, which corresponds to the square of the modulus of the amplitude).
  • Representative values may for example be the temporal, spectral or spectro-temporal distribution of the particular source signal, or the temporal, spectral or spectro-temporal contribution of the particular source signal in the mixed signal, for several zones (or points) of a time-frequency plane.
  • the modulus of the amplitude or the normalized power of the particular source signal or signals can be determined in the time-frequency plane: the moduli of the amplitude and the normalized powers are then spectro-temporal values.
  • a transformation or representation in the time-frequency plane consists in representing the source signal, in energy (or normalized power) or in modulus of the amplitude (that is to say the square root of the energy), as a function of two parameters, time and frequency. This corresponds to the evolution, in energy or in modulus, of the frequency content of the source signal as a function of time.
  • a real positive value corresponding to the signal components at this frequency and at this instant is obtained.
  • with the method described, it is possible to separate the particular source signals effectively, using spatial filtering improved by the information contained in the mixed signal, without making any assumption about these signals (other than the classical statistical assumptions, i.e. independence of the source signals, zero mean of these source signals, and Gaussian distribution).
  • the method is based on the distribution of each source signal between the different channels of the mixed signal to isolate said source signals (spatial filtering).
  • the use of a linearly constrained minimum variance filter makes it possible to obtain a high-performance spatial separation, by using the modulus of the amplitude or the normalized power of the source signal as a constraint.
  • the spatial filtering step is therefore improved by taking into consideration the known representative value of the particular source signal.
  • the filtering is also based on the modulus of the amplitude or the normalized power of the particular source signals.
  • the spatial filtering step may comprise modeling a spatial correlation matrix using the modulus of the amplitude or the normalized power of the particular source signals and the distribution of said particular source signal between at least two channels of the mixed signal.
  • the mixed signal comprises values representative of the particular source signal or signals for at least two channels of the mixed signal, and, before performing the spatial filtering, the distribution of each particular source signal between said at least two channels of the mixed signal is determined from the mixed signal and from said representative values of the particular source signals.
  • the distribution of the particular source signals between the different channels of the mixed signal can be provided during the implementation of the separation method, for example at the same time as the representative values of said particular source signals, or else can be determined during the separation method from the multichannel mixed signal and the representative values of the particular source signals.
  • the determination of the modulus of the amplitude or of the normalized power of the particular source signal or signals comprises the extraction of the values representative of the particular source signal or signals which have been inserted in the mixed signal, for example by watermarking.
  • the extraction of the representative values results from the transmission of the representative values of the particular source signals, which can be done with the mixed signal, for example when the information is watermarked, i.e. inserted inaudibly, in the mixed signal, or via a particular channel of the mixed signal which is dedicated to the transmission of said representative values.
  • the mixed signal is a stereo signal.
  • the mixed signal comprises values representative of the particular source signal or signals for at least two channels of the mixed signal
  • the device comprises a means of determining, from the mixed signal and said representative values of the particular source signals, the distribution of each particular source signal between said at least two channels of the mixed signal.
  • the means for determining the modulus of the amplitude or the normalized power comprises means for extracting the values representative of the particular source signal or signals which have been inserted into the mixed signal, for example by watermarking.
  • the mixed signal $s_{mix}(t)$ is a stereo signal with a left channel $s_{mix}^{g}(t)$ and a right channel $s_{mix}^{d}(t)$, and comprises p source signals $s_1(t), \ldots, s_p(t)$.
  • the signals are audio signals.
  • the short-term Fourier transformation is considered to be a transformation in the time-frequency plane.
  • the linear constraint of the spatial filter is the normalized power.
  • $\phi_i(k, m) = |S_i(k, m)|^2$
  • the representative value of the source signal can thus be
  • the representative value of the source signal can also be determined after processing on the source signal, for example by reducing the frequency resolution of the energy spectrum or by adapting the quantification of the representative values to the sensitivity of the human ear. It is then possible to obtain representative values of the source signals which are less bulky in terms of size, while maintaining a desired sound quality.
  • the representative value of the source signals is the quantized normalized power value (or energy) $\phi_i(k, m)$.
  • the representative values $\phi_i(k, m)$ of the source signals are transmitted to the separation device or decoder. They can be transmitted via a dedicated channel (associated with the stereo channels to form the mixed signal), or by incorporation in the mixed signal, for example by watermarking or by using unused bits of the mixed signal.
  • the separation device may comprise means for extracting the representative values, receiving as input the mixed signal and outputting the representative values of the source signals.
  • the separation device can also receive the distributions of the source signals in each track (or channel) of the mixed signal: $a_1^{g}, \ldots, a_p^{g}, a_1^{d}, \ldots, a_p^{d}$.
  • These distributions can be transmitted via a dedicated channel (associated with the stereo channels to form the mixed signal, or independent of the stereo channels), or by incorporation into the mixed signal, for example by watermarking or by using unused bits of the mixed signal.
  • the separation device may comprise a means for extracting the distributions of the source signals, receiving as input the mixed signal and providing, as output, the distributions of the source signals.
  • the means for extracting the representative values and the means for extracting the distributions can be one and the same means.
  • the separation device may comprise means for determining the distributions of the source signals: such a determination means may receive as input the mixed signal and the representative values $\phi_i(k, m)$, and output the distribution $a_i^{g}, a_i^{d}$ of said source signal.
  • FIG. 1 schematically shows an embodiment of a separation device 1 for particular source signals contained in a mixed signal $s_{mix}$.
  • the separation device 1 receives as input the stereo channels $s_{mix}^{g}$ and $s_{mix}^{d}$ of the mixed signal $s_{mix}$, and delivers at least partially separated particular source signals $s'_i$, with i varying from 1 to p.
  • the purpose of the separation device 1 is to deliver, at least partially, a plurality of particular source signals contained in the mixed signal $s_{mix}$ by using the representative values $\phi_i(k, m)$ of said particular source signals.
  • the separation device 1 receives, as input, the channels $s_{mix}^{g}(t)$ and $s_{mix}^{d}(t)$ of the mixed digital audio signal, in which are inserted, for example by watermarking, the representative values $\phi_i(k, m)$ of the particular source signals, and possibly the distributions $a_1^{g}, \ldots, a_p^{g}, a_1^{d}, \ldots, a_p^{d}$ of the particular source signals between the two channels $s_{mix}^{d}(t)$ and $s_{mix}^{g}(t)$ of the mixed digital audio signal.
  • the separating device 1 comprises a transformation means 2, an extraction means 3, a processing means 4, a filtering means 5, and an inverse transformation means 6.
  • the transformation means 2 receives as input the channels $s_{mix}^{g}(t)$ and $s_{mix}^{d}(t)$ of the mixed digital audio signal and delivers, at the output, the transforms $S_{mix}^{g}(k, m)$ and $S_{mix}^{d}(k, m)$ of the channels of the mixed signal in the time-frequency plane.
  • the extraction means 3 receives as input the transforms $S_{mix}^{d}(k, m)$ and $S_{mix}^{g}(k, m)$ of the channels of the mixed signal in the time-frequency plane, and delivers the representative values $\phi_i(k, m)$ of the particular source signals contained in the mixed signal.
  • the extraction means 3 can, if necessary, also deliver the distributions $a_1^{g}, \ldots, a_p^{g}, a_1^{d}, \ldots, a_p^{d}$ of the particular source signals between the two channels $s_{mix}^{d}(t)$ and $s_{mix}^{g}(t)$ of the mixed digital audio signal, when these are inserted in the mixed signal.
  • the extraction means 3 thus makes it possible to extract from the mixed signal the representative values that have been added to it a posteriori, for example by watermarking, and to isolate the mixed signal.
  • the representative values $\phi_i(k, m)$ are then transmitted to the processing means 4 and, where appropriate, the distributions $a_1^{g}, \ldots, a_p^{g}, a_1^{d}, \ldots, a_p^{d}$ are transmitted to the filtering means 5.
  • the extraction means 3 may alternatively receive the channels $s_{mix}^{d}(t)$ and $s_{mix}^{g}(t)$ of the mixed signal directly at the input.
  • the processing means 4 processes the representative values $\phi_i(k, m)$ received from the extraction means 3, in order to determine an estimate $\phi'_i(k, m)$ of the normalized power of the source signals to be separated, in the time-frequency plane.
  • the estimates $\phi'_i(k, m)$ of the normalized power of the source signals to be separated are then transmitted to the filtering means 5.
  • the filtering means 5 makes it possible to obtain an estimate $S'_i(k, m)$ of each particular source signal, by means of spatial filtering.
  • the filtering means 5 makes it possible, in the time-frequency plane, to isolate the particular source signal by means of minimum variance spatial filtering under linear constraint. More particularly, the filtering means 5 relies on the distribution of said particular source signal between the two channels of the mixed signal to isolate the particular source signal: it is therefore a spatial filtering ("beamforming").
  • the spatial filter uses the normalized power of the particular source signal to be separated as a linear constraint, in order to obtain an estimate closer to the original source signal.
  • Each mixed signal $S_{mix}^{d}(k, m)$ and $S_{mix}^{g}(k, m)$ is then decomposed into estimates of particular source signals $S'_1(k, m), \ldots, S'_p(k, m)$ using linear spatial filtering:
  • $S'_i(k, m) = w_{ik}^{g} S_{mix}^{g}(k, m) + w_{ik}^{d} S_{mix}^{d}(k, m) = W_{ik}^{T} S_{mix}(k, m)$
  • $W_{ik}$ is the spatial filter (or "beamformer") for obtaining the estimate $S'_i(k, m)$ of the i-th source signal in the subband k from the mixed signal $S_{mix}(k, m)$.
  • the resulting filter makes it possible to reduce the contribution of the power spectrum of the other signals.
  • the power of the estimated source signal corresponds to the power of the initial source signal for the different points of the time-frequency plane (which can be verified by re-injecting the solution W ik into the equation defining P ( ⁇ i )).
  • the filtering means 5 makes it possible to spatially decorrelate the i th source signal from the rest of the mixed signal, while adjusting the amplitude of said decorrelated signal to the desired level.
  • the transforms of the estimates of the separate particular source signals are then transmitted to the inverse transformation means 6.
  • the means 6 makes it possible to transform the transforms of the estimates of the separated source signals into time signals $s'_1(t), \ldots, s'_p(t)$ corresponding, at least partially, to the source signals $s_1(t), \ldots, s_p(t)$.
  • the method includes a first step 7 in which the mixed signal is transformed into a time-frequency plane. Then, in a step 8, the information watermarked in the mixed signal is extracted, in particular the representative values and the distributions of the source signals between at least two channels of the mixed signal. During a step 9, the normalized powers of the source signals to be separated are determined, and then, in a step 10, minimum variance spatial filtering under linear constraint is performed, the constraint being the normalized power of the source signal to be separated. Finally, in a step 11, an inverse transformation of the transforms of the separated particular source signals is performed so as to obtain, at least partially, the particular source signals.
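The spatial filtering step (minimum variance under a power constraint) can be illustrated at a single time-frequency point. The following is a minimal sketch under stated assumptions, not the filter formula of the patent, which is not reproduced in this text: the sources are assumed uncorrelated, the spatial correlation matrix is modeled as the sum of the normalized powers weighted by the outer products of the channel distributions, the filter direction is taken from the classical minimum variance solution, and the scaling is chosen so that the modeled output power equals the known normalized power of the source. The function name `spatial_filters` and all numeric values are hypothetical.

```python
import numpy as np

def spatial_filters(phi, a_left, a_right, eps=1e-12):
    """Power-constrained minimum variance filters at one time-frequency point.

    phi     : (p,) normalized powers of the p sources (assumed known)
    a_left  : (p,) share of each source in the left channel
    a_right : (p,) share of each source in the right channel
    Returns W with shape (p, 2), one filter per row, and the modeled 2x2 matrix R.
    """
    phi = np.asarray(phi, dtype=float)
    A = np.array([a_left, a_right], dtype=float)   # (2, p) channel distributions
    # Assumption: uncorrelated sources, so R = sum_i phi_i * a_i a_i^T.
    R = (A * phi) @ A.T + eps * np.eye(2)
    R_inv_A = np.linalg.solve(R, A)                # columns are R^-1 a_i
    denom = np.sum(A * R_inv_A, axis=0)            # a_i^T R^-1 a_i
    scale = np.sqrt(phi / denom)                   # enforces w_i^T R w_i = phi_i
    return (R_inv_A * scale).T, R

# Hypothetical values for three sources in a stereo mix
phi = np.array([1.0, 0.25, 4.0])
W, R = spatial_filters(phi, a_left=[0.8, 0.6, 0.0], a_right=[0.6, 0.8, 1.0])

# Modeled output power of each filter: w_i^T R w_i
out_power = np.einsum("pc,cd,pd->p", W, R, W)

# Applying the filters to one stereo time-frequency coefficient
S_mix = np.array([0.3 + 0.1j, -0.2 + 0.4j])
S_est = W @ S_mix                                  # one estimate per source
```

In this sketch `out_power` matches `phi` by construction, which mirrors the statement that the power of the estimated source signal corresponds to the power of the initial source signal at each point of the time-frequency plane.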

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Stereophonic System (AREA)

Description

The present invention relates to a method for separating some of the source signals that make up an overall digital audio signal. The invention also relates to a device for implementing this method.

Signal mixing consists of summing several signals, called source signals, to obtain one or more composite signals, called mixed signals. In audio applications in particular, the mixing may consist of a simple step of adding the source signals, or may also include steps of filtering the signals before and/or after the addition. Moreover, for some applications such as the audio compact disc, the source signals can be mixed differently to form two mixed signals corresponding to the two tracks or channels (left and right) of a stereo signal.
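As an illustration of the simplest case described above (pure addition, without filtering), each stereo channel can be written as a weighted sum of the source signals. The sketch below assumes instantaneous, constant gains per source and channel; the function name `stereo_mix` and the sample values are hypothetical.

```python
import numpy as np

def stereo_mix(sources, gains_left, gains_right):
    """Instantaneous stereo mix: each channel is a weighted sum of the sources."""
    sources = np.asarray(sources, dtype=float)           # shape (p, n_samples)
    left = np.asarray(gains_left, dtype=float) @ sources
    right = np.asarray(gains_right, dtype=float) @ sources
    return np.stack([left, right])                       # shape (2, n_samples)

# Three hypothetical sources, four samples each
sources = np.array([[1.0, 0.0, -1.0, 0.0],
                    [0.5, 0.5, 0.5, 0.5],
                    [0.0, 2.0, 0.0, -2.0]])
mix = stereo_mix(sources,
                 gains_left=[1.0, 0.5, 0.0],
                 gains_right=[0.0, 0.5, 1.0])
```

Here the first source appears only on the left, the third only on the right, and the second is shared equally between both channels.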

Source separation consists of estimating source signals from the observation of a certain number of different mixed signals formed from these same source signals. The objective is generally to enhance, or if possible to extract completely, one or more target source signals. Source separation is particularly difficult in so-called "under-determined" cases, in which the number of mixed signals is smaller than the number of source signals present in them. Extraction is then very difficult or even impossible, because the mixed signals carry little information compared with the source signals. Music signals on audio compact disc are a particularly representative example, because only two stereo channels (i.e. two mixed signals, left and right), which are generally highly redundant, are available for a potentially large number of source signals.
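The difficulty of the under-determined case can be seen on a small numerical example. The sketch below assumes an instantaneous mix of three sources into two channels, with a hypothetical mixing matrix `A`: the matrix has rank 2, so the three source values cannot be recovered uniquely from the two observed values, and the minimum-norm solution explains the observation without matching the true sources.

```python
import numpy as np

# Hypothetical mixing matrix: 2 channels (rows), 3 sources (columns)
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])
rank = np.linalg.matrix_rank(A)          # at most 2, fewer than the 3 unknowns

s_true = np.array([1.0, -2.0, 3.0])      # source values at one instant
x = A @ s_true                           # observed stereo sample

# Minimum-norm solution of A s = x via the pseudo-inverse
s_min_norm = np.linalg.pinv(A) @ x

reproduces_mix = np.allclose(A @ s_min_norm, x)
recovers_sources = np.allclose(s_min_norm, s_true)
```

With these values `reproduces_mix` is true while `recovers_sources` is false: infinitely many source combinations give the same stereo sample, which is why additional information about the sources is needed.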

There are several types of approach to the separation of source signals, among them blind separation, computational auditory scene analysis, and model-based separation. Blind separation is the most general form, in which no information about the source signals or about the nature of the mixed signals is known a priori. A number of assumptions are then made about these source signals and the mixed signals (for example, that the source signals are statistically independent), and the parameters of a separation system are estimated by maximizing a criterion based on these assumptions (for example, by maximizing the independence of the signals obtained by the separation device). However, this method is generally used in cases where many mixed signals are available (at least as many as source signals) and is therefore not applicable to under-determined cases, in which the number of mixed signals is less than the number of source signals.

Computational auditory scene analysis generally consists of modeling the source signals as partials, but the mixed signal is not explicitly decomposed. This method relies on the mechanisms of the human auditory system to separate the source signals in the same way as the ear does. Examples include: D.P.W. Ellis, Using knowledge to organize sound: The prediction-driven approach to computational auditory scene analysis, and its application to speech/non-speech mixture (Speech Communication, 27(3), pp. 281-298, 1999); D. Godsmark and G.J. Brown, A blackboard architecture for computational auditory scene analysis (Speech Communication, 27(3), pp. 351-366, 1999); and T. Kinoshita, S. Sakai, and H. Tanaka, Musical sound source identification based on frequency component adaptation (In Proc. IJCAI Workshop on CASA, pp. 18-24, 1999). However, computational auditory scene analysis currently yields insufficient results in terms of the quality of the separated source signals.

Another form of separation relies on a decomposition of the mixture over a basis of suitable functions. There are two main categories: sparse decomposition in time and sparse decomposition in frequency.

In the first case, the waveform of the mixture is decomposed, and in the second its spectral representation, into a sum of elementary functions called "atoms", which are elements of a dictionary. Various algorithms make it possible to choose the type of dictionary and the most likely corresponding decomposition. For the time domain, mention may be made in particular of: L. Benaroya, Représentations parcimonieuses pour la séparation de sources avec un seul capteur (Proc. GRETSI, 2001), or P.J. Wolfe and S.J. Godsill, A Gabor regression scheme for audio signal analysis (Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 103-106, 2003). In the method proposed by Gribonval (R. Gribonval and E. Bacry, Harmonic Decomposition of Audio Signals With Matching Pursuit, IEEE Trans. Signal Proc., 51(1), pp. 101-112, 2003), the decomposition atoms are classified into independent subspaces, which makes it possible to extract groups of harmonic partials. One limitation of this method is that generic dictionaries of atoms, such as Gabor atoms, which are not adapted to the signals, do not give good results. Moreover, for these decompositions to be effective, the dictionary must contain all the translated versions of the waveforms of each type of instrument. The decomposition dictionaries must therefore be extremely large for the projection, and hence the separation, to be effective.

To overcome this translation-invariance problem, which arises in the time-domain case, there are sparse decomposition approaches in the frequency domain. One can cite in particular M.A. Casey and A. Westner (Separation of mixed audio sources by independent subspace analysis, Proc. Int. Computer Music Conf., 2000), who introduced independent subspace analysis (ISA). This analysis consists in decomposing the short-time amplitude spectrum of the mixed signal (computed by short-time Fourier transform (STFT)) over a basis of atoms, then grouping the atoms into independent subspaces, each subspace being specific to one source, so that the sources can be resynthesized separately. However, this approach is generally limited by several factors: the resolution of the STFT spectral analysis, the overlap of the sources in the spectral domain, and the restriction of the spectral separation to the amplitude (the phase of the resynthesized signals being that of the mixed signal). It is thus generally difficult to represent the mixed signal as a sum of independent subspaces, because of the complexity of the sound scene in the spectral domain (strong interleaving of the different components) and because the contribution of each component to the mixed signal changes over time. As a result, such methods are often evaluated on well-controlled, "simplified" mixed signals (the source signals are MIDI instruments, or a small number of instruments that are relatively easy to separate).

Another source separation method is "informed" source separation: information relating to one or more source signals is transmitted to the decoder together with the mixed signal. Using algorithms and said information, the decoder is then able to separate, at least partially, at least one source signal from the mixed signal. An example of informed source separation is described by M. Parvaix and L. Girin (Informed source separation of linear instantaneous under-determined audio mixtures by source index embedding, IEEE Trans. Audio Speech Lang. Process., vol. 19, pp. 1721-1733, August 2011). The information transmitted to the decoder indicates in particular the two predominant source signals in the mixed signal, for different frequency zones. However, such a method is not always suitable when more than two source signals contribute simultaneously to the same frequency zone of the mixed signal: in that case, at least one source signal is neglected, which creates a "spectral hole" in the reconstruction of said source signal.

It is also known, particularly in the field of telecommunications, to filter signals picked up by a plurality of sensors according to the position in space of said signals relative to said sensors. This is spatial filtering (or "beamforming"), which favors the signal coming from a given direction in space and filters out the signals coming from other directions. One example of such filters is the linearly constrained minimum variance (LCMV) spatial filter. Such a filter is disclosed in particular in document EP 1 633 121.

An object of the present invention is therefore to provide a method for separating source signals contained in one or more mixed signals more effectively.

To this end, in one embodiment, there is provided a method for separating, at least partially, one or more particular digital audio source signals contained in a multichannel digital audio mixed signal (that is, one comprising at least two channels), for example a stereo signal. The mixed signal is obtained by mixing several digital audio source signals and comprises values representative of the particular source signal or signals. According to the method:

  • the modulus of the amplitude or the normalized power of the particular source signal or signals is determined from the values representative of said particular source signal or signals contained in the mixed signal, and then
  • linearly constrained minimum variance spatial filtering is applied to the mixed signal to obtain, at least partially, each particular source signal, said filtering being based on the distribution of said particular source signal between at least two channels of the mixed signal, and the modulus of the amplitude or the normalized power of said particular source signal being used as the linear constraint of the filter.

The representative values may be the temporal, spectral or spectro-temporal distribution of the particular source signal, or the temporal, spectral or spectro-temporal contribution of the particular source signal to the mixed signal. The representative values of the source signals can thus be expressed as the modulus of the amplitude or as normalized power (that is, as energy, which is the square of the modulus of the amplitude): the representative values can therefore be amplitude-modulus values or normalized power (or energy) values.

The representative values may for example be the temporal, spectral or spectro-temporal distribution of the particular source signal, or the temporal, spectral or spectro-temporal contribution of the particular source signal to the mixed signal, for several zones (or points) of a time-frequency plane. In that case, the modulus of the amplitude or the normalized power of the particular source signal or signals can be determined in the time-frequency plane: the amplitude moduli and the normalized powers are then spectro-temporal values.

A transformation or representation in the time-frequency plane consists in representing the source signal, in energy (or normalized power) or in modulus of the amplitude (that is, the square root of the energy), as a function of two parameters, time and frequency. It corresponds to the evolution over time, in energy or in modulus, of the frequency content of the source signal. For a given instant and a given frequency, one thus obtains a positive real value corresponding to the components of the signal at that frequency and at that instant. Examples of theoretical formulations and practical implementations of time-frequency representations have already been described (L. Cohen: Time-Frequency Distributions, a Review, Proceedings of the IEEE, vol. 77, No. 7, 1989; F. Hlawatsch, F. Auger: Temps-fréquence, concepts et outils, Hermès Science, Lavoisier 2005; P. Flandrin: Temps Fréquence, Hermès Science, 1998).

Thus, with the method described, the particular source signals can be separated effectively, using spatial filtering improved by the information contained in the mixed signal, without making any assumption about these signals (apart from the usual statistical assumptions, i.e. independence of the source signals, zero mean of these source signals, Gaussian distribution). In particular, the method relies on the distribution of each source signal between the different channels of the mixed signal to isolate said source signals (spatial filtering). Using a linearly constrained minimum variance filter gives an efficient spatial separation, with the modulus of the amplitude or the normalized power of the source signal acting as the constraint. A particular source signal can thus be spatially decorrelated from the mixed signal while, at the same time, the amplitude of the separated signal is adjusted to the desired level. The spatial filtering step is therefore improved by taking into account the known representative value of the particular source signal.

In particular, the different particular source signals present in the mixed signal can be isolated simultaneously, for example by using as many spatial filters as there are source signals to be separated.

Preferably, the filtering is also based on the modulus of the amplitude or the normalized power of the particular source signals. More precisely, the spatial filtering step may comprise modeling a spatial correlation matrix using the modulus of the amplitude or the normalized power of the particular source signals and the distribution of said particular source signal between at least two channels of the mixed signal.
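As a rough illustration of the two ingredients named above, the sketch below models the spatial correlation matrix at one time-frequency point as R = Σ_i φ_i a_i a_iᵀ, where a_i is the two-channel distribution of source i and φ_i its normalized power, and then computes the textbook minimum-variance weights with unit gain toward source i, w = R⁻¹a_i / (a_iᵀR⁻¹a_i). The closed form, the function names, the example values, and the final rescaling of the output to the known power φ_i are assumptions made for illustration; this passage does not state the exact filter expression.

```python
import numpy as np

def spatial_correlation(A, phi):
    """Model R = sum_i phi_i * a_i a_i^T for one time-frequency point.

    A   : (2, p) array, column i is the distribution of source i over the channels
    phi : (p,) array of normalized powers of the sources at that point
    """
    return (A * phi) @ A.T

def lcmv_weights(A, phi, i):
    """Textbook minimum-variance weights with unit gain toward source i."""
    R = spatial_correlation(A, phi)
    r_inv_a = np.linalg.solve(R, A[:, i])
    return r_inv_a / (A[:, i] @ r_inv_a)

def rescale_to_power(w, A, phi, i):
    """Scale w so that the modeled output power equals phi_i (illustrative assumption)."""
    R = spatial_correlation(A, phi)
    out_power = w @ R @ w
    return w * np.sqrt(phi[i] / out_power)

# Hypothetical example: three sources in a stereo mix.
theta = np.array([0.3, 0.8, 1.2])               # balance angles (made up)
A = np.vstack([np.sin(theta), np.cos(theta)])   # columns have unit norm
phi = np.array([1.0, 0.5, 2.0])                 # normalized powers (made up)
w = lcmv_weights(A, phi, 0)
w_scaled = rescale_to_power(w, A, phi, 0)
```

With more sources than channels, a two-coefficient filter cannot cancel every interfering source, which is consistent with the "at least partially" wording used throughout.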

Preferably, the mixed signal comprises values representative of the particular source signal or signals for at least two channels of the mixed signal and, before the spatial filtering is performed, the distribution of each particular source signal between said at least two channels of the mixed signal is determined from the mixed signal and from said representative values of the particular source signals.

Alternatively, the distribution of the particular source signal or signals between at least two channels of the mixed signal can be received as input, for example together with the mixed signal.

In other words, the distribution of the particular source signals between the different channels of the mixed signal can either be supplied when the separation method is carried out, for example at the same time as the representative values of said particular source signals, or be determined during the separation method from the multichannel mixed signal and the representative values of the particular source signals.

According to one implementation, determining the modulus of the amplitude or the normalized power of the particular source signal or signals comprises extracting the representative values of the particular source signal or signals that have been inserted into the mixed signal, for example by watermarking. This extraction follows from the way the representative values of the particular source signals are transmitted: they can travel with the mixed signal, for example when the information is watermarked or inaudibly inserted into the mixed signal, or through a particular channel of the mixed signal dedicated to transmitting said representative values.

According to another aspect, there is provided a device for separating, at least partially, one or more particular digital audio source signals contained in a multichannel digital audio mixed signal. The mixed signal is obtained by mixing several digital audio source signals and comprises values representative of the particular source signal or signals. The device comprises:

  • means for determining the modulus of the amplitude or the normalized power of the particular source signal or signals from the values representative of said particular source signal or signals contained in the mixed signal, and
  • a linearly constrained minimum variance spatial filter able to isolate, at least partially, each particular source signal from the mixed signal, said filter being based on the distribution of said particular source signal between at least two channels of the mixed signal, and the modulus of the amplitude or the normalized power of said particular source signal being used as the linear constraint.

Preferably, the mixed signal is a stereo signal.

Preferably, the mixed signal comprises values representative of the particular source signal or signals for at least two channels of the mixed signal, and the device comprises means for determining, from the mixed signal and said representative values of the particular source signals, the distribution of each particular source signal between said at least two channels of the mixed signal.

Preferably, the means for determining the modulus of the amplitude or the normalized power comprises means for extracting the representative values of the particular source signal or signals that have been inserted into the mixed signal, for example by watermarking.

The invention will be better understood from the study of a particular embodiment, given as a non-limiting example and illustrated by the appended drawings, in which:

  • Figure 1 schematically shows an embodiment of a separation device according to the invention; and
  • Figure 2 is a flowchart of a separation method according to the invention.

In the remainder of the detailed description, the mixed signal $s_{\mathrm{mix}}(t)$ is taken to be a stereo signal with a left channel $s_{\mathrm{mix}}^{g}(t)$ and a right channel $s_{\mathrm{mix}}^{d}(t)$, comprising $p$ source signals $s_1(t), \ldots, s_p(t)$. The mixed signal $s_{\mathrm{mix}}(t)$ can be written as the product of the $p$ source signals by a mixing matrix $A$:

$$A = \begin{bmatrix} a_1^{g} & \cdots & a_p^{g} \\ a_1^{d} & \cdots & a_p^{d} \end{bmatrix} = [a_1, \ldots, a_p]$$

where $a_i = [a_i^{g}, a_i^{d}]^{T}$ ($T$ denoting the transpose) and $a_i^{g}$ and $a_i^{d}$ represent the distribution of source signal $i$ in each channel of the mixed signal: $(a_i^{g})^2 + (a_i^{d})^2 = 1$.

More precisely, the coefficients $a_i^{g}$ and $a_i^{d}$ can be written in the following form: $a_i^{g} = \sin(\theta_i)$ and $a_i^{d} = \cos(\theta_i)$, where $\theta_i$ represents the balance of source signal $i$ between the two channels of the mixed signal.

In other words:

$$s_{\mathrm{mix}}(t) = A \cdot s(t)$$

with $s_{\mathrm{mix}}(t) = [s_{\mathrm{mix}}^{g}(t), s_{\mathrm{mix}}^{d}(t)]^{T}$ and $s(t) = [s_1(t), \ldots, s_p(t)]^{T}$ ($T$ denoting the transpose).
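The mixing model above can be sketched numerically as follows. The function names, the balance angles and the random stand-in sources are hypothetical and serve only to show the shapes involved: a 2 x p matrix built from sin(θ_i) and cos(θ_i), applied to p source signals to give the two stereo channels.

```python
import numpy as np

def mixing_matrix(theta):
    """Return the 2 x p matrix A: first row sin(theta_i) (left), second row cos(theta_i) (right)."""
    theta = np.asarray(theta, dtype=float)
    return np.vstack([np.sin(theta), np.cos(theta)])

def mix(A, sources):
    """Instantaneous linear mix s_mix(t) = A . s(t), with sources of shape (p, T)."""
    return A @ sources

rng = np.random.default_rng(0)
theta = np.array([np.pi / 6, np.pi / 4, np.pi / 3])   # hypothetical balance angles
sources = rng.standard_normal((3, 1000))              # stand-in source signals
A = mixing_matrix(theta)
s_mix = mix(A, sources)                                # row 0: left channel, row 1: right channel
```

Because each column is (sin θ_i, cos θ_i), the normalization (a_i^g)² + (a_i^d)² = 1 holds by construction.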

Furthermore, in the remainder of the description, the signals are taken to be audio signals.

In the present description, the short-time Fourier transform is used as the transformation into the time-frequency plane. The transform of source signal $i$ in the time-frequency plane is thus written as:

$$S_i(k,m) = \sum_{n} s_i(k+n)\, f(n)\, e^{-2\mathrm{i}\pi mn/N}$$

where $N$ is a constant, $f(n)$ is a window function of the short-time Fourier transform, and $\mathrm{i}$ in the exponent is the imaginary unit.
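A minimal sketch of this transform is given below, assuming that the sum runs over n = 0, ..., N-1, that N is also the window length, and that frames start every `hop` samples. The Hann window, the hop size and the function name are illustrative choices, not values given in the description.

```python
import numpy as np

def stft(s, N=1024, hop=512):
    """Compute S[j, m] = sum_n s[k_j + n] f(n) exp(-2j*pi*m*n/N) for frame starts k_j = j*hop."""
    f = np.hanning(N)                      # window f(n); Hann is an assumption
    starts = range(0, len(s) - N + 1, hop)
    frames = np.stack([s[k:k + N] * f for k in starts])
    # numpy's FFT uses the same exp(-2j*pi*m*n/N) kernel as the formula above
    return np.fft.fft(frames, axis=1)

s = np.random.default_rng(1).standard_normal(4096)   # stand-in source signal
S = stft(s, N=256, hop=128)                           # rows: time frames, columns: frequency bins m
```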

In the remainder of the description, the linear constraint of the spatial filter is taken to be the normalized power. For a given source signal $s_i$ and a given point $(k,m)$ of the time-frequency plane, the energy, or normalized power, $\phi_i(k,m)$ is therefore:

$$\phi_i(k,m) = |S_i(k,m)|^2$$

The representative value of the source signal can thus be $|S_i(k,m)|$ (modulus value) or $\phi_i(k,m)$ (energy value, equal to the normalized power value). The representative value of the source signal can also be the logarithm of the energy value:

$$\Phi_i(k,m) = 10 \log_{10} \phi_i(k,m)$$
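The representative values defined so far (modulus, normalized power, and power in decibels) can be computed from time-frequency coefficients as in the short sketch below. The function names are hypothetical, and the small floor that guards against log(0) is an added assumption.

```python
import numpy as np

def normalized_power(S):
    """phi(k, m) = |S(k, m)|^2."""
    return np.abs(S) ** 2

def power_db(phi, floor=1e-12):
    """Phi(k, m) = 10 log10 phi(k, m); the floor avoiding log(0) is an assumption."""
    return 10.0 * np.log10(np.maximum(phi, floor))

S = np.array([[3 + 4j, 0.1j], [1.0, 10.0]])   # made-up time-frequency coefficients
phi = normalized_power(S)
Phi = power_db(phi)
```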

The representative value of the source signal can also be determined after processing the source signal, for example by reducing the frequency resolution of the energy spectrum, or by adapting the quantization of the representative values to the sensitivity of the human ear. This yields representative values of the source signals that take up less space while preserving the desired sound quality.
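One simple way to picture such processing is sketched below: the power is averaged over groups of adjacent frequency bins, then quantized with a uniform step on a decibel scale. Both choices, along with the band width, the step size and the function names, are assumptions for illustration; the description does not specify the scheme, and a perceptually adapted quantizer would generally not be uniform.

```python
import numpy as np

def reduce_resolution(phi, band=4):
    """Average the power over groups of `band` adjacent frequency bins (axis 1)."""
    K, M = phi.shape
    M_trim = (M // band) * band            # drop leftover bins that do not fill a band
    return phi[:, :M_trim].reshape(K, M // band, band).mean(axis=2)

def quantize_db(phi, step_db=1.5, floor=1e-12):
    """Uniformly quantize the power on a dB scale and return integer codes."""
    Phi = 10.0 * np.log10(np.maximum(phi, floor))
    return np.round(Phi / step_db).astype(int)

def dequantize_db(codes, step_db=1.5):
    """Map integer codes back to linear power."""
    return 10.0 ** (codes * step_db / 10.0)

phi = np.full((2, 8), 100.0)               # made-up power values (20 dB everywhere)
reduced = reduce_resolution(phi, band=4)   # shape (2, 2)
codes = quantize_db(reduced)
restored = dequantize_db(codes)
```

With a uniform step, the reconstruction error stays within half a step in decibels.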

In the remainder of the description, the representative value of the source signals is taken to be the quantized normalized power (or energy) value $\Phi_i(k,m)$.

The representative values $\Phi_i(k,m)$ of the source signals are transmitted to the separation device, or decoder. They can be transmitted through a dedicated channel (associated with the stereo channels to form the mixed signal), or by embedding them in the mixed signal, for example by watermarking or by using unused bits of the mixed signal. In the latter case, the separation device can comprise means for extracting the representative values, which receives the mixed signal as input and outputs the representative values of the source signals.

Similarly, the separation device can also receive the distributions of the source signals in each channel of the mixed signal: $a_1^{g}, \ldots, a_p^{g}, a_1^{d}, \ldots, a_p^{d}$. These distributions can be transmitted through a dedicated channel (associated with the stereo channels to form the mixed signal, or independent of the stereo channels), or by embedding them in the mixed signal, for example by watermarking or by using unused bits of the mixed signal. In the latter case, the separation device can comprise means for extracting the distributions of the source signals, which receives the mixed signal as input and outputs the distributions of the source signals. The means for extracting the representative values and the means for extracting the distributions can be one and the same means.

Alternatively, the separation device may comprise means for determining the distributions of the source signals: such determination means may receive as input the mixed signal and the representative values Φ_i(k,m), and output the distribution a_i^g, a_i^d of said source signal. This is possible in particular when each channel of the mixed signal includes the representative values of a source signal for that channel. In other words, the representative values of a given source signal will not be the same for each channel of the mixed signal, and the difference between the representative values of the same source signal for the different channels makes it possible to determine the distribution of said source signal between the different channels of the mixed signal.
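
One way such a determination could work is sketched below in Python. It is not taken from the description: it assumes that the per-channel representative values are powers of the form Φ_i^g = (a_i^g)² ϕ_i and Φ_i^d = (a_i^d)² ϕ_i, that the gains are non-negative, and that they are normalized so that (a_i^g)² + (a_i^d)² = 1. The function name `estimate_distribution` is hypothetical, and unlike the determination means described above, this sketch uses only the representative values and not the mixed signal.

```python
import numpy as np

def estimate_distribution(phi_g, phi_d, eps=1e-12):
    """Estimate (a_g, a_d) for one source from its per-channel power values.

    phi_g, phi_d: arrays of per-channel powers over the time-frequency plane.
    Assumes phi_g = a_g**2 * phi and phi_d = a_d**2 * phi with a_g**2 + a_d**2 = 1.
    """
    total_g = float(np.sum(phi_g))
    total_d = float(np.sum(phi_d))
    total = total_g + total_d + eps  # eps avoids division by zero for a silent source
    return np.sqrt(total_g / total), np.sqrt(total_d / total)

# Hypothetical source panned with gains (0.6, 0.8).
rng = np.random.default_rng(0)
phi = rng.random((4, 5)) + 0.1
a_g, a_d = estimate_distribution(0.36 * phi, 0.64 * phi)
```

Summing over the whole time-frequency plane before taking the ratio makes the estimate less sensitive to individual noisy points than a point-by-point ratio would be.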

Figure 1 schematically shows an embodiment of a separation device 1 for separating particular source signals contained in a mixed signal s_mix. The separation device 1 receives as input the stereo channels s_mix^g and s_mix^d of the mixed signal s_mix, and delivers at least partially separated particular source signals s'_i, with i varying from 1 to p. The purpose of the separation device 1 is to deliver, at least partially, several particular source signals contained in the mixed signal s_mix by using the representative values Φ_i(k,m) of said particular source signals.

For the purposes of this description, the separation device 1 is considered to receive as input the channels s_mix^g(t) and s_mix^d(t) of the mixed digital audio signal, into which are inserted, for example by watermarking, the representative values Φ_i(k,m) of the particular source signals and, optionally, the distributions a_1^g, ..., a_p^g, a_1^d, ..., a_p^d of the particular source signals between the two channels s_mix^d(t) and s_mix^g(t) of the mixed digital audio signal.

The separation device 1 comprises a transformation means 2, an extraction means 3, a processing means 4, a filtering means 5, and an inverse transformation means 6.

The transformation means 2 receives as input the channels s_mix^g(t) and s_mix^d(t) of the mixed digital audio signal and delivers, as output, the transforms S_mix^g(k,m) and S_mix^d(k,m) of the channels of the mixed signal in the time-frequency plane.
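
The passage above does not name the transform used. As a minimal sketch, assuming a short-time Fourier transform with a Hann window (an assumption, as are the function name `stft` and the parameters `n_fft` and `hop`), the time-frequency representation S(k,m) of one channel could be computed as follows:

```python
import numpy as np

def stft(x, n_fft=1024, hop=512):
    """Return S[k, m]: frequency bin k, time frame m, for a 1-D signal x.

    Requires len(x) >= n_fft; trailing samples that do not fill a frame are dropped.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slice the signal into overlapping windowed frames, one frame per column.
    frames = np.stack(
        [x[m * hop : m * hop + n_fft] * window for m in range(n_frames)],
        axis=1,
    )
    # One-sided FFT along the frame axis gives n_fft // 2 + 1 frequency bins.
    return np.fft.rfft(frames, axis=0)

# Hypothetical test tone centred on bin 8 of a 64-point transform.
n = np.arange(256)
x = np.sin(2 * np.pi * 8 * n / 64)
S = stft(x, n_fft=64, hop=32)
```

In the stereo case the same function would be applied to each of s_mix^g(t) and s_mix^d(t) to obtain S_mix^g(k,m) and S_mix^d(k,m).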

The extraction means 3 receives as input the transforms S_mix^d(k,m) and S_mix^g(k,m) of the channels of the mixed signal in the time-frequency plane, and delivers the representative values Φ_i(k,m) of the particular source signals contained in the mixed signal. Where applicable, the extraction means 3 may also deliver the distributions a_1^g, ..., a_p^g, a_1^d, ..., a_p^d of the particular source signals between the two channels s_mix^d(t) and s_mix^g(t) of the mixed digital audio signal, when these have been inserted in the mixed signal. The extraction means 3 thus makes it possible to extract from the mixed signal the representative values that were added to it after the fact, for example by watermarking, and to isolate them from the mixed signal. The representative values Φ_i(k,m) are then passed to the processing means 4 and, where applicable, the distributions a_1^g, ..., a_p^g, a_1^d, ..., a_p^d are passed to the filtering means 5.

It should be noted that the extraction means 3 may alternatively receive the channels s_mix^d(t) and s_mix^g(t) of the mixed signal directly as input.

The processing means 4 processes the representative values Φ_i(k,m) received from the extraction means 3 in order to determine an estimate ϕ'_i(k,m) of the normalized power of the source signals to be separated, in the time-frequency plane. The estimates ϕ'_i(k,m) of the normalized power of the source signals to be separated are then passed to the filtering means 5.

The filtering means 5 is thus supplied with the transforms S_mix^d(k,m) and S_mix^g(k,m) of the channels of the mixed signal in the time-frequency plane, provided by the transformation means 2, with the estimates ϕ'_i(k,m) of the normalized powers of the particular source signals, and with the distributions a_1^g, ..., a_p^g, a_1^d, ..., a_p^d of the particular source signals between the two channels s_mix^d(t) and s_mix^g(t) of the mixed digital audio signal.

The filtering means 5 provides an estimate S'_i(k,m) of each particular source signal by spatial filtering. In the time-frequency plane, it isolates the particular source signal by linearly constrained minimum variance spatial filtering. More specifically, the filtering means 5 relies on the distribution of said particular source signal between the two channels of the mixed signal to isolate it: this is therefore spatial filtering (beamforming). In addition, to improve the filtering and the resulting estimate of the source signal, the spatial filter uses the normalized power of the particular source signal to be separated as a linear constraint, so as to obtain an estimate closer to the original source signal.

More precisely, in the time-frequency plane, we have:

$$S_{\mathrm{mix}}(k,m) = A \cdot S(k,m)$$

with:

$$S_{\mathrm{mix}}(k,m) = \left[ S_{\mathrm{mix}}^{g}(k,m),\; S_{\mathrm{mix}}^{d}(k,m) \right]^{T}$$

and

$$S(k,m) = \left[ S_{1}(k,m),\; \ldots,\; S_{p}(k,m) \right]^{T}$$

Each channel S_mix^d(k,m) and S_mix^g(k,m) of the mixed signal is then decomposed into estimates S'_1(k,m), ..., S'_p(k,m) of the particular source signals using linear spatial filtering:

$$S'_{i}(k,m) = w_{ik}^{g} \cdot S_{\mathrm{mix}}^{g}(k,m) + w_{ik}^{d} \cdot S_{\mathrm{mix}}^{d}(k,m) = W_{ik}^{T} \cdot S_{\mathrm{mix}}(k,m)$$

with:

$$W_{ik} = \left[ w_{ik}^{g},\; w_{ik}^{d} \right]^{T} \quad \text{and} \quad S'_{i}(k,m) = \left[ S_{i}^{\prime g}(k,m),\; S_{i}^{\prime d}(k,m) \right]^{T}$$

W_ik is the spatial filter (or beamformer) that yields the estimate S'_i(k,m) of the i-th source signal in sub-band k from the mixed signal S_mix(k,m).
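
Applying such a filter amounts to a weighted sum of the two channels at every time-frequency point. The short Python sketch below illustrates this; the array layout (channel, k, m) and the function name `apply_spatial_filter` are assumptions made for the example.

```python
import numpy as np

def apply_spatial_filter(W, S_mix):
    """Compute S'_i(k, m) = W^T . S_mix(k, m) at every time-frequency point.

    W:     array of shape (2, K, M), weights [w^g, w^d] per point.
    S_mix: array of shape (2, K, M), stereo spectrogram [S_mix^g, S_mix^d].
    Returns an array of shape (K, M).
    """
    # Multiply channel-wise and sum over the channel axis.
    return np.einsum("ckm,ckm->km", W, S_mix)

# Hypothetical stereo spectrogram with K = 3 bins and M = 4 frames.
rng = np.random.default_rng(0)
S_mix = rng.standard_normal((2, 3, 4)) + 1j * rng.standard_normal((2, 3, 4))

W_left = np.zeros((2, 3, 4))
W_left[0] = 1.0                      # weights [1, 0]: keeps the g channel only
W_mid = np.full((2, 3, 4), 0.5)      # weights [0.5, 0.5]: average of both channels

left = apply_spatial_filter(W_left, S_mix)
mid = apply_spatial_filter(W_mid, S_mix)
```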

In the case of a linearly constrained minimum variance spatial filter, the sum of all the interfering source signals, that is, all except the one to be filtered, is treated as noise. The mixed signal can thus be rewritten as follows:

$$S_{\mathrm{mix}}(k,m) = a_{i} \cdot S_{i}(k,m) + r(k,m)$$

where r(k,m) is the sum of the other source signals.
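
The decomposition into a target term and an interference term can be checked numerically. The following Python sketch builds a toy stereo mix of p = 3 sources and splits it for one target source; the mixing gains in `A`, the array sizes, and the helper name `split_target_and_interference` are all hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, K, M = 3, 4, 5

# Columns of A are the distributions a_i = [a_i^g, a_i^d]^T (hypothetical gains).
A = np.array([[1.0, 0.6, 0.0],
              [0.0, 0.8, 1.0]])

# Random complex source spectrograms S_i(k, m), shape (p, K, M).
S = rng.standard_normal((p, K, M)) + 1j * rng.standard_normal((p, K, M))

# Mixing model S_mix(k, m) = A . S(k, m), shape (2, K, M).
S_mix = np.einsum("ci,ikm->ckm", A, S)

def split_target_and_interference(A, S, i):
    """Return (a_i . S_i, r) where r sums the contributions of all other sources."""
    target = A[:, i, None, None] * S[i]
    others = [j for j in range(A.shape[1]) if j != i]
    r = np.einsum("cj,jkm->ckm", A[:, others], S[others])
    return target, r

target, r = split_target_and_interference(A, S, i=1)
```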

The estimate S'_i(k,m) is obtained by minimizing the mean noise power or, equivalently, the mean output power of the spatial filter, in the direction of the source signal to be separated:

$$P(\theta_{i}) = W_{ik}^{T}(m) \cdot R'_{S_{\mathrm{mix}}}(k,m) \cdot W_{ik}(m)$$

where R'_Smix is the spatial correlation matrix of the two channels S_mix^d(k,m) and S_mix^g(k,m) of the mixed signal S_mix(k,m).

The solution is given by:

$$W_{ik}(m) = R'^{-1}_{S_{\mathrm{mix}}}(k,m) \cdot a_{i} \cdot \frac{\phi'_{i}(k,m)}{a_{i}^{T} \cdot R'^{-1}_{S_{\mathrm{mix}}}(k,m) \cdot a_{i}}$$

We thus obtain:

$$S'_{i}(k,m) = \frac{\phi'_{i}(k,m)}{a_{i}^{T} \cdot R'^{-1}_{S_{\mathrm{mix}}}(k,m) \cdot a_{i}} \cdot a_{i}^{T} \cdot R'^{-1}_{S_{\mathrm{mix}}}(k,m) \cdot S_{\mathrm{mix}}(k,m)$$

with:

$$R'^{-1}_{S_{\mathrm{mix}}}(k,m) = \left( \sum_{j=1}^{p} \phi'_{j}(k,m) \cdot a_{j} \cdot a_{j}^{T} \right)^{-1}$$
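
For readers who want to experiment, here is a minimal Python sketch of a generic linearly constrained minimum variance filter at a single time-frequency point. It minimizes W^T R W subject to the linear constraint W^T a_i = g, whose closed-form solution is W = g R^{-1} a_i / (a_i^T R^{-1} a_i). Several things are assumptions rather than statements from the description: the correlation matrix is modelled as the sum of ϕ'_j a_j a_j^T over all sources, the scale g is left as a free parameter (reading the solution above literally gives g = ϕ'_i(k,m)), a small diagonal loading `eps` is added for numerical safety, and the function name `lcmv_weights` is hypothetical.

```python
import numpy as np

def lcmv_weights(A, phi, i, g, eps=1e-12):
    """LCMV weights for source i at one time-frequency point.

    A:   (2, p) matrix whose columns are the distributions a_j.
    phi: (p,) normalized powers of the sources at this point.
    g:   value imposed on W^T a_i by the linear constraint.
    """
    # Modelled correlation matrix: sum_j phi_j * a_j a_j^T.
    R = (A * phi) @ A.T
    # Diagonal loading keeps R invertible when few sources are active (assumption).
    R = R + eps * np.eye(A.shape[0])
    a = A[:, i]
    Rinv_a = np.linalg.solve(R, a)      # R^{-1} a_i without forming the inverse
    return g * Rinv_a / (a @ Rinv_a)

# Hypothetical gains and powers for p = 3 sources.
A = np.array([[1.0, 0.6, 0.0],
              [0.0, 0.8, 1.0]])
phi = np.array([1.0, 0.5, 2.0])
i, g = 1, 0.5                            # g chosen equal to phi[i] here
W = lcmv_weights(A, phi, i, g)

R = (A * phi) @ A.T
power = W @ R @ W                        # output power P = W^T R W
```

Using `np.linalg.solve` instead of an explicit matrix inverse is a deliberate choice: for a 2 x 2 system the cost is the same, but it behaves better when R is close to singular.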

Once applied to the mixed signal S_mix(k,m), the resulting filter reduces the contribution of the power spectrum of the other signals. Moreover, thanks to the linear constraint, the power of the estimated source signal matches the power of the original source signal at the various points of the time-frequency plane (which can be verified by substituting the solution W_ik into the equation defining P(θ_i)). The filtering means 5 thus spatially decorrelates the i-th source signal from the rest of the mixed signal, while adjusting the amplitude of said decorrelated signal to the desired level.

It may also be noted that, when the amount of information watermarked into the mixed signal is too large for the watermarking noise to be neglected, the components of the estimated source signals can also be adjusted as follows:

$$S'_{i}(k,m) = S'_{i}(k,m) \cdot \frac{\sqrt{\phi'_{i}(k,m)}}{\left| S'_{i}(k,m) \right|}$$
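
This adjustment can be sketched in a few lines of Python. The sketch assumes that the denominator is the modulus of the estimate, so that the phase is kept and only the magnitude is replaced by the square root of the normalized power; the guard `eps` against division by zero and the function name `rescale_magnitude` are additions made for the example.

```python
import numpy as np

def rescale_magnitude(S_est, phi, eps=1e-12):
    """Keep the phase of S_est and set its magnitude to sqrt(phi), point by point."""
    # np.maximum avoids dividing by zero where the estimate is exactly silent.
    return S_est * np.sqrt(phi) / np.maximum(np.abs(S_est), eps)

# Hypothetical estimates and normalized powers at two time-frequency points.
S_est = np.array([3.0 + 4.0j, -2.0j])
phi = np.array([4.0, 9.0])
adjusted = rescale_magnitude(S_est, phi)
```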

The transforms of the estimates of the separated particular source signals are then passed to the inverse transformation means 6. The means 6 converts the transforms of the estimates of the separated source signals into time-domain signals s'_1(t), ..., s'_p(t) corresponding, at least partially, to the source signals s_1(t), ..., s_p(t).

Figure 2 is a flow chart showing the different steps of the separation method according to the invention.

The method comprises a first step 7 in which the mixed signal is transformed into a time-frequency plane. Then, in a step 8, the information watermarked into the mixed signal is extracted, in particular the representative values and the distributions of the source signals between at least two channels of the mixed signal. In a step 9, the normalized powers of the source signals to be separated are determined; then, in step 10, linearly constrained minimum variance spatial filtering is performed, the constraint being the normalized power of the source signal to be separated. Finally, in a step 11, the transforms of the separated particular source signals are inverse-transformed so as to obtain, at least partially, the particular source signals.

In the case of audio signals, it is thus possible to apply, at the output of the separation system of the invention, a number of major listening controls (volume, tone, effects) independently to the different elements of the sound scene (instruments and voices obtained by the separation device).

Claims (8)

  1. A method of separating, at least in part, one or more particular digital audio source signals (si) contained in a mixed multichannel digital audio signal (smix), the mixed signal being obtained by mixing a plurality of digital audio source signals (s1, ..., sp) and including representative values (Φi) of the particular source signal(s), the method comprising:
    · determining (9) the modulus of the amplitude or the normalized power (ϕ'i) of the particular source signal(s) from the representative values (Φi) in the time-frequency plane of said particular source signal(s) contained in the mixed signal;
    the method being characterized by then performing (10) linearly constrained minimum variance spatial filtering in order to obtain, at least in part, each particular source signal (s'i), said filtering being based on the distribution (ai) of said particular source signal between at least two channels of the mixed signal, and the modulus of the amplitude or the normalized power (ϕ'i) of said particular source signal being used as a linear constraint of the filter.
  2. A method according to claim 1, wherein the mixed signal includes representative values (Φi g, Φi d) of the particular source signal(s) for at least two channels of the mixed signal (smix g, smix d), and wherein, prior to performing spatial filtering, the mixed signal and said representative values of the particular signals are used to determine the distribution (ai g, ai d) of each particular source signal (si) between said at least two channels of the mixed signal (smix g, smix d).
  3. A method according to claim 1, wherein the distribution (ai) of the particular source signal(s) between at least two channels of said mixed signal is received as input, e.g. in the mixed signal.
  4. A method according to any preceding claim, wherein determining the modulus of the amplitude or the normalized power (ϕ'i) of the particular source signal(s) comprises extracting (8) representative values (Φi) of the particular source signals that have been inserted into the mixed signal, e.g. by watermarking.
  5. A method according to any preceding claim, wherein the modulus of the amplitude or the normalized power (ϕ'i) of said particular source signal are spectro-temporal values.
  6. A device (1) for separating, at least in part, one or more particular digital audio source signals (si) contained in a multichannel mixed digital audio signal (smix), the mixed signal (smix) being obtained by mixing a plurality of digital audio source signals (s1, ..., sp) and including representative values (Φi) of the particular source signal(s), the device comprising:
    · determination means (4) for determining the modulus of the amplitude or the normalized power (ϕ'i) of the particular source signal(s) from the representative values (Φi) in the time-frequency plane of said particular source signal(s) contained in the mixed signal;
    the device being characterized by a linearly constrained minimum variance spatial filter (5) adapted to isolate, at least in part, each particular source signal (s'i) from the mixed signal (smix), said filter being based on the distribution (ai) of said particular source signal between at least two channels of the mixed signal (smix g, smix d), and the modulus of the amplitude or the normalized power (ϕ'i) of said particular source signal being used as a linear constraint.
  7. A device according to claim 6, wherein the mixed signal includes representative values (Φi) of the particular source signal(s) for at least two channels of the mixed signal, the device including determination means for determining the distribution (ai) of each particular source signal between said at least two channels of the mixed signal from the mixed signal and from said representative values (Φi) of the particular source signals.
  8. A device according to claim 6 or claim 7, also including extractor means (3) for extracting the representative values (Φi) of the particular source signal(s) that have been inserted in the mixed signal, e.g. by watermarking.
EP13770877.2A 2012-09-27 2013-09-25 Method and device for separating signals by minimum variance spatial filtering under linear constraint Not-in-force EP2901447B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR1259115A FR2996043B1 (en) 2012-09-27 2012-09-27 METHOD AND DEVICE FOR SEPARATING SIGNALS BY SPATIAL FILTRATION WITH MINIMUM VARIANCE UNDER LINEAR CONSTRAINTS
PCT/EP2013/069937 WO2014048970A1 (en) 2012-09-27 2013-09-25 Method and device for separating signals by minimum variance spatial filtering under linear constraint

Publications (2)

Publication Number Publication Date
EP2901447A1 EP2901447A1 (en) 2015-08-05
EP2901447B1 true EP2901447B1 (en) 2016-12-21

Family

ID=47505065

Family Applications (1)

Application Number Title Priority Date Filing Date
EP13770877.2A Not-in-force EP2901447B1 (en) 2012-09-27 2013-09-25 Method and device for separating signals by minimum variance spatial filtering under linear constraint

Country Status (5)

Country Link
US (1) US9437199B2 (en)
EP (1) EP2901447B1 (en)
JP (1) JP6129321B2 (en)
FR (1) FR2996043B1 (en)
WO (1) WO2014048970A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110780302A (en) * 2019-11-01 2020-02-11 天津大学 Echo signal generation method based on continuous sound beam synthetic aperture

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SE521024C2 (en) * 1999-03-08 2003-09-23 Ericsson Telefon Ab L M Method and apparatus for separating a mixture of source signals
US6321200B1 (en) * 1999-07-02 2001-11-20 Mitsubish Electric Research Laboratories, Inc Method for extracting features from a mixture of signals
CN1830026B (en) * 2001-01-30 2011-06-15 汤姆森特许公司 Geometric source preparation signal processing technique
JP2003270034A (en) * 2002-03-15 2003-09-25 Nippon Telegr & Teleph Corp <Ntt> Sound information analyzing method, apparatus, program, and recording medium
ATE413769T1 (en) * 2004-09-03 2008-11-15 Harman Becker Automotive Sys VOICE SIGNAL PROCESSING FOR THE JOINT ADAPTIVE REDUCTION OF NOISE AND ACOUSTIC ECHOS
JP4594681B2 (en) * 2004-09-08 2010-12-08 ソニー株式会社 Audio signal processing apparatus and audio signal processing method
US20070135952A1 (en) * 2005-12-06 2007-06-14 Dts, Inc. Audio channel extraction using inter-channel amplitude spectra
JP5605575B2 (en) * 2009-02-13 2014-10-15 日本電気株式会社 Multi-channel acoustic signal processing method, system and program thereof
US9100734B2 (en) * 2010-10-22 2015-08-04 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for far-field multi-source tracking and separation
CN102903368B (en) * 2011-07-29 2017-04-12 杜比实验室特许公司 Method and equipment for separating convoluted blind sources
GB2495128B (en) * 2011-09-30 2018-04-04 Skype Processing signals

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None *

Also Published As

Publication number Publication date
FR2996043A1 (en) 2014-03-28
FR2996043B1 (en) 2014-10-24
JP2015530619A (en) 2015-10-15
US9437199B2 (en) 2016-09-06
JP6129321B2 (en) 2017-05-17
WO2014048970A1 (en) 2014-04-03
US20150243290A1 (en) 2015-08-27
EP2901447A1 (en) 2015-08-05

Similar Documents

Publication Publication Date Title
Gabbay et al. Seeing through noise: Visually driven speaker separation and enhancement
EP2374123B1 (en) Improved encoding of multichannel digital audio signals
EP2898707B1 (en) Optimized calibration of a multi-loudspeaker sound restitution system
EP1992198B1 (en) Optimization of binaural sound spatialization based on multichannel encoding
EP2005420B1 (en) Device and method for encoding by principal component analysis a multichannel audio signal
EP3040989B1 (en) Improved method of separation and computer program product
WO2005106852A1 (en) Improved voice signal conversion method and system
EP2415047A1 (en) Method and device for classifying background noise contained in an audio signal
EP2419900A1 (en) Method and device for the objective evaluation of the voice quality of a speech signal taking into account the classification of the background noise contained in the signal
EP2417597A1 (en) Method and device for forming a mixed signal, method and device for separating signals, and corresponding signal
EP2901447B1 (en) Method and device for separating signals by minimum variance spatial filtering under linear constraint
EP3025342B1 (en) Method for suppressing the late reverberation of an audible signal
WO2018115666A1 (en) Processing in sub-bands of an actual ambisonic content for improved decoding
WO2013053631A1 (en) Method and device for separating signals by iterative spatial filtering
WO2020049263A1 (en) Device for speech enhancement by implementation of a neural network in the time domain
FR2966277A1 (en) METHOD AND DEVICE FOR FORMING AUDIO DIGITAL MIXED SIGNAL, SIGNAL SEPARATION METHOD AND DEVICE, AND CORRESPONDING SIGNAL
EP3384688B1 (en) Successive decompositions of audio filters
EP1605440B1 (en) Method for signal source separation from a mixture signal
WO2022207994A1 (en) Estimating an optimized mask for processing acquired sound data
FR2980620A1 (en) Method for processing decoded audio frequency signal, e.g. coded voice signal including music, involves performing spectral attenuation of residue, and combining residue and attenuated signal from spectrum of tonal components
WO2012085453A1 (en) Processing sound data for source separation

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20150311

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAX Request for extension of the european patent (deleted)
REG Reference to a national code

Ref country code: DE

Ref legal event code: R079

Ref document number: 602013015655

Country of ref document: DE

Free format text: PREVIOUS MAIN CLASS: G10L0019008000

Ipc: G10L0021030800

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 19/008 20130101AFI20160616BHEP

Ipc: G10L 19/018 20130101ALI20160616BHEP

Ipc: G10L 21/028 20130101ALI20160616BHEP

RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 21/0308 20130101AFI20160629BHEP

Ipc: G10L 19/018 20130101ALI20160629BHEP

Ipc: G10L 19/008 20130101ALI20160629BHEP

INTG Intention to grant announced

Effective date: 20160718

RIN1 Information on inventor provided before grant (corrected)

Inventor name: GORLOW, STANISLAW

Inventor name: MARCHAND, SYLVAIN

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Free format text: LANGUAGE OF EP DOCUMENT: FRENCH

REG Reference to a national code

Ref country code: AT

Ref legal event code: REF

Ref document number: 856102

Country of ref document: AT

Kind code of ref document: T

Effective date: 20170115

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602013015655

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG4D

REG Reference to a national code

Ref country code: NL

Ref legal event code: MP

Effective date: 20161221

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170322

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170321

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 856102

Country of ref document: AT

Kind code of ref document: T

Effective date: 20161221

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: RS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170421

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170421

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170321

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602013015655

Country of ref document: DE

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an EP patent application or granted EP patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20170922

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 602013015655

Country of ref document: DE

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

GBPC GB: European patent ceased through non-payment of renewal fee

Effective date: 20170925

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

REG Reference to a national code

Ref country code: IE

Ref legal event code: MM4A

REG Reference to a national code

Ref country code: BE

Ref legal event code: MM

Effective date: 20170930

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20170925

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20180531

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20170925

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20170930

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20180404

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20170925

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20170930

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20171002

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20170930

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: MT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO

Effective date: 20130925

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: MK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221

PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]

Ref country code: AL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20161221