EP3635718A1 - Verarbeitung von klangdaten zur trennung von klangquellen in einem mehrkanalsignal - Google Patents

Verarbeitung von klangdaten zur trennung von klangquellen in einem mehrkanalsignal

Info

Publication number
EP3635718A1
Authority
EP
European Patent Office
Prior art keywords
components
descriptors
sources
sound
direct
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP18737650.4A
Other languages
English (en)
French (fr)
Other versions
EP3635718B1 (de)
Inventor
Mathieu BAQUÉ
Alexandre Guerin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orange SA
Original Assignee
Orange SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Orange SA filed Critical Orange SA
Publication of EP3635718A1
Application granted
Publication of EP3635718B1
Legal status: Active


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272: Voice signal separating
    • G10L21/0308: Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals
    • G10L25/84: Detection of presence or absence of voice signals for discriminating voice from noise
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00: Stereophonic arrangements
    • H04R5/02: Spatial or constructional arrangements of loudspeakers
    • G10L21/0208: Noise filtering
    • G10L2021/02082: Noise filtering, the noise being echo or reverberation of the speech
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161: Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166: Microphone arrays; Beamforming

Definitions

  • the present invention relates to the field of audio or acoustic signal processing and more particularly to the processing of real multichannel sound contents to separate sound sources.
  • a blind separation of the sources consists, from a number M of observations coming from sensors distributed in a space E, in counting and extracting the N sources.
  • each observation is obtained using a sensor that records the signal at the point in space where the sensor is located.
  • the recorded signal then results from the mixing and propagation in the space E of the source signals and is therefore affected by various disturbances specific to the medium traversed, such as noise, reverberation, interference, etc.
  • the observations can be written x = As, where x is the vector of the M recorded channels, s the vector of the N sources and A a matrix called the "mixing matrix", of dimension M×N, describing the contributions of each source to each channel.
  • matrix A can take different forms.
  • in the instantaneous (anechoic) case, A is a simple matrix of gains.
  • in the convolutive (reverberant) case, the matrix A becomes a filter matrix.
  • the relation is then usually expressed in the frequency domain as x(f) = A(f)·s(f), where A(f) is a matrix of complex coefficients.
  • the analysis, i.e. the identification of the number of sources and of their positions;
  • the decomposition of the scene into objects, i.e. the sources.
  • ACI: independent component analysis algorithm (from the French "analyse en composantes indépendantes").
  • the preliminary step of estimating the dimension of the problem, i.e. the size of the separation matrix, i.e. the number of sources N, is conventionally done by calculating the rank of the observation covariance matrix, which, in the anechoic case, is equal to the number of active sources N.
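By way of illustration (not part of the patent text; names are ours), a minimal Python sketch of this classical counting rule for the anechoic, noise-free case:

```python
# Illustrative sketch: estimating the number of sources from the rank of the
# observation covariance matrix, assuming an anechoic, noise-free mixture x = A s.
import numpy as np

def count_sources(x, tol=1e-6):
    """x: (M, T) array of M observed channels over T samples.
    Returns the estimated number of sources N as the numerical rank
    of the covariance matrix C = E[x x^T]."""
    cov = np.cov(x)                      # (M, M) observation covariance
    eigvals = np.linalg.eigvalsh(cov)    # eigenvalues in ascending order
    return int(np.sum(eigvals > tol * eigvals.max()))

# Example: 2 independent sources mixed into 4 channels.
rng = np.random.default_rng(0)
s = rng.standard_normal((2, 10000))      # N = 2 sources
A = rng.standard_normal((4, 2))          # 4x2 mixing matrix (gains only)
x = A @ s
print(count_sources(x))                  # -> 2 in this anechoic toy case
```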
  • An example of beamforming to extract three sources positioned at 0°, 90° and −120° of azimuth respectively is illustrated in FIG. 1. Each of the directivities formed corresponds to the extraction of one of the sources of the system.
  • the total acoustic field can be modeled as the sum of the direct field of the sources of interest (represented in 1 in Figure 2), the first reflections (secondary sources, represented in 2 in Figure 2) and a diffuse field (represented in 3 in Figure 2).
  • the covariance matrix of the observations is then of full rank, regardless of the actual number of active sources in the mixture: one can therefore no longer use its rank to estimate the number of sources.
  • the separation matrix B, of size M×M, is obtained, generating at the output M components in place of the desired N, the last M−N components essentially containing the reverberated field, by the matrix operation ŝ = Bx.
  • each additional component induces constraints on the directivities formed and generally degrades the directivity factor with the consequence of raising the level of reverberation in the extracted signals.
  • the active source in each time-frequency zone is estimated as follows:
  • a representation in space of all the pairs is performed in the form of a histogram; the clustering is then performed on the histogram by maximum likelihood, as a function of the position of the zone and the assumed position of the associated source, assuming a Gaussian distribution of the estimated positions of each zone around the actual position of the sources.
  • the hypothesis of parsimony of the sources in the time-frequency domain is often violated, which constitutes a significant limitation of these approaches for the enumeration of sources, because the directions of arrival pointed for each zone then result from a combination of contributions from multiple sources, and the clustering no longer works properly.
  • the presence of reverberation can, on the one hand, degrade the localization of the sources and, on the other hand, generate an over-estimation of the number of real sources when early reflections reach an energy level sufficient to be perceived as secondary sources.
  • the present invention improves the situation.
  • the method is such that it comprises the following steps:
  • application of a source separation processing to the picked-up multichannel signal to obtain a set of M sound components, with M ≥ N;
  • calculation of a set of first so-called bivariate descriptors, representative of statistical relations between the components of the pairs of the set of M components obtained, and of a set of second so-called univariate descriptors, representative of encoding characteristics of the components of the set of M components obtained;
  • classification of the components of the set of M components according to two classes of components, a first class of N so-called direct components corresponding to the N direct sound sources and a second class of M−N components referred to as reverberated, by a calculation of the probability of belonging to one of the two classes, as a function of the sets of first and second descriptors.
  • the set of first bivariate descriptors makes it possible to determine whether the components of a pair of the set of components obtained from the source separation step belong to the same class or to different classes, while the set of second univariate descriptors makes it possible to define, for a single component, whether it is more likely to belong to one class or the other. This makes it possible to determine the probability of membership of a component in one of the two classes and thus to determine the N direct sound sources corresponding to the N components classified in the first class.
  • calculating a bivariate descriptor comprises calculating a coherence score between two components. This descriptor makes it possible to know whether a pair of components corresponds to two direct components (two sources) or whether at least one of the components results from a reverberation effect.
  • calculating a bivariate descriptor comprises determining a delay between the two components of the pair. This determination of the delay, and of the sign associated with it, makes it possible to determine, for a pair of components, which component more probably corresponds to the direct signal and which more probably corresponds to the reverberated signal.
  • the delay between two components is determined as the delay maximizing an inter-correlation function between the two components of the pair.
  • This method of obtaining the delay provides a reliable bivariate descriptor.
  • the determination of the delay between two components of a pair is associated with an indicator of reliability of the sign of the delay, a function of the coherence between the components of the pair.
  • the determination of the delay between two components of a pair is associated with a reliability indicator of the sign of the delay, a function of the ratio of the maxima of an inter-correlation function for delays of opposite signs.
  • the calculation of a univariate descriptor is a function of a comparison between the mixing coefficients of a mixing matrix estimated from the source separation step and the encoding characteristics of a plane-wave-type source. This descriptor calculation makes it possible, for a single component, to estimate the probability that the component is direct or reverberated.
  • the classification of the components of the set of M components takes place by taking into account the possible classification combinations of the M components and by calculating the most likely combination.
  • the most likely combination is calculated by determining the maximum of the likelihood values, expressed as the product of the conditional probabilities associated with the descriptors, over the 2^M possible classification combinations of the components.
  • a step of pre-selecting the possible combinations is performed based on the univariate descriptors alone, before the step of calculating the most probable combination.
  • a component pre-selection step is performed based on the univariate descriptors alone, before the step of calculating the bivariate descriptors.
  • the multichannel signal is an ambisonic signal.
  • the invention also relates to a sound data processing device implemented to perform a separation processing of N sound sources of a multichannel sound signal picked up by a plurality of sensors in a real environment.
  • the device is such that it comprises:
  • an input interface for receiving the multichannel sound signal picked up by a plurality of sensors;
  • a processing circuit comprising a processor and able to implement:
  • a source separation processing module applied to the picked-up multichannel signal, to obtain a set of M sound components, with M ≥ N;
  • a calculator able to compute a set of first so-called bivariate descriptors, representative of statistical relations between the components of the pairs of the set of M components obtained, and a set of second so-called univariate descriptors, representative of encoding characteristics of the components of the set of M components obtained;
  • a classifier able to classify the components of the set of M components according to two classes of components, by a calculation of the probability of belonging to one of the two classes, as a function of the sets of first and second descriptors;
  • an output interface for delivering the classification information of the components.
  • the invention also applies to a computer program comprising code instructions for implementing the steps of the processing method as described above, when these instructions are executed by a processor and to a storage medium, readable by a processor, on which is recorded a computer program comprising code instructions for performing the steps of the processing method as described.
  • the device, program and storage medium have the same advantages as the method described above, which they implement. Other characteristics and advantages of the invention will appear more clearly on reading the following description, given solely by way of a non-limiting example, and the appended drawings, in which:
  • FIG. 1 illustrates a channel formation for extracting three sources according to a method of source separation of the state of the art as described above;
  • FIG. 2 illustrates an impulse response with room effect as previously described
  • FIG. 3 illustrates, in flowchart form, the main steps of a processing method according to one embodiment of the invention
  • FIG. 4 illustrates, as a function of frequency, coherence functions representing bivariate descriptors between two components according to one embodiment of the invention, and for different pairs of components;
  • FIG. 5 illustrates the probability densities of the average coherences representing the bivariate descriptors according to one embodiment of the invention and for different pairs of components and different numbers of sources;
  • FIG. 6 illustrates inter-correlation functions between two components of different classes according to one embodiment of the invention and according to the number of sources;
  • FIG. 7 illustrates the probability densities of a plane wave criterion as a function of the class of the component, of the ambisonic order and of the number of sources, for a particular embodiment of the invention
  • FIG. 8 illustrates a hardware representation of a processing device according to one embodiment of the invention, implementing a processing method according to one embodiment of the invention.
  • FIG. 9 illustrates an exemplary probability law calculation for a criterion of coherence between a direct component and a reverberated component according to one embodiment of the invention.
  • FIG. 3 illustrates the main steps of a sound data processing method for a separation of N sound sources from a multichannel sound signal captured in a real medium, in one embodiment of the invention.
  • the method implements a step E310 of blind separation of sound sources (SAS).
  • SAS: blind separation of sound sources (from the French "séparation aveugle de sources").
  • the step of blind separation of sources can be implemented, for example, using an independent component analysis algorithm ("ACI") or a principal component analysis algorithm.
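As an illustrative sketch only, step E310 could for instance be realized with scikit-learn's FastICA; the text only requires some ICA- or PCA-type separation, and the function name below is ours:

```python
# A hedged sketch of the blind separation step E310 using FastICA as one
# possible ICA implementation.
import numpy as np
from sklearn.decomposition import FastICA

def blind_separation(x, n_components):
    """x: (M, T) picked-up multichannel signal.
    Returns (s_hat, B, A): the extracted components, the separation
    matrix B and the estimated mixing matrix A (pseudo-inverse of B)."""
    ica = FastICA(n_components=n_components, whiten="unit-variance")
    s_hat = ica.fit_transform(x.T).T   # (n_components, T) extracted components
    B = ica.components_                # separation (unmixing) matrix
    A = ica.mixing_                    # estimated mixing matrix, A ~ pinv(B)
    return s_hat, B, A
```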
  • ambisonics consists of a projection of the acoustic field onto a basis of spherical harmonic functions, to obtain a spatial representation of the sound scene.
  • each function of this basis is a spherical harmonic.
  • a real ambisonic encoding is done from an array of sensors, generally distributed over a sphere.
  • the captured signals are combined to synthesize an ambisonic content whose channels respect the directivity of spherical harmonics.
  • the basic principles of ambisonic encoding are described below.
  • the ambisonic formalism, initially limited to the representation of spherical harmonic functions of order 1, was later extended to higher orders.
  • the ambisonic formalism with a larger number of components is commonly referred to as "Higher Order Ambisonics" (or "HOA" hereinafter).
  • a content of order m contains a total of (m+1)² channels (4 channels at order 1, 9 channels at order 2, 16 channels at order 3, and so on).
  • "ambisonic components" is understood to mean the ambisonic signal in each ambisonic channel, with reference to the "vector components" in a vector basis that would be formed by the spherical harmonic functions.
  • the blind separation of sources is therefore performed in step E310 as explained above.
  • the components obtained at the output of the source separation step can be classified according to two classes of components: a first class of so-called direct components corresponding to the direct sound sources and a second class of so-called reverberated components corresponding to the reflections of the sources.
  • in step E320, a computation of descriptors of the M components (s1, s2, ..., sM) resulting from the source separation step is implemented; these descriptors will make it possible to associate with each extracted component the corresponding class: direct component or reverberated component.
  • two types of descriptors are used: bivariate descriptors, which involve pairs of components, and univariate descriptors, calculated for a single component sj.
  • a set of first bivariate descriptors is calculated. These descriptors are representative of statistical relations between the components of the pairs of the set of M components obtained.
  • an average coherence between two components is calculated here.
  • This type of descriptor represents a statistical relationship between the components of a pair and provides an indication of the presence of at least one reverberated component in the pair.
  • each direct component consists mainly of the direct field of a source, comparable to a plane wave, plus a residual reverberation whose energy contribution is lower than that of the direct field. Since the sources are statistically independent by nature, there is therefore a weak correlation between the extracted direct components.
  • each reverberated component consists of early reflections, which are delayed and filtered versions of the direct field(s), and of late reverberation, so that the reverberated components exhibit a significant correlation with the direct components, and usually an identifiable group delay relative to the direct components.
  • the coherence function γij(f) = |Sij(f)|² / (Sii(f)·Sjj(f)) informs about the existence of a correlation between two components at each frequency.
  • coherence is ideally zero when the two components are the direct fields of independent sources, but it takes a high value when they are two contributions from the same source: the direct field and a first reflection, or two reflections. Such a coherence function therefore indicates the presence of a reverberated contribution in the pair.
  • the cross-spectra Sij and autospectra Sii can be calculated by segmenting the extracted components into K frames (adjacent or overlapping), by applying a short-term Fourier transform to each frame k of these K frames to produce the instantaneous spectra, and by averaging these observations over the K frames: Sij(f) = (1/K) Σk Xi,k(f)·Xj,k(f)*.
  • the descriptor used for a broadband signal is the average over all the frequencies of the coherence function between two components, namely dij, the mean of γij(f) over the frequency bins.
  • in the first case, the coherence value dij is less than 0.3, while in the second case dij reaches 0.7 in the presence of a single active source.
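A minimal sketch of this average-coherence descriptor (helper name ours; Welch-style segmentation stands in for the K-frame averaging described above):

```python
# Sketch of the average-coherence descriptor d_ij: the magnitude-squared
# coherence of two extracted components, averaged over frequency.
import numpy as np
from scipy.signal import coherence

def mean_coherence(s_i, s_j, fs, nperseg=1024):
    """Returns d_ij in [0, 1]: low (< ~0.3) for two direct components from
    independent sources, high (~0.7) for a direct/reverberated pair."""
    _, gamma2 = coherence(s_i, s_j, fs=fs, nperseg=nperseg)
    return float(np.mean(gamma2))
```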
  • in step E330 of FIG. 3, a probability calculation is deduced from the descriptor thus described.
  • the probability densities of FIGS. 5 and 7 described below, and more generally all the probability densities of the descriptors, are learned statistically on databases comprising various acoustic conditions (reverberant/dry) and different sources (male/female voices, French/English languages).
  • the components are classified in an informed manner: to each source is associated the spatially closest extracted component, the remaining ones being classified as reverberated components.
  • to calculate the position of a component, we use the first 4 coefficients of its mixing vector from matrix A (i.e., order 1), which is the inverse of the separation matrix B. Assuming that this vector follows the encoding rule of a plane wave, the azimuth θ and elevation φ of the component are obtained as θ = arctan2(Y, X) and φ = arctan2(Z, √(X² + Y²)), where (W, X, Y, Z) are these first 4 coefficients.
  • arctan2 denotes the two-argument arctangent function, which removes the sign ambiguity of the ordinary arctangent.
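A hedged sketch of this direction estimate, assuming an order-1 convention in which the W-normalized encoding vector is proportional to (1, cos θ cos φ, sin θ cos φ, sin φ); a uniform gain on X, Y, Z, such as the √3 of N3D, leaves the formulas unchanged:

```python
# Sketch (assumed order-1 encoding convention): recovering a component's
# direction from the first 4 coefficients of its column of the estimated
# mixing matrix A.
import numpy as np

def component_direction(a_j):
    """a_j: first 4 coefficients (W, X, Y, Z) of column j of the estimated
    mixing matrix A. Returns (azimuth, elevation) in radians."""
    w, x, y, z = a_j[:4] / a_j[0]          # normalize the W gain to 1
    azimuth = np.arctan2(y, x)             # arctan2 resolves the sign ambiguity
    elevation = np.arctan2(z, np.hypot(x, y))
    return azimuth, elevation
```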
  • FIG. 9 shows an example of probability-law fitting for the coherence criterion between a direct component and a reverberated component: the lognormal law has been selected from among ten candidate laws because it minimizes the Kullback-Leibler divergence.
  • FIG. 5 represents the distributions (probability density or pdf for "Probability density function") associated with the value of the average coherence between two components.
  • in the presence of two sources, the coherence estimators are degraded, whether for the direct/reverberated or the reverberated/reverberated pairs (in the presence of a single source, the direct/direct pair does not exist).
  • This descriptor is therefore relevant for detecting whether a pair of extracted components corresponds to two direct components (2 true sources) or if at least one of the two components originates from the room effect.
  • another type of bivariate descriptor is calculated in step E320, either in place of the coherence-type descriptor described above or in addition to it.
  • This descriptor will make it possible to determine, for a (direct/reverberated) pair, which component is more likely the direct signal and which corresponds to the reverberated signal, based on the simple assumption that the first reflections are delayed and attenuated versions of the direct signal.
  • the delay τij is defined as the delay that maximizes the inter-correlation function between the two components.
  • the average coherence between the components makes it possible to evaluate the relevance of the direct/reverberated pairing, as seen previously. If it is strong, we can hope that the group delay will be a reliable descriptor.
  • FIG. 6 illustrates the emerging character of the inter-correlation peak between a direct component and a reverberated component.
  • the maximum of the inter-correlation clearly emerges from the rest of the inter-correlation function, reliably indicating that one of the components lags behind the other. It emerges in particular with respect to the values of the inter-correlation function for delays of sign opposite to that of τij.
  • a second indicator of the reliability of the sign of the delay, called emergence, is defined by calculating the ratio between the absolute value of the inter-correlation at τij and the maximum absolute value of the inter-correlation for delays τ of sign opposite to that of τij.
  • This ratio, which we call emergence, is an ad hoc criterion whose relevance is verified in practice: it takes values close to 1 for independent signals, i.e. two direct components, and higher values for correlated signals such as a direct component and a reverberated component. In the aforementioned case of curve (1) of FIG. 6, the emergence value is 4.
  • this descriptor is sensitive to noise, and in particular to the presence of several simultaneous sources, as illustrated in curve (2) of FIG. 6: in the presence of two sources, even if the maximum of the correlation still emerges, its relative value (2.6) is lower because of the presence of an interfering source, which reduces the correlation between the extracted components.
  • the reliability of the sign of the delay will be measured as a function of the value of the emergence, which will be weighted by the number of sources to be detected a priori.
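A sketch of the delay descriptor and its emergence indicator as just described (function name and sign convention are ours, following numpy's correlate):

```python
# Sketch: tau maximizes |cross-correlation|, and emergence is the ratio of
# |corr| at tau to the best |corr| among delays of the opposite sign; values
# near 1 suggest two direct (independent) components.
import numpy as np

def delay_and_emergence(s_i, s_j, max_lag):
    corr = np.correlate(s_i, s_j, mode="full")
    lags = np.arange(-(len(s_j) - 1), len(s_i))
    keep = np.abs(lags) <= max_lag
    corr, lags = corr[keep], lags[keep]
    k = int(np.argmax(np.abs(corr)))
    tau = int(lags[k])
    if tau == 0:
        return 0, 1.0                      # no usable sign information
    opposite = np.sign(lags) == -np.sign(tau)
    emergence = float(np.abs(corr[k]) / np.abs(corr[opposite]).max())
    return tau, emergence
```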
  • a probability of belonging to a first class of direct components or a second class of reverberant components for a pair of components is calculated in step E330.
  • for a component sj identified as being in advance of si, the probability that sj is direct and si reverberated is modeled by a two-dimensional law.
  • the sign of the delay is a reliable indicator when both the coherence and the emergence have medium or high values.
  • a weak emergence or a weak coherence will make the direct/reverberated and reverberated/direct pairings equiprobable.
  • in step E320, a set of second univariate descriptors, representative of encoding characteristics of the components of the set of M components obtained, is also calculated.
  • the encoding of a source coming from a given direction is done with mixing coefficients depending, among other things, on the directivity of the sensors.
  • where the source can be considered as a point source and where the wavelengths are large compared to the size of the array, one can consider the source as a plane wave. This assumption is generally true in the case of an ambisonic microphone, which is small, provided that the source is sufficiently far from the microphone (in practice, one meter is enough).
  • the j-th column of the estimated mixing matrix A, obtained by inverting the separation matrix B, will contain the mixing coefficients associated with the j-th component. If this component is direct, that is to say it corresponds to a single source, the mixing coefficients of the column Aj will tend towards the characteristics of the microphone encoding for a plane wave. In the case of a reverberated component, the sum of several reflections and a diffuse field, the estimated mixing coefficients will be more random and will not encode a single source.
  • the encoding of a plane wave, in N3D normalization, is carried out at order 1 according to the formula a(θ, φ) = [1, √3·cos θ·cos φ, √3·sin θ·cos φ, √3·sin φ].
  • from these coefficients, a plane wave criterion cop is computed, which measures the conformity between the estimated mixing coefficients and the theoretical encoding equation of a single plane wave.
  • the criterion cop is by definition equal to 1 in the case of a plane wave.
  • in the case of a direct component, the plane wave criterion will remain very close to the value 1; conversely, in the case of a reverberated component, the multitude of contributions (early reflections and late reverberation) with comparable energy levels will drive the criterion away from 1.
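Since the exact formula of the criterion is not reproduced above, the following sketch shows one plausible formulation: the normalized projection of the estimated mixing column onto the theoretical order-1 N3D encoding at the direction estimated from that same column; it equals 1 for a perfectly encoded plane wave.

```python
# Hedged sketch of a plane-wave conformity criterion (one plausible form,
# not necessarily the patented formula).
import numpy as np

def plane_wave_criterion(a_j):
    w, x, y, z = a_j[:4] / a_j[0]
    az = np.arctan2(y, x)
    el = np.arctan2(z, np.hypot(x, y))
    a = np.array([w, x, y, z])
    # theoretical order-1 N3D encoding of a plane wave arriving from (az, el)
    y_th = np.array([1.0,
                     np.sqrt(3) * np.cos(az) * np.cos(el),
                     np.sqrt(3) * np.sin(az) * np.cos(el),
                     np.sqrt(3) * np.sin(el)])
    return float(a @ y_th / (np.linalg.norm(a) * np.linalg.norm(y_th)))
```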
  • the associated distribution, calculated in E330, shows a certain variability, depending in particular on the level of noise present in the extracted components.
  • This noise consists mainly of residual reverberation and contributions from interfering sources that have not been perfectly canceled.
  • the probability laws (probability densities) associated with this descriptor can be observed in FIG. 7, as a function of the number of simultaneously active sources (1 or 2) and of the ambisonic order of the analyzed content (orders 1 to 2). According to the initial hypothesis, the value of the plane wave criterion is concentrated around the value 1 for the direct components. For reverberated components, the distribution is more uniform, but with a slightly asymmetrical shape, because the descriptor itself is asymmetric, with a 1/x form.
  • the distance between the distributions of the two classes allows a fairly reliable discrimination between the components of the plane-wave type and those that are more diffuse.
  • the descriptors calculated in step E320 and presented here are based both on the statistics of the extracted components (average coherence and group delay) and on the estimated mixing matrix (plane wave criterion). They make it possible to determine conditional probabilities of membership of a component in one of the two classes Cd or Cr.
  • the problem is ultimately to choose from a total of 2^M potential configurations, assumed equiprobable.
  • the rule of the maximum a posteriori is applied: knowing the likelihood of each configuration, the configuration chosen will be the one with the maximum likelihood, i.e. the configuration C maximizing L(C).
  • the chosen approach can be exhaustive and then consists in estimating the likelihood of all the possible configurations, from the descriptors determined in step E320 and the distributions associated with them which are calculated in step E330.
  • a pre-selection of the configurations can be performed to reduce the number of configurations to be tested, and therefore the complexity of the implementation of the solution.
  • This pre-selection can be done, for example, according to the plane wave criterion alone, by classifying certain components in the category Cr when the value of their criterion is far from the theoretical plane-wave value of 1: in the case of ambisonic signals, we can see on the distributions of FIG. 7 that we can, whatever the configuration (order or number of sources) and a priori without loss of robustness, classify in the category Cr the components whose criterion verifies one of the following inequalities:
  • Another possibility for further reducing the complexity is to exclude the pre-classified components from the computation of the bivariate descriptors and from the likelihood calculation, which reduces the number of bivariate criteria to be calculated and therefore further reduces the processing complexity.
  • the likelihood is expressed as the product of the conditional probabilities associated with each of the K descriptors, assuming that they are independent: L(C) = p(d|C) = Πk p(dk|C), where d is the vector of the descriptors and C is a vector representing a configuration (i.e. the combination of the supposed classes of the M components), as defined above.
  • a number K1 of univariate descriptors is used for each of the components, while a number K2 of bivariate descriptors is used for each pair of components. Since the probability laws of the descriptors are established according to the number of supposed sources and the number of channels (the index m representing the ambisonic order, in the case of a capture of this type), we formulate the final expression of the likelihood, in which:
  • dk(i,j) is the value of the bivariate descriptor of index k for the components si and sj;
  • Ci and Cj are the supposed classes of the components si and sj;
  • in practice, the log-likelihood is used; this equation is the one ultimately used to determine the most likely configuration in the Bayesian classifier described here for this embodiment.
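A sketch of this exhaustive maximum-likelihood classification (pdf_uni and pdf_bi are assumed callables returning the learned conditional densities described above; all names are ours):

```python
# Sketch: enumerate the 2^M direct/reverberated labelings, sum the log
# conditional probabilities of all descriptors, keep the argmax.
import itertools
import numpy as np

def classify(M, d_uni, d_bi, pdf_uni, pdf_bi):
    """d_uni[i]: univariate descriptor(s) of component i;
    d_bi[(i, j)]: bivariate descriptor(s) of the pair (i, j), i < j;
    pdf_uni(d, c): density of d given class c in {'dir', 'rev'};
    pdf_bi(d, ci, cj): density of d given the pair of classes."""
    best, best_ll = None, -np.inf
    for config in itertools.product(("dir", "rev"), repeat=M):
        ll = sum(np.log(pdf_uni(d_uni[i], config[i])) for i in range(M))
        ll += sum(np.log(pdf_bi(d_bi[(i, j)], config[i], config[j]))
                  for j in range(M) for i in range(j))
        if ll > best_ll:
            best_ll, best = ll, config
    return best
```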
  • the Bayesian classifier presented here is only one example of implementation; it could be replaced, inter alia, by a support vector machine or a neural network.
  • the processing described here is performed in the time domain, but may also be, in an alternative embodiment, applied in a transformed domain.
  • the method as described with reference to FIG. 3 is then implemented by frequency sub-bands, after transformation of the picked-up signals into the transformed domain.
  • the useful bandwidth can be reduced according to the potential imperfections of the capture system, at high frequencies (presence of spatial aliasing) or at low frequencies (impossibility of recovering the theoretical directivities of the microphone encoding).
  • FIG. 8 represents an embodiment of a processing device (DIS) according to the invention.
  • Sensors, represented here in the form of a spherical microphone MIC, make it possible to acquire, in a real and thus reverberant medium, the M mixture signals forming a multichannel signal.
  • the sensors can be integrated in the DIS device or external to it; the resulting signals are then transmitted to the processing device, which receives them via its input interface 840. Alternatively, these signals can simply be obtained beforehand and imported into the memory of the DIS device.
  • This memory may include a computer program including code instructions for the implementation of the steps of the processing method as described for example with reference to FIG. 3, and in particular the steps of: applying a source separation processing to the picked-up multichannel signal and obtaining a set of M sound components, with M ≥ N; calculating a set of first so-called bivariate descriptors, representative of statistical relations between the components of the pairs of the set of M components obtained, and a set of second so-called univariate descriptors, representative of encoding characteristics of the components of the set of M components obtained; and classifying the components of the set of M components according to two classes of components, a first class of N so-called direct components corresponding to the N direct sound sources and a second class of M−N so-called reverberated components, by a calculation of the probability of belonging to one of the two classes, as a function of the sets of first and second descriptors.
  • the device comprises a source separation processing module 810 applied to the picked-up multichannel signal to obtain a set of M sound components, with M ≥ N.
  • the M components are provided at the input of a calculator 820 capable of calculating a set of first so-called bivariate descriptors, representative of statistical relations between the components of the pairs of the set of M components obtained, and a set of second so-called univariate descriptors, representative of encoding characteristics of the components of the set of M components obtained.
  • these descriptors are provided to a classification module 830, or classifier, able to classify the components of the set of M components according to two classes of components: a first class of N so-called direct components corresponding to the N direct sound sources and a second class of M−N so-called reverberated components.
  • the classification module comprises a module 831 for calculating the probability of belonging to one of the two classes of the components of the set M, which is a function of the sets of first and second descriptors.
  • the classifier uses descriptors related to the correlation between the components to determine which are direct signals (i.e. true sources) and which are reverberation residues. It also uses descriptors related to the SAS-estimated mixing coefficients, to evaluate the conformity between the theoretical encoding of a single source and the estimated encoding of each component. Some of the descriptors are therefore a function of a pair of components (for the correlation), and others are functions of a single component (for the conformity of the estimated microphone encoding).
  • a likelihood calculation module 832 makes it possible to determine, in one embodiment, the most probable combination of the classifications of the M components by a calculation of likelihood values according to the probabilities calculated in module 831 and for the possible combinations.
  • the device comprises an output interface 850 for outputting the classification information of the components, for example to another processing device that can use this information to enhance the sound of the discriminated sources, to denoise them or to perform a mixing from several discriminated sources.
  • Another possible processing may also be to analyze or localize the sources in order to optimize the handling of a voice command.
  • the device can also be integrated in a communication terminal capable of processing signals picked up by a plurality of integrated or remote sensors of the terminal.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
EP18737650.4A 2017-06-09 2018-05-24 Verarbeitung von klangdaten zur trennung von klangquellen in einem mehrkanalsignal Active EP3635718B1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR1755183A FR3067511A1 (fr) 2017-06-09 2017-06-09 Traitement de donnees sonores pour une separation de sources sonores dans un signal multicanal
PCT/FR2018/000139 WO2018224739A1 (fr) 2017-06-09 2018-05-24 Traitement de donnees sonores pour une separation de sources sonores dans un signal multicanal

Publications (2)

Publication Number Publication Date
EP3635718A1 2020-04-15
EP3635718B1 EP3635718B1 (de) 2023-06-28

Family

ID=59746081

Family Applications (1)

Application Number Title Priority Date Filing Date
EP18737650.4A Active EP3635718B1 (de) 2017-06-09 2018-05-24 Verarbeitung von klangdaten zur trennung von klangquellen in einem mehrkanalsignal

Country Status (5)

Country Link
US (1) US11081126B2 (de)
EP (1) EP3635718B1 (de)
CN (1) CN110709929B (de)
FR (1) FR3067511A1 (de)
WO (1) WO2018224739A1 (de)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110473566A (zh) * 2019-07-25 2019-11-19 深圳壹账通智能科技有限公司 音频分离方法、装置、电子设备及计算机可读存储介质
EP4107723A4 (de) * 2020-02-21 2023-08-23 Harman International Industries, Incorporated Verfahren und system zur verbesserung der stimmentrennung durch beseitigung von überlappungen
CN113450823B (zh) * 2020-03-24 2022-10-28 海信视像科技股份有限公司 基于音频的场景识别方法、装置、设备及存储介质
FR3116348A1 (fr) * 2020-11-19 2022-05-20 Orange Localisation perfectionnée d’une source acoustique
CN112599144B (zh) * 2020-12-03 2023-06-06 Oppo(重庆)智能科技有限公司 音频数据处理方法、音频数据处理装置、介质与电子设备

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6879952B2 (en) * 2000-04-26 2005-04-12 Microsoft Corporation Sound source separation using convolutional mixing and a priori sound source knowledge
US20040086130A1 (en) * 2002-05-03 2004-05-06 Eid Bradley F. Multi-channel sound processing systems
US7809145B2 (en) * 2006-05-04 2010-10-05 Sony Computer Entertainment Inc. Ultra small microphone array
US7099821B2 (en) * 2003-09-12 2006-08-29 Softmax, Inc. Separation of target acoustic signals in a multi-transducer arrangement
KR100647286B1 (ko) * 2004-08-14 2006-11-23 삼성전자주식회사 교차채널 간섭을 제거하기 위한 후처리장치 및 방법과이를 이용한 다채널 음원 분리장치 및 방법
JP5053849B2 (ja) * 2005-09-01 2012-10-24 パナソニック株式会社 マルチチャンネル音響信号処理装置およびマルチチャンネル音響信号処理方法
EP1989777A4 (de) * 2006-03-01 2011-04-27 Softmax Inc System und verfahren zur erzeugung eines separaten signals
FR2899424A1 (fr) * 2006-03-28 2007-10-05 France Telecom Procede de synthese binaurale prenant en compte un effet de salle
FR2903562A1 (fr) * 2006-07-07 2008-01-11 France Telecom Spatialisation binaurale de donnees sonores encodees en compression.
JP2010519602A (ja) * 2007-02-26 2010-06-03 クゥアルコム・インコーポレイテッド 信号分離のためのシステム、方法、および装置
US8639498B2 (en) * 2007-03-30 2014-01-28 Electronics And Telecommunications Research Institute Apparatus and method for coding and decoding multi object audio signal with multi channel
US8131542B2 (en) * 2007-06-08 2012-03-06 Honda Motor Co., Ltd. Sound source separation system which converges a separation matrix using a dynamic update amount based on a cost function
GB0720473D0 (en) * 2007-10-19 2007-11-28 Univ Surrey Accoustic source separation
JP5195652B2 (ja) * 2008-06-11 2013-05-08 ソニー株式会社 信号処理装置、および信号処理方法、並びにプログラム
JP4816711B2 (ja) * 2008-11-04 2011-11-16 ソニー株式会社 通話音声処理装置および通話音声処理方法
US20110058676A1 (en) * 2009-09-07 2011-03-10 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for dereverberation of multichannel signal
KR101567461B1 (ko) * 2009-11-16 2015-11-09 삼성전자주식회사 다채널 사운드 신호 생성 장치
US9165565B2 (en) * 2011-09-09 2015-10-20 Adobe Systems Incorporated Sound mixture recognition
US9654894B2 (en) * 2013-10-31 2017-05-16 Conexant Systems, Inc. Selective audio source enhancement

Also Published As

Publication number Publication date
EP3635718B1 (de) 2023-06-28
CN110709929B (zh) 2023-08-15
US11081126B2 (en) 2021-08-03
FR3067511A1 (fr) 2018-12-14
US20200152222A1 (en) 2020-05-14
CN110709929A (zh) 2020-01-17
WO2018224739A1 (fr) 2018-12-13

Similar Documents

Publication Publication Date Title
EP3635718B1 (de) Verarbeitung von klangdaten zur trennung von klangquellen in einem mehrkanalsignal
EP3807669B1 (de) Ortung von schallquellen in einer bestimmten akustischen umgebung
JP4406428B2 (ja) 信号分離装置、信号分離方法、信号分離プログラム及び記録媒体
EP2898707B1 (de) Optimierte kalibrierung eines klangwiedergabesystems mit mehreren lautsprechern
EP3281026B1 (de) Verfahren zur trennung von quellen für spärliche signale
US10078785B2 (en) Video-based sound source separation
EP4046390A1 (de) Verbesserte ortung einer schallquelle
EP2517037A1 (de) Verfahren zur kalkulation der anzahl von störquellen einer sensoranordnung mittels kalkulation von rauschstatistiken
WO2022106765A1 (fr) Localisation perfectionnée d'une source acoustique
EP3292819B1 (de) Identifikation eines verrauschten signals aus nicht stationären audiosignalen
CN113409771B (zh) 一种伪造音频的检测方法及其检测系统和存储介质
EP3559947B1 (de) Verarbeitung in subbändern eines aktuellen ambisonic-inhalts zur verbesserten dekodierung
Kressner et al. Outcome measures based on classification performance fail to predict the intelligibility of binary-masked speech
Zohny et al. Modelling interaural level and phase cues with Student's t-distribution for robust clustering in MESSL
WO2022219558A1 (en) System and method for estimating direction of arrival and delays of early room reflections
EP3385899A1 (de) Verfahren und gerät zur detektierung einer szene in echteit
FR3011086A1 (fr) Procede realisant conjointement la synchronisation, l'identification, la mesure, l'estimation du filtre de propagation et la localisation d'emetteurs utiles et interferants
EP1949548B1 (de) Verfahren zum Erkennen von Wegen bei der Impulsübertragung und Einrichtung zum ausführen des Verfahrens
EP1359685A1 (de) Quellentrennung für zyklostationäre Signale
WO2011012789A1 (fr) Localisation de sources
EP4315328A1 (de) Schätzung einer optimierten maske zur verarbeitung von erfassten tondaten
JP2023122018A (ja) 信号処理装置、信号処理プログラム及び信号処理方法

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20191210

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: ORANGE

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
RAP3 Party data changed (applicant data changed or rights of an application transferred)

Owner name: ORANGE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20211006

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

INTG Intention to grant announced

Effective date: 20230310

RIN1 Information on inventor provided before grant (corrected)

Inventor name: GUERIN, ALEXANDRE

Inventor name: BAQUE, MATHIEU

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: AT

Ref legal event code: REF

Ref document number: 1583444

Country of ref document: AT

Kind code of ref document: T

Effective date: 20230715

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Free format text: LANGUAGE OF EP DOCUMENT: FRENCH

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602018052425

Country of ref document: DE

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG9D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230928

REG Reference to a national code

Ref country code: NL

Ref legal event code: MP

Effective date: 20230628

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 1583444

Country of ref document: AT

Kind code of ref document: T

Effective date: 20230628

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: RS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230929

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20231028

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20231030

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20231028

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602018052425

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20230628

26N No opposition filed

Effective date: 20240402

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20240419

Year of fee payment: 7

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20240418

Year of fee payment: 7