EP3899936B1 - Source separation using an estimation and control of sound quality - Google Patents
Source separation using an estimation and control of sound quality
- Publication number
- EP3899936B1 (application EP19824332A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- signal
- estimated
- audio
- depending
- values
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/0308—Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
Definitions
- The present invention relates to source separation of audio signals, in particular to signal-adaptive control of the sound quality of separated output signals, and, in particular, to an apparatus and a method for source separation using an estimation and control of sound quality.
- Such processing typically introduces artifacts into the output signal that deteriorate the sound quality. This degradation of the sound quality increases monotonically with the amount of separation, i.e., with the attenuation of the interfering signals. Many applications do not require total separation but only a partial enhancement: the interfering sounds are attenuated but still present in the output signal.
- Partial masking of an audio signal means that its loudness (e.g., its perceived intensity) is partially reduced. It can furthermore be desired, or even required, that rather than achieving a large attenuation, the sound quality of the output does not fall below a defined sound quality level.
- An example of such an application is dialog enhancement.
- The audio signals in TV and radio broadcast and in movie sound are often mixtures of speech signals and background signals, e.g., environmental sounds and music.
- The listener may then have difficulty understanding what has been said, or the understanding requires very high listening effort, which results in listener fatigue.
- Methods for automatically reducing the level of the background can be applied in such scenarios, but the result should be of high sound quality.
- the first category of methods is based on formulated assumptions about the signal model and/or the mixing model.
- The signal model describes characteristics of the input signals, here s(n) and b(n).
- The mixing model describes characteristics of how the input signals are combined to yield the mixture signal x(n), here by means of addition.
- The method of Independent Component Analysis, for example, can be derived by assuming that the mixture comprises two source signals that are statistically independent, that the mixture has been captured by two microphones, and that the mixing has been performed by adding both signals (producing an instantaneous mixture). The inverse process of the mixing is then mathematically derived as the inversion of the mixing matrix, and the elements of this unmixing matrix are computed according to a specified method. Most analytically derived methods are obtained by formulating the separation problem as a numerical optimization of a criterion, e.g., the mean squared error between the true target and the estimated target.
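- For illustration, a minimal sketch of this instantaneous two-channel case follows; the use of scikit-learn's FastICA, the Laplacian stand-in sources, and all names are assumptions made for the example, not part of the patent text:

```python
# Minimal sketch of ICA unmixing for an instantaneous 2x2 mixture.
# FastICA is used purely for illustration; the text does not prescribe a
# particular implementation. Laplacian noise stands in for the
# (non-Gaussian, statistically independent) sources s(n) and b(n).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
s = rng.laplace(size=16000)            # stand-in for the target s(n)
b = rng.laplace(size=16000)            # stand-in for the interferer b(n)

A = np.array([[1.0, 0.6],
              [0.4, 1.0]])             # unknown mixing matrix
X = np.stack([s, b], axis=1) @ A.T     # two-microphone instantaneous mixture

ica = FastICA(n_components=2, random_state=0)
sources = ica.fit_transform(X)         # separated signals (up to scale/permutation)
W = ica.components_                    # estimated unmixing matrix
```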
- a second category is data driven.
- a representation of the target signals is estimated, or a set of parameters for retrieving the target signals from the input mixture is estimated.
- The estimation is based on a model that has been trained on a set of training data, hence the name "data driven".
- the estimation is derived by optimizing a criterion, e.g. by minimizing the mean squared error between the true target and the estimated target, given the training data.
- An example of this category are Artificial Neural Networks (ANNs) that have been trained to output an estimate of a speech signal given a mixture of the speech signal and an interfering signal.
- the adjustable parameters of the artificial neural network are determined such that a performance criterion computed for a set of training data is optimized - on average over the full data set.
- a solution that is optimal in a mean squared error sense or optimal with respect to any other numerical criterion is not necessarily the solution with the highest sound quality that is preferred by human listeners.
- A second problem stems from the fact that source separation always results in two effects: first, the desired attenuation of the interfering sounds, and second, the undesired degradation of the sound quality. Both effects are correlated, i.e., increasing the desired effect results in an increase of the undesired effect. The ultimate aim is to control the trade-off between both.
- Sound quality can be estimated, e.g., quantified by means of listening tests or by means of computational models of sound quality. Sound quality has multiple aspects, in the following referred to as Sound Quality Components (SQCs).
- the sound quality is determined by the perceived intensity of artifacts (these are signal components that have been introduced by a signal processing, e.g. source separation, and that decrease the sound quality).
- the sound quality is determined by the perceived intensity of interfering signals, or, e.g., by speech intelligibility (when the target signal is speech), or, for example, by the overall sound quality.
- In practice, the target signals s(n) (and the interfering signals b(n)) are not available; otherwise, the separation would not be required.
- Hence, the Sound Quality Components cannot be computed with such reference-based methods.
- Blind Source Separation Evaluation (BSSEval) (see [1]) is a multi-criteria performance evaluation toolbox.
- The estimated signal is decomposed by an orthogonal projection into a target signal component, interference from other sources, and artifacts. Metrics are computed as energy ratios of these components and expressed in dB: the Source to Distortion Ratio (SDR), the Source to Interference Ratio (SIR), and the Source to Artifact Ratio (SAR).
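- As an illustration only, the following sketch computes these three energy ratios, assuming the orthogonal decomposition into the three components has already been carried out as in [1]; the function name and signature are not taken from the toolbox:

```python
# Sketch of BSSEval-style energy ratios for one estimate that has already
# been decomposed into a target component, an interference component, and
# an artifact component (numpy arrays of equal length).
import numpy as np

def bss_ratios(s_target, e_interf, e_artif):
    def ratio_db(num, den):
        return 10.0 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))
    sdr = ratio_db(s_target, e_interf + e_artif)   # Source to Distortion Ratio
    sir = ratio_db(s_target, e_interf)             # Source to Interference Ratio
    sar = ratio_db(s_target + e_interf, e_artif)   # Source to Artifact Ratio
    return sdr, sir, sar
```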
- Perceptual Evaluation methods for Audio Source Separation (PEASS) (see [2]) was designed as a perceptually motivated successor of BSSEval.
- the signal projection is carried out on time segments and with a gammatone filterbank.
- PEMO-Q (see [3]) is used to provide multiple features.
- Four perceptual scores are obtained from these features using a neural network trained with subjective ratings.
- The scores are: the Overall Perceptual Score (OPS), the Interference-related Perceptual Score (IPS), the Artifact-related Perceptual Score (APS), and the Target-related Perceptual Score (TPS).
- Perceptual Evaluation of Audio Quality (PEAQ) (see [4]) is a metric designed for audio coding. It employs a peripheral ear model to calculate basilar-membrane representations of the reference and the test signal. Aspects of the difference between these representations are quantified by several output variables. By means of a neural network trained with subjective data, these variables are combined to give the main output, the Overall Difference Grade (ODG).
- Perceptual Evaluation of Speech Quality (PESQ) (see [5]) is a metric designed for speech transmitted over telecommunication networks.
- The method comprises a preprocessing that mimics a telephone handset. Measures for audible disturbances are computed from the specific loudness of the signals and combined into PESQ scores. From these, a MOS score is predicted by means of a polynomial mapping function (see [6]).
- ViSQOLAudio (see [7]) is a metric designed for music encoded at low bitrates, developed from the Virtual Speech Quality Objective Listener (ViSQOL). Both metrics are based on a model of the peripheral auditory system that creates internal representations of the signals called neurograms. These are compared via an adaptation of the structural similarity index, originally developed for evaluating the quality of compressed images.
- The Hearing-Aid Audio Quality Index (HAAQI) (see [8]) is an index designed to predict music quality for individuals listening through hearing aids. The index is based on a model of the auditory periphery, extended to include the effects of hearing loss, and is fitted to a database of quality ratings made by listeners having normal or impaired hearing. The hearing-loss simulation can be bypassed, in which case the index is also valid for normal-hearing people. Based on the same auditory model, the authors of HAAQI also proposed an index for speech quality, the Hearing-Aid Speech Quality Index (HASQI) (see [9]), and an index for speech intelligibility, the Hearing-Aid Speech Perception Index (HASPI) (see [10]).
- Short-Time Objective Intelligibility (STOI) is a metric designed to predict the intelligibility of speech.
- In prior work, an artificial neural network has been trained to estimate the Source to Distortion Ratio given only the input signal and the output estimated target signal, whereas the calculation of the Source to Distortion Ratio would normally also take the true target and the interfering signal as inputs.
- a pool of separation algorithms is run in parallel on the same input signal.
- the Source to Distortion Ratio estimates are used in order to select for each time frame the output from the algorithm with the best Source to Distortion Ratio. Hence, no control over the trade-off between sound quality and separation is formulated, and no control of the parameters of a separation algorithm is proposed.
- Furthermore, the Source to Distortion Ratio is used, which is not perceptually motivated and has been shown to correlate poorly with perceived quality, e.g., in [13].
- Prior art further discloses an audio processing device in which an audibility measure is used together with an artifact identification measure in order to control the time-frequency gains applied by the processing.
- This provides, e.g., that the amount of noise reduction is at a maximum level subject to the constraint that no artifacts are introduced; the trade-off between sound quality and separation is thus fixed.
- the system does not involve supervised learning.
- Instead, the Kurtosis Ratio is used: a measure that directly compares output and input signals (possibly in segments where speech is not present), without the need for the true target and the interfering signal. This simple measure is complemented by an audibility measure.
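- A minimal sketch of the kurtosis-ratio idea follows; the exact definition used in the cited device is not reproduced here, and in practice such measures are often computed on spectral power coefficients rather than on raw frames:

```python
# Sketch of a kurtosis ratio comparing output and input, e.g., evaluated
# in segments where speech is not present. A ratio well above 1 indicates
# that processing has made the amplitude distribution more peaked, which
# is associated with musical-noise artifacts.
from scipy.stats import kurtosis

def kurtosis_ratio(output_frame, input_frame):
    k_out = kurtosis(output_frame, fisher=False)   # plain fourth-moment kurtosis
    k_in = kurtosis(input_frame, fisher=False)
    return k_out / k_in
```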
- US 2017/251320 A1 discloses an apparatus and method for creating multilingual audio content based on a stereo audio signal.
- The method of creating multilingual audio content includes adjusting an energy value of each of a plurality of sound sources provided in multiple languages, setting an initial azimuth angle of each of the sound sources based on the number of sound sources, mixing the sound sources to generate a stereo signal based on the set initial azimuth angles, separating the sound sources to play the mixed sound sources using a sound source separation algorithm, and storing the mixed sound sources based on a sound quality of each of the separated sound sources.
- the object of the present invention is to provide improved concepts for source separation.
- the object of the present invention is solved by an apparatus according to claim 1, by a method according to claim 15 and by a computer program according to claim 16.
- the audio input signal comprises a target audio signal portion and a residual audio signal portion.
- the residual audio signal portion indicates a residual between the audio input signal and the target audio signal portion.
- the apparatus comprises a source separator, a determining module and a signal processor.
- the source separator is configured to determine an estimated target signal which depends on the audio input signal, the estimated target signal being an estimate of a signal that only comprises the target audio signal portion.
- the determining module is configured to determine one or more result values depending on an estimated sound quality of the estimated target signal to obtain one or more parameter values, wherein the one or more parameter values are the one or more result values or depend on the one or more result values.
- the signal processor is configured to generate the separated audio signal depending on the one or more parameter values and depending on at least one of the estimated target signal and the audio input signal and an estimated residual signal, the estimated residual signal being an estimate of a signal that only comprises the residual audio signal portion.
- the audio input signal comprises a target audio signal portion and a residual audio signal portion.
- The residual audio signal portion indicates a residual between the audio input signal and the target audio signal portion. The method comprises: determining an estimated target signal which depends on the audio input signal; determining one or more result values depending on an estimated sound quality of the estimated target signal to obtain one or more parameter values; and generating the separated audio signal depending on the one or more parameter values and depending on at least one of the estimated target signal, the audio input signal, and an estimated residual signal.
- Fig. 1a illustrates an apparatus for generating a separated audio signal from an audio input signal according to an embodiment.
- the audio input signal comprises a target audio signal portion and a residual audio signal portion.
- the residual audio signal portion indicates a residual between the audio input signal and the target audio signal portion.
- the apparatus comprises a source separator 110, a determining module 120 and a signal processor 130.
- the source separator 110 is configured to determine an estimated target signal which depends on the audio input signal, the estimated target signal being an estimate of a signal that only comprises the target audio signal portion.
- the determining module 120 is configured to determine one or more result values depending on an estimated sound quality of the estimated target signal to obtain one or more parameter values, wherein the one or more parameter values are the one or more result values or depend on the one or more result values.
- the signal processor 130 is configured to generate the separated audio signal depending on the one or more parameter values and depending on at least one of the estimated target signal and the audio input signal and an estimated residual signal.
- the estimated residual signal is an estimate of a signal that only comprises the residual audio signal portion.
- the determining module 120 may, e.g., be configured to determine the one or more result values depending on the estimated target signal and depending on at least one of the audio input signal and the estimated residual signal.
- Embodiments provide a perceptually motivated, signal-adaptive control over the trade-off between sound quality and separation using supervised learning. This can be achieved in two ways. The first approach estimates the sound quality of the output signal and uses this estimate to adapt the parameters of the separation or of a post-processing of the separated signals. In the second approach, a regression method directly outputs the control parameters such that the sound quality of the output signal meets predefined requirements.
- In the first case, the input signal and the output signal of the separation are analysed to yield an estimate of the sound quality q_m, and processing parameters are determined based on q_m such that the sound quality of the output (when using the determined processing parameters) is not lower than a defined quality value.
- The analysis outputs a quality measure q_m as in (9).
- From this, a control parameter p_1 in formula (13) below is computed (e.g., a scaling factor), and the final output is obtained by mixing the initial output and the input as in formula (13) below.
- the control parameter may, e.g., be a smoothing parameter or the like.
- Alternatively, the analysis yields the control parameter p_1 in (13) directly; see Fig. 3.
- Figs. 4 and 5 illustrate further embodiments.
- Some embodiments achieve a control of sound quality in a post-processing step, as described below.
- a subset of the herein described embodiments can be applied independently of the separation method. Some herein described embodiments control parameters of the separation process.
- Source separation using spectral weighting processes signals in the time-frequency domain or a short-time spectral domain.
- The input signal x(n) is transformed by means of the short-time Fourier transform (STFT) or processed by means of a filterbank, yielding complex-valued STFT coefficients or subband signals X(m, k), where m denotes the time frame index and k denotes the frequency bin index or the subband index.
- The complex-valued STFT coefficients or subband signals of the desired signal are S(m, k), and those of the interfering signal are B(m, k).
- The aim is to attenuate elements of X(m, k) where the interferer B(m, k) is large.
- The spectral weights can be further modified, e.g., by thresholding such that G is larger than a threshold v.
- Increasing the threshold v reduces the attenuation of the interferer and reduces the potential degradation of the sound quality.
- The output signal ŝ(n) is then computed using the inverse processing of the STFT or filterbank.
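- A minimal sketch of this spectral-weighting chain is given below; the gain rule shown is a generic Wiener-like placeholder, since the patent's Equations (5) and (6) (with parameters a, c, and the threshold v) are not reproduced in this excerpt:

```python
# Sketch of source separation by spectral weighting in the STFT domain.
# b_mag_est is an estimate of the interferer magnitude |B(m, k)| with the
# same shape as the STFT of x; the gain rule is a placeholder.
import numpy as np
from scipy.signal import stft, istft

def separate_spectral_weighting(x, fs, b_mag_est, v=0.1):
    _, _, X = stft(x, fs=fs, nperseg=1024)        # complex coefficients X(m, k)
    eps = 1e-12
    G = np.maximum(np.abs(X) - b_mag_est, 0.0) / (np.abs(X) + eps)
    G = np.maximum(G, v)                          # thresholding: G >= v
    _, s_hat = istft(G * X, fs=fs, nperseg=1024)  # inverse processing
    return s_hat
```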
- a representation of the target signal can also be estimated directly from the input signal, e.g. by means of an artificial neural network.
- an artificial neural network has been trained to estimate the target time signal, or its STFT coefficients, or the magnitudes of the STFT coefficients.
- The supervised learning method g(·) can be realized, for example, by an artificial neural network, as described in the following.
- An application of supervised learning methods for quality control of the separated output signal is provided.
- Fig. 1b illustrates an embodiment, where the determining module 120 comprises an artificial neural network 125.
- the artificial neural network 125 may, e.g., be configured to determine the one or more result values depending on the estimated target signal.
- the artificial neural network 125 may, e.g., be configured to receive a plurality of input values, each of the plurality of input values depending on at least one of the estimated target signal and the estimated residual signal and the audio input signal.
- the artificial neural network 125 may, e.g., be configured to determine the one or more result values as one or more output values of the artificial neural network 125.
- the artificial neural network 125 may, e.g., be configured to determine the one or more result values depending on the estimated target signal and at least one of the audio input signal and the estimated residual signal.
- each of the plurality of input values may, e.g., depend on at least one of the estimated target signal and the estimated residual signal and the audio input signal.
- the one or more result values may, e.g., indicate the estimated sound quality of the estimated target signal.
- each of the plurality of input values may, e.g., depend on at least one of the estimated target signal and the estimated residual signal and the audio input signal.
- the one or more result values may, e.g., be the one or more parameter values.
- The artificial neural network 125 may, e.g., be configured to be trained by receiving a plurality of training sets, wherein each of the plurality of training sets comprises a plurality of input training values of the artificial neural network 125 and one or more output training values of the artificial neural network 125, wherein each of the plurality of output training values may, e.g., depend on at least one of a training target signal, a training residual signal, and a training input signal, and wherein each of the one or more output training values may, e.g., depend on an estimation of a sound quality of the training target signal.
- An estimate for a Sound Quality Component is obtained by means of supervised learning, using a supervised learning model (SLM), e.g., an Artificial Neural Network (ANN) 125.
- The Artificial Neural Network 125 can, for example, be a fully connected Artificial Neural Network 125 that comprises an input layer with A units, at least one hidden layer with at least two units each, and an output layer with one or more units.
- the supervised learning model can be implemented as a regression model or a classification model.
- A regression model estimates the target value at the output of one unit in the output layer.
- The regression problem can be reformulated as a classification problem by quantizing the output value into at least 3 steps and using an output layer with C units, where C equals the number of quantization steps.
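- For example, with quality scores assumed to lie in [0, 1], such a quantization could look as follows (the value range and the number of steps C are assumptions for the example):

```python
# Sketch of turning the regression target into a classification target by
# quantizing the quality score q into C steps, as described above.
import numpy as np

C = 5                                    # number of quantization steps (assumed)
edges = np.linspace(0.0, 1.0, C + 1)     # assumes quality scores in [0, 1]

def quantize(q):
    # map q to a class index in 0..C-1
    return int(np.clip(np.digitize(q, edges) - 1, 0, C - 1))
```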
- The supervised learning model is first trained with a data set that contains multiple examples of the mixture signal x, the estimated target ŝ, and the Sound Quality Component q_m, where the Sound Quality Component has been computed from the estimated target ŝ and the true target s, for example.
- One item of the data set is denoted by {x_i, ŝ_i, q_i}.
- The output of the supervised learning model for item i is denoted by q̂_i.
- The number of units A in the input layer corresponds to the number of input values.
- The inputs to the model are computed from the input signals.
- Each signal can optionally be processed by means of a filterbank or a time-frequency transform, e.g., a short-time Fourier transform (STFT).
- With B being the total number of spectral coefficients per frame and D the number of frames, the total number of input coefficients for the two signals is 2 · B · D.
- The number of input values A then equals this number of input coefficients. All weights w_i and offsets o_i are parameters of the Artificial Neural Network 125 that are determined in the training procedure.
- the units of one layer are connected to the units of the following layer, the outputs of the units of a preceding layer are the inputs to the units of the next layer.
- the training is carried out by minimizing the prediction error using a numerical optimization method, e.g. a gradient descent method.
- The prediction error over the full data set (or over a subset of the data set) that is used as the optimization criterion is, for example, the mean squared error (MSE) or the mean absolute error (MAE), where N denotes the number of items in the data set and e_i denotes the prediction error q_i − q̂_i for the i-th item:
- MSE = (1/N) · Σ_{i=1}^{N} e_i²
- MAE = (1/N) · Σ_{i=1}^{N} |e_i|
- The mapping function is controlled by a set of parameters (e.g., w_i and o_i) that are determined in a training procedure by optimizing a scalar criterion.
- After training, the supervised learning model can be used for the estimation of the sound quality of an unknown estimated target ŝ given the mixture, without the need for the true target s.
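- A minimal training sketch under these definitions follows; the layer sizes, the feature dimensions, and the use of PyTorch are illustrative assumptions, not prescribed by the text:

```python
# Sketch of the supervised learning model: a fully connected network with
# A input units and one output unit (regression), trained to minimize the
# MSE between predicted and true quality scores.
import torch
import torch.nn as nn

B, D = 513, 10                  # spectral coefficients per frame, frames (assumed)
A = 2 * B * D                   # input coefficients for the two signals x and s_hat

model = nn.Sequential(
    nn.Linear(A, 128), nn.ReLU(),   # hidden layer
    nn.Linear(128, 1),              # one output unit estimating q
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(features, q_true):   # features: (batch, A), q_true: (batch, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(features), q_true)
    loss.backward()                  # numerical optimization of the criterion
    optimizer.step()
    return loss.item()
```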
- the estimation of the sound quality of the training target signal may, e.g., depend on one or more computational models of sound quality.
- The estimation of the sound quality of the training target signal may, e.g., depend on one or more of the following computational models of sound quality: Blind Source Separation Evaluation, Perceptual Evaluation methods for Audio Source Separation, Perceptual Evaluation of Audio Quality, Perceptual Evaluation of Speech Quality, Virtual Speech Quality Objective Listener Audio, Hearing-Aid Audio Quality Index, Hearing-Aid Speech Quality Index, Hearing-Aid Speech Perception Index, and Short-Time Objective Intelligibility.
- The control of sound quality can be implemented by estimating the Sound Quality Components and computing processing parameters based on the Sound Quality Component estimates, or by directly estimating optimal processing parameters such that the Sound Quality Components meet a target value q_0 (or do not fall below that target).
- The target value for the sound quality, q_0, determines the trade-off between separation and sound quality. This parameter can be controlled by the user, or it is specified depending on the sound reproduction scenario. Sound reproduction at home in a quiet environment over high-quality equipment may benefit from higher sound quality and lower separation. Sound reproduction in vehicles, or in a noisy environment over loudspeakers built into a smartphone, may benefit from lower sound quality but higher separation and speech intelligibility.
- the estimated quantities can be further applied to either control a post-processing or to control a secondary separation.
- Fig. 2 illustrates an apparatus according to an embodiment which is configured to use an estimation of sound quality and which is configured to conduct post-processing.
- the determining module 120 may, e.g., be configured to estimate, depending on at least one of the estimated target signal and the audio input signal and the estimated residual signal, a sound quality value as the one or more result values, wherein the sound quality value indicates the estimated sound quality of the estimated target signal.
- the determining module 120 may, e.g., be configured to determine the one or more parameter values depending on the sound quality value.
- The determining module 120 may, e.g., be configured to determine, depending on the estimated sound quality of the estimated target signal, a control parameter as the one or more parameter values.
- The signal processor 130 may, e.g., be configured to determine the separated audio signal depending on the control parameter and depending on at least one of the estimated target signal, the audio input signal, and the estimated residual signal.
- In a first step, the separation is applied.
- The separated signal and the unprocessed signal are the inputs to a Quality Estimation Module (QEM).
- The QEM computes an estimate for the Sound Quality Components, q̂(n).
- The estimated Sound Quality Components q̂(n) are used to compute a set of parameters p̂(n) for controlling the post-processing.
- The variables q(n), q̂(n), p(n), and p̂(n) can be time varying, but the time dependency is omitted in the following for the sake of a clear notation.
- The signal processor 130 may, e.g., be configured to determine the separated audio signal depending on formula (13), wherein y is the separated audio signal, ŝ is the estimated target signal, x is the audio input signal, p_1 is the control parameter, and n is an index.
- This function f can be, for example, an iterative exhaustive search, as illustrated by the sketch below.
- When the processing parameter p controls a post-processing as in Equation (13), q̂ is computed for a fixed number of values of p_1, e.g., corresponding to 18, 12, and 6 dB of relative amplification of ŝ.
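- The patent's pseudocode is not reproduced in this excerpt; the sketch below illustrates one plausible reading, in which y(n) = p_1 · ŝ(n) + x(n) stands in for Equation (13) and estimate_quality stands in for the QEM (both are assumptions for the example):

```python
# Sketch of an iterative exhaustive search over the control parameter p_1.
# The mixing rule y = p1 * s_hat + x is an ASSUMED reading of Eq. (13);
# estimate_quality(y, x) stands in for the QEM and q0 is the quality target.
def search_p1(x, s_hat, estimate_quality, q0, gains_db=(18.0, 12.0, 6.0)):
    for gain_db in gains_db:              # from strong to mild separation
        p1 = 10.0 ** (gain_db / 20.0)     # relative amplification of s_hat
        y = p1 * s_hat + x                # candidate output (assumed Eq. (13))
        if estimate_quality(y, x) >= q0:  # keep the strongest separation that
            return y, p1                  # still meets the quality target
    return x, 0.0                         # fall back to the unprocessed input
```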
- the signal processor 130 may, e.g., be configured to generate the separated audio signal by determining a first version of the separated audio signal and by modifying the separated audio signal one or more times to obtain one or more intermediate versions of the separated audio signal.
- The determining module 120 may, e.g., be configured to modify the sound quality value depending on one of the one or more intermediate versions of the separated audio signal.
- The signal processor 130 may, e.g., be configured to stop modifying the separated audio signal if the sound quality value is greater than or equal to a defined quality value.
- Fig. 3 illustrates an apparatus according to another embodiment, wherein direct estimation of post-processing parameters is conducted.
- the separation is applied.
- the separated signals are the input to a Parameter Estimation Module (PEM).
- the estimated parameters are applied for controlling the post-processing.
- The PEM has been trained to directly estimate p(n) from the separated signal ŝ(n) and the input signal x(n). This means that the operation in Eq. (14) is moved to the training phase and the regression method is trained to estimate p̂ instead of q̂.
- In other words, a function mapping the input signal and the separated signal directly to the processing parameters, p̂ = f(x(n), ŝ(n)), is learned.
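- A minimal inference-time sketch of this direct variant follows; pem and features are assumed placeholders for the trained model and its feature extraction, and the mixing rule is the same assumed reading of Eq. (13) as above:

```python
# Sketch of the direct variant: the trained PEM outputs the control
# parameter itself, so no quality estimate is needed at inference time.
def apply_direct(x, s_hat, pem, features):
    p1 = float(pem(features(x, s_hat)))   # estimated control parameter p_1
    return p1 * s_hat + x                 # assumed mixing rule, see above
```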
- the signal processor 130 may, e.g., be configured to generate the separated audio signal depending on the one or more parameter values and depending on a postprocessing of the estimated target signal.
- Fig. 4 illustrates an apparatus according to a further embodiment, in accordance with the invention, wherein estimation of sound quality and secondary separation is conducted.
- the separated signals are the input to a QEM.
- the estimated Sound Quality Components are used to compute a set of parameters for controlling secondary separation.
- the signal processor 130 is configured to generate the separated audio signal depending on the one or more parameter values and depending on a linear combination of the estimated target signal and the audio input signal, or the signal processor 130 is configured to generate the separated audio signal depending on the one or more parameter values and depending on a linear combination of the estimated target signal and the estimated residual signal.
- Suitable parameters for controlling the secondary separation are, for example, parameters that modify the spectral weights.
- Fig. 5 illustrates an apparatus according to another embodiment, wherein direct estimation of separation parameters is conducted.
- the separated signals are the input to a PEM.
- the estimated parameters control the secondary separation.
- Suitable parameters are, for example, a and c from Equations (5) and (6), and the threshold v, as described above.
- Figs. 4 and 5 depict an iterative processing with one iteration. In general, this can be repeated multiple times and implemented in a loop.
- Without quality estimation in between, such iterative processing is very similar to other prior methods that concatenate multiple separations.
- Such an approach may, e.g., be suitable for combining multiple different methods (which is better than repeating one method); a sketch of the loop follows below.
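- A minimal sketch of such a loop with quality estimation in between is given below; separate_fn, estimate_quality, and the way the threshold v is adapted are assumptions for the example:

```python
# Sketch of iterative separation (Figs. 4 and 5): re-run the (secondary)
# separation with milder parameters until the quality target q0 is met.
def iterative_separation(x, separate_fn, estimate_quality, q0, max_iter=3):
    params = {"v": 0.1}                    # assumed initial spectral-weight floor
    s_hat = separate_fn(x, params)
    for _ in range(max_iter):
        if estimate_quality(s_hat, x) >= q0:
            break                          # quality target met, stop iterating
        params["v"] *= 2.0                 # raise threshold: less separation,
        s_hat = separate_fn(x, params)     # higher sound quality
    return s_hat
```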
- aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
- Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
- embodiments of the invention can be implemented in hardware or in software or at least partially in hardware or at least partially in software.
- the implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
- Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the inventive methods described herein is performed.
- embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the inventive methods when the computer program product runs on a computer.
- the program code may for example be stored on a machine readable carrier.
- Further embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.
- an embodiment of the invention is, therefore, a computer program having a program code for performing one of the inventive methods described herein, when the computer program runs on a computer.
- a further embodiment of the invention is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the inventive methods described herein.
- the data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
- a further embodiment of the invention is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the inventive methods described herein.
- the data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
- A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured to or adapted to perform one of the methods described herein.
- a further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
- a further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the inventive methods described herein to a receiver.
- the receiver may, for example, be a computer, a mobile device, a memory device or the like.
- the apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
- In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein.
- In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein.
- the methods are preferably performed by any hardware apparatus.
- the apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
- the methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
Claims (16)
- 1. Apparatus for generating a separated audio signal from an audio input signal, wherein the audio input signal comprises a target audio signal portion and a residual audio signal portion, wherein the residual audio signal portion indicates a residual between the audio input signal and the target audio signal portion, wherein the apparatus comprises: a source separator (110) for determining an estimated target signal which depends on the audio input signal, the estimated target signal being an estimate of a signal that only comprises the target audio signal portion; a determining module (120), wherein the determining module (120) is configured to determine one or more result values depending on an estimated sound quality of the estimated target signal to obtain one or more parameter values, wherein the one or more parameter values are the one or more result values or depend on the one or more result values; and a signal processor (130) for generating the separated audio signal depending on the one or more parameter values and depending on at least one of the estimated target signal, the audio input signal, and an estimated residual signal, the estimated residual signal being an estimate of a signal that only comprises the residual audio signal portion; wherein the signal processor (130) is configured to generate the separated audio signal depending on the one or more parameter values and depending on a linear combination of the estimated target signal and the audio input signal, or wherein the signal processor (130) is configured to generate the separated audio signal depending on the one or more parameter values and depending on a linear combination of the estimated target signal and the estimated residual signal.
- 2. Apparatus according to claim 1, wherein the determining module (120) is configured to determine, depending on the estimated sound quality of the estimated target signal, a control parameter as the one or more parameter values, and wherein the signal processor (130) is configured to determine the separated audio signal depending on the control parameter and depending on at least one of the estimated target signal, the audio input signal, and the estimated residual signal.
- 3. Apparatus according to claim 2, wherein the signal processor (130) is configured to determine the separated audio signal depending on a formula in which y is the separated audio signal, ŝ is the estimated target signal, x is the audio input signal, b̂ is the estimated residual signal, p_1 is the control parameter, and n is an index.
- 4. Apparatus according to claim 2 or 3, wherein the determining module (120) is configured to estimate, depending on at least one of the estimated target signal, the audio input signal, and the estimated residual signal, a sound quality value as the one or more result values, wherein the sound quality value indicates the estimated sound quality of the estimated target signal, and wherein the determining module (120) is configured to determine the one or more parameter values depending on the sound quality value.
- 5. Apparatus according to claim 4, wherein the signal processor (130) is configured to generate the separated audio signal by determining a first version of the separated audio signal and by modifying the separated audio signal one or more times to obtain one or more intermediate versions of the separated audio signal, wherein the determining module (120) is configured to modify the sound quality value depending on one of the one or more intermediate versions of the separated audio signal, and wherein the signal processor (130) is configured to stop modifying the separated audio signal if the sound quality value is greater than or equal to a defined quality value.
- 6. Apparatus according to one of the preceding claims, wherein the determining module (120) is configured to determine the one or more result values depending on the estimated target signal and depending on at least one of the audio input signal and the estimated residual signal.
- 7. Apparatus according to one of the preceding claims, wherein the determining module (120) comprises an artificial neural network (125) for determining the one or more result values depending on the estimated target signal, wherein the artificial neural network (125) is configured to receive a plurality of input values, each of the plurality of input values depending on at least one of the estimated target signal, the estimated residual signal, and the audio input signal, and wherein the artificial neural network (125) is configured to determine the one or more result values as one or more output values of the artificial neural network (125).
- 8. Apparatus according to claim 7, wherein each of the plurality of input values depends on at least one of the estimated target signal, the estimated residual signal, and the audio input signal, and wherein the one or more result values indicate the estimated sound quality of the estimated target signal.
- 9. Apparatus according to claim 7, wherein each of the plurality of input values depends on at least one of the estimated target signal, the estimated residual signal, and the audio input signal, and wherein the one or more result values are the one or more parameter values.
- 10. Apparatus according to one of claims 7 to 9, wherein the artificial neural network (125) is configured to be trained by receiving a plurality of training sets, wherein each of the plurality of training sets comprises a plurality of input training values of the artificial neural network (125) and one or more output training values of the artificial neural network (125), wherein each of the plurality of output training values depends on at least one of a training target signal, a training residual signal, and a training input signal, and wherein each of the one or more output training values depends on an estimation of a sound quality of the training target signal.
- 11. Apparatus according to claim 10, wherein the estimation of the sound quality of the training target signal depends on one or more computational models of sound quality.
- 12. Apparatus according to claim 11, wherein the one or more computational models of sound quality are at least one of: Blind Source Separation Evaluation; Perceptual Evaluation methods for Audio Source Separation; Perceptual Evaluation of Audio Quality; Perceptual Evaluation of Speech Quality; Virtual Speech Quality Objective Listener Audio; Hearing-Aid Audio Quality Index; Hearing-Aid Speech Quality Index; Hearing-Aid Speech Perception Index; and Short-Time Objective Intelligibility.
- 13. Apparatus according to one of claims 7 to 12, wherein the artificial neural network (125) is configured to determine the one or more result values depending on the estimated target signal and depending on at least one of the audio input signal and the estimated residual signal.
- 14. Apparatus according to one of the preceding claims, wherein the signal processor (130) is configured to generate the separated audio signal depending on the one or more parameter values and depending on a post-processing of the estimated target signal.
- 15. Method for generating a separated audio signal from an audio input signal, wherein the audio input signal comprises a target audio signal portion and a residual audio signal portion, wherein the residual audio signal portion indicates a residual between the audio input signal and the target audio signal portion, wherein the method comprises: determining an estimated target signal which depends on the audio input signal, the estimated target signal being an estimate of a signal that only comprises the target audio signal portion; determining one or more result values depending on an estimated sound quality of the estimated target signal to obtain one or more parameter values, wherein the one or more parameter values are the one or more result values or depend on the one or more result values; and generating the separated audio signal depending on the one or more parameter values and depending on at least one of the estimated target signal, the audio input signal, and an estimated residual signal, the estimated residual signal being an estimate of a signal that only comprises the residual audio signal portion; wherein generating the separated audio signal is performed depending on the one or more parameter values and depending on a linear combination of the estimated target signal and the audio input signal, or wherein generating the separated audio signal is performed depending on the one or more parameter values and depending on a linear combination of the estimated target signal and the estimated residual signal.
- 16. Computer program for implementing the method according to claim 15 when being executed on a computer or signal processor.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18215707.3A EP3671739A1 (fr) | 2018-12-21 | 2018-12-21 | Apparatus and method for source separation using an estimation and control of sound quality |
PCT/EP2019/086565 WO2020127900A1 (fr) | 2018-12-21 | 2019-12-20 | Apparatus and method for source separation using an estimation and control of sound quality |
Publications (3)
Publication Number | Publication Date |
---|---|
EP3899936A1 (fr) | 2021-10-27 |
EP3899936B1 (fr) | 2023-09-06 |
EP3899936C0 (fr) | 2023-09-06 |
Family
ID=65011753
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP18215707.3A Withdrawn EP3671739A1 (fr) | 2018-12-21 | 2018-12-21 | Apparatus and method for source separation using an estimation and control of sound quality |
EP19824332.1A Active EP3899936B1 (fr) | 2018-12-21 | 2019-12-20 | Source separation using an estimation and control of sound quality |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP18215707.3A Withdrawn EP3671739A1 (fr) | 2018-12-21 | 2018-12-21 | Apparatus and method for source separation using an estimation and control of sound quality |
Country Status (10)
Country | Link |
---|---|
US (1) | US20210312939A1 (fr) |
EP (2) | EP3671739A1 (fr) |
JP (1) | JP7314279B2 (fr) |
KR (1) | KR102630449B1 (fr) |
CN (1) | CN113574597B (fr) |
BR (1) | BR112021012308A2 (fr) |
CA (1) | CA3124017C (fr) |
ES (1) | ES2966063T3 (fr) |
MX (1) | MX2021007323A (fr) |
WO (1) | WO2020127900A1 (fr) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116997962A (zh) * | 2020-11-30 | 2023-11-03 | 杜比国际公司 | Robust intrusive perceptual audio quality assessment based on convolutional neural networks |
CN113470689B (zh) * | 2021-08-23 | 2024-01-30 | 杭州国芯科技股份有限公司 | A speech separation method |
WO2023073596A1 (fr) * | 2021-10-27 | 2023-05-04 | WingNut Films Productions Limited | Workflow systems and methods for audio source separation processing |
US11763826B2 (en) | 2021-10-27 | 2023-09-19 | WingNut Films Productions Limited | Audio source separation processing pipeline systems and methods |
US20230126779A1 (en) * | 2021-10-27 | 2023-04-27 | WingNut Films Productions Limited | Audio Source Separation Systems and Methods |
CN113850246B (zh) * | 2021-11-30 | 2022-02-18 | 杭州一知智能科技有限公司 | Method and system for sound source localization and sound source separation based on a dual-consistency network |
CN117475360B (zh) * | 2023-12-27 | 2024-03-26 | 南京纳实医学科技有限公司 | Biometric feature extraction and analysis method for audio-video features based on an improved MLSTM-FCN |
Family Cites Families (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1808571A (zh) * | 2005-01-19 | 2006-07-26 | 松下电器产业株式会社 | Sound signal separation system and method |
US7464029B2 (en) * | 2005-07-22 | 2008-12-09 | Qualcomm Incorporated | Robust separation of speech signals in a noisy environment |
EP2375409A1 (fr) * | 2010-04-09 | 2011-10-12 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio encoder, audio decoder and related methods for processing multi-channel audio signals using complex prediction |
DE102011084035A1 (de) * | 2011-10-05 | 2013-04-11 | Nero Ag | Device, method and computer program for evaluating a perceived audio quality |
EP2747081A1 (fr) | 2012-12-18 | 2014-06-25 | Oticon A/s | An audio processing device comprising artifact reduction |
SG11201507066PA (en) * | 2013-03-05 | 2015-10-29 | Fraunhofer Ges Forschung | Apparatus and method for multichannel direct-ambient decomposition for audio signal processing |
EP2790419A1 (fr) * | 2013-04-12 | 2014-10-15 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for center signal scaling and stereophonic enhancement based on a signal-to-downmix ratio |
GB2516483B (en) * | 2013-07-24 | 2018-07-18 | Canon Kk | Sound source separation method |
JP6143887B2 (ja) * | 2013-12-26 | 2017-06-07 | 株式会社東芝 | Method, electronic device and program |
WO2016033269A1 (fr) * | 2014-08-28 | 2016-03-03 | Analog Devices, Inc. | Audio processing using an intelligent microphone |
US10397711B2 (en) * | 2015-09-24 | 2019-08-27 | Gn Hearing A/S | Method of determining objective perceptual quantities of noisy speech signals |
MX2018003529A (es) * | 2015-09-25 | 2018-08-01 | Fraunhofer Ges Forschung | Encoder and method for encoding an audio signal with reduced background noise using linear predictive coding |
KR20170101629A (ko) * | 2016-02-29 | 2017-09-06 | 한국전자통신연구원 | Apparatus and method for providing a multilingual audio service based on a stereo audio signal |
EP3220661B1 (fr) * | 2016-03-15 | 2019-11-20 | Oticon A/s | A method of predicting the intelligibility of noisy and/or enhanced speech and a binaural hearing system |
EP3453187B1 (fr) * | 2016-05-25 | 2020-05-13 | Huawei Technologies Co., Ltd. | Audio signal processing stage, audio signal processing apparatus and audio signal processing method |
DK3252766T3 (da) * | 2016-05-30 | 2021-09-06 | Oticon As | Audio processing device and method for estimating the signal-to-noise ratio of a sound signal |
US10861478B2 (en) * | 2016-05-30 | 2020-12-08 | Oticon A/S | Audio processing device and a method for estimating a signal-to-noise-ratio of a sound signal |
CN106531190B (zh) * | 2016-10-12 | 2020-05-05 | 科大讯飞股份有限公司 | Voice quality evaluation method and apparatus |
CN106847301A (zh) * | 2017-01-03 | 2017-06-13 | 东南大学 | A binaural speech separation method based on compressed sensing and spatial direction information |
EP3474280B1 (fr) * | 2017-10-19 | 2021-07-07 | Goodix Technology (HK) Company Limited | Signal processor for speech signal enhancement |
CN107993671A (zh) * | 2017-12-04 | 2018-05-04 | 南京地平线机器人技术有限公司 | Sound processing method, apparatus and electronic device |
EP3573058B1 (fr) * | 2018-05-23 | 2021-02-24 | Harman Becker Automotive Systems GmbH | Dry sound and ambient sound separation |
- 2018
- 2018-12-21 EP EP18215707.3A patent/EP3671739A1/fr not_active Withdrawn
- 2019
- 2019-12-20 MX MX2021007323A patent/MX2021007323A/es unknown
- 2019-12-20 EP EP19824332.1A patent/EP3899936B1/fr active Active
- 2019-12-20 WO PCT/EP2019/086565 patent/WO2020127900A1/fr active Search and Examination
- 2019-12-20 KR KR1020217023148A patent/KR102630449B1/ko active IP Right Grant
- 2019-12-20 BR BR112021012308-3A patent/BR112021012308A2/pt unknown
- 2019-12-20 ES ES19824332T patent/ES2966063T3/es active Active
- 2019-12-20 JP JP2021535739A patent/JP7314279B2/ja active Active
- 2019-12-20 CA CA3124017A patent/CA3124017C/fr active Active
- 2019-12-20 CN CN201980092879.8A patent/CN113574597B/zh active Active
- 2021
- 2021-06-21 US US17/353,297 patent/US20210312939A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
EP3671739A1 (fr) | 2020-06-24 |
BR112021012308A2 (pt) | 2021-09-08 |
ES2966063T3 (es) | 2024-04-18 |
CA3124017C (fr) | 2024-01-16 |
WO2020127900A1 (fr) | 2020-06-25 |
KR102630449B1 (ko) | 2024-01-31 |
JP7314279B2 (ja) | 2023-07-25 |
CN113574597B (zh) | 2024-04-12 |
JP2022514878A (ja) | 2022-02-16 |
CA3124017A1 (fr) | 2020-06-25 |
MX2021007323A (es) | 2021-08-24 |
EP3899936C0 (fr) | 2023-09-06 |
KR20210110622A (ko) | 2021-09-08 |
CN113574597A (zh) | 2021-10-29 |
US20210312939A1 (en) | 2021-10-07 |
EP3899936A1 (fr) | 2021-10-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3899936B1 (fr) | Source separation using an estimation and control of sound quality | |
Choi et al. | Real-time denoising and dereverberation with tiny recurrent U-Net | |
CN111785288B (zh) | Speech enhancement method, apparatus, device and storage medium | |
Ren et al. | A Causal U-Net Based Neural Beamforming Network for Real-Time Multi-Channel Speech Enhancement. | |
US20110218803A1 (en) | Method and system for assessing intelligibility of speech represented by a speech signal | |
CN109979478A (zh) | 语音降噪方法及装置、存储介质及电子设备 | |
Braun et al. | Effect of noise suppression losses on speech distortion and ASR performance | |
CN113744749B (zh) | A speech enhancement method and system based on a psychoacoustic-domain weighted loss function | |
US20090161882A1 (en) | Method of Measuring an Audio Signal Perceived Quality Degraded by a Noise Presence | |
Záviška et al. | Psychoacoustically motivated audio declipping based on weighted ℓ1 minimization | |
US11224360B2 (en) | Systems and methods for evaluating hearing health | |
Ghorpade et al. | Single-channel speech enhancement using single dimension change accelerated particle swarm optimization for subspace partitioning | |
Uhle et al. | Speech enhancement of movie sound | |
RU2782364C1 (ru) | Apparatus and method for source separation using sound quality estimation and control | |
Miyazaki et al. | Theoretical analysis of parametric blind spatial subtraction array and its application to speech recognition performance prediction | |
CN110168640A (zh) | 用于增强信号中需要分量的装置和方法 | |
Lee et al. | Speech Enhancement for Virtual Meetings on Cellular Networks | |
Li et al. | Joint Noise Reduction and Listening Enhancement for Full-End Speech Enhancement | |
Muhammed Shifas et al. | Speech intelligibility enhancement based on a non-causal WaveNet-like model | |
Pourmand et al. | Computational auditory models in predicting noise reduction performance for wideband telephony applications | |
Langjahr et al. | Objective quality assessment of target speaker separation performance in multisource reverberant environment | |
US20240363133A1 (en) | Noise suppression model using gated linear units | |
KR102505653B1 (ko) | Method and apparatus for integrated cancellation of echo and noise using a deep neural network | |
Freiwald et al. | Loss Functions for Deep Monaural Speech Enhancement | |
Mahé et al. | Correction of the voice timbre distortions in telephone networks: method and evaluation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: UNKNOWN |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20210616 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
DAV | Request for validation of the European patent (deleted) |
DAX | Request for extension of the European patent (deleted) |
RAP3 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: FRAUNHOFER-GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V. |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
INTG | Intention to grant announced |
Effective date: 20230322 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE PATENT HAS BEEN GRANTED |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 602019037001 Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
U01 | Request for unitary effect filed |
Effective date: 20231005 |
|
U07 | Unitary effect registered |
Designated state(s): AT BE BG DE DK EE FI FR IT LT LU LV MT NL PT SE SI Effective date: 20231017 |
|
U20 | Renewal fee paid [unitary effect] |
Year of fee payment: 5 Effective date: 20231012 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20231218 Year of fee payment: 5 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: RS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230906 Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20231206 Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230906 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20231207 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: TR Payment date: 20231129 Year of fee payment: 5 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: ES Payment date: 20240102 Year of fee payment: 5 |
|
REG | Reference to a national code |
Ref country code: ES Ref legal event code: FG2A Ref document number: 2966063 Country of ref document: ES Kind code of ref document: T3 Effective date: 20240418 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SM Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230906 Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230906 Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20240106 Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230906 Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230906 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230906 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602019037001 Country of ref document: DE |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
26N | No opposition filed |
Effective date: 20240607 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MC Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230906 |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: MM4A |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20231220 |