EP2912660B1 - Method for determining a dictionary of base components from an audio signal - Google Patents

Method for determining a dictionary of base components from an audio signal

Info

Publication number
EP2912660B1
EP2912660B1 EP12794680.4A EP12794680A EP2912660B1 EP 2912660 B1 EP2912660 B1 EP 2912660B1 EP 12794680 A EP12794680 A EP 12794680A EP 2912660 B1 EP2912660 B1 EP 2912660B1
Authority
EP
European Patent Office
Prior art keywords
matrix
denotes
negative
symbol
base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
EP12794680.4A
Other languages
German (de)
French (fr)
Other versions
EP2912660A1 (en)
Inventor
Cyril JODER
Felix WENNINGER
Björn SCHULLER
David Virette
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of EP2912660A1
Application granted
Publication of EP2912660B1
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain

Definitions

  • the present invention relates to a method and a device for determining a dictionary of base components from an input signal.
  • the present invention relates to the processing of an acoustic signal input for the estimation of a feature vector dictionary for describing acoustic sources.
  • Audio signals are composed of a plurality of individual sound sources.
  • Music recordings for example, comprise most of the time several instruments.
  • the signal often comprises, in addition to the speech itself, other interfering sounds which are recorded by the same microphone.
  • interfering sounds can be for example ambient noise or other people talking in the same room.
  • Non-negative Matrix Factorisation has been first proposed by Paatero: "Least Squares Formulation of Robust Non-Negative Factor Analysis", Chemometrics and Intelligent Laboratory Systems 37, pp. 23-35, 1997 and has been successfully applied to a wide variety of applications since then.
  • this technique has become a standard method for audio source separation, where an input audio signal is to be separated into several signals corresponding to the different acoustic sources. It is based on a decomposition of the power spectrogram of the mixture into a non-negative combination of several spectral bases, each associated to one of the present sources.
  • Non-negative Matrix Factorization (NMF) methods have been used in that context with relatively good results.
  • the non-negative constraint which is inherent to this technique complies with the structure of the audio spectrograms, and can allow for the decomposition of a sound into some meaningful components.
  • These components form a dictionary of spectral bases which describe the signal.
  • the decomposition typically aims to estimate spectral bases corresponding to different "parts" of the spectrogram, e.g. different sounds or speakers. A separation of these parts can then be performed by a partial reconstruction of the signal, considering only the wanted components.
  • This technique has been applied by C. Joder, F. Weninger, F. Eyben, D. Virette, B. Schuller "Real-time Speech Separation by Semi-Supervised Nonnegative Matrix Factorization", Proc. International Conference on Latent Variable Analysis and Signal Separation, March 2012 , in particular, to the separation of a target speaker from noisy recordings.
  • the basic principle of NMF-based audio processing 100 as schematically illustrated in Fig. 1 is to find a locally optimal factorization of a short-time magnitude spectrogram V 103 of an audio signal 101 into two factors W and H, of which the first one W represents the spectra of the events occurring in the signal 101 and the second one H their activation over time.
  • the first factor W describes the component spectra of the source model 109.
  • the second factor H describes the activations 107 of the signal spectrogram 103 of the audio signal 101.
  • the first factor W and the second factor H are matched with the short-time magnitude spectrogram V 103 of the audio signal 101 by an optimization procedure.
  • the source model 109 is pre-defined when applying supervised NMF and a joint estimation is applied for the source model 109 when using unsupervised NMF.
  • the source signal or signals 113 can be derived from the source spectrogram 111.
  • the conventional formulation of NMF is defined as follows.
  • the matrix V is an m × n matrix of non-negative real values.
  • the goal is to approximate this matrix by the product of two other non-negative matrices $W \in \mathbb{R}_+^{m \times r}$ and $H \in \mathbb{R}_+^{r \times n}$, where $r \ll m, n$ holds.
  • a cost function is minimized, measuring the so-called "reconstruction error" $D(V, W \cdot H)$, where the term D describes some distance or divergence function.
  • the input matrix V is given by the succession of short-time magnitude (or power) spectra of the input signal, each column of the matrix containing the values of the spectrum computed at a specific instance in time.
  • these features are given by a short-time Fourier transform of the input signal, after some window function is applied to it.
  • This matrix contains only non-negative values, because of the kind of features used.
  • the values of the matrices W and H which are estimated by the NMF are initialized by a random number generator and then updated by an iterative process.
  • the initial values can also be set according to some prior knowledge of the signal.
  • several decompositions are performed on successive mid-term windows of the signal as shown by C. Joder, F. Weninger, F. Eyben, D. Virette, B. Schuller: "Real-time Speech Separation by Semi-Supervised Nonnegative Matrix Factorization", Proc. of LVA/ICA 2012, Springer, p. 322-329 . Then, a faster convergence can be obtained by initializing the matrices according to the output of the previous decomposition.
  • some of the spectral basis can be set to a constant value, fixed by a prior learning. This can be beneficial if one of the sources is known and sufficient data is available to estimate the characteristic spectra of this source. In this case, the corresponding columns of W are not updated.
  • the methods wherein the matrix W is entirely constant during the decomposition and the method in which the matrix W is entirely updated are called supervised NMF and unsupervised NMF, respectively. In the case where only a part of the spectral basis is updated, the method is called semi-supervised NMF.
  • the NMF decomposition is illustrated in Fig. 2 by a simple example.
  • the figure represents a spectrogram 201 represented by the matrix V, a matrix of two spectral bases 202 represented by the matrix W and the corresponding temporal weights 203 represented by the matrix H.
  • the greyscale of the spectrogram 201 represents the amplitude of the Fourier coefficients.
  • the spectrogram defines an acoustic scene which can be described as the superposition of two so called "atomic sounds".
  • the matrices W and H as defined in Fig. 2 can be obtained.
  • Each column of W can be interpreted as a basis function for the spectra contained in V, when weighted with the corresponding values of H.
  • spectral bases are non-negative, they correspond to proper magnitude spectra, which can then be used to reconstruct each of the so called "atomic sounds".
  • the example of Fig. 2 is simplistic; however the NMF method can provide satisfactory results in separating different sound sources from realistic recordings. In these cases, a larger value of the order of decomposition r is used. Then, each "component”, i.e. the product of one spectral basis with the corresponding temporal weights, is assigned to a specific source. The estimated spectrogram of each source is finally obtained by the sum of all the components attributed to the source.
  • the estimation of the dictionary of spectral bases often suffers from some inaccuracies and results in components representing several sources at the same time. Indeed, this method minimizes a reconstruction error between the original input and the decomposition, without taking into account the structure of the individual signals. As a result, the estimated bases can capture some unstructured so called "building blocks" which can be used to reconstruct several sources, whereas the goal is to match each basis to a specific source.
  • several modifications of the standard NMF method have been proposed, which impose a structure by favoring some properties of the decomposition, such as temporal continuity or component sparsity.
  • the sparsity property relates to the fact that the proportion of elements with non-zero value or, more generally, of non-negligible value is very small. In particular, the sparsity of the component activations is often enforced. This property relates to the fact that few components are active at the same time.
  • A simple example of the usefulness of a sparsity constraint is represented in Fig. 3.
  • the spectrogram 300 corresponds to the succession of two musical notes, the second one having a pitch one octave higher than the first one.
  • the plots 301 and 302 are the respective spectrograms of these two notes.
  • "Audio source separation informed by redundancy with greedy multiscale decompositions" (Manuel Moussallam et al., 2012-08-27, pages 2644-2648, XP032254797) describes an algorithm for audio source separation of repeated musical patterns.
  • a Time-Frequency mask usually based on the power spectral density of the mixtures is constructed for the repeating musical background and the separation is performed by means of Wiener filtering relative to this mask.
  • the invention is based on the finding that sound source estimation is improved when a Wiener entropy-constrained Non-negative Matrix Factorization (WNMF) is used for the factorization which identifies different components of an input signal.
  • the features are decomposed into a sparse combination of non-negative feature bases.
  • the decomposition can be used to separate the input signal into several output signals corresponding to different components.
  • the obtained dictionary of feature bases can also be used to separate the corresponding components from another signal, by decomposing this other signal according to the elements of the dictionary.
  • aspects of the invention provide a novel method for enforcing a sparse decomposition, resulting in a dictionary of spectral bases which is more characteristic of the different parts of the signal, as will be presented in the following.
  • the invention relates to a method for determining a dictionary of base components from an audio signal, the audio signal being represented by an input matrix which columns comprise features of the audio signal at different instances in time, the method comprising: decomposing the input matrix into a product of a non-negative base matrix and a non-negative weight matrix, the decomposing being constrained by a Wiener entropy measure with respect to elements of the non-negative weight matrix, wherein components of the non-negative base matrix represent the dictionary of base components of the audio signal.
  • The decomposing being constrained by a Wiener entropy measure, also denoted as the Wiener entropy constraint, is a new constraint providing a novel method for enforcing sparsity.
  • Wiener entropy, or spectral flatness, measures how flat a vector is. It is used as a sparsity penalty for NMF. By using that measure, meaningful spectral patterns are estimated and speech separation quality, measured by both signal-based and perceptual criteria, is improved. Compared to standard NMF, the complexity increase is limited.
  • the Wiener entropy constrained NMF (WNMF) can be integrated into any system using NMF.
  • the dictionary of base components represents a specific audio source of a plurality of audio sources of the audio signal.
  • a specific audio source of a multi-source audio signal can be extracted from the noisy multi-source audio signal.
  • the decomposing is performed by using a Non-negative Matrix Factorization.
  • The Wiener entropy measure can thus be added to a standard NMF factorization with only little overhead, thereby limiting the computational complexity.
  • the decomposing constrained by the Wiener entropy measure is configured to enforce a sparse decomposition of the non-negative base matrix.
  • the decomposing constrained by the Wiener entropy measure comprises: forming the non-negative weight matrix such that a Wiener entropy of each column of the non-negative weight matrix is close to zero.
  • the decomposing constrained by the Wiener entropy measure comprises: minimizing a weighted sum of Wiener entropy values of columns of the non-negative weight matrix by using a cost function.
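  • $\left( \prod_{i=1}^{r} H_{i,j} \right)^{1/r} \Big/ \left( \frac{1}{r} \sum_{i=1}^{r} H_{i,j} \right)$ (Wiener entropy of the j-th column of H, i.e. the ratio of the geometric mean to the arithmetic mean of its elements)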
  • V denotes the input matrix
  • W denotes the non-negative base matrix
  • H denotes the non-negative weight matrix with elements H i,j
  • the operation $\|\cdot\|_1$ denotes the vector 1-norm
  • the symbol ⊗ denotes the Hadamard product, i.e. the element-wise product
  • Such a cost function provides an efficient reconstruction of the original signal.
  • the method comprises: updating the cost function by one of a multiplicative update rule and a gradient descent algorithm.
  • Multiplicative update rules are easy to implement and gradient descent algorithms converge to the locally optimum solution.
  • V denotes the input matrix
  • W denotes the non-negative base matrix
  • H denotes the non-negative weight matrix with elements H i,j
  • λ is a real non-negative parameter
  • the symbol ⊗ denotes the Hadamard product, i.e. the element-wise product
  • V denotes the input matrix
  • W denotes the non-negative base matrix
  • H denotes the non-negative weight matrix with elements H i,j
  • the operation $\|\cdot\|_1$ denotes the vector 1-norm
  • the symbol ⊗ denotes the Hadamard product, i.e. the element-wise product
  • Such a cost function provides an efficient reconstruction of the original signal and a homogeneous estimation of the components, regardless of the amplitude of the original signal.
  • the method comprises: updating the cost function by one of a multiplicative update rule and a gradient descent algorithm.
  • Multiplicative update rules are easy to implement and gradient descent algorithms converge to the locally optimum solution.
  • the method comprises: reconstructing a plurality of output signals from the audio signal, the reconstruction being based on the input matrix, the non-negative base matrix and the non-negative weight matrix.
  • the reconstructed signals are noise-reduced and they indicate the source components of the original audio signal.
  • the output signals can be superposed in order to obtain signals corresponding to the combination of several components, for source separation applications.
  • magnitude spectrograms $S_k$ of the plurality of output signals are determined by a product of a column-vector $W_{:,k}$ constituted by the k-th column of the non-negative base matrix W and a row-vector $H_{k,:}$ constituted by the k-th row of the non-negative weight matrix H.
  • the method comprises: constructing output spectrograms by summing several of the magnitude spectrograms $S_k$ of the plurality of output signals.
  • the method comprises: determining a dictionary of base components from a training speech signal according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, and forming a non-negative base matrix of a noisy speech signal by extending the non-negative base matrix of the training speech signal; and updating the non-negative base matrix of the noisy speech signal by using a semi-supervised Non-negative Matrix Factorization.
  • the method comprises: reconstructing the speech signal based on the updated non-negative base matrix of the noisy speech signal.
  • the invention relates to a device for determining a dictionary of base components from an input signal represented by an input matrix, the device comprising: a buffer for storing the input matrix; and means for decomposing the input matrix into a product of a non-negative base matrix and a non-negative weight matrix, wherein the decomposing is constrained by a Wiener entropy measure and wherein components of the non-negative base matrix represent the dictionary of base components of the input signal.
  • By using the Wiener entropy measure, meaningful spectral patterns are estimated and thus speech separation quality, measured by both signal-based and perceptual criteria, is improved. The complexity increase is not significant when compared to standard NMF implementations.
  • the Wiener entropy constrained NMF can be integrated into any device using NMF.
  • the methods and systems described herein may be implemented as software in a Digital Signal Processor (DSP), in a micro-controller or in any other side-processor or as hardware circuit within an application specific integrated circuit (ASIC).
  • the means for decomposing the input matrix may be implemented as software in a Digital Signal Processor (DSP), in a micro-controller or in any other side-processor or as a hardware unit, e.g. within an application specific integrated circuit (ASIC).
  • the invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof, e.g. in available hardware or software of conventional mobile devices or hands-free communication systems or in new hardware or software dedicated for processing the methods described herein after.
  • aspects of the invention provide a method for decomposing a signal according to a Wiener entropy-constrained Non-negative Matrix Factorization (WNMF).
  • the Wiener entropy, also called "spectral flatness", of a set of non-negative values is the ratio between the geometric mean and the arithmetic mean of these values.
  • the Wiener entropy is always between zero and one, and it is equal to one if and only if all the values in the set are equal.
  • a large value of the Wiener entropy corresponds to a "flat" plot and a small value corresponds to a "peaky" plot.
  • the penalty term used to measure the sparsity of the decomposition is given by a weighted sum of the Wiener entropy values of the columns of the matrix H.
  • λ is a real non-negative parameter and the weighting parameters, one for each column j of the matrix H, are non-negative and can depend on the matrix H.
  • the weighting parameters used in the above defined cost function are all set to one.
  • the optimization process stops when convergence is observed or when a sufficient number of iterations have been performed.
  • gradient-descent algorithms are applied instead of these multiplicative updates.
  • the weighting parameters used in the above defined cost function are each set to the mean value of the corresponding column of the matrix H.
  • the sparsity penalty applied to each instance in time is approximately proportional to the amplitude of the input signal at the corresponding instance in time.
  • the optimization of this cost function is performed by multiplicative update rules.
  • Another advantage of this setting is that the complexity of the parameter updates is reduced compared to the previous implementation.
  • Fig. 4 shows a schematic diagram of a method 440 for determining a dictionary of base components from an audio signal by performing a WNMF decomposition according to an implementation form.
  • the method 440 performs a WNMF decomposition 400 from a digital single-channel acoustic signal 401.
  • the digital input signal 401 is input to a short-time transform module 410, which performs a windowing into short-time frames and a transform, so as to produce non-negative feature vectors 411, e.g. magnitude spectra.
  • a buffer 420 stores these features in order to produce the matrix V 421.
  • the WNMF module 430 then performs a decomposition of the matrix V 421, representing the magnitude spectra of the input signal.
  • the outputs of this module are the matrices W 431 and H 432 which represent respectively the dictionary of feature bases and the temporal weights of these bases.
  • Fig. 5 shows a schematic diagram of a system 500 for decomposing an audio signal into a dictionary of base components and reconstructing a set of audio signals according to an implementation form.
  • the system 500 comprises a factorization element 400 performing the WNMF decomposition 400 as described above with respect to Fig. 4 and a reconstruction element 510.
  • the factorization element 400 takes as input an acoustic signal 401 and estimates a dictionary of feature bases 431 and the corresponding temporal weights 432 describing the signal.
  • the result of the decomposition is input to the reconstruction module 510, which produces several output signals 511, 512 and 513.
  • the reconstruction module 510 exploits a so-called "soft mask" approach as described in the following.
  • For k = 1 ... 3, $W_{:,k}$ is the column-vector constituted by the k-th column of W and $H_{k,:}$ is the row-vector constituted by the k-th row of H.
  • the three obtained matrices constitute the magnitude spectrograms of the three output signals.
  • the time-domain signals are then obtained by a standard approach, involving an inverse Fourier transform exploiting the phase of the original complex spectrogram, followed by an overlap-add procedure.
  • the output signals are then superposed in order to obtain signals corresponding to the combination of several components, for a source separation application.
  • the components of the system 500 described above may also be implemented as steps of a method.
  • Fig. 6 shows a schematic diagram of a system 600 for decomposing an audio signal into a dictionary of base components applied to a noisy speech signal according to an implementation form.
  • the decomposition is applied to the reduction of noise in a noisy speech signal.
  • This system 600 involves a prior training phase 610 which comprises a factorization element 400 performing the WNMF decomposition 400 as described above with respect to Fig. 4 .
  • a training speech signal 601 is input to the factorization element 400, which computes a dictionary of feature bases 611 and a matrix of temporal weights 612 corresponding to the WNMF decomposition of the training signal.
  • the system 600 further comprises a short-time transform 630, a buffer 640, a semi-supervised NMF module 650 and a reconstruction module 660.
  • a single-channel noisy speech signal 621 undergoes a short-time transform 630 which calculates non-negative features 631, similarly to the element 410 described above with respect to Fig. 4 .
  • the buffer 640 stores these features to produce a matrix V 641.
  • This matrix undergoes a decomposition using semi-supervised NMF 650, where the feature bases corresponding to speech are set to the values of the dictionary 611 given by the training phase.
  • the other bases are updated by the semi-supervised NMF.
  • the outputs of this decomposition 650 are the dictionary W 651 and the corresponding weights H 652. These matrices are used by a reconstruction element 660, which produces the de-noised speech signal.
  • H' is the matrix extracted from the matrix H 652 comprising the weights corresponding to the speech bases $W_s$.
  • the time-domain signal is then obtained by the same approach as described above with respect to Fig. 5 for the reconstruction element 510.
  • the semi-supervised NMF 650 is replaced by a WNMF decomposition 400 as described above with respect to Fig. 4 .
  • a noise training phase similar to 610 is performed to estimate a noise feature dictionary from a training recording of noise.
  • the dictionary W 651 is defined as the concatenation of the speech dictionary 611 and the noise dictionary, and the semi-supervised NMF 650 is replaced by a supervised NMF.
  • the components of the system 600 described above may also be implemented as steps of a method.
  • Fig. 7 shows a schematic diagram of a de-noising system 700 according to an implementation form.
  • spectral components W speaker 713 and W noise 715 are estimated from clean speech W speaker 701 and noise V noise 703 separately using WNMF 707, 709. These spectral components W speaker 713 and W noise 715 are fed to a de-noising system 711, which exploits them to separate speech from noise.
  • the noise components are estimated on the noisy speech V mix 705 without noise training by the de-noising system 711 which provides the de-noised speech 717.
  • the de-noising system 711 is a supervised system. In an implementation form the de-noising system 711 is a semi-supervised system. In an implementation form the de-noising system 711 is an unsupervised NMF de-noising system where no a priori knowledge of the speech and noise models is available.
  • Fig. 8 shows a schematic diagram of a device 800 for determining a dictionary of base components 804 from an input signal 802 according to an implementation form.
  • the input signal 802 is represented by an input matrix V.
  • the device 800 comprises a buffer 803 for storing the input matrix V.
  • the device 800 further comprises decomposing means 801 for decomposing the input matrix V into a product of a non-negative base matrix W and a non-negative weight matrix H, wherein the decomposing is constrained by a Wiener entropy measure and wherein components of the non-negative base matrix W represent the dictionary of base components 804 of the input signal 802.
  • the dictionary of base components represents a specific audio source of a plurality of audio sources of the audio signal.
  • the decomposing means is configured for decomposing the input matrix V by using a Non-negative Matrix Factorization.
  • the decomposing means is configured to enforce a sparse decomposition of the non-negative base matrix W.
  • the decomposing means comprises means for forming the non-negative weight matrix H such that a Wiener entropy of each column of the non-negative weight matrix H is close to zero.
  • the decomposing means comprises means for minimizing a sum of Wiener entropy values of columns of the non-negative weight matrix H by using a cost function.
  • the device 800 comprises means for updating the cost function by one of a multiplicative update rule and a gradient descent algorithm.
  • the device 800 comprises means for reconstructing a plurality of output signals from the audio signal, the reconstruction being based on the input matrix V, the non-negative base matrix W and the non-negative weight matrix H.
  • magnitude spectrograms $S_k$ of the plurality of output signals are determined by a product of a column-vector $W_{:,k}$ constituted by the k-th column of the non-negative base matrix W and a row-vector $H_{k,:}$ constituted by the k-th row of the non-negative weight matrix H.
  • the device 800 comprises means for determining a dictionary of base components from a training speech signal according to the method 400 as described above with respect to Fig. 4 ; and means for forming a non-negative base matrix W of a noisy speech signal by using a semi-supervised Non-negative Matrix Factorization; and means for updating the non-negative base matrix W of the noisy speech signal with the non-negative base matrix W S of the training speech signal.
  • the device 800 comprises means for reconstructing the speech signal based on the updated non-negative base matrix W of the noisy speech signal.
  • the decomposing means comprises means for minimizing a weighted sum of Wiener entropy values of columns of the non-negative weight matrix H by using a cost function, the weighting parameters of the sum being the mean values of the columns of the matrix H.
  • the present disclosure also supports a computer program product including computer executable code or computer executable instructions that, when executed, causes at least one computer to execute the performing and computing steps described herein.
  • the present disclosure also supports a system configured to execute the performing and computing steps described herein.
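
As a non-authoritative illustration of the Wiener entropy penalty described in the definitions above, the following Python sketch computes the Wiener entropy of a set of non-negative values as the ratio of their geometric mean to their arithmetic mean, and a weighted sum of the Wiener entropy values of the columns of H. The function names `wiener_entropy` and `sparsity_penalty`, the `weighting` argument, the row-list layout of H and the convention of returning zero for an all-zero set are assumptions made for this sketch and are not taken from the patent text. The complete cost function in which the penalty is used is not reproduced here.

```python
import math


def wiener_entropy(values):
    """Geometric mean divided by arithmetic mean of non-negative values."""
    r = len(values)
    arithmetic_mean = sum(values) / r
    # Assumption of this sketch: an all-zero set is given entropy 0.
    if arithmetic_mean == 0.0:
        return 0.0
    # A single zero makes the geometric mean, hence the ratio, zero.
    if any(v == 0.0 for v in values):
        return 0.0
    # Geometric mean computed in the log domain to avoid overflow.
    geometric_mean = math.exp(sum(math.log(v) for v in values) / r)
    return geometric_mean / arithmetic_mean


def sparsity_penalty(H, weighting="ones"):
    """Weighted sum of the Wiener entropy values of the columns of H.

    H is a list of r rows with n entries each (hypothetical layout).
    weighting="ones" sets every column weight to one;
    weighting="mean" sets each weight to the mean of its column.
    """
    total = 0.0
    for column in zip(*H):  # iterate over the n columns of H
        if weighting == "ones":
            weight = 1.0
        elif weighting == "mean":
            weight = sum(column) / len(column)
        else:
            raise ValueError("unknown weighting: " + weighting)
        total += weight * wiener_entropy(column)
    return total


H = [[4.0, 2.0],
     [1.0, 2.0]]
print(wiener_entropy([2.0, 2.0, 2.0]))  # flat set: close to one
print(wiener_entropy([5.0, 0.0, 0.0]))  # single active entry: zero
print(sparsity_penalty(H, "ones"))
print(sparsity_penalty(H, "mean"))
```

With the mean weighting, each term of the sum is the column mean multiplied by the ratio of geometric to arithmetic mean, which reduces algebraically to the geometric mean of the column.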

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Description

    BACKGROUND OF THE INVENTION
  • The present invention relates to a method and a device for determining a dictionary of base components from an input signal. In particular, the present invention relates to the processing of an acoustic signal input for the estimation of a feature vector dictionary for describing acoustic sources.
  • Most audio signals are composed of a plurality of individual sound sources. Musical recordings, for example, comprise most of the time several instruments. In the case of speech communication, the signal often comprises, in addition to the speech itself, other interfering sounds which are recorded by the same microphone. Such interfering sounds can be for example ambient noise or other people talking in the same room.
  • Several applications would take advantage of the separation of an audio signal into several parts. One of them is the reduction of acoustic noise in telephonic communication, especially in the case of hands-free systems, where the noise level is often high because of the distance between the microphone and the speaker. Another usage of source separation is the extraction of a target instrument from musical signals, for karaoke or remixing applications.
  • Non-negative Matrix Factorisation (NMF) has been first proposed by Paatero: "Least Squares Formulation of Robust Non-Negative Factor Analysis", Chemometrics and Intelligent Laboratory Systems 37, pp. 23-35, 1997 and has been successfully applied to a wide variety of applications since then. In particular, this technique has become a standard method for audio source separation, where an input audio signal is to be separated into several signals corresponding to the different acoustic sources. It is based on a decomposition of the power spectrogram of the mixture into a non-negative combination of several spectral bases, each associated to one of the present sources. Non-negative Matrix Factorization (NMF) methods have been used in that context with relatively good results. Indeed, the non-negative constraint which is inherent to this technique complies with the structure of the audio spectrograms, and can allow for the decomposition of a sound into some meaningful components. These components form a dictionary of spectral bases which describe the signal. The decomposition typically aims to estimate spectral bases corresponding to different "parts" of the spectrogram, e.g. different sounds or speakers. A separation of these parts can then be performed by a partial reconstruction of the signal, considering only the wanted components. This technique has been applied by C. Joder, F. Weninger, F. Eyben, D. Virette, B. Schuller "Real-time Speech Separation by Semi-Supervised Nonnegative Matrix Factorization", Proc. International Conference on Latent Variable Analysis and Signal Separation, March 2012, in particular, to the separation of a target speaker from noisy recordings.
  • The basic principle of NMF-based audio processing 100 as schematically illustrated in Fig. 1 is to find a locally optimal factorization of a short-time magnitude spectrogram V 103 of an audio signal 101 into two factors W and H, of which the first one W represents the spectra of the events occurring in the signal 101 and the second one H their activation over time. The first factor W describes the component spectra of the source model 109. The second factor H describes the activations 107 of the signal spectrogram 103 of the audio signal 101. The first factor W and the second factor H are matched with the short-time magnitude spectrogram V 103 of the audio signal 101 by an optimization procedure. The source model 109 is pre-defined when applying supervised NMF and a joint estimation is applied for the source model 109 when using unsupervised NMF. The source signal or signals 113 can be derived from the source spectrogram 111.
  • The conventional formulation of NMF is defined as follows. The matrix V is an m × n matrix of non-negative real values. The goal is to approximate this matrix by the product of two other non-negative matrices $W \in \mathbb{R}_{+}^{m \times r}$ and $H \in \mathbb{R}_{+}^{r \times n}$, where $r \ll m, n$ holds. In mathematical terms, a cost function is minimized, measuring the so-called "reconstruction error" $D(V, WH)$, where the term D describes some distance or divergence function. When processing sounds, the input matrix V is given by the succession of short-time magnitude (or power) spectra of the input signal, each column of the matrix containing the values of the spectrum computed at a specific instance in time. In general, these features are given by a short-time Fourier transform of the input signal, after some window function is applied to it. This matrix contains only non-negative values, because of the kind of features used. Typically, the values of the matrices W and H which are estimated by the NMF are initialized by a random number generator and then updated by an iterative process. However, the initial values can also be set according to some prior knowledge of the signal. In particular for an implementation in an on-line system, several decompositions are performed on successive mid-term windows of the signal as shown by C. Joder, F. Weninger, F. Eyben, D. Virette, B. Schuller: "Real-time Speech Separation by Semi-Supervised Nonnegative Matrix Factorization", Proc. of LVA/ICA 2012, Springer, p. 322-329. Then, a faster convergence can be obtained by initializing the matrices according to the output of the previous decomposition.
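  • The iterative process described above can be sketched as follows. This is a minimal illustration in plain Python, assuming the generalized Kullback-Leibler divergence as the function D and the well-known multiplicative updates for that divergence; the function names (`nmf`, `divergence`, `ratio`), the iteration count and the small floor `tiny` are choices made for this sketch and are not taken from the specification.

```python
import math
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def divergence(V, X, tiny=1e-12):
    """Generalized Kullback-Leibler divergence, with 0 * ln(0) taken as 0."""
    total = 0.0
    for v_row, x_row in zip(V, X):
        for v, x in zip(v_row, x_row):
            x = max(x, tiny)  # numerical floor (assumption of this sketch)
            total += (v * math.log(v / x) if v > 0 else 0.0) - v + x
    return total

def ratio(V, X, tiny=1e-12):
    """Element-wise division V / X with a floor on the denominator."""
    return [[v / max(x, tiny) for v, x in zip(vr, xr)] for vr, xr in zip(V, X)]

def nmf(V, r, n_iter=200, seed=0, tiny=1e-12):
    """Unconstrained NMF: random initialization, then alternating updates."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(r)] for _ in range(m)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(r)]
    ones = [[1.0] * n for _ in range(m)]
    costs = [divergence(V, matmul(W, H))]
    for _ in range(n_iter):
        # Update the temporal weights H with W held fixed.
        Wt = transpose(W)
        num, den = matmul(Wt, ratio(V, matmul(W, H))), matmul(Wt, ones)
        H = [[h * a / max(b, tiny) for h, a, b in zip(hr, ar, br)]
             for hr, ar, br in zip(H, num, den)]
        # Update the spectral bases W with the new H held fixed.
        Ht = transpose(H)
        num, den = matmul(ratio(V, matmul(W, H)), Ht), matmul(ones, Ht)
        W = [[w * a / max(b, tiny) for w, a, b in zip(wr, ar, br)]
             for wr, ar, br in zip(W, num, den)]
        costs.append(divergence(V, matmul(W, H)))
    return W, H, costs

# Toy "spectrogram" built from two atomic spectra (illustrative data only).
V = [[1.0, 0.0, 1.0, 2.0, 1.0],
     [1.0, 0.0, 1.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0, 0.0, 2.0]]
W, H, costs = nmf(V, r=2)
```

  • In an on-line setting, the random initialization in this sketch would be replaced by the W and H obtained from the previous mid-term window.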
  • Similarly, some of the spectral bases can be set to constant values, fixed by prior learning. This can be beneficial if one of the sources is known and sufficient data is available to estimate the characteristic spectra of this source. In this case, the corresponding columns of W are not updated. The method in which the matrix W is entirely constant during the decomposition and the method in which the matrix W is entirely updated are called supervised NMF and unsupervised NMF, respectively. In the case where only a part of the spectral bases is updated, the method is called semi-supervised NMF.
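  • The distinction between supervised, unsupervised and semi-supervised NMF amounts to which columns of W are allowed to change. The following sketch shows one update of W in which a set of columns is held fixed; it assumes the Kullback-Leibler multiplicative update, and the name `update_bases` and its `fixed_columns` argument are hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def update_bases(V, W, H, fixed_columns, tiny=1e-12):
    """One multiplicative update of W that skips the columns in fixed_columns.

    fixed_columns empty        -> unsupervised update (all bases change)
    fixed_columns = all k      -> supervised case (W is returned unchanged)
    anything in between        -> semi-supervised update
    """
    m, r, n = len(W), len(W[0]), len(H[0])
    WH = matmul(W, H)
    R = [[v / max(x, tiny) for v, x in zip(vr, xr)] for vr, xr in zip(V, WH)]
    row_sums = [sum(h_row) for h_row in H]  # entries of (ones * H^T)
    new_W = [row[:] for row in W]
    for i in range(m):
        for k in range(r):
            if k in fixed_columns:
                continue  # pre-trained basis: keep as learned
            num = sum(R[i][j] * H[k][j] for j in range(n))
            new_W[i][k] = W[i][k] * num / max(row_sums[k], tiny)
    return new_W

# Illustrative values only: column 0 plays the role of a pre-trained basis.
V = [[2.0, 1.0, 1.0], [1.0, 3.0, 2.0]]
W = [[1.0, 0.5], [0.2, 0.8]]
H = [[1.0, 2.0, 0.5], [0.3, 0.3, 1.0]]
W_semi = update_bases(V, W, H, fixed_columns={0})
```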
  • The NMF decomposition is illustrated in Fig. 2 by a simple example. The figure represents a spectrogram 201 represented by the matrix V, a matrix of two spectral bases 202 represented by the matrix W and the corresponding temporal weights 203 represented by the matrix H. The greyscale of the spectrogram 201 represents the amplitude of the Fourier coefficients. The spectrogram defines an acoustic scene which can be described as the superposition of two so called "atomic sounds". By applying a two-component NMF to this spectrogram, the matrices W and H as defined in Fig. 2 can be obtained. Each column of W can be interpreted as a basis function for the spectra contained in V, when weighted with the corresponding values of H.
  • Since the spectral bases are non-negative, they correspond to proper magnitude spectra, which can then be used to reconstruct each of the so called "atomic sounds". The example of Fig. 2 is simplistic; however the NMF method can provide satisfactory results in separating different sound sources from realistic recordings. In these cases, a larger value of the order of decomposition r is used. Then, each "component", i.e. the product of one spectral basis with the corresponding temporal weights, is assigned to a specific source. The estimated spectrogram of each source is finally obtained by the sum of all the components attributed to the source.
  • However, in the conventional NMF method, the estimation of the dictionary of spectral bases often suffers from some inaccuracies and results in components representing several sources at the same time. Indeed, this method minimizes a reconstruction error between the original input and the decomposition, without taking into account the structure of the individual signals. As a result, the estimated bases can capture some unstructured so called "building blocks" which can be used to reconstruct several sources, whereas the goal is to match each basis to a specific source. In order to overcome this problem, several modifications of the standard NMF method have been proposed, which impose a structure by favoring some properties of the decomposition, such as temporal continuity or component sparsity.
  • The sparsity property relates to the fact that the proportion of elements with non-zero value or, more generally, of non-negligible value is very small. In particular, the sparsity of the component activations is often enforced. This property relates to the fact that few components are active at the same time.
  • A simple example of the usefulness of a sparsity constraint is represented in Fig. 3. The spectrogram 300 corresponds to the succession of two musical notes, the second one having a pitch one octave higher than the first one. The plots 301 and 302 are the respective spectrograms of these two notes. However, without any constraint on the structure of the decomposition, an NMF factorization with order r=2 applied to the spectrogram 300 can also result in the estimation of the spectrograms 303 and 304, since they yield the same perfect reconstruction of the original signal. Enforcing the sparsity property would favor the first decomposition.
  • This constraint is generally achieved by adding a penalty term in the cost function to be minimized. The cost function then becomes $D(V, WH) + \lambda f(H)$, where λ is a real non-negative parameter and f is a function measuring the sparsity of the matrix H. The use of the "pure sparsity measure", that is the number of positive elements in the decomposition, as a penalty in the NMF generally leads to an intractable problem because of its lack of regularity. Thus, the common practice is to approximate this measure with the L1 norm, also called the Manhattan distance, according to A. Cichocki, R. Zdunek, S. Amari, "New Algorithms for Non-negative Matrix Factorization in Application to Blind Source Separation", Proc. of IEEE ICASSP 2006. Other variants of this criterion have also been employed, such as a normalized version of the L1 norm according to T. Virtanen, "Monaural Sound Source Separation by Nonnegative Matrix Factorization with Temporal Continuity and Sparseness Criteria", IEEE Trans. on Audio, Speech and Signal Process., vol. 15(3), pp. 1066-1074, 2007 or the ratio between the L1 and the L2 norm, according to P. Hoyer, "Non-negative Matrix Factorization with Sparseness Constraints", Journal of Machine Learning Research, Vol. 5, pp. 1457-1469, 2004.
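  • To make the difference between these sparsity measures concrete, the following sketch evaluates three of them on a single column of H. The function names are hypothetical, and the normalized L1 variant cited above is not reproduced here.

```python
import math

def count_positive(x, tol=0.0):
    """'Pure sparsity measure': number of elements above tol."""
    return sum(1 for v in x if abs(v) > tol)

def l1_norm(x):
    """Sum of absolute values (Manhattan norm)."""
    return sum(abs(v) for v in x)

def l1_l2_ratio(x):
    """Ratio between the L1 and L2 norms; smaller means sparser."""
    l2 = math.sqrt(sum(v * v for v in x))
    return l1_norm(x) / l2 if l2 > 0 else 0.0

sparse_column = [0.0, 0.0, 4.0, 0.0]  # one active component
dense_column = [1.0, 1.0, 1.0, 1.0]   # all components active
```

  • In this example both columns have the same L1 norm (4), so the plain L1 penalty does not distinguish them, whereas the count of positive elements (1 versus 4) and the L1/L2 ratio (1 versus 2) do. This illustrates why amplitude-independent measures are of interest.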
    Document "Audio source separation informed by redundancy with greedy multiscale decompositions" (Manuel Moussallam et al, 2012-08-27, pages 2644-2648, XP032254797) describes an algorithm for audio source separation of repeated musical patterns. A time-frequency mask, usually based on the power spectral density of the mixtures, is constructed for the repeating musical background and the separation is performed by means of Wiener filtering relative to this mask.
    Document "Sparse nonnegative matrix factorization with constraints" (Robert Peharz et al, 2012-03-15, XP028356707) discloses nonnegative matrix factorization to factorize a nonnegative matrix X into a product of nonnegative matrices W and H with $\ell^0$-constraints.
  • SUMMARY OF THE INVENTION
  • It is the object of the invention to provide a concept for improving sound source estimation when using Non-Negative Matrix Factorization decompositions.
  • This object is achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
  • The invention is based on the finding that sound source estimation is improved when a Wiener entropy-constrained Non-negative Matrix Factorization (WNMF) is used for the factorization which identifies different components of an input signal. Applying this technique to non-negative features describing an input signal, such as magnitude spectra, the features are decomposed into a sparse combination of non-negative feature bases. The decomposition can be used to separate the input signal into several output signals corresponding to different components. The obtained dictionary of feature bases can also be used to separate the corresponding components from another signal, by decomposing this other signal according to the elements of the dictionary.
  • Aspects of the invention provide a novel method for enforcing a sparse decomposition, resulting in a dictionary of spectral bases which is more characteristic of the different parts of the signal, as will be presented in the following.
  • In order to describe the invention in detail, the following terms, abbreviations and notations will be used:
  • audio rendering:
    a reproduction technique capable of creating spatial sound fields in an extended area by means of loudspeakers or loudspeaker arrays,
    NMF:
    Non-negative matrix factorization,
    WNMF:
    Wiener entropy-constrained Non-negative Matrix Factorization.
    Vector 1-norm:
    The vector 1-norm is the matrix norm of an m times n matrix A defined as the sum of the absolute values of its elements, $\|A\|_1 = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{i,j}|$
    Hadamard product:
    The Hadamard product is a binary operation that takes two matrices of the same dimensions, and produces another matrix where each element ij is the product of elements ij of the original two matrices.
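  • The two matrix operations defined above, together with the element-wise division used alongside them in the cost functions, can be written in a few lines. This is an illustrative sketch on lists of rows; the function names are hypothetical.

```python
def norm1(A):
    """Vector 1-norm of a matrix: sum of the absolute values of its elements."""
    return sum(abs(a) for row in A for a in row)

def hadamard(A, B):
    """Element-wise product of two matrices of the same dimensions."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def elementwise_div(A, B):
    """Element-wise division of two matrices of the same dimensions."""
    return [[a / b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1.0, -2.0], [3.0, -4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
product = hadamard(A, B)   # [[5.0, -12.0], [21.0, -32.0]]
total = norm1(A)           # 10.0
```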
  • According to a first aspect, the invention relates to a method for determining a dictionary of base components from an audio signal, the audio signal being represented by an input matrix whose columns comprise features of the audio signal at different instances in time, the method comprising: decomposing the input matrix into a product of a non-negative base matrix and a non-negative weight matrix, the decomposing being constrained by a Wiener entropy measure with respect to elements of the non-negative weight matrix, wherein components of the non-negative base matrix represent the dictionary of base components of the audio signal.
  • The decomposing being constrained by a Wiener entropy measure, also denoted as the Wiener entropy constraint, is a new constraint providing a novel method for enforcing sparsity. The Wiener entropy, or spectral flatness, measures how flat a vector is. It is used as a sparsity penalty for NMF. By using that measure, meaningful spectral patterns are estimated and speech separation quality, measured by both signal-based and perceptual criteria, is improved. Compared to standard NMF, the complexity increase is limited. The Wiener entropy constrained NMF (WNMF) can be integrated into any system using NMF.
  • In a first possible implementation form of the method according to the first aspect, the dictionary of base components represents a specific audio source of a plurality of audio sources of the audio signal.
  • Thus, a specific audio source of a multi-source audio signal can be extracted from the noisy multi-source audio signal.
  • In a second possible implementation form of the method according to the first aspect as such or according to the first implementation form of the first aspect, the decomposing is performed by using a Non-negative Matrix Factorization.
  • The Wiener entropy measure can thus be adapted to a standard NMF factorization with only little overhead thereby saving computational complexity.
  • In a third possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the decomposing constrained by the Wiener entropy measure is configured to enforce a sparse decomposition of the non-negative base matrix.
  • Computing with sparse matrices improves speed and reduces complexity.
  • In a fourth possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the decomposing constrained by the Wiener entropy measure comprises: forming the non-negative weight matrix such that a Wiener entropy of each column of the non-negative weight matrix is close to zero.
  • By that specific forming of the H matrix, reconstruction of the original signal is improved.
  • In a fifth possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the decomposing constrained by the Wiener entropy measure comprises: minimizing a weighted sum of Wiener entropy values of columns of the non-negative weight matrix by using a cost function.
  • By using a cost function, iterative or recursive adaptations can be applied which are computationally efficient. Reconstruction of the original signal is improved.
  • In a sixth possible implementation form of the method according to the fifth implementation form of the first aspect, the cost function is according to
    $$\left\| V \otimes \ln\left(V \div (WH)\right) - V + WH \right\|_1 + \lambda \sum_{j=1}^{n} \frac{\prod_{i=1}^{r} [H_{i,j}]_\epsilon^{1/r}}{\frac{1}{r} \sum_{i=1}^{r} H_{i,j}}$$
    where V denotes the input matrix, W denotes the non-negative base matrix, H denotes the non-negative weight matrix with elements $H_{i,j}$, the operation $\|\cdot\|_1$ denotes the vector 1-norm, the symbol ⊗ denotes the Hadamard product, i.e. element-wise multiplication and the symbol ÷ denotes the element-wise division, λ is a real non-negative parameter, the symbol ε denotes a (small) positive real number and the operator $[\cdot]_\epsilon$ is defined by $[x]_\epsilon = \max(x, \epsilon)$.
  • Such a cost function provides an efficient reconstruction of the original signal.
  • In a seventh possible implementation form of the method according to the fifth implementation form or according to the sixth implementation form of the first aspect, the method comprises: updating the cost function by one of a multiplicative update rule and a gradient descent algorithm.
  • Multiplicative update rules are easy to implement and gradient descent algorithms converge to a locally optimal solution.
  • In an eighth possible implementation form of the method according to the seventh implementation form of the first aspect, the multiplicative update rule is according to:
    $$W = W \otimes \left[\left(V \div (WH)\right) H^{T}\right] \div \left[\mathbf{1}_{m,n} H^{T}\right]$$
    and
    $$H = H \otimes \left[W^{T}\left(V \div (WH)\right) + \lambda\, G \div (r A \otimes A)\right] \div \left[W^{T} \mathbf{1}_{m,n} + \lambda\, G \div (r A \otimes H)\right]$$
    where V denotes the input matrix, W denotes the non-negative base matrix, H denotes the non-negative weight matrix with elements $H_{i,j}$, λ is a real non-negative parameter, the symbol ⊗ denotes the Hadamard product, i.e. element-wise multiplication, the symbol ÷ denotes the element-wise division, $\mathbf{1}_{m,n}$ denotes a matrix of dimension m × n whose elements are all equal to one, A denotes a matrix of dimension r × n, defined by:
    $$A_{i,j} = \frac{1}{r} \sum_{k=1}^{r} H_{k,j}$$
    and G denotes a matrix of dimension r × n, defined by:
    $$G_{i,j} = \prod_{k=1}^{r} [H_{k,j}]_\epsilon^{1/r},$$
    where the symbol ε denotes a positive real number and the operator $[\cdot]_\epsilon$ is defined by $[x]_\epsilon = \max(x, \epsilon)$.
  • These multiplicative update rules are easy to implement and fast converging.
  • In a ninth possible implementation form of the method according to the fifth implementation form of the first aspect, the cost function is according to
    $$D(V, WH) + \lambda \sum_{j=1}^{n} \prod_{i=1}^{r} [H_{i,j}]_\epsilon^{1/r}$$
    where V denotes the input matrix, W denotes the non-negative base matrix, H denotes the non-negative weight matrix with elements $H_{i,j}$, the operation $\|\cdot\|_1$ denotes the vector 1-norm, the symbol ⊗ denotes the Hadamard product, i.e. element-wise multiplication and the symbol ÷ denotes the element-wise division, λ is a real non-negative parameter, the symbol ε denotes a (small) positive real number and the operator $[\cdot]_\epsilon$ is defined by $[x]_\epsilon = \max(x, \epsilon)$.
  • Such a cost function provides an efficient reconstruction of the original signal and a homogeneous estimation of the components, regardless of the amplitude of the original signal.
  • In a tenth possible implementation form of the method according to the ninth implementation form or according to the sixth implementation form of the first aspect, the method comprises: updating the cost function by one of a multiplicative update rule and a gradient descent algorithm.
  • Multiplicative update rules are easy to implement and gradient descent algorithms converge to a locally optimal solution.
  • In an eleventh possible implementation form of the method according to the tenth implementation form of the first aspect, the multiplicative update rule is according to:
    $$W = W \otimes \left[\left(V \div (WH)\right) H^{T}\right] \div \left[\mathbf{1}_{m,n} H^{T}\right]$$
    and
    $$H = H \otimes \left[W^{T}\left(V \div (WH)\right)\right] \div \left[W^{T} \mathbf{1}_{m,n} + \lambda\, G \div (r H)\right],$$
    where V denotes the input matrix, W denotes the non-negative base matrix, H denotes the non-negative weight matrix with elements $H_{i,j}$, λ is a real non-negative parameter, the symbol ⊗ denotes the Hadamard product, i.e. element-wise multiplication, the symbol ÷ denotes the element-wise division, $\mathbf{1}_{m,n}$ denotes a matrix of dimension m × n whose elements are all equal to one, A denotes a matrix of dimension r × n, defined by:
    $$A_{i,j} = \frac{1}{r} \sum_{k=1}^{r} H_{k,j}$$
    and G denotes a matrix of dimension r × n, defined by:
    $$G_{i,j} = \prod_{k=1}^{r} [H_{k,j}]_\epsilon^{1/r},$$
    where the symbol ε denotes a positive real number and the operator $[\cdot]_\epsilon$ is defined by $[x]_\epsilon = \max(x, \epsilon)$.
  • These multiplicative update rules are easy to implement and fast converging.
  • In a twelfth possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the method comprises: reconstructing a plurality of output signals from the audio signal, the reconstruction being based on the input matrix, the non-negative base matrix and the non-negative weight matrix.
  • The reconstructed signals are noise-reduced and they indicate the source components of the original audio signal.
  • In a thirteenth possible implementation form of the method according to the twelfth implementation form of the first aspect, magnitude spectrograms $S_k$ of the plurality of output signals are determined according to:
    $$S_k = \left[\left(W_{:,k} H_{k,:}\right) \div (WH)\right] \otimes V,$$
    where V denotes the input matrix, W denotes the non-negative base matrix, H denotes the non-negative weight matrix, $W_{:,k}$ denotes the column-vector constituted by the k-th column of W and $H_{k,:}$ denotes the row-vector constituted by the k-th row of H, the symbol ⊗ denotes the Hadamard product, i.e. element-wise multiplication, and the symbol ÷ denotes the element-wise division.
  • The output signals can be superposed in order to obtain signals corresponding to the combination of several components, for source separation applications.
  • In a fourteenth possible implementation form of the method according to the twelfth implementation form of the first aspect, magnitude spectrograms Sk of the plurality of output signals are determined by a product of a column-vector W :,k constituted by the k-th column of the non-negative base matrix W and a row-vector H k,: constituted by the k-th row of the non-negative weight matrix H.
  • When the output signals are directly reconstructed, computational complexity is reduced.
  • In a fifteenth possible implementation form of the method according to the thirteenth implementation form or according to the fourteenth implementation form of the first aspect, the method comprises: constructing output spectrograms by summing several of the magnitude spectrograms Sk of the plurality of output signals.
  • In a sixteenth possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the method comprises: determining a dictionary of base components from a training speech signal according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, and forming a non-negative base matrix of a noisy speech signal by extending the non-negative base matrix of the training speech signal; and updating the non-negative base matrix of the noisy speech signal by using a semi-supervised Non-negative Matrix Factorization.
  • When using a training speech signal, source separation is improved as a speech signal which is not corrupted by noise is used for determining the dictionary of base components.
  • In a seventeenth possible implementation form of the method according to the sixteenth implementation form of the first aspect, the method comprises: reconstructing the speech signal based on the updated non-negative base matrix of the noisy speech signal.
  • Accuracy of the reconstruction is improved when the reconstruction is based on the updated base matrix of the noisy speech signal.
  • According to a second aspect, the invention relates to a device for determining a dictionary of base components from an input signal represented by an input matrix, the device comprising: a buffer for storing the input matrix; and means for decomposing the input matrix into a product of a non-negative base matrix and a non-negative weight matrix, wherein the decomposing is constrained by a Wiener entropy measure and wherein components of the non-negative base matrix represent the dictionary of base components of the input signal.
  • By using the Wiener entropy measure, meaningful spectral patterns are estimated and thus speech separation quality, measured by both signal-based and perceptual criteria, is improved. The complexity increase is not significant when compared to standard NMF implementations. The Wiener entropy constrained NMF can be integrated into any device using NMF.
  • The methods and systems described herein may be implemented as software in a Digital Signal Processor (DSP), in a micro-controller or in any other side-processor or as hardware circuit within an application specific integrated circuit (ASIC). The means for decomposing the input matrix may be implemented as software in a Digital Signal Processor (DSP), in a micro-controller or in any other side-processor or as a hardware unit, e.g. within an application specific integrated circuit (ASIC).
  • The invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof, e.g. in available hardware or software of conventional mobile devices or hands-free communication systems or in new hardware or software dedicated for processing the methods described herein after.
  • Aspects of the invention provide a method for decomposing a signal according to a Wiener entropy-constrained Non-negative Matrix Factorization (WNMF). This method brings a modification to Non-negative Matrix Factorization (NMF) to enforce a sparse decomposition of a non-negative matrix.
  • The Wiener entropy, also called "spectral flatness", of a set of non-negative values is the ratio between the geometric mean and the arithmetic mean of these values. The Wiener entropy is always between zero and one, and it is equal to one if and only if all the values in the set are equal. Intuitively, a large value of the Wiener entropy corresponds to a "flat" plot and a small value corresponds to a "peaky" plot. Hence, to enforce the sparsity property of the NMF decomposition, it has to be ensured that the Wiener entropy of each column of H is small.
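  • The definition above translates directly into code. The sketch below computes the Wiener entropy of one column of H; the floor `eps` in the geometric mean anticipates the operator $[x]_\epsilon = \max(x, \epsilon)$ used in the cost function, while the floor on the arithmetic mean is an additional safeguard assumed for this sketch so that an all-zero column does not cause a division by zero.

```python
import math

def wiener_entropy(values, eps=1e-12):
    """Ratio of the geometric mean to the arithmetic mean of non-negative values."""
    r = len(values)
    # Geometric mean computed in the log domain, with each value floored at eps.
    geometric = math.exp(sum(math.log(max(v, eps)) for v in values) / r)
    # Floor on the arithmetic mean is an assumption of this sketch.
    arithmetic = max(sum(values) / r, eps)
    return geometric / arithmetic

flat = wiener_entropy([2.0, 2.0, 2.0, 2.0])    # close to 1: "flat" column
peaky = wiener_entropy([8.0, 0.0, 0.0, 0.0])   # close to 0: "peaky" (sparse) column
mixed = wiener_entropy([1.0, 4.0])             # 2 / 2.5 = 0.8
```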
  • In the WNMF method, the penalty term used to measure the sparsity of the decomposition is given by a weighted sum of the Wiener entropy values of the columns of the matrix H. The cost function to be minimized is then
    $$D(V, WH) + \lambda \sum_{j=1}^{n} \omega_j \frac{\prod_{i=1}^{r} [H_{i,j}]_\epsilon^{1/r}}{\frac{1}{r} \sum_{i=1}^{r} H_{i,j}},$$
    where $H_{i,j}$ is the value in the i-th row and j-th column of the matrix H, λ is a real non-negative parameter and the parameters $\omega_j$ are non-negative weighting parameters, which can depend on the matrix H. The symbol ε denotes a (small) positive real number and the operator $[\cdot]_\epsilon$ is defined by $[x]_\epsilon = \max(x, \epsilon)$.
  • This maximum ensures that the penalty term is positive, and that sparsity is enforced even when one of the weights is equal to zero. A variety of functions can be used for measuring the reconstruction error. In an implementation form, the reconstruction error is defined as
    $$D(V, WH) = \left\| V \otimes \ln\left(V \div (WH)\right) - V + WH \right\|_1,$$
    where the operation $\|\cdot\|_1$ denotes the vector 1-norm, the symbol ⊗ denotes the Hadamard product, i.e. element-wise multiplication and ÷ is the element-wise division.
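  • The complete cost, i.e. the reconstruction error plus the weighted Wiener entropy penalty, can be evaluated as in the following sketch. The names `reconstruction_error` and `wnmf_cost`, the default values of `eps` and `tiny`, the convention 0·ln(0) = 0 and the floor on the column mean are assumptions made for illustration.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def reconstruction_error(V, X, tiny=1e-12):
    """1-norm of V (x) ln(V / X) - V + X, with 0 * ln(0) taken as 0."""
    total = 0.0
    for v_row, x_row in zip(V, X):
        for v, x in zip(v_row, x_row):
            x = max(x, tiny)  # guard against division by zero (assumption)
            total += abs((v * math.log(v / x) if v > 0 else 0.0) - v + x)
    return total

def wnmf_cost(V, W, H, lam, weights=None, eps=1e-9):
    """Reconstruction error plus lam times the weighted sum of column Wiener entropies."""
    r, n = len(H), len(H[0])
    if weights is None:
        weights = [1.0] * n  # the setting in which all weights equal one
    penalty = 0.0
    for j in range(n):
        column = [H[i][j] for i in range(r)]
        geometric = math.exp(sum(math.log(max(h, eps)) for h in column) / r)
        arithmetic = max(sum(column) / r, eps)  # floor is an assumption
        penalty += weights[j] * geometric / arithmetic
    return reconstruction_error(V, matmul(W, H)) + lam * penalty

# Two exact factorizations: one with flat columns of H, one with sparse columns.
W = [[1.0, 2.0], [3.0, 4.0]]
H_flat = [[1.0, 2.0], [1.0, 2.0]]
H_sparse = [[1.0, 0.0], [0.0, 2.0]]
cost_flat = wnmf_cost(matmul(W, H_flat), W, H_flat, lam=0.5)
cost_sparse = wnmf_cost(matmul(W, H_sparse), W, H_sparse, lam=0.5)
```

  • Both factorizations reconstruct their input exactly, so the reconstruction error is zero in both cases and the difference in cost comes from the penalty alone: each flat column contributes a Wiener entropy of one, whereas each sparse column contributes a value close to zero.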
  • In an implementation form, the weighting parameters ωj used in the above defined cost function are all set to one.
  • In an implementation form, the optimization of this cost function is performed by multiplicative update rules, which enforce non-negativity without needing explicit constraints. In an implementation form, A and G are two matrices of dimensions r × n, defined by:
    $$A_{i,j} = \frac{1}{r} \sum_{k=1}^{r} H_{k,j}$$
    and
    $$G_{i,j} = \prod_{k=1}^{r} [H_{k,j}]_\epsilon^{1/r}.$$
  • The updates of the decomposition are performed according to:
    $$W = W \otimes \left[\left(V \div (WH)\right) H^{T}\right] \div \left[\mathbf{1}_{m,n} H^{T}\right]$$
    and
    $$H = H \otimes \left[W^{T}\left(V \div (WH)\right) + \lambda\, G \div (r A \otimes A)\right] \div \left[W^{T} \mathbf{1}_{m,n} + \lambda\, G \div (r A \otimes H)\right],$$
    where $\mathbf{1}_{m,n}$ is a matrix of dimensions m × n whose elements are all equal to one. The optimization process stops when convergence is observed or when a sufficient number of iterations have been performed.
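  • One iteration of these multiplicative updates can be sketched as follows, for the setting in which all weighting parameters equal one. The function name `wnmf_step`, the order of the two updates (W first, then H with the refreshed product WH) and the additional floors on denominators are assumptions of this sketch and are not prescribed by the text above.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def kl_divergence(V, X, tiny=1e-12):
    """Reconstruction error used here to monitor convergence."""
    total = 0.0
    for v_row, x_row in zip(V, X):
        for v, x in zip(v_row, x_row):
            x = max(x, tiny)
            total += (v * math.log(v / x) if v > 0 else 0.0) - v + x
    return total

def wnmf_step(V, W, H, lam, eps=1e-9):
    """One multiplicative update of W followed by one update of H."""
    m, n, r = len(V), len(V[0]), len(H)
    # Update of W: W (x) [(V / WH) H^T] / [ones H^T].
    WH = matmul(W, H)
    R = [[v / max(x, eps) for v, x in zip(vr, xr)] for vr, xr in zip(V, WH)]
    h_row_sums = [max(sum(row), eps) for row in H]
    W = [[W[i][k] * sum(R[i][j] * H[k][j] for j in range(n)) / h_row_sums[k]
          for k in range(r)] for i in range(m)]
    # Update of H, using the refreshed product WH.
    WH = matmul(W, H)
    R = [[v / max(x, eps) for v, x in zip(vr, xr)] for vr, xr in zip(V, WH)]
    w_col_sums = [sum(W[i][k] for i in range(m)) for k in range(r)]
    # A and G are constant along each column, so one value per column suffices.
    A = [max(sum(H[k][j] for k in range(r)) / r, eps) for j in range(n)]
    G = [math.exp(sum(math.log(max(H[k][j], eps)) for k in range(r)) / r)
         for j in range(n)]
    new_H = []
    for k in range(r):
        row = []
        for j in range(n):
            num = (sum(W[i][k] * R[i][j] for i in range(m))
                   + lam * G[j] / (r * A[j] * A[j]))
            den = w_col_sums[k] + lam * G[j] / (r * A[j] * max(H[k][j], eps))
            row.append(H[k][j] * num / max(den, eps))
        new_H.append(row)
    return W, new_H

# Illustrative data and starting point.
V = [[3.0, 0.5, 2.0, 0.1], [2.5, 0.4, 1.8, 0.2],
     [0.2, 2.0, 0.3, 1.5], [0.1, 2.2, 0.2, 1.7]]
W0 = [[0.5, 0.6], [0.7, 0.4], [0.3, 0.9], [0.8, 0.2]]
H0 = [[0.5, 0.4, 0.6, 0.3], [0.2, 0.7, 0.5, 0.9]]
W, H = W0, H0
for _ in range(50):  # fixed iteration budget as the stopping criterion
    W, H = wnmf_step(V, W, H, lam=0.5)
```

  • With λ = 0 the step reduces to the unconstrained update. A convergence test on the cost could replace the fixed iteration count used here.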
  • In an alternative implementation form, gradient-descent algorithms are applied instead of these multiplicative updates.
  • In an alternative implementation form, the weighting parameters $\omega_j$ used in the above defined cost function are all set to the mean value of the corresponding columns of the matrix H:
    $$\omega_j = \frac{1}{r} \sum_{i=1}^{r} H_{i,j}$$
  • Hence, the sparsity penalty applied to each instance in time is approximately proportional to the amplitude of the input signal at the corresponding instance in time.
  • This ensures that the relative orders of magnitude of the sparsity term and the reconstruction error term are homogeneous over time. Thus, the relative importance of both constraints does not depend on the amplitude of the input signal. In this case, the cost function simplifies to:
    $$D(V, WH) + \lambda \sum_{j=1}^{n} \prod_{i=1}^{r} [H_{i,j}]_\epsilon^{1/r}.$$
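  • In an implementation form, the optimization of this cost function is performed by multiplicative update rules. The updates of the decomposition are performed according to:
    $$W = W \otimes \left[\left(V \div (WH)\right) H^{T}\right] \div \left[\mathbf{1}_{m,n} H^{T}\right]$$
    and
    $$H = H \otimes \left[W^{T}\left(V \div (WH)\right)\right] \div \left[W^{T} \mathbf{1}_{m,n} + \lambda\, G \div (r H)\right].$$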
  • Another advantage of this setting is that the complexity of the parameter updates is reduced compared to the previous implementation.
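  • The simplification and the reduced complexity can be checked with a short calculation. The following is a sketch of the algebra under the definitions of A and G given earlier, not text from the original specification: setting $\omega_j$ to the column mean cancels the denominator of the Wiener entropy, and the derivative of the remaining geometric-mean term (for entries above the floor ε) is the only penalty contribution left in the update of H, which is why the matrix A no longer needs to be computed.

```latex
\lambda \sum_{j=1}^{n} \omega_j
  \frac{\prod_{i=1}^{r} [H_{i,j}]_\epsilon^{1/r}}{\frac{1}{r}\sum_{i=1}^{r} H_{i,j}}
\;\overset{\omega_j = \frac{1}{r}\sum_{i} H_{i,j}}{=}\;
\lambda \sum_{j=1}^{n} \prod_{i=1}^{r} [H_{i,j}]_\epsilon^{1/r},
\qquad
\frac{\partial}{\partial H_{k,j}} \prod_{i=1}^{r} [H_{i,j}]_\epsilon^{1/r}
  = \frac{G_{k,j}}{r\, H_{k,j}} \quad (H_{k,j} > \epsilon).
```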
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Further embodiments of the invention will be described with respect to the following figures, in which:
    • Fig. 1 shows a schematic diagram 100 of a conventional non-negative Matrix Factorization (NMF) technique;
    • Fig. 2 shows three schematic diagrams 201, 202, 203 representing V, W and H matrices of a conventional Non-negative Matrix Factorization decomposition;
    • Fig. 3 shows exemplary spectrograms of two musical notes 301, 302, a succession of the two musical notes 300 and reconstructions 303, 304 of the two musical notes reconstructed by using a conventional NMF factorization;
    • Fig. 4 shows a schematic diagram of a method 440 for determining a dictionary of base components from an audio signal by performing a WNMF decomposition according to an implementation form;
    • Fig. 5 shows a schematic diagram of a method 500 for decomposing an audio signal into a dictionary of base components and reconstructing a set of audio signals according to an implementation form;
    • Fig. 6 shows a schematic diagram of a method 600 for decomposing an audio signal into a dictionary of base components applied to a noisy speech signal according to an implementation form;
    • Fig. 7 shows a schematic diagram of a de-noising system 700 according to an implementation form; and
    • Fig. 8 shows a schematic diagram of a device 800 for determining a dictionary of base components 804 from an audio signal 802 according to an implementation form.
  • DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
  • Fig. 4 shows a schematic diagram of a method 440 for determining a dictionary of base components from an audio signal by performing a WNMF decomposition according to an implementation form.
  • The method 440 performs a WNMF decomposition 400 from a digital single-channel acoustic signal 401. The digital input signal 401 is input to a short-time transform module 410, which performs a windowing into short-time frames and a transform, so as to produce non-negative feature vectors 411, e.g. magnitude spectra. A buffer 420 stores these features in order to produce the matrix V 421. The WNMF module 430 then performs a decomposition of the matrix V 421, representing the magnitude spectra of the input signal. The outputs of this module are the matrices W 431 and H 432 which represent respectively the dictionary of feature bases and the temporal weights of these bases.
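  • The short-time transform and buffering stages can be illustrated with a small sketch that frames the signal, applies a window, and stores the magnitude spectra as columns of V. The frame length, hop size, Hann window and direct DFT used here are assumptions chosen to keep the example self-contained; the text above does not prescribe them, and a practical system would use an FFT.

```python
import cmath
import math

def magnitude_spectrogram(signal, frame_len=8, hop=4):
    """Return V as a list of rows (frequency bins), one column per frame."""
    # Periodic Hann window (an assumed choice of window function).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len)
              for t in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        # Magnitude of the DFT for the non-negative frequency bins only.
        spectrum = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * f * t / frame_len)
                            for t in range(frame_len)))
                    for f in range(frame_len // 2 + 1)]
        frames.append(spectrum)
    # Buffering: transpose so that each column of V is one instance in time.
    return [list(row) for row in zip(*frames)]

# Test tone located exactly on frequency bin 2 of an 8-point transform.
signal = [math.cos(2 * math.pi * 2 * t / 8) for t in range(32)]
V = magnitude_spectrogram(signal)
```

  • For this test tone, every column of V has its largest value in bin 2, and all entries are non-negative, as required for the input of the decomposition.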
  • Fig. 5 shows a schematic diagram of a system 500 for decomposing an audio signal into a dictionary of base components and reconstructing a set of audio signals according to an implementation form. The system 500 is adapted for separating a single-channel acoustic signal into several components (here r = 3). The system 500 comprises a factorization element 400 performing the WNMF decomposition 400 as described above with respect to Fig. 4 and a reconstruction element 510. The factorization element 400 takes as input an acoustic signal 401 and estimates a dictionary of feature bases 431 and the corresponding temporal weights 432 describing the signal. The result of the decomposition is input to the reconstruction module 510, which produces several output signals 511, 512 and 513.
  • In an implementation form, the reconstruction module 510 exploits a so-called "soft mask" approach as described in the following. For k = 1 ... 3, $W_{:,k}$ is the column-vector constituted by the k-th column of W and $H_{k,:}$ is the row-vector constituted by the k-th row of H. A magnitude spectrogram $S_k$ is calculated as:
    $$S_k = \left[\left(W_{:,k} H_{k,:}\right) \div (WH)\right] \otimes V$$
  • The three obtained matrices constitute the magnitude spectrograms of the three output signals. The time-domain signals are then obtained by a standard approach, involving an inverse Fourier transform exploiting the phase of the original complex spectrogram, followed by an overlap-add procedure.
  • In an implementation form, the output signals are then superposed in order to obtain signals corresponding to the combination of several components, for a source separation application.
  • In another implementation form, the magnitude spectrograms of the output signals are directly reconstructed as $S_k = W_{:,k} \cdot H_{k,:}$.
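  • Both reconstruction variants can be sketched in a few lines. In the following illustration, `soft_mask` accepts a list of component indices so that several components can be combined into one output; the function names and the floor `tiny` on the denominator are assumptions of this sketch.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def soft_mask(V, W, H, components, tiny=1e-12):
    """Soft-mask spectrogram of the selected components: [(W_k H_k) / (WH)] (x) V."""
    WH = matmul(W, H)
    return [[sum(W[i][k] * H[k][j] for k in components) / max(WH[i][j], tiny) * V[i][j]
             for j in range(len(V[0]))] for i in range(len(V))]

def direct_component(W, H, k):
    """Direct reconstruction: outer product of column k of W and row k of H."""
    return [[w_row[k] * h for h in H[k]] for w_row in W]

# Illustrative values with r = 3 components.
V = [[4.0, 3.0], [2.0, 5.0]]
W = [[1.0, 0.5, 0.2], [0.3, 1.0, 0.4]]
H = [[1.0, 2.0], [0.5, 0.1], [2.0, 1.0]]
S = [soft_mask(V, W, H, [k]) for k in range(3)]
S_sum = [[sum(S[k][i][j] for k in range(3)) for j in range(2)] for i in range(2)]
```

  • A property of the soft mask is that the masks of all components sum to one wherever WH is positive, so the component spectrograms add up to the input V. The direct reconstruction skips the division and the multiplication by V, but the components then add up to WH, which only approximates V.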
  • The components of the system 500 described above may also be implemented as steps of a method.
  • Fig. 6 shows a schematic diagram of a system 600 for decomposing an audio signal into a dictionary of base components applied to a noisy speech signal according to an implementation form. The decomposition is applied to the reduction of noise in a noisy speech signal. This system 600 involves a prior training phase 610 which comprises a factorization element 400 performing the WNMF decomposition 400 as described above with respect to Fig. 4. In the training phase, a training speech signal 601 is input to the factorization element 400, which computes a dictionary of feature bases 611 and a matrix of temporal weights 612 corresponding to the WNMF decomposition of the training signal.
  • The system 600 further comprises a short-time transform 630, a buffer 640, a semi-supervised NMF module 650 and a reconstruction module 660. A single-channel noisy speech signal 621 undergoes a short-time transform 630 which calculates non-negative features 631, similarly to the element 410 described above with respect to Fig. 4. The buffer 640 stores these features to produce a matrix V 641. This matrix undergoes a decomposition using semi-supervised NMF 650, where the feature bases corresponding to speech are set to the values of the dictionary 611 given by the training phase. The other bases are updated by the semi-supervised NMF. The outputs of this decomposition 650 are the dictionary W 651 and the corresponding weights H 652. These matrices are used by a reconstruction element 660, which produces the de-noised speech signal.
  • The reconstruction is performed by the "soft mask" method. In an implementation form, Hs is the matrix extracted from the matrix H 652 comprising the weights corresponding to the speech bases Ws . The magnitude spectrogram S of the de-noised output signal is calculated as:
    $$S = \frac{W_s H_s}{W H} \otimes V.$$
  • The time-domain signal is then obtained by the same approach as described above with respect to Fig. 5 for the reconstruction element 510.
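    The semi-supervised decomposition and the speech soft mask can be sketched as below. The text does not give the update rule of the semi-supervised NMF 650, so this sketch assumes the standard multiplicative updates for the Kullback-Leibler divergence, with the speech columns held fixed. The names `semi_supervised_nmf`, `denoise_magnitude`, `n_noise` and `eps` are hypothetical.

```python
import numpy as np

def semi_supervised_nmf(V, W_speech, n_noise, n_iter=100, eps=1e-12, seed=0):
    """KL-NMF in which the speech bases stay fixed and only noise bases adapt."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    r_s = W_speech.shape[1]
    # Dictionary = [fixed speech bases | randomly initialised noise bases].
    W = np.hstack([W_speech, rng.random((m, n_noise)) + eps])
    H = rng.random((r_s + n_noise, n)) + eps
    ones = np.ones((m, n))
    for _ in range(n_iter):
        # Update all temporal weights.
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + eps)
        # Compute the usual W update, but write back only the noise columns.
        W_new = W * ((V / (W @ H + eps)) @ H.T) / (ones @ H.T + eps)
        W[:, r_s:] = W_new[:, r_s:]
    return W, H

def denoise_magnitude(V, W, H, r_s, eps=1e-12):
    """Soft mask keeping only the contribution of the first r_s (speech) bases."""
    return (W[:, :r_s] @ H[:r_s, :]) / (W @ H + eps) * V

rng = np.random.default_rng(1)
W_speech = rng.random((5, 2)) + 0.1   # stands in for the trained dictionary 611
V = rng.random((5, 7)) + 0.1          # stands in for the noisy matrix V 641
W, H = semi_supervised_nmf(V, W_speech, n_noise=2)
S = denoise_magnitude(V, W, H, r_s=2)
```

    Freezing every column of W instead (and skipping the W update altogether) would give the supervised variant mentioned for the case where a noise dictionary is also trained.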
  • In an implementation form, the semi-supervised NMF 650 is replaced by a WNMF decomposition 400 as described above with respect to Fig. 4.
  • In yet another implementation form, a noise training phase similar to 610 is performed to estimate a noise feature dictionary from a training recording of noise. In this case, the dictionary W 651 is defined as the concatenation of the speech dictionary 611 and the noise dictionary, and the semi-supervised NMF 650 is replaced by a supervised NMF.
  • The components of the system 600 described above may also be implemented as steps of a method.
  • Fig. 7 shows a schematic diagram of a de-noising system 700 according to an implementation form.
  • In a training phase, spectral components W speaker 713 and W noise 715 are estimated from clean speech V speaker 701 and noise V noise 703 separately using WNMF 707, 709. These spectral components W speaker 713 and W noise 715 are fed to a de-noising system 711, which exploits them to separate speech from noise. The noise components are estimated on the noisy speech V mix 705 without noise training by the de-noising system 711 which provides the de-noised speech 717.
  • In an implementation form, the de-noising system 711 is a supervised system. In an implementation form the de-noising system 711 is a semi-supervised system. In an implementation form the de-noising system 711 is an unsupervised NMF de-noising system where no a priori knowledge of the speech and noise models is available.
  • Fig. 8 shows a schematic diagram of a device 800 for determining a dictionary of base components 804 from an input signal 802 according to an implementation form. The input signal 802 is represented by an input matrix V. The device 800 comprises a buffer 803 for storing the input matrix V. The device 800 further comprises decomposing means 801 for decomposing the input matrix V into a product of a non-negative base matrix W and a non-negative weight matrix H, wherein the decomposing is constrained by a Wiener entropy measure and wherein components of the non-negative base matrix W represent the dictionary of base components 804 of the input signal 802.
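    The Wiener entropy used as the constraint is the ratio of the geometric mean to the arithmetic mean of a vector, as the cost function in this document makes explicit. A minimal sketch, assuming a hypothetical helper name `wiener_entropy` and an arbitrary floor value `eps` for the operator [x]ε = max(x, ε):

```python
import numpy as np

def wiener_entropy(h, eps=1e-9):
    """Geometric mean of the floored entries divided by the arithmetic mean."""
    h = np.asarray(h, dtype=float)
    floored = np.maximum(h, eps)                    # [x]_eps = max(x, eps)
    geometric_mean = np.exp(np.mean(np.log(floored)))
    arithmetic_mean = np.mean(h)                    # not floored, as in the cost
    return geometric_mean / arithmetic_mean

flat = wiener_entropy([1.0, 1.0, 1.0, 1.0])    # all bases equally active
sparse = wiener_entropy([1.0, 0.0, 0.0, 0.0])  # a single active base
```

    A flat weight vector yields a value of one, while a vector with a single active base yields a value close to zero, which is why driving the measure towards zero favours sparse columns of H. The helper is undefined for an all-zero vector, since the arithmetic mean is then zero.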
  • In an implementation form, the dictionary of base components represents a specific audio source of a plurality of audio sources of the audio signal. In an implementation form, the decomposing means is configured for decomposing the input matrix V by using a Non-negative Matrix Factorization. In an implementation form, the decomposing means is configured to enforce a sparse decomposition of the non-negative base matrix W. In an implementation form, the decomposing means comprises means for forming the non-negative weight matrix H such that a Wiener entropy of each column of the non-negative weight matrix H is close to zero. In an implementation form, the decomposing means comprises means for minimizing a sum of Wiener entropy values of columns of the non-negative weight matrix H by using a cost function. In an implementation form, the cost function is according to
    $$\left\| V \otimes \ln\!\left(\frac{V}{WH}\right) - V + WH \right\|_1 + \lambda \sum_{j=1}^{n} \omega_j \, \frac{\prod_{i=1}^{r} \left([H_{i,j}]_\epsilon\right)^{1/r}}{\frac{1}{r}\sum_{i=1}^{r} H_{i,j}},$$
    where V denotes the input matrix, W denotes the non-negative base matrix, H denotes the non-negative weight matrix with elements Hi,j, the operation ∥·∥1 denotes the vector 1-norm, the symbol ⊗ denotes the Hadamard product, i.e. element-wise multiplication, and the symbol ÷, shown as a fraction bar in the formula, denotes the element-wise division. In an implementation form, the device 800 comprises means for updating the cost function by one of a multiplicative update rule and a gradient descent algorithm. In an implementation form, the multiplicative update rule is according to:
    $$W = W \otimes \frac{\left(\frac{V}{WH}\right) H^{T}}{\mathbf{1}_{m,n}\, H^{T}} \quad\text{and}\quad H = H \otimes \frac{W^{T}\left(\frac{V}{WH}\right) + \lambda\, \frac{G}{rA \otimes A}}{W^{T}\, \mathbf{1}_{m,n} + \lambda\, \frac{G}{rA \otimes H}},$$
    where V denotes the input matrix, W denotes the non-negative base matrix, H denotes the non-negative weight matrix with elements Hi,j, the symbol ⊗ denotes the Hadamard product, i.e. element-wise multiplication, $\mathbf{1}_{m,n}$ denotes a matrix of dimension m × n whose elements are all equal to one, A denotes a matrix of dimension r × n, defined by:
    $$A_{i,j} = \frac{1}{r}\sum_{k=1}^{r} H_{k,j}$$
    and G denotes a matrix of dimension r × n, defined by:
    $$G_{i,j} = \prod_{k=1}^{r} \left([H_{k,j}]_\epsilon\right)^{1/r}.$$
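    The multiplicative update rule above can be sketched in NumPy as follows, taking all weights ωj equal to one. This is an illustrative sketch rather than a reference implementation: the function names, the random initialisation, the iteration count and the guard constant `tiny` (which protects the divisions) are assumptions not found in the text.

```python
import numpy as np

def wnmf(V, r, lam=0.1, n_iter=200, eps=1e-9, tiny=1e-12, seed=0):
    """KL-NMF with a Wiener-entropy penalty on the columns of H (omega_j = 1)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + 0.1
    H = rng.random((r, n)) + 0.1
    ones = np.ones((m, n))  # the all-ones matrix 1_{m,n}
    for _ in range(n_iter):
        # W update: data term only.
        W *= ((V / (W @ H + tiny)) @ H.T) / (ones @ H.T + tiny)
        # A: column-wise arithmetic mean of H, repeated over the r rows.
        A = np.tile(np.maximum(H.mean(axis=0), tiny), (r, 1))
        # G: column-wise geometric mean of the floored entries [H]_eps.
        G = np.tile(np.exp(np.log(np.maximum(H, eps)).mean(axis=0)), (r, 1))
        # H update: data term plus the two parts of the penalty gradient.
        num = W.T @ (V / (W @ H + tiny)) + lam * G / (r * A * A)
        den = W.T @ ones + lam * G / (r * A * np.maximum(H, tiny))
        H *= num / den
    return W, H

def wnmf_cost(V, W, H, lam=0.1, eps=1e-9, tiny=1e-12):
    """Cost for strictly positive V: KL divergence plus lam * sum of Wiener entropies."""
    WH = W @ H + tiny
    kl = np.sum(V * np.log(V / WH) - V + WH)
    geo = np.exp(np.log(np.maximum(H, eps)).mean(axis=0))
    arith = np.maximum(H.mean(axis=0), tiny)
    return kl + lam * np.sum(geo / arith)

rng = np.random.default_rng(0)
V = rng.random((8, 10)) + 0.1  # strictly positive stand-in for a spectrogram
W, H = wnmf(V, r=3)
cost = wnmf_cost(V, W, H)
```

    Since every factor in the updates is non-negative, W and H remain non-negative throughout. The sketch makes no claim about monotonic decrease of the cost; tracking `wnmf_cost` across iterations is one way to observe the behaviour on given data.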
  • In an implementation form, the device 800 comprises means for reconstructing a plurality of output signals from the audio signal, the reconstruction being based on the input matrix V, the non-negative base matrix W and the non-negative weight matrix H. In an implementation form, magnitude spectrograms Sk of the plurality of output signals are determined according to:
    $$S_k = \frac{W_{:,k}\, H_{k,:}}{W H} \otimes V,$$
    where V denotes the input matrix, W denotes the non-negative base matrix, H denotes the non-negative weight matrix, W :,k denotes the column-vector constituted by the k-th column of W and H k,: denotes the row-vector constituted by the k-th row of H and the symbol ⊗ denotes the Hadamard product, i.e. element-wise multiplication.
  • In an implementation form, magnitude spectrograms Sk of the plurality of output signals are determined by a product of a column-vector W :,k constituted by the k-th column of the non-negative base matrix W and a row-vector H k,: constituted by the k-th row of the non-negative weight matrix H.
  • In an implementation form, the device 800 comprises means for determining a dictionary of base components from a training speech signal according to the method 400 as described above with respect to Fig. 4; and means for forming a non-negative base matrix W of a noisy speech signal by using a semi-supervised Non-negative Matrix Factorization; and means for updating the non-negative base matrix W of the noisy speech signal with the non-negative base matrix WS of the training speech signal. In an implementation form, the device 800 comprises means for reconstructing the speech signal based on the updated non-negative base matrix W of the noisy speech signal.
  • In another implementation form, the decomposing means comprises means for minimizing a weighted sum of Wiener entropy values of columns of the non-negative weight matrix H by using a cost function, the weighting parameters of the sum being the mean values of the columns of the matrix H. In an implementation form, the cost function is according to
    $$\left\| V \otimes \ln\!\left(\frac{V}{WH}\right) - V + WH \right\|_1 + \lambda \sum_{j=1}^{n} \prod_{i=1}^{r} \left([H_{i,j}]_\epsilon\right)^{1/r}.$$
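  • In an implementation form, the device 800 comprises means for updating the cost function by a multiplicative update rule according to:
    $$W = W \otimes \frac{\left(\frac{V}{WH}\right) H^{T}}{\mathbf{1}_{m,n}\, H^{T}} \quad\text{and}\quad H = H \otimes \frac{W^{T}\left(\frac{V}{WH}\right)}{W^{T}\, \mathbf{1}_{m,n} + \lambda\, \frac{G}{rH}}.$$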
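    As a consistency check on this weighted variant, the short derivation below sketches why the penalty and the update of H simplify when the weights are the column means. The shorthand $G_j$ and $A_j$ for the column-wise geometric and arithmetic means is introduced here for illustration only, and the derivative is taken where the entries exceed ε.

```latex
\begin{align*}
\omega_j = A_j = \frac{1}{r}\sum_{i=1}^{r} H_{i,j}
  \quad &\Longrightarrow \quad
  \lambda \sum_{j=1}^{n} \omega_j \, \frac{G_j}{A_j}
  = \lambda \sum_{j=1}^{n} G_j,
  \qquad
  G_j = \prod_{i=1}^{r} \left([H_{i,j}]_\epsilon\right)^{1/r}, \\[4pt]
\frac{\partial G_j}{\partial H_{i,j}}
  &= \frac{G_j}{r\, H_{i,j}} \;\ge\; 0
  \qquad \text{for } H_{i,j} > \epsilon .
\end{align*}
```

    Because this gradient is non-negative, it contributes only to the denominator of the multiplicative update of H, which corresponds to the term λ G ÷ (rH); the numerator keeps the data term alone.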
  • From the foregoing, it will be apparent to those skilled in the art that a variety of methods, systems, computer programs on recording media, and the like, are provided.
  • The present disclosure also supports a computer program product including computer executable code or computer executable instructions that, when executed, causes at least one computer to execute the performing and computing steps described herein.
  • The present disclosure also supports a system configured to execute the performing and computing steps described herein.
  • Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art readily recognize that there are numerous applications of the invention beyond those described herein. While the present invention has been described with reference to one or more particular embodiments, those skilled in the art recognize that many changes may be made thereto. It is therefore to be understood that within the scope of the appended claims, the invention may be practiced otherwise than as specifically described herein.

Claims (13)

  1. A method (440) for determining a dictionary of base components (431) from an audio signal (401), the audio signal (401) being represented by an input matrix (V) which columns comprise features of the audio signal (401) at different instances in time, the method (440) comprising:
    decomposing (430) the input matrix (V) into a product of a non-negative base matrix (W) and a non-negative weight matrix (H), the decomposing (430) being constrained by a Wiener entropy measure with respect to elements of the non-negative weight matrix (H), wherein components of the non-negative base matrix (W) represent the dictionary of base components (431) of the audio signal (401);
    wherein the decomposing (430) constrained by the Wiener entropy measure comprises:
    minimizing a weighted sum of Wiener entropy values of columns of the non-negative weight matrix (H) by using a cost function, and
    wherein the cost function is according to
    $$\left\| V \otimes \ln\!\left(\frac{V}{WH}\right) - V + WH \right\|_1 + \lambda \sum_{j=1}^{n} \omega_j \, \frac{\prod_{i=1}^{r} \left([H_{i,j}]_\epsilon\right)^{1/r}}{\frac{1}{r}\sum_{i=1}^{r} H_{i,j}}$$
    where V denotes the input matrix, W denotes the non-negative base matrix, H denotes the non-negative weight matrix with elements Hi,j, the operation ∥·∥1 denotes the vector 1-norm, the symbol ⊗ denotes the Hadamard product, i.e. element-wise multiplication, the symbol ÷, shown as a fraction bar in the formula, denotes the element-wise division, λ is a real non-negative parameter, ωj denote non-negative weighting parameters which can depend on the matrix H, the symbol ε denotes a positive real number and the operator [·] ε is defined as
    $$[x]_\epsilon = \max(x, \epsilon).$$
  2. The method (440) of claim 1, wherein the dictionary of base components (431) represents a specific audio source of a plurality of audio sources of the audio signal (401).
  3. The method (440) of claim 1 or claim 2, wherein the decomposing (430) uses a Non-negative Matrix Factorization.
  4. The method (440) of one of the preceding claims, wherein the decomposing (430) constrained by the Wiener entropy measure is configured to enforce a sparse decomposition of the non-negative base matrix (W).
  5. The method (440) of one of the preceding claims, wherein the decomposing (430) constrained by the Wiener entropy measure comprises:
    forming the non-negative weight matrix (H) such that a Wiener entropy of each column of the non-negative weight matrix (H) is close to zero.
  6. The method (440) of claim 1, comprising:
    updating the cost function by one of a multiplicative update rule and a gradient descent algorithm.
  7. The method (440) of claim 6, wherein the multiplicative update rule is according to:
    $$W = W \otimes \frac{\left(\frac{V}{WH}\right) H^{T}}{\mathbf{1}_{m,n}\, H^{T}} \quad\text{and}\quad H = H \otimes \frac{W^{T}\left(\frac{V}{WH}\right) + \lambda\, \frac{G}{rA \otimes A}}{W^{T}\, \mathbf{1}_{m,n} + \lambda\, \frac{G}{rA \otimes H}},$$
    where V denotes the input matrix, W denotes the non-negative base matrix, H denotes the non-negative weight matrix with elements Hi,j, λ is a real non-negative parameter, the symbol ⊗ denotes the Hadamard product, i.e. element-wise multiplication, $\mathbf{1}_{m,n}$ denotes a matrix of dimension m × n whose elements are all equal to one, A denotes a matrix of dimension r × n, defined by:
    $$A_{i,j} = \frac{1}{r}\sum_{k=1}^{r} H_{k,j}$$
    and G denotes a matrix of dimension r × n, defined by:
    $$G_{i,j} = \prod_{k=1}^{r} \left([H_{k,j}]_\epsilon\right)^{1/r},$$
    where the symbol ε denotes a positive real number and the operator [·] ε is defined as
    $$[x]_\epsilon = \max(x, \epsilon).$$
  8. The method (440) of claim 6, wherein the multiplicative update rule is according to:
    $$W = W \otimes \frac{\left(\frac{V}{WH}\right) H^{T}}{\mathbf{1}_{m,n}\, H^{T}} \quad\text{and}\quad H = H \otimes \frac{W^{T}\left(\frac{V}{WH}\right)}{W^{T}\, \mathbf{1}_{m,n} + \lambda\, \frac{G}{rH}},$$
    where V denotes the input matrix, W denotes the non-negative base matrix, H denotes the non-negative weight matrix with elements Hi,j, λ is a real non-negative parameter, the symbol ⊗ denotes the Hadamard product, i.e. element-wise multiplication, $\mathbf{1}_{m,n}$ denotes a matrix of dimension m × n whose elements are all equal to one, G denotes a matrix of dimension r × n, defined by:
    $$G_{i,j} = \prod_{k=1}^{r} \left([H_{k,j}]_\epsilon\right)^{1/r},$$
    where the symbol ε denotes a positive real number and the operator [·] ε is defined as
    $$[x]_\epsilon = \max(x, \epsilon).$$
  9. The method (500) of one of the preceding claims, comprising:
    reconstructing (510) a plurality of output signals (511, 512, 513) from the audio signal (401), the reconstruction (510) being based on the input matrix (V), the non-negative base matrix (W) and the non-negative weight matrix (H).
  10. The method (500) of claim 9, wherein magnitude spectrograms Sk of the plurality of output signals (511, 512, 513) are determined according to:
    $$S_k = \frac{W_{:,k}\, H_{k,:}}{W H} \otimes V,$$
    where V denotes the input matrix, W denotes the non-negative base matrix, H denotes the non-negative weight matrix, W :,k denotes the column-vector constituted by the k-th column of W and H k,: denotes the row-vector constituted by the k-th row of H and the symbol ⊗ denotes the Hadamard product, i.e. element-wise multiplication; or
    wherein magnitude spectrograms Sk of the plurality of output signals (511, 512, 513) are determined by a product of a column-vector W :,k constituted by the k-th column of the non-negative base matrix W and a row-vector H k,: constituted by the k-th row of the non-negative weight matrix H.
  11. The method (500) of claim 10, comprising:
    constructing output spectrograms by summing several of the magnitude spectrograms Sk of the plurality of output signals (511, 512, 513).
  12. The method (600) of one of the preceding claims, comprising:
    determining (610) a dictionary of base components (611) from a training speech signal (601) according to one of the methods 1 to 13; and
    forming (651) a non-negative base matrix (W) of a noisy speech signal (621) by extending the non-negative base matrix (WS) of the training speech signal (601); and
    updating the non-negative base matrix (W) of the noisy speech signal (621) by using a semi-supervised Non-negative Matrix Factorization.
  13. Device (800) for determining a dictionary of base components (804) from an input signal (802) represented by an input matrix (V), the device (800) comprising:
    a buffer (803) for storing the input matrix (V); and
    means (801) for decomposing the input matrix (V) into a product of a non-negative base matrix (W) and a non-negative weight matrix (H), wherein the decomposing is constrained by a Wiener entropy measure and wherein components of the non-negative base matrix (W) represent the dictionary of base components (804) of the input signal (802);
    wherein the decomposing constrained by the Wiener entropy measure comprises:
    minimizing a weighted sum of Wiener entropy values of columns of the non-negative weight matrix (H) by using a cost function, and
    wherein the cost function is according to
    $$\left\| V \otimes \ln\!\left(\frac{V}{WH}\right) - V + WH \right\|_1 + \lambda \sum_{j=1}^{n} \omega_j \, \frac{\prod_{i=1}^{r} \left([H_{i,j}]_\epsilon\right)^{1/r}}{\frac{1}{r}\sum_{i=1}^{r} H_{i,j}}$$
    where V denotes the input matrix, W denotes the non-negative base matrix, H denotes the non-negative weight matrix with elements Hi,j, the operation ∥·∥1 denotes the vector 1-norm, the symbol ⊗ denotes the Hadamard product, i.e. element-wise multiplication, the symbol ÷, shown as a fraction bar in the formula, denotes the element-wise division, λ is a real non-negative parameter, ωj denote non-negative weighting parameters which can depend on the matrix H, the symbol ε denotes a positive real number and the operator [·] ε is defined as
    $$[x]_\epsilon = \max(x, \epsilon).$$
EP12794680.4A 2012-11-21 2012-11-21 Method for determining a dictionary of base components from an audio signal Active EP2912660B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2012/073149 WO2014079484A1 (en) 2012-11-21 2012-11-21 Method for determining a dictionary of base components from an audio signal

Publications (2)

Publication Number Publication Date
EP2912660A1 EP2912660A1 (en) 2015-09-02
EP2912660B1 EP2912660B1 (en) 2017-01-11

Family

ID=47278271

Family Applications (1)

Application Number Title Priority Date Filing Date
EP12794680.4A Active EP2912660B1 (en) 2012-11-21 2012-11-21 Method for determining a dictionary of base components from an audio signal

Country Status (2)

Country Link
EP (1) EP2912660B1 (en)
WO (1) WO2014079484A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017143095A1 (en) * 2016-02-16 2017-08-24 Red Pill VR, Inc. Real-time adaptive audio source separation
CN105976806B (en) * 2016-04-26 2019-08-02 西南交通大学 Active noise control method based on maximum entropy
WO2017217412A1 (en) * 2016-06-16 2017-12-21 日本電気株式会社 Signal processing device, signal processing method, and computer-readable recording medium
US10679646B2 (en) 2016-06-16 2020-06-09 Nec Corporation Signal processing device, signal processing method, and computer-readable recording medium
JP6615733B2 (en) * 2016-11-01 2019-12-04 日本電信電話株式会社 Signal analysis apparatus, method, and program
CN106897685A (en) * 2017-02-17 2017-06-27 深圳大学 Face identification method and system that dictionary learning and sparse features based on core Non-negative Matrix Factorization are represented
CN109829481B (en) * 2019-01-04 2020-10-30 北京邮电大学 Image classification method and device, electronic equipment and readable storage medium
CN110428848B (en) * 2019-06-20 2021-10-29 西安电子科技大学 Speech enhancement method based on public space speech model prediction
CN111009256B (en) * 2019-12-17 2022-12-27 北京小米智能科技有限公司 Audio signal processing method and device, terminal and storage medium
CN111179960B (en) * 2020-03-06 2022-10-18 北京小米松果电子有限公司 Audio signal processing method and device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None *

Also Published As

Publication number Publication date
WO2014079484A1 (en) 2014-05-30
EP2912660A1 (en) 2015-09-02


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20150522

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAX Request for extension of the european patent (deleted)
17Q First examination report despatched

Effective date: 20160404

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

INTG Intention to grant announced

Effective date: 20160715

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: AT

Ref legal event code: REF

Ref document number: 861922

Country of ref document: AT

Kind code of ref document: T

Effective date: 20170115

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602012027800

Country of ref document: DE

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG4D

REG Reference to a national code

Ref country code: NL

Ref legal event code: MP

Effective date: 20170111

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 861922

Country of ref document: AT

Kind code of ref document: T

Effective date: 20170111

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170411

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170412

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170511

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170511

Ref country code: RS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170411

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 6

Ref country code: DE

Ref legal event code: R097

Ref document number: 602012027800

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

26N No opposition filed

Effective date: 20171012

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20171130

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20171130

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20171121

REG Reference to a national code

Ref country code: BE

Ref legal event code: MM

Effective date: 20171130

REG Reference to a national code

Ref country code: IE

Ref legal event code: MM4A

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20171121

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 7

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20171121

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20171130

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO

Effective date: 20121121

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20170111

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20230929

Year of fee payment: 12

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20231006

Year of fee payment: 12

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20230929

Year of fee payment: 12