WO2007025680A2 - A method and apparatus for blind source separation - Google Patents
- Publication number
- WO2007025680A2 (application PCT/EP2006/008349)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- mixtures
- histogram
- source
- signals
- mixing
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2134—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on separation criteria, e.g. independent component analysis
Definitions
- BSS blind source separation
- WDO W-disjoint orthogonality
- the method further comprises converting the N output representations into the time domain.
- said sparse representations comprise one of a time-frequency or a time-scale representation.
- the mixing parameter weights comprise source energies associated with instantaneous demixing in this region.
- said identifying comprises using one of clustering or iterative thresholded peak finding.
- the associated mixing parameters for the histogram peaks are relative delay and
- said assigning comprises using a distance in mixing parameter space.
- the invention can be implemented in either a batch (off-line) or an iterative (real-time) version.
- in the batch version, all the data is analyzed in one pass and the histogram is created. Then the histogram peaks are identified. Then, in a second pass through the data, the sources are demixed.
- in the iterative version, the peaks are tracked from one time frame to the next and the demixtures are created as new data comes in.
- This present invention estimates the delay (equivalently the angle of arrival) and the attenuation of N WDO source signals as they pass across an ESPRIT-like array of sensor pairs using two or more mixtures. Providing each source has a unique attenuation and delay estimate, a two dimensional histogram will have N peaks corresponding to N source signals. The centre of each peak provides an accurate estimate of the actual attenuation and delay of each source. Since the attenuation and delay parameter estimation is performed at each time-frequency point, the estimates for the mixing parameters of the N sources can be used to partition the time-frequency plane into N regions where the WDO sources are active. As a result N time-frequency masks with non-zero values at active time-frequency points and zeros elsewhere can be applied to any of the mixtures to demix these N source signals.
- the invention makes similar assumptions to ESPRIT as regards the layout of the sensors, namely that the sensors can be divided into two paired subarrays with each paired couplet of sensors sharing a common displacement vector.
- the invention can be performed at each point in the time-frequency domain using the localised spatial covariance matrix
- $R_{zz}(\omega,\tau) = E\left\{ \begin{bmatrix} X_W(\omega,\tau) \\ Y_W(\omega,\tau) \end{bmatrix} \begin{bmatrix} X_W(\omega,\tau)^H & Y_W(\omega,\tau)^H \end{bmatrix} \right\}$
- $\Phi(\omega,\tau)$ may be recovered via an eigenvalue decomposition
- $\Phi(\omega,\tau) = T\,[E_X^{+}(\omega,\tau)\,E_Y(\omega,\tau)]\,T^{-1}$; at a given time-frequency point, up to N signals may be present and the resulting N-by-N diagonal matrix $\Phi(\omega,\tau)$ has up to N non-zero entries which are of the form
- $\alpha_i$ and $\delta_i$ are the relative attenuation and delay parameters for the $i$th source. Note, this is an extension of the diagonal matrix used in the ESPRIT algorithm discussed above, including relative attenuation scaling factors $\alpha_i$ in addition to the associated phase factors stemming from the relative delays $\delta_i$.
- the parameter estimation step of DUET fails.
- the present invention continues to work well providing that the number of sensors in the ESPRIT-like uniform linear array outnumbers the number of sources that may coexist at a particular region in the time-frequency domain.
- the invention operates under the DUET strong WDO assumption (at most one source is active for every time-frequency point), whereas in a second embodiment, the invention operates under a weakened WDO assumption.
- Figure 1 shows blind source separation of 4 signals from 3 anechoic mixtures using a first embodiment of the present invention
- Figure 2 shows the parameter histograms for the conventional 2-channel implementation, as well as 3- and 4-channel multi-channel implementations of the first embodiment, at Signal to Noise Ratios of 0 dB, 5 dB and 10 dB (columns 1, 2 and 3);
- Figure 3 shows weighted parameter histograms associated with high, medium and low instantaneous power estimates
- Figure 4 shows blind source separation using a second embodiment of the invention for 5 speech signals (top left); 4 anechoic mixtures (top right); 2D-histogram (bottom left) and 5 demixed signals (bottom right); and
- Figure 5 shows blind source separation using a further embodiment of the invention for 2 speech signals travelling upon 3 and 2 echoic paths respectively (top left); 6 echoic mixtures of the two signals (top right); a 2D power weighted histogram showing 5 peaks (bottom left); and 5 demixed signals recovered, 3 corresponding to the first signal and 2 corresponding to the second signal (bottom right).
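- $E_s(\omega,\tau)$ is a 2m-by-1 vector, so as a result the scalar $\Phi$ is given by
- $R_{zz}(\omega,\tau) = \begin{bmatrix} X_W(\omega,\tau) \\ Y_W(\omega,\tau) \end{bmatrix} \begin{bmatrix} X_W(\omega,\tau)^H & Y_W(\omega,\tau)^H \end{bmatrix}$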
- Step 1: A uniformly spaced linear array of M sensors receives M anechoic mixtures $x_1(t), x_2(t), \ldots, x_M(t)$ of N WDO source signals. These M signals are represented in the 2(M-1)-by-1 time-varying vector
- a window W(t), of length L is formed and by shifting the position of the window by multiples of ⁇ seconds, localisation in time is possible.
- a two-dimensional histogram of the attenuation and delay parameters ($\alpha(\omega,\tau)$ and $\delta(\omega,\tau)$) is constructed. The histogram values may be weighted using $X_W(\omega,\tau)^H X_W(\omega,\tau)$, which is proportional to the power of the source present at each time-frequency point.
- N histogram peaks indicate N source signals. The $(\alpha, \delta)$ values corresponding to the centre of each peak are mapped back into the time-frequency domain to indicate in which regions each of the N source signals is active. Peak detection is performed using a weighted K-means based technique or an iterated peak removal technique.
- Step 4 Under the assumption that the N source signals are strongly W-disjoint orthogonal, a binary time-frequency mask corresponding to the regions of the time-frequency plane where a source is active is created. Applying the i th mask to any of the received mixtures recovers the i th source signal. N such masks are used to separate the N sources.
- the implementation was used to blindly demix four 2.4-second speech signals, using three anechoic mixtures of these signals, each sampled at 16 kHz. Plots of the original source signals, the received mixtures, the two-dimensional histogram and the demixed signals are given in Figure 1; a high SNR of 100 dB is assumed.
- the invention has clear advantages at lower Signal to Noise Ratios (SNRs), since an increase in the number of sensors improves parameter estimation when using the invention.
- Figure 2 shows the parameter histograms for the conventional 2-channel implementation, as well as 3- and 4-channel multi-channel implementations, at Signal to Noise Ratios of 0 dB, 5 dB and 10 dB (columns 1, 2 and 3).
- a second embodiment of the invention is based on a weak-WDO assumption that allows for more than one source to have significant energy in the same time-frequency coefficient.
- ESPRIT direction of arrival (as well as attenuation) estimation is performed at each time-frequency point by considering a group of neighbouring time frames for a given frequency.
- the estimated mixing parameters are used to create a two-dimensional weighted histogram.
- the weights for the histogram are obtained from the energy of the time-frequency localized demixtures found by applying a demixing matrix based on the mixing parameter estimates for that time-frequency point.
- N peaks are located corresponding to the N source mixing parameter pairs.
- Demixing is performed by matrix inversion at each time-frequency point, assigning the resulting demixtures based on the distance to the known source mixing parameters.
- a window W(t) of length L «(K-1)T is formed and by shifting the position of the window by multiples of ⁇ seconds, localisation in time is possible.
- the m attenuation ⁇ t ( ⁇ , ⁇ ) and delay S 1 ( ⁇ , ⁇ ) parameters and m source signal Estimates S 1 ( ⁇ , ⁇ ), ... , S m ( ⁇ , ⁇ ) are produced at each of the LKT/ ⁇ time-frequency points and then used to create a 2-D power weighted histogram. Unlike a count histogram a weighted histogram increments each bin by a weight associated with each different estimate instead of incrementing by unity for each estimate. We have weighted each
- Each of the m instantaneous source estimates S ⁇ ⁇ , ⁇ ),... ,S m ( ⁇ , ⁇ ) needs to be correctly assigned to one of the N demixed source estimates at each time-frequency point. Assignment is performed by determining which of the m instantaneous parameter estimates
- said measure of closeness of the i th estimate at ( ⁇ , ⁇ ) to the k th peak centre is given as
- Table 1 shows the percentage of the average instantaneous power associated with each of the 3 possible parameter estimates. With one source present, the strongest eigenvalue is weighted by about 99.36% of the power and the next strongest eigenvalue by the remaining 0.64% of the power. As the number of sources increases, the WDO assumption is weakened: the strongest eigenvalue receives a weaker associated power weighting and the secondary and tertiary eigenvalues receive stronger weightings.
- Table 1 The percentage of the average instantaneous signal power associated with the eigenvalues ⁇ ⁇ . ⁇ n . and ⁇ ⁇ sorted according to highest associated signal power, when 2. 3. . . . . H sources and no noise are present.
- the second embodiment was used to blindly demix five 1.7-second speech signals, using four anechoic mixtures of these signals, each sampled at 16 kHz.
- Figure 3 shows the two-dimensional histograms associated with high, medium and low power estimates. Operating under a strong WDO assumption, the first embodiment has access only to the first histogram, whereas the invention operating under a weakened WDO assumption has access to all 3, and so a single histogram containing 3 times the data may be constructed. Plots of the original source signals, the received mixtures, the two-dimensional histogram and the demixed signals are given in Figure 4.
- the invention may be applied to echoic environments. This is based on stacking M mixtures ⁇ xi(t), xi(t),..., X M (0 ⁇ of N possibly coherent narrowband source signals ⁇ si(t), S 2 (t),..., S N (t) ⁇ of centre frequency ⁇ o in a matrix of the form: ⁇ ⁇ ⁇ x [M /2j (0
- R 72 will have a maximum possible rank of N.
- R ⁇ of rank N there exists a singular value decomposition: and it follows that the N eigenvalues of:
- the $\lfloor M/2 \rfloor$ mixing parameter estimates are obtained via an eigenvalue decomposition:
- a uniform linear array of M sensors may be used to estimate the mixing parameters of one signal travelling on P echoic paths, providing $M \ge 2P$.
- M echoic mixtures of an arbitrary number of speech source signals may be demixed, providing the maximum number of echoic paths is no more than half the number of sensors in the uniform linear array.
- Step 1 A uniform linear array of M sensors receives M possibly echoic mixtures (X 1 (O, x 2 (t),..., XM(O) of N speech signals. These M mixture signals are sampled every T seconds and a window W(t) of length L «KT seconds is shifted by multiples of ⁇ T seconds to perform K/ ⁇ L-point Discrete Windowed Fourier Transforms upon K samples of each mixture.
- the $\lfloor M/2 \rfloor$ estimated mixing parameters are used to perform a demixing step at each time-frequency point via an inversion of the estimated mixing matrix; the Moore-Penrose pseudo-inverse $[\cdot]^{+}$ is used to invert non-square matrices.
- the $\lfloor M/2 \rfloor$ mixing parameters are given as:
- an A×D two-dimensional power weighted histogram $H_{\alpha\delta}$ of the relative attenuation and delay parameters is also constructed, i.e. a histogram is constructed in the usual way, but instead of a bin being incremented by one when a mixing parameter estimate is entered into the histogram, the signal power associated with the estimate is added.
- the power weighted histogram $H_{\alpha\delta}$ will have a number of peaks $N' \ge N$, each of which represents a signal received by the sensor array; in an echoic environment some of these signals may have originated from the same source.
- the centres of each of the peaks provide estimates of the mixing parameters $(\alpha_1, \delta_1), \ldots, (\alpha_{N'}, \delta_{N'})$. Peak detection may be performed using a suitable clustering technique.
- Figure 5 shows blind source separation using the above embodiment of the invention for 2 speech signals travelling upon 3 and 2 echoic paths respectively (top left); 6 echoic mixtures of the two signals (top right); a 2D power weighted histogram showing 5 peaks (bottom left); and 5 demixed signals recovered, 3 corresponding to the first signal and 2 corresponding to the second signal (bottom right).
- the weighted histogram approach of the DUET aspect of the above embodiments may be used in combination with direction of arrival algorithms other than ESPRIT, such as the MUSIC algorithm.
- the histogram has more than two dimensions, which allows for the sensors to be in arbitrary arrangements.
- mixing parameter estimates are mapped to a domain in which their value corresponds to the physical location of the source, and the weighted histogram constructed yields information about the relative locations of the sources in addition to providing the means for separation.
- the invention is useful in several applications where the ability to separate underlying signals from their mixtures is of critical importance.
- the ability to separate out one speaker from a number of speakers has applications in hearing aids; the ability to separate out a number of speakers from a mixture has application for automatic meeting transcription, monitoring or audio forensics; the ability to separate out the original sources of sound (valves, murmurs, etc.) from biomedical signals including heart sounds has diagnostic value for physicians
- ECG, ECG, PCG, MEG
- demultiplex wireless signals based on their spatial signature frequency-hopped waveforms
- other signals which could be processed include seismic signals or other terrestrial mapping signals, optics and optical signal transmissions, and optical and radio signals from space.
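The per-point demixing described in the fragments above (inversion of the estimated mixing matrix with the Moore-Penrose pseudo-inverse, followed by assignment using a distance in mixing parameter space) might be sketched as follows. This is only an illustration under stated assumptions: the function name `demix_point` is hypothetical, the mixing matrix is assumed to have the Vandermonde structure of a uniform linear array with one complex factor $\alpha e^{-j\omega\delta}$ per source, and closeness is measured as plain distance between those complex factors because the text's own closeness measure is not reproduced here.

```python
import numpy as np

def demix_point(mixtures, phi_est, peak_phis):
    """Demix one time-frequency point and route the results to the N outputs.

    mixtures : (M,) complex coefficients of the M mixtures at this point.
    phi_est  : (m,) complex estimates alpha*exp(-1j*omega*delta) found locally.
    peak_phis: (N,) histogram-peak parameters in the same complex form.
    """
    mixtures = np.asarray(mixtures, dtype=complex)
    phi_est = np.asarray(phi_est, dtype=complex)
    peak_phis = np.asarray(peak_phis, dtype=complex)
    k = np.arange(mixtures.shape[0])[:, None]
    # Assumed ULA structure: sensor k sees source i scaled by phi_i ** k.
    a = phi_est[None, :] ** k
    # The pseudo-inverse handles the non-square (M > m) mixing matrix.
    demixed = np.linalg.pinv(a) @ mixtures
    out = np.zeros(peak_phis.shape[0], dtype=complex)
    for est, val in zip(phi_est, demixed):
        # Assign each instantaneous demixture to the closest peak (assumed metric).
        out[np.argmin(np.abs(peak_phis - est))] += val
    return out
```

Outputs whose peak is not matched by any local estimate stay at zero for that point, which mirrors the idea that only m of the N sources are active in a local region.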
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The Direction of Arrival estimation algorithm ESPRIT is capable of estimating the angles of arrival of N narrowband source signals using M > N anechoic sensor mixtures from a uniform linear array (ULA). Using a similar parameter estimation step, the DUET Blind Source Separation algorithm can demix N > 2 speech signals using M = 2 anechoic mixtures of the signals. The present invention demixes N > M speech signals using M >= 2 anechoic mixtures.
Description
A method and apparatus for blind source separation
Field of the invention
The present invention provides a method and apparatus for blind source separation (BSS).
Background of the invention
The "cocktail party phenomenon" illustrates the ability of the human auditory system to separate out a single speech source from the cacophony of a crowded room, using only two sensors and with no prior knowledge of the speakers or the channel presented by the room. Efforts to implement a receiver which emulates this sophistication are referred to as Blind Source Separation techniques, examples of which are described by A. J. Bell and T. J. Sejnowski, "An information maximization approach to blind separation and blind deconvolution," Neural Computation, vol. 7, no. 6, pp. 1129-1159, 1995; P. Comon, "Independent component analysis: A new concept?" Signal Processing, vol. 36, no. 3, pp. 287-314, 1994; and A. Hyvarinen, J. Karhunen, and E. Oja, "Independent component analysis," Wiley Series on Adaptive and Learning Systems for Signal Processing, Communications and Control, 2001.
Generally, in the anechoic blind source separation model, N time-varying source signals $s_1(t), s_2(t), \ldots, s_N(t)$ propagate across an isotropic, anechoic (direct path), non-dispersive medium and impinge upon an array of M sensors which are situated in the far-field of all sources. Under such conditions the $k$th mixture can be expressed as:
$$x_k(t) = \sum_{i=1}^{N} \alpha_{ki}\, s_i(t - t_{ki}) + n_k(t)$$
where $\alpha_{ki}$ is the attenuation of the $i$th source at the $k$th sensor, $n_k(t)$ is additive noise for the $k$th sensor, and $t_{ki}$ is the delay from the $i$th source to the $k$th sensor.
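The anechoic mixing model above can be simulated directly. The following is a minimal sketch, assuming discrete-time signals and delays that are whole numbers of samples (the model in the text allows arbitrary delays); the function name `anechoic_mix` and its parameters are illustrative and not taken from the source.

```python
import numpy as np

def anechoic_mix(sources, atten, delays, noise_std=0.0, seed=0):
    """Form M mixtures x_k[n] = sum_i atten[k, i] * s_i[n - delays[k, i]] + noise.

    sources: (N, T) array, atten: (M, N) array,
    delays : (M, N) array of non-negative whole-sample delays (an assumption).
    """
    sources = np.asarray(sources, dtype=float)
    atten = np.asarray(atten, dtype=float)
    delays = np.asarray(delays, dtype=int)
    n_src, n_samp = sources.shape
    n_sens = atten.shape[0]
    rng = np.random.default_rng(seed)
    # Start from the additive sensor noise n_k(t).
    x = noise_std * rng.standard_normal((n_sens, n_samp))
    for k in range(n_sens):
        for i in range(n_src):
            d = delays[k, i]
            # Shift source i by d samples, zero-filling the start (no wrap-around).
            x[k, d:] += atten[k, i] * sources[i, :n_samp - d]
    return x
```

With `noise_std=0.0` the output is exactly the attenuated, delayed sum, which is convenient for checking a separation method before adding noise.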
Generally, blind source separation algorithms attempt to retrieve or estimate the source signals s(t) from the received mixtures x(t) with little, if any, prior information about the mixing matrix or the source signals themselves.
Typically, blind source separation and direction of arrival techniques require the number of sensors to be greater than or equal to the number of sources, i.e. M>=N.
Classic Direction of Arrival estimation techniques such as MUSIC, disclosed by R. O. Schmidt, "Multiple emitter location and signal parameter estimation (MUSIC)," IEEE Trans. on Antennas and Propagation, vol. AP-34, no. 3, pp. 276-280, March 1986; and ESPRIT, disclosed by R. Roy and T. Kailath, "ESPRIT - Estimation of Signal Parameters via Rotational Invariance Techniques," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, no. 7, pp. 984-995, July 1989, aim to find the N angles of arrival for N narrowband signals $s_1(t), s_2(t), \ldots, s_N(t)$ impinging upon an array of M sensors. With accurate estimation, beamforming can be performed to separate the N signals if M>=N.
For narrowband signals of centre frequency $\omega_0$, a time lag can be approximated by a phase rotation, i.e. $s(t-\tau) \approx s(t)\exp\{-j\omega_0\tau\}$, where s(t) is the analytic representation of a real signal. As a result the $k$th mixture can be expressed as
$$x_k(t) = \sum_{i=1}^{N} \alpha_{ki}\, s_i(t)\, e^{-j\omega_0 t_{ki}} + n_k(t)$$
The ESPRIT algorithm relies on two subarrays of sensors. Each element of the first subarray is displaced in space from the corresponding element of the second subarray by the same displacement vector. It is also assumed that each signal source is sufficiently removed from the sensor arrays and so the time lag between the sensors of each pair for a source signal is constant.
Without loss of generality, in one common implementation of ESPRIT, it is assumed that the original sensor array is a uniformly spaced linear array consisting of M sensors; as a result the array of M sensors is subdivided into two subarrays of M-1 sensors each. The first subarray contains sensors 1,...,M-1 and the second subarray contains sensors 2,...,M. The m mixtures received by the first subarray can be written as
$$x(t) = A\,s(t) + n_x(t)$$
where the mixing matrix A has complex entries, each column may be associated with an individual narrowband source signal, and m=M-1 in the case of a uniform linear array. In the case where the sensors do not form a uniform linear array, M-1>m>=M/2, with m=M/2 if the two subarrays have no elements in common.
An estimate of the spatial covariance matrix
$$R_{xx} = E\{x(t)\,x(t)^H\} \quad (2)$$
can be calculated, where H denotes a complex conjugate transpose operation. The Singular Value Decomposition (SVD) of $R_{xx}$ is of the form
$$R_{xx} = \begin{bmatrix} E_s & E_n \end{bmatrix} \begin{bmatrix} \Lambda & 0 \\ 0 & \Sigma \end{bmatrix} \begin{bmatrix} E_s & E_n \end{bmatrix}^H$$
where Λ is a diagonal matrix with the N dominant entries associated with N signals, the M-N remaining singular values are comparable to the noise variance and are contained in the diagonal matrix Σ, the N column vectors of $E_s$ are associated with the N dominant singular values and the M-N column vectors of $E_n$ are associated with the M-N remaining singular values. The subspace spanned by $E_s$ is known as the signal subspace and the orthogonal subspace spanned by $E_n$ is known as the noise subspace.
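The split into signal and noise subspaces can be illustrated numerically. The sketch below assumes the expectation is replaced by a sample average over T snapshots and that the number of sources is known in advance; the function name `signal_noise_subspaces` is hypothetical.

```python
import numpy as np

def signal_noise_subspaces(snapshots, n_sources):
    """Split the SVD of the sample covariance into signal and noise subspaces.

    snapshots: (M, T) complex array, one column per time sample.
    Returns (Es, En, singular_values) with Es of shape (M, n_sources).
    """
    x = np.asarray(snapshots)
    # Sample estimate of R_xx = E{x x^H}.
    r_xx = x @ x.conj().T / x.shape[1]
    # numpy returns the singular values sorted in descending order.
    u, s, _ = np.linalg.svd(r_xx)
    return u[:, :n_sources], u[:, n_sources:], s
```

In the noise-free case the trailing singular values are numerically zero and the columns of `En` are orthogonal to every column of the mixing matrix, which is the property that subspace methods such as MUSIC and ESPRIT rely on.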
The M-1 mixtures from the second array can be represented as $y(t) = A\Phi s(t) + n_y(t)$, where the diagonal matrix
$$\Phi = \mathrm{diag}\left(e^{-j\omega_0\delta_1}, \ldots, e^{-j\omega_0\delta_N}\right)$$
contains delay terms $\delta_i$, $i=1,\ldots,N$, which are unique to each source signal and are related geometrically to the angle of arrival, i.e. $\delta_i = \Delta\cos(\theta_i)/c$, where Δ is the distance between the two subarrays, $\theta_i$ is the angle of arrival of the $i$th signal onto the array and c is the propagation speed.
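As a worked example of the geometric relation between delay and angle of arrival, the numbers below are purely illustrative assumptions (a subarray displacement of 5 cm, sound travelling in air at 343 m/s, an arrival angle of 60 degrees and a centre frequency of 1 kHz); none of them comes from the source.

```latex
\delta_i = \frac{\Delta\cos\theta_i}{c}
         = \frac{0.05\ \mathrm{m}\times\cos 60^\circ}{343\ \mathrm{m/s}}
         \approx 7.29\times10^{-5}\ \mathrm{s},
\qquad
\omega_0\delta_i = 2\pi\times 1000\ \mathrm{Hz}\times 7.29\times10^{-5}\ \mathrm{s}
                 \approx 0.458\ \mathrm{rad}.
```

The corresponding diagonal entry of the delay matrix is then a unit-modulus phase factor with an argument of about -0.458 rad.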
Both data vectors can be stacked to form
$$z(t) = \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}$$
which is a 2m-by-1 vector of mixtures. It follows that the SVD of the spatial covariance matrix $R_{zz}$ can be computed in the same manner as for $R_{xx}$.
For the no-noise case, the mixing matrix spans the same space as the signal subspace, i.e. there exists a non-singular matrix T such that
$$\begin{bmatrix} E_x \\ E_y \end{bmatrix} = \begin{bmatrix} A \\ A\Phi \end{bmatrix} T \quad (5)$$
Furthermore, the diagonal matrix Φ is related to $E_x^{+}E_y$ via a similarity transform
$$\Phi = T\,E_x^{+}E_y\,T^{-1}$$
where $^{+}$ denotes the Moore-Penrose pseudo-inverse. As a result the N angles of arrival ($\theta_i$, $i=1,\ldots,N$) can be recovered from the N complex eigenvalues of $E_x^{+}E_y$, which are of the form $e^{-j\omega_0\Delta\cos(\theta_i)/c}$, $i=1,\ldots,N$.
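The eigenvalue step can be tried out numerically. The following is a sketch rather than the patent's procedure: it assumes a uniform linear array, estimates the covariance by a sample average, and takes the two subspace blocks as the first and last M-1 rows of the M-sensor signal subspace instead of explicitly stacking the two subarray vectors. The function names are hypothetical.

```python
import numpy as np

def esprit_phase_factors(snapshots, n_sources):
    """Return the N eigenvalues exp(-1j*omega0*delta_i) for a uniform linear array."""
    x = np.asarray(snapshots)                 # shape (M, T)
    r = x @ x.conj().T / x.shape[1]           # sample covariance
    u, _, _ = np.linalg.svd(r)
    es = u[:, :n_sources]                     # signal subspace
    ex, ey = es[:-1], es[1:]                  # sensors 1..M-1 and sensors 2..M
    # pinv(Ex) @ Ey is similar to the diagonal delay matrix, so it shares its eigenvalues.
    return np.linalg.eigvals(np.linalg.pinv(ex) @ ey)

def angles_from_phase_factors(eigs, omega0, spacing, c):
    """Invert delta = spacing*cos(theta)/c; returns angles in radians."""
    delays = -np.angle(eigs) / omega0
    return np.arccos(np.clip(delays * c / spacing, -1.0, 1.0))
```

The phase is only unambiguous while the product of `omega0` and the largest delay stays below π in magnitude, so the sensor spacing has to be small relative to the wavelength.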
The original ESPRIT algorithm is a time-domain based technique, where Rzz is approximated by a time average
A frequency domain based approach is also possible with the ESPRIT algorithm being performed at each point in the frequency domain using the covariance matrix
Such a frequency domain approach has the advantage that the narrowband assumption placed upon the source signals is no longer necessary. However, at each frequency the N signal subspace vectors are permutated and so, without knowledge of this random permutation, combining results across frequencies becomes difficult as disclosed by H. Sawada, R. Mukai, S. Araki, and S. Makino, "A robust and precise method for solving the permutation problem of frequency-domain blind source separation," IEEE Trans. Speech and Audio Processing, vol. 12, no. 5, pp. 530-538, September 2004.
At the same time, A. Jourjine, S. Rickard, and O. Yilmaz, "Blind separation of disjoint orthogonal signals: Demixing N sources from 2 mixtures," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '00), vol. 5, pp. 2985-2988, June 2000; and O. Yilmaz and S. Rickard, "Blind separation of speech mixtures via time-frequency masking," IEEE Trans. on Signal Processing, vol. 52, no. 7, pp. 1830-1847, July 2004, disclose the DUET blind source separation algorithm, which can demix N>2 signals using only 2 anechoic mixtures of the signals, providing these signals are W-disjoint orthogonal (WDO), i.e. that there is at most one source active at any time-frequency point.
DUET handles this permutation problem by mapping each delay estimate to a source using a weighted histogram. DUET makes a further simplifying assumption which ESPRIT does not require. The DUET method relies on the concept of approximate W-disjoint orthogonality (WDO), a measure of sparsity which quantifies the non-overlapping nature of the time-frequency representations of the sources. This property is exploited to facilitate the separation of any number of sources blindly from just two mixtures using the spatial signatures of each source. These spatial signatures arise out of the separation of the measuring sensors, which produces a relative arrival delay, $\delta_i$, and a relative attenuation factor, $\alpha_i$, for the $i$th source.
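How little two sources overlap in the time-frequency plane can be checked numerically. The sketch below is an illustration only: the normalised overlap score is an assumed, simple stand-in for a WDO measure (it is not the measure defined in the DUET literature), and the function names `stft` and `tf_overlap` are hypothetical.

```python
import numpy as np

def stft(x, window, hop):
    """Windowed Fourier transform: rows are frequency bins, columns are frames."""
    n = len(window)
    starts = range(0, len(x) - n + 1, hop)
    frames = np.array([x[i:i + n] * window for i in starts])
    return np.fft.rfft(frames, axis=1).T

def tf_overlap(s1_tf, s2_tf):
    """Normalised overlap in [0, 1]; values near 0 mean nearly disjoint supports."""
    num = np.sum(np.abs(s1_tf) * np.abs(s2_tf))
    den = np.sqrt(np.sum(np.abs(s1_tf) ** 2) * np.sum(np.abs(s2_tf) ** 2))
    return num / den
```

Two tones at well separated frequencies give a score close to zero, while a signal compared with itself gives exactly one; speech mixtures typically fall in between, which is why the orthogonality is described as approximate.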
Using a windowed Fourier transform
S w] (ω W{t - τ)sj(t)e-Jωtdt
the WDO assumption can be written as
Assuming all sources are W-disjoint orthogonal, at a given time-frequency point only one of the N sources will have a non-zero value. This allows DUET to perform separation using only two mixtures. Thus the equations for the mixtures can be written as follows:
$$\begin{bmatrix} X^W(\omega, \tau) \\ Y^W(\omega, \tau) \end{bmatrix} = \begin{bmatrix} 1 \\ \alpha_j e^{-j\omega\delta_j} \end{bmatrix} S_j^W(\omega, \tau) \qquad (8)$$
where $s_j(t)$ is defined to be the jth source measured at x(t). From this we can determine expressions for the mixing parameters at each point in the time-frequency domain of each of the mixtures $X^W(\omega,\tau)$ and $Y^W(\omega,\tau)$. Approximate W-disjoint orthogonality suggests that the parameters at each point are equal to or, at least, tend towards those for one source only.
Note that, due to the approximate nature of W-disjoint orthogonality along with the presence of noise, the mixing parameters in (9) are only estimates of the true values. If we calculated these parameter estimates at every point in time-frequency space, we would expect the results to cluster around the true values of the actual mixing parameters.
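As a minimal sketch of how such per-point estimates can be formed, the following assumes the two-mixture model of equation (8): the ratio of the two mixture spectra at a point where one source dominates equals $\alpha_j e^{-j\omega\delta_j}$, so its magnitude gives the attenuation and its phase gives the delay. The function name `duet_estimates` and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def duet_estimates(X, Y, omega):
    """Relative attenuation and delay at one time-frequency point from two mixture spectra."""
    ratio = Y / X                        # alpha * exp(-1j * omega * delta) when one source dominates
    alpha = np.abs(ratio)
    delta = -np.angle(ratio) / omega     # np.angle(ratio) is Im(log(ratio))
    return alpha, delta

# Hypothetical single-source point: omega * delta is kept below pi to avoid phase wrapping
omega = 2 * np.pi * 1000.0               # rad/s
alpha_true, delta_true = 0.8, 1.0e-4     # delay in seconds
S = 0.3 + 0.4j                           # source coefficient at this point
X = S
Y = alpha_true * np.exp(-1j * omega * delta_true) * S
alpha_est, delta_est = duet_estimates(X, Y, omega)
```

With noise or with more than one active source at the point, the same computation returns values that scatter around the true parameters, which is what the histogram step relies on.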
N sources produce N pairs of mixing parameters, which create N peaks in the parameter space histogram. We can then use these mixing parameter estimates to partition the time-frequency representation of one mixture to recover the source estimates.
It may be noted that the phase is defined modulo-π in (9), with closely spaced sensors of maximum separation $\Delta_{max} = 2c/f_{max}$ (where $f_{max}$ is the highest frequency with non-negligible energy content and c is the propagation speed), so phase wrapping is not a problem.
Disclosure of the invention
The present invention provides a method of blind source separation for demixing M mixtures of an arbitrary number of N signal sources (even when N>M) by: a. decomposing the mixtures into respective sparse representations where a small number of components of a signal carry a large percentage of the energy of the signal; b. performing analysis in local regions of the representations on the assumption that in that region only m<M sources are active to provide m sets of mixing parameter estimates and associated mixing parameter estimate weights; c. creating a multi-dimensional weighted histogram using the mixing parameter estimates as indices into the histogram and associated weights for the weights of the histogram; d. identifying peaks in the histogram to determine the number of sources N and their associated mixing parameters; and e. assigning m instantaneous demixtures to m of the N output representations for each local region based on said mixing parameters.
The method further comprises converting the N output representations into the time domain.
Preferably, said sparse representations comprise one of a time-frequency or a time-scale representation.
Preferably, the mixing parameter weights comprise source energies associated with instantaneous demixing in this region.
Preferably, said identifying comprises using one of clustering or iterative thresholded peak finding.
Preferably, the associated mixing parameters for the histogram peaks are relative delay and attenuation mixing parameter estimates $(\alpha_1, \delta_1), \ldots, (\alpha_N, \delta_N)$.
Preferably, said assigning comprises using a distance in mixing parameter space.
The invention can be implemented in either a batch (off-line) or an iterative (real-time) version. In the batch version, all the data is analyzed in one pass and the histogram created. Then, the histogram peaks are identified. Then, in a second pass through the data, the sources are demixed. In the iterative online version, the peaks are tracked from one time frame to the next and the demixtures created as new data comes in.
The present invention estimates the delay (equivalently the angle of arrival) and the attenuation of N WDO source signals as they pass across an ESPRIT-like array of sensor pairs using two or more mixtures. Provided each source has a unique attenuation and delay estimate, a two-dimensional histogram will have N peaks corresponding to N source signals. The centre of each peak provides an accurate estimate of the actual attenuation and delay of each source. Since the attenuation and delay parameter estimation is performed at each time-frequency point, the estimates for the mixing parameters of the N sources can be used to partition the time-frequency plane into N regions where the WDO sources are active. As a result, N time-frequency masks with non-zero values at active time-frequency points and zeros elsewhere can be applied to any of the mixtures to demix these N source signals.
DUET requires M=2, whereas the present invention can be seen as an extension of DUET where M>2 mixtures are available. The invention makes similar assumptions to ESPRIT as regards the layout of the sensors, namely that the sensors can be divided into two paired subarrays with each paired couplet of sensors sharing a common displacement vector.
The invention can be performed at each point in the time-frequency domain using the localised spatial covariance matrix
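$$R_{zz}(\omega, \tau) = E\left\{ \begin{bmatrix} X^W(\omega, \tau) \\ Y^W(\omega, \tau) \end{bmatrix} \begin{bmatrix} X^W(\omega, \tau)^H & Y^W(\omega, \tau)^H \end{bmatrix} \right\} \qquad (10)$$
The singular value decomposition of $R_{zz}(\omega, \tau)$ at each time-frequency point is of the form
$$R_{zz}(\omega, \tau) = \begin{bmatrix} E_X & E_{nX} \\ E_Y & E_{nY} \end{bmatrix} \begin{bmatrix} \Lambda & 0 \\ 0 & \Sigma \end{bmatrix} \begin{bmatrix} E_X & E_{nX} \\ E_Y & E_{nY} \end{bmatrix}^H$$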
From equation (6), Φ may be recovered via an eigenvalue decomposition
$$\Phi(\omega, \tau) = T \left[ E_X^{\dagger}(\omega, \tau)\, E_Y(\omega, \tau) \right] T^{-1}$$
At a given time-frequency point, up to N signals may be present and the resulting N-by-N diagonal matrix Φ(ω,τ) has up to N non-zero entries, which are of the form
$$\phi_j(\omega, \tau) = \alpha_j e^{-j\omega\delta_j}$$
where $\alpha_j$ and $\delta_j$ are the relative attenuation and delay parameters for the jth source. Note that this is an extension of the diagonal matrix used in the ESPRIT algorithm discussed above, including relative attenuation scaling factors $\alpha_j$ in addition to the associated phase factors stemming from the relative delays $\delta_j$.
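The subspace step can be illustrated with a small noise-free sketch. It assumes a uniform linear array in which sensor i receives source j scaled by the i-th power of its mixing parameter (a Vandermonde steering matrix), two sources active in the same local region, and a covariance estimated from a handful of local snapshots. The array size, snapshot count and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 4, 50                                    # sensors and local snapshots (assumed values)
phi_true = np.array([0.9 * np.exp(-0.5j), 0.6 * np.exp(-1.2j)])

# Steering matrix of the full array: column j is [1, phi_j, phi_j^2, phi_j^3]
A = np.vander(phi_true, M, increasing=True).T   # shape (M, 2)
S = rng.standard_normal((2, K)) + 1j * rng.standard_normal((2, K))
mix = A @ S                                     # noise-free mixtures, shape (M, K)

Z = np.vstack([mix[:-1], mix[1:]])              # stack the two subarrays: z = [x; y]
R = Z @ Z.conj().T / K                          # local spatial covariance estimate

U, _, _ = np.linalg.svd(R)
E = U[:, :2]                                    # signal subspace (two dominant directions)
E_x, E_y = E[:M - 1], E[M - 1:]
phi_est = np.linalg.eigvals(np.linalg.pinv(E_x) @ E_y)

alpha_est = np.abs(phi_est)                     # relative attenuations
phase_est = -np.angle(phi_est)                  # equals omega * delta for each source
```

The eigenvalues come back in no particular order, which is why a later step is needed to decide which estimate belongs to which source.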
It is discussed in the background how the DUET BSS algorithm constructs a two-dimensional histogram of these parameters to identify any number of sources and ultimately separate them if they can be assumed to be strongly W-disjoint orthogonal. By borrowing from both techniques the present invention is possible.
Under a weakened WDO assumption with possibly M-1 sources overlapping at any point in the time-frequency domain, the parameter estimation step of DUET fails. However, the present invention continues to work well provided that the number of sensors in the ESPRIT-like uniform linear array exceeds the number of sources that may coexist at a particular region in the time-frequency domain.
In a first embodiment, the invention operates under the DUET strong WDO assumption (at most one source is active for every time-frequency point), whereas in a second embodiment, the invention operates under a weakened WDO assumption.
Brief Description of the Drawings
Embodiments of the invention will now be described by way of example with reference to the accompanying drawings, in which:
Figure 1 shows blind source separation of 4 signals from 3 anechoic mixtures using a first embodiment of the present invention;
Figure 2 shows the parameter histograms for conventional 2 channel; as well as 3 and 4 multi-channel implementations of the first embodiment at Signal to Noise Ratios of 0 dB, 5 dB and 10 dB (columns 1, 2 and 3);
Figure 3 shows weighted parameter histograms associated with high, medium and low instantaneous power estimates;
Figure 4 shows blind source separation using a second embodiment of the invention for 5 speech signals (top left); 4 anechoic mixtures (top right); 2D-histogram (bottom left) and 5 demixed signals (bottom right); and
Figure 5 shows blind source separation using a further embodiment of the invention for 2 speech signals travelling upon 3 and 2 echoic paths respectively (top left); 6 echoic mixtures of the two signals (top right); a 2D power weighted histogram showing 5 peaks (bottom left); and 5 demixed signals recovered, 3 corresponding to the first signal and 2 corresponding to the second signal (bottom right).
Description of the Preferred Embodiments
In the first embodiment, where the invention operates under a strong WDO assumption, Λ is a 1-by-1 scalar λ, Σ has all near-zero entries and
$$\begin{bmatrix} E_X(\omega, \tau) \\ E_Y(\omega, \tau) \end{bmatrix}$$
is a 2m-by-1 vector, so as a result the scalar φ is given by
$$\phi = E_X(\omega, \tau)^{\dagger} E_Y(\omega, \tau). \qquad (11)$$
Furthermore, when the expectation operator of equation (10) is approximated by an instantaneous estimate, i.e.
$$R_{zz}(\omega, \tau) = \begin{bmatrix} X^W(\omega, \tau) \\ Y^W(\omega, \tau) \end{bmatrix} \begin{bmatrix} X^W(\omega, \tau)^H & Y^W(\omega, \tau)^H \end{bmatrix}$$
the expression (11) is equivalent to
$$\phi = X^W(\omega, \tau)^{\dagger}\, Y^W(\omega, \tau) \qquad (12)$$
and so in this case the subspace decomposition of the spatial covariance matrix is unnecessary. In the M=2 case this implementation reduces to conventional DUET. Thus, the present invention applies to multichannel (M>2) implementations of this embodiment.
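A short worked derivation may help to see why the pseudo-inverse product recovers the mixing parameter when only one source is active. It assumes a uniform linear array, so that the first subarray sees the jth source through a vector $\mathbf{a}_j$ and the second subarray sees the same vector scaled by $\phi_j$; the symbol $\mathbf{a}_j$ is introduced here for illustration only, and † denotes the Moore-Penrose pseudo-inverse, which for a column vector v is $v^H / (v^H v)$.

```latex
\begin{aligned}
X^W(\omega,\tau) &= \mathbf{a}_j\, S_j^W(\omega,\tau), \qquad
Y^W(\omega,\tau) = \phi_j\, \mathbf{a}_j\, S_j^W(\omega,\tau), \qquad
\mathbf{a}_j = \begin{bmatrix} 1 & \phi_j & \cdots & \phi_j^{M-2} \end{bmatrix}^T \\[4pt]
X^W(\omega,\tau)^{\dagger}\, Y^W(\omega,\tau)
 &= \frac{X^W(\omega,\tau)^H\, Y^W(\omega,\tau)}{X^W(\omega,\tau)^H\, X^W(\omega,\tau)}
  = \frac{\phi_j\, |S_j^W(\omega,\tau)|^2\, \mathbf{a}_j^H \mathbf{a}_j}{|S_j^W(\omega,\tau)|^2\, \mathbf{a}_j^H \mathbf{a}_j}
  = \phi_j
\end{aligned}
```

With M=2 the vectors reduce to scalars and the expression becomes the ratio of the two mixture spectra used by DUET; with M>2 the same ratio is effectively a least-squares fit over the M-1 sensor couplets, which is consistent with the improved estimation at low SNR reported for the multichannel case.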
The steps involved in the multichannel implementation of the first embodiment are as follows:
Step 1
A uniformly spaced linear array of M sensors receives M anechoic mixtures $x_1(t), x_2(t), \ldots, x_M(t)$ of N WDO source signals. These M signals are represented in the 2(M-1)-by-1 time-varying vector
$$z(t) = \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}$$
where $x(t) = (x_1(t), x_2(t), \ldots, x_{M-1}(t))^T$ and $y(t) = (x_2(t), x_3(t), \ldots, x_M(t))^T$ represent signals taken from the first and second subarrays respectively. K samples are taken at t=kT, k=0,1,...,K-1, where T is the sampling period.
Step 2
A window W(t) of length L is formed and, by shifting the position of the window by multiples of Δ seconds, localisation in time is possible.
for τ = 0 : Δ : (K − 1)T
    z(t, τ) = W(t − τ) z(t)
    Z(ω, τ) = DFT(z(t, τ))
    for ω = (0 : 1 : L − 1) × 2π/LT
        φ(ω, τ) = X(ω, τ)† Y(ω, τ)
        δ(ω, τ) = −Im{log_e φ(ω, τ)}/ω
        α(ω, τ) = |φ(ω, τ)|
    end
end
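The loop above can be sketched in NumPy as follows. This is an illustrative sketch rather than a reference implementation: the function name `step2_estimates` is hypothetical, the DC and Nyquist bins are skipped to avoid dividing by zero frequency and the phase ambiguity at π, and the toy example uses a periodic source with a rectangular window so that the delay relation holds exactly in every frame. With a tapered window and real speech the estimates would only be approximate.

```python
import numpy as np

def step2_estimates(mixtures, window, hop, fs):
    """Per-point (alpha, delta) for an (M, K) array of uniformly spaced sensor signals."""
    M, K = mixtures.shape
    L = len(window)
    omegas = 2 * np.pi * fs * np.arange(1, L // 2) / L     # skip DC and Nyquist bins
    starts = range(0, K - L + 1, hop)
    alpha = np.empty((len(omegas), len(starts)))
    delta = np.empty_like(alpha)
    for t, start in enumerate(starts):
        frames = mixtures[:, start:start + L] * window      # windowed segment of z(t)
        spec = np.fft.fft(frames, axis=1)[:, 1:L // 2]      # shape (M, number of bins)
        X, Y = spec[:-1], spec[1:]                          # first and second subarrays
        # Pseudo-inverse of a column vector: X^dagger Y = (X^H Y) / (X^H X)
        phi = np.sum(X.conj() * Y, axis=0) / np.sum(np.abs(X) ** 2, axis=0)
        alpha[:, t] = np.abs(phi)
        delta[:, t] = -np.angle(phi) / omegas
    return alpha, delta

# Toy example (assumed values): one source, 3 sensors, one-sample delay per sensor
rng = np.random.default_rng(1)
fs, L = 16000, 64
s = np.tile(rng.standard_normal(L), 8)                      # periodic so frame shifts are circular
mixtures = np.stack([0.9 ** i * np.roll(s, i) for i in range(3)])
alpha, delta = step2_estimates(mixtures, np.ones(L), L // 2, fs)
```

Here every time-frequency point yields an attenuation of 0.9 and a delay of one sampling period, so all estimates would fall into a single histogram bin.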
Step 3
A two-dimensional histogram of the attenuation and delay parameters (α(ω,τ) and δ(ω,τ)) is constructed. The histogram values may be weighted using $X^W(\omega,\tau)^H X^W(\omega,\tau)$, which is proportional to the power of the source present at each time-frequency point. N histogram peaks indicate N source signals; the (α,δ) values corresponding to the centre of each peak are mapped back into the time-frequency domain to indicate in which regions each of the N source signals is active. Peak detection is performed using a weighted K-means based technique or an iterated peak removal technique.
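A power weighted histogram of this kind can be sketched with `numpy.histogram2d`, which accepts a `weights` argument so that each estimate adds its power to a bin rather than a count of one. The synthetic estimates, the bin grid and the power values below are hypothetical; only the strongest peak is located, by a simple maximum, as a stand-in for the peak detection techniques named above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
# Two hypothetical sources: (alpha, delta) centres, with source 0 four times stronger
centres = np.array([[0.625, -1.5], [1.125, 1.1]])
labels = np.repeat([0, 1], n)
alpha = centres[labels, 0] + 0.02 * rng.standard_normal(2 * n)
delta = centres[labels, 1] + 0.08 * rng.standard_normal(2 * n)
power = np.where(labels == 0, 4.0, 1.0)

# Each estimate contributes its power to the bin it falls in
hist, a_edges, d_edges = np.histogram2d(
    alpha, delta, bins=(30, 30), range=((0.0, 1.5), (-3.0, 3.0)), weights=power)

# Centre of the strongest bin
i, j = np.unravel_index(np.argmax(hist), hist.shape)
peak_alpha = 0.5 * (a_edges[i] + a_edges[i + 1])
peak_delta = 0.5 * (d_edges[j] + d_edges[j + 1])
```

Finding the remaining peaks would require removing or masking the first one and repeating, or clustering the weighted estimates directly.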
Step 4
Under the assumption that the N source signals are strongly W-disjoint orthogonal, a binary time-frequency mask corresponding to the regions of the time-frequency plane where a source is active is created. Applying the ith mask to any of the received mixtures recovers the ith source signal. N such masks are used to separate the N sources.
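The masking step can be sketched as a nearest-peak assignment. The sketch assumes that each time-frequency point is given to the peak closest to its (α, δ) estimate in a scaled Euclidean sense; the function name `binary_masks`, the `scale` parameter and the toy values are hypothetical.

```python
import numpy as np

def binary_masks(alpha, delta, peaks, scale=(1.0, 1.0)):
    """One boolean mask per peak; each time-frequency point goes to its nearest peak."""
    pts = np.stack([alpha / scale[0], delta / scale[1]], axis=-1)      # (..., 2)
    ctr = np.asarray(peaks, dtype=float) / np.asarray(scale)           # (N, 2)
    dist = np.linalg.norm(pts[..., None, :] - ctr, axis=-1)            # (..., N)
    nearest = np.argmin(dist, axis=-1)
    return [nearest == n for n in range(len(ctr))]

# Toy 2-by-3 time-frequency grid with two peaks
alpha = np.array([[0.50, 0.52, 1.00], [0.98, 0.49, 1.02]])
delta = np.array([[-2.0, -1.9, 1.0], [1.1, -2.1, 0.9]])
X = np.arange(1, 7, dtype=complex).reshape(2, 3)     # stands in for one mixture's transform
masks = binary_masks(alpha, delta, peaks=[(0.5, -2.0), (1.0, 1.0)])
sources = [m * X for m in masks]                     # masked transforms, one per source
```

Because the masks are disjoint and cover every point, the masked transforms sum back to the original mixture; each would then be inverted to the time domain.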
As an example of the results provided by the above embodiment, the implementation was used to blindly demix four 2.4 second long speech signals, using three anechoic mixtures of these signals, each having been sampled at 16 kHz. Plots of the original source signals, the received mixtures, the two-dimensional histogram and the demixed signals are given in Figure 1; a high SNR of 100 dB is assumed.
The multichannel implementation (M>2) above can be compared with the classic implementation of DUET (M=2) for the same data as before under noisier conditions. The invention has clear advantages at lower Signal to Noise Ratios (SNRs), since an increase in the number of sensors improves parameter estimation when using the invention. Figure 2 shows the parameter histograms for conventional 2 channel; as well as 3 and 4 multi-channel implementations at Signal to Noise Ratios of 0 dB, 5 dB and 10 dB (columns 1, 2 and 3).
A second embodiment of the invention is based on a weak-WDO assumption that allows for more than one source to have significant energy in the same time-frequency coefficient.
In this embodiment, ESPRIT direction of arrival (as well as attenuation) estimation is performed at each time-frequency point by considering a group of neighbouring time frames for a given frequency. As in DUET, the estimated mixing parameters are used to create a two-dimensional weighted histogram. The weights for the histogram are obtained from the energy of the time-frequency localized demixtures found by applying a demixing matrix based on the mixing parameter estimates for that time-frequency point.
From the histogram, N peaks are located corresponding to the N source mixing parameter pairs. Demixing is performed by matrix inversion at each time-frequency point, assigning the resulting demixtures based on the distance to the known source mixing parameters.
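Demixing by matrix inversion at a single time-frequency point can be sketched as follows. The sketch assumes a uniform linear array in which sensor i receives source j scaled by the i-th power of its mixing parameter, so the local mixing matrix is a Vandermonde matrix built from the estimated parameters, and it uses the pseudo-inverse because that matrix is not square. The function name `demix_point` and the numerical values are hypothetical.

```python
import numpy as np

def demix_point(mix_vec, phi):
    """Instantaneous source estimates at one time-frequency point."""
    M = len(mix_vec)
    A = np.vander(phi, M, increasing=True).T      # assumed ULA model: A[i, j] = phi_j ** i
    return np.linalg.pinv(A) @ mix_vec

# Hypothetical point with m = 3 overlapping sources seen by M = 4 sensors
phi = np.array([0.9 * np.exp(-0.5j), 0.6 * np.exp(-1.2j), 1.1 * np.exp(0.8j)])
S_true = np.array([1.0 + 2.0j, -0.5j, 0.25])
mix_vec = np.vander(phi, 4, increasing=True).T @ S_true
S_est = demix_point(mix_vec, phi)
power = np.abs(S_est) ** 2                        # weights for the histogram
```

The squared magnitudes of the demixtures are the energies that serve as histogram weights for the corresponding parameter estimates.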
In more detail:
Step 1
A uniform linear array of M sensors receives M anechoic mixtures of N weak-WDO, possibly wideband source signals. These M mixture signals are stacked in a 2m-by-1 time-varying vector
$$z(t) = \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}$$
where m = M-1 and the m mixtures x(t) are the first m mixtures of z(t) and the m mixtures y(t) are the last m mixtures of z(t).
Step 2
K samples of z(t) are taken at t=kT, k=0,1,...,K-1, where T is the sampling period. A window W(t) of length L≪(K-1)T is formed and, by shifting the position of the window by multiples of Δ seconds, localisation in time is possible.
for τ = 0 : Δ : (K − 1)T do
    for k = τ − mT : Δ : τ + mT do
        Z(ω, k) = DFT{W(t − k) z(t)}
    end
    for ω = (0 : 1 : L − 1) × 2π/LT
        R(ω, τ) = (1/2m) Σ_{k=τ−mT}^{τ+mT} [Z(ω, k)][Z(ω, k)]^H
        SVD: [V(ω, τ)][D(ω, τ)][V(ω, τ)]^H = R(ω, τ)
        Eigenvalue decomposition: E_X(ω, τ)† E_Y(ω, τ) → φ_1(ω, τ), ..., φ_m(ω, τ)
    end
end
Step 3
The m attenuation $\hat{\alpha}_i(\omega, \tau)$ and delay $\hat{\delta}_i(\omega, \tau)$ parameters and m source signal estimates $\hat{S}_1(\omega, \tau), \ldots, \hat{S}_m(\omega, \tau)$ are produced at each of the LKT/Δ time-frequency points and then used to create a 2-D power weighted histogram. Unlike a count histogram, a weighted histogram increments each bin by a weight associated with each different estimate instead of incrementing by unity for each estimate. We have weighted each $(\hat{\alpha}_i(\omega, \tau), \hat{\delta}_i(\omega, \tau))$ estimate by the associated instantaneous power
$$|\hat{S}_i(\omega, \tau)|^2, \quad i = 1, \ldots, m.$$
For high SNR values, estimates associated with an actual source will be given a large weight and estimates associated with noise values will be given a small weight. As in DUET, the N histogram peaks indicate N source signals.
Peak detection is performed using a weighted K-means based technique which produces N mixing parameter estimates $(\hat{\alpha}_i, \hat{\delta}_i)$, i=1,...,N.
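The text does not give the details of the weighted K-means technique, so the following is only one plausible sketch: a Lloyd-style iteration in which each centre is updated to the power weighted mean of the estimates assigned to it. The function name `weighted_kmeans`, the initial centres and the synthetic data are hypothetical.

```python
import numpy as np

def weighted_kmeans(points, weights, init_centres, n_iter=20):
    """Lloyd-style K-means in which each centre is the weighted mean of its points."""
    centres = np.array(init_centres, dtype=float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        for k in range(len(centres)):
            sel = labels == k
            if weights[sel].sum() > 0:             # leave a centre unchanged if it has no support
                centres[k] = np.average(points[sel], axis=0, weights=weights[sel])
    return centres, labels

# Synthetic (alpha, delta) estimates scattered around two hypothetical peaks
rng = np.random.default_rng(3)
true_peaks = np.array([[0.6, -1.0], [1.2, 1.5]])
points = np.vstack([c + 0.05 * rng.standard_normal((500, 2)) for c in true_peaks])
weights = rng.uniform(0.5, 1.5, size=len(points))  # stands in for instantaneous power
peaks, labels = weighted_kmeans(points, weights, init_centres=[[0.5, -0.5], [1.0, 1.0]])
```

In practice the number of clusters and the initial centres would have to come from the histogram itself, for example from its largest bins.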
Step 4
Each of the m instantaneous source estimates $\hat{S}_1(\omega, \tau), \ldots, \hat{S}_m(\omega, \tau)$ needs to be correctly assigned to one of the N demixed source estimates at each time-frequency point. Assignment is performed by determining which of the m instantaneous parameter estimates
$$(\hat{\alpha}_1(\omega, \tau), \hat{\delta}_1(\omega, \tau)), \ldots, (\hat{\alpha}_m(\omega, \tau), \hat{\delta}_m(\omega, \tau))$$
is closest to each of the N peak centres $(\hat{\alpha}_1, \hat{\delta}_1), \ldots, (\hat{\alpha}_N, \hat{\delta}_N)$.
Preferably, said measure of closeness of the ith estimate at (ω,τ) to the kth peak centre is given as
where $m_\alpha$ and $m_\delta$ are normalising factors. Beginning with the instantaneous mixing parameter estimates associated with the instantaneous source estimates of lowest power, at each time-frequency point the closest peak centre is found and the lowest power instantaneous source estimate is assigned to the appropriate demixed source estimate. The assignment is then carried out for the instantaneous mixing parameter estimates associated with the instantaneous source estimates of next lowest power and so on. Assignments carried out in later stages are allowed to overwrite previous assignments in the belief that the instantaneous mixing parameter estimates associated with the instantaneous signal estimates of greater power are the more reliable, since they have been affected by noise the least. The N demixed source estimates are then synthesised back into the time-domain.
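The lowest-power-first assignment with overwriting can be sketched as below. The closeness measure used here, a squared Euclidean distance with each coordinate divided by a normalising factor, is an assumption made for illustration and is not taken from the text; the function name `assign_estimates` and the toy values are likewise hypothetical.

```python
import numpy as np

def assign_estimates(alpha_hat, delta_hat, S_hat, peaks, m_alpha=1.0, m_delta=1.0):
    """Assign m instantaneous estimates at one time-frequency point to N outputs."""
    peaks = np.asarray(peaks, dtype=float)
    out = np.zeros(len(peaks), dtype=complex)
    order = np.argsort(np.abs(S_hat) ** 2)             # lowest power first
    for i in order:
        # Assumed closeness measure: normalised squared Euclidean distance
        d = ((alpha_hat[i] - peaks[:, 0]) / m_alpha) ** 2 \
            + ((delta_hat[i] - peaks[:, 1]) / m_delta) ** 2
        out[np.argmin(d)] = S_hat[i]                   # later (stronger) estimates overwrite
    return out

# Toy point: two estimates compete for the first peak, the stronger one wins
peaks = [(0.5, -2.0), (1.0, 1.0), (1.5, 3.0)]
alpha_hat = np.array([0.52, 0.48, 1.02])
delta_hat = np.array([-1.9, -2.1, 1.1])
S_hat = np.array([0.1, 2.0, 1.0j])
out = assign_estimates(alpha_hat, delta_hat, S_hat, peaks)
```

Outputs whose peak attracts no estimate at this point remain zero, which corresponds to that source being inactive there.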
As discussed, the WDO assumption used by DUET assumes that there is only ever one source active at any time-frequency point. The second embodiment of the present invention uses an eigenvalue decomposition step to uncover m=M-1 possible parameter estimates.
Table 1 shows the percentage of the average instantaneous power associated with each of the 3 possible parameter estimates. With one source present, the strongest eigenvalue is weighted by about 99.36% of the power and the next strongest eigenvalue is weighted by the remaining 0.64% of the power. As the number of sources increases, the WDO assumption is weakened, since the strongest eigenvalue receives a weaker associated power weighting and the secondary and tertiary eigenvalues receive stronger weightings.
Number of sources | φ1 (%) | φ2 (%) | φ3 (%) |
---|---|---|---|
2 sources | 96.393 | 3.4794 | 0.1276 |
3 sources | 91.831 | 7.3426 | 0.8261 |
4 sources | 90.831 | 7.9957 | 1.1734 |
5 sources | 86.820 | 11.440 | 1.7404 |
Table 1: The percentage of the average instantaneous signal power associated with the eigenvalues φ1, φ2 and φ3, sorted according to highest associated signal power, when 2, 3, ..., 5 sources and no noise are present.
The second embodiment was used to blindly demix five 1.7 second long speech signals, using four anechoic mixtures of these signals, each having been sampled at 16 kHz. Figure 3 shows the two-dimensional histograms associated with high, medium and low power estimates. Operating under a strong WDO assumption, the first embodiment has access only to the first histogram, whereas the invention operating under a weakened WDO assumption has access to all 3, and so a single histogram containing 3 times the data may be constructed. Plots of the original source signals, the received mixtures, the two-dimensional histogram and the demixed signals are given in Figure 4.
In variations of the above embodiments, the invention may be applied to echoic environments. This is based on stacking M mixtures $\{x_1(t), x_2(t), \ldots, x_M(t)\}$ of N possibly coherent narrowband source signals $\{s_1(t), s_2(t), \ldots, s_N(t)\}$ of centre frequency $\omega_0$ in a matrix of the form: … $x_{\lfloor M/2 \rfloor}(t)$
In a no noise case this may be rewritten as:
$1 \quad \phi_1(\omega_0) \quad \cdots \quad \phi_1^{\lfloor M/2 \rfloor - 1}(\omega_0)$
The spatial covariance matrix:
$R_{zz} = E\{z(t)\, z^H(t)\}$ is of the form:
and by choosing M ≥ 2N, $R_{zz}$ will have a maximum possible rank of N. For $R_{zz}$ of rank N there exists a singular value decomposition:
and it follows that the N eigenvalues of:
$[E_1]^{-1}[E_2]$ are the mixing parameters $\{\phi_1, \phi_2, \ldots, \phi_N\}$.
In general, the steps involved in implementing this variation are as follows:
M narrowband mixtures $\{x_1(t), x_2(t), \ldots, x_M(t)\}$ are used to construct the matrices
The $\lfloor M/2 \rfloor$ mixing parameter estimates are obtained via an eigenvalue decomposition:
Now, a uniform linear array of M sensors may be used to estimate the mixing parameters of one signal travelling on P echoic paths, provided M ≥ 2P. This allows M echoic mixtures of an arbitrary number of speech source signals to be demixed, provided the maximum number of echoic paths is no more than half the number of sensors in the uniform linear array.
More particularly, the steps involved are as follows:
Step 1:
A uniform linear array of M sensors receives M possibly echoic mixtures $\{x_1(t), x_2(t), \ldots, x_M(t)\}$ of N speech signals. These M mixture signals are sampled every T seconds and a window W(t) of length L≪KT seconds is shifted by multiples of ΔT seconds to perform K/Δ L-point Discrete Windowed Fourier Transforms upon K samples of each mixture. At the time-frequency point (ω,τ) the 1st mixture is given as:
$$X_1(\omega, \tau) = \sum_{k=0}^{K-1} W(kT - \tau)\, x_1(kT)\, e^{-j\omega kT}$$
where W(t) is chosen such that the class of source signals of interest satisfies the W-disjoint orthogonal assumption as much as possible. For speech, W(t) is chosen to be an L=30 millisecond long Hamming window and Δ=L/2T.
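The windowed transform of this step can be sketched with a Hamming window of 30 ms and a hop of half a window, matching the values given for speech. The function name `windowed_dft` and the 1 kHz test tone are hypothetical, and the sketch references the phase of each frame to the frame start rather than to absolute time, which does not affect ratios taken between mixtures in the same frame.

```python
import numpy as np

def windowed_dft(x, fs, win_seconds=0.030):
    """L-point Hamming-windowed DFTs of x with a hop of half a window."""
    L = int(round(win_seconds * fs))           # 480 samples at 16 kHz
    hop = L // 2
    window = np.hamming(L)
    starts = np.arange(0, len(x) - L + 1, hop)
    frames = np.stack([x[s:s + L] * window for s in starts])
    # Rows are frequency bins, columns are frames; second output is frame start time in seconds
    return np.fft.fft(frames, axis=1).T, starts / fs

fs = 16000
t = np.arange(fs) / fs                         # one second of a hypothetical 1 kHz tone
X, times = windowed_dft(np.sin(2 * np.pi * 1000.0 * t), fs)
peak_bin = int(np.argmax(np.abs(X[: X.shape[0] // 2, 0])))
```

At 16 kHz the bin spacing is 16000/480 Hz, so the 1 kHz tone falls exactly on bin 30 of each frame.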
Step 2:
At each time-frequency point the new ESPRIT parameter estimation step is performed. The $\lfloor M/2 \rfloor$ estimated mixing parameters are used to perform a demixing step at each time-frequency point via an inversion of the estimated mixing matrix; the Moore-Penrose pseudo-inverse $[\,\cdot\,]^{\dagger}$ is used to invert non-square matrices. At the time-frequency point (ω,τ) the $\lfloor M/2 \rfloor$ mixing parameters are given as:
Step 3:
At each time-frequency point and for $i = 1, 2, \ldots, \lfloor M/2 \rfloor$ the relative attenuation and delay mixing parameter estimates are calculated:
An A × D two-dimensional power weighted histogram $H_{\alpha\delta}$ of the relative attenuation and delay parameters is also constructed, i.e. a histogram is constructed in the usual way but, instead of a bin being incremented by one when a mixing parameter estimate is entered into the histogram, the signal power associated with the estimate is added.
Step 4:
The power weighted histogram $H_{\alpha\delta}$ will have a number of peaks N' ≥ N, each of which represents a signal received by the sensor array; in an echoic environment some of these signals may have originated from the same source. The centres of the peaks provide estimates of the mixing parameters $(\hat{\alpha}_1, \hat{\delta}_1), \ldots, (\hat{\alpha}_{N'}, \hat{\delta}_{N'})$. Peak detection may be performed using a suitable clustering technique.
Step 5:
The permutation ambiguity associated with wideband implementations of narrowband techniques is overcome when each of the $\lfloor M/2 \rfloor$ instantaneous source estimates:
$$\hat{S}_1(\omega, \tau), \ldots, \hat{S}_{\lfloor M/2 \rfloor}(\omega, \tau)$$
is correctly assigned to one of the N' ≥ N demixed estimates at each time-frequency point. Assignment is performed by determining which of the $\lfloor M/2 \rfloor$ instantaneous parameter estimates:
$$(\hat{\alpha}_1(\omega, \tau), \hat{\delta}_1(\omega, \tau)), \ldots, (\hat{\alpha}_{\lfloor M/2 \rfloor}(\omega, \tau), \hat{\delta}_{\lfloor M/2 \rfloor}(\omega, \tau))$$
is closest to each of the N' ≥ N peak centres $(\hat{\alpha}_1, \hat{\delta}_1), \ldots, (\hat{\alpha}_{N'}, \hat{\delta}_{N'})$. The measure of closeness of the ith estimate at (ω,τ) to the nth peak centre is given as:
where $N_\alpha$ and $N_\delta$ are normalising factors. Beginning with the instantaneous mixing parameter estimates associated with the instantaneous source estimates of lowest power, at each time-frequency point the closest peak centre is found and the lowest power instantaneous source estimate is assigned to the appropriate demixed source estimate. The assignment is then carried out for the instantaneous mixing parameter estimates associated with the instantaneous source estimates of next lowest power and so on. Assignments carried out in later stages are allowed to overwrite previous assignments in the belief that the instantaneous mixing parameter estimates associated with the instantaneous signal estimates of greater power are the more reliable, since they have been affected by noise the least. The N' ≥ N demixed source estimates are then synthesised back into the time-domain.
Figure 5 shows blind source separation using the above embodiment of the invention for 2 speech signals travelling upon 3 and 2 echoic paths respectively (top left); 6 echoic mixtures of the two signals (top right); a 2D power weighted histogram showing 5 peaks (bottom left); and 5 demixed signals recovered, 3 corresponding to the first signal and 2 corresponding to the second signal (bottom right).
In other variations of the invention, the weighted histogram approach of the DUET aspect of the above embodiments may be used in combination with direction of arrival algorithms other than ESPRIT, such as the MUSIC algorithm.
In other variations of the invention, standard square mixing (N=M) or over-determined mixing (N<M) blind source separation techniques can be used in the local regions to determine the mixing parameters and instantaneous demixture weights for the histogram.
In other variations of the invention, the histogram has more than two-dimensions which allows for the sensors to be in arbitrary arrangements.
In other variations of the invention, mixing parameter estimates are mapped to a domain in which their value corresponds to the physical location of the source, and the weighted histogram constructed yields information about the relative locations of the sources in addition to providing the means for separation.
It will be seen that the invention is useful in several applications where the ability to separate underlying signals from their mixtures is of critical importance. For example, the ability to separate out one speaker from a number of speakers has applications in hearing aids; the ability to separate out a number of speakers from a mixture has application for automatic meeting transcription, monitoring or audio forensics; the ability to separate out the original sources of sound (valves, murmurs, etc.) from biomedical signals including heart sounds has diagnostic value for physicians (EEG, ECG, PCG, MEG); and the ability to demultiplex wireless signals based on their spatial signature (frequency-hopped waveforms) could result in improved multi-access wireless systems; other signals which could be processed include seismic signals or other terrestrial mapping signals, optics and optical signal transmissions, and optical and radio signals from space.
Claims
1. A method of blind source separation for demixing M mixtures of an arbitrary number of N signal sources comprising the steps of: a. decomposing the mixtures into respective sparse representations where a small number of components of a signal carry a large percentage of the energy of the signal; b. performing analysis in local regions of the representations on the assumption that in that region only m<M sources are active to provide m sets of mixing parameter estimates and associated mixing parameter estimate weights; c. creating a multi-dimensional weighted histogram using the mixing parameter estimates as indices into the histogram and associated weights for the weights of the histogram; d. identifying peaks in the histogram to determine the number of sources N and their associated mixing parameters; and e. assigning m instantaneous demixtures to m of the N output representations for each local region based on said mixing parameters.
2. A method according to claim 1 wherein said mixtures are anechoic and where N>M.
3. A method according to claim 1 wherein said mixtures are echoic and where M>2N.
3. A method according to claim 1 further comprising: converting the N output representations into the time domain.
4. A method according to claim 1 wherein said sparse representations comprise one of a time-frequency or a time-scale representation.
5. A method according to claim 1 wherein the mixing parameter weights comprise source energies associated with instantaneous demixing in this region.
6. A method according to claim 1 wherein said identifying comprises using one of clustering or iterative thresholded peak finding.
7. A method according to claim 1 wherein the associated mixing parameters for the histogram peaks are relative delay and attenuation mixing parameter estimates:
$(\alpha_1, \delta_1), \ldots, (\alpha_N, \delta_N)$.
8. A method according to claim 1 wherein said assigning comprises using a distance in mixing parameter space.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP06791662A EP1932102A2 (en) | 2005-09-01 | 2006-08-25 | A method and apparatus for blind source separation |
US11/990,927 US20090268962A1 (en) | 2005-09-01 | 2006-08-25 | Method and apparatus for blind source separation |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IES2005/0576 | 2005-09-01 | ||
IE20050576 | 2005-09-01 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2007025680A2 true WO2007025680A2 (en) | 2007-03-08 |
WO2007025680A3 WO2007025680A3 (en) | 2007-04-26 |
Family
ID=37667560
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2006/008349 WO2007025680A2 (en) | 2005-09-01 | 2006-08-25 | A method and apparatus for blind source separation |
Country Status (3)
Country | Link |
---|---|
US (1) | US20090268962A1 (en) |
EP (1) | EP1932102A2 (en) |
WO (1) | WO2007025680A2 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104198151A (en) * | 2014-09-01 | 2014-12-10 | 西北工业大学 | Air compressor aerodynamic instability signal detection method based on sparse decomposition |
WO2017108097A1 (en) * | 2015-12-22 | 2017-06-29 | Huawei Technologies Duesseldorf Gmbh | Localization algorithm for sound sources with known statistics |
CN109142507A (en) * | 2018-08-07 | 2019-01-04 | 四川钜莘信合科技有限公司 | Pipeline defect detection method and device |
CN110336574A (en) * | 2019-07-11 | 2019-10-15 | 中国人民解放军战略支援部队信息工程大学 | The restoration methods and device of one source signals |
CN110534130A (en) * | 2019-08-19 | 2019-12-03 | 上海师范大学 | A kind of deficient attribute tone deaf source separation method and device |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101233271B1 (en) * | 2008-12-12 | 2013-02-14 | 신호준 | Method for signal separation, communication system and voice recognition system using the method |
US8498863B2 (en) * | 2009-09-04 | 2013-07-30 | Massachusetts Institute Of Technology | Method and apparatus for audio source separation |
WO2011116186A1 (en) * | 2010-03-17 | 2011-09-22 | The Trustees Of Columbia University In The City Of New York | Methods and systems for blind analysis of resource consumption |
KR20130014895A (en) * | 2011-08-01 | 2013-02-12 | 한국전자통신연구원 | Device and method for determining separation criterion of sound source, and apparatus and method for separating sound source with the said device |
US20140229133A1 (en) * | 2013-02-12 | 2014-08-14 | Mitsubishi Electric Research Laboratories, Inc. | Method for Estimating Frequencies and Phases in Three Phase Power System |
US8958750B1 (en) * | 2013-09-12 | 2015-02-17 | King Fahd University Of Petroleum And Minerals | Peak detection method using blind source separation |
US10176818B2 (en) * | 2013-11-15 | 2019-01-08 | Adobe Inc. | Sound processing using a product-of-filters model |
CN103812808B (en) * | 2014-03-11 | 2016-08-24 | 集美大学 | A kind of it is applicable to plural blind source separation method and the system that source number dynamically changes |
CN105589099B (en) * | 2014-10-21 | 2018-03-06 | 中国石油化工股份有限公司 | A kind of polygonal band filtering approach of blind focus earthquake wave field |
CN105930857B (en) * | 2016-04-05 | 2019-04-23 | 西安电子科技大学 | Deficient based on block segmentation determines blind source separating hybrid matrix estimation method |
CN109214259A (en) * | 2017-12-20 | 2019-01-15 | 佛山科学技术学院 | Common space mode method based on the modulation of EEG signal locking phase |
CN110110619B (en) * | 2019-04-22 | 2021-02-09 | 西安交通大学 | Satellite micro-vibration source quantitative identification method based on sparse blind source separation |
-
2006
- 2006-08-25 US US11/990,927 patent/US20090268962A1/en not_active Abandoned
- 2006-08-25 WO PCT/EP2006/008349 patent/WO2007025680A2/en active Application Filing
- 2006-08-25 EP EP06791662A patent/EP1932102A2/en not_active Withdrawn
Non-Patent Citations (9)
Title |
---|
AOKI M ET AL: "SOUND SOURCE SEGREGATION BASED ON ESTIMATING INCIDENT ANGLE OF EACH FREQUENCY COMPONENT OF INPUT SIGNALS ACQUIRED BY MULTIPLE MICROPHONES" ACOUSTICAL SCIENCE AND TECHNOLOGY, ACOUSTICAL SOCIETY OF JAPAN, TOKYO, JP, vol. 22, no. 2, March 2001 (2001-03), pages 149-157, XP008073215 ISSN: 1346-3969 * |
CHANG C ET AL: "A MATRIX-PENCIL APPROACH TO BLIND SEPARATION OF COLORED NONSTATIONARY SIGNALS" IEEE TRANSACTIONS ON SIGNAL PROCESSING, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 48, no. 3, March 2000 (2000-03), pages 900-907, XP000937254 ISSN: 1053-587X * |
FUTOSHI ASANO ET AL: "Combined Approach of Array Processing and Independent Component Analysis for Blind Separation of Acoustic Signals" IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 11, May 2003 (2003-05), XP011079702 ISSN: 1063-6676 * |
JOURJINE A ET AL: "BLIND SEPARATION OF DISJOINT ORTHOGONAL SIGNALS: DEMIXING N SOURCESFROM 2 MIXTURES" 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. PROCEEDINGS. (ICASSP). ISTANBUL, TURKEY, JUNE 5-9, 2000, IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), NEW YORK, NY : IEEE, US, vol. VOL. 5 OF 6, 5 June 2000 (2000-06-05), pages 2985-2988, XP001035813 ISBN: 0-7803-6294-2 cited in the application * |
MELIA THOMAS ET AL: "UNDERDETERMINED BLIND SOURCE SEPARATION IN ECHOIC ENVIRONMENTS USING DESPRIT" EURASIP JOURNAL OF APPLIED SIGNAL PROCESSING, HINDAWI PUBLISHING CO., CUYAHOGA FALLS, OH, US, 2007, pages 1-19, XP008075105 ISSN: 1110-8657 * |
O'GRADY P D ET AL: "SURVEY OF SPARSE AND NON-SPARSE METHODS IN SOURCE SEPARATION" INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, WILEY AND SONS, NEW YORK, US, vol. 15, no. 1, 2005, pages 18-33, XP008069913 ISSN: 0899-9457 * |
PAULRAJ A ET AL: "Estimation Of Signal Parameters Via Rotational Invariance Techniques - Esprit" ASILOMAR CONF. ON CIRCUITS, SYSTEMS AND COMPUTERS, 6 November 1985 (1985-11-06), pages 83-89, XP010277820 * |
ROY R ET AL: "ESPRIT-ESTIMATION OF SIGNAL PARAMETERS VIA ROTATIONAL INVARIANCE TECHNIQUES" IEEE TRANSACTIONS ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, IEEE INC. NEW YORK, US, vol. 37, no. 7, 1 July 1989 (1989-07-01), pages 984-995, XP000037318 ISSN: 0096-3518 cited in the application * |
SCOTT RICKARD ET AL: "Blind separation of speech mixtures via time-frequency masking" IEEE TRANSACTIONS ON SIGNAL PROCESSING, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 52, no. 7, July 2004 (2004-07), pages 1830-1847, XP002999675 ISSN: 1053-587X cited in the application * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104198151A (en) * | 2014-09-01 | 2014-12-10 | 西北工业大学 | Air compressor aerodynamic instability signal detection method based on sparse decomposition |
WO2017108097A1 (en) * | 2015-12-22 | 2017-06-29 | Huawei Technologies Duesseldorf Gmbh | Localization algorithm for sound sources with known statistics |
US10901063B2 (en) | 2015-12-22 | 2021-01-26 | Huawei Technologies Duesseldorf Gmbh | Localization algorithm for sound sources with known statistics |
CN109142507A (en) * | 2018-08-07 | 2019-01-04 | 四川钜莘信合科技有限公司 | Pipeline defect detection method and device |
CN110336574A (en) * | 2019-07-11 | 2019-10-15 | 中国人民解放军战略支援部队信息工程大学 | Method and device for recovering a source signal |
CN110534130A (en) * | 2019-08-19 | 2019-12-03 | 上海师范大学 | Underdetermined speech blind source separation method and device |
Also Published As
Publication number | Publication date |
---|---|
WO2007025680A3 (en) | 2007-04-26 |
EP1932102A2 (en) | 2008-06-18 |
US20090268962A1 (en) | 2009-10-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2007025680A2 (en) | | A method and apparatus for blind source separation |
Pedersen et al. | | Convolutive blind source separation methods |
US7496482B2 (en) | | Signal separation method, signal separation device and recording medium |
US7647209B2 (en) | | Signal separating apparatus, signal separating method, signal separating program and recording medium |
EP3387648B1 (en) | | Localization algorithm for sound sources with known statistics |
Boashash et al. | | Robust multisensor time–frequency signal processing: A tutorial review with illustrations of performance enhancement in selected application areas |
EP2203731B1 (en) | | Acoustic source separation |
US8521477B2 (en) | | Method for separating blind signal and apparatus for performing the same |
US10818302B2 (en) | | Audio source separation |
Mirzaei et al. | | Blind audio source counting and separation of anechoic mixtures using the multichannel complex NMF framework |
Nikunen et al. | | Multichannel audio separation by direction of arrival based spatial covariance model and non-negative matrix factorization |
Kubo et al. | | Efficient full-rank spatial covariance estimation using independent low-rank matrix analysis for blind source separation |
Kitamura et al. | | Relaxation of rank-1 spatial constraint in overdetermined blind source separation |
Mitsufuji et al. | | Multichannel blind source separation based on non-negative tensor factorization in wavenumber domain |
Kim et al. | | Efficient online target speech extraction using DOA-constrained independent component analysis of stereo data for robust speech recognition |
Goto et al. | | Geometrically constrained independent vector analysis with auxiliary function approach and iterative source steering |
Li et al. | | An EM algorithm for audio source separation based on the convolutive transfer function |
Bourennane et al. | | Locating wide band acoustic sources using higher order statistics |
Melia et al. | | Histogram-based blind source separation of more sources than sensors using a DUET-ESPRIT technique |
Nakashima et al. | | Faster independent low-rank matrix analysis with pairwise updates of demixing vectors |
Fontaine et al. | | Scalable source localization with multichannel α-stable distributions |
Taniguchi et al. | | Linear demixed domain multichannel nonnegative matrix factorization for speech enhancement |
Xie et al. | | A fast and efficient frequency-domain method for convolutive blind source separation |
CN109074811B (en) | | Audio source separation |
Douglas et al. | | Convolutive blind source separation for audio signals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
 | WWE | WIPO information: entry into national phase | Ref document number: 11990927; Country of ref document: US |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | WWE | WIPO information: entry into national phase | Ref document number: 2006791662; Country of ref document: EP |
 | WWP | WIPO information: published in national office | Ref document number: 2006791662; Country of ref document: EP |