CN108364659B

CN108364659B - Frequency domain convolution blind signal separation method based on multi-objective optimization

Info

Publication number: CN108364659B
Application number: CN201810112970.9A
Authority: CN
Inventors: 张伟涛; 孙瑾铃; 李扬; 楼顺天
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2018-02-05
Filing date: 2018-02-05
Publication date: 2021-06-01
Anticipated expiration: 2038-02-05
Also published as: CN108364659A

Abstract

The invention provides a frequency domain convolution blind signal separation method based on multi-objective optimization. It addresses the tendency of prior-art methods to converge to a degenerate solution, and it can perform frequency domain convolution blind signal separation when the number of source signals is smaller than the number of observation signals. The implementation steps are: obtaining a target matrix set; constructing a diagonalization matrix B(ω_k); constructing a non-orthogonal joint diagonalization multi-objective optimization model; using the model to estimate the separation matrix W(ω_k) at each frequency point of the target matrix set; and obtaining an estimate of the time domain source signals. The invention is highly reliable and widely applicable, and can be applied to blind separation of convolution mixed signals, such as voice signals and communication signals, under overdetermined conditions.

Description

Frequency domain convolution blind signal separation method based on multi-objective optimization

Technical Field

The invention belongs to the technical field of blind signal processing and relates to a frequency domain convolution blind signal separation method, in particular one based on multi-objective optimization joint diagonalization. It can be applied to blind separation of convolution mixed signals, such as voice signals and communication signals, under overdetermined conditions.

Background

An optimization problem generally refers to finding the optimal solution of an objective function by means of some optimization algorithm. When there is a single objective function, the problem is called Single-objective Optimization (SOP). When there are two or more objective functions, it is called Multi-objective Optimization (MOP). Unlike single-objective optimization, which has a definite optimum, the solution of a multi-objective optimization problem is usually a set of trade-off (equilibrium) solutions.
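To make the idea of a set of trade-off solutions concrete, the following minimal sketch filters the non-dominated candidates from a small list scored on two objectives that are both minimized. The function name `pareto_front` and the sample data are illustrative assumptions and do not come from the patent.

```python
import numpy as np

def pareto_front(costs):
    """Return indices of non-dominated rows; every column is minimized."""
    costs = np.asarray(costs, dtype=float)
    keep = []
    for i, c in enumerate(costs):
        # c is dominated if some row is <= c in all objectives and < c in at least one
        dominated = np.any(np.all(costs <= c, axis=1) & np.any(costs < c, axis=1))
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical candidates: (objective 1, objective 2)
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
front = pareto_front(candidates)
```

Here the candidate (3.0, 3.0) is dropped because (2.0, 2.0) is better in both objectives, while the other three remain as mutually incomparable trade-offs.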

In signal processing fields such as wireless communication, radar and sonar, the problem of recovering source signals from several observation signals arises frequently, and blind signal separation offers a potential solution. Early studies of blind signal separation focused on the relatively simple case of instantaneous mixing. In practical applications such as the "cocktail party" problem, however, the multipath effects of sound propagation mean that the observed mixed speech signal is actually a convolution mixed signal.
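The difference between instantaneous and convolution mixing can be sketched as follows. This is an illustration under assumed names (`convolutive_mix`, `filters`), using 3 sources, 4 sensors and 8-tap FIR filters, sizes that mirror the simulation described later in the document; the signal frequencies are arbitrary.

```python
import numpy as np

def convolutive_mix(sources, filters):
    """Mix N sources into M observations with FIR filters.

    sources: (N, T) array; filters: (M, N, L) array of L-tap impulse responses.
    Returns an (M, T) array x with x_m(t) = sum_n sum_l a_mn(l) s_n(t - l).
    """
    n_src, n_samples = sources.shape
    n_obs = filters.shape[0]
    mixed = np.zeros((n_obs, n_samples))
    for m in range(n_obs):
        for n in range(n_src):
            # full convolution truncated to the original length (causal filter)
            mixed[m] += np.convolve(sources[n], filters[m, n])[:n_samples]
    return mixed

rng = np.random.default_rng(0)
t = np.arange(2000)
s = np.stack([np.sin(2 * np.pi * f * t) for f in (0.01, 0.023, 0.037)])  # N = 3
a = rng.standard_normal((4, 3, 8))                                       # M = 4, 8 taps
x = convolutive_mix(s, a)

# With a single tap the model collapses to instantaneous mixing x(t) = A s(t).
x_inst = convolutive_mix(s, a[:, :, :1])
```

With more than one tap, each observation depends on delayed copies of every source, which is why a single mixing matrix no longer describes the mixture in the time domain.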

Existing blind separation methods for convolution mixed speech signals fall mainly into frequency domain methods and time domain methods. Time domain methods generally estimate the separation matrix by joint block diagonalization of correlation matrices. Their drawbacks are a large amount of computation and the frequent need for high-dimensional joint block diagonalization; for example, computation becomes difficult under high-order convolution mixing (a severely reverberant environment).

Frequency domain methods generally estimate the separation matrix by joint diagonalization of power spectral density matrices. These methods tend to converge to a degenerate solution, require the mixing matrix to be square, and suffer from ordering (permutation) uncertainty, which greatly limits their application to the separation of convolution blind signals. Joint diagonalization algorithms are further divided into orthogonal and non-orthogonal algorithms. Orthogonal joint diagonalization algorithms require the separation matrix to be an orthogonal matrix. Although whitening can make the separation matrix satisfy the orthogonality condition in many cases, whitening introduces additional errors that degrade separation performance. To avoid this degradation, non-orthogonal joint diagonalization algorithms, which do not require the separation matrix to be orthogonal, are now frequently used.
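Why joint diagonalization of power spectral density matrices yields a separation matrix can be seen from the usual frequency domain model. The sketch below assumes mutually uncorrelated sources and spatially white sensor noise of variance σ²; the symbols s(k, q), n(k, q) and R_s(k, q) are notation introduced here for illustration only.

```latex
\begin{aligned}
x(k,q) &\approx A(\omega_k)\, s(k,q) + n(k,q),\\
R_x(k,q) &= E\!\left[x(k,q)\,x^{H}(k,q)\right]
          \approx A(\omega_k)\, R_s(k,q)\, A^{H}(\omega_k) + \sigma^{2} I,
\qquad R_s(k,q)\ \text{diagonal}.
\end{aligned}
```

Under these assumptions, any W(ω_k) for which W(ω_k)A(ω_k) is a scaled permutation matrix makes W(ω_k)[R_x(k, q) - σ²I]W^H(ω_k) diagonal for every segment q, which is the property a joint diagonalization algorithm searches for.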

Applied research on non-orthogonal joint diagonalization algorithms is still at an early stage. The existing NOODLES, QDIAG and ACDC methods tend to converge to a degenerate solution, so their separation reliability is poor. The J-Di, FFDIAG and Jacobi-like methods avoid this tendency, but because they restrict the separation matrix to a square matrix, they can only perform frequency domain convolution blind signal separation when the number of source signals equals the number of observation signals, which limits their range of application.

Disclosure of Invention

The invention aims to overcome the above defects of the prior art by providing a frequency domain convolution blind signal separation method based on multi-objective optimization. The method addresses the tendency of the prior art to converge to a degenerate solution, and can perform frequency domain convolution blind signal separation when the number of source signals is smaller than the number of observation signals.

The technical idea of the invention is as follows: the observed time domain convolution mixed signal is transformed into an instantaneous mixing model in the frequency domain; the separation matrix at each frequency point is estimated with a multi-objective optimization non-orthogonal joint diagonalization algorithm; the source signals are recovered in the frequency domain with the separation matrices; and the time domain waveforms of the source signals are obtained by inverse Fourier transform. The specific implementation steps are as follows:

(1) obtaining a target matrix set:

(1a) M sensors receive observation signals x_m(t) from N source signals, forming the observation signal vector x(t) = [x₁(t), ..., x_M(t)]^T, where N ≥ 1, M ≥ N, and m denotes the sensor serial number, m = 1, ..., M;

(1b) dividing x(t) into Q observation signal sub-vectors, and calculating target matrices from each observation signal sub-vector to obtain a target matrix set {R(k, q)} consisting of Q × K target matrices, where R(k, q) denotes the target matrix at the kth frequency point of the qth observation signal sub-vector, k denotes the serial number of the target matrix calculated from each observation signal sub-vector, q denotes the serial number of the observation signal sub-vector, and K denotes the number of target matrices calculated from each observation signal sub-vector;

(2) constructing a diagonalization matrix B(ω_k):

constructing a diagonalization matrix B(ω_k) of dimension M × N, where ω_k denotes the kth frequency point of the target matrix set;

(3) constructing a non-orthogonal joint diagonalization multi-objective optimization model:

using R(k, q) and B(ω_k) to construct a non-orthogonal joint diagonalization multi-objective optimization model, in which b_n denotes the nth column vector of the diagonalization matrix B(ω_k), min denotes the minimize operation, max denotes the maximize operation, Off(·) denotes the operation of setting the diagonal elements of a matrix to zero, (·)^H denotes the conjugate transpose of a matrix, and det(·) denotes the determinant of a matrix;
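The model formula itself is not reproduced in this text. As a rough sketch of what the listed symbols suggest, the code below evaluates two quantities: the off-diagonal energy of B^H R(k, q) B summed over the segments (to be minimized) and det(B^H B) (to be maximized). Both forms, and the names `off` and `diagonalization_objectives`, are assumptions made for illustration and may differ from the patented model.

```python
import numpy as np

def off(matrix):
    """Off(.): copy of the matrix with its diagonal set to zero."""
    return matrix - np.diag(np.diag(matrix))

def diagonalization_objectives(B, targets):
    """Return (off-diagonal energy to minimize, det(B^H B) to maximize).

    B: (M, N) complex diagonalization matrix; targets: iterable of (M, M) matrices.
    Both objective forms are assumptions made for illustration.
    """
    off_energy = sum(np.linalg.norm(off(B.conj().T @ R @ B), "fro") ** 2 for R in targets)
    volume = np.linalg.det(B.conj().T @ B).real
    return off_energy, volume

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))   # M = 4, N = 3
targets = [A @ np.diag(rng.uniform(1, 2, 3)) @ A.conj().T for _ in range(5)]

B_good = np.linalg.pinv(A).conj().T            # satisfies B^H A = I
B_trivial = B_good * np.array([1.0, 0.0, 0.0])  # two columns collapsed to zero

good = diagonalization_objectives(B_good, targets)
trivial = diagonalization_objectives(B_trivial, targets)
```

Both matrices drive the off-diagonal energy to (numerically) zero, but only `B_good` has a nonzero determinant term. This illustrates why a diagonalization criterion alone can accept a degenerate solution and why a second objective is useful.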

(4) using the non-orthogonal joint diagonalization multi-objective optimization model to estimate the separation matrix W(ω_k) at each frequency point of the target matrix set:

(4a) setting the initial value of the diagonalization matrix at the first frequency point of the target matrix set to B(ω₁) = [I, 0]^T, setting the condition number threshold to ψ and the iteration stop threshold to λ, and letting k = 1, where I denotes the N × N identity matrix and [·]^T denotes the transpose of a matrix;

(4b) estimating the separation matrix W(ω_k) at the kth frequency point of the target matrix set:

(4b.1) letting n = 1;

(4b.2) calculating the Hessian matrix Q_n and the orthogonal projection matrix, where B_n denotes the matrix formed by the column vectors of the diagonalization matrix B(ω_k) that remain after the nth column is deleted, I denotes the identity matrix, and [·]^-1 denotes the inverse of a matrix;

(4b.3) calculating the condition number κ(Q_n) of the Hessian matrix Q_n and judging whether κ(Q_n) > ψ holds; if so, executing step (4b.5), otherwise executing step (4b.4);

(4b.4) calculating the generalized eigenvalues of the matrix pair formed by the orthogonal projection matrix and Q_n, taking the eigenvector corresponding to the largest generalized eigenvalue as the nth column vector b_n of the diagonalization matrix B(ω_k), and executing step (4b.7);

(4b.5) calculating the intermediate matrix C, where U₀ denotes the matrix of eigenvectors of Q_n corresponding to its M - N + 1 smallest eigenvalues;

(4b.6) calculating the nth column vector b_n of the diagonalization matrix B(ω_k):

b_n = U₀w

where w denotes the eigenvector corresponding to the largest eigenvalue of the intermediate matrix C;

(4b.7) letting n = n + 1 and judging whether n ≤ N holds; if so, executing step (4b.2), otherwise executing step (4b.8);

(4b.8) calculating the cost function J(B(ω_k)) and judging whether |J(B(ω_k)) - J(B(ω_{k-1}))| > λ holds; if so, executing step (4b.1), otherwise executing step (4b.9);

(4b.9) taking the conjugate transpose of the diagonalization matrix B(ω_k) to obtain the separation matrix W(ω_k);

(4c) letting k = k + 1 and judging whether k ≤ K holds; if so, letting B(ω_k) = W^H(ω_{k-1}) and executing step (4b), otherwise executing step (5);

(5) obtaining an estimate of the time domain source signals:

(5a) calculating the estimate ŝ(k, q) of the source signal vector at the kth frequency point of the qth segment:

ŝ(k, q) = W(ω_k)x(k, q)

where x(k, q) denotes the observation signal vector at the kth frequency point of the qth segment;

(5b) performing an inverse Fourier transform on ŝ(k, q) to obtain the time domain source signal estimates, thereby achieving frequency domain convolution blind signal separation.
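Step (5a) amounts to one matrix-vector product per frequency point and segment. A minimal sketch is given below, writing the source estimate as `S_hat`; the array layouts and the function name `separate_bins` are assumptions chosen for illustration.

```python
import numpy as np

def separate_bins(W, X):
    """Apply s_hat(k, q) = W(omega_k) x(k, q) at every frequency point and segment.

    W: (K, N, M) separation matrices, one per frequency point.
    X: (K, Q, M) observed frequency domain vectors.
    Returns (K, Q, N) source estimates.
    """
    return np.einsum("knm,kqm->kqn", W, X)

rng = np.random.default_rng(7)
W = rng.standard_normal((5, 3, 4)) + 1j * rng.standard_normal((5, 3, 4))
X = rng.standard_normal((5, 2, 4)) + 1j * rng.standard_normal((5, 2, 4))
S_hat = separate_bins(W, X)
```

The inverse transform of step (5b) then has to mirror whatever windowing and framing was used in the forward transform, so it is not shown here.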

Compared with the prior art, the invention has the following advantages:

(1) When estimating the separation matrix, the invention adopts a multi-objective optimization non-orthogonal joint diagonalization model that takes the condition number of the diagonalization matrix into account. This avoids the tendency to converge to a degenerate solution and, compared with the prior art, improves the reliability of convolution blind signal separation.

(2) The invention replaces the constraint on the diagonalization matrix with a constraint on the product of the conjugate transpose of the diagonalization matrix and the diagonalization matrix itself, which removes the restriction that the separation matrix be square. It can therefore perform frequency domain convolution blind signal separation when the number of source signals is less than or equal to the number of observation signals, giving it a wider range of application than the prior art while still avoiding convergence to a degenerate solution.
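The role of the product B^H B can be illustrated numerically. The sketch below assumes that the orthogonal projection matrix of step (4b.2) is the standard projector I - B_n(B_n^H B_n)^-1 B_n^H, which is consistent with the symbols B_n, I and [·]^-1 listed for that step but is not confirmed by this text. Under that assumption, det(B^H B) is defined for a non-square M × N matrix and factors into a term that depends only on the nth column.

```python
import numpy as np

def orthogonal_projector(B, n):
    """Projector onto the orthogonal complement of span(B_n),
    where B_n is B with its nth column (0-based) removed."""
    B_n = np.delete(B, n, axis=1)
    gram_inv = np.linalg.inv(B_n.conj().T @ B_n)
    return np.eye(B.shape[0]) - B_n @ gram_inv @ B_n.conj().T

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))  # M = 4, N = 3
n = 1
P = orthogonal_projector(B, n)
b_n = B[:, n]
B_n = np.delete(B, n, axis=1)

# det(B^H B) = det(B_n^H B_n) * (b_n^H P b_n): a Schur-complement identity
lhs = np.linalg.det(B.conj().T @ B).real
rhs = np.linalg.det(B_n.conj().T @ B_n).real * (b_n.conj() @ P @ b_n).real
```

Because the second factor is a quadratic form in b_n, maximizing the determinant with respect to one column at a time is a natural fit for a column-wise update.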

Drawings

FIG. 1 is a flow chart of an implementation of the present invention;

FIG. 2 is a flow chart of estimating the separation matrix W(ω_k) at each frequency point of the target matrix set according to the present invention;

FIG. 3(a) is a waveform diagram of 3 source signals used in the simulation of the present invention;

FIG. 3(b) is a diagram of a signal waveform recovered by the NOODLES method;

FIG. 3(c) is a diagram of a signal waveform recovered by the ACDC method;

FIG. 3(d) is a diagram of a signal waveform recovered by the method of the present invention.

Detailed Description

The invention is described in further detail below with reference to the following figures and specific examples:

This embodiment is based on a cocktail party scene: the conversation of three people is received by 4 microphone sensors, and the invention is used to separate the speech of the three speakers. In this example, the sensors are microphones and the received convolution mixed signals are speech signals.

Referring to fig. 1, a frequency domain convolution blind signal separation method based on multi-objective optimization includes the following steps:

Step 1) obtaining a target matrix set:

(1a) M electrical signal sensors receive observation signals x_m(t) from N source signals, forming the observation signal vector x(t) = [x₁(t), ..., x_M(t)]^T, where N ≥ 1, M ≥ N, and m denotes the sensor serial number, m = 1, ..., M; in this embodiment, M = 4 and N = 3;

(1b) dividing x(t) into Q observation signal sub-vectors, and calculating target matrices from each observation signal sub-vector to obtain a target matrix set consisting of Q × K target matrices, where R(k, q) denotes the target matrix at the kth frequency point of the qth observation signal sub-vector, k denotes the serial number of the target matrix calculated from each observation signal sub-vector, q denotes the serial number of the observation signal sub-vector, and K denotes the number of target matrices calculated from each observation signal sub-vector; in this embodiment, x(t) is divided into 20 observation signal sub-vectors, and R(k, q) is calculated as follows:

(1b.1) performing a short-time Fourier transform on the observation signals x_m(t) to obtain the frequency domain mixed signal vector x(k, q), where k denotes the serial number of the target matrix calculated from each observation signal sub-vector, q denotes the serial number of the observation signal sub-vector, k = 1, ..., K, q = 1, ..., Q, and K denotes the number of Fourier transform frequency points, which equals the number of target matrices calculated from each observation signal sub-vector; in this embodiment, K = 256:

(1b.11) calculating the K-point discrete windowed Fourier transform of each frame, where the subscript i denotes the frame serial number, the superscript m indicates the Fourier transform of the mth observation signal, the transform is taken over the qth segment of the mth observation signal, the window function has length K and slides forward, the sliding distance between adjacent frames is d = (1 - μ)K, and μ is the overlap factor of two adjacent frames, typically 50%;

(1b.12) combining the frame spectra of the mth mixed signal into a vector x^m(k, q), where I denotes the number of frames in the discrete windowed Fourier transform of the qth segment;

(1b.13) using the vectors x^m(k, q) to construct the frequency domain mixed signal vector x(k, q) = [x¹(k, q), ..., x^M(k, q)]^T;
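Steps (1b.11) to (1b.13) can be sketched as follows for a single segment. The Hann window, the array layout and the function names are assumptions, since the transform formula is not reproduced in this text; K = 256 and μ = 50% follow the embodiment, and a segment length of 1000 corresponds to 20000 samples split into 20 segments.

```python
import numpy as np

def frame_spectra(segment, K=256, mu=0.5):
    """K-point windowed DFT of overlapping frames of one observation signal segment.

    Returns an (I, K) array: row i is the spectrum of frame i.
    The Hann window is an assumption; the hop is d = (1 - mu) * K.
    """
    hop = int(round((1 - mu) * K))
    n_frames = 1 + (len(segment) - K) // hop
    window = np.hanning(K)
    frames = np.stack([segment[i * hop : i * hop + K] * window for i in range(n_frames)])
    return np.fft.fft(frames, axis=1)

def mixed_signal_tensor(x_segment, K=256, mu=0.5):
    """Stack the spectra of all M channels: result has shape (K, I, M),
    so result[k] holds the I frame-wise samples of the vector x(k, q)."""
    spectra = np.stack([frame_spectra(ch, K, mu) for ch in x_segment])  # (M, I, K)
    return spectra.transpose(2, 1, 0)

rng = np.random.default_rng(6)
x_segment = rng.standard_normal((4, 1000))   # M = 4 channels, one segment
X = mixed_signal_tensor(x_segment)
```

With K = 256 and a hop of 128 samples, a 1000-sample segment yields I = 6 complete frames, so `X` has shape (256, 6, 4).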

(1b.2) calculating the power spectral density matrix R_x(k, q):

R_x(k, q) = E[x(k, q)x^H(k, q)];

(1b.3) estimating the noise variance σ² by principal component analysis;

(1b.4) calculating the target matrix R(k, q) at the kth frequency point of the qth observation signal sub-vector, R(k, q) = R_x(k, q) - σ²I, where I denotes the identity matrix.
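Steps (1b.2) to (1b.4) can be sketched as below. The expectation is replaced by an average over frames, and σ² is taken as the mean of the M - N smallest eigenvalues of R_x, which is one common principal-component style estimate; the text does not spell out its estimator, so this choice and the function name `target_matrix` are assumptions. The sketch requires M > N.

```python
import numpy as np

def target_matrix(X_kq, n_sources):
    """R(k, q) = R_x(k, q) - sigma^2 I from frame-wise samples of x(k, q).

    X_kq: (I, M) array, one row per frame. sigma^2 is estimated as the mean
    of the M - N smallest eigenvalues of R_x (assumed estimator, needs M > N).
    """
    n_frames, n_sensors = X_kq.shape
    R_x = X_kq.T @ X_kq.conj() / n_frames        # frame average of x x^H
    eigvals = np.linalg.eigvalsh(R_x)            # ascending order
    sigma2 = eigvals[: n_sensors - n_sources].mean()
    return R_x - sigma2 * np.eye(n_sensors), sigma2

# Synthetic check with known noise variance 0.1 (many frames for a stable estimate)
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
S = (rng.standard_normal((20000, 3)) + 1j * rng.standard_normal((20000, 3))) / np.sqrt(2)
noise = np.sqrt(0.1 / 2) * (rng.standard_normal((20000, 4)) + 1j * rng.standard_normal((20000, 4)))
X_kq = S @ A.T + noise
R, sigma2 = target_matrix(X_kq, n_sources=3)
```

Subtracting σ²I removes the noise floor so that, ideally, R(k, q) has rank N and the structure A R_s A^H that joint diagonalization relies on.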

Step 2) constructing a diagonalization matrix B(ω_k):

constructing a diagonalization matrix B(ω_k) of dimension M × N, where ω_k denotes the kth frequency point of the target matrix set;

Step 3) constructing a non-orthogonal joint diagonalization multi-objective optimization model:

using R(k, q) and B(ω_k) to construct the model, in which b_n denotes the nth column vector of the diagonalization matrix B(ω_k), min denotes the minimize operation, max denotes the maximize operation, Off(·) denotes the operation of setting the diagonal elements of a matrix to zero, (·)^H denotes the conjugate transpose of a matrix, and det(·) denotes the determinant of a matrix;

Step 4) using the non-orthogonal joint diagonalization multi-objective optimization model to estimate the separation matrix W(ω_k) at each frequency point of the target matrix set; the estimation procedure is shown in FIG. 2:

(4a) setting the initial value of the diagonalization matrix at the first frequency point of the target matrix set to B(ω₁) = [I, 0]^T, setting the condition number threshold to ψ, whose order of magnitude is typically 10³, setting the iteration stop threshold to λ, whose order of magnitude is typically 10^-2, and letting k = 1, where I denotes the N × N identity matrix and [·]^T denotes the transpose of a matrix;

(4b) estimating the separation matrix W(ω_k) at the kth frequency point of the target matrix set:

(4b.1) letting n = 1;

(4b.2) calculating the Hessian matrix Q_n and the orthogonal projection matrix;

(4b.3) calculating the condition number κ(Q_n) of the Hessian matrix Q_n and judging whether κ(Q_n) > ψ holds; if so, the Hessian matrix Q_n is considered ill-conditioned and step (4b.5) is executed, otherwise step (4b.4) is executed; the condition number κ(Q_n) of the Hessian matrix Q_n is calculated as:

κ(Q_n) = ||Q_n|| · ||Q_n^-1||

where [·]^-1 denotes the inverse of a matrix and ||·|| denotes the norm operation;
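The ill-conditioning test of step (4b.3) can be sketched as follows. The text does not state which matrix norm is used, so the spectral norm is assumed here, and the default threshold simply reflects the order of magnitude 10³ mentioned above; the function names are illustrative.

```python
import numpy as np

def condition_number(Q_n):
    """kappa(Q_n) = ||Q_n|| * ||Q_n^{-1}||, here with the spectral (2-) norm."""
    return np.linalg.norm(Q_n, 2) * np.linalg.norm(np.linalg.inv(Q_n), 2)

def is_ill_conditioned(Q_n, psi=1e3):
    """True when kappa(Q_n) exceeds the threshold psi."""
    return condition_number(Q_n) > psi

well = np.diag([4.0, 2.0, 1.0])     # kappa = 4
poor = np.diag([4.0, 2.0, 1e-6])    # kappa = 4e6
```

With the spectral norm this product equals the ratio of the largest to the smallest singular value, which is also what `np.linalg.cond` returns by default.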

(4b.4) calculating the generalized eigenvalues of the matrix pair formed by the orthogonal projection matrix and Q_n, taking the eigenvector corresponding to the largest generalized eigenvalue as the nth column vector b_n of the diagonalization matrix B(ω_k), and executing step (4b.7), where the eigenvector corresponding to the largest generalized eigenvalue is obtained as follows:

performing a generalized eigenvalue decomposition of the matrix pair to obtain a diagonal matrix D formed by its generalized eigenvalues and a matrix V formed by the corresponding eigenvectors, and taking the first column of V as the eigenvector corresponding to the largest generalized eigenvalue of the matrix pair; the decomposition is computed with eig(·), which denotes the generalized eigenvalue decomposition operation;
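A sketch of the well-conditioned branch is given below, writing the orthogonal projection matrix as `P`. It solves the generalized problem P v = λ Q_n v through the ordinary eigenproblem of Q_n^-1 P and sorts the eigenvalues explicitly instead of relying on the order in which a solver returns them. The function name and the test matrices are assumptions for illustration.

```python
import numpy as np

def largest_generalized_eigenvector(P, Q_n):
    """Eigenvector v of the pair (P, Q_n), P v = lambda Q_n v, with the largest lambda.

    Solved via eig(Q_n^{-1} P), which is reasonable when Q_n is well conditioned.
    Returns the unit-norm eigenvector and the corresponding eigenvalue.
    """
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Q_n, P))
    order = np.argsort(eigvals.real)[::-1]       # descending
    v = eigvecs[:, order[0]]
    return v / np.linalg.norm(v), eigvals.real[order[0]]

rng = np.random.default_rng(4)
C = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q_n = C @ C.conj().T + np.eye(4)            # Hermitian positive definite stand-in
u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
u /= np.linalg.norm(u)
P = np.eye(4) - np.outer(u, u.conj())       # an orthogonal projector
v, lam = largest_generalized_eigenvector(P, Q_n)
```

The returned vector maximizes the ratio (v^H P v) / (v^H Q_n v), which is why this branch is only used when Q_n can be inverted reliably.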

(4b.5) calculating the intermediate matrix C;

(4b.6) calculating the nth column vector b_n of the diagonalization matrix B(ω_k):

b_n = U₀w
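The ill-conditioned branch can be sketched as follows. The formula for the intermediate matrix C is not reproduced in this text; the sketch assumes C = U₀^H P U₀, where `P` stands for the orthogonal projection matrix, so it should be read as one plausible instance rather than the patented formula. U₀ and w follow the definitions given in the summary of the steps.

```python
import numpy as np

def ill_conditioned_update(P, Q_n, n_sources):
    """Column update b_n = U0 w for an ill-conditioned Hessian Q_n.

    U0: eigenvectors of Q_n for its M - N + 1 smallest eigenvalues.
    C = U0^H P U0 is an ASSUMED form of the intermediate matrix.
    w: eigenvector of C with the largest eigenvalue.
    """
    n_sensors = Q_n.shape[0]
    _, eigvecs = np.linalg.eigh(Q_n)                  # ascending eigenvalues
    U0 = eigvecs[:, : n_sensors - n_sources + 1]
    C = U0.conj().T @ P @ U0
    _, c_vecs = np.linalg.eigh(C)
    w = c_vecs[:, -1]                                 # largest eigenvalue of C
    return U0 @ w

rng = np.random.default_rng(5)
V, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
Q_n = V @ np.diag([5.0, 3.0, 1e-6, 1e-8]) @ V.conj().T   # nearly singular stand-in
u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
u /= np.linalg.norm(u)
P = np.eye(4) - np.outer(u, u.conj())
b_n = ill_conditioned_update(P, Q_n, n_sources=3)
```

Restricting b_n to the span of U₀ avoids inverting Q_n: the search is confined to the directions in which the quadratic form of Q_n is smallest.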

(4b.8) calculating the cost function J(B(ω_k)) and judging whether |J(B(ω_k)) - J(B(ω_{k-1}))| > λ holds; if so, executing step (4b.1), otherwise executing step (4b.9), where B(ω₀) is taken to be a zero matrix at the first iteration;

(4c) letting k = k + 1 and judging whether k ≤ K holds; if so, letting B(ω_k) = W^H(ω_{k-1}) and executing step (4b), otherwise executing step (5). Here B(ω_k) = W^H(ω_{k-1}) = B(ω_{k-1}), that is, the iteration result of the diagonalization matrix B at one frequency point is used as the initial value at the next frequency point; this step is intended to resolve the ordering problem that can arise in frequency domain separation;
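The sweep over frequency points in steps (4a) and (4c) can be sketched as a loop that seeds each frequency point with the result of the previous one. The per-point solver is passed in as a callable; `solve_bin` is a hypothetical placeholder for step (4b), not an implementation of it.

```python
import numpy as np

def estimate_all_bins(targets, n_sources, solve_bin):
    """Sweep k = 1..K, seeding each frequency point with the previous result.

    targets: (K, Q, M, M) target matrices R(k, q).
    solve_bin(R_k, B_init) -> B_k is a hypothetical per-point solver (step 4b).
    Returns W of shape (K, N, M) with W(omega_k) = B(omega_k)^H.
    """
    n_bins, _, n_sensors, _ = targets.shape
    B = np.eye(n_sensors, n_sources, dtype=complex)   # B(omega_1) = [I, 0]^T
    W = np.empty((n_bins, n_sources, n_sensors), dtype=complex)
    for k in range(n_bins):
        B = solve_bin(targets[k], B)                  # warm start from the previous point
        W[k] = B.conj().T
    return W

# Placeholder solver that returns its initial value unchanged
targets = np.zeros((5, 2, 4, 4), dtype=complex)
W = estimate_all_bins(targets, n_sources=3, solve_bin=lambda R_k, B_init: B_init)
```

Starting each frequency point from its neighbor's solution encourages the columns of B to keep the same source order from one point to the next, which is the stated purpose of step (4c).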

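Step 5) obtaining an estimate of the time domain source signals:

(5a) calculating the estimate ŝ(k, q) of the source signal vector at the kth frequency point of the qth segment, ŝ(k, q) = W(ω_k)x(k, q), where x(k, q) denotes the observation signal vector at the kth frequency point of the qth segment and W(ω_k) denotes the separation matrix;

(5b) performing an inverse Fourier transform on ŝ(k, q) to obtain the time domain source signal estimates, thereby achieving frequency domain convolution blind signal separation.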

The technical effects of the present invention will be further explained by simulation experiments.

1. Simulation conditions and contents:

Simulation conditions: MATLAB (R2013a), Intel(R) Core(TM) i7-2600 CPU @ 3.40 GHz, Windows 7 Professional.

Simulation content: the source signals are N = 3 sinusoidal signals of different frequencies with superimposed Gaussian white noise, the signal-to-noise ratio (SNR) is set to 10 dB, and an 8-tap finite impulse response (FIR) filter is used to build the convolution mixing model. 20000 sample points are taken from the three source signals, and the mixed signals are acquired with M = 4 receiving sensors, where the elements of the mixing matrix A are randomly generated and follow the standard normal distribution. The performance of the blind signal separation method is measured by the signal-to-interference ratio (SIR): the larger the SIR, the better the blind separation performance. The SIR is defined in terms of the frequency domain global transformation matrix G(ω_k) = W(ω_k)A(ω_k), where g_nj(ω_k) is the element in row n and column j of the matrix G(ω_k).
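The SIR formula is not reproduced in this text. The sketch below uses one common definition based on the global matrix G(ω_k): for each output, the source with the largest total energy over all frequency points counts as the signal and the remaining sources count as interference. This definition and the function name `sir_db` are assumptions and may differ from the one used in the patent.

```python
import numpy as np

def sir_db(W, A):
    """Per-output SIR in dB from G(omega_k) = W(omega_k) A(omega_k).

    W: (K, N, M) separation matrices, A: (K, M, N) mixing matrices.
    The dominant source of each output is treated as the signal (assumed definition).
    """
    G = W @ A                                   # (K, N, N) global matrices
    energy = np.sum(np.abs(G) ** 2, axis=0)     # (N, N): output n, source j
    signal = energy.max(axis=1)
    interference = energy.sum(axis=1) - signal
    return 10 * np.log10(signal / interference)

# Toy case: K = 2 frequency points, M = 3 sensors, N = 2 sources
A = np.stack([np.eye(3, 2)] * 2)
W_row = np.array([[1.0, 0.1, 0.0],
                  [0.1, 1.0, 0.0]])
W = np.stack([W_row] * 2)
sir = sir_db(W, A)
```

In this toy case each output contains its own source with gain 1 and the other source with gain 0.1, so the energy ratio is 100 and both SIR values are 20 dB.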

2. Simulation results:

The waveforms of the 3 source signals used in the simulation are shown in FIG. 3(a). The method of the present invention (JD-NS) is compared with two other methods: an alternating column update diagonalization (ACDC) based method and a non-orthonormal Jacobian approximation joint diagonalization (NOODLES) method. The three source signals recovered by the NOODLES method are shown in FIG. 3(b), and the three source signals recovered by the ACDC method are shown in FIG. 3(c).

It can be seen that the three signals recovered by the ACDC method all resemble the second source signal in the source signal waveform diagram, which indicates that the ACDC method has in fact converged to a degenerate solution. The method of the invention effectively recovers all source signals, and the recovered source signals contain no components of the other source signals, indicating a better separation result.

Table 1 summarizes the SIR performance of the source signals recovered by the method of the present invention and by the NOODLES method over 100 independent experiments at an SNR of 10 dB. The method of the present invention outperforms the NOODLES method in the SIR of all three recovered source signals, and therefore separates convolution blind signals more reliably than the NOODLES method.

TABLE 1

Claims

1. A frequency domain convolution blind signal separation method based on multi-objective optimization is characterized by comprising the following steps:

(1) obtaining a target matrix set:

(1a) M sensors receive observation signals x_m(t) from N source signals, forming the observation signal vector x(t) = [x₁(t), ..., x_M(t)]^T, where N ≥ 1, M ≥ N, and m denotes the sensor serial number, m = 1, ..., M;

(1b) dividing x(t) into Q observation signal sub-vectors, and calculating target matrices from each observation signal sub-vector to obtain a target matrix set consisting of Q × K target matrices, where R(k, q) denotes the target matrix at the kth frequency point of the qth observation signal sub-vector, k denotes the serial number of the target matrix calculated from each observation signal sub-vector, q denotes the serial number of the observation signal sub-vector, and K denotes the number of target matrices calculated from each observation signal sub-vector;

(2) constructing a diagonalization matrix B(ω_k):

constructing a diagonalization matrix B(ω_k) of dimension M × N, where ω_k denotes the kth frequency point of the target matrix set;

(4) using the non-orthogonal joint diagonalization multi-objective optimization model to estimate the separation matrix W(ω_k) at each frequency point of the target matrix set:

(4a) setting the initial value of the diagonalization matrix at the first frequency point of the target matrix set to B(ω₁) = [I, 0]^T, setting the condition number threshold to ψ and the iteration stop threshold to λ, and letting k = 1, where I denotes the N × N identity matrix and [·]^T denotes the transpose of a matrix;

(4b) estimating the separation matrix W(ω_k) at the kth frequency point of the target matrix set:

(4b.1) letting n = 1;

(4b.2) calculating the Hessian matrix Q_n and the orthogonal projection matrix;

(4b.3) calculating the condition number κ(Q_n) of the Hessian matrix Q_n and judging whether κ(Q_n) > ψ holds; if so, executing step (4b.5), otherwise executing step (4b.4);

(4b.4) calculating the generalized eigenvalues of the matrix pair formed by the orthogonal projection matrix and Q_n, taking the eigenvector corresponding to the largest generalized eigenvalue as the nth column vector b_n of the diagonalization matrix B(ω_k), and executing step (4b.7);

(4b.5) calculating the intermediate matrix C;

(4b.6) calculating the nth column vector b_n of the diagonalization matrix B(ω_k):

b_n = U₀w

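(5) obtaining an estimate of the time domain source signals:

(5b) performing an inverse Fourier transform on the estimates of the source signal vector to obtain the time domain source signal estimates, thereby achieving frequency domain convolution blind signal separation.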

2. The frequency domain convolution blind signal separation method based on multi-objective optimization according to claim 1, wherein R(k, q) in step (1b) is calculated by:

(1b.1) performing a short-time Fourier transform on the observation signals x_m(t) to obtain the frequency domain mixed signal vector x(k, q), where k denotes the serial number of the target matrix calculated from each observation signal sub-vector, q denotes the serial number of the observation signal sub-vector, k = 1, ..., K and q = 1, ..., Q;

(1b.2) calculating the power spectral density matrix R_x(k, q):

R_x(k, q) = E[x(k, q)x^H(k, q)];

(1b.3) estimating the noise variance σ² by principal component analysis;

(1b.4) calculating the target matrix R(k, q) at the kth frequency point of the qth observation signal sub-vector, R(k, q) = R_x(k, q) - σ²I, where I denotes the identity matrix.

3. The frequency domain convolution blind signal separation method based on multi-objective optimization according to claim 1, wherein the condition number κ(Q_n) of the Hessian matrix Q_n in step (4b.3) is calculated as:

κ(Q_n) = ||Q_n|| · ||Q_n^-1||

where [·]^-1 denotes the inverse of a matrix and ||·|| denotes the norm operation.

4. The frequency domain convolution blind signal separation method based on multi-objective optimization according to claim 1, wherein the eigenvector corresponding to the largest generalized eigenvalue in step (4b.4) is obtained by:

performing a generalized eigenvalue decomposition of the matrix pair to obtain a diagonal matrix D formed by its generalized eigenvalues and a matrix V formed by the corresponding eigenvectors, and taking the first column of V as the eigenvector corresponding to the largest generalized eigenvalue of the matrix pair, the decomposition being computed with eig(·), which denotes the generalized eigenvalue decomposition operation.