CN108875824B - Single-channel blind source separation method - Google Patents


Info

Publication number
CN108875824B
Authority
CN
China
Prior art keywords
dictionary, sub, sparse, initial, joint
Prior art date
Legal status
Active (assumed; not a legal conclusion)
Application number
CN201810599522.6A
Other languages
Chinese (zh)
Other versions
CN108875824A (en)
Inventor
孙林慧 (Sun Linhui)
谢可丽 (Xie Keli)
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201810599522.6A
Publication of CN108875824A
Application granted
Publication of CN108875824B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/28 Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries

Abstract

The invention provides a single-channel blind source separation method. In the training stage, given training samples T_i of different source audio signals, the corresponding initial identity sub-dictionaries D_i and an initial common sub-dictionary D_c are obtained, yielding an initial joint dictionary comprising the initial identity sub-dictionaries D_i and the initial common sub-dictionary D_c; the joint dictionary is then updated iteratively with an optimization function. A sparse coding algorithm solves the sparse projection coefficients of the mixed signal under the joint dictionary, and each source signal is recovered from the sub-dictionaries and the sparse vectors. Against the "cross projection" problem caused by a joint dictionary with weak discriminative power, the invention adopts a highly discriminative joint dictionary; compared with other sparse-representation-based single-channel blind source separation algorithms, it reduces source interference and markedly improves separation quality.

Description

Single-channel blind source separation method
Technical Field
The invention relates to a single-channel blind source separation method.
Background
Single-channel blind source separation (SCBSS) refers to the process of recovering a multi-dimensional source signal from a one-dimensional mixed signal. Compared with ordinary blind source separation, single-channel blind source separation has only one observed signal, which makes it an ill-posed and extremely difficult problem. On the other hand, single-channel blind source separation needs only one sensor to receive signals, so the system is relatively simple and inexpensive. Compared with traditional blind source separation algorithms, the single-channel method is more widely applicable in practice and has broad application prospects and practical significance in biomedical signal processing, array signal processing, speech recognition, image processing, communications, and other fields.
The process of acquiring a sparse representation or sparse approximation of a signal under an over-complete dictionary is called sparse decomposition, through which a concise representation of the signal can be obtained. Sparse representations have been applied to many aspects of signal processing, especially compressed sensing of speech signals and underdetermined blind source separation. Sparse representation theory is currently one of the most popular SCBSS techniques and mainly comprises two parts: dictionary training and separation. The construction of the sparse joint dictionary is the most important link of sparse-decomposition-based single-channel blind source separation, and the focus of this invention, because it directly determines the quality of the separated signals. The large space in which the mixed signal is expressed is composed of several subspaces, each of which best expresses the signal of one source; the signal of a single source can then be reconstructed from the sparse coefficients and the basis of the corresponding subspace, realizing separation. Typical dictionary training methods include Non-negative Matrix Factorization (NMF) and K-SVD. Dictionary learning methods learn each sub-dictionary from training samples of a specific source signal, i.e., the dictionary of each source is trained independently, so the dictionaries have a certain distinctiveness. The separation quality of sparse-representation-based single-channel blind source separation depends on the discriminative power of the joint dictionary: if the joint dictionary is not highly discriminative, the separation effect is poor. This means that, in addition to expecting each sub-dictionary to represent its corresponding source well, we also expect each sub-dictionary to be distinguishable from the others.
Discriminative dictionary learning theory is widely applied in pattern classification (PC) and face recognition (FR) and has achieved great success. Bao et al. proposed a Distinguishable Dictionary Learning (DDL) algorithm, which reduces the cross-coherence among dictionaries through an optimization function so that the joint dictionary has discriminative power. Pearlmutter et al. adopted an l1-norm optimization algorithm: a sparse dictionary is trained for each speaker, the trained dictionaries are combined into a mixed dictionary, the sparse projection vector of each source speech signal is then solved by l1-norm optimization, and the separated speech signals are reconstructed. In a recently proposed joint dictionary (CJD) for SCBSS, an identity sub-dictionary is first learned from the source speech signals of each speaker; similar atoms between two identity sub-dictionaries are then discarded and used to construct a common sub-dictionary.
The above methods all achieve a certain separation effect, but they do not exploit the relationship between different source signals to suppress the similarity between atoms of different sub-dictionaries. In practice, there are always some similar components between different source signals, which reduce the discriminative power of the identity sub-dictionaries and create the "cross projection" problem caused by mutual interference between dictionaries. That is, when the mixed speech signal is represented under the joint dictionary, the signal of one source also produces a response on the sub-dictionaries corresponding to other sources, resulting in poor separation.
Disclosure of Invention
The invention aims to provide a single-channel blind source separation method to solve the "cross projection" problem caused by the weak discriminative power of sparse joint dictionaries in the prior art.
The technical solution of the invention is as follows:
a single-channel blind source separation method comprises the following steps,
s1, training stage, giving training samples T of different source audio signals i Obtaining the corresponding initial identity sub-dictionary D i And an initial common sub-dictionary D c Thereby obtaining a sub-dictionary D including the initial identity i And an initial common sub-dictionary D c The initial joint dictionary adopts an optimization function to update the joint dictionary in an iterative manner;
s2, separating the language signals, and solving sparse projection coefficients of the mixed signals under the joint dictionary by adopting a sparse coding algorithm; and recovering each source signal according to the sub-dictionary and the sparse vector.
Further, step S1 specifically comprises:
S11, training an initial joint dictionary from the different source training samples;
S12, fixing the initial joint dictionary D and obtaining the sparse vectors of the training samples on it;
S13, fixing the current sparse vectors and updating the joint dictionary through the optimization function.
Further, in step S11, obtaining the initial joint dictionary containing the initial identity sub-dictionaries D_i and the initial common sub-dictionary D_c is specifically:
training the identity sub-dictionary D_i by the K-SVD method, the dictionary training being of the form

    min_{D_i, α} ||T_i - D_i α||_F²  s.t. ||α_k||_0 ≤ K for every column α_k   (1)

where D_i is the trained dictionary with normalized atoms and α is the projection coefficient matrix of T_i on dictionary D_i;
splicing the two training samples T = [T_1, T_2], taking a DCT dictionary as the initial dictionary and training the common sub-dictionary D_c by the K-SVD method, then splicing to obtain the initial joint dictionary D = [D_1, D_2, D_c].
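As a concrete illustration of this initialization step, the following sketch builds an overcomplete DCT dictionary (a common K-SVD initializer) and splices three sub-dictionaries into a joint dictionary. The sizes (256-sample frames, 512-atom sub-dictionaries) are assumptions, and the raw DCT atoms stand in for the K-SVD-trained D_1, D_2 and D_c, which the patent obtains by training.

```python
import numpy as np

def dct_dictionary(n, K):
    """Overcomplete DCT dictionary with n-dimensional atoms and K columns,
    a common initial dictionary for K-SVD (cf. the DCT dictionary in S11)."""
    k = np.arange(K)
    t = np.arange(n)[:, None] + 0.5
    D = np.cos(np.pi * t * k / K)                   # DCT-II style atoms
    D -= D.mean(axis=0, keepdims=True) * (k > 0)    # zero-mean except the DC atom
    D /= np.linalg.norm(D, axis=0)                  # unit-norm columns
    return D

# Hypothetical sizes: 256-sample frames, 512 atoms per sub-dictionary.
n, K = 256, 512
D1 = dct_dictionary(n, K)    # placeholder for the K-SVD-trained identity sub-dictionary D_1
D2 = dct_dictionary(n, K)    # placeholder for D_2
Dc = dct_dictionary(n, K)    # placeholder for the common sub-dictionary D_c
D = np.hstack([D1, D2, Dc])  # initial joint dictionary D = [D_1, D_2, D_c]
```

In a real run, D1, D2 and Dc would each be refined by K-SVD on the corresponding training samples before splicing.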
Further, in step S12, when the initial joint dictionary D is fixed, the sparse coding BP algorithm is selected to update the sparse coding coefficients, using the following optimization function:

    min ||X_i||_1  subject to  T_i = D X_i   (2)

where || · ||_1 denotes the 1-norm, i.e., the sum of the absolute values of the elements in each column of the sparse matrix X_i.
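The constraint min ||X_i||_1 s.t. T_i = D X_i is a basis-pursuit problem, solvable column by column as a linear program. A minimal sketch follows; the patent does not specify its BP solver, so scipy's LP interface is used here as an illustrative stand-in.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, t):
    """Solve min ||x||_1 s.t. D x = t by the standard LP split x = u - v, u, v >= 0."""
    n, K = D.shape
    c = np.ones(2 * K)            # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([D, -D])     # equality constraint D(u - v) = t
    res = linprog(c, A_eq=A_eq, b_eq=t, bounds=(0, None), method="highs")
    u, v = res.x[:K], res.x[K:]
    return u - v

rng = np.random.default_rng(0)
D = rng.standard_normal((10, 30))               # toy over-complete dictionary
x0 = np.zeros(30); x0[[3, 17]] = [1.5, -2.0]    # 2-sparse ground-truth coefficients
t = D @ x0                                      # one "training frame"
x = basis_pursuit(D, t)                         # sparse projection coefficients
```

Each column of X_i would be solved this way against the joint dictionary D.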
Further, in step S13, the optimization function is:

    min_D J = Σ_{i=1}^{m} r(D, X_i, T_i)   (3)

wherein

    r(D, X_i, T_i) = ||T_i - D_i X_i^i - D_c X_i^c||_F² + α_i Σ_{j=1, j≠i}^{m} ||D_j X_i^j||_F²   (4)

Here the reconstruction error is measured by the F-norm; J is the proposed objective function and r(D, X_i, T_i) is the cross-projection penalty term of the ith speaker. In equation (4), D = [D_1, D_2, …, D_m, D_c] denotes the trained joint dictionary, D_m is the identity sub-dictionary of the mth speaker, and D_c is the common sub-dictionary; T_i (i = 1, 2, …, m) denotes the training samples of the clean sound sources; X_i denotes the sparse vector matrix of T_i on D; X_i^i denotes the rows of the coefficient X_i corresponding to sub-dictionary D_i; X_i^j denotes the rows of X_i corresponding to sub-dictionary D_j; X_i^c denotes the rows of X_i corresponding to sub-dictionary D_c; and α_i is a weight vector.
Further, in step S2, solving the sparse projection coefficients of the mixed signal under the joint dictionary with the sparse coding algorithm is specifically:
when a joint dictionary containing a common sub-dictionary is used, the equation is D′E′ = s, where s is the mixture of the two source speech signals, E′ = [E_1, E_2, E_c], and E_1, E_2 and E_c are the sparse projection coefficients of s on D′_1, D′_2 and D_c respectively; the sparse projection coefficients E′ are obtained by solving

    min ||s - D′E′||_2²  s.t. ||E′||_0 ≤ K   (5)

where K is the sparsity, i.e., the number of non-zero elements of the matrix E′.
Further, in step S2, recovering each source signal from the sub-dictionaries and the sparse vectors is specifically:
after the mixed signal is sparsely represented on the trained joint dictionary, each source signal is recovered from the responses on the sub-dictionaries D_1 and D_2 plus a certain proportion of the response on the common sub-dictionary D_c; when the sparse coding matrix E′ is obtained, the estimated source audio signals are computed by

    ŝ_1 = D_1 E_1 + α D_c E_c,  ŝ_2 = D_2 E_2 + (1 - α) D_c E_c   (6)

where α is a weight vector.
Further, α is set to 0.1, and 1- α is set to 0.9.
The invention has the beneficial effects that: compared with traditional dictionary learning methods, this single-channel blind source separation method makes full use of the characteristics of speech signals and, starting from the commonality and differences of different source signals, constructs an optimization function to suppress the non-corresponding parts of the sparse representation coefficients. The distinct components of each sound source are projected onto the corresponding identity sub-dictionary as much as possible, while similar components are projected onto the common sub-dictionary, weakening the "cross projection" phenomenon so that the signals can be separated better. Against the "cross projection" problem caused by a joint dictionary with weak discriminative power, the invention adopts a highly discriminative joint dictionary; compared with other sparse-representation-based single-channel blind source separation algorithms, it reduces source interference and markedly improves separation quality.
Drawings
Fig. 1 is an explanatory block diagram of the single-channel blind source separation method of the present invention.
Fig. 2 is an explanatory diagram of source signal separation in the embodiment.
Fig. 3 is a schematic diagram of the variation of the separation effect with the weight vector α in the embodiment.
Fig. 4 is a schematic diagram of the variation of the separation effect with the weight vector 1 - α in the embodiment.
FIG. 5 is a diagram of separated speech in an embodiment.
FIG. 6 is a diagram showing the variation of the separation effect with the number of sub-dictionary atoms in the example.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Examples
In the single-channel blind source separation method, a common sub-dictionary containing similar information is introduced, and the relationship between the sub-dictionaries is constrained by constructing a new optimization function, further increasing the difference between the identity sub-dictionaries. The unique components of each speaker sample are sparsely represented as much as possible by the identity sub-dictionary corresponding to that speaker, while the highly correlated components are sparsely represented by the common sub-dictionary; when a source signal is represented by the joint dictionary, the cross-projection problem can thus be effectively avoided and the quality of speech separation improved.
A single-channel blind source separation method, as shown in fig. 1, comprising the following steps:
s1, training phase. Given different source speech signal training samples T i Obtaining the corresponding initial identity sub-dictionary D i And an initial common sub-dictionary D c Thereby obtaining a sub-dictionary D including the initial identity i And an initial common sub-dictionary D c The joint dictionary adopts an optimization function to update the joint dictionary in an iterative manner.
This embodiment proposes a new optimization function. Compared with existing single-channel blind source separation algorithms based on dictionary learning, the embodiment divides the characteristics of each source signal into two parts: the features unique to each speaker's samples, and the features common to different speakers' samples. The unique features of the speaker samples are independent of each other. Accordingly, for different speakers, the joint dictionary includes three parts: a sub-dictionary for each speaker and a common sub-dictionary used to represent the similarity of different speakers.
To construct a joint dictionary with strong distinctiveness, the difference between the identity sub-dictionaries is further increased. During training, the embodiment constrains the relation between the sub-dictionaries by constructing a suitable objective function, so that the unique components of each speaker's samples are sparsely represented as much as possible by the corresponding identity sub-dictionary, while highly correlated components are sparsely represented as much as possible by the common sub-dictionary.
Without loss of generality, the discussion is based on the case of m speakers. Suppose T_i is the training sample of the ith source signal, corresponding to sub-dictionary D_i, and the corresponding sparse representation coefficients are X = [X_1, X_2, …, X_m, X_c]^T. To improve the separation performance of the algorithm, the cross-projections between the source signals must be suppressed. To this end, a new objective function is proposed, as shown in equation (1):

    min_D J = Σ_{i=1}^{m} r(D, X_i, T_i)   (1)

wherein

    r(D, X_i, T_i) = ||T_i - D_i X_i^i - D_c X_i^c||_F² + α_i Σ_{j=1, j≠i}^{m} ||D_j X_i^j||_F²   (2)

Here the reconstruction error is measured by the F-norm; J is the proposed objective function and r(D, X_i, T_i) is the cross-projection penalty term of the ith speaker, where D = [D_1, D_2, …, D_m, D_c] denotes the trained joint dictionary, D_m is the identity sub-dictionary of the mth speaker, and D_c is the common sub-dictionary; T_i (i = 1, 2, …, m) denotes the training samples of the clean sound sources; X_i denotes the sparse vector matrix of T_i on D; X_i^i denotes the rows of the coefficient X_i corresponding to sub-dictionary D_i; X_i^j denotes the rows of X_i corresponding to sub-dictionary D_j; X_i^c denotes the rows of X_i corresponding to sub-dictionary D_c; and α_i is a weight vector.
It is noted that in the first term of the penalty r(D, X_i, T_i), the different source training signals should be sparsely represented by the corresponding identity sub-dictionary and the common sub-dictionary. This term mainly accounts for identity sub-dictionaries with a certain discriminative power and a common sub-dictionary containing the shared information, ensuring the comprehensiveness of the joint dictionary. Considering the "cross projection" problem, the second term constrains the cross-projection of one training signal on the other identity sub-dictionaries: when one source signal is sparsely represented on the joint dictionary, its representation on the other identity sub-dictionaries should be as small as possible, to reduce the influence of cross-projection. Taking the mixture of two source signals as an example, define the sparse matrix E = [E_1, E_2, E_c]^T, where E_1, E_2 and E_c are the sparse coefficients of the mixed signal s on the identity sub-dictionaries D_1, D_2 and the common sub-dictionary D_c respectively. After the mixed signal s is sparsely represented on the trained joint dictionary, each source signal can be recovered from the response on its own sub-dictionary plus a certain proportion of the response on the common sub-dictionary. The source separation module is shown in fig. 2.
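A sketch of how the penalty term r(D, X_i, T_i) could be evaluated, assuming it is the fidelity term plus a weighted sum of cross-projection energies as described above (the exact placement of the weight α_i is an assumption of this sketch):

```python
import numpy as np

def cross_projection_penalty(D_subs, Dc, X_parts, Xc, Ti, i, alpha_i):
    """r(D, X_i, T_i): fidelity of T_i to its own identity sub-dictionary plus
    the common sub-dictionary, plus an alpha_i-weighted penalty on the
    responses over the other identity sub-dictionaries."""
    recon = D_subs[i] @ X_parts[i] + Dc @ Xc          # D_i X_i^i + D_c X_i^c
    fidelity = np.linalg.norm(Ti - recon, "fro") ** 2
    cross = sum(np.linalg.norm(D_subs[j] @ X_parts[j], "fro") ** 2
                for j in range(len(D_subs)) if j != i)  # sum over j != i
    return fidelity + alpha_i * cross
```

With a perfect own-dictionary representation and zero response on the other sub-dictionaries, the penalty vanishes; any cross-projection energy raises it in proportion to α_i.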
Step S1 specifically comprises the following steps.
S11, training an initial joint dictionary from the different source training samples. The embodiment selects two speakers of different genders from the Chinese speech corpus of the Institute of Automation, Chinese Academy of Sciences; each speaker has 150 speech samples, the sampling frequency of the signals is 16 kHz, and each frame contains 256 sampling points.
A DCT (discrete cosine transform) dictionary is selected as the initial dictionary, and with the single-source signal samples T_i as training data, the identity sub-dictionary D_i is trained by the K-SVD method; the dictionary training is of the form

    min_{D_i, α} ||T_i - D_i α||_F²  s.t. ||α_k||_0 ≤ K for every column α_k

where D_i is the trained dictionary with normalized atoms and α is the projection coefficient matrix of T_i on dictionary D_i.
The two training samples are spliced as T = [T_1, T_2]; with the DCT dictionary as the initial dictionary, the common sub-dictionary D_c is trained by the K-SVD method, and splicing yields the initial joint dictionary D = [D_1, D_2, D_c].
S12, fixing the initial joint dictionary D and obtaining the sparse vectors of the training samples on it. With the initial D fixed, the sparse coding BP algorithm is selected to update the sparse coding coefficients. The embodiment updates the coding coefficients with the following optimization function:

    min ||X_i||_1  subject to  T_i = D X_i

where || · ||_1 denotes the 1-norm, i.e., the sum of the absolute values of the elements in each column of the sparse matrix X_i.
S13, fixing the current sparse vectors and updating the joint dictionary through the optimization function. With the coding coefficients fixed, the embodiment updates the joint dictionary with the following optimization function:

    min_D J = Σ_{i=1}^{2} r(D, X_i, T_i)

wherein

    r(D, X_i, T_i) = ||T_i - D_i X_i^i - D_c X_i^c||_F² + α_i Σ_{j≠i} ||D_j X_i^j||_F²

To solve this optimization problem, the embodiment introduces selection matrices Q_i, i = 1, 2, 3:

    Q_1 = [I 0 0],  Q_2 = [0 I 0],  Q_3 = [0 0 I]

where 0 denotes an all-zero matrix and I denotes an identity matrix, with block sizes matching the numbers of atoms of D_1, D_2 and D_c, so that X_i^1 = Q_1 X_i, X_i^2 = Q_2 X_i and X_i^c = Q_3 X_i. Thus the objective can be written in terms of the single unknown D as

    min_D Σ_{i=1}^{2} ( ||T_i - D(Q_i^T Q_i + Q_3^T Q_3) X_i||_F² + α_i ||D Q_j^T Q_j X_i||_F² ),  j ≠ i, j ∈ {1, 2}

In this new error function only the joint dictionary is unknown, and when the joint dictionary is updated, the identity sub-dictionaries and the common sub-dictionary are updated simultaneously. This optimization problem can be solved by a quasi-Newton method. Quasi-Newton methods require only the gradient of the objective function and no second-derivative information, so they often perform better than the steepest descent method and Newton's method. The embodiment selects the limited-memory BFGS algorithm (L-BFGS) among the quasi-Newton methods to solve this optimization problem.
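A toy sketch of this dictionary-update step using scipy's L-BFGS-B routine. All sizes and the weight α are hypothetical, and finite-difference gradients stand in for the analytic gradient an efficient implementation would supply.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, k, m = 8, 6, 2                  # frame dim, atoms per sub-dictionary, speakers
T = [rng.standard_normal((n, 4)) for _ in range(m)]            # fixed training samples
X = [0.1 * rng.standard_normal((3 * k, 4)) for _ in range(m)]  # fixed sparse codes
alpha = 0.5                                                    # cross-projection weight

def objective(d_flat):
    """J(D) = sum_i r(D, X_i, T_i) with the sparse codes X held fixed."""
    D = d_flat.reshape(n, 3 * k)
    subs = [D[:, :k], D[:, k:2 * k]]   # identity sub-dictionaries D_1, D_2
    Dc = D[:, 2 * k:]                  # common sub-dictionary D_c
    J = 0.0
    for i in range(m):
        parts = [X[i][:k], X[i][k:2 * k], X[i][2 * k:]]
        recon = subs[i] @ parts[i] + Dc @ parts[2]
        J += np.linalg.norm(T[i] - recon) ** 2                         # fidelity
        J += alpha * np.linalg.norm(subs[1 - i] @ parts[1 - i]) ** 2   # cross term
    return J

D0 = rng.standard_normal(n * 3 * k)
res = minimize(objective, D0, method="L-BFGS-B", options={"maxiter": 50})
```

Because the update minimizes over the whole flattened D, the identity and common sub-dictionaries move simultaneously, matching the text above.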
S2, speech signal separation: solve the sparse projection coefficients of the mixed signal under the joint dictionary with a sparse coding algorithm, and recover each source signal from the sub-dictionaries and the sparse vectors.
Solving the SCBSS problem can be translated into solving the equation D E = s, where D is the joint dictionary and s is the mixed signal. When a joint dictionary containing a common sub-dictionary is used, the equation becomes D′E′ = s, where s is the mixture of the two source speech signals, E′ = [E_1, E_2, E_c], and E_1, E_2 and E_c are the sparse projection coefficients of s on D′_1, D′_2 and D_c respectively. E′ is obtained by solving

    min ||s - D′E′||_2²  s.t. ||E′||_0 ≤ K   (5)

where K is the sparsity, i.e., the number of non-zero elements of the matrix E′. The BP algorithm is used here to solve this optimization problem.
After the mixed signal is sparsely represented on the trained joint dictionary, each source signal can be recovered from the responses on the sub-dictionaries D_1 and D_2 plus a certain proportion of the response on the common sub-dictionary. When E′ is obtained, the estimated source speech signals are computed by equation (6):

    ŝ_1 = D_1 E_1 + α D_c E_c,  ŝ_2 = D_2 E_2 + (1 - α) D_c E_c   (6)

where α is a weight vector that strongly influences the reconstructed signals. To obtain the best separation effect, the influence of the weight vector α on the performance of the speech separation algorithm was compared experimentally; the experimental data are recorded in fig. 3 and fig. 4. Fig. 3 shows the male-SNR versus weight curve: the separation performance is best when α is set to 0.1. Fig. 4 shows the female-SNR versus weight curve: the separation performance is best when 1 - α is set to 0.9.
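The reconstruction of equation (6) is a few lines of linear algebra; a sketch with hypothetical array shapes:

```python
import numpy as np

def reconstruct_sources(D1, D2, Dc, E1, E2, Ec, alpha=0.1):
    """Equation (6): each source gets its own identity-dictionary response,
    and the common response D_c E_c is split alpha / (1 - alpha) between
    the two estimates (alpha = 0.1 per the experiments of fig. 3 and 4)."""
    s1_hat = D1 @ E1 + alpha * (Dc @ Ec)
    s2_hat = D2 @ E2 + (1.0 - alpha) * (Dc @ Ec)
    return s1_hat, s2_hat
```

Since α + (1 - α) = 1, the two estimates always sum back to the full joint-dictionary reconstruction of the mixture.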
Evaluation of experiments
Experiments were performed with the traditional K-SVD dictionary training algorithm, the DDL method, the CJD method, and the proposed joint dictionary learning method with the new optimization function; the speech separation effect was evaluated and the separation quality analyzed. Experiments were also performed on the dictionary size, which influences the separation effect, and a suitable dictionary size was selected from the SNR analysis of the separated source signals.
To compare the performance of the speech separation algorithms and demonstrate the effectiveness of the proposed algorithm, the speech separation performance of the different algorithms must be measured. The signal-to-noise ratio (SNR) is used as the measure: a higher SNR indicates less distortion of the separated source signal and better separation performance. Since the dictionary size has a great influence on the proposed algorithm, the numbers of atoms of the sub-dictionaries are set to 384, 512, 640, 768 and 896 respectively, with the frame length set to 256, and the male-SNR, female-SNR and average SNR at the different dictionary sizes are obtained and their separation effects compared. For convenience of description, the proposed single-channel blind source separation algorithm that learns the joint dictionary with the new optimization function is referred to as NJDL (new optimization method for joint dictionary separation).
1. Comparison of the separation performance of different algorithms
The algorithm proposed in the embodiment was compared with SCBSS based on K-SVD, the DDL dictionary learning method and the CJD common dictionary construction method. Objectively, the signal-to-noise ratio is computed by equation (10) to measure the reconstruction of the source signals. The separation effect of the proposed SCBSS algorithm is shown in fig. 5; it can be seen that the invention separates the source signals well from the mixed signal, with waveforms very close to those of the source speech signals.

    SNR = 10 log10( ||s_i||² / ||s_i - ŝ_i||² )   (10)

where s_i is the source audio signal and ŝ_i is its estimate. The experimental data are recorded in Table 1.
TABLE 1 Signal-to-noise ratio (dB) after separation using different dictionary learning algorithms

    Algorithm   Male     Female   Average
    K-SVD       5.9214   0.2508   3.0861
    DDL         8.0623   2.8803   5.4713
    CJD         7.03     1.7436   4.3868
    NJDL        8.4676   3.3059   5.8868
From the experimental results in Table 1, the proposed SCBSS method is generally superior to the SCBSS methods based on K-SVD, DDL and CJD, whether in male-SNR, female-SNR or average SNR. Compared with the other algorithms, the proposed method separates the individual and common components of the signals and effectively reduces source confusion. Meanwhile, the higher SNR verifies the superiority of the learned joint dictionary.
Specifically, the SNR of the example method is improved by about 2.5dB for male voice signals and about 3.0dB for female voice signals compared to the K-SVD method. Compared to the method in DDL, the example method improves by about 0.4dB for male voice signals and 0.4dB for female voice signals. Compared to the method in CJD, the example method improves the SNR of male voice signals by about 1.4dB and female voice signals by 1.6 dB. Compared with K-SVD, the average improvement is 2.8dB, compared with DDL, the improvement is 0.4dB, and compared with CJD, the improvement is 1.5 dB. Therefore, compared with other algorithms, the single-channel blind source separation algorithm based on the optimization function learning joint dictionary provided by the embodiment can obviously improve the separation performance.
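The SNR measure used throughout these comparisons, equation (10), translates directly into code; a minimal sketch:

```python
import numpy as np

def snr_db(s, s_hat):
    """Equation (10): SNR = 10 log10(||s_i||^2 / ||s_i - s_hat_i||^2), in dB."""
    return 10.0 * np.log10(np.sum(s ** 2) / np.sum((s - s_hat) ** 2))

s = np.ones(256)       # hypothetical source frame
est = 0.9 * s          # estimate with a 10% amplitude error
print(snr_db(s, est))  # 20 dB: the error energy is 1% of the signal energy
```

A higher value means the separated signal deviates less from the true source, matching the interpretation above.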
2. Influence of dictionary size on separation performance
In the following experiments, the separation effect of the example method at different dictionary sizes was examined. The size of the sub-dictionary is set to l × c, and the size of the joint dictionary is l × 3 c. The atomic numbers are set to 384, 512, 640, 768, and 896, respectively, and the frame length sizes are all set to 256. The sizes of the experimental joint dictionary D are 256 × 1152, 256 × 1536, 256 × 1920, 256 × 2304, and 256 × 2688, respectively.
Fig. 6 shows the separation effect for the male and female signals at different dictionary sizes. As can be seen in fig. 6, the male SNR increases slightly as the number of sub-dictionary atoms grows from 384 to 768. This is mainly because the number of dictionary atoms is closely related to the separation performance: more dictionary atoms means more atoms can effectively represent the signal. That is, as the dictionary size increases, the dictionary captures the unique and similar components of the two source signals more and more completely. The SNR of the separated signals is highest when the number of atoms per sub-dictionary is 768, and the curves of the female SNR and the average SNR are consistent with that of the male SNR. Compared with the 512-atom sub-dictionary, the 768-atom sub-dictionary improves the male SNR by about 0.4 dB and the female SNR by about 0.5 dB. The results show that separation performs best when the sub-dictionary size is 256 × 768, which means the separation performance can be improved by selecting an appropriate dictionary size. It can be concluded that choosing a suitable dictionary size is important for optimal separation performance, while it should not be neglected that increasing the number of dictionary atoms also increases the time cost.
The above experimental results show that: compared with other algorithms, the single-channel blind source separation algorithm of the learning joint dictionary based on the new optimization function can effectively reduce the problem of cross projection and improve the voice separation effect. Moreover, the reconstruction effect of the source speech is improved by selecting a proper dictionary size.

Claims (4)

1. A single-channel blind source separation method, characterized by comprising the following steps:
S1, training stage: given training samples T_i of different source audio signals, obtain the corresponding initial identity sub-dictionaries D_i and an initial common sub-dictionary D_c, thereby obtaining an initial joint dictionary comprising the initial identity sub-dictionaries D_i and the initial common sub-dictionary D_c; the joint dictionary is updated iteratively using an optimization function;
S11, train an initial joint dictionary from the different source training samples; in step S11, the initial joint dictionary containing the initial identity sub-dictionaries D_i and the initial common sub-dictionary D_c is obtained as follows:
each identity sub-dictionary D_i is trained by the K-SVD method, the dictionary training being of the form

$$\min_{D_i,\alpha}\ \|T_i - D_i\alpha\|_F^2 \quad \text{s.t.}\quad \forall k,\ \|\alpha_k\|_0 \le K \qquad (1)$$

where D_i is the trained dictionary with normalized atoms and α is the matrix of projection coefficients of T_i on the dictionary D_i;
the two training samples are spliced as T = [T_1, T_2], the DCT dictionary is taken as the initial dictionary, and the common sub-dictionary D_c is trained by the K-SVD method; splicing then gives the initial joint dictionary D = [D_1, D_2, D_c];
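As an illustrative sketch only (not the patent's implementation), step S11 can be prototyped in Python, substituting scikit-learn's `DictionaryLearning` for K-SVD (both alternate sparse coding with a dictionary update and return unit-norm atoms) and using an overcomplete DCT matrix as the common sub-dictionary; all sizes and variable names here are hypothetical placeholders:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

def train_identity_dict(T_i, n_atoms, sparsity=8, seed=0):
    # Stand-in for K-SVD: DictionaryLearning also alternates sparse coding
    # with a dictionary update, and returns unit-norm atoms.
    dl = DictionaryLearning(n_components=n_atoms,
                            transform_algorithm="omp",
                            transform_n_nonzero_coefs=sparsity,
                            max_iter=10, random_state=seed)
    dl.fit(T_i.T)                      # sklearn expects samples in rows
    return dl.components_.T            # columns are dictionary atoms

def dct_dictionary(n_rows, n_atoms):
    # Overcomplete DCT matrix, used here in place of the trained D_c.
    k = np.arange(n_rows)[:, None] * np.arange(n_atoms)[None, :]
    D = np.cos(np.pi * k / n_atoms)
    return D / np.linalg.norm(D, axis=0)

# Random stand-ins for the frame matrices T_1, T_2 of the two sources.
rng = np.random.default_rng(0)
T1 = rng.standard_normal((64, 200))
T2 = rng.standard_normal((64, 200))
D1 = train_identity_dict(T1, n_atoms=96)
D2 = train_identity_dict(T2, n_atoms=96, seed=1)
Dc = dct_dictionary(64, 96)            # the patent trains D_c on [T1, T2] by K-SVD
D0 = np.hstack([D1, D2, Dc])           # initial joint dictionary D = [D1, D2, Dc]
```

In practice the DCT matrix is only the initialization of D_c; the patent refines it with K-SVD on the spliced samples before splicing the joint dictionary.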
S12, fix the initial joint dictionary D and obtain the sparse vectors of the training samples on the initial joint dictionary;
in step S12, with the initial joint dictionary D fixed, the sparse coding coefficients are updated by the basis pursuit (BP) algorithm, using the following optimization function:

$$\min\ \|X_i\|_1 \quad \text{subject to}\quad T_i = D X_i \qquad (2)$$

where $\|\cdot\|_1$ denotes the 1-norm, i.e. the sum of the absolute values of the non-zero elements in each column of the sparse matrix X_i;
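The BP problem (2) is a linear program. A minimal sketch, assuming a single mixture vector and a random placeholder dictionary (not the patent's data), uses the standard split x = u − v with u, v ≥ 0 and SciPy's `linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, t):
    # min ||x||_1  s.t.  D x = t, as the LP: split x = u - v, u, v >= 0,
    # and minimize sum(u) + sum(v) subject to [D, -D][u; v] = t.
    n = D.shape[1]
    res = linprog(c=np.ones(2 * n),
                  A_eq=np.hstack([D, -D]), b_eq=t,
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.x[:n] - res.x[n:]

# Hypothetical sizes: a 20 x 50 random dictionary and a 3-sparse signal.
rng = np.random.default_rng(1)
D = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.5, -2.0, 0.7]
t = D @ x_true
x_hat = basis_pursuit(D, t)
```

Since x_true is feasible for the LP, the solution's 1-norm can never exceed that of x_true, which is a quick sanity check on the solver.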
S13, fix the current sparse vectors and update the joint dictionary through the optimization function;
in step S13, the optimization function is:

$$J = \min_{D,\,X_i}\ \sum_{i=1}^{m}\Big(\|T_i - D X_i\|_F^2 + r(D, X_i, T_i)\Big) \qquad (3)$$

where

$$r(D, X_i, T_i) = \big\|T_i - D_i X_i^{D_i} - D_c X_i^{D_c}\big\|_F^2 + \sum_{j=1,\,j\neq i}^{m}\big\|\alpha_i D_j X_i^{D_j}\big\|_F^2 \qquad (4)$$

Here the reconstruction error is measured by the F-norm, J is the proposed objective function, and r(D, X_i, T_i) is the cross-projection penalty term of the i-th speaker. In equation (4), D = [D_1, D_2, ..., D_m, D_c] denotes the trained joint dictionary, D_m is the identity sub-dictionary of the m-th speaker, and D_c is the common sub-dictionary; T_i denotes a training sample of a clean sound source, with i = 1, 2, ..., m; X_i denotes the sparse vector matrix of T_i on D; X_i^{D_i}, X_i^{D_j} and X_i^{D_c} denote the parts of the coefficient X_i corresponding to the sub-dictionaries D_i, D_j and D_c, respectively; α_i is a weight vector;
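The objective in step S13 can be sketched numerically. The code below is a plausible reading of equations (3)–(4), not the patent's exact formulation: it sums the data-fit term on the full joint dictionary with a cross-projection penalty (residual using only D_i and D_c, plus the energy projected onto the other speakers' sub-dictionaries); the weight vector α_i is omitted because its exact role is not recoverable from the text:

```python
import numpy as np

def joint_objective(D_subs, Dc, T, X, Xc):
    # J: data fit ||T_i - D X_i||_F^2 plus the cross-projection penalty r
    # for each speaker i present in the training set.
    D = np.hstack(D_subs + [Dc])
    J = 0.0
    for i in range(len(T)):
        Xi_full = np.vstack(X[i] + [Xc[i]])           # stacked coefficient blocks
        J += np.linalg.norm(T[i] - D @ Xi_full) ** 2  # ||T_i - D X_i||_F^2
        # residual when T_i is explained only by its own D_i and the common D_c ...
        r = np.linalg.norm(T[i] - D_subs[i] @ X[i][i] - Dc @ Xc[i]) ** 2
        # ... plus the energy projected onto the other speakers' dictionaries
        r += sum(np.linalg.norm(D_subs[j] @ X[i][j]) ** 2
                 for j in range(len(D_subs)) if j != i)
        J += r
    return J

# Toy check: a signal exactly representable by D_1 and D_c with zero
# cross-projection should give an objective value of (numerically) zero.
rng = np.random.default_rng(0)
D1, D2, Dc = (rng.standard_normal((16, 24)) for _ in range(3))
X11 = rng.standard_normal((24, 10))
Xc1 = rng.standard_normal((24, 10))
T1 = D1 @ X11 + Dc @ Xc1              # exactly representable, no cross term
X = [[X11, np.zeros((24, 10))]]       # zero coefficients on speaker 2's dictionary
J0 = joint_objective([D1, D2], Dc, [T1], X, [Xc1])
```

The penalty drives the coefficient blocks on the other speakers' sub-dictionaries toward zero, which is exactly the cross-projection reduction the patent claims.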
S2, separation stage: solve the sparse projection coefficients of the mixed signal on the joint dictionary with a sparse coding algorithm, and recover each source signal from the sub-dictionaries and the sparse vectors.
2. The single-channel blind source separation method of claim 1, characterized by: in step S2, the sparse coding algorithm solves the sparse projection coefficients of the mixed signal on the joint dictionary as follows:
when the joint dictionary containing the common sub-dictionary is used, the model is s = D'E', where s is the mixture of the two source speech signals, E' = [E_1; E_2; E_c], and E_1, E_2 and E_c are the sparse projection coefficients of s on D_1', D_2' and D_c, respectively; the sparse projection coefficient E' is obtained by solving:

$$\min_{E'}\ \|s - D'E'\|_2^2 \quad \text{s.t.}\quad \|E'\|_0 \le K \qquad (5)$$

where K is the sparsity, i.e. the number of non-zero elements of the matrix E'.
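Problem (5) is an L0-constrained least-squares fit, which greedy solvers such as orthogonal matching pursuit handle directly. A minimal sketch with random placeholder data (OMP here is a stand-in; the patent does not name the solver for this stage):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

# Project one frame of the mixture s onto the joint dictionary D' under
# the hard sparsity limit ||E'||_0 <= K, as in equation (5).
rng = np.random.default_rng(2)
Dp = rng.standard_normal((64, 288))    # trained joint dictionary (placeholder)
Dp /= np.linalg.norm(Dp, axis=0)       # OMP expects unit-norm atoms
s = rng.standard_normal(64)            # one frame of the mixed signal (placeholder)
K = 10                                 # sparsity: at most K non-zero coefficients
E = orthogonal_mp(Dp, s, n_nonzero_coefs=K)
```

For a whole utterance, the same call is applied column by column to the frame matrix of the mixture.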
3. The single-channel blind source separation method of claim 1, characterized by: in step S2, each source signal is recovered from the sub-dictionaries and the sparse vectors as follows:
after the mixed signal has been sparsely represented on the trained joint dictionary, each source signal is restored from its response on the sub-dictionary D_1 or D_2 plus a fixed proportion of the response on the common sub-dictionary D_c; once the sparse coding matrix E' is obtained from equation (5), the estimated source audio signals are computed as

$$\hat{s}_1 = D_1' E_1 + \alpha\, D_c E_c, \qquad \hat{s}_2 = D_2' E_2 + (1-\alpha)\, D_c E_c \qquad (6)$$

where α is a weight vector.
4. The single channel blind source separation method of claim 3, characterized by: α is set to 0.1 and 1- α is set to 0.9.
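The recovery in claims 3–4 can be sketched as follows, assuming the coefficient vector is stacked as E' = [E1; E2; Ec] and using the claimed weights α = 0.1 and 1 − α = 0.9; the dictionaries and coefficients below are random placeholders, not trained values:

```python
import numpy as np

def reconstruct(D1, D2, Dc, E, alpha=0.1):
    # Split E' into the blocks belonging to D1, D2 and Dc, then share the
    # common-dictionary response between the two estimated sources.
    n1, n2 = D1.shape[1], D2.shape[1]
    E1, E2, Ec = E[:n1], E[n1:n1 + n2], E[n1 + n2:]
    s1_hat = D1 @ E1 + alpha * (Dc @ Ec)        # speaker 1: own atoms + 0.1 of D_c
    s2_hat = D2 @ E2 + (1 - alpha) * (Dc @ Ec)  # speaker 2: own atoms + 0.9 of D_c
    return s1_hat, s2_hat

rng = np.random.default_rng(3)
D1, D2, Dc = (rng.standard_normal((64, 96)) for _ in range(3))
E = rng.standard_normal(288)
s1_hat, s2_hat = reconstruct(D1, D2, Dc, E)
```

Because the two weights sum to one, the two estimates add back up to the full joint-dictionary reconstruction of the mixture, so no signal energy is lost in the split.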
CN201810599522.6A 2018-06-11 2018-06-11 Single-channel blind source separation method Active CN108875824B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810599522.6A CN108875824B (en) 2018-06-11 2018-06-11 Single-channel blind source separation method


Publications (2)

Publication Number Publication Date
CN108875824A CN108875824A (en) 2018-11-23
CN108875824B true CN108875824B (en) 2022-09-27

Family

ID=64337993


Country Status (1)

Country Link
CN (1) CN108875824B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110544482B (en) * 2019-09-09 2021-11-12 北京中科智极科技有限公司 Single-channel voice separation system
CN112329855B (en) * 2020-11-05 2023-06-02 华侨大学 Underdetermined working mode parameter identification method and detection method based on self-adaptive dictionary

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102288285B (en) * 2011-05-24 2012-11-28 南京航空航天大学 Blind source separation method for single-channel vibration signals
CN104378320A (en) * 2014-11-13 2015-02-25 中国人民解放军总参谋部第六十三研究所 Anti-interference communication method and receiving device based on single-channel blind source separation
CN107024352A (en) * 2017-05-03 2017-08-08 哈尔滨理工大学 A kind of Rolling Bearing Fault Character extracting method based on slip entropy ICA algorithm



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant