CN108875824B - Single-channel blind source separation method - Google Patents


Info

Publication number
CN108875824B
Authority
CN
China
Prior art keywords
dictionary, sub, sparse, initial, joint
Prior art date
Legal status
Active (assumed; not a legal conclusion)
Application number
CN201810599522.6A
Other languages
Chinese (zh)
Other versions
CN108875824A (en)
Inventor
孙林慧 (Sun Linhui)
谢可丽 (Xie Keli)
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201810599522.6A
Publication of CN108875824A
Application granted
Publication of CN108875824B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/28 Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries

Abstract

The invention provides a single-channel blind source separation method. In the training stage, given training samples T_i of different source audio signals, the corresponding initial identity sub-dictionaries D_i and an initial common sub-dictionary D_c are obtained, yielding an initial joint dictionary comprising the initial identity sub-dictionaries D_i and the initial common sub-dictionary D_c; the joint dictionary is then updated iteratively with an optimization function. A sparse coding algorithm solves the sparse projection coefficients of the mixed signal under the joint dictionary, and each source signal is recovered from the sub-dictionaries and the sparse vectors. Against the "cross projection" problem caused by a joint dictionary with weak discriminative power, the invention adopts a highly discriminative joint dictionary; compared with other sparse-representation-based single-channel blind source separation algorithms, it reduces source interference and markedly improves separation quality.

Description

Single-channel blind source separation method
Technical Field
The invention relates to a single-channel blind source separation method.
Background
Single-channel blind source separation (SCBSS) refers to the process of recovering a multi-dimensional source signal from a one-dimensional mixed signal. Compared with ordinary blind source separation, single-channel blind source separation has only one observed signal, which makes it an ill-posed and extremely difficult problem. On the other hand, single-channel blind source separation needs only one sensor to receive signals, so the system is relatively simple and inexpensive. Compared with traditional blind source separation algorithms, the single-channel method is more widely applicable in practice and has broad application prospects and practical significance in biomedical signal processing, array signal processing, speech recognition, image processing, communications, and other fields.
The process of acquiring a sparse representation or sparse approximation of a signal under an over-complete dictionary is called sparse decomposition, through which a concise representation of the signal can be obtained. Sparse representations have been applied to many aspects of signal processing, especially compressed sensing of speech signals and underdetermined blind source separation. Sparse representation theory is currently one of the most popular SCBSS techniques and mainly comprises two parts: dictionary training and separation. The construction of the sparse joint dictionary is the most important link of sparse-decomposition-based single-channel blind source separation, and the focus of this invention, because it directly determines the quality of the separated signals. The large space in which the mixed signal is expressed is composed of several subspaces, each of which best expresses the signal of one source; the signal of a single source can then be reconstructed from the sparse coefficients and the basis of the corresponding subspace, realizing separation. Typical dictionary training methods include Non-negative Matrix Factorization (NMF) and K-SVD. Dictionary learning methods learn each sub-dictionary from training samples of a specific source signal, i.e., the dictionary of each source is trained independently, so the dictionaries have a certain distinctiveness. The separation quality of sparse-representation-based single-channel blind source separation depends on the discriminative power of the joint dictionary: if the joint dictionary is not highly discriminative, the separation effect is poor. This means that, in addition to expecting each sub-dictionary to represent its corresponding source well, we also expect each sub-dictionary to be distinguishable from the others.
Discriminative dictionary learning theory is widely applied in pattern classification (PC) and face recognition (FR) and has achieved great success. Bao et al. proposed a Distinguishable Dictionary Learning (DDL) algorithm, which reduces the cross-coherence among dictionaries through an optimization function so that the joint dictionary has discriminative power. Pearlmutter et al. adopted an l1-norm optimization algorithm: a sparse dictionary is trained for each speaker, the trained dictionaries are combined into a mixed dictionary, the sparse projection vector of each source speech signal is then solved by l1-norm optimization, and the separated speech signals are reconstructed. In a recently proposed joint dictionary (CJD) for SCBSS, an identity sub-dictionary is first learned from the source speech signals of each speaker; similar atoms between two identity sub-dictionaries are then discarded and used to construct a common sub-dictionary.
The above methods all achieve a certain separation effect, but they do not exploit the relationship between different source signals to suppress the similarity between atoms of different sub-dictionaries. In practice, there are always some similar components between different source signals, which reduce the discriminative power of the identity sub-dictionaries and create the "cross projection" problem caused by mutual interference between dictionaries. That is, when the mixed speech signal is represented under the joint dictionary, the signal of one source also produces a response on the sub-dictionaries corresponding to other sources, resulting in poor separation.
Disclosure of Invention
The invention aims to provide a single-channel blind source separation method to solve the "cross projection" problem caused by the weak discriminative power of sparse joint dictionaries in the prior art.
The technical solution of the invention is as follows:
a single-channel blind source separation method comprises the following steps,
s1, training stage, giving training samples T of different source audio signals i Obtaining the corresponding initial identity sub-dictionary D i And an initial common sub-dictionary D c Thereby obtaining a sub-dictionary D including the initial identity i And an initial common sub-dictionary D c The initial joint dictionary adopts an optimization function to update the joint dictionary in an iterative manner;
s2, separating the language signals, and solving sparse projection coefficients of the mixed signals under the joint dictionary by adopting a sparse coding algorithm; and recovering each source signal according to the sub-dictionary and the sparse vector.
Further, step S1 specifically comprises:
S11, training an initial joint dictionary from the different source training samples;
S12, fixing the initial joint dictionary D and obtaining the sparse vectors of the training samples on it;
S13, fixing the current sparse vectors and updating the joint dictionary through the optimization function.
Further, in step S11, obtaining the initial joint dictionary containing the initial identity sub-dictionaries D_i and the initial common sub-dictionary D_c is specifically:
training the identity sub-dictionary D_i by the K-SVD method, the dictionary training being of the form

    min_{D_i, α} ||T_i - D_i α||_F²  s.t. ||α_k||_0 ≤ K for every column α_k   (1)

where D_i is the trained dictionary with normalized atoms and α is the projection coefficient matrix of T_i on dictionary D_i;
splicing the two training samples T = [T_1, T_2], taking a DCT dictionary as the initial dictionary and training the common sub-dictionary D_c by the K-SVD method, then splicing to obtain the initial joint dictionary D = [D_1, D_2, D_c].
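As a concrete illustration of this initialization step, the following sketch builds an overcomplete DCT dictionary (a common K-SVD initializer) and splices three sub-dictionaries into a joint dictionary. The sizes (256-sample frames, 512-atom sub-dictionaries) are assumptions, and the raw DCT atoms stand in for the K-SVD-trained D_1, D_2 and D_c, which the patent obtains by training.

```python
import numpy as np

def dct_dictionary(n, K):
    """Overcomplete DCT dictionary with n-dimensional atoms and K columns,
    a common initial dictionary for K-SVD (cf. the DCT dictionary in S11)."""
    k = np.arange(K)
    t = np.arange(n)[:, None] + 0.5
    D = np.cos(np.pi * t * k / K)                   # DCT-II style atoms
    D -= D.mean(axis=0, keepdims=True) * (k > 0)    # zero-mean except the DC atom
    D /= np.linalg.norm(D, axis=0)                  # unit-norm columns
    return D

# Hypothetical sizes: 256-sample frames, 512 atoms per sub-dictionary.
n, K = 256, 512
D1 = dct_dictionary(n, K)    # placeholder for the K-SVD-trained identity sub-dictionary D_1
D2 = dct_dictionary(n, K)    # placeholder for D_2
Dc = dct_dictionary(n, K)    # placeholder for the common sub-dictionary D_c
D = np.hstack([D1, D2, Dc])  # initial joint dictionary D = [D_1, D_2, D_c]
```

In a real run, D1, D2 and Dc would each be refined by K-SVD on the corresponding training samples before splicing.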
Further, in step S12, when the initial joint dictionary D is fixed, the sparse coding BP algorithm is selected to update the sparse coding coefficients, using the following optimization function:

    min ||X_i||_1  subject to  T_i = D X_i   (2)

where || · ||_1 denotes the 1-norm, i.e., the sum of the absolute values of the elements in each column of the sparse matrix X_i.
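The constraint min ||X_i||_1 s.t. T_i = D X_i is a basis-pursuit problem, solvable column by column as a linear program. A minimal sketch follows; the patent does not specify its BP solver, so scipy's LP interface is used here as an illustrative stand-in.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, t):
    """Solve min ||x||_1 s.t. D x = t by the standard LP split x = u - v, u, v >= 0."""
    n, K = D.shape
    c = np.ones(2 * K)            # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([D, -D])     # equality constraint D(u - v) = t
    res = linprog(c, A_eq=A_eq, b_eq=t, bounds=(0, None), method="highs")
    u, v = res.x[:K], res.x[K:]
    return u - v

rng = np.random.default_rng(0)
D = rng.standard_normal((10, 30))               # toy over-complete dictionary
x0 = np.zeros(30); x0[[3, 17]] = [1.5, -2.0]    # 2-sparse ground-truth coefficients
t = D @ x0                                      # one "training frame"
x = basis_pursuit(D, t)                         # sparse projection coefficients
```

Each column of X_i would be solved this way against the joint dictionary D.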
Further, in step S13, the optimization function is:

    min_D J = Σ_{i=1}^{m} r(D, X_i, T_i)   (3)

wherein

    r(D, X_i, T_i) = ||T_i - D_i X_i^i - D_c X_i^c||_F² + α_i Σ_{j=1, j≠i}^{m} ||D_j X_i^j||_F²   (4)

Here the reconstruction error is measured by the F-norm; J is the proposed objective function and r(D, X_i, T_i) is the cross-projection penalty term of the ith speaker. In equation (4), D = [D_1, D_2, …, D_m, D_c] denotes the trained joint dictionary, D_m is the identity sub-dictionary of the mth speaker, and D_c is the common sub-dictionary; T_i (i = 1, 2, …, m) denotes the training samples of the clean sound sources; X_i denotes the sparse vector matrix of T_i on D; X_i^i denotes the rows of the coefficient X_i corresponding to sub-dictionary D_i; X_i^j denotes the rows of X_i corresponding to sub-dictionary D_j; X_i^c denotes the rows of X_i corresponding to sub-dictionary D_c; and α_i is a weight vector.
Further, in step S2, solving the sparse projection coefficients of the mixed signal under the joint dictionary with the sparse coding algorithm is specifically:
when a joint dictionary containing a common sub-dictionary is used, the equation is D′E′ = s, where s is the mixture of the two source speech signals, E′ = [E_1, E_2, E_c], and E_1, E_2 and E_c are the sparse projection coefficients of s on D′_1, D′_2 and D_c respectively; the sparse projection coefficients E′ are obtained by solving

    min ||s - D′E′||_2²  s.t. ||E′||_0 ≤ K   (5)

where K is the sparsity, i.e., the number of non-zero elements of the matrix E′.
Further, in step S2, recovering each source signal from the sub-dictionaries and the sparse vectors is specifically:
after the mixed signal is sparsely represented on the trained joint dictionary, each source signal is recovered from the responses on the sub-dictionaries D_1 and D_2 plus a certain proportion of the response on the common sub-dictionary D_c; when the sparse coding matrix E′ is obtained, the estimated source audio signals are computed by

    ŝ_1 = D_1 E_1 + α D_c E_c,  ŝ_2 = D_2 E_2 + (1 - α) D_c E_c   (6)

where α is a weight vector.
Further, α is set to 0.1, and 1- α is set to 0.9.
The invention has the beneficial effects that: compared with traditional dictionary learning methods, this single-channel blind source separation method makes full use of the characteristics of speech signals and, starting from the commonality and differences of different source signals, constructs an optimization function to suppress the non-corresponding parts of the sparse representation coefficients. The distinct components of each sound source are projected onto the corresponding identity sub-dictionary as much as possible, while similar components are projected onto the common sub-dictionary, weakening the "cross projection" phenomenon so that the signals can be separated better. Against the "cross projection" problem caused by a joint dictionary with weak discriminative power, the invention adopts a highly discriminative joint dictionary; compared with other sparse-representation-based single-channel blind source separation algorithms, it reduces source interference and markedly improves separation quality.
Drawings
Fig. 1 is an explanatory block diagram of the single-channel blind source separation method of the present invention.
Fig. 2 is an explanatory diagram of source signal separation in the embodiment.
Fig. 3 is a schematic diagram of the variation of the separation effect with the weight vector α in the embodiment.
Fig. 4 is a schematic diagram of the variation of the separation effect with the weight vector 1 - α in the embodiment.
FIG. 5 is a diagram of separated speech in an embodiment.
FIG. 6 is a diagram showing the variation of the separation effect with the number of sub-dictionary atoms in the example.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Examples
In the single-channel blind source separation method, a common sub-dictionary containing similar information is introduced, and the relationship between the sub-dictionaries is constrained by constructing a new optimization function, further increasing the difference between the identity sub-dictionaries. The unique components of each speaker sample are sparsely represented as much as possible by the identity sub-dictionary corresponding to that speaker, while the highly correlated components are sparsely represented by the common sub-dictionary; when a source signal is represented by the joint dictionary, the cross-projection problem can thus be effectively avoided and the quality of speech separation improved.
A single-channel blind source separation method, as shown in fig. 1, comprising the following steps:
s1, training phase. Given different source speech signal training samples T i Obtaining the corresponding initial identity sub-dictionary D i And an initial common sub-dictionary D c Thereby obtaining a sub-dictionary D including the initial identity i And an initial common sub-dictionary D c The joint dictionary adopts an optimization function to update the joint dictionary in an iterative manner.
This embodiment proposes a new optimization function. Compared with existing single-channel blind source separation algorithms based on dictionary learning, the embodiment divides the characteristics of each source signal into two parts: the features unique to each speaker's samples, and the features common to different speakers' samples. The unique features of the speaker samples are independent of each other. Accordingly, for different speakers, the joint dictionary includes three parts: a sub-dictionary for each speaker and a common sub-dictionary used to represent the similarity of different speakers.
To construct a joint dictionary with strong distinctiveness, the difference between the identity sub-dictionaries is further increased. During training, the embodiment constrains the relation between the sub-dictionaries by constructing a suitable objective function, so that the unique components of each speaker's samples are sparsely represented as much as possible by the corresponding identity sub-dictionary, while highly correlated components are sparsely represented as much as possible by the common sub-dictionary.
Without loss of generality, the discussion is based on the case of m speakers. Suppose T_i is the training sample of the ith source signal, corresponding to sub-dictionary D_i, and the corresponding sparse representation coefficients are X = [X_1, X_2, …, X_m, X_c]^T. To improve the separation performance of the algorithm, the cross-projections between the source signals must be suppressed. To this end, a new objective function is proposed, as shown in equation (1):

    min_D J = Σ_{i=1}^{m} r(D, X_i, T_i)   (1)

wherein

    r(D, X_i, T_i) = ||T_i - D_i X_i^i - D_c X_i^c||_F² + α_i Σ_{j=1, j≠i}^{m} ||D_j X_i^j||_F²   (2)

Here the reconstruction error is measured by the F-norm; J is the proposed objective function and r(D, X_i, T_i) is the cross-projection penalty term of the ith speaker, where D = [D_1, D_2, …, D_m, D_c] denotes the trained joint dictionary, D_m is the identity sub-dictionary of the mth speaker, and D_c is the common sub-dictionary; T_i (i = 1, 2, …, m) denotes the training samples of the clean sound sources; X_i denotes the sparse vector matrix of T_i on D; X_i^i denotes the rows of the coefficient X_i corresponding to sub-dictionary D_i; X_i^j denotes the rows of X_i corresponding to sub-dictionary D_j; X_i^c denotes the rows of X_i corresponding to sub-dictionary D_c; and α_i is a weight vector.
It is noted that in the first term of the penalty r(D, X_i, T_i), the different source training signals should be sparsely represented by the corresponding identity sub-dictionary and the common sub-dictionary. This term mainly accounts for identity sub-dictionaries with a certain discriminative power and a common sub-dictionary containing the shared information, ensuring the comprehensiveness of the joint dictionary. Considering the "cross projection" problem, the second term constrains the cross-projection of one training signal on the other identity sub-dictionaries: when one source signal is sparsely represented on the joint dictionary, its representation on the other identity sub-dictionaries should be as small as possible, to reduce the influence of cross-projection. Taking the mixture of two source signals as an example, define the sparse matrix E = [E_1, E_2, E_c]^T, where E_1, E_2 and E_c are the sparse coefficients of the mixed signal s on the identity sub-dictionaries D_1, D_2 and the common sub-dictionary D_c respectively. After the mixed signal s is sparsely represented on the trained joint dictionary, each source signal can be recovered from the response on its own sub-dictionary plus a certain proportion of the response on the common sub-dictionary. The source separation module is shown in fig. 2.
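A sketch of how the penalty term r(D, X_i, T_i) could be evaluated, assuming it is the fidelity term plus a weighted sum of cross-projection energies as described above (the exact placement of the weight α_i is an assumption of this sketch):

```python
import numpy as np

def cross_projection_penalty(D_subs, Dc, X_parts, Xc, Ti, i, alpha_i):
    """r(D, X_i, T_i): fidelity of T_i to its own identity sub-dictionary plus
    the common sub-dictionary, plus an alpha_i-weighted penalty on the
    responses over the other identity sub-dictionaries."""
    recon = D_subs[i] @ X_parts[i] + Dc @ Xc          # D_i X_i^i + D_c X_i^c
    fidelity = np.linalg.norm(Ti - recon, "fro") ** 2
    cross = sum(np.linalg.norm(D_subs[j] @ X_parts[j], "fro") ** 2
                for j in range(len(D_subs)) if j != i)  # sum over j != i
    return fidelity + alpha_i * cross
```

With a perfect own-dictionary representation and zero response on the other sub-dictionaries, the penalty vanishes; any cross-projection energy raises it in proportion to α_i.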
Step S1 specifically comprises the following steps.
S11, training an initial joint dictionary from the different source training samples. The embodiment selects two speakers of different genders from the Chinese speech corpus of the Institute of Automation, Chinese Academy of Sciences; each speaker has 150 speech samples, the sampling frequency of the signals is 16 kHz, and each frame contains 256 sampling points.
A DCT (discrete cosine transform) dictionary is selected as the initial dictionary, and with the single-source signal samples T_i as training data, the identity sub-dictionary D_i is trained by the K-SVD method; the dictionary training is of the form

    min_{D_i, α} ||T_i - D_i α||_F²  s.t. ||α_k||_0 ≤ K for every column α_k

where D_i is the trained dictionary with normalized atoms and α is the projection coefficient matrix of T_i on dictionary D_i.
The two training samples are spliced as T = [T_1, T_2]; with the DCT dictionary as the initial dictionary, the common sub-dictionary D_c is trained by the K-SVD method, and splicing yields the initial joint dictionary D = [D_1, D_2, D_c].
S12, fixing the initial joint dictionary D and obtaining the sparse vectors of the training samples on it. With the initial D fixed, the sparse coding BP algorithm is selected to update the sparse coding coefficients. The embodiment updates the coding coefficients with the following optimization function:

    min ||X_i||_1  subject to  T_i = D X_i

where || · ||_1 denotes the 1-norm, i.e., the sum of the absolute values of the elements in each column of the sparse matrix X_i.
S13, fixing the current sparse vectors and updating the joint dictionary through the optimization function. With the coding coefficients fixed, the embodiment updates the joint dictionary with the following optimization function:

    min_D J = Σ_{i=1}^{2} r(D, X_i, T_i)

wherein

    r(D, X_i, T_i) = ||T_i - D_i X_i^i - D_c X_i^c||_F² + α_i Σ_{j≠i} ||D_j X_i^j||_F²

To solve this optimization problem, the embodiment introduces selection matrices Q_i, i = 1, 2, 3:

    Q_1 = [I 0 0],  Q_2 = [0 I 0],  Q_3 = [0 0 I]

where 0 denotes an all-zero matrix and I denotes an identity matrix, with block sizes matching the numbers of atoms of D_1, D_2 and D_c, so that X_i^1 = Q_1 X_i, X_i^2 = Q_2 X_i and X_i^c = Q_3 X_i. Thus the objective can be written in terms of the single unknown D as

    min_D Σ_{i=1}^{2} ( ||T_i - D(Q_i^T Q_i + Q_3^T Q_3) X_i||_F² + α_i ||D Q_j^T Q_j X_i||_F² ),  j ≠ i, j ∈ {1, 2}

In this new error function only the joint dictionary is unknown, and when the joint dictionary is updated, the identity sub-dictionaries and the common sub-dictionary are updated simultaneously. This optimization problem can be solved by a quasi-Newton method. Quasi-Newton methods require only the gradient of the objective function and no second-derivative information, so they often perform better than the steepest descent method and Newton's method. The embodiment selects the limited-memory BFGS algorithm (L-BFGS) among the quasi-Newton methods to solve this optimization problem.
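A toy sketch of this dictionary-update step using scipy's L-BFGS-B routine. All sizes and the weight α are hypothetical, and finite-difference gradients stand in for the analytic gradient an efficient implementation would supply.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, k, m = 8, 6, 2                  # frame dim, atoms per sub-dictionary, speakers
T = [rng.standard_normal((n, 4)) for _ in range(m)]            # fixed training samples
X = [0.1 * rng.standard_normal((3 * k, 4)) for _ in range(m)]  # fixed sparse codes
alpha = 0.5                                                    # cross-projection weight

def objective(d_flat):
    """J(D) = sum_i r(D, X_i, T_i) with the sparse codes X held fixed."""
    D = d_flat.reshape(n, 3 * k)
    subs = [D[:, :k], D[:, k:2 * k]]   # identity sub-dictionaries D_1, D_2
    Dc = D[:, 2 * k:]                  # common sub-dictionary D_c
    J = 0.0
    for i in range(m):
        parts = [X[i][:k], X[i][k:2 * k], X[i][2 * k:]]
        recon = subs[i] @ parts[i] + Dc @ parts[2]
        J += np.linalg.norm(T[i] - recon) ** 2                         # fidelity
        J += alpha * np.linalg.norm(subs[1 - i] @ parts[1 - i]) ** 2   # cross term
    return J

D0 = rng.standard_normal(n * 3 * k)
res = minimize(objective, D0, method="L-BFGS-B", options={"maxiter": 50})
```

Because the update minimizes over the whole flattened D, the identity and common sub-dictionaries move simultaneously, matching the text above.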
S2, speech signal separation: solve the sparse projection coefficients of the mixed signal under the joint dictionary with a sparse coding algorithm, and recover each source signal from the sub-dictionaries and the sparse vectors.
Solving the SCBSS problem can be translated into solving the equation D E = s, where D is the joint dictionary and s is the mixed signal. When a joint dictionary containing a common sub-dictionary is used, the equation becomes D′E′ = s, where s is the mixture of the two source speech signals, E′ = [E_1, E_2, E_c], and E_1, E_2 and E_c are the sparse projection coefficients of s on D′_1, D′_2 and D_c respectively. E′ is obtained by solving

    min ||s - D′E′||_2²  s.t. ||E′||_0 ≤ K   (5)

where K is the sparsity, i.e., the number of non-zero elements of the matrix E′. The BP algorithm is used here to solve this optimization problem.
After the mixed signal is sparsely represented on the trained joint dictionary, each source signal can be recovered from the responses on the sub-dictionaries D_1 and D_2 plus a certain proportion of the response on the common sub-dictionary. When E′ is obtained, the estimated source speech signals are computed by equation (6):

    ŝ_1 = D_1 E_1 + α D_c E_c,  ŝ_2 = D_2 E_2 + (1 - α) D_c E_c   (6)

where α is a weight vector that strongly influences the reconstructed signals. To obtain the best separation effect, the influence of the weight vector α on the performance of the speech separation algorithm was compared experimentally; the experimental data are recorded in fig. 3 and fig. 4. Fig. 3 shows the male-SNR versus weight curve: the separation performance is best when α is set to 0.1. Fig. 4 shows the female-SNR versus weight curve: the separation performance is best when 1 - α is set to 0.9.
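The reconstruction of equation (6) is a few lines of linear algebra; a sketch with hypothetical array shapes:

```python
import numpy as np

def reconstruct_sources(D1, D2, Dc, E1, E2, Ec, alpha=0.1):
    """Equation (6): each source gets its own identity-dictionary response,
    and the common response D_c E_c is split alpha / (1 - alpha) between
    the two estimates (alpha = 0.1 per the experiments of fig. 3 and 4)."""
    s1_hat = D1 @ E1 + alpha * (Dc @ Ec)
    s2_hat = D2 @ E2 + (1.0 - alpha) * (Dc @ Ec)
    return s1_hat, s2_hat
```

Since α + (1 - α) = 1, the two estimates always sum back to the full joint-dictionary reconstruction of the mixture.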
Evaluation of experiments
Experiments were performed with the traditional K-SVD dictionary training algorithm, the DDL method, the CJD method, and the proposed joint dictionary learning method with the new optimization function; the speech separation effect was evaluated and the separation quality analyzed. Experiments were also performed on the dictionary size, which influences the separation effect, and a suitable dictionary size was selected from the SNR analysis of the separated source signals.
To compare the performance of the speech separation algorithms and demonstrate the effectiveness of the proposed algorithm, the speech separation performance of the different algorithms must be measured. The signal-to-noise ratio (SNR) is used as the measure: a higher SNR indicates less distortion of the separated source signal and better separation performance. Since the dictionary size has a great influence on the proposed algorithm, the numbers of atoms of the sub-dictionaries are set to 384, 512, 640, 768 and 896 respectively, with the frame length set to 256, and the male-SNR, female-SNR and average SNR at the different dictionary sizes are obtained and their separation effects compared. For convenience of description, the proposed single-channel blind source separation algorithm that learns the joint dictionary with the new optimization function is referred to as NJDL (new optimization method for joint dictionary separation).
1. Comparison of the separation performance of different algorithms
The algorithm proposed in the embodiment was compared with SCBSS based on K-SVD, the DDL dictionary learning method and the CJD common dictionary construction method. Objectively, the signal-to-noise ratio is computed by equation (10) to measure the reconstruction of the source signals. The separation effect of the proposed SCBSS algorithm is shown in fig. 5; it can be seen that the invention separates the source signals well from the mixed signal, with waveforms very close to those of the source speech signals.

    SNR = 10 log10( ||s_i||² / ||s_i - ŝ_i||² )   (10)

where s_i is the source audio signal and ŝ_i is its estimate. The experimental data are recorded in Table 1.
TABLE 1 Signal-to-noise ratio (dB) after separation using different dictionary learning algorithms

    Algorithm   Male     Female   Average
    K-SVD       5.9214   0.2508   3.0861
    DDL         8.0623   2.8803   5.4713
    CJD         7.03     1.7436   4.3868
    NJDL        8.4676   3.3059   5.8868
From the experimental results in Table 1, the proposed SCBSS method is generally superior to the SCBSS methods based on K-SVD, DDL and CJD, whether in male-SNR, female-SNR or average SNR. Compared with the other algorithms, the proposed method separates the individual and common components of the signals and effectively reduces source confusion. Meanwhile, the higher SNR verifies the superiority of the learned joint dictionary.
Specifically, the SNR of the example method is improved by about 2.5dB for male voice signals and about 3.0dB for female voice signals compared to the K-SVD method. Compared to the method in DDL, the example method improves by about 0.4dB for male voice signals and 0.4dB for female voice signals. Compared to the method in CJD, the example method improves the SNR of male voice signals by about 1.4dB and female voice signals by 1.6 dB. Compared with K-SVD, the average improvement is 2.8dB, compared with DDL, the improvement is 0.4dB, and compared with CJD, the improvement is 1.5 dB. Therefore, compared with other algorithms, the single-channel blind source separation algorithm based on the optimization function learning joint dictionary provided by the embodiment can obviously improve the separation performance.
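The SNR measure used throughout these comparisons, equation (10), translates directly into code; a minimal sketch:

```python
import numpy as np

def snr_db(s, s_hat):
    """Equation (10): SNR = 10 log10(||s_i||^2 / ||s_i - s_hat_i||^2), in dB."""
    return 10.0 * np.log10(np.sum(s ** 2) / np.sum((s - s_hat) ** 2))

s = np.ones(256)       # hypothetical source frame
est = 0.9 * s          # estimate with a 10% amplitude error
print(snr_db(s, est))  # 20 dB: the error energy is 1% of the signal energy
```

A higher value means the separated signal deviates less from the true source, matching the interpretation above.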
2. Influence of dictionary size on separation performance
In the following experiments, the separation effect of the example method at different dictionary sizes was examined. The size of the sub-dictionary is set to l × c, and the size of the joint dictionary is l × 3 c. The atomic numbers are set to 384, 512, 640, 768, and 896, respectively, and the frame length sizes are all set to 256. The sizes of the experimental joint dictionary D are 256 × 1152, 256 × 1536, 256 × 1920, 256 × 2304, and 256 × 2688, respectively.
Fig. 6 shows the separation effect for the male and female signals at different dictionary sizes. As can be seen in fig. 6, the male SNR increases slightly as the number of sub-dictionary atoms grows from 384 to 768. This is mainly because the number of dictionary atoms is closely related to the separation performance: more dictionary atoms means more atoms can effectively represent the signal. That is, as the dictionary size increases, the dictionary captures the unique and similar components of the two source signals more and more completely. The SNR of the separated signals is highest when the number of atoms per sub-dictionary is 768, and the curves of the female SNR and the average SNR are consistent with that of the male SNR. Compared with the 512-atom sub-dictionary, the 768-atom sub-dictionary improves the male SNR by about 0.4 dB and the female SNR by about 0.5 dB. The results show that separation performs best when the sub-dictionary size is 256 × 768, which means the separation performance can be improved by selecting an appropriate dictionary size. It can be concluded that choosing a suitable dictionary size is important for optimal separation performance, while it should not be neglected that increasing the number of dictionary atoms also increases the time cost.
The above experimental results show that: compared with other algorithms, the single-channel blind source separation algorithm of the learning joint dictionary based on the new optimization function can effectively reduce the problem of cross projection and improve the voice separation effect. Moreover, the reconstruction effect of the source speech is improved by selecting a proper dictionary size.

Claims (4)

1. A single-channel blind source separation method, characterized by comprising the following steps:
S1, training stage: given training samples T_i of different source audio signals, obtain the corresponding initial identity sub-dictionaries D_i and an initial common sub-dictionary D_c, thereby obtaining an initial joint dictionary comprising the initial identity sub-dictionaries D_i and the initial common sub-dictionary D_c; the joint dictionary is updated iteratively using an optimization function;
S11, train an initial joint dictionary from the different source training samples; in step S11, the initial joint dictionary containing the initial identity sub-dictionaries D_i and the initial common sub-dictionary D_c is obtained as follows:
each identity sub-dictionary D_i is trained by the K-SVD method, the dictionary training being of the form

$$\min_{D_i,\alpha}\ \|T_i - D_i\alpha\|_F^2 \quad \text{s.t.}\quad \forall k,\ \|\alpha_k\|_0 \le K \qquad (1)$$

where D_i is the trained dictionary with normalized atoms and α is the matrix of projection coefficients of T_i on the dictionary D_i;
the two training samples are spliced as T = [T_1, T_2], the DCT dictionary is taken as the initial dictionary, and the common sub-dictionary D_c is trained by the K-SVD method; splicing then gives the initial joint dictionary D = [D_1, D_2, D_c];
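As an illustrative sketch only (not the patent's implementation), step S11 can be prototyped in Python, substituting scikit-learn's `DictionaryLearning` for K-SVD (both alternate sparse coding with a dictionary update and return unit-norm atoms) and using an overcomplete DCT matrix as the common sub-dictionary; all sizes and variable names here are hypothetical placeholders:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

def train_identity_dict(T_i, n_atoms, sparsity=8, seed=0):
    # Stand-in for K-SVD: DictionaryLearning also alternates sparse coding
    # with a dictionary update, and returns unit-norm atoms.
    dl = DictionaryLearning(n_components=n_atoms,
                            transform_algorithm="omp",
                            transform_n_nonzero_coefs=sparsity,
                            max_iter=10, random_state=seed)
    dl.fit(T_i.T)                      # sklearn expects samples in rows
    return dl.components_.T            # columns are dictionary atoms

def dct_dictionary(n_rows, n_atoms):
    # Overcomplete DCT matrix, used here in place of the trained D_c.
    k = np.arange(n_rows)[:, None] * np.arange(n_atoms)[None, :]
    D = np.cos(np.pi * k / n_atoms)
    return D / np.linalg.norm(D, axis=0)

# Random stand-ins for the frame matrices T_1, T_2 of the two sources.
rng = np.random.default_rng(0)
T1 = rng.standard_normal((64, 200))
T2 = rng.standard_normal((64, 200))
D1 = train_identity_dict(T1, n_atoms=96)
D2 = train_identity_dict(T2, n_atoms=96, seed=1)
Dc = dct_dictionary(64, 96)            # the patent trains D_c on [T1, T2] by K-SVD
D0 = np.hstack([D1, D2, Dc])           # initial joint dictionary D = [D1, D2, Dc]
```

In practice the DCT matrix is only the initialization of D_c; the patent refines it with K-SVD on the spliced samples before splicing the joint dictionary.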
S12, fix the initial joint dictionary D and obtain the sparse vectors of the training samples on the initial joint dictionary;
in step S12, with the initial joint dictionary D fixed, the sparse coding coefficients are updated by the basis pursuit (BP) algorithm, using the following optimization function:

$$\min\ \|X_i\|_1 \quad \text{subject to}\quad T_i = D X_i \qquad (2)$$

where $\|\cdot\|_1$ denotes the 1-norm, i.e. the sum of the absolute values of the non-zero elements in each column of the sparse matrix X_i;
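The BP problem (2) is a linear program. A minimal sketch, assuming a single mixture vector and a random placeholder dictionary (not the patent's data), uses the standard split x = u − v with u, v ≥ 0 and SciPy's `linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, t):
    # min ||x||_1  s.t.  D x = t, as the LP: split x = u - v, u, v >= 0,
    # and minimize sum(u) + sum(v) subject to [D, -D][u; v] = t.
    n = D.shape[1]
    res = linprog(c=np.ones(2 * n),
                  A_eq=np.hstack([D, -D]), b_eq=t,
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.x[:n] - res.x[n:]

# Hypothetical sizes: a 20 x 50 random dictionary and a 3-sparse signal.
rng = np.random.default_rng(1)
D = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.5, -2.0, 0.7]
t = D @ x_true
x_hat = basis_pursuit(D, t)
```

Since x_true is feasible for the LP, the solution's 1-norm can never exceed that of x_true, which is a quick sanity check on the solver.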
S13, fix the current sparse vectors and update the joint dictionary through the optimization function;
in step S13, the optimization function is:

$$J = \min_{D,\,X_i}\ \sum_{i=1}^{m}\Big(\|T_i - D X_i\|_F^2 + r(D, X_i, T_i)\Big) \qquad (3)$$

where

$$r(D, X_i, T_i) = \big\|T_i - D_i X_i^{D_i} - D_c X_i^{D_c}\big\|_F^2 + \sum_{j=1,\,j\neq i}^{m}\big\|\alpha_i D_j X_i^{D_j}\big\|_F^2 \qquad (4)$$

Here the reconstruction error is measured by the F-norm, J is the proposed objective function, and r(D, X_i, T_i) is the cross-projection penalty term of the i-th speaker. In equation (4), D = [D_1, D_2, ..., D_m, D_c] denotes the trained joint dictionary, D_m is the identity sub-dictionary of the m-th speaker, and D_c is the common sub-dictionary; T_i denotes a training sample of a clean sound source, with i = 1, 2, ..., m; X_i denotes the sparse vector matrix of T_i on D; X_i^{D_i}, X_i^{D_j} and X_i^{D_c} denote the parts of the coefficient X_i corresponding to the sub-dictionaries D_i, D_j and D_c, respectively; α_i is a weight vector;
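The objective in step S13 can be sketched numerically. The code below is a plausible reading of equations (3)–(4), not the patent's exact formulation: it sums the data-fit term on the full joint dictionary with a cross-projection penalty (residual using only D_i and D_c, plus the energy projected onto the other speakers' sub-dictionaries); the weight vector α_i is omitted because its exact role is not recoverable from the text:

```python
import numpy as np

def joint_objective(D_subs, Dc, T, X, Xc):
    # J: data fit ||T_i - D X_i||_F^2 plus the cross-projection penalty r
    # for each speaker i present in the training set.
    D = np.hstack(D_subs + [Dc])
    J = 0.0
    for i in range(len(T)):
        Xi_full = np.vstack(X[i] + [Xc[i]])           # stacked coefficient blocks
        J += np.linalg.norm(T[i] - D @ Xi_full) ** 2  # ||T_i - D X_i||_F^2
        # residual when T_i is explained only by its own D_i and the common D_c ...
        r = np.linalg.norm(T[i] - D_subs[i] @ X[i][i] - Dc @ Xc[i]) ** 2
        # ... plus the energy projected onto the other speakers' dictionaries
        r += sum(np.linalg.norm(D_subs[j] @ X[i][j]) ** 2
                 for j in range(len(D_subs)) if j != i)
        J += r
    return J

# Toy check: a signal exactly representable by D_1 and D_c with zero
# cross-projection should give an objective value of (numerically) zero.
rng = np.random.default_rng(0)
D1, D2, Dc = (rng.standard_normal((16, 24)) for _ in range(3))
X11 = rng.standard_normal((24, 10))
Xc1 = rng.standard_normal((24, 10))
T1 = D1 @ X11 + Dc @ Xc1              # exactly representable, no cross term
X = [[X11, np.zeros((24, 10))]]       # zero coefficients on speaker 2's dictionary
J0 = joint_objective([D1, D2], Dc, [T1], X, [Xc1])
```

The penalty drives the coefficient blocks on the other speakers' sub-dictionaries toward zero, which is exactly the cross-projection reduction the patent claims.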
S2, separation stage: solve the sparse projection coefficients of the mixed signal on the joint dictionary with a sparse coding algorithm, and recover each source signal from the sub-dictionaries and the sparse vectors.
2. The single-channel blind source separation method of claim 1, characterized by: in step S2, the sparse coding algorithm solves the sparse projection coefficients of the mixed signal on the joint dictionary as follows:
when the joint dictionary containing the common sub-dictionary is used, the model is s = D'E', where s is the mixture of the two source speech signals, E' = [E_1; E_2; E_c], and E_1, E_2 and E_c are the sparse projection coefficients of s on D_1', D_2' and D_c, respectively; the sparse projection coefficient E' is obtained by solving:

$$\min_{E'}\ \|s - D'E'\|_2^2 \quad \text{s.t.}\quad \|E'\|_0 \le K \qquad (5)$$

where K is the sparsity, i.e. the number of non-zero elements of the matrix E'.
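Problem (5) is an L0-constrained least-squares fit, which greedy solvers such as orthogonal matching pursuit handle directly. A minimal sketch with random placeholder data (OMP here is a stand-in; the patent does not name the solver for this stage):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

# Project one frame of the mixture s onto the joint dictionary D' under
# the hard sparsity limit ||E'||_0 <= K, as in equation (5).
rng = np.random.default_rng(2)
Dp = rng.standard_normal((64, 288))    # trained joint dictionary (placeholder)
Dp /= np.linalg.norm(Dp, axis=0)       # OMP expects unit-norm atoms
s = rng.standard_normal(64)            # one frame of the mixed signal (placeholder)
K = 10                                 # sparsity: at most K non-zero coefficients
E = orthogonal_mp(Dp, s, n_nonzero_coefs=K)
```

For a whole utterance, the same call is applied column by column to the frame matrix of the mixture.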
3. The single-channel blind source separation method of claim 1, characterized by: in step S2, each source signal is recovered from the sub-dictionaries and the sparse vectors as follows:
after the mixed signal has been sparsely represented on the trained joint dictionary, each source signal is restored from its response on the sub-dictionary D_1 or D_2 plus a fixed proportion of the response on the common sub-dictionary D_c; once the sparse coding matrix E' is obtained from equation (5), the estimated source audio signals are computed as

$$\hat{s}_1 = D_1' E_1 + \alpha\, D_c E_c, \qquad \hat{s}_2 = D_2' E_2 + (1-\alpha)\, D_c E_c \qquad (6)$$

where α is a weight vector.
4. The single channel blind source separation method of claim 3, characterized by: α is set to 0.1 and 1- α is set to 0.9.
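The recovery in claims 3–4 can be sketched as follows, assuming the coefficient vector is stacked as E' = [E1; E2; Ec] and using the claimed weights α = 0.1 and 1 − α = 0.9; the dictionaries and coefficients below are random placeholders, not trained values:

```python
import numpy as np

def reconstruct(D1, D2, Dc, E, alpha=0.1):
    # Split E' into the blocks belonging to D1, D2 and Dc, then share the
    # common-dictionary response between the two estimated sources.
    n1, n2 = D1.shape[1], D2.shape[1]
    E1, E2, Ec = E[:n1], E[n1:n1 + n2], E[n1 + n2:]
    s1_hat = D1 @ E1 + alpha * (Dc @ Ec)        # speaker 1: own atoms + 0.1 of D_c
    s2_hat = D2 @ E2 + (1 - alpha) * (Dc @ Ec)  # speaker 2: own atoms + 0.9 of D_c
    return s1_hat, s2_hat

rng = np.random.default_rng(3)
D1, D2, Dc = (rng.standard_normal((64, 96)) for _ in range(3))
E = rng.standard_normal(288)
s1_hat, s2_hat = reconstruct(D1, D2, Dc, E)
```

Because the two weights sum to one, the two estimates add back up to the full joint-dictionary reconstruction of the mixture, so no signal energy is lost in the split.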
CN201810599522.6A 2018-06-11 2018-06-11 Single-channel blind source separation method Active CN108875824B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810599522.6A CN108875824B (en) 2018-06-11 2018-06-11 Single-channel blind source separation method


Publications (2)

Publication Number Publication Date
CN108875824A CN108875824A (en) 2018-11-23
CN108875824B true CN108875824B (en) 2022-09-27

Family

ID=64337993


Country Status (1)

Country Link
CN (1) CN108875824B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110544482B (en) * 2019-09-09 2021-11-12 北京中科智极科技有限公司 Single-channel voice separation system
CN112329855B (en) * 2020-11-05 2023-06-02 华侨大学 Underdetermined working mode parameter identification method and detection method based on self-adaptive dictionary

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102288285B (en) * 2011-05-24 2012-11-28 南京航空航天大学 Blind source separation method for single-channel vibration signals
CN104378320A (en) * 2014-11-13 2015-02-25 中国人民解放军总参谋部第六十三研究所 Anti-interference communication method and receiving device based on single-channel blind source separation
CN107024352A (en) * 2017-05-03 2017-08-08 哈尔滨理工大学 A kind of Rolling Bearing Fault Character extracting method based on slip entropy ICA algorithm



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant