CN108875824A - Single channel blind source separation method - Google Patents


Info

Publication number: CN108875824A (application CN201810599522.6A)
Authority: CN (China)
Prior art keywords: dictionary, sub, sparse, joint, initial
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN108875824B (en)
Inventors: 孙林慧, 谢可丽
Current Assignee: Nanjing Post and Telecommunication University (the listed assignees may be inaccurate)
Original Assignee: Nanjing Post and Telecommunication University
Application filed by Nanjing Post and Telecommunication University
Priority to CN201810599522.6A, priority critical patent CN108875824B/en
Publication of CN108875824A; application granted; publication of CN108875824B

Classifications

    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting (G Physics; G06 Computing; G06F Electric digital data processing; G06F18/00 Pattern recognition; G06F18/20 Analysing; G06F18/21 Design or setup of recognition systems or techniques, Blind source separation)
    • G06F18/28 Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries (G06F18/00 Pattern recognition; G06F18/20 Analysing)


Abstract

The present invention provides a single-channel blind source separation method. In the training stage, given training samples T_i of different source speech signals, the corresponding initial identity sub-dictionaries D_i and an initial common sub-dictionary D_c are obtained, yielding a joint dictionary comprising the initial identity sub-dictionaries D_i and the initial common sub-dictionary D_c, which is then updated iteratively with an optimization function. A sparse coding algorithm solves the sparse projection coefficients of the mixed signal under the joint dictionary, and each source signal is recovered from the sub-dictionaries and sparse vectors. Aimed at the "cross projection" problem produced by a joint dictionary with weak separating capability, the present invention uses a highly discriminative joint dictionary; compared with other single-channel blind source separation algorithms based on sparse representation, it reduces source interference and significantly improves separation quality.

Description

Single-channel blind source separation method
Technical Field
The invention relates to a single-channel blind source separation method.
Background
Single-channel blind source separation (SCBSS) refers to the process of recovering a multi-dimensional source signal from a one-dimensional mixed signal. Compared with ordinary blind source separation, single-channel blind source separation has only one channel of observed signals, which makes it an ill-posed problem that is extremely difficult to solve. However, single-channel blind source separation needs only one sensor to receive signals, so the system is relatively simple and low in cost. Compared with traditional blind source separation algorithms, the single-channel method has wider applicability in the real world and has broad application prospects and practical significance in fields such as biomedical signal processing, array signal processing, speech recognition, image processing and communication.
The process of acquiring a sparse representation or sparse approximation of a signal under an over-complete dictionary is called sparse decomposition of the signal, through which a concise representation of the signal can be obtained. Sparse representations of signals have been applied to many aspects of signal processing, especially to problems such as compressed sensing of speech signals and underdetermined blind source separation. Sparse representation theory is currently one of the most popular techniques for SCBSS and mainly comprises two parts: dictionary training and separation. The construction of the sparse joint dictionary is the most important link of single-channel blind source separation based on sparse decomposition, and is also the focus of the research of the invention, because it directly influences the quality of the separated signal. That is, the large space in which the mixed signal is expressed is composed of a plurality of subspaces, each subspace expresses the signal of a certain source as well as possible, and the signal of a single source can then be reconstructed from the sparse coefficients and the basis of the corresponding subspace, so that signal separation is realized. Typical dictionary training methods include Non-negative Matrix Factorization (NMF), K-SVD, and the like. Such dictionary learning methods learn the corresponding sub-dictionaries from training samples of specific source signals, i.e., the dictionary of each source is trained independently, so that the dictionaries have a certain distinctiveness. The separation quality of single-channel blind source separation based on sparse representation depends on the distinguishability of the joint dictionary: if the joint dictionary is not highly distinguishable, the separation effect is poor.
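As a concrete illustration of this subspace picture (a toy numpy sketch; all dimensions, atom counts and coefficient values are invented for the example), a mixture synthesized from two sub-dictionaries can be split back into its sources once the sparse coefficients on the joint dictionary are known:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two sub-dictionaries with unit-norm atoms, standing in for the
# subspaces of two sources (all sizes/values invented for the demo).
D1 = rng.standard_normal((64, 32))
D2 = rng.standard_normal((64, 32))
D1 /= np.linalg.norm(D1, axis=0)
D2 /= np.linalg.norm(D2, axis=0)
D = np.hstack([D1, D2])                        # joint dictionary

# Each source is a sparse combination of its own sub-dictionary's atoms.
x1 = np.zeros(32); x1[[3, 10, 20]] = [1.0, -0.5, 2.0]
x2 = np.zeros(32); x2[[1, 15]] = [0.8, -1.2]
s1, s2 = D1 @ x1, D2 @ x2
mix = s1 + s2                                  # the single-channel mixture

# Given the sparse coefficients on the joint dictionary, each source
# is rebuilt from the basis of its own subspace alone.
x = np.concatenate([x1, x2])
rec1 = D[:, :32] @ x[:32]
rec2 = D[:, 32:] @ x[32:]
print(np.allclose(rec1 + rec2, mix), np.allclose(rec1, s1))
```

The separation difficulty in practice is, of course, that the sparse coefficients must be estimated from the mixture alone.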
This means that, in addition to expecting each sub-dictionary to represent its corresponding source well, we also expect some distinctiveness between each sub-dictionary and the others.
The discriminative dictionary learning theory has been widely applied in the fields of Pattern Classification (PC) and Face Recognition (FR), and has achieved great success. Bao et al. propose a Distinguishable Dictionary Learning (DDL) algorithm, which reduces the cross-coherence among dictionaries through an optimization function so that the joint dictionary acquires distinguishing capability. Pearlmutter et al. adopt an l1-norm optimization algorithm: a sparse dictionary is obtained for each speaker through training, the trained dictionaries are combined into a mixed dictionary, and the sparse projection vector of each source speech signal is then solved by the l1-norm optimization algorithm and the separated speech signals are reconstructed. In a recently proposed joint dictionary (CJD) for SCBSS, an identity sub-dictionary is first learned using the source speech signals corresponding to each speaker; similar atoms between two identity sub-dictionaries are then discarded and used to construct a common sub-dictionary.
The above methods all achieve a certain separation effect, but they do not utilize the relationship between different source signals to suppress the similarity between atoms of different sub-dictionaries. In practice, there are always some similar components between different source signals, which reduces the distinguishing capability of the identity sub-dictionaries and creates the problem of "cross projection" caused by mutual interference between dictionaries. That is, when the mixed speech signal is represented under the joint dictionary, the signal of a certain source also responds on the sub-dictionaries corresponding to other sources, resulting in poor separation.
Disclosure of Invention
The invention aims to provide a single-channel blind source separation method to solve the problem of 'cross projection' caused by weak distinguishing capability of a sparse joint dictionary in the prior art.
The technical solution of the invention is as follows:
a single-channel blind source separation method comprises the following steps,
S1, training stage: given training samples T_i of different source speech signals, obtain the corresponding initial identity sub-dictionaries D_i and an initial common sub-dictionary D_c, thereby obtaining an initial joint dictionary comprising the initial identity sub-dictionaries D_i and the initial common sub-dictionary D_c, and update the joint dictionary iteratively with an optimization function;
S2, speech signal separation stage: solve the sparse projection coefficients of the mixed signal under the joint dictionary with a sparse coding algorithm, and recover each source signal from the sub-dictionaries and the sparse vectors.
Further, step S1 specifically comprises:
s11, training by different source training samples to obtain an initial joint dictionary;
s12, fixing the initial joint dictionary D to obtain a sparse vector of the training sample on the initial joint dictionary;
and S13, fixing the current sparse vector, and updating through an optimization function to obtain a joint dictionary.
Further, in step S11, the initial joint dictionary comprising the initial identity sub-dictionaries D_i and the initial common sub-dictionary D_c is obtained specifically as follows:
the identity sub-dictionary D_i is obtained by training with the K-SVD method, the dictionary training being of the form:

    min_{D_i, α} ||T_i - D_i α||_F²  s.t.  ||α_k||_0 ≤ K for every column α_k   (3)

wherein D_i is the trained dictionary with normalized atoms, and α is the projection coefficient of T_i on the dictionary D_i;
the two training samples are spliced as T = [T_1, T_2], the DCT dictionary is taken as the initial dictionary, the common sub-dictionary D_c is obtained by training with the K-SVD method, and the initial joint dictionary D = [D_1, D_2, D_c] is obtained by splicing.
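To make the initialization concrete, here is a minimal numpy sketch (not the patent's code) of building an overcomplete DCT initial dictionary and splicing an initial joint dictionary D = [D1, D2, Dc]; the sizes follow the experiments' 256-sample frames with 384 atoms per sub-dictionary, and the DCT construction itself is a common K-SVD initialization rather than a detail given in the text:

```python
import numpy as np

def dct_dictionary(n, k):
    # Overcomplete DCT dictionary: n-dimensional atoms, k columns.
    # A common K-SVD initialization; a sketch, not the patent's code.
    D = np.cos(np.outer(np.arange(n), np.pi * np.arange(k) / k))
    D[:, 1:] -= D[:, 1:].mean(axis=0)          # remove DC from non-constant atoms
    return D / np.linalg.norm(D, axis=0)       # unit-norm atoms

n, c = 256, 384                                # frame length / atoms per sub-dictionary
D1 = dct_dictionary(n, c)                      # would then be refined by K-SVD on T1
D2 = dct_dictionary(n, c)                      # ... on T2
Dc = dct_dictionary(n, c)                      # ... on the spliced T = [T1, T2]
D = np.hstack([D1, D2, Dc])                    # initial joint dictionary [D1, D2, Dc]
print(D.shape)                                 # (256, 1152)
```

The resulting 256 × 1152 joint dictionary matches the smallest size used in the experiments below.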
Further, in step S12, with the initial joint dictionary D fixed, the sparse coding BP algorithm is selected to obtain the sparse coding coefficients, and the coding coefficients are updated with the following optimization function:

    min ||X_i||_1  subject to  T_i = D X_i   (4)

where ||·||_1 denotes the 1-norm, i.e. the sum of the absolute values of the non-zero elements in each column of the sparse matrix X_i.
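The 1-norm sparse coding step can be sketched with a simple iterative soft-thresholding (ISTA) solver; this is only a stand-in for the BP algorithm (the equality constraint is relaxed to a penalized least-squares form, and all sizes and values are toy choices, not the patent's):

```python
import numpy as np

def ista(D, t, lam=0.05, n_iter=500):
    # Solve min 0.5*||t - D x||_2^2 + lam*||x||_1 by iterative
    # soft-thresholding; a simple stand-in for the BP sparse coder.
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = x - (D.T @ (D @ x - t)) / L        # gradient step on the data term
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(1)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)                 # normalized atoms
x_true = np.zeros(128)
x_true[[5, 40, 90]] = [1.5, -2.0, 1.0]
t = D @ x_true                                 # a 3-sparse training frame
x = ista(D, t)
print(np.linalg.norm(D @ x - t) < 0.5)         # near-exact sparse representation
```

Smaller `lam` drives the solution closer to the hard equality constraint T_i = D X_i.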
Further, in step S13, the optimization function is:

    min_D J = min_D Σ_{i=1}^m r(D, X_i, T_i)   (1)

wherein

    r(D, X_i, T_i) = ||T_i - D_i X_i^i - D_c X_i^c||_F² + Σ_{j=1, j≠i}^m ||D_j X_i^j||_F²   (2)

Here the reconstruction error is measured by the F-norm; J is the proposed objective function and r(D, X_i, T_i) is the cross-projection penalty term of the i-th speaker. In equation (2), D = [D_1, D_2, ..., D_m, D_c] denotes the trained joint dictionary, D_m is the identity sub-dictionary of the m-th speaker, and D_c is the common sub-dictionary; T_i (i = 1, 2, ..., m) denotes the training samples of the clean sound sources; X_i denotes the sparse vector matrix of T_i on D; X_i^i denotes the part of the coefficient X_i corresponding to the sub-dictionary D_i; X_i^j the part corresponding to the sub-dictionary D_j; X_i^c the part corresponding to the common sub-dictionary D_c; α is a weight vector.
Further, in step S2, the sparse projection coefficients of the mixed signal under the joint dictionary are solved with the sparse coding algorithm specifically as follows:
when a joint dictionary containing a common sub-dictionary is used, the equation is D′ × E′ = s, where s is the mixture of the two source speech signals, E′ = [E_1, E_2, E_c], and E_1, E_2 and E_c are respectively the sparse projection coefficients of s on D′_1, D′_2 and D_c; the sparse projection coefficient E′ is obtained by solving:

    min_{E′} ||s - D′E′||_2²  s.t.  ||E′||_0 ≤ K   (8)

where K is the sparsity, i.e., the number of non-zero elements of the matrix E′.
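A minimal numpy sketch of this K-sparse coding problem, solved greedily with Orthogonal Matching Pursuit (the text names BP; OMP is used here only because it handles the ||E′||_0 ≤ K constraint directly, and the dictionary and signal are toy values):

```python
import numpy as np

def omp(Dp, s, K):
    # Greedy solve of min ||s - Dp e||_2  s.t.  ||e||_0 <= K.
    residual, support = s.copy(), []
    for _ in range(K):
        support.append(int(np.argmax(np.abs(Dp.T @ residual))))    # best atom
        coef, *_ = np.linalg.lstsq(Dp[:, support], s, rcond=None)  # refit support
        residual = s - Dp[:, support] @ coef
    e = np.zeros(Dp.shape[1])
    e[support] = coef
    return e

rng = np.random.default_rng(2)
Dp = rng.standard_normal((128, 256))
Dp /= np.linalg.norm(Dp, axis=0)               # joint dictionary with unit atoms
e_true = np.zeros(256)
e_true[[7, 50, 120]] = [2.0, -1.0, 1.5]
s = Dp @ e_true                                # "mixed signal" with K = 3
e = omp(Dp, s, K=3)
print(np.allclose(Dp @ e, s))                  # exact recovery of the K-sparse code
```

Because the residual stays orthogonal to the already-selected atoms, each iteration picks a new atom.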
Further, in step S2, the respective source signals are recovered from the sub-dictionaries and the sparse vectors specifically as follows:
after the mixed signal is sparsely represented on the trained joint dictionary, each source signal is recovered from the response on its identity sub-dictionary D_1 or D_2 plus a certain proportion of the response on the common sub-dictionary D_c; once the sparse coding matrix E′ is obtained, the estimated source speech signals are calculated according to equation (9):

    ŝ_1 = D_1 E_1 + α D_c E_c,   ŝ_2 = D_2 E_2 + (1 - α) D_c E_c   (9)

wherein α is a weight vector.
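A toy numpy check of this recovery rule, reading "a certain proportion" as splitting the common-sub-dictionary response with weights α and 1 - α (α = 0.1 as stated below; the matrices here are random placeholders, not trained dictionaries):

```python
import numpy as np

rng = np.random.default_rng(3)
n, c = 64, 16
D1, D2, Dc = (rng.standard_normal((n, c)) for _ in range(3))  # placeholder sub-dictionaries
E1, E2, Ec = (rng.standard_normal(c) for _ in range(3))       # placeholder responses

alpha = 0.1
s1_hat = D1 @ E1 + alpha * (Dc @ Ec)           # source 1: own response + alpha * common
s2_hat = D2 @ E2 + (1 - alpha) * (Dc @ Ec)     # source 2: own response + (1-alpha) * common

total = D1 @ E1 + D2 @ E2 + Dc @ Ec            # full joint-dictionary response
print(np.allclose(s1_hat + s2_hat, total))     # True: the split is exhaustive
```

The two estimates together account for the whole joint-dictionary reconstruction of the mixture, so no common-component energy is lost.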
Further, α is set to 0.1 and 1 - α is set to 0.9.
The invention has the beneficial effects that: compared with traditional dictionary learning methods, the single-channel blind source separation method makes full use of the characteristics of the speech signals, starts from the commonness and differences of the different source signals, and constructs an optimization function to suppress the non-corresponding parts in the sparse representation coefficients. The distinct components of each sound source are projected onto the corresponding identity sub-dictionary as much as possible, and the similar components are projected onto the common sub-dictionary, so that the "cross projection" phenomenon is weakened and the signals can be separated better. Aimed at the "cross projection" problem caused by the weak distinguishing capability of the joint dictionary, the invention adopts a joint dictionary with high distinguishing capability; compared with other single-channel blind source separation algorithms based on sparse representation, it reduces source interference and obviously improves separation quality.
Drawings
Fig. 1 is an explanatory block diagram of the single-channel blind source separation method of the present invention.
Fig. 2 is an explanatory diagram of source signal separation in the embodiment.
Fig. 3 is a diagram illustrating the variation of the separation effect with the weight vector α in the embodiment.
FIG. 4 is a diagram illustrating the variation of the separation effect with the weight vector 1 - α in the embodiment.
FIG. 5 is a diagram of separated speech in an embodiment.
FIG. 6 is a diagram showing the variation of the separation effect with the number of sub-dictionary atoms in the example.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Examples
According to the single-channel blind source separation method, a common sub-dictionary containing similar information is introduced, and the relationship between the sub-dictionaries is constrained by constructing a new optimization function, so that the difference between the identity sub-dictionaries is further increased. The unique components of each speaker sample are sparsely represented by the identity sub-dictionary corresponding to that speaker as much as possible, while the highly correlated components are sparsely represented by the common sub-dictionary as much as possible; when the source signal is represented by the joint dictionary, the cross-projection problem can then be effectively avoided, improving the quality of speech separation.
A single-channel blind source separation method, as shown in fig. 1, comprising the following steps:
S1, training phase. Given training samples T_i of the different source speech signals, obtain the corresponding initial identity sub-dictionaries D_i and an initial common sub-dictionary D_c, thereby obtaining an initial joint dictionary comprising the initial identity sub-dictionaries D_i and the initial common sub-dictionary D_c; the joint dictionary is updated iteratively with an optimization function.
Proposal of the new optimization function of the embodiment: compared with existing single-channel blind source separation algorithms based on dictionary learning, the embodiment method divides the characteristics of each source signal into two parts, one being the unique features of the different speaker samples and the other being the features common to different speaker samples, where the unique features of the speaker samples are independent of each other. Accordingly, for the different speakers, the joint dictionary comprises the identity sub-dictionary of each speaker plus a common sub-dictionary used to represent the similar components of the different speakers.
In order to construct a joint dictionary with strong distinctiveness, the difference of the identity sub-dictionaries is further improved. In the training process, the embodiment method restricts the relation between the sub-dictionaries by constructing a proper objective function, so that the unique components of each speaker sample are sparsely represented by the sub-dictionaries corresponding to the unique components, and the components with high correlation are sparsely represented by the common sub-dictionaries as much as possible.
Here, without loss of generality, the discussion is developed for the case of m speakers. Let T_i be the training sample of the i-th source signal, corresponding to the sub-dictionary D_i; the corresponding sparse representation coefficient is X = [X_1, X_2, ..., X_m, X_c]^T. To improve the separation performance of the algorithm, the cross-projections between the source signals must be suppressed. To this end, a new objective function is proposed, as shown in equation (1):
    min_D J = min_D Σ_{i=1}^m r(D, X_i, T_i)   (1)

wherein

    r(D, X_i, T_i) = ||T_i - D_i X_i^i - D_c X_i^c||_F² + Σ_{j=1, j≠i}^m ||D_j X_i^j||_F²   (2)

Here the reconstruction error is measured by the F-norm; J is the proposed objective function and r(D, X_i, T_i) is the cross-projection penalty term of the i-th speaker. D = [D_1, D_2, ..., D_m, D_c] denotes the trained joint dictionary, D_m is the identity sub-dictionary of the m-th speaker, and D_c is the common sub-dictionary; T_i (i = 1, 2, ..., m) denotes the training samples of the clean sound sources; X_i denotes the sparse vector matrix of T_i on D; X_i^i denotes the part of the coefficient X_i corresponding to the sub-dictionary D_i; X_i^j the part corresponding to the sub-dictionary D_j; X_i^c the part corresponding to the common sub-dictionary D_c; α is a weight vector.
It is worth noting that, in the first term of equation (2), the different source training signals should be sparsely represented by the corresponding identity sub-dictionary and the common sub-dictionary. This term mainly considers the identity sub-dictionaries, which have a certain distinguishing capability, together with the common sub-dictionary containing the common information, thereby ensuring the comprehensiveness of the joint dictionary. Considering the existence of the "cross projection" problem, the second term constrains the cross-projection that a training signal produces on the other identity sub-dictionaries. That is, when one source signal is sparsely represented on the joint dictionary, the representations on the other sub-dictionaries should be as small as possible in order to reduce the effect of cross-projection. Taking the mixture of two source signals as an example, a sparse matrix E = [E_1, E_2, E_c]^T is defined, in which E_1, E_2 and E_c are the sparse coefficients of the mixed signal s on the identity sub-dictionaries D_1, D_2 and the common sub-dictionary D_c respectively. After the mixed signal s is sparsely represented on the trained joint dictionary, each source signal can be recovered by adding a certain proportion of the responses on the sub-dictionary and the common sub-dictionary. The source separation module is shown in fig. 2.
Step S1 is specifically as follows.
S11, training by different source training samples to obtain an initial joint dictionary. The embodiment selects 2 speakers of different genders from the Chinese speech library of the Institute of Automation, Chinese Academy of Sciences; each speaker has 150 speech data samples T_i in total, the sampling frequency of the signal is 16 kHz, and each frame of the signal takes 256 sampling points.
The DCT dictionary is selected as the initial dictionary, and a single-source signal sample T_i is used as training data to train the identity sub-dictionary D_i by the K-SVD method, the dictionary training being of the form:

    min_{D_i, α} ||T_i - D_i α||_F²  s.t.  ||α_k||_0 ≤ K for every column α_k   (3)

wherein D_i is the trained dictionary with normalized atoms, and α is the projection coefficient of T_i on the dictionary D_i.
The two training samples are spliced as T = [T_1, T_2], the DCT dictionary, i.e. the discrete cosine transform dictionary, is taken as the initial dictionary, the common sub-dictionary D_c is obtained by training with the K-SVD method, and the initial joint dictionary D = [D_1, D_2, D_c] is obtained by splicing.
S12, fixing the initial joint dictionary D to obtain the sparse vectors of the training samples on the initial joint dictionary. With the initial D fixed, the sparse coding BP algorithm is selected to obtain the sparse coding coefficients. The embodiment updates the coding coefficients with the following optimization function:

    min ||X_i||_1  subject to  T_i = D X_i   (4)

where ||·||_1 denotes the 1-norm, i.e. the sum of the absolute values of the non-zero elements in each column of the sparse matrix X_i.
S13, fixing the current sparse vectors and updating through the optimization function to obtain the joint dictionary. With the coding coefficients fixed, the embodiment updates the joint dictionary with the following optimization function:

    min_D J = min_D Σ_{i=1}^m r(D, X_i, T_i)   (5)

wherein

    r(D, X_i, T_i) = ||T_i - D_i X_i^i - D_c X_i^c||_F² + Σ_{j=1, j≠i}^m ||D_j X_i^j||_F²   (6)
to solve this optimization problem, embodiments introduce a matrix Qi,i=1,2,3, Andwhere 0 denotes an all-zero matrix and I denotes an identity matrix. Thus, the formula (6) can be written as
In the new error function only the joint dictionary is unknown, and when the joint dictionary is updated, the identity sub-dictionaries and the common sub-dictionary are updated simultaneously. The optimization problem shown in equation (5) can be solved with a quasi-Newton method. A quasi-Newton method requires only the gradient of the objective function and no second-derivative information, so such methods often perform better than the steepest descent method and Newton's method. The embodiment selects the limited-memory BFGS algorithm (L-BFGS) among the quasi-Newton methods to solve the optimization problem.
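To make the first-order nature of such an update concrete, here is a plain gradient-descent sketch on just the reconstruction term ||T - DX||_F² (the embodiment uses L-BFGS on the full objective including the cross-projection penalty; all shapes and data here are toy values):

```python
import numpy as np

rng = np.random.default_rng(4)
D = rng.standard_normal((32, 48))
D /= np.linalg.norm(D, axis=0)                 # initial dictionary, unit atoms
X = rng.standard_normal((48, 100)) * (rng.random((48, 100)) < 0.1)  # fixed sparse codes
T = rng.standard_normal((32, 100))             # "training samples"

lr = 0.5 / np.linalg.norm(X, 2) ** 2           # safe step size for this quadratic
err0 = np.linalg.norm(T - D @ X)
for _ in range(100):
    D = D + 2.0 * lr * (T - D @ X) @ X.T       # descend the gradient of ||T - D X||_F^2
print(np.linalg.norm(T - D @ X) < err0)        # True: reconstruction error drops
```

With the codes fixed, this sub-problem is a smooth quadratic in D, which is why gradient-only methods such as L-BFGS apply directly.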
S2, separating the language signals, and solving sparse projection coefficients of the mixed signals under the joint dictionary by adopting a sparse coding algorithm; and recovering each source signal according to the sub-dictionary and the sparse vector.
The SCBSS problem can be converted into solving the equation D × E = s, where D is the joint dictionary and s is the mixed signal. When a joint dictionary containing a common sub-dictionary is used, the equation becomes D′ × E′ = s, where s is the mixture of the two source speech signals, E′ = [E_1, E_2, E_c], and E_1, E_2 and E_c are respectively the sparse projection coefficients of s on D′_1, D′_2 and D_c. E′ is obtained by solving

    min_{E′} ||s - D′E′||_2²  s.t.  ||E′||_0 ≤ K   (8)

where K is the sparsity, i.e. the number of non-zero elements of the matrix E′. The BP algorithm is used here to solve this optimization problem.
After the mixed signal is sparsely represented on the trained joint dictionary, the responses on the sub-dictionaries D_1 and D_2, plus a proportion of the response on the common sub-dictionary, recover the respective source signals. When E′ is obtained, the estimated sources ŝ_1 and ŝ_2 can be calculated by equation (9).
To obtain the best separation effect, the influence of the weight vector α on the performance of the speech separation algorithm is compared through experiments; the experimental data are recorded in fig. 3 and fig. 4. Fig. 3 shows the curve of male-SNR as the weight varies.
Evaluation of experiments
Experiments are carried out with the traditional K-SVD dictionary training algorithm, the DDL method, the CJD method and the proposed joint dictionary learning method with the new optimization function, and the speech separation effect is evaluated to analyse separation quality. Experiments on the dictionary size, which affects the separation effect, are also performed, and a suitable dictionary size is selected from the signal-to-noise-ratio analysis of the separated source signals.
In order to compare the performance of the various speech separation algorithms and demonstrate the effectiveness of the algorithm proposed in the embodiment, the performance of the different algorithms needs to be measured. The Signal-to-Noise Ratio (SNR) is used as the measure: a higher signal-to-noise ratio indicates less distortion of the separated source signal and better separation performance. Since the dictionary size has a great influence on the proposed algorithm, the atom numbers of the sub-dictionaries are set to 384, 512, 640, 768 and 896 respectively, the frame length is 256, and the male-SNR, female-SNR and average SNR at the different dictionary sizes are obtained and their separation effects compared. For convenience of description, the proposed single-channel blind source separation algorithm that learns the joint dictionary with the new optimization function is referred to as NJDL (new joint dictionary learning).
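The SNR measure can be sketched in a few lines of numpy; the formula is the standard ratio of signal power to error power in decibels, and the variable names and toy signal are illustrative:

```python
import numpy as np

def snr_db(s, s_hat):
    # SNR = 10 * log10(||s||^2 / ||s - s_hat||^2); higher means less distortion
    return 10.0 * np.log10(np.sum(s ** 2) / np.sum((s - s_hat) ** 2))

s = np.array([1.0, -1.0, 2.0, 0.5])            # toy "source" signal
print(round(snr_db(s, 0.9 * s), 6))            # ~20.0 dB: error is 10% of the signal
```

An estimate whose error is one tenth of the signal amplitude thus scores about 20 dB, which puts the table values below (roughly 0 to 8 dB) in perspective.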
First, comparison of separation performance of different algorithms
The algorithm proposed in the embodiment is compared with the SCBSS method that constructs a common dictionary based on K-SVD, the DDL dictionary learning method and CJD. Objectively, the signal-to-noise ratio is computed according to equation (10) to measure the reconstruction effect of the source signals. The separation effect of the proposed SCBSS algorithm is shown in fig. 5; it can be seen from fig. 5 that the invention can separate the source signals well from the mixed signal, with waveforms very close to those of the source speech signals.
    SNR = 10 log_10 ( ||s_i||_2² / ||s_i - ŝ_i||_2² )   (10)

wherein s_i is the source speech signal and ŝ_i is its estimate. Experimental data are recorded in Table 1.
TABLE 1 Signal-to-noise ratio (dB) after separation using different dictionary learning algorithms
SNR (dB)   Male     Female   Average
K-SVD      5.9214   0.2508   3.0861
DDL        8.0623   2.8803   5.4713
CJD        7.0300   1.7436   4.3868
NJDL       8.4676   3.3059   5.8868
From the experimental results of Table 1, it can be seen that the proposed SCBSS method is generally superior to the SCBSS methods based on K-SVD, DDL and CJD in male-SNR, female-SNR and average SNR alike. Compared with the other algorithms, the embodiment method separates the unique and common components of the signals and effectively reduces source confusion. Meanwhile, the higher signal-to-noise ratio verifies the superiority of the learned joint dictionary.
Specifically, the SNR of the example method is improved by about 2.5dB for male voice signals and about 3.0dB for female voice signals compared to the K-SVD method. Compared to the method in DDL, the example method improves by about 0.4dB for male voice signals and 0.4dB for female voice signals. Compared to the method in CJD, the example method improves the SNR of male voice signals by about 1.4dB and female voice signals by 1.6 dB. Compared with K-SVD, the average improvement is 2.8dB, compared with DDL, the improvement is 0.4dB, and compared with CJD, the improvement is 1.5 dB. Therefore, compared with other algorithms, the single-channel blind source separation algorithm based on the optimization function learning joint dictionary provided by the embodiment can obviously improve the separation performance.
Second, dictionary size impact on separation performance
In the following experiments, the separation effect of the example method at different dictionary sizes was examined. The size of the sub-dictionary is set to l × c, and the size of the joint dictionary is l × 3 c. The atomic numbers are set to 384, 512, 640, 768, and 896, respectively, and the frame length sizes are all set to 256. The sizes of the experimental joint dictionary D are 256 × 1152, 256 × 1536, 256 × 1920, 256 × 2304, and 256 × 2688, respectively.
Fig. 6 shows the effect of separating the male and female signals at different dictionary sizes. As can be seen in fig. 6, the male SNR increases slightly as the number of sub-dictionary atoms increases from 384 to 768. This is mainly because the number of dictionary atoms is closely related to the separation performance of the embodiment method: more dictionary atoms means more atoms that can effectively represent the signal. That is, as the size of the dictionary increases, the dictionary captures the unique and similar components of the two source signals more completely. As can be seen from the figure, the SNR of the separated signals is highest when the number of atoms in the sub-dictionary is 768, and the curves of the female SNR and the average SNR are consistent with the curve of the male SNR. Compared with the 512-atom dictionary used in the experiments above, the method at 768 dictionary atoms improves the male SNR by about 0.4 dB and the female SNR by about 0.5 dB. From the results, the best separation performance is obtained with a sub-dictionary size of 256 × 768. This means that the separation performance can be improved by selecting an appropriate dictionary size; it can be concluded that choosing a suitable dictionary size is important to achieve the best separation performance, while it must not be ignored that an increase in the number of dictionary atoms also costs more time.
The above experimental results show that: compared with other algorithms, the single-channel blind source separation algorithm of the learning joint dictionary based on the new optimization function can effectively reduce the problem of cross projection and improve the voice separation effect. Moreover, the reconstruction effect of the source speech is improved by selecting a proper dictionary size.

Claims (8)

1. A single-channel blind source separation method is characterized in that: comprises the following steps of (a) carrying out,
s1, training stage: given training samples T_i of different source speech signals, obtain the corresponding initial identity sub-dictionaries D_i and an initial common sub-dictionary D_c, thereby obtaining an initial joint dictionary comprising the initial identity sub-dictionaries D_i and the initial common sub-dictionary D_c, and update the joint dictionary iteratively with an optimization function;
s2, speech signal separation stage: solve the sparse projection coefficients of the mixed signal under the joint dictionary with a sparse coding algorithm, and recover each source signal from the sub-dictionaries and the sparse vectors.
2. The single channel blind source separation method of claim 1, characterized by: step S1 specifically comprises,
s11, training by different source training samples to obtain an initial joint dictionary;
s12, fixing the initial joint dictionary D to obtain a sparse vector of the training sample on the initial joint dictionary;
and S13, fixing the current sparse vector, and updating through an optimization function to obtain a joint dictionary.
3. The single-channel blind source separation method of claim 2, characterized in that in step S11, the initial joint dictionary comprising the initial identity sub-dictionaries D_i and the initial common sub-dictionary D_c is obtained as follows:
each identity sub-dictionary D_i is trained by the K-SVD method, the dictionary training taking the form

min_{D_i, α} ||T_i − D_i α||_F²  s.t.  ||α_k||_0 ≤ K for every column α_k    (1)

where D_i is the trained dictionary with normalized atoms and α is the matrix of projection coefficients of T_i on the dictionary D_i;
the two training samples are spliced as T = [T_1, T_2], a DCT dictionary is taken as the initial dictionary, the common sub-dictionary D_c is trained by the K-SVD method, and the initial joint dictionary is obtained by splicing: D = [D_1, D_2, D_c].
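The sub-dictionary training and splicing of claim 3 can be illustrated with a minimal K-SVD sketch. The random-column initialisation, toy data, and greedy coding step are assumptions made for brevity; the patent initialises the common sub-dictionary with a DCT dictionary.

```python
import numpy as np

def ksvd(T, n_atoms, k, n_iter=5, seed=0):
    # Minimal K-SVD sketch: alternate greedy sparse coding with per-atom
    # rank-1 SVD updates of the residual. Random-column initialisation is an
    # assumption; the patent uses a DCT dictionary to initialise D_c.
    rng = np.random.default_rng(seed)
    D = T[:, rng.choice(T.shape[1], n_atoms, replace=False)].astype(float).copy()
    D /= np.linalg.norm(D, axis=0)
    X = np.zeros((n_atoms, T.shape[1]))
    for _ in range(n_iter):
        for n in range(T.shape[1]):                  # sparse coding step
            idx = np.argsort(-np.abs(D.T @ T[:, n]))[:k]
            X[:, n] = 0.0
            X[idx, n] = np.linalg.lstsq(D[:, idx], T[:, n], rcond=None)[0]
        for j in range(n_atoms):                     # dictionary update step
            users = np.flatnonzero(X[j, :])
            if users.size == 0:
                continue
            # residual without atom j's contribution, then rank-1 SVD refit
            E = T[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]
            X[j, users] = s[0] * Vt[0, :]
    return D

rng = np.random.default_rng(1)
T1 = rng.standard_normal((16, 60))       # speaker-1 training frames (toy data)
T2 = rng.standard_normal((16, 60))       # speaker-2 training frames
D1 = ksvd(T1, 24, k=3)                   # identity sub-dictionary of speaker 1
D2 = ksvd(T2, 24, k=3)                   # identity sub-dictionary of speaker 2
Dc = ksvd(np.hstack([T1, T2]), 24, k=3)  # common sub-dictionary from T = [T1, T2]
D = np.hstack([D1, D2, Dc])              # initial joint dictionary D = [D1, D2, Dc]
print(D.shape)
```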
4. The single-channel blind source separation method of claim 2, characterized in that in step S12, with the initial joint dictionary D fixed, the sparse coding coefficients are updated by the basis pursuit (BP) sparse coding algorithm using the following optimization function:

min ||X_i||_1  subject to  T_i = D X_i    (4)

where || · ||_1 denotes the 1-norm, i.e. the sum of the absolute values of the non-zero elements in each column of the sparse matrix X_i.
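The BP problem of equation (4) can be solved exactly as a linear program by splitting x into its positive and negative parts. The sketch below is illustrative (toy dimensions and data are assumptions), not the embodiment's solver.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, t):
    # BP as a linear program: write x = u - v with u, v >= 0, so that
    # ||x||_1 = sum(u) + sum(v) and the constraint t = D x becomes linear.
    m, n = D.shape
    c = np.ones(2 * n)
    res = linprog(c, A_eq=np.hstack([D, -D]), b_eq=t,
                  bounds=(0, None), method="highs")
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(2)
D = rng.standard_normal((12, 30))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(30)
x_true[[3, 17]] = [1.5, -2.0]          # a 2-sparse ground-truth coefficient vector
t = D @ x_true                         # sample satisfying t = D x
x = basis_pursuit(D, t)
print(round(float(np.abs(x).sum()), 6))
```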
5. The single-channel blind source separation method of claim 2, characterized in that in step S13, the optimization function is:

J = min_{D, X_i} Σ_{i=1}^{m} ( ||T_i − D X_i||_F² + r(D, X_i, T_i) )    (2)

r(D, X_i, T_i) = Σ_{j=1, j≠i}^{m} α_j ||D_j X_i^{D_j}||_F²    (3)

where the reconstruction error is measured by the F-norm, J is the proposed objective function, and r(D, X_i, T_i) is the cross-projection penalty term of the i-th speaker; in equation (2), D = [D_1, D_2, ..., D_m, D_c] denotes the trained joint dictionary, D_m is the identity sub-dictionary of the m-th speaker and D_c is the common sub-dictionary; T_i (i = 1, 2, ..., m) are the training samples of the clean sound sources; X_i is the sparse vector matrix of T_i on D; X_i^{D_i}, X_i^{D_j} and X_i^{D_c} are the blocks of the coefficients X_i corresponding to the sub-dictionaries D_i, D_j and D_c respectively; and α is a weight vector.
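The structure of the objective in claim 5, reconstruction error plus a cross-projection penalty on the coefficient blocks that fall on other speakers' identity sub-dictionaries, can be sketched as below. The exact penalty form, scalar weighting, and all dimensions are assumptions made for illustration.

```python
import numpy as np

def objective_J(D_list, Dc, T_list, X_list, alpha=0.1):
    # Sketch of the proposed objective: per-speaker F-norm reconstruction
    # error plus a cross-projection penalty on the blocks of X_i that fall
    # on the OTHER speakers' identity sub-dictionaries. The penalty form and
    # scalar weight are assumptions based on the claim text.
    sizes = [Di.shape[1] for Di in D_list] + [Dc.shape[1]]
    edges = np.cumsum([0] + sizes)            # block boundaries inside X_i
    D = np.hstack(D_list + [Dc])              # D = [D1, ..., Dm, Dc]
    J = 0.0
    for i, (Ti, Xi) in enumerate(zip(T_list, X_list)):
        J += np.linalg.norm(Ti - D @ Xi, "fro") ** 2     # reconstruction error
        for j in range(len(D_list)):
            if j != i:                                   # cross-projection term
                Xij = Xi[edges[j]:edges[j + 1], :]
                J += alpha * np.linalg.norm(D_list[j] @ Xij, "fro") ** 2
    return J

rng = np.random.default_rng(5)
D1, D2, Dc = (rng.standard_normal((8, 6)) for _ in range(3))
T1, T2 = (rng.standard_normal((8, 10)) for _ in range(2))
X1, X2 = (rng.standard_normal((18, 10)) for _ in range(2))
X1[6:12, :] = 0.0          # zero speaker-1 coefficients on D2 ...
X2[0:6, :] = 0.0           # ... and speaker-2 coefficients on D1
J = objective_J([D1, D2], Dc, [T1, T2], [X1, X2])
print(J >= 0.0)
```

With the cross blocks zeroed as above, the penalty vanishes and J reduces to the two reconstruction errors, which is the behaviour the penalty is meant to encourage.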
6. The single-channel blind source separation method of any one of claims 1-5, characterized in that in step S2, the sparse projection coefficients of the mixed signal under the joint dictionary are solved by the sparse coding algorithm as follows:
when the joint dictionary containing the common sub-dictionary is used, the model is s = D'E', where s is the mixture of the two source speech signals, D' = [D'_1, D'_2, D_c] and E' = [E_1; E_2; E_c], with E_1, E_2 and E_c being the sparse projection coefficients of s on D'_1, D'_2 and D_c respectively; the sparse projection coefficients E' are obtained by solving the following equation:

min_{E'} ||s − D'E'||_2²  s.t.  ||E'||_0 ≤ K    (8)

where K is the sparsity, i.e. the number of non-zero elements of the matrix E'.
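The ℓ0-constrained problem of equation (8) is commonly approximated with orthogonal matching pursuit. The sketch below uses scikit-learn's `orthogonal_mp` on toy data; the dictionary sizes and block split are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(3)
Dp = rng.standard_normal((64, 96))     # joint dictionary D' = [D'1, D'2, Dc] (toy sizes)
Dp /= np.linalg.norm(Dp, axis=0)
s = rng.standard_normal(64)            # one frame of the mixed signal (toy stand-in)

K = 8                                  # sparsity: non-zero entries allowed in E'
Ep = orthogonal_mp(Dp, s, n_nonzero_coefs=K)   # sparse projection coefficients E'
E1, E2, Ec = Ep[:32], Ep[32:64], Ep[64:]       # blocks on D'1, D'2 and Dc
print(np.count_nonzero(Ep))
```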
7. The single-channel blind source separation method of claim 5, characterized in that in step S2, each source signal is recovered from the sub-dictionaries and the sparse vectors as follows:
after the mixed signal has been sparsely represented on the trained joint dictionary, each source signal is recovered from its response on the identity sub-dictionary D_1 or D_2 plus a certain proportion of the response on the common sub-dictionary D_c; with the sparse coding matrix E' obtained, the estimated source speech signals are calculated by equation (9):

ŝ_1 = D_1 E_1 + α D_c E_c,   ŝ_2 = D_2 E_2 + (1 − α) D_c E_c    (9)

where α is a weight vector.
8. The single-channel blind source separation method of claim 7, wherein α is set to 0.1 and 1 − α is set to 0.9.
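The recovery step of claims 7 and 8 can be sketched directly. Which source receives the α share versus the 1 − α share of the common response is an assumption here, as are all dimensions and the random toy coefficients.

```python
import numpy as np

def recover_sources(D1, D2, Dc, E1, E2, Ec, alpha=0.1):
    # Eq. (9) sketch: each estimate is its identity sub-dictionary response
    # plus a weighted share of the common sub-dictionary response
    # (alpha = 0.1 and 1 - alpha = 0.9 per claim 8). Which source gets
    # which share is an assumption in this sketch.
    s1_hat = D1 @ E1 + alpha * (Dc @ Ec)
    s2_hat = D2 @ E2 + (1 - alpha) * (Dc @ Ec)
    return s1_hat, s2_hat

rng = np.random.default_rng(4)
D1, D2, Dc = (rng.standard_normal((64, 32)) for _ in range(3))
E1, E2, Ec = (rng.standard_normal(32) for _ in range(3))
s1_hat, s2_hat = recover_sources(D1, D2, Dc, E1, E2, Ec)
# Together the two estimates account for the full common-dictionary response:
print(np.allclose(s1_hat + s2_hat, D1 @ E1 + D2 @ E2 + Dc @ Ec))
```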
CN201810599522.6A 2018-06-11 2018-06-11 Single-channel blind source separation method Active CN108875824B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810599522.6A CN108875824B (en) 2018-06-11 2018-06-11 Single-channel blind source separation method

Publications (2)

Publication Number Publication Date
CN108875824A true CN108875824A (en) 2018-11-23
CN108875824B CN108875824B (en) 2022-09-27

Family

ID=64337993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810599522.6A Active CN108875824B (en) 2018-06-11 2018-06-11 Single-channel blind source separation method

Country Status (1)

Country Link
CN (1) CN108875824B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102288285A (en) * 2011-05-24 2011-12-21 南京航空航天大学 Blind source separation method for single-channel vibration signals
CN104378320A (en) * 2014-11-13 2015-02-25 中国人民解放军总参谋部第六十三研究所 Anti-interference communication method and receiving device based on single-channel blind source separation
CN107024352A (en) * 2017-05-03 2017-08-08 哈尔滨理工大学 A kind of Rolling Bearing Fault Character extracting method based on slip entropy ICA algorithm


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIAN Yuanrong et al.: "A New Single-Channel Blind Source Separation Algorithm Based on Sparse Representation", Journal of Electronics & Information Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110544482A (en) * 2019-09-09 2019-12-06 极限元(杭州)智能科技股份有限公司 single-channel voice separation system
CN110544482B (en) * 2019-09-09 2021-11-12 北京中科智极科技有限公司 Single-channel voice separation system
CN112329855A (en) * 2020-11-05 2021-02-05 华侨大学 Underdetermined working modal parameter identification method and detection method based on adaptive dictionary
CN112329855B (en) * 2020-11-05 2023-06-02 华侨大学 Underdetermined working mode parameter identification method and detection method based on self-adaptive dictionary
CN118230753A (en) * 2024-04-15 2024-06-21 武汉理工大学 Blind source separation method, device and medium based on NOODL algorithm
CN118230753B (en) * 2024-04-15 2024-08-30 武汉理工大学 Blind source separation method, device and medium based on NOODL algorithm


Similar Documents

Publication Publication Date Title
Chien Source separation and machine learning
US8751227B2 (en) Acoustic model learning device and speech recognition device
CN103345923B (en) A kind of phrase sound method for distinguishing speek person based on rarefaction representation
Kwon et al. Phoneme recognition using ICA-based feature extraction and transformation
CN108875824B (en) Single-channel blind source separation method
US20140236593A1 (en) Speaker recognition method through emotional model synthesis based on neighbors preserving principle
CN108962229B (en) Single-channel and unsupervised target speaker voice extraction method
CN106847301A (en) A kind of ears speech separating method based on compressed sensing and attitude information
Ozerov et al. Uncertainty-based learning of acoustic models from noisy data
CN108198566B (en) Information processing method and device, electronic device and storage medium
US20150348537A1 (en) Source Signal Separation by Discriminatively-Trained Non-Negative Matrix Factorization
Halperin et al. Neural separation of observed and unobserved distributions
Delcroix et al. Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds
Moritz et al. Multi-channel speech enhancement and amplitude modulation analysis for noise robust automatic speech recognition
CN112992172A (en) Single-channel time domain bird song separating method based on attention mechanism
CN107103913B (en) Speech recognition method based on power spectrum Gabor characteristic sequence recursion model
Nesta et al. Robust Automatic Speech Recognition through On-line Semi Blind Signal Extraction
Şimşekli et al. Non-negative tensor factorization models for Bayesian audio processing
Liu et al. Use of bimodal coherence to resolve the permutation problem in convolutive BSS
CN112037813B (en) Voice extraction method for high-power target signal
Nesta et al. Audio/video supervised independent vector analysis through multimodal pilot dependent components
Zhou et al. Improved phoneme-based myoelectric speech recognition
JP6910609B2 (en) Signal analyzers, methods, and programs
CN116612779A (en) Single-channel voice separation method based on deep learning
Arberet et al. A tractable framework for estimating and combining spectral source models for audio source separation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant