CN113470689B - Voice separation method - Google Patents

Voice separation method

Info

Publication number
CN113470689B
CN113470689B (application CN202110968974.9A)
Authority
CN
China
Prior art keywords
residual
signals
signal
voice
path
Prior art date
Legal status
Active
Application number
CN202110968974.9A
Other languages
Chinese (zh)
Other versions
CN113470689A (en)
Inventor
梁骏
沈旭东
叶丰
卢燕
姚欢
Current Assignee
Hangzhou Guoxin Microelectronics Co.,Ltd.
Original Assignee
Hangzhou Nationalchip Science & Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Nationalchip Science & Technology Co., Ltd.
Priority to CN202110968974.9A
Publication of CN113470689A
Application granted
Publication of CN113470689B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 - Voice signal separating
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a voice separation method. Existing methods do not process the residual human voice from other channels that remains after a blind source separation algorithm. In the disclosed method, the voice signals of a plurality of channels are first processed by a blind source separation module to obtain multiple separated voice signals, each of which still contains residual voice from the other channels. These separated signals are then processed by a residual selection module to obtain multiple residual output signals. The residual output signals are filtered by a residual suppression module to obtain expected residual signals, and the corresponding expected residual signal is subtracted from each separated voice signal output by the blind source separation module; the difference is the residual signal, which is the final output voice signal of that path. The adaptive filter of the residual suppression module uses the residual signal and an iterative algorithm to update its filter coefficients. The method suppresses the residual voice of the other channels and reduces its energy without damaging the desired signal, thereby improving voice separation quality.

Description

Voice separation method
Technical Field
The invention belongs to the field of voice processing, and in particular relates to a voice separation method.
Background
Speech extraction in noisy environments and in environments with multiple simultaneous sound sources is a key technology. In scenarios where multiple microphones are used, Independent Component Analysis (ICA) is a well-established approach; ICA was first discussed in US Patent No. 5,706,402. Assuming that each microphone signal is a mixture of several independent speech signals, independent component analysis iteratively estimates an unmixing matrix which, multiplied by the mixed signals, yields the separated signals. The objective of the ICA iterations is to maximize the relative entropy of the separated signals and thereby minimize their information redundancy. ICA does not require any information about the individual signal sources, and is therefore a form of blind source separation (BSS).
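For illustration only (not part of the patented method), blind source separation of an instantaneous mixture can be sketched in a few lines of Python with scikit-learn's FastICA; the synthetic stand-in sources, the mixing matrix A and all parameter values below are assumptions made for this sketch.

    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    fs = 16000
    t = np.arange(fs) / fs                       # one second of signal

    # Two synthetic stand-in sources (illustrative waveforms, not real speech).
    s1 = np.sign(np.sin(2 * np.pi * 3 * t)) * 0.5
    s2 = rng.laplace(scale=0.3, size=fs)
    S = np.c_[s1, s2]                            # shape (n_samples, n_sources)

    A = np.array([[1.0, 0.6],                    # hypothetical mixing matrix
                  [0.4, 1.0]])
    X = S @ A.T                                  # simulated microphone mixtures

    ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
    Y = ica.fit_transform(X)                     # separated signals, up to scale and permutation
    W = ica.components_                          # estimated unmixing matrix

As the next paragraph notes, such separated signals Y are not perfect: each column still carries low-level residue of the other source.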
However, the ICA algorithm cannot completely separate the signals recorded by the microphones. In addition to the signal of the desired target speaker, each output path of the ICA algorithm also contains, at smaller amplitude, signal energy from the undesired speech of the other paths. Because the human auditory system is extremely sensitive, this undesired speech remains clearly audible even when it has been strongly suppressed. Post-processing such as noise reduction can further reduce the energy of the undesired speech in the ICA output, but it generally damages the desired signal.
Invention patent No. 200380109681.5 discloses a system and method for separating a mixture of audio signals into a desired audio signal and a noise signal. Microphones receive the mixed audio signals, and independent component analysis (ICA) with stability constraints processes the sound mixture. The ICA process recognizes and separates the target sound signal using predetermined characteristics of the desired sound signal. The filter coefficients are adapted with learning rules, and the filter weight update dynamics are stabilized to help convergence to a stable separated ICA result. Pre-processing and post-processing techniques and information are used to further reduce the effect of noise on the separated signals.
Invention patent No. 201110117022.2 discloses a frequency-domain blind separation ordering algorithm for convolutively mixed voice signals. The time-domain sequence is first transformed to the frequency domain, blind separation is performed on each frequency band using a frequency-domain ICA algorithm, and the bands are then ordered by a sorting algorithm. The sorting method is as follows: reference frequency bands are first selected and aligned, the remaining frequency bands are then ordered against the already ordered reference bands, bands whose ordering may be wrong are marked, and DOA estimation based on the separation matrix is used for complementary alignment.
Invention patent No. 201610866508.9 discloses a blind sound separation method and structure, a voice control system, and an electric appliance assembly. The method models the actual working environment as a linear instantaneous system and applies an ICA-based blind source separation technique to this linear instantaneous mixing system, thereby achieving noise reduction of the voice signal. The blind sound separation method comprises: performing noise-reduction pre-processing on the detected voice signals, where each voice signal is a linear superposition of the voice information of several signal sources; constructing an objective function on the pre-processed voice signals using a non-Gaussianity metric; and estimating, through an iterative algorithm, the separation matrix that maximizes the expected objective function.
Disclosure of Invention
In view of the characteristics of existing blind source separation methods, the present invention aims to provide a higher-performance voice separation method that reduces the residue of undesired signals.
In the method, the voice signals of N channels are first processed by a blind source separation module to obtain N separated voice signals, where N ≥ 2 and each separated signal contains residual voice signals from the other channels. The separated signals are then processed by a residual selection module to obtain N residual output signals. Each residual output signal is filtered by a residual suppression module to obtain an expected residual signal; the n-th separated voice signal output by the blind source separation module, which contains residual voice from the other channels, has the expected residual signal subtracted from it to form the n-th residual signal, which is the final n-th output voice signal, where n = 1, 2, …, N' and 1 ≤ N' ≤ N.
The voice signals of the N channels are input to the blind source separation module, which outputs N voice output signals, N being a natural number greater than 1. The blind source separation module uses a blind source separation algorithm, such as ICA, to separate the N input voice signals and outputs N voice signals that still contain residue from the other channels.
The residual selection module comprises N' selection processing channels. The n-th selection processing channel takes all N separated voice signals output by the blind source separation module as input and treats the n-th separated voice signal as its main signal. At each moment, the n-th selection processing channel computes the probability that the main signal is the target voice signal: when the probability is greater than or equal to a set threshold, it outputs a voice signal with amplitude 0 at that moment; when the probability is smaller than the set threshold, it outputs the main signal at the current moment as the n-th residual output signal of the residual selection module. The N' selection processing channels thus output N' residual output signals. The probability that the main signal is target voice is computed by phase comparison or signal amplitude comparison.
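For illustration only, one selection processing channel can be sketched as follows; the probability estimator prob_fn and the 0.5 threshold are placeholders introduced for this sketch, since the patent only states that phase or amplitude comparison may be used and that the threshold is set.

    import numpy as np

    def residual_select(main, others, prob_fn, threshold=0.5):
        """One selection processing channel (sketch).
        main: samples of the n-th separated signal; others: list of the remaining
        separated signals; prob_fn returns a target-speech probability in [0, 1]."""
        out = np.zeros_like(main)
        for i in range(len(main)):
            p = prob_fn(main[i], [o[i] for o in others])
            if p < threshold:
                out[i] = main[i]   # likely residue: pass it on for suppression
            # else: leave 0 (likely target speech, nothing to feed the suppressor)
        return out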
The residual suppression module comprises N' suppression processing channels. The n-th suppression processing channel takes the n-th voice output signal of the blind source separation module and the n-th residual output signal of the residual selection module as inputs, performs residual suppression, and outputs the result as the final n-th output voice signal.
The n-th suppression processing channel in the residual suppression module contains an adaptive filter, which filters the corresponding input residual output signal to obtain an expected residual signal. The expected residual signal is subtracted from the n-th voice output signal of the blind source separation module to form a residual signal, and the adaptive filter uses this residual signal with an iterative algorithm to update its filter coefficients. When the residual output signal received by the residual suppression module is 0, the adaptive filter does not update its coefficients. The n-th residual signal is the final n-th output voice signal.
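One suppression processing channel can be read as a standard adaptive interference canceller. A minimal sketch using an LMS update (the iterative algorithm named in claim 7) is shown below; the filter length K, step size mu and sample-by-sample processing are illustrative assumptions rather than the patent's specific implementation.

    import numpy as np

    def residual_suppress(bss_out, res_out, K=64, mu=0.01):
        """One suppression processing channel: LMS interference canceller (sketch)."""
        w = np.zeros(K)                      # adaptive filter coefficients
        buf = np.zeros(K)                    # last K samples of the residual output signal
        y = np.zeros(len(bss_out))
        for i in range(len(bss_out)):
            buf = np.roll(buf, 1)
            buf[0] = res_out[i]
            expected_residue = w @ buf       # expected residual signal
            e = bss_out[i] - expected_residue
            y[i] = e                         # error signal = final output of this path
            if res_out[i] != 0.0:            # freeze adaptation when the selection output is 0
                w += mu * e * buf            # LMS coefficient update
        return y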
Existing post-processing algorithms focus on reducing the residual noise in the output of a blind source separation algorithm and do not process the residual human voice from the other channels that remains after blind source separation. The present method uses an adaptive filter to reduce the residual voice signals of the other channels, lowering the audible energy of the residual voice without damaging the desired signal and thereby improving voice separation quality.
Detailed Description
The following describes a voice separation method according to the present invention using three voice signals as an example.
The voice signals of the three channels are input to the blind source separation module, which separates them using the ICA algorithm and outputs three separated voice signals B1, B2 and B3. Each separated signal contains residual voice from the other channels: the first signal B1 contains residue of B2 and B3, the second signal B2 contains residue of B1 and B3, and the third signal B3 contains residue of B1 and B2.
The residual selection module comprises three selection processing channels. Each selection processing channel takes the separated voice signals B1, B2 and B3 as input, with B1, B2 and B3 serving as the main signal of the first, second and third channel respectively. Each channel computes, at every moment, the probability that its main signal is the target voice signal; here the probability is evaluated through amplitude differences. For the first selection processing channel, a Fourier transform is applied to B1, B2 and B3, with B1 as the main signal. At each time-frequency bin, the amplitude difference between B1 and B2 and the amplitude difference between B1 and B3 are compared with an amplitude-difference threshold D1 (set to 0): if both differences are greater than or equal to D1, the output at that bin has amplitude 0; otherwise, the main signal at that bin is output.
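A sketch of this amplitude comparison for the first channel in the time-frequency domain follows; the use of SciPy's stft/istft, the frame length and the sampling rate are illustrative assumptions, while the threshold D1 = 0 follows the example above.

    import numpy as np
    from scipy.signal import stft, istft

    def select_channel1(b1, b2, b3, fs=16000, nperseg=512, D1=0.0):
        """Amplitude-difference selection for the first path, per time-frequency bin (sketch)."""
        _, _, B1 = stft(b1, fs=fs, nperseg=nperseg)
        _, _, B2 = stft(b2, fs=fs, nperseg=nperseg)
        _, _, B3 = stft(b3, fs=fs, nperseg=nperseg)
        d12 = np.abs(B1) - np.abs(B2)
        d13 = np.abs(B1) - np.abs(B3)
        target = (d12 >= D1) & (d13 >= D1)   # B1 dominates both others: treat as target speech
        R1 = np.where(target, 0.0, B1)       # zero target bins, keep residue-dominated bins
        _, r1 = istft(R1, fs=fs, nperseg=nperseg)
        return r1[:len(b1)]                  # first-path residual output signal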
The residual suppression module comprises three suppression processing channels. Taking the first channel as an example, the first suppression processing channel takes the first voice output signal B1 of the blind source separation module and the first residual output signal of the residual selection module as inputs, performs residual suppression, and outputs the result as the final first output voice signal.
The first suppression processing channel in the residual suppression module contains an adaptive filter of length K (K a natural number), which filters the corresponding input residual output signal to obtain an expected residual signal. The expected residual signal is subtracted from the first voice output signal B1 of the blind source separation module to form the residual signal S1, and the adaptive filter uses S1 with an iterative algorithm (for example, the LMS algorithm) to update its filter coefficients. When the residual output signal received by the residual suppression module is 0, the adaptive filter does not update its coefficients. The first residual signal S1 is the final first output voice signal.
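Combining the two sketches above for the first path of the three-channel example might look like the following; b1, b2 and b3 are hypothetical variable names for the time-domain ICA outputs B1, B2 and B3, and in practice the exact-zero test inside residual_suppress would likely be replaced by a small-magnitude tolerance, since the STFT/ISTFT round trip rarely yields exactly zero samples.

    # b1, b2, b3: time-domain separated signals B1, B2, B3 from the ICA front end.
    res1 = select_channel1(b1, b2, b3)           # first-path residual output signal
    out1 = residual_suppress(b1, res1, K=64)     # final first-path output voice signal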
Alternatively, if it is desired to separate only one or two speech signals, the residual selection module may include only one or two selection processing channels, and the residual suppression module may include only one or two suppression processing channels.
It should be understood that the foregoing embodiments are merely illustrative of the present invention and are not intended to limit it; any modification that does not depart from the spirit and scope of the present invention falls within the protection scope of the present invention.

Claims (7)

1. A method of speech separation, characterized by:
firstly, processing the voice signals of N channels through a blind source separation module to obtain N separated voice signals, wherein each separated voice signal contains residual voice signals from the other channels, and N ≥ 2; then processing the separated voice signals through a residual selection module to obtain N residual output signals; filtering the residual output signals through a residual suppression module to obtain expected residual signals, and subtracting the corresponding expected residual signal from each separated voice signal output by the blind source separation module to obtain a residual signal, namely the final output voice signal of that path; the adaptive filter of the residual suppression module updates the filter coefficients using the residual signal and an iterative algorithm;
the residual selection module comprises N' selection processing channels, the n-th selection processing channel takes all N separated voice signals output by the blind source separation module as input and takes the n-th separated voice signal as the main signal, and the n-th selection processing channel calculates, at each moment, the probability that the main signal is the target voice signal, where n = 1, 2, …, N' and 1 ≤ N' ≤ N: when the probability is greater than or equal to a set threshold, a voice signal with amplitude 0 is output at that moment; when the probability is smaller than the set threshold, the main signal at the current moment is output as the n-th residual output signal of the residual selection module;
the residual suppression module comprises N' suppression processing channels, and the n-th suppression processing channel takes the n-th voice output signal output by the blind source separation module and the n-th residual output signal output by the residual selection module as inputs, performs residual suppression, and outputs the result as the final n-th output voice signal.
2. A method of speech separation according to claim 1, wherein: the voice signals of the N channels are input to the blind source separation module, the blind source separation algorithm adopted by the blind source separation module separates the N input voice signals, and N voice signals containing the residual signals of the other channels are output.
3. A method of speech separation according to claim 2, wherein: the blind source separation algorithm adopts an ICA algorithm.
4. A method of speech separation according to claim 1, wherein: the probability that the main signal is the target voice signal at each moment is calculated by adopting a phase comparison method or a signal amplitude comparison method.
5. A method of speech separation according to claim 1, wherein: each suppression processing channel in the residual suppression module comprises an adaptive filter, and the adaptive filter filters the corresponding input residual output signal to obtain an expected residual signal; the expected residual signal is subtracted from the n-th voice output signal of the blind source separation module to form a residual signal, and the n-th residual signal is the final n-th output voice signal.
6. A method of speech separation according to claim 5, wherein: the adaptive filter updates the filter coefficients using the residual signal and an iterative algorithm; when the residual output signal received by the residual suppression module is 0, the adaptive filter does not update the filter coefficients.
7. The method of claim 6, wherein: the iterative algorithm adopts an LMS algorithm.
CN202110968974.9A 2021-08-23 2021-08-23 Voice separation method Active CN113470689B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110968974.9A CN113470689B (en) 2021-08-23 2021-08-23 Voice separation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110968974.9A CN113470689B (en) 2021-08-23 2021-08-23 Voice separation method

Publications (2)

Publication Number Publication Date
CN113470689A CN113470689A (en) 2021-10-01
CN113470689B true CN113470689B (en) 2024-01-30

Family

ID=77867013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110968974.9A Active CN113470689B (en) 2021-08-23 2021-08-23 Voice separation method

Country Status (1)

Country Link
CN (1) CN113470689B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1830026B * 2001-01-30 2011-06-15 Thomson Licensing Geometric source separation signal processing technique
KR20050115857A * 2002-12-11 2005-12-08 Softmax Inc. System and method for speech processing using independent component analysis under stability constraints
CN106356075B * 2016-09-29 2019-09-17 Hefei Midea Intelligent Technology Co., Ltd. Blind sound separation method, structure and speech control system and electric appliance assembly

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108962276A * 2018-07-24 2018-12-07 Beijing Santing Technology Co., Ltd. Speech separation method and device
EP3671739A1 (en) * 2018-12-21 2020-06-24 FRAUNHOFER-GESELLSCHAFT zur Förderung der angewandten Forschung e.V. Apparatus and method for source separation using an estimation and control of sound quality
CN112530452A * 2020-11-23 2021-03-19 Beijing Moran Cognitive Technology Co., Ltd. Post-filtering compensation method, device and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on blind source separation algorithms for multi-speaker overlapped speech signals; Li Kefeng et al.; Information & Communications, No. 1, pp. 29-32 *

Also Published As

Publication number Publication date
CN113470689A (en) 2021-10-01

Similar Documents

Publication Publication Date Title
Zhang et al. ADL-MVDR: All deep learning MVDR beamformer for target speech separation
US10403299B2 (en) Multi-channel speech signal enhancement for robust voice trigger detection and automatic speech recognition
JP4166706B2 (en) Adaptive beamforming method and apparatus using feedback structure
EP2183853B1 (en) Robust two microphone noise suppression system
JP5060631B1 (en) Signal processing apparatus and signal processing method
JP2009540378A (en) Signal separator, method for determining an output signal based on a microphone signal, and computer program
Caroselli et al. Adaptive Multichannel Dereverberation for Automatic Speech Recognition.
Van den Bogaert et al. Binaural cue preservation for hearing aids using an interaural transfer function multichannel Wiener filter
WO2020170907A1 (en) Signal processing device, learning device, signal processing method, learning method, and program
JPWO2014024248A1 (en) Beam forming equipment
US11647344B2 (en) Hearing device with end-to-end neural network
Marin-Hurtado et al. Perceptually inspired noise-reduction method for binaural hearing aids
CN113470689B (en) Voice separation method
CN114339539A (en) Multi-channel speech enhancement method adopting microphone array for pickup
AU778351B2 (en) Circuit and method for the adaptive suppression of noise
Hidri et al. About multichannel speech signal extraction and separation techniques
Aroudi et al. Cognitive-driven convolutional beamforming using EEG-based auditory attention decoding
Uhle et al. Determined source separation for microphone recordings using IIR filters
Chen et al. A cascaded speech enhancement for hearing aids in noisy-reverberant conditions
Ali et al. A noise reduction strategy for hearing devices using an external microphone
Park et al. Postprocessing with Wiener filtering technique for reducing residual crosstalk in blind source separation
US11838727B1 (en) Hearing aids with parallel neural networks
Zorilă et al. On reducing the effect of speaker overlap for CHiME-5
Douglas et al. Blind separation of acoustical mixtures without time-domain deconvolution or decorrelation
CN112770222A (en) Audio processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 310012 5-6 / F, block a, East Software Park Innovation Building, 90 Wensan Road, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Guoxin Microelectronics Co.,Ltd.

Country or region after: China

Address before: 310012 5-6 / F, block a, East Software Park Innovation Building, 90 Wensan Road, Hangzhou City, Zhejiang Province

Patentee before: HANGZHOU NATIONALCHIP SCIENCE & TECHNOLOGY Co.,Ltd.

Country or region before: China