CN106782595B

CN106782595B - Robust blocking matrix method for reducing voice leakage

Info

Publication number: CN106782595B
Application number: CN201611218157.7A
Authority: CN
Inventors: 曹裕行
Original assignee: Unisound Shanghai Intelligent Technology Co Ltd
Current assignee: Unisound Shanghai Intelligent Technology Co Ltd
Priority date: 2016-12-26
Filing date: 2016-12-26
Publication date: 2020-06-09
Anticipated expiration: 2036-12-26
Also published as: CN106782595A

Abstract

The invention discloses a robust blocking matrix method for reducing voice leakage, comprising the following steps: inputting a sound signal; acquiring a target voice signal from the sound signal using a fixed beam module; eliminating the target voice signal from the sound signal using a blocking matrix module to obtain a noise signal; estimating, using the fixed beam module, the prior probability that the target voice signal is present in the noise signal; updating the noise signal in the blocking matrix module according to the prior probability, so that the target voice signal remaining in the noise signal is eliminated and an updated noise signal is obtained; and eliminating, using a cancellation module, the noise signal output by the blocking matrix module from the target voice signal output by the fixed beam module to form and output an output signal. Before the cancellation module removes the residual noise from the target voice signal, the blocking matrix parameters of the blocking matrix module are updated so that target voice leaked into the noise signal is eliminated, which strengthens the blocking matrix module's ability to eliminate the target voice signal.

Description

Robust blocking matrix method for reducing voice leakage

Technical Field

The invention relates to the field of voice recognition, in particular to a robust blocking matrix method for reducing voice leakage.

Background

Microphone-array speech enhancement is widely used in communication, human-machine interaction, speech recognition systems and the like. Among these techniques the Generalized Sidelobe Cancellation (GSC) method is the most widely used, because it is easy to implement and performs well. The GSC is divided into an upper path and a lower path: the upper path is a fixed beam module (FBF), which estimates a reference signal for the target speech; the lower path comprises a blocking matrix module (BM) and a cancellation module (MC), which together remove the residual noise from the fixed beam output, the blocking matrix module eliminating the target speech signal to obtain a noise signal.

In many practical systems, the main cause of GSC performance degradation is speech leakage in the BM module: the BM does not completely block the target speech signal, so the leaked speech is subtracted from the speech signal in the FBF output and the target speech is partly cancelled. Conventional BM designs use either an adaptive BM or a fixed differential matrix. The performance of the differential matrix drops sharply when the microphone array system has errors or the target direction is estimated inaccurately, while the adaptive BM is sensitive to the step size of the adaptive weight update, and its convergence is a significant problem.

Disclosure of Invention

The invention aims to solve the technical problem of providing a robust blocking matrix method for reducing voice leakage, which can greatly reduce the voice leakage condition.

In order to achieve the technical effect, the invention discloses a robust blocking matrix method for reducing voice leakage, which comprises the following steps:

providing a sound signal;

inputting the sound signal into a fixed beam module and a blocking matrix module of a generalized sidelobe cancellation structure, wherein the generalized sidelobe cancellation structure has a first path and a second path connected in parallel, the fixed beam module is located on the first path, and the blocking matrix module is located on the second path; the second path is further provided with a cancellation module, the input of the cancellation module is connected to the output of the blocking matrix module, and the output of the cancellation module is connected to the output of the fixed beam module;

acquiring a target voice signal from the input sound signal using the fixed beam module, and outputting the target voice signal;

eliminating the target voice signal from the input sound signal using the blocking matrix module to obtain a noise signal;

estimating, using the fixed beam module, the prior probability that the target voice signal is present in the noise signal;

updating, in the blocking matrix module, the noise signal according to the prior probability, eliminating the target voice signal present in the noise signal, and obtaining and outputting an updated noise signal;

and eliminating, using the cancellation module, the noise signal output by the blocking matrix module from the target voice signal output by the fixed beam module to form and output an output signal.

By adopting the above technical scheme, the invention has the following beneficial effects. Before the cancellation module cancels the noise signal output by the blocking matrix module against the target voice signal output by the fixed beam module to remove the residual noise from the target voice signal, the prior probability that the target voice signal is present in the noise signal output by the blocking matrix module is estimated, and the blocking matrix parameters of the blocking matrix module are updated accordingly. The target voice signal that leaked into the noise signal is thereby eliminated, and the blocking matrix module's ability to eliminate the target voice signal is strengthened. This avoids the situation in which the blocking matrix module fails to block the target voice signal completely, so that the leaked target voice signal is subtracted from, and cancels, the target voice signal in the fixed beam module. Voice leakage is thus greatly reduced.

The robust blocking matrix method for reducing voice leakage is further improved in that the two-state speech model of the sound signal is:

$$H_0: X = N$$

$$H_1: X = S + N \quad \text{(Formula 1)}$$

where state $H_0$ denotes that only noise is present, $N$ is the noise signal, state $H_1$ denotes that both the noise signal and the target speech signal are present, and $S$ is the target speech signal.

The robust blocking matrix method for reducing voice leakage is further improved in that the sound signal is a microphone input signal, and the fixed beam module acquires a target voice signal from the microphone input signal and outputs it; the output $Y_{FBF}$ of the fixed beam module is:

$$Y_{FBF} = \sum_{i=1}^{M} w_i x_i \quad \text{(Formula 2)}$$

where $M$ is the number of microphones, $x_i$ is the $i$-th microphone input signal, $w$ is the weight vector of the fixed beam module, and $w_i$ is the weight of the $i$-th fixed beam.

The robust blocking matrix method for reducing voice leakage is further improved in that the weight w of the fixed beam module is calculated by the delay-and-sum method or the minimum sidelobe method.
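As an illustration of the delay-and-sum option, the following minimal sketch computes fixed beam weights for one frequency bin and forms the weighted sum of the microphone signals. The uniform linear array geometry, the far-field plane-wave model, the speed of sound, and all function names are assumptions made for this example; the specification does not prescribe them.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def mic_delay(i, spacing_m, angle_deg):
    """Arrival delay (s) at microphone i of an assumed uniform linear array."""
    return i * spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND


def delay_and_sum_weights(num_mics, spacing_m, angle_deg, freq_hz):
    """Weights w_i that undo the target's phase at each microphone, scaled by 1/M."""
    return [
        cmath.exp(2j * math.pi * freq_hz * mic_delay(i, spacing_m, angle_deg)) / num_mics
        for i in range(num_mics)
    ]


def fixed_beam_output(weights, mic_bins):
    """Y_FBF = sum_i w_i * x_i for one time-frequency bin."""
    return sum(w * x for w, x in zip(weights, mic_bins))


def plane_wave(num_mics, spacing_m, angle_deg, freq_hz):
    """Hypothetical test input: unit-amplitude far-field source, one frequency bin."""
    return [
        cmath.exp(-2j * math.pi * freq_hz * mic_delay(i, spacing_m, angle_deg))
        for i in range(num_mics)
    ]


w = delay_and_sum_weights(4, 0.05, 30.0, 1000.0)
on_target = fixed_beam_output(w, plane_wave(4, 0.05, 30.0, 1000.0))
off_target = fixed_beam_output(w, plane_wave(4, 0.05, -60.0, 1000.0))
```

A source in the steered direction passes with unit gain, while a source from another direction is attenuated because its phasors no longer add coherently.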

The robust blocking matrix method for reducing voice leakage is further improved in that the blocking matrix module eliminates the target voice signal from the microphone input signal to obtain a noise signal and outputs it; the output $Z$ of the blocking matrix module is:

$$Z = B X \quad \text{(Formula 3)}$$

where $Z = [z_1\ z_2\ \dots\ z_N]$ is the output signal of the blocking matrix module; $X = [x_1\ x_2\ \dots\ x_M]$ is the microphone input signal; and $B$ is the blocking matrix of the blocking matrix module.

The robust blocking matrix method for reducing voice leakage is further improved in that using the output $Y_{FBF}$ of the fixed beam module to estimate, through a conditional prior probability, the prior probability that the target speech signal is present in the noise signal $Z$ comprises:

estimating, by a controlled recursive averaging algorithm, the probability $P(H_1 \mid Y_{FBF})$ that the target speech signal is present in $Y_{FBF}$, so as to determine the prior probability $P(H_1)$ that the target speech signal is present in $Z$:

$$P(H_1)_k = \lambda P(H_1)_{k-1} + (1-\lambda) P(H_1 \mid Y_{FBF}) \quad \text{(Formula 4)}$$

where $H_1$ is the speech-presence state, $\lambda$ is the smoothing coefficient, and $k$ is the frame index;

the prior probability $P(H_0)$ that the target speech signal is absent from $Z$ is then obtained from

$$P(H_0) = 1 - P(H_1) \quad \text{(Formula 6)}$$
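The recursive smoothing of Formula 4 and the complement of Formula 6 can be sketched as follows. The specification gives no value for the smoothing coefficient λ, so the default of 0.9 is an assumption, and the function name is hypothetical.

```python
def update_speech_prior(prev_p_h1, p_h1_given_fbf, lam=0.9):
    """One frame of Formula 4, followed by Formula 6.

    prev_p_h1      -- P(H1) from the previous frame (k-1)
    p_h1_given_fbf -- P(H1 | Y_FBF) estimated on the fixed beam output
    lam            -- smoothing coefficient (0.9 is an assumed value)
    Returns (P(H1), P(H0)) for the current frame k.
    """
    p_h1 = lam * prev_p_h1 + (1.0 - lam) * p_h1_given_fbf
    return p_h1, 1.0 - p_h1


# Example: the prior drifts toward 1 while the FBF keeps detecting speech.
p_h1 = 0.5
history = []
for _ in range(3):
    p_h1, p_h0 = update_speech_prior(p_h1, 1.0)
    history.append(p_h1)
```

A larger λ makes the prior react more slowly to the frame-by-frame estimate from the fixed beam output.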

The robust blocking matrix method for reducing voice leakage is further improved in that the blocking matrix module updating the noise signal according to the prior probability, eliminating the target voice signal present in the noise signal, and obtaining an updated noise signal comprises the following steps:

Step one: solving the conditional prior probability $P(H_1 \mid Z)$ that the target speech signal is present in $Z$

a. Solving the posterior signal-to-noise ratio $\gamma$

where

is an estimate of the noise signal;

b. Solving the prior signal-to-noise ratio $\varepsilon$ by the decision-directed method

where $\eta$ is a smoothing coefficient with a value of 0.92, $\gamma_{old}$ is the posterior signal-to-noise ratio of the previous frame, $G_{H_1}$ is the speech gain in state $H_1$, and MAX is the maximum function;

c. Solving the speech presence likelihood GLR

where

d. Solving the conditional prior probability $P(H_1 \mid BM)$

Step two: correcting the signal-to-noise ratio and updating the speech gain

a. Correcting the signal-to-noise ratio using the prior probability $P(H_1)$

where

is the corrected posterior signal-to-noise ratio,

is the corrected prior signal-to-noise ratio;

b. Updating the speech gain $G_{H_1}$,

where

exp is the exponential operator, e is the natural constant, and x is the integration variable;

Step three: estimating the dynamic noise smoothing coefficient

where $\alpha$ = 0.92;

Step four: estimating the noise

where $E$ is the expectation operator, estimated using the following equation:

where $k$ is the frame index, $\varepsilon$ is the prior signal-to-noise ratio, and $P(H_0 \mid BM) = 1 - P(H_1 \mid BM)$;

Step five: calculating the speech gain

The updated speech gain Gain is estimated using the optimally modified log-spectral amplitude estimation method

where $G_{min}$ is the lower bound on the gain when speech is absent, with a value of 0.01,

is the speech gain in state $H_1$,

is the speech gain in state $H_0$;

Step six: calculating the updated noise signal $Z'$

$$Z' = Z\,(1 - \text{Gain}) \quad \text{(Formula 17)}$$
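The formulas for step one are not reproduced in this text. As a hedged illustration only, the sketch below uses the standard textbook forms of the posterior SNR, the decision-directed prior SNR, the generalized likelihood ratio, and the resulting speech presence probability. These forms are an assumption and may differ in detail from the formulas of the specification; only the smoothing value η = 0.92 is taken from the text, and all function names are hypothetical.

```python
import math


def posterior_snr(z_power, noise_power):
    """gamma = |Z|^2 / (noise power estimate), guarded against division by zero."""
    return z_power / max(noise_power, 1e-12)


def decision_directed_prior_snr(gamma, gamma_old, gain_h1_old, eta=0.92):
    """Textbook decision-directed estimate of the prior SNR (epsilon in the text)."""
    return eta * gain_h1_old ** 2 * gamma_old + (1.0 - eta) * max(gamma - 1.0, 0.0)


def generalized_likelihood_ratio(gamma, eps, p_h1):
    """Likelihood ratio of H1 against H0, weighted by the prior odds P(H1)/P(H0)."""
    p_h1 = min(max(p_h1, 1e-6), 1.0 - 1e-6)           # keep the odds finite
    v = min(gamma * eps / (1.0 + eps), 500.0)          # clamp to avoid overflow
    return (p_h1 / (1.0 - p_h1)) * math.exp(v) / (1.0 + eps)


def speech_presence_probability(glr):
    """P(H1 | observation) = GLR / (1 + GLR)."""
    return glr / (1.0 + glr)


# Example: a strong observation pushes the presence probability toward 1.
gamma = posterior_snr(z_power=20.0, noise_power=1.0)
eps = decision_directed_prior_snr(gamma, gamma_old=18.0, gain_h1_old=0.9)
p_speech = speech_presence_probability(generalized_likelihood_ratio(gamma, eps, 0.5))
```

With no evidence (γ = 1, ε = 0) the probability reduces to the prior P(H₁), which shows how the prior obtained from the fixed beam output steers the decision.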

Drawings

Fig. 1 is a functional block diagram of a robust blocking matrix method for reducing voice leakage according to the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.

The main task of speech enhancement is to suppress background noise and interference, thereby making subsequent processing more robust to input noise. A traditional single-channel speech enhancement algorithm receives only one channel of signal and has no reference signal, so it can suppress noise and enhance speech only by exploiting the statistical characteristics of the noisy speech signal in the time and frequency domains. However, speech is often buried in noise and interference in both domains and is difficult to separate accurately, which leaves relatively little room for improving such algorithms. Microphone arrays opened a new direction for speech enhancement: they exploit the difference in spatial position between the target speech and the interference, together with the correlation between the microphone signals, and use a beamforming algorithm to suppress background noise and interference whose direction of arrival differs from that of the speech. This approach has gradually become a research focus in the field of speech enhancement.

In the existing beamforming algorithm, an adaptive beamforming algorithm adopting a Generalized Sidelobe Cancellation (GSC) structure plays an important role.

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.

Referring to fig. 1, fig. 1 is a functional block diagram of a robust blocking matrix method for reducing voice leakage according to the present invention, and is also a diagram of a generalized sidelobe canceling structure.

The generalized sidelobe canceling structure (GSC) is divided into an upper path and a lower path: a first path 101 and a second path 102, connected in parallel with each other; in the figure, the first path 101 is the upper path and the second path 102 is the lower path. The structure mainly comprises three functional modules: a fixed beam module (FBF) 11, a blocking matrix module (BM) 12, and a cancellation module (multi-input filter, MC) 13. The fixed beam module (FBF) 11 is located on the first path 101, and the blocking matrix module (BM) 12 and the cancellation module (MC) 13 are located on the second path 102. The input of the fixed beam module (FBF) 11 is connected to the input of the blocking matrix module (BM) 12, the output of the blocking matrix module (BM) 12 is connected to the input of the cancellation module (MC) 13, and the output of the cancellation module (MC) 13 is connected to the output of the fixed beam module (FBF) 11; an addition/subtraction ("+/-") operation is performed at the node where these two outputs meet.

The fixed beam module (FBF) is used for estimating a reference signal of the target voice, the FBF adopts a filter with a fixed coefficient to filter original channel signals, and adds the filtered channel signals, so that interference and noise of an incoming wave direction different from the target voice signal are suppressed, and the primary enhancement of the target voice signal is realized.

The blocking matrix module (BM) eliminates the target speech signal to obtain a noise signal. The BM adaptively filters the original signal of each channel, using the FBF output as the reference signal, to remove the target speech component, yielding N channels of noise signals (N being the number of microphones). The adaptive filter used here may be a CCAF (coefficient-constrained adaptive filter).

Finally, the cancellation module (MC) removes the residual noise from the fixed beam output. The MC uses the N channels of noise signals to perform further adaptive noise reduction on the FBF output, enhancing the target speech signal again to obtain the final output. The adaptive filter used here may be an NCAF (norm-constrained adaptive filter).
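The role of the cancellation module can be sketched with a deliberately simplified adaptive canceller: one real-valued coefficient per noise channel, a normalized LMS update, and a cap on the coefficient norm in the spirit of the constrained filter mentioned above. The step size, the norm limit, and the function name are assumptions for this example, not values from the specification.

```python
import math


def cancel_step(y_fbf, z, h, mu=0.1, max_norm=1.0, eps=1e-8):
    """One adaptation step of a simplified norm-constrained canceller.

    y_fbf    -- fixed beam output sample (target speech plus residual noise)
    z        -- list of noise reference samples from the blocking matrix
    h        -- current canceller coefficients, one per reference
    Returns (output sample, updated coefficients).
    """
    # Subtract the noise estimate from the fixed beam output.
    out = y_fbf - sum(hj * zj for hj, zj in zip(h, z))
    # Normalized LMS update driven by the output sample.
    power = sum(zj * zj for zj in z) + eps
    new_h = [hj + mu * out * zj / power for hj, zj in zip(h, z)]
    # Norm constraint: rescale the coefficients if they grow too large.
    norm = math.sqrt(sum(hj * hj for hj in new_h))
    if norm > max_norm:
        new_h = [hj * max_norm / norm for hj in new_h]
    return out, new_h


# Example: the FBF output contains only leaked noise equal to 0.5 * z.
h = [0.0]
for n in range(60):
    z = [1.0 if n % 2 == 0 else -1.0]
    out, h = cancel_step(0.5 * z[0], z, h, mu=0.5)
```

The same update also explains the leakage problem: if target speech is present in z, the canceller adapts to it and subtracts speech from the fixed beam output, which is the failure the method is designed to prevent.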

The invention provides a robust blocking matrix method for reducing voice leakage. It addresses the problem that, in the current Generalized Sidelobe Cancellation (GSC) method, the blocking matrix (BM) module does not completely block the target voice signal, so that the leaked target voice signal is subtracted from, and cancels, the target voice signal in the output of the fixed beam module (FBF).

The specific implementation method of the robust blocking matrix method for reducing the voice leakage comprises the following steps:

s001: providing a sound signal, wherein the sound signal is a speech signal containing noise;

s002: inputting the sound signal into a fixed beam module 11(FBF) and a blocking matrix module 12(BM) of a generalized side lobe cancellation structure, where the generalized side lobe cancellation structure has a first path 101 and a second path 102 connected in parallel, the fixed beam module 11 is located in the first path 101, and the blocking matrix module 12 is located in the second path 102; the second path 102 is further provided with a cancellation Module (MC)13, an input of the cancellation module 13 is connected to an output of the blocking matrix module 12, and an output of the cancellation module 13 is connected to an output of the fixed beam module 11;

s003: acquiring a target voice signal from the input voice signal by using the fixed beam module 11, and outputting the target voice signal;

s004: eliminating a target voice signal from the input voice signal by using a blocking matrix module 12 to obtain a noise signal;

s005: estimating the prior probability that the target speech signal is present in the noise signal by using the fixed beam module 11;

s006: the blocking matrix module 12 updates the noise signal according to the prior probability, eliminates the target voice signal present in the noise signal, and obtains and outputs the updated noise signal;

s007: the cancellation module 13 eliminates the noise signal output by the blocking matrix module 12 from the target speech signal output by the fixed beam module 11 to form an output signal, which is then output.

Taking a microphone input signal as an example of a sound signal, inputting the microphone input signal into a generalized sidelobe canceling structure, and performing voice enhancement on the input microphone input signal by using the robust blocking matrix method of the present invention, specifically as follows:

(I) Inputting a microphone input signal;

the two-state speech model of the microphone input signal is:

$$H_0: X = N$$

$$H_1: X = S + N \quad \text{(Formula 1)}$$

(II) The fixed beam module 11 (FBF) acquires a target voice signal from the microphone input signal and outputs it;

the output $Y_{FBF}$ of the fixed beam module (FBF) is:

$$Y_{FBF} = \sum_{i=1}^{M} w_i x_i \quad \text{(Formula 2)}$$

where $M$ is the number of microphones, $x_i$ is the $i$-th microphone input signal, $w$ is the weight vector of the fixed beam module, and $w_i$ is the weight of the $i$-th fixed beam; the weight $w$ of the fixed beam module may be calculated by the delay-and-sum method or the minimum sidelobe method.

(III) The blocking matrix module (BM) eliminates the target voice signal from the microphone input signal to obtain a noise signal and outputs it;

the output $Z$ of the blocking matrix module (BM) is:

$$Z = B X \quad \text{(Formula 3)}$$

where $Z = [z_1\ z_2\ \dots\ z_N]$ is the output signal (noise signal) of the blocking matrix module; $X = [x_1\ x_2\ \dots\ x_M]$ is the microphone input signal; and $B$ is the blocking matrix of the blocking matrix module, obtained by the commonly used difference method.
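One common form of a difference-based blocking matrix subtracts adjacent, time-aligned microphone signals. The sketch below assumes that form, which yields M-1 outputs; the exact matrix used in the specification is not given, so the layout and the function names are assumptions.

```python
def difference_blocking_matrix(num_mics):
    """(M-1) x M matrix whose rows are [.., 1, -1, ..]: adjacent-channel differences."""
    rows = []
    for r in range(num_mics - 1):
        row = [0.0] * num_mics
        row[r] = 1.0
        row[r + 1] = -1.0
        rows.append(row)
    return rows


def apply_blocking_matrix(b_matrix, x):
    """Z = B X for one sample (or one frequency bin) across all microphones."""
    return [sum(b * xi for b, xi in zip(row, x)) for row in b_matrix]


B = difference_blocking_matrix(4)
# A perfectly aligned target is identical on every channel and is blocked.
blocked = apply_blocking_matrix(B, [0.7, 0.7, 0.7, 0.7])
# A 10 % gain error on the third microphone lets the target leak through.
leaked = apply_blocking_matrix(B, [0.7, 0.7, 0.77, 0.7])
```

The second call shows the weakness described in the Background: a small array error is enough for target speech to appear in the noise reference.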

(IV) Using the output $Y_{FBF}$ of the fixed beam module (FBF) to estimate the prior probability $P(H_1)$ that the target speech signal is present in the output signal $Z$ (noise signal) of the blocking matrix module (BM), as follows:

$$P(H_1)_k = \lambda P(H_1)_{k-1} + (1-\lambda) P(H_1 \mid Y_{FBF}) \quad \text{(Formula 4)}$$

where $H_1$ is the speech-presence state, $\lambda$ is the smoothing coefficient, and $k$ is the frame index;

for the controlled recursive averaging algorithm, see Israel Cohen, "Noise Spectrum Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 5, September 2003, pp. 466-475, which describes its operating principle in detail.

The prior probability $P(H_0)$ that the target speech signal is absent from the output signal $Z$ (noise signal) of the blocking matrix module (BM) is then obtained from

$$P(H_0) = 1 - P(H_1) \quad \text{(Formula 6)}$$

(V) The blocking matrix module (BM) updates its output noise signal according to the prior probability $P(H_1)$ estimated from the fixed beam module (FBF), eliminating the target speech signal still present in the noise signal to obtain an updated noise signal. The specific process is as follows:

Step one: solving the conditional prior probability that the target speech signal is present in $Z$

a. Solving the posterior signal-to-noise ratio $\gamma$

where

is an estimate of the noise signal;

b. Solving the prior signal-to-noise ratio $\varepsilon$ by the decision-directed method

c. Solving the speech presence likelihood GLR

where

exp is the exponential operator.

d. Solving the conditional prior probability $P(H_1 \mid BM)$

Step two: correcting the signal-to-noise ratio and updating the speech gain

a. Correcting the signal-to-noise ratio using the prior probability $P(H_1)$

where

is the corrected posterior signal-to-noise ratio,

is the corrected prior signal-to-noise ratio;

b. Updating the speech gain $G_{H_1}$,

where

Step three: estimating the dynamic noise smoothing coefficient

where $\alpha$ = 0.92;

Step four: estimating the noise

where $E$ is the expectation operator, estimated using the following equation:

Step five: calculating the speech gain

The updated speech gain Gain is estimated using the optimally modified log-spectral amplitude (OM-LSA) estimation method

where $G_{min}$ is the lower bound on the gain when speech is absent; its value is 0.01, i.e. -20 dB (10 × log10(0.01) = -20), dB being the decibel unit;

is the speech gain in state $H_1$,

is the speech gain in state $H_0$; to prevent excessive attenuation, however, $G_{H_0}$ is usually set to $G_{min}$, the lower gain bound in state $H_0$.

For the OM-LSA (Optimally Modified Log-Spectral Amplitude) method, see Israel Cohen and Baruch Berdugo, "Speech enhancement for non-stationary noise environments," J. Acoust. Soc. Am. 87(2), February 1990, Acoustical Society of America, pp. 820-857, which describes its implementation principle in detail.

Step six: calculating the updated noise signal $Z'$

$$Z' = Z\,(1 - \text{Gain}) \quad \text{(Formula 17)}$$

By adopting the method, the blocking matrix module updates the noise signal according to the prior probability, eliminates the target voice signal existing in the noise signal and finally outputs the updated noise signal.
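Steps five and six can be illustrated with the standard OM-LSA gain, which combines the log-spectral amplitude gain under H₁ with the floor Gmin under H₀, weighted by the speech presence probability. The gain formulas below are the commonly published forms and are an assumption here, since the formulas of the specification are not reproduced in this text; the exponential integral routine and all function names are likewise illustrative. Only Gmin = 0.01 and Formula 17 come from the text.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp_integral_e1(x):
    """E1(x) = integral from x to infinity of exp(-t)/t dt, for x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total, term = 0.0, 1.0
        for k in range(1, 60):
            term *= -x / k              # term is now (-x)^k / k!
            total -= term / k
        return -EULER_GAMMA - math.log(x) + total
    # Continued fraction evaluated with the modified Lentz method.
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-x)


def lsa_gain(gamma, eps):
    """Log-spectral amplitude gain under H1 (standard published form)."""
    v = max(gamma * eps / (1.0 + eps), 1e-10)
    return eps / (1.0 + eps) * math.exp(0.5 * exp_integral_e1(v))


def omlsa_gain(gain_h1, p_h1, g_min=0.01):
    """Gain = G_H1 ** P(H1|BM) * Gmin ** P(H0|BM)."""
    return (gain_h1 ** p_h1) * (g_min ** (1.0 - p_h1))


def update_noise_reference(z, gain):
    """Formula 17: Z' = Z * (1 - Gain), removing the estimated speech part of Z."""
    return z * (1.0 - gain)


g_min_db = 10.0 * math.log10(0.01)      # the -20 dB floor quoted in the text
z_updated = update_noise_reference(2.0, omlsa_gain(lsa_gain(20.0, 15.0), 0.9))
```

When speech is judged absent (P(H₁|BM) near 0) the gain falls to Gmin and Z is left almost unchanged; when speech is judged present the gain approaches 1 and the leaked speech is removed from the noise reference.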

(VI) The cancellation module eliminates the noise signal output by the blocking matrix module from the target voice signal output by the fixed beam module to form an output signal, which is then output.

In the robust blocking matrix method for reducing voice leakage, before the cancellation module cancels the noise signal output by the blocking matrix module against the target voice signal output by the fixed beam module to remove the residual noise from the target voice signal, the prior probability that the target voice signal is present in the noise signal output by the blocking matrix module is estimated, and the blocking matrix parameters of the blocking matrix module are updated accordingly. The target voice signal missed in the noise signal is thereby eliminated, and the blocking matrix module's ability to eliminate the target voice signal is strengthened. This avoids the situation in which the blocking matrix module does not completely block the target voice signal, so that the leaked target voice signal is subtracted from, and cancels, the target voice signal in the fixed beam module. Voice leakage is thus greatly reduced.

It should be noted that the structures, proportions, sizes and the like shown in the drawings accompanying this specification serve only to complement the disclosure so that those skilled in the art can read and understand it; they do not limit the conditions under which the invention can be practiced and therefore have no substantive technical significance. Any structural modification, change in proportion or adjustment of size that does not affect the efficacy and achievable purpose of the invention still falls within the scope of the invention. In addition, terms such as "upper", "lower", "left", "right", "middle" and "one" are used in this specification for clarity of description only and do not limit the scope of the invention; changes or adjustments of their relative relationships, without substantive change to the technical content, are likewise regarded as within the scope of the invention.

Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A robust blocking matrix method for reducing voice leakage, comprising the following steps:

providing a sound signal;

eliminating, using the cancellation module, the noise signal output by the blocking matrix module from the target voice signal output by the fixed beam module to form and output an output signal;

the two-state speech model of the sound signal is:

$$H_0: X = N$$

$$H_1: X = S + N \quad \text{(Formula 1)}$$

where state $H_0$ denotes that only noise is present, $N$ is the noise signal, state $H_1$ denotes that both the noise signal and the target speech signal are present, $S$ is the target speech signal, $X = [x_1\ x_2\ \dots\ x_M]$ is the microphone input signal, and $M$ is the number of microphones;

the fixed beam module acquires a target voice signal from the microphone input signal and outputs it; the output $Y_{FBF}$ of the fixed beam module is:

$$Y_{FBF} = \sum_{i=1}^{M} w_i x_i \quad \text{(Formula 2)}$$

where $x_i$ is the $i$-th microphone input signal, $w$ is the weight vector of the fixed beam module, and $w_i$ is the weight of the $i$-th fixed beam;

the blocking matrix module eliminates the target voice signal from the microphone input signal to obtain a noise signal and outputs it; the output $Z$ of the blocking matrix module is:

$$Z = B X \quad \text{(Formula 3)}$$

where $Z = [z_1\ z_2\ \dots\ z_N]$ is the output signal of the blocking matrix module, and $B$ is the blocking matrix of the blocking matrix module;

using the output $Y_{FBF}$ of the fixed beam module to estimate, through a conditional prior probability, the prior probability that the target speech signal is present in the noise signal $Z$, comprising:

estimating, by a controlled recursive averaging algorithm, the probability $P(H_1 \mid Y_{FBF})$ that the target speech signal is present in $Y_{FBF}$, so as to determine the prior probability $P(H_1)$ that the target speech signal is present in $Z$:

$$P(H_1)_k = \lambda P(H_1)_{k-1} + (1-\lambda) P(H_1 \mid Y_{FBF}) \quad \text{(Formula 4)}$$

where

$$P(H_0) = 1 - P(H_1) \quad \text{(Formula 6)};$$

the process in which the blocking matrix module updates the noise signal according to the prior probability, eliminates the target voice signal present in the noise signal, and obtains an updated noise signal comprises the following steps:

a. Solving the posterior signal-to-noise ratio $\gamma$

where

is an estimate of the noise signal;

c. Solving the speech presence likelihood GLR

where

d. Solving the conditional prior probability $P(H_1 \mid BM)$

Step two: correcting the signal-to-noise ratio and updating the speech gain

a. Correcting the signal-to-noise ratio using the prior probability $P(H_1)$

where

is the corrected posterior signal-to-noise ratio,

is the corrected prior signal-to-noise ratio;

b. Updating the speech gain $G_{H_1}$,

where

Step three: estimating the dynamic noise smoothing coefficient

where $\alpha$ = 0.92;

Step four: estimating the noise

where $E$ is the expectation operator, estimated using the following equation:

Step five: calculating the speech gain

is the speech gain in state $H_1$,

is the speech gain in state $H_0$;

Step six: calculating the updated noise signal $Z'$

$$Z' = Z\,(1 - \text{Gain}) \quad \text{(Formula 17)}.$$

2. The robust blocking matrix method for reducing voice leakage according to claim 1, characterized in that the weight w of the fixed beam module is calculated by the delay-and-sum method or the minimum sidelobe method.