CN117636894A - Voice dereverberation method based on multi-channel blind identification and multi-channel equalization - Google Patents

Voice dereverberation method based on multi-channel blind identification and multi-channel equalization

Info

Publication number
CN117636894A
Authority
CN
China
Prior art keywords: vector, matrix, channel, calculating, filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311670403.2A
Other languages
Chinese (zh)
Inventor
何宏森
邱志民
陈景东
喻翌
李小霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Southwest University of Science and Technology
Original Assignee
Northwestern Polytechnical University
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University and Southwest University of Science and Technology
Priority to CN202311670403.2A
Publication of CN117636894A
Legal status: Pending


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Filters That Use Time-Delay Elements (AREA)

Abstract

The invention discloses a voice dereverberation method based on multi-channel blind identification and multi-channel equalization. To solve the multi-channel blind identification problem, a variable regularization function is designed on the basis of the normalized multi-channel frequency-domain least-mean-square (NMCFLMS) algorithm, and the signal-to-noise ratio, the output signal energy and the filter length are incorporated into it so that the algorithm is robust to additive noise and to the non-stationarity of speech. In addition, to give the proposed method better tracking performance under time-varying conditions, a mechanism that refreshes the regularization parameter according to the mean square error is proposed. In this way, a faster convergence speed and tracking speed are obtained in a noisy environment, and the channel-equalization-based speech dereverberation achieves a better dereverberation effect; in particular, the dereverberation performance is markedly improved during the transient state of the adaptive filter in a low signal-to-noise-ratio environment.

Description

Voice dereverberation method based on multi-channel blind identification and multi-channel equalization
Technical Field
The invention belongs to the technical field of voice dereverberation, and particularly relates to a voice dereverberation method based on multi-channel blind identification and multi-channel equalization.
Background
Blind identification is a method of estimating the impulse response of a system using only the system output signals, and it plays an important role in speech processing technologies such as speech noise reduction, beamforming, speech dereverberation and sound source localization. In recent years, scholars have conducted extensive research on batch and adaptive algorithms for this problem. Among these algorithms, the normalized multi-channel frequency-domain least-mean-square (NMCFLMS) algorithm is implemented in the frequency domain using the fast Fourier transform (FFT) and is computationally efficient, which makes it particularly attractive for real-time processing systems. Meanwhile, to accelerate the convergence of the adaptive filter and to reduce the gradient-noise amplification caused by large channel output amplitudes, the algorithm uses a Newton iteration. Newton's iteration is a well-known optimization method in which regularization of the Hessian matrix is crucial. However, the NMCFLMS algorithm constructs its regularization factor using only the first block of the system output signal, so the regularization factors required under different speech segments and different signal-to-noise environments differ greatly, and it is difficult to obtain a suitable regularization parameter.
Scholars have proposed a number of solutions to the regularization problem of the Hessian matrix; the most classical approach is to bias the search direction of Newton's method toward the steepest-descent direction. This strategy can be implemented by adding a suitably scaled identity matrix to the Hessian matrix. In adaptive filtering algorithms, introducing a regularization parameter both ensures numerical stability and enhances the convergence performance of the filter. In recent years a large number of regularization methods have been proposed. Among them, constant regularization methods do not update the regularization parameter during the iteration of the filter; the optimal regularization parameter is determined from information about the excitation signal and the signal-to-noise ratio, which gives the adaptive filter better robustness in a noisy environment, but in practice a constant regularization method trades off convergence speed against the steady-state error of the filter. The other class is variable regularization methods, in which the regularization parameter exploits relevant data in real time during the filter iterations, including the error signal, the estimated noise, the system input and so on; because of this real-time nature, variable regularization converges faster than constant regularization. However, for blind system identification, the existing methods cannot be applied directly, because the input signal is unavailable and the system is time-varying.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and to provide a voice dereverberation method based on multi-channel blind identification and multi-channel equalization. A variable regularization function is designed, into which the signal-to-noise ratio, the output signal energy and the filter length are incorporated, so that the method is robust to additive noise and to the non-stationarity of speech and has better tracking performance under time-varying conditions. In this way, a faster convergence speed and tracking speed are obtained in a noisy environment, and the channel-equalization-based speech dereverberation achieves a better dereverberation effect; in particular, the dereverberation performance is markedly improved during the transient state of the adaptive filter in a low signal-to-noise-ratio environment.
To achieve the above object, the present invention provides a speech dereverberation method based on multi-channel blind recognition and multi-channel equalization, which is characterized by comprising the steps of:
(1) Initializing
Initialize the length-L filter vector of the k-th channel at time 0 as:
where T denotes transposition, M is the number of channels, and L is the length of the acoustic channel impulse response vector;
initialize the power spectrum matrix of the k-th channel at time 0 as:
P_k(0) = 0_{L×L}
initialize the reverberant signal variance as:
(2) Collect the sound signals of the M microphones. The samples of the i-th channel are denoted x_i(n), i = 1, 2, …, M, n = 0, 1, 2, …, where n is the sample time index. Construct the signal vector x_i(m), i = 1, 2, …, M, corresponding to block time index m, i.e. the output signal vector of the i-th microphone:
x_i(m) = [x_i(mL−L)  x_i(mL−L+1)  …  x_i(mL+L−1)]^T
(3) Starting from the block time index m = 1, obtain the dereverberated speech vector:
3.1) Obtain the variable regularization function δ(m)
3.1.1) Calculate the microphone output signal variance:
where ‖·‖_2 denotes the 2-norm of a vector and the remaining coefficient is a smoothing factor;
3.1.2) Calculate the reverberant signal variance:
where λ_1 is a forgetting factor and SNR is the output signal-to-noise ratio;
3.1.3) Calculate the bias factor b(m) of the regularization function:
where v denotes a coefficient factor;
3.1.4) Calculate the variable regularization function δ(m):
where α is a parameter controlling the overall range of the function curve, κ is a parameter controlling the steepness of the curve (the larger κ, the steeper the transition), and ξ is a parameter controlling the position of the rising point of the curve, with initial value ξ_0;
3.2) Obtain the filter vector
3.2.1) Calculate the frequency-domain extended filter vector:
where F_{2L×2L} is a Fourier matrix of size 2L×2L and 0 is a zero column vector of length L;
3.2.2) Calculate the spectral matrix of the input signal:
where diag[·] denotes expanding a vector into a diagonal matrix;
3.2.3) Calculate the power spectrum matrix P_k(m):
where λ_2 is a forgetting factor and the superscript * denotes the complex conjugate;
3.2.4) Calculate the inverse of the power spectrum matrix:
where I_{2L×2L} is an identity matrix of size 2L×2L;
3.2.5) Calculate the frequency-domain error vector e_ij(m) of the i-th and j-th channels:
where F_{L×L} is a Fourier matrix of size L×L, 0_{L×L} is a zero matrix of size L×L, I_{L×L} is an identity matrix of size L×L, and F^{-1}_{2L×2L} is an inverse Fourier matrix of size 2L×2L;
3.2.6) Calculate the frequency-domain extended filter vector:
where ρ is the step-size factor of the adaptive filter, the error vector is extended in the frequency domain, F_{2L×2L} is a Fourier matrix of size 2L×2L, and F^{-1}_{L×L} is an inverse Fourier matrix of size L×L;
3.2.7) Calculate the filter vector:
where the subscript 1:L denotes taking the first L values;
3.3) Obtain the dereverberated speech vector
3.3.1) Construct the impulse response matrix:
where the impulse response matrix is of size L_c × L_g, with L_c = L + L_g − 1 and L_g the length of the equalization filter vector;
3.3.2) Construct the multi-channel impulse response matrix;
3.3.3) Calculate the equalization filter matrix g(m):
where g_k(m) is the equalization filter vector of the k-th channel and d is the desired equalized impulse response vector;
3.3.4) Calculate the dereverberated speech vector:
where conv(·) denotes the convolution function;
3.3.5) Detect the mean square error MSE(m)
First, calculate the mean square error MSE(m):
where the superscript H denotes the conjugate transpose; then make the update decision:
if MSE(m) is greater than the upper threshold γ, refresh the variable regularization function δ(m) by setting the parameter ξ = m + ξ_0, where ξ_0 is the initial value; if MSE(m) is not greater than the upper threshold γ, keep the parameter ξ unchanged;
set m = m + 1 and return to step 3.1.1 to compute the dereverberated speech vector for the next block time.
The object of the invention is achieved as follows:
The invention provides a voice dereverberation method based on multi-channel blind identification and multi-channel equalization. To solve the multi-channel blind identification problem, a variable regularization function is designed on the basis of the normalized multi-channel frequency-domain least-mean-square (NMCFLMS) algorithm, into which the signal-to-noise ratio, the output signal energy and the filter length are incorporated so that the algorithm is robust to additive noise and to the non-stationarity of speech. In addition, to give the proposed method better tracking performance under time-varying conditions, a mechanism that refreshes the regularization parameter according to the mean square error is proposed. In this way, a faster convergence speed and tracking speed are obtained in a noisy environment, and the channel-equalization-based speech dereverberation achieves a better dereverberation effect; in particular, the dereverberation performance is markedly improved during the transient state of the adaptive filter in a low signal-to-noise-ratio environment.
Drawings
FIG. 1 is a schematic diagram of speech dereverberation based on multi-channel equalization;
FIG. 2 is a schematic diagram of a variable regularization function;
FIG. 3 is a flow chart of a speech dereverberation method based on multi-channel blind recognition and multi-channel equalization in accordance with the present invention;
FIG. 4 is a graph of two sets of acoustic impulse responses measured in a real room, where (a) is the sound source location corresponding to the first set of impulse responses and (b) is the sound source location corresponding to the second set of impulse responses;
FIG. 5 is a comparison of the convergence performance of NMCFLMS and the proposed VR-NMCFLMS algorithm for blind identification of six time-invariant acoustic channels with a white Gaussian sequence as excitation in a white Gaussian noise environment;
FIG. 6 is a graph comparing convergence performance of six acoustic channels for a blind recognition of NMCFLMS and the proposed VR-NMCFLMS algorithm when speech is used as an excitation signal in a white Gaussian noise environment;
FIG. 7 is a comparison of NPM values after 10000 iterations of NMCFLMS and the VR-NMCFLMS algorithm provided by the present invention for blind identification of six acoustic channels at different SNRs, with white Gaussian additive noise and speech as the excitation;
FIG. 8 is a graph comparing NPM values after 2000 iterations of NMCFLMS and VR-NMCFLMS algorithm of the present invention for blind identification of six acoustic channels at different SNR with white Gaussian additive noise and speech as excitation conditions;
FIG. 9 is a plot of MSE comparisons for six acoustic channels that vary with time for NMCFLMS and the VR-NMCFLMS algorithm of the present invention blindly identified with white Gaussian additive noise and speech as excitation conditions;
FIG. 10 is a graph comparing convergence performance for six acoustic channels blindly identified by NMCFLMS and the VR-NMCFLMS algorithm of the present invention under white Gaussian additive noise and speech as excitation conditions;
FIG. 11 is another comparison of convergence performance for a six acoustic channel blind identified by NMCFLMS and the VR-NMCFLMS algorithm of the present invention with white Gaussian additive noise and speech as excitation;
FIG. 12 shows the dynamic performance metrics ΔCD and ΔSTOI when NMCFLMS and the proposed VR-NMCFLMS algorithm are used for speech dereverberation under time-invariant conditions with SNR = 25 dB, where (a) is ΔCD and (b) is ΔSTOI;
FIG. 13 shows the dynamic performance metrics ΔFWSNR and ΔLLR when NMCFLMS and the proposed VR-NMCFLMS algorithm are used for speech dereverberation under time-invariant conditions with SNR = 25 dB, where (a) is ΔFWSNR and (b) is ΔLLR;
FIG. 14 shows the dynamic performance metrics ΔCD and ΔSTOI when NMCFLMS and the proposed VR-NMCFLMS algorithm are used for speech dereverberation under time-invariant conditions with SNR = 15 dB, where (a) is ΔCD and (b) is ΔSTOI;
FIG. 15 shows the dynamic performance metrics ΔFWSNR and ΔLLR when NMCFLMS and the proposed VR-NMCFLMS algorithm are used for speech dereverberation under time-invariant conditions with SNR = 15 dB, where (a) is ΔFWSNR and (b) is ΔLLR;
FIG. 16 shows the dynamic performance metrics ΔCD and ΔSTOI when NMCFLMS and the proposed VR-NMCFLMS algorithm are used for speech dereverberation under time-varying conditions with SNR = 25 dB, where (a) is ΔCD and (b) is ΔSTOI;
FIG. 17 shows the dynamic performance metrics ΔFWSNR and ΔLLR when NMCFLMS and the proposed VR-NMCFLMS algorithm are used for speech dereverberation under time-varying conditions with SNR = 25 dB, where (a) is ΔFWSNR and (b) is ΔLLR;
FIG. 18 shows the dynamic performance metrics ΔCD and ΔSTOI when NMCFLMS and the proposed VR-NMCFLMS algorithm are used for speech dereverberation under time-varying conditions with SNR = 15 dB, where (a) is ΔCD and (b) is ΔSTOI;
FIG. 19 shows the dynamic performance metrics ΔFWSNR and ΔLLR when NMCFLMS and the proposed VR-NMCFLMS algorithm are used for speech dereverberation under time-varying conditions with SNR = 15 dB, where (a) is ΔFWSNR and (b) is ΔLLR.
Detailed Description
The following description of the embodiments of the invention is presented in conjunction with the accompanying drawings so that those skilled in the art can better understand the invention. It should be expressly noted that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the present invention.
1. Speech dereverberation based on multichannel equalization
Assuming that a single-input multiple-output (SIMO) acoustic system consists of one sound source and M microphones, as shown in fig. 1, the signal of the i-th (i = 1, 2, …, M) microphone can be expressed as:
x_i(n) = s(n) * h_i + υ_i(n) = y_i(n) + υ_i(n) (1)
where s(n) is the excitation speech signal, * denotes linear convolution, h_i is the impulse response between the sound source and the i-th microphone, which is typically modeled by a finite impulse response (FIR) filter, y_i(n) is the reverberant speech, and υ_i(n) is the additive noise picked up by the i-th microphone.
Assuming that there is no additive noise in any acoustic channel, the output signal of the multi-channel equalization system can be expressed as:
where ŝ(n) is the dereverberated speech, g_i is the equalization filter of the i-th channel, and c is the equalized impulse response between the sound source and the system output.
To write (2) in vector/matrix form, define the length-L_g equalization filter vector g_i of channel i and the L_c × L_g impulse response matrix H_i:
where L is the length of the acoustic channel impulse response vector. The output signal of the multi-channel equalizer can then be expressed as:
where:
s(n) = [s(n)  s(n−1)  …  s(n−L_c+1)]^T, (5)
H = [H_1  H_2  …  H_M], (7)
L_c = L + L_g − 1, (9)
and:
c = Hg. (10)
As can be seen from (4), if the equalized impulse response vector c is a unit impulse, the dereverberated speech recovers the source speech. To obtain the dereverberated speech, the equalization filter vector g needs to be estimated. According to the multiple-input/output inverse theorem (MINT), the equalization filter g satisfies the equation:
where Ĥ is an estimate of the multi-channel impulse response matrix H and d is the desired equalized impulse response vector, usually defined as:
with τ a delay parameter. If the additive noise is not equal to zero or the multi-channel impulse response matrix is not estimated exactly, the following least-squares solution can be obtained from (11):
where Ĥ^+ is the pseudo-inverse of Ĥ; if Ĥ has full row rank, Ĥ^+ = Ĥ^T(ĤĤ^T)^(−1).
As can be seen from fig. 1, once the equalization filter g is estimated, the dereverberated speech is obtained from the MINT principle as:
where conv(·) denotes the convolution function and x_i(n) = [x_i(n)  x_i(n−1)  …  x_i(n−L+1)]^T is the output signal vector of the i-th microphone.
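As a purely illustrative sketch (not part of the patent text), the MINT equalizer of equations (3), (13) and (14) could be prototyped with NumPy as follows; the function names, the use of numpy.linalg.pinv as the pseudo-inverse, and the stacking order of the per-channel filters are assumptions of this sketch:

```python
import numpy as np

def convolution_matrix(h, Lg):
    """Build the Lc x Lg impulse response matrix H_i of eq. (3),
    with Lc = L + Lg - 1, so that H @ g == np.convolve(h, g)."""
    L = len(h)
    Lc = L + Lg - 1
    H = np.zeros((Lc, Lg))
    for j in range(Lg):
        H[j:j + L, j] = h
    return H

def mint_equalizer(h_list, Lg, tau):
    """Least-squares MINT equalizer of eq. (13): g = pinv(H) @ d,
    where d is a unit impulse delayed by tau samples (eq. (12))."""
    H = np.hstack([convolution_matrix(h, Lg) for h in h_list])  # Lc x (M*Lg)
    d = np.zeros(H.shape[0])
    d[tau] = 1.0                       # desired equalized impulse response
    g = np.linalg.pinv(H) @ d          # stacked filters [g_1; g_2; ...; g_M]
    return g.reshape(len(h_list), Lg)

def dereverberate(x_list, g):
    """Eq. (14): sum over channels of the microphone signals convolved with g_i."""
    return sum(np.convolve(x, gi) for x, gi in zip(x_list, g))
```

For instance, with M = 6 estimated impulse responses of length L = 1024 and L_g chosen so that M·L_g ≥ L + L_g − 1, mint_equalizer would return one length-L_g equalization filter per channel.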
2. NMCFLMS algorithm-based multi-channel acoustic system blind identification
According to the multi-channel equalizer principle, the impulse response of each acoustic channel needs to be estimated in order to achieve the purpose of speech dereverberation.
According to the signal model (1), there is the following relationship without taking noise into account:
x_i(n) * h_j = s(n) * h_i * h_j = x_j(n) * h_i,  i, j = 1, 2, …, M,  i ≠ j. (15)
This can be written in vector form:
where h_i = [h_{i,0}  h_{i,1}  …  h_{i,L−1}]^T, i = 1, 2, …, M, is the length-L impulse response vector of the i-th channel.
When noise is present or the estimated impulse response deviates from the true impulse response, (16) will no longer equal zero, so the a priori error signal between channels i and j can be defined as:
where ĥ_i(n) is the estimate of h_i at time n.
According to the NMCFLMS algorithm, its cost function is defined as the sum of squares of instantaneous errors between different channels, namely:
where m is the block index.
e_ij(m) = [e_ij(mL)  e_ij(mL+1)  …  e_ij(mL+L−1)]^T, (20)
x_i(m) = [x_i(mL−L)  x_i(mL−L+1)  …  x_i(mL+L−1)]^T, (23)
0_{L×L} is a zero matrix of size L×L, I_{L×L} is an identity matrix of size L×L, and diag[·] denotes expanding a vector into a diagonal matrix. According to the Newton iterative method, the filter update equation of the NMCFLMS algorithm can be derived as:
where ρ is a step size factor.
0_{1×L} is a zero row vector of length L.
In order to estimate a more stable power spectrum matrix in practical applications, it can be expressed in a recursive form as follows:
where λ is a forgetting factor, typically λ = [1 − 1/(3L)]^L. To prevent numerical instability caused by a non-invertible power spectrum matrix, a regularization factor is usually added when the power spectrum matrix is inverted, so that the filter update equation of the NMCFLMS algorithm can be modified as:
where δ is a regularization factor, which is typically set to one fifth of the sum of the power of the first block signal of all channels, i.e.:
as can be seen from (33), the regularization method of the NMCFLMS algorithm constructs the regularization factor only by using the power information of the first block signal output by the system, and this way will cause the regularization factors of the algorithm to be very different in different speech segments and different signal-to-noise environments, so it is difficult to set a suitable regularization parameter. In order to solve the problem, the invention designs a variable regularization function, which integrates the information such as signal-to-noise ratio, output signal energy, filter length and the like, so that the algorithm has robustness to additive noise and non-stationarity of voice. In order to make the proposed method have better tracking performance under time-varying conditions, we propose a mechanism to refresh regularized parameters according to mean square error.
3. Proposed variable regularization method
As can be seen from section 2, the optimization method used by the original NMCFLMS algorithm is newton's iteration method. In general, the filter vector of newton's iterative method satisfies a system of linear equations:
G(m) p_Newton(m) = −θ(m), (34)
where G(m) is the Hessian matrix, θ(m) is the gradient vector, and p_Newton(m) = w(m+1) − w(m) is the filter search direction vector corresponding to Newton's method, obtained by solving this linear system. To address the reduced convergence performance and numerical instability of the filter caused by a non-positive-definite or ill-conditioned Hessian matrix G(m) (very large eigenvalue spread), a suitably scaled identity matrix can be added to G(m); the corresponding linear system is:
[G(m)+δI]p(m)=-θ(m), (35)
where δ is a regularization factor and p(m) is the regularized filter search direction vector. It can be seen that by introducing the regularization factor δ, the filter search direction vector p_Newton(m) is corrected to p(m), whose direction lies between the Newton search direction and the negative gradient direction −θ(m) of the steepest-descent method. We further find that:
1) When δ is small, the regularized filter search direction vector p(m) is biased toward the Newton direction p_Newton(m); the filter then converges quickly, but the risk of divergence is greater under low signal-to-noise conditions.
2) When δ is large, the regularized filter search direction vector p(m) is biased toward the negative gradient direction −θ(m) of the steepest-descent method; the filter then converges slowly but is more stable.
According to this principle, for the NMCFLMS algorithm, provided the filter converges, when the instantaneous estimation error is large or the filter vector ĥ(m) differs strongly from the true impulse response vector h, the regularization factor δ should be set to a small value so that the fast convergence of Newton's method can be exploited; conversely, when the instantaneous estimation error is small or the filter vector ĥ(m) is close to the true impulse response vector h, δ should be set to a large value so that the stability of the gradient method can be exploited. To this end, we design the following variable regularization function:
where w is a weight coefficient that controls the overall range of the function curve, κ controls the steepness of the curve (the larger its value, the sharper the transition), ξ controls the position of the rising point of the curve, and b is a bias factor that affects the initial convergence speed of the filter. The curve for w = 90, κ = 0.01, ξ = 1500 and b = 10 is shown in fig. 2.
As can be seen from fig. 2, if the rising and falling points of the function curve can be reasonably controlled, a smaller δ (m) can be obtained at the initial stage of the filter update, thereby resulting in a faster convergence speed, and the δ (m) becomes larger when the filter is in steady state, and the filter can stably operate. However, the regularization function does not contain information such as signal-to-noise ratio, output signal energy, and filter length, i.e., the function cannot accommodate variations in these parameters. In order to solve the problem, a robust regularization method is provided, so that the adaptive filter can effectively and blindly identify the impulse response of the multichannel acoustic system under different signal-to-noise ratios, and the aim of suppressing the voice reverberation is fulfilled.
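A minimal numerical check of the curve in fig. 2, assuming a rising logistic form with the stated parameters w = 90, κ = 0.01, ξ = 1500 and b = 10 (the exact expression appears only as a figure in the original, so the sign convention is an assumption of this sketch):

```python
import numpy as np

def delta_curve(m, w=90.0, kappa=0.01, xi=1500.0, b=10.0):
    """Logistic-shaped regularization curve: close to b for small m,
    rising to about w + b once m passes xi (assumed form of fig. 2)."""
    return w / (1.0 + np.exp(-kappa * (m - xi))) + b

m = np.arange(0, 4000)
d = delta_curve(m)
print(d[0], d[1500], d[-1])   # ~10 at the start, ~55 at the rise point, ~100 in steady state
```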
For the adaptive system identification problem, Benesty designed an effective regularization parameter, defined as follows:
where the four quantities are the variance of the input signal, the output signal-to-noise ratio SNR, the variance of the reverberant signal, and the variance of the additive noise. In practice, the signal-to-noise ratio can be obtained by estimation; however, since the problem studied by the invention is multi-channel blind identification, the input signal is unavailable and its variance cannot be computed. The invention therefore approximates the input-signal variance by the reverberant-signal variance, which can be obtained as follows:
To obtain a more stable estimate, the following recursion can be used:
where η is a forgetting factor, ‖·‖_2 denotes the 2-norm of a vector, and a smoothing factor is applied. The bias factor of the regularization function can then be defined as follows:
The bias factor mainly affects the initial convergence speed of the algorithm: the smaller its value, the faster the convergence. The coefficient factor v is introduced to match the frequency-domain adaptive algorithm. We further set the weight coefficient w of the regularization function to:
w(m)=αb(m), (42)
where α is a constant. Therefore, the regularization function is designed as follows:
according to the regularization function designed by the invention, when the adaptive filter tends to be stable, the regularization parameter of the adaptive filter is large, and the filter searching direction is biased to the negative gradient direction. If the acoustic system transfer function changes at this time, the larger regularization parameter slows down the convergence speed of the adaptive filter, and the time-varying system cannot be effectively tracked. Therefore, correction of regularization parameters is needed when the system is suddenly changed, so that the tracking capability of the algorithm on the time-varying system is improved. To this end, it may be determined whether to adjust the regularization factor by detecting an instantaneous value of a Mean Square Error (MSE). If the impulse response of the acoustic channel changes, the MSE is detected to be suddenly changed, the regularization function is refreshed, otherwise, the regularization function is unchanged, and the specific operation is as follows:
where γ is the upper threshold of the MSE, which may be set as a parameter related to the microphone signal power. Only when a refresh is triggered do we set ξ = m + ξ_0, where ξ_0 is the set initial value.
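The following sketch only illustrates the mechanism described in this section: a recursive output-power estimate, an SNR-dependent reverberant-signal variance, a bias factor that scales with signal power, filter length and SNR, a logistic δ(m), and the MSE-triggered refresh of ξ. The concrete expressions are assumptions, since the patented formulas appear only as figures in the original text:

```python
import numpy as np

class VariableRegularizer:
    """Illustrative sketch of the described mechanism (not the patented formulas):
    a logistic delta(m) whose floor b(m) tracks signal power, SNR and filter
    length, and whose rise point xi is refreshed when the MSE jumps."""

    def __init__(self, L, snr, alpha=90.0, kappa=0.01, xi0=1500, lam1=0.99,
                 mu_f=0.05, v=1.0, gamma=1.0):
        self.L, self.snr = L, snr
        self.alpha, self.kappa, self.xi0, self.xi = alpha, kappa, xi0, xi0
        self.lam1, self.mu_f, self.v, self.gamma = lam1, mu_f, v, gamma
        self.sigma_x2 = 0.0   # microphone output power (recursive estimate)
        self.sigma_y2 = 0.0   # reverberant-signal variance estimate

    def update(self, x_blocks, m, mse):
        # 3.1.1: smoothed microphone output power over all channels (assumed form)
        inst = np.mean([np.sum(np.abs(x) ** 2) / len(x) for x in x_blocks])
        self.sigma_x2 = (1 - self.mu_f) * self.sigma_x2 + self.mu_f * inst
        # 3.1.2: reverberant-signal variance via the output SNR (assumed form)
        self.sigma_y2 = (self.lam1 * self.sigma_y2
                         + (1 - self.lam1) * self.sigma_x2 * self.snr / (1 + self.snr))
        # 3.1.3: bias factor grows with power and filter length, shrinks with SNR
        b = self.v * self.L * self.sigma_y2 / self.snr
        # 3.3.5: refresh the rise point when the MSE exceeds the threshold gamma
        if mse > self.gamma:
            self.xi = m + self.xi0
        # 3.1.4: logistic variable regularization function with weight w = alpha * b
        w = self.alpha * b
        return w / (1.0 + np.exp(-self.kappa * (m - self.xi))) + b
```

Pushing ξ forward by ξ_0 blocks after a detected MSE jump drives δ(m) back toward its small initial value, which is what restores the Newton-like fast convergence needed to track a changed acoustic system.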
4. Voice dereverberation method based on multi-channel blind identification and multi-channel equalization
In this embodiment, as shown in fig. 3, the speech dereverberation method based on multi-channel blind recognition and multi-channel equalization of the present invention comprises the following steps:
step S1: initialization of
Initialize the length-L filter vector of the k-th channel at time 0 as:
where T denotes transposition, M is the number of channels, and L is the length of the acoustic channel impulse response vector.
Initialize the power spectrum matrix of the k-th channel at time 0 as:
P_k(0) = 0_{L×L}
Initialize the reverberant signal variance.
Step S2: Construct the output signal vector x_i(m) of the microphones
Collect the sound signals of the M microphones. The samples of the i-th channel are denoted x_i(n), i = 1, 2, …, M, n = 0, 1, 2, …, where n is the sample time index. Construct the signal vector x_i(m), i = 1, 2, …, M, corresponding to block time index m, i.e. the output signal vector of the i-th microphone:
x_i(m) = [x_i(mL−L)  x_i(mL−L+1)  …  x_i(mL+L−1)]^T
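A small sketch of the block construction in step S2, assuming the channel signal is available as a NumPy array (the helper name is hypothetical):

```python
import numpy as np

def block_vector(x, m, L):
    """x_i(m) = [x_i(mL-L), x_i(mL-L+1), ..., x_i(mL+L-1)]^T:
    the 2L samples of channel i ending at sample mL+L-1 (valid for m >= 1)."""
    start = m * L - L
    return x[start:start + 2 * L]
```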
step S3: starting from the block time index m=1, a dereverberated speech vector is obtained
Step S3.1: obtaining a variable regularization function delta (m)
Step S3.1.1: calculating microphone output signal variance
Wherein I 2 Representing the 2-norm of the vector,is a smoothing factor.
Step S3.1.2: calculating reverberant signal variance/>
Wherein lambda is 1 As a forgetting factor, SNR is the output signal-to-noise ratio.
Step S3.1.3: calculating a bias factor b (m) of the regularization function:
where v denotes a coefficient factor.
Step S3.1.4: calculating a variable regularization function delta (m):
wherein alpha is a parameter of the overall range of the control function curveK is a parameter for controlling the steepness of a curve, the steeper the curve abrupt change is, and xi is a parameter for controlling the position of the lifting point of the curve, and the initial value is xi 0
Step S3.2: obtaining a filter vector h k (m)
Step S3.2.1: calculating a frequency domain expansion filter vector
Wherein F is 2L×2L Is a fourier matrix of size 2L x 2L, 0 is a zero column vector of length L.
Step S3.2.2: calculating a spectral matrix of an input signal
Wherein diag [ ] represents the expansion of vectors into a diagonal array.
Step S3.2.3: calculating a power spectrum matrix P k (m):
Wherein lambda is 2 As forgetting factor, superscript denotes conjugate matrix.
Step S3.2.4: calculating the inverse matrix of the power spectrum
Wherein I is 2L×2L Is an identity matrix of size 2L by 2L.
Step S3.2.5: calculating frequency domain error vector of ith and j channels
Wherein,F L×L is a Fourier matrix with the size of L multiplied by L, 0 L×L Is a zero matrix of size L×L, I L×L Is an identity matrix of size L x L, ">Is an inverse fourier matrix of size 2l×2l.
Step S3.2.6: calculating a frequency domain expansion filter vector/>
Wherein ρ is the step factor of the adaptive filter, and the error vector is expanded in the frequency domain F 2L×2L Is a Fourier matrix of size 2L×2L,/A>Is an inverse fourier matrix of size l×l.
Step S3.2.7: calculating a filter vector
Where the subscript 1:L indicates that the first L values are taken.
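Steps S3.2.1 through S3.2.7 can be summarized by the following NumPy sketch of one block update; the diagonal storage of the power spectra, the (1 − λ_2) weighting, and the overlap-save handling of the cross-relation error are assumptions of this sketch rather than the exact patented update:

```python
import numpy as np

def vr_nmcflms_block(h_hat, x_blocks, P, delta, rho=0.05, lam2=0.99):
    """One block of the frequency-domain multichannel filter update (step S3.2).
    h_hat: (M, L) current filter estimates; x_blocks: (M, 2L) time-domain blocks
    x_i(m); P: (M, 2L) power spectra stored as diagonals; delta: delta(m)."""
    M, L = h_hat.shape
    # S3.2.1: frequency-domain extended filters (zero-padded to 2L)
    H = np.fft.fft(np.hstack([h_hat, np.zeros((M, L))]), axis=1)
    # S3.2.2: spectra of the input blocks (kept as 2L-point diagonals)
    D = np.fft.fft(x_blocks, axis=1)
    # S3.2.3: recursive power spectrum of each channel, excluding the channel itself
    for k in range(M):
        cross = sum(np.abs(D[i]) ** 2 for i in range(M) if i != k)
        P[k] = lam2 * P[k] + (1.0 - lam2) * cross
    # S3.2.5: cross-relation errors, last L samples of x_i * h_j - x_j * h_i (overlap-save)
    E = np.zeros((M, M, 2 * L), dtype=complex)
    for i in range(M):
        for j in range(M):
            if i != j:
                e_t = np.fft.ifft(D[i] * H[j] - D[j] * H[i])[L:]
                E[i, j] = np.fft.fft(np.concatenate([np.zeros(L), e_t]))
    # S3.2.4 and S3.2.6: regularized Newton-like update in the frequency domain
    for k in range(M):
        grad = sum(np.conj(D[i]) * E[i, k] for i in range(M) if i != k)
        H[k] = H[k] - rho * grad / (P[k] + delta)
    # S3.2.7: back to the time domain, keep the first L coefficients
    return np.real(np.fft.ifft(H, axis=1))[:, :L], P
```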
Step S3.3: obtaining dereverberated speech vectors
Step S3.3.1: construction of impulse response matrix
Wherein:impulse response matrix->Is of the size L c ×L g Matrix of L c =L+L g -1,L g Is the equalization filter vector length.
Step S3.3.2: constructing a multi-channel impulse response matrix
Step S3.3.3: computing an equalization filter matrix g (m)
Wherein the method comprises the steps ofEqualization filter vector for the kth channel, d is the desired equalization impulse responseVector.
Step S3.3.4: computing a dereverberated speech vector
Where conv (·) represents the convolution function.
Step S3.3.5: detection of mean square error MSE (m)
First, the mean square error MSE (m) is calculated:
wherein, the superscript H indicates a conjugate transpose, and then updating judgment:
if MSE (m) is greater than threshold upper limit gamma, then starting to refresh variable regularization function delta (m) to make parameter xi=m+xi 0 Wherein, xi 0 Is an initial value, if MSE (m) is not greater than threshold upper limit gamma, parameter xi is maintained unchanged;
m=m+1, returning to step S3.1.1: reverberant speech vector for next block timeIs calculated by the computer.
4. Experiment
4.1, experimental Environment
To verify the effectiveness of the proposed algorithm, we use two sets of impulse responses acquired in a real room acoustic environment, as shown in fig. 4. The room size was 6.7 m × 6.1 m × 2.9 m and the reverberation time was 0.28 s. The acoustic signals were picked up by a linear array of 6 omnidirectional microphones located at (2.537, 0.5, 1.4), (2.737, 0.5, 1.4), (2.937, 0.5, 1.4), (3.137, 0.5, 1.4), (3.337, 0.5, 1.4) and (3.537, 0.5, 1.4), respectively. The first set of impulse responses corresponds to the sound source position (0.337, 3.938, 1.6) and the second set to the sound source position (1.337, 3.938, 1.6). The original impulse responses, sampled at 48 kHz, were downsampled to 8 kHz and truncated to 1024 samples. The first set of impulse responses is used to simulate a time-invariant multi-channel acoustic system, and the second set is added to simulate a time-varying multi-channel acoustic system. A segment of female speech sampled at 8 kHz from the LibriSpeech dataset was used as the excitation signal.
4.2 experimental results
4.2.1 Blind identification experiment results of acoustic channel
In order to evaluate the performance of the acoustic channel blind recognition algorithm, a Normalized Projection Misalignment (NPM) was used as an evaluation index for the algorithm performance, which was defined as follows:
the smaller the NPM value, the closer the modeling filter is to the real impulse response.
FIG. 5 compares the convergence performance of NMCFLMS and the proposed VR-NMCFLMS algorithm for blind identification of six acoustic channels with a white Gaussian sequence as the excitation signal in a white Gaussian noise environment, where L = 1024, μ_f = 0.05, and SNR = 25 dB.
In the proposed VR-NMCFLMS algorithm, the regularization parameters are set to:
wherein:
whereas the regularization parameters of the original NMCFLMS algorithm are:
The rest of the parameters are set identically, i.e. μ_f = 0.05, λ = [1 − 1/(3L)]^L.
It can be seen that the VR-NMCFLMS algorithm has a faster convergence speed, because the regularization parameters are smaller when the filter just begins to work, and the regularized filter searching direction is closer to the filter searching direction corresponding to the newton method, so that the faster convergence speed is generated; when regularization parameters become large, the search direction of the regularized filter gradually approaches to the negative gradient direction, the convergence speed of the regularized filter becomes slow, and a stable state is easy to achieve.
If the regularization parameters of the original NMCFLMS algorithm are set to smaller values, the NMCFLMS algorithm will have a faster convergence speed, however, the NMCFLMS algorithm is prone to diverge due to the presence of additive noise and sensitivity of the NMCFLMS to the noise. For example, we set the regularization parameters of the VR-NMCFLMS algorithm to:
in the middle of
Whereas the regularization parameters of the NMCFLMS algorithm are set to:
the remaining parameters remain unchanged. As can be seen from fig. 5, the initial convergence speed of the NMCFLMS algorithm is increased, similar to the convergence speed of the VR-NMCFLMS algorithm. When the NPM value is reduced to about-9 dB, the convergence speed of the two algorithms is reduced, when the filter is iterated to about 5800 times, the NPM value of the two algorithms is reduced to about 22dB, the NMCFLMS algorithm starts to diverge, the VR-NMCFLMS algorithm is increased due to regularization parameters, the filter searching direction of the algorithm is deviated to the negative gradient direction, and the convergence speed of the filter is reduced, so that the filter stably works.
FIG. 6 compares the convergence performance of NMCFLMS and the proposed VR-NMCFLMS algorithm for blind identification of six acoustic channels when speech is used as the excitation signal in a white Gaussian noise environment, where L = 1024, μ_f = 0.05, and SNR = 25 dB. In order for the NMCFLMS algorithm to converge in a noisy environment, we set its regularization parameter to:
the regularization parameter of the VR-NMCFLMS algorithm is still set to δ 1 (m). It can be seen that the proposed VR-NMCFLMS algorithm has a faster convergence speed. Therefore, the provided regularization function can enable the VR-NMCFLMS algorithm to show better convergence performance under both white sequence excitation and voice excitation.
To verify the robustness of the proposed algorithm to noise, we evaluated the convergence performance of the NMCFLMS and VR-NMCFLMS algorithms under different signal-to-noise ratios. The excitation signal is a speech signal and the additive noise is white gaussian noise. The regularization parameters of the VR-NMCFLMS algorithm are set to:
wherein:
whereas the regularization parameters of the NMCFLMS algorithm are set to:
the remaining parameters remain unchanged.
Fig. 7 compares the NPM values after 10000 iterations of NMCFLMS and the proposed VR-NMCFLMS algorithm for blind identification of six acoustic channels at different SNRs, with white Gaussian additive noise and speech as the excitation, where L = 1024 and μ_f = 0.05. As can be seen from fig. 7, the VR-NMCFLMS algorithm provided by the present invention has better robustness to noise. Although the NPM values of the two algorithms are close when SNR = 15 dB, the NMCFLMS algorithm achieves the lower NPM value at the expense of convergence speed. Fig. 8 compares the NPM values after 2000 iterations of NMCFLMS and the proposed VR-NMCFLMS algorithm for blind identification of six acoustic channels at different SNRs under the same conditions. The results show that the VR-NMCFLMS algorithm has a higher convergence speed at all the tested signal-to-noise ratios.
Fig. 9 compares the mean square error of NMCFLMS and the proposed VR-NMCFLMS algorithm for blind identification of six time-varying acoustic channels with white Gaussian additive noise and speech as the excitation, where SNR = 25 dB; the regularization parameter of the proposed VR-NMCFLMS algorithm is still set to δ_3(m), and the regularization parameter of the NMCFLMS algorithm is:
Other parameters remain unchanged. It can be seen that when the acoustic channel transfer function changes, the MSE of the adaptive filtering algorithm jumps abruptly (black box in the figure). Since the proposed VR-NMCFLMS algorithm refreshes the regularization parameter, the corresponding MSE decreases rapidly. The corresponding adaptive filter convergence performance is shown in fig. 10, where L = 1024, μ_f = 0.05, and SNR = 25 dB. As can be seen from fig. 10, the VR-NMCFLMS algorithm of the present invention exhibits better performance in terms of both convergence speed and tracking speed.
Fig. 11 shows a performance comparison of the NMCFLMS and the proposed VR-NMCFLMS algorithm with snr=15 dB and experimental environment and algorithm parameters remaining unchanged. It can be seen that the regularization scheme provided by the invention enables the VR-NMCFLMS algorithm to be better adapted to different signal-to-noise ratio environments.
4.2.2 Speech dereverberation experiment results based on Acoustic channel Blind identification
Once the impulse responses of the acoustic channels have been blindly identified with the VR-NMCFLMS method, an equalization filter can be estimated with the MINT method, and speech dereverberation is achieved by deconvolution. This section experimentally evaluates the effectiveness of the proposed method in adaptive speech dereverberation. In the experimental verification, we employ four widely used speech dereverberation performance metrics: cepstral distance (CD), short-time objective intelligibility (STOI), frequency-weighted segmental signal-to-noise ratio (FWSNR), and log-likelihood ratio (LLR). For consistency of presentation, we define CD, STOI, FWSNR and LLR as relative performance indicators with respect to the original reverberant speech:
ΔCD = CD_original − CD, (52)
ΔSTOI = STOI − STOI_original, (53)
ΔFWSNR = FWSNR − FWSNR_original, (54)
ΔLLR = LLR_original − LLR, (55)
where the subscript "original" represents the corresponding indicator of the dereverberated front reverberant speech. It can be seen that the greater the values of these four relative performance indicators, the better the speech dereverberation performance.
Fig. 12, 13 show dynamic performance metrics for NMCFLMS and the proposed VR-NMCFLMS algorithm for speech dereverberation under time-invariant and snr=25 dB conditions, the convergence performance of both algorithms corresponding to fig. 6. The results show that the proposed VR-NMCFLMS algorithm has a better dereverberation effect during filter transients due to its fast convergence.
Fig. 14, 15 show dynamic performance metrics for the use of the NMCFLMS and the proposed VR-NMCFLMS algorithm for speech dereverberation under time-invariant and snr=15 dB conditions. Under the condition of low signal to noise ratio, in order to ensure that the NMCFLMS algorithm converges, the regularization parameters are larger, so that steady-state errors are smaller, but at the cost of sacrificing the convergence speed of the filter, the corresponding NMCFLMS algorithm does not converge in a shorter time, and the proposed VR-NMCFLMS algorithm can converge rapidly, so that better dereverberation performance is obtained.
Fig. 16 and 17 show dynamic performance metrics for NMCFLMS and the proposed VR-NMCFLMS algorithm for speech dereverberation under time-varying and snr=25 dB conditions, the convergence performance of both algorithms corresponding to fig. 10. It can be seen that at 90 seconds, both algorithms degrade in performance index when used for speech dereverberation due to the variation in the multi-channel system transfer function. At this time, compared with the NMCFLMS algorithm, the proposed VR-NMCFLMS algorithm does not achieve a larger performance improvement (black box in the figure) of the speech dereverberation algorithm, because after the multi-channel system is changed, the filter vector still has a certain similarity with the impulse response vector after the system is changed, which is also the reason why the NPM value is not reduced to 0dB after the system is changed in fig. 10, so that the speech dereverberation effect caused by the two algorithms is similar. Fig. 18 and 19 show dynamic performance indicators for NMCFLMS and VR-NMCFLMS algorithms proposed by the present invention for speech dereverberation under time-varying conditions with snr=15 dB, the convergence performance of these two algorithms corresponding to fig. 11. It can be seen that under the condition of low signal-to-noise ratio, the corresponding dereverberation algorithm obtains significant performance improvement due to the faster convergence speed and tracking speed of the VR-NMCFLMS algorithm provided by the invention.
5. Summary
The invention provides a voice dereverberation method based on multi-channel blind identification and multi-channel equalization. Based on the prior art, a variable regularization parameter is designed, and information such as signal-to-noise ratio, output signal energy, filter length and the like is integrated into the variable regularization parameter, so that the algorithm has robustness on additive noise, voice non-stationarity and the like. In order to enable the method to have better tracking performance under time-varying conditions, the invention provides a mechanism for refreshing regularized parameters according to mean square error. Experimental results show that the VR-NMCFLMS algorithm provided by the invention can obtain faster convergence speed and tracking speed in a noise environment. The method can enable the voice dereverberation algorithm based on channel equalization to obtain better dereverberation effect, and particularly the dereverberation performance is remarkably improved when the adaptive filter is in a transient state under a low signal-to-noise ratio environment.
While the foregoing describes illustrative embodiments of the present invention to facilitate the understanding of the present invention by those skilled in the art, it should be understood that the present invention is not limited to the scope of these embodiments; various changes that remain within the spirit and scope of the present invention as defined and determined by the appended claims are to be regarded as protected.

Claims (1)

1. A speech dereverberation method based on multi-channel blind recognition and multi-channel equalization, comprising the steps of:
(1) Initializing
Initialize the length-L filter vector of the k-th channel at time 0 as:
where T denotes transposition, M is the number of channels, and L is the length of the acoustic channel impulse response vector;
initialize the power spectrum matrix of the k-th channel at time 0 as:
P_k(0) = 0_{L×L}
initialize the reverberant signal variance as:
(2) Collect the sound signals of the M microphones. The samples of the i-th channel are denoted x_i(n), i = 1, 2, …, M, n = 0, 1, 2, …, where n is the sample time index. Construct the signal vector x_i(m), i = 1, 2, …, M, corresponding to block time index m, i.e. the output signal vector of the i-th microphone:
x_i(m) = [x_i(mL−L)  x_i(mL−L+1)  …  x_i(mL+L−1)]^T
(3) Starting from the block time index m = 1, obtain the dereverberated speech vector:
3.1) Obtain the variable regularization function δ(m)
3.1.1) Calculate the microphone output signal variance:
where ‖·‖_2 denotes the 2-norm of a vector and the remaining coefficient is a smoothing factor;
3.1.2) Calculate the reverberant signal variance:
where λ_1 is a forgetting factor and SNR is the output signal-to-noise ratio;
3.1.3) Calculate the bias factor b(m) of the regularization function:
where v denotes a coefficient factor;
3.1.4) Calculate the variable regularization function δ(m):
where α is a parameter controlling the overall range of the function curve, κ is a parameter controlling the steepness of the curve (the larger κ, the steeper the transition), and ξ is a parameter controlling the position of the rising point of the curve, with initial value ξ_0;
3.2) Obtain the filter vector
3.2.1) Calculate the frequency-domain extended filter vector:
where F_{2L×2L} is a Fourier matrix of size 2L×2L and 0 is a zero column vector of length L;
3.2.2) Calculate the spectral matrix of the input signal:
where diag[·] denotes expanding a vector into a diagonal matrix;
3.2.3) Calculate the power spectrum matrix P_k(m):
where λ_2 is a forgetting factor and the superscript * denotes the complex conjugate;
3.2.4) Calculate the inverse of the power spectrum matrix:
where I_{2L×2L} is an identity matrix of size 2L×2L;
3.2.5) Calculate the frequency-domain error vector e_ij(m) of the i-th and j-th channels:
where F_{L×L} is a Fourier matrix of size L×L, 0_{L×L} is a zero matrix of size L×L, I_{L×L} is an identity matrix of size L×L, and F^{-1}_{2L×2L} is an inverse Fourier matrix of size 2L×2L;
3.2.6) Calculate the frequency-domain extended filter vector:
where ρ is the step-size factor of the adaptive filter, the error vector is extended in the frequency domain, F_{2L×2L} is a Fourier matrix of size 2L×2L, and F^{-1}_{L×L} is an inverse Fourier matrix of size L×L;
3.2.7) Calculate the filter vector:
where the subscript 1:L denotes taking the first L values;
3.3) Obtain the dereverberated speech vector
3.3.1) Construct the impulse response matrix:
where the impulse response matrix is of size L_c × L_g, with L_c = L + L_g − 1 and L_g the length of the equalization filter vector;
3.3.2) Construct the multi-channel impulse response matrix;
3.3.3) Calculate the equalization filter matrix g(m):
where g_k(m) is the equalization filter vector of the k-th channel and d is the desired equalized impulse response vector;
3.3.4) Calculate the dereverberated speech vector:
where conv(·) denotes the convolution function;
3.3.5) Detect the mean square error MSE(m)
First, calculate the mean square error MSE(m):
where the superscript H denotes the conjugate transpose; then make the update decision:
if MSE(m) is greater than the upper threshold γ, refresh the variable regularization function δ(m) by setting the parameter ξ = m + ξ_0, where ξ_0 is the initial value; if MSE(m) is not greater than the upper threshold γ, keep the parameter ξ unchanged;
set m = m + 1 and return to step 3.1.1 to compute the dereverberated speech vector for the next block time.
CN202311670403.2A 2023-12-05 2023-12-05 Voice dereverberation method based on multi-channel blind identification and multi-channel equalization Pending CN117636894A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311670403.2A CN117636894A (en) 2023-12-05 2023-12-05 Voice dereverberation method based on multi-channel blind identification and multi-channel equalization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311670403.2A CN117636894A (en) 2023-12-05 2023-12-05 Voice dereverberation method based on multi-channel blind identification and multi-channel equalization

Publications (1)

Publication Number Publication Date
CN117636894A 2024-03-01

Family

ID=90033732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311670403.2A Pending CN117636894A (en) 2023-12-05 2023-12-05 Voice dereverberation method based on multi-channel blind identification and multi-channel equalization

Country Status (1)

Country Link
CN (1) CN117636894A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117896467A (en) * 2024-03-14 2024-04-16 苏州大学 Echo cancellation method and system for stereo telephone communication
CN117896467B (en) * 2024-03-14 2024-05-31 苏州大学 Echo cancellation method and system for stereo telephone communication


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination