CN118116402A - Bilinear filtering-based multichannel voice noise reduction method

Info

Publication number: CN118116402A
Application number: CN202410241360.4A
Authority: CN
Inventors: 王向辉; 韩宗乐; 李梅; 王桂宝; 赵莹珂; 田旭华; 王姣; 郭晶; 陈晓屹
Original assignee: Shaanxi University of Science and Technology
Current assignee: Shaanxi University of Science and Technology
Priority date: 2024-03-04
Filing date: 2024-03-04
Publication date: 2024-05-31

Abstract

The present invention discloses a multi-channel speech noise reduction method based on bilinear filtering, comprising: collecting time-domain noisy speech signals and preprocessing them; estimating the statistical characteristics of the time-domain noisy speech signals and of the additive noise signals; estimating a bilinear Wiener noise reduction filter from those statistical characteristics; and filtering the noisy speech signals with the bilinear Wiener noise reduction filter to obtain an estimate of the clean speech signal. The invention separates the filter coefficients that act in the time dimension from those that act in the spatial dimension, turning the estimation of one long filter into the estimation of two shorter sub-filters. Shorter filters mean fewer parameters to estimate. Compared with conventional methods, the algorithm complexity is significantly reduced and fewer observation samples are needed to estimate the filter coefficients, which improves the tracking of non-stationary noise. Compared with the frequency-domain speech noise reduction methods currently used in practical systems, the method produces no musical noise.

Description

A multi-channel speech noise reduction method based on bilinear filtering

Technical Field

The invention belongs to the field of speech noise reduction, and in particular relates to a multi-channel speech noise reduction method based on bilinear filtering.

Background

Noise is everywhere in everyday environments. It degrades the quality and intelligibility of speech signals and causes listening fatigue. Speech noise reduction aims to suppress noise and extract the clean speech signal from the noisy observation, thereby improving speech quality and intelligibility; it plays an important role in voice communication.

Depending on the domain in which they operate, noise reduction algorithms can be divided into time-domain algorithms and transform-domain algorithms (frequency domain, wavelet domain, and so on). Frequency-domain methods are currently the most widely used. Compared with time-domain methods they have lower complexity and can be integrated into embedded systems for real-time noise reduction. Their drawback is that they tend to produce musical noise. Studies indicate that listeners tolerate musical noise less well than the original noise, so reducing the musical noise produced by frequency-domain algorithms has long been an active research topic. Time-domain methods, by contrast, do not produce musical noise. However, the filters in time-domain algorithms are usually long, which leads to high complexity; this is the main bottleneck limiting their practical deployment. For multi-channel algorithms in particular, complexity grows rapidly with the number of channels, making it difficult to deploy time-domain speech noise reduction for real-time processing of noisy speech in practical systems.

To reduce the complexity of time-domain speech noise reduction, the present invention proposes a bilinear noise reduction scheme based on a new way of organizing the multi-channel signal vectors.

Summary of the Invention

The purpose of the present invention is to provide a multi-channel speech noise reduction method based on bilinear filtering that solves the problems of the prior art described above.

To achieve this purpose, the present invention provides a multi-channel speech noise reduction method based on bilinear filtering, comprising:

collecting a time-domain noisy speech signal and preprocessing the time-domain noisy speech signal;

estimating statistical characteristics of the time-domain noisy speech signal and of the additive noise signal;

estimating a bilinear Wiener noise reduction filter based on the statistical characteristics;

filtering the time-domain noisy speech signal with the bilinear Wiener noise reduction filter to obtain an estimate of the clean speech signal.

Optionally, the process of collecting the time-domain noisy speech signal and preprocessing it includes the following.

The time-domain signal model is:

$$y_m(t) = x_m(t) + v_m(t) \tag{1}$$

where $t$ is the discrete time index and the subscript $(\cdot)_m$ denotes the signal received by the $m$-th microphone of an array with $M$ microphones in total. $x_m(t)$ and $v_m(t)$ are the clean speech signal and the additive noise signal received by the $m$-th microphone, $y_m(t)$ is the noisy speech signal received by the $m$-th microphone, and $x_m(t)$ and $v_m(t)$ are mutually uncorrelated. The first microphone of the array is chosen as the reference microphone, i.e. $x_1(t)$ is the desired signal.

Grouping $L$ consecutive samples, the signal received by the $m$-th microphone is written as a vector of length $L$:

$$\mathbf{y}_m(t) = [y_m(t)\;\; y_m(t-1)\;\; \cdots\;\; y_m(t-L+1)]^T = \mathbf{x}_m(t) + \mathbf{v}_m(t), \quad m = 1, 2, \ldots, M \tag{2}$$

where $\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ are defined in the same way as $\mathbf{y}_m(t)$:

$$\mathbf{x}_m(t) = [x_m(t)\;\; x_m(t-1)\;\; \cdots\;\; x_m(t-L+1)]^T$$

$$\mathbf{v}_m(t) = [v_m(t)\;\; v_m(t-1)\;\; \cdots\;\; v_m(t-L+1)]^T$$

$\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ are the desired signal vector and the noise signal vector of the $m$-th channel, $\mathbf{y}_m(t)$ is the noisy signal vector of the $m$-th channel, and the superscript $(\cdot)^T$ denotes transposition.

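Concatenating the $M$ noisy signal vectors $\mathbf{y}_m(t)$ ($m = 1, 2, \ldots, M$), each of length $L$, gives:

$$\mathbf{y}(t) = [\mathbf{y}_1^T(t)\;\; \mathbf{y}_2^T(t)\;\; \cdots\;\; \mathbf{y}_M^T(t)]^T = \mathbf{x}(t) + \mathbf{v}(t) \tag{3}$$

where $\mathbf{x}(t)$ and $\mathbf{v}(t)$ are defined in the same way as $\mathbf{y}(t)$:

$$\mathbf{x}(t) = [\mathbf{x}_1^T(t)\;\; \mathbf{x}_2^T(t)\;\; \cdots\;\; \mathbf{x}_M^T(t)]^T$$

$$\mathbf{v}(t) = [\mathbf{v}_1^T(t)\;\; \mathbf{v}_2^T(t)\;\; \cdots\;\; \mathbf{v}_M^T(t)]^T$$

$\mathbf{y}(t)$, $\mathbf{x}(t)$ and $\mathbf{v}(t)$ are the overall noisy signal vector, the overall clean speech signal vector and the overall noise signal vector, each of length $ML$.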
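
As an illustration of the framing and stacking described above, the following NumPy sketch builds the per-channel frame and the stacked vector from a multi-channel recording. The function names, the `(M, n_samples)` array layout, and the toy data are assumptions made for this example; they do not come from the patent.

```python
import numpy as np

def frame_vector(sig, t, L):
    """Return [sig[t], sig[t-1], ..., sig[t-L+1]] as a length-L vector (requires t >= L-1)."""
    return sig[t - L + 1 : t + 1][::-1]

def stacked_vector(signals, t, L):
    """Concatenate the M per-channel frames into one length-M*L vector."""
    return np.concatenate([frame_vector(ch, t, L) for ch in signals])

# Toy data (assumed): M = 3 channels, 10 samples each, arbitrary values.
M, L, t = 3, 4, 9
signals = np.arange(M * 10, dtype=float).reshape(M, 10)
y = stacked_vector(signals, t, L)
```

The newest sample comes first inside each frame, and the frames of microphones 1 to M follow one another in the stacked vector.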

Optionally, the process of estimating the statistical characteristics of the time-domain noisy speech signal and the additive noise signal includes:

The correlation matrix $\mathbf{R}_v(t)$ of the overall noise signal vector $\mathbf{v}(t)$ is estimated with an existing noise estimation algorithm. The correlation matrix $\mathbf{R}_y(t)$ of the overall noisy signal vector $\mathbf{y}(t)$ is estimated recursively as $\mathbf{R}_y(t) = \alpha\mathbf{R}_y(t-1) + (1-\alpha)\mathbf{y}(t)\mathbf{y}^T(t)$, where $\alpha$ is a forgetting factor ($0 < \alpha < 1$). The correlation matrix of the overall clean speech signal vector $\mathbf{x}(t)$ is then estimated as $\mathbf{R}_x(t) = \mathbf{R}_y(t) - \mathbf{R}_v(t)$, and the vector $\boldsymbol{\rho}(t)$ is determined from $\mathbf{R}_x(t)$, which yields the statistical characteristics.
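
The recursive estimate can be sketched in a few lines of NumPy. The function names and the value of the forgetting factor are assumptions for illustration, and the noise correlation matrix is simply passed in, since the text leaves the noise estimator to existing algorithms.

```python
import numpy as np

def update_noisy_correlation(Ry_prev, y, alpha=0.95):
    """One recursive update: R_y(t) = alpha * R_y(t-1) + (1 - alpha) * y(t) y(t)^T."""
    return alpha * Ry_prev + (1.0 - alpha) * np.outer(y, y)

def clean_speech_correlation(Ry, Rv):
    """R_x(t) = R_y(t) - R_v(t); R_v is assumed to come from a separate noise estimator."""
    return Ry - Rv

# Toy example (assumed values): one update from a zero matrix with alpha = 0.5.
Ry = update_noisy_correlation(np.zeros((2, 2)), np.array([1.0, 2.0]), alpha=0.5)
Rx = clean_speech_correlation(Ry, 0.1 * np.eye(2))
```

A smaller forgetting factor reacts faster to changes at the cost of a noisier estimate. In practice the difference of two estimated matrices is not guaranteed to be positive semidefinite, so an implementation may need a safeguard that the text does not describe.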

Optionally, the process of determining the vector $\boldsymbol{\rho}(t)$ from the speech signal correlation matrix $\mathbf{R}_x(t)$ includes:

The element in the first row and first column of $\mathbf{R}_x(t)$ and the first column of $\mathbf{R}_x(t)$ are extracted; dividing the first column by that element gives the vector $\boldsymbol{\rho}(t)$.
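
A minimal sketch of this extraction, with a hypothetical function name and a toy matrix chosen for illustration:

```python
import numpy as np

def rho_from_Rx(Rx):
    """Return (rho, sigma2_x1): the first column of R_x divided by its [0, 0] element."""
    sigma2_x1 = Rx[0, 0]          # variance of the desired signal x_1(t)
    rho = Rx[:, 0] / sigma2_x1    # normalized correlation vector, so rho[0] == 1
    return rho, sigma2_x1

Rx = np.array([[2.0, 1.0], [1.0, 3.0]])
rho, sigma2_x1 = rho_from_Rx(Rx)
```

By construction the first element of the resulting vector is always 1.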

Optionally, the process of estimating the bilinear Wiener noise reduction filter based on the statistical characteristics includes:

reorganizing the overall noisy signal vector into the noisy signal matrix $\mathbf{Y}(t)$, splitting the filter of the conventional noise reduction scheme accordingly to obtain a bilinear noise reduction scheme with two sub-filters, and equivalently transforming that scheme using matrix vectorization and the Kronecker product to obtain the bilinear noise reduction scheme based on two sub-filters;

obtaining the definition of the mean square error of the desired-signal estimate from the statistical characteristics of the noisy speech signal, the Kronecker product of the two sub-filters $\mathbf{h}_1(t)$ and $\mathbf{h}_2(t)$, and the noisy signal matrix $\mathbf{Y}(t)$, and rewriting that definition using the relationship between the two sub-filters to obtain the final expression for the mean square error of the desired-signal estimate;

estimating the bilinear Wiener noise reduction filter from the final expression for the mean square error of the desired-signal estimate.

Optionally, the process of obtaining the bilinear noise reduction scheme based on two sub-filters includes:

The overall noisy signal vector is reorganized as follows:

$$\mathbf{Y}(t) = [\mathbf{y}_1(t)\;\; \mathbf{y}_2(t)\;\; \cdots\;\; \mathbf{y}_M(t)] = \mathbf{X}(t) + \mathbf{V}(t) \tag{5}$$

where $\mathbf{y}(t) = \mathrm{vec}[\mathbf{Y}(t)]$, the clean speech signal matrix is $\mathbf{X}(t) = [\mathbf{x}_1(t)\;\; \mathbf{x}_2(t)\;\; \cdots\;\; \mathbf{x}_M(t)]$ (with $\mathbf{x}(t) = \mathrm{vec}[\mathbf{X}(t)]$), the noise signal matrix is $\mathbf{V}(t) = [\mathbf{v}_1(t)\;\; \mathbf{v}_2(t)\;\; \cdots\;\; \mathbf{v}_M(t)]$ (with $\mathbf{v}(t) = \mathrm{vec}[\mathbf{V}(t)]$), $\mathrm{vec}[\cdot]$ denotes vectorization of a matrix (stacking its columns), and the matrices $\mathbf{Y}(t)$, $\mathbf{X}(t)$ and $\mathbf{V}(t)$ all have dimension $L \times M$.

Based on equation (5), the conventional noise reduction scheme is modified to obtain the following bilinear noise reduction scheme:

$$z(t) = \mathbf{h}_1^T(t)\mathbf{Y}(t)\mathbf{h}_2(t) = \mathbf{h}_1^T(t)\mathbf{X}(t)\mathbf{h}_2(t) + \mathbf{h}_1^T(t)\mathbf{V}(t)\mathbf{h}_2(t) \tag{6}$$

where the two sub-filters $\mathbf{h}_1(t)$ and $\mathbf{h}_2(t)$ perform noise reduction in the time dimension and the spatial dimension respectively, $\mathbf{h}_1(t)$ has length $L$, $\mathbf{h}_2(t)$ has length $M$, $z(t)$ is the estimate of the desired signal $x_1(t)$, $\mathbf{h}_1^T(t)\mathbf{X}(t)\mathbf{h}_2(t)$ is the filtered speech signal, and $\mathbf{h}_1^T(t)\mathbf{V}(t)\mathbf{h}_2(t)$ is the residual noise after filtering.

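Equation (6) can be transformed as follows:

$$z(t) = \mathrm{tr}[\mathbf{h}_2(t)\mathbf{h}_1^T(t)\mathbf{Y}(t)] = \mathrm{vec}^T[\mathbf{h}_1(t)\mathbf{h}_2^T(t)]\,\mathrm{vec}[\mathbf{Y}(t)] = [\mathbf{h}_2(t) \otimes \mathbf{h}_1(t)]^T\mathbf{y}(t) = \mathbf{h}_b^T(t)\mathbf{y}(t) \tag{7}$$

where $\mathbf{h}_b(t) = \mathbf{h}_2(t) \otimes \mathbf{h}_1(t)$, $\mathrm{vec}[\cdot]$ denotes vectorization of a matrix, $\otimes$ denotes the Kronecker product, and $\mathrm{tr}[\cdot]$ denotes the trace of a matrix.

Using equation (7), equation (6) can be written as

$$z(t) = \mathbf{h}_b^T(t)\mathbf{x}(t) + \mathbf{h}_b^T(t)\mathbf{v}(t) \tag{8}$$

where $\mathbf{x}(t) = \mathrm{vec}[\mathbf{X}(t)]$ and $\mathbf{v}(t) = \mathrm{vec}[\mathbf{V}(t)]$.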
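
The equivalence between the bilinear form and the Kronecker-product form can be checked numerically. The sketch below uses random toy data; the dimensions and variable names are assumptions for illustration, and the sub-filters are arbitrary rather than optimized.

```python
import numpy as np

rng = np.random.default_rng(0)
L, M = 5, 3
Y = rng.standard_normal((L, M))   # columns play the role of the per-channel frames
h1 = rng.standard_normal(L)       # temporal sub-filter, length L
h2 = rng.standard_normal(M)       # spatial sub-filter, length M

y = Y.flatten(order="F")          # vec[Y]: stack the columns of Y
h_b = np.kron(h2, h1)             # Kronecker product h_2 (x) h_1, length M*L

z_bilinear = h1 @ Y @ h2          # h_1^T Y h_2
z_kron = h_b @ y                  # h_b^T y
```

Column-major flattening (`order="F"`) matters here: it is what makes `vec[Y]` match the channel-by-channel stacking of the overall signal vector.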

Optionally, the process of obtaining the final expression for the mean square error of the desired-signal estimate includes:

The two sub-filters $\mathbf{h}_1(t)$ and $\mathbf{h}_2(t)$ satisfy the following relationship:

$$\mathbf{h}_b(t) = \mathbf{h}_2(t) \otimes \mathbf{h}_1(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]\,\mathbf{h}_1(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]\,\mathbf{h}_2(t) \tag{10}$$

where $\mathbf{I}_L$ and $\mathbf{I}_M$ are the identity matrices of dimension $L \times L$ and $M \times M$ respectively.
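
The relationship between the two sub-filters and their Kronecker product can likewise be verified numerically. This is a sketch with arbitrary toy filters; the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
L, M = 4, 3
h1 = rng.standard_normal(L)
h2 = rng.standard_normal(M)

h_b = np.kron(h2, h1)                             # h_2 (x) h_1, length M*L
via_h1 = np.kron(h2[:, None], np.eye(L)) @ h1     # (h_2 (x) I_L) h_1, matrix is (M*L, L)
via_h2 = np.kron(np.eye(M), h1[:, None]) @ h2     # (I_M (x) h_1) h_2, matrix is (M*L, M)
```

Both factorizations are linear in one sub-filter when the other is held fixed, which is what allows the mean square error to be minimized over each sub-filter in turn.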

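Based on the statistical characteristics of the noisy speech signal and of the noise signal, the Kronecker product of the two sub-filters $\mathbf{h}_1(t)$ and $\mathbf{h}_2(t)$, and the noisy signal matrix $\mathbf{Y}(t)$, the mean square error of the desired-signal estimate is defined as:

$$J[\mathbf{h}_b(t)] = E\{[z(t) - x_1(t)]^2\} = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_b^T(t)\boldsymbol{\rho}(t) + \mathbf{h}_b^T(t)\mathbf{R}_y(t)\mathbf{h}_b(t) \tag{12}$$

where $\sigma_{x_1}^2(t)$ is the element in the first row and first column of the matrix $\mathbf{R}_x(t)$.

Using equation (10), equation (12) is rewritten as:

$$J[\mathbf{h}_1(t), \mathbf{h}_2(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_2^T(t)\boldsymbol{\rho}_1(t) + \mathbf{h}_2^T(t)\mathbf{R}_{y,1}(t)\mathbf{h}_2(t) = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_1^T(t)\boldsymbol{\rho}_2(t) + \mathbf{h}_1^T(t)\mathbf{R}_{y,2}(t)\mathbf{h}_1(t) \tag{13}$$

where

$$\boldsymbol{\rho}_1(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]^T\boldsymbol{\rho}(t) \tag{14}$$

$$\mathbf{R}_{y,1}(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]^T\mathbf{R}_y(t)[\mathbf{I}_M \otimes \mathbf{h}_1(t)] \tag{15}$$

$$\boldsymbol{\rho}_2(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]^T\boldsymbol{\rho}(t) \tag{16}$$

$$\mathbf{R}_{y,2}(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]^T\mathbf{R}_y(t)[\mathbf{h}_2(t) \otimes \mathbf{I}_L] \tag{17}$$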

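Optionally, the bilinear Wiener noise reduction filter is estimated from the final expression for the mean square error of the desired-signal estimate as follows:

Step 1: initialize the sub-filter $\mathbf{h}_1(t)$ as $\mathbf{h}_1^{(0)}(t) = \sigma_{x_1}^2(t)\,\mathbf{R}_{y_1}^{-1}(t)\,\boldsymbol{\rho}_{x_1}(t)$, where $\mathbf{R}_{y_1}(t)$ is the autocorrelation matrix of the first-channel noisy speech signal vector, i.e. the first $L$ rows and first $L$ columns of $\mathbf{R}_y(t)$, $\boldsymbol{\rho}_{x_1}(t)$ is the vector formed by the first $L$ elements of $\boldsymbol{\rho}(t)$, and the superscript $(\cdot)^{(n)}$ denotes the result of the $n$-th iteration;

Step 2: substitute $\mathbf{h}_1^{(n-1)}(t)$ into $\boldsymbol{\rho}_1(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]^T\boldsymbol{\rho}(t)$ and $\mathbf{R}_{y,1}(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]^T\mathbf{R}_y(t)[\mathbf{I}_M \otimes \mathbf{h}_1(t)]$ to obtain $\boldsymbol{\rho}_1^{(n-1)}(t)$ and $\mathbf{R}_{y,1}^{(n-1)}(t)$;

Step 3: substitute $\boldsymbol{\rho}_1^{(n-1)}(t)$ and $\mathbf{R}_{y,1}^{(n-1)}(t)$ into $\mathbf{h}_2(t) = \sigma_{x_1}^2(t)\,\mathbf{R}_{y,1}^{-1}(t)\,\boldsymbol{\rho}_1(t)$ to obtain $\mathbf{h}_2^{(n)}(t)$;

Step 4: substitute $\mathbf{h}_2^{(n)}(t)$ into $\boldsymbol{\rho}_2(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]^T\boldsymbol{\rho}(t)$ and $\mathbf{R}_{y,2}(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]^T\mathbf{R}_y(t)[\mathbf{h}_2(t) \otimes \mathbf{I}_L]$ to obtain $\boldsymbol{\rho}_2^{(n)}(t)$ and $\mathbf{R}_{y,2}^{(n)}(t)$;

Step 5: substitute $\boldsymbol{\rho}_2^{(n)}(t)$ and $\mathbf{R}_{y,2}^{(n)}(t)$ into $\mathbf{h}_1(t) = \sigma_{x_1}^2(t)\,\mathbf{R}_{y,2}^{-1}(t)\,\boldsymbol{\rho}_2(t)$ to obtain $\mathbf{h}_1^{(n)}(t)$.

Steps 2 to 5 are repeated $N$ times ($n = 1, \ldots, N$) to obtain $\mathbf{h}_1^{(N)}(t)$ and $\mathbf{h}_2^{(N)}(t)$.

The bilinear Wiener noise reduction filter is then obtained as $\mathbf{h}_{bW}(t) = \mathbf{h}_2^{(N)}(t) \otimes \mathbf{h}_1^{(N)}(t)$.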
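
The five steps can be sketched as an alternating solve in NumPy. This is an illustrative reading of the procedure, not the patent's reference implementation: the function names, the toy statistics, and the use of `np.linalg.solve` in place of explicit matrix inverses are all assumptions.

```python
import numpy as np

def mse(h_b, Ry, rho, sigma2):
    """J = sigma2 - 2*sigma2*h_b^T rho + h_b^T R_y h_b."""
    return sigma2 - 2.0 * sigma2 * (h_b @ rho) + h_b @ Ry @ h_b

def bilinear_wiener(Ry, Rx, L, M, n_iter=5):
    """Alternate between the two sub-filters; returns (h_bW, h1, h2). Needs n_iter >= 1."""
    sigma2 = Rx[0, 0]
    rho = Rx[:, 0] / sigma2
    I_L, I_M = np.eye(L), np.eye(M)
    # Step 1: Wiener filter of the reference channel (first L x L block of R_y).
    h1 = sigma2 * np.linalg.solve(Ry[:L, :L], rho[:L])
    h2 = None
    for _ in range(n_iter):
        # Steps 2-3: fix h1 and solve the M x M system for h2.
        A1 = np.kron(I_M, h1[:, None])            # shape (M*L, M)
        h2 = sigma2 * np.linalg.solve(A1.T @ Ry @ A1, A1.T @ rho)
        # Steps 4-5: fix h2 and solve the L x L system for h1.
        A2 = np.kron(h2[:, None], I_L)            # shape (M*L, L)
        h1 = sigma2 * np.linalg.solve(A2.T @ Ry @ A2, A2.T @ rho)
    return np.kron(h2, h1), h1, h2

# Toy statistics (assumed): random positive-definite R_x plus white noise.
rng = np.random.default_rng(0)
L, M = 4, 3
B = rng.standard_normal((L * M, L * M))
Rx = B @ B.T / (L * M)
Ry = Rx + np.eye(L * M)
sigma2, rho = Rx[0, 0], Rx[:, 0] / Rx[0, 0]

J_1 = mse(bilinear_wiener(Ry, Rx, L, M, n_iter=1)[0], Ry, rho, sigma2)
J_5 = mse(bilinear_wiener(Ry, Rx, L, M, n_iter=5)[0], Ry, rho, sigma2)
# Unconstrained length-M*L Wiener filter, used only as a lower bound on the error.
J_full = mse(np.linalg.solve(Ry, Rx[:, 0]), Ry, rho, sigma2)
```

Each half-step minimizes the same mean square error over one sub-filter with the other held fixed, so the error cannot increase from one iteration to the next. It also cannot fall below the error of the unconstrained length-ML Wiener filter, because the Kronecker structure restricts the set of admissible filters.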

The technical effects of the present invention are as follows:

The invention separates the filter coefficients that act in the time dimension from those that act in the spatial dimension, turning the estimation of one long filter into the estimation of two shorter sub-filters. The result is a multi-channel bilinear speech noise reduction filter with lower computational complexity and better handling of non-stationary noise. The method operates in the time domain, and its core idea extends straightforwardly to frequency-domain noise reduction frameworks. It can be used in intelligent voice and human-computer interaction systems as well as in audio and video conferencing, in-vehicle and immersive communication systems. It can be used on its own or together with modules such as echo cancellation, sound source localization, dereverberation and speech separation. The method has the following advantages: 1) the algorithm complexity is significantly reduced; 2) fewer observation samples are needed to estimate the filter coefficients, which improves the tracking of non-stationary noise. In addition, because the proposed method is a low-complexity time-domain method, it produces no musical noise, unlike the frequency-domain speech noise reduction methods currently used in practical systems.

Brief Description of the Drawings

The drawings, which form part of this application, provide a further understanding of the application. The illustrative embodiments and their descriptions explain the application and do not unduly limit it. In the drawings:

FIG. 1 is a flow chart of the method in an embodiment of the present invention;

FIG. 2 is a system structure diagram in an embodiment of the present invention.

Detailed Description

It should be noted that, where no conflict arises, the embodiments of this application and the features of those embodiments may be combined with one another. The application is described in detail below with reference to the drawings and in combination with the embodiments.

It should also be noted that the steps shown in the flow charts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flow charts, in some cases the steps shown or described may be executed in a different order.

Embodiment 1

As shown in FIGS. 1 and 2, this embodiment provides a multi-channel speech noise reduction method based on bilinear filtering, including:

Step 1: collect the noisy speech signals;

Step 2: estimate the statistical characteristics of the noisy speech signal and the noise signal;

Step 3: estimate the bilinear Wiener noise reduction filter;

Step 4: filter the noisy speech signal to obtain an estimate of the clean speech signal.

In speech noise reduction, the time-domain signal model is:

$$y_m(t) = x_m(t) + v_m(t) \tag{1}$$

Here $t$ is the discrete time index, the subscript $(\cdot)_m$ denotes the signal received by the $m$-th microphone (the microphone array is assumed to have $M$ microphones in total), $x_m(t)$ and $v_m(t)$ are the clean speech signal and the additive noise signal received by the $m$-th microphone, and $y_m(t)$ is the noisy speech signal received by the $m$-th microphone. All signals are assumed to be zero-mean, real, broadband signals. The first microphone of the array is chosen as the reference microphone, i.e. $x_1(t)$ is the desired signal (the signal to be recovered); in principle, any microphone could serve as the reference.

Grouping $L$ consecutive samples, the signal received by the $m$-th microphone can be written as a vector of length $L$:

$$\mathbf{y}_m(t) = [y_m(t)\;\; y_m(t-1)\;\; \cdots\;\; y_m(t-L+1)]^T = \mathbf{x}_m(t) + \mathbf{v}_m(t), \quad m = 1, 2, \ldots, M \tag{2}$$

where $\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ are defined in the same way as $\mathbf{y}_m(t)$:

$$\mathbf{x}_m(t) = [x_m(t)\;\; x_m(t-1)\;\; \cdots\;\; x_m(t-L+1)]^T$$

$$\mathbf{v}_m(t) = [v_m(t)\;\; v_m(t-1)\;\; \cdots\;\; v_m(t-L+1)]^T$$

In conventional time-domain multi-channel speech enhancement, the $M$ noisy signal vectors $\mathbf{y}_m(t)$ ($m = 1, 2, \ldots, M$), each of length $L$, are usually concatenated and written as:

$$\mathbf{y}(t) = [\mathbf{y}_1^T(t)\;\; \mathbf{y}_2^T(t)\;\; \cdots\;\; \mathbf{y}_M^T(t)]^T = \mathbf{x}(t) + \mathbf{v}(t) \tag{3}$$

The statistical characteristics of the noisy speech signal and the noise signal in step 2 are estimated as follows. The correlation matrix $\mathbf{R}_v(t)$ of the noise signal vector $\mathbf{v}(t)$ is estimated with an existing noise estimation algorithm. The correlation matrix $\mathbf{R}_y(t)$ of the noisy speech signal vector $\mathbf{y}(t)$ is estimated recursively as $\mathbf{R}_y(t) = \alpha\mathbf{R}_y(t-1) + (1-\alpha)\mathbf{y}(t)\mathbf{y}^T(t)$, where $\alpha$ is a forgetting factor ($0 < \alpha < 1$). The correlation matrix of the clean speech signal vector $\mathbf{x}(t)$ is estimated as $\mathbf{R}_x(t) = \mathbf{R}_y(t) - \mathbf{R}_v(t)$. The vector $\boldsymbol{\rho}(t)$ then follows from $\mathbf{R}_x(t)$: the element in the first row and first column of $\mathbf{R}_x(t)$ is $\sigma_{x_1}^2(t)$, and the first column of $\mathbf{R}_x(t)$ divided by $\sigma_{x_1}^2(t)$ is $\boldsymbol{\rho}(t)$.

Step 3 is carried out as follows.

In conventional methods, the noisy signal vector $\mathbf{y}(t)$ of length $ML$ is usually passed through a linear filter $\mathbf{h}(t)$, i.e.

$$z(t) = \mathbf{h}^T(t)\mathbf{y}(t) \tag{4}$$

where $z(t)$ is the estimate of the desired signal $x_1(t)$. The conventional method therefore requires estimating a filter $\mathbf{h}(t)$ of length $ML$.
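
For comparison, a conventional filter of length ML that minimizes the mean square error between its output and the desired signal can be computed from a single linear system. The text does not spell out this baseline, so the sketch below is an assumption based on the standard Wiener solution; the function name and toy matrices are hypothetical.

```python
import numpy as np

def conventional_wiener(Ry, Rx):
    """Length-M*L filter minimizing E[(h^T y - x_1)^2]: solves R_y h = first column of R_x."""
    return np.linalg.solve(Ry, Rx[:, 0])

# Toy statistics (assumed): white speech and white noise of equal power.
Ry = 2.0 * np.eye(4)
Rx = np.eye(4)
h = conventional_wiener(Ry, Rx)
```

The linear system here has dimension ML, which is the cost the bilinear scheme is designed to avoid.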

In the present invention, to construct the bilinear noise reduction scheme, the noisy signal vector is reorganized as:

$$\mathbf{Y}(t) = [\mathbf{y}_1(t)\;\; \mathbf{y}_2(t)\;\; \cdots\;\; \mathbf{y}_M(t)] = \mathbf{X}(t) + \mathbf{V}(t) \tag{5}$$

where the matrices $\mathbf{X}(t)$ and $\mathbf{V}(t)$ are defined in the same way as $\mathbf{Y}(t)$: $\mathbf{X}(t) = [\mathbf{x}_1(t)\;\; \mathbf{x}_2(t)\;\; \cdots\;\; \mathbf{x}_M(t)]$ (with $\mathbf{x}(t) = \mathrm{vec}[\mathbf{X}(t)]$) and $\mathbf{V}(t) = [\mathbf{v}_1(t)\;\; \mathbf{v}_2(t)\;\; \cdots\;\; \mathbf{v}_M(t)]$ (with $\mathbf{v}(t) = \mathrm{vec}[\mathbf{V}(t)]$). The matrices $\mathbf{Y}(t)$, $\mathbf{X}(t)$ and $\mathbf{V}(t)$ all have dimension $L \times M$; $\mathbf{Y}(t)$ is the noisy signal matrix, $\mathbf{X}(t)$ the clean speech signal matrix and $\mathbf{V}(t)$ the noise signal matrix, and $\mathrm{vec}[\cdot]$ denotes vectorization of a matrix.

Based on equation (5), the conventional noise reduction scheme can be modified into the following bilinear noise reduction scheme:

$$z(t) = \mathbf{h}_1^T(t)\mathbf{Y}(t)\mathbf{h}_2(t) = \mathbf{h}_1^T(t)\mathbf{X}(t)\mathbf{h}_2(t) + \mathbf{h}_1^T(t)\mathbf{V}(t)\mathbf{h}_2(t) \tag{6}$$

where the two shorter sub-filters $\mathbf{h}_1(t)$ (length $L$) and $\mathbf{h}_2(t)$ (length $M$) perform noise reduction in the time dimension and the spatial dimension respectively, $z(t)$ is the estimate of the desired signal $x_1(t)$, $\mathbf{h}_1^T(t)\mathbf{X}(t)\mathbf{h}_2(t)$ is the filtered speech signal, and $\mathbf{h}_1^T(t)\mathbf{V}(t)\mathbf{h}_2(t)$ is the residual noise after filtering.

Equation (6) can be transformed as follows:

$$z(t) = \mathrm{tr}[\mathbf{h}_2(t)\mathbf{h}_1^T(t)\mathbf{Y}(t)] = \mathrm{vec}^T[\mathbf{h}_1(t)\mathbf{h}_2^T(t)]\,\mathrm{vec}[\mathbf{Y}(t)] = [\mathbf{h}_2(t) \otimes \mathbf{h}_1(t)]^T\mathbf{y}(t) = \mathbf{h}_b^T(t)\mathbf{y}(t) \tag{7}$$

where $\mathbf{h}_b(t) = \mathbf{h}_2(t) \otimes \mathbf{h}_1(t)$, $\mathrm{vec}[\cdot]$ denotes vectorization of a matrix (for example, $\mathrm{vec}[\mathbf{Y}(t)] = \mathbf{y}(t)$), and $\otimes$ denotes the Kronecker product. Note that the filter $\mathbf{h}_b(t)$ differs from the filter $\mathbf{h}(t)$ derived directly with the conventional method.

Using equation (7), equation (6) can be written as

$$z(t) = \mathbf{h}_b^T(t)\mathbf{x}(t) + \mathbf{h}_b^T(t)\mathbf{v}(t) \tag{8}$$

Because the speech signal and the noise signal are uncorrelated, the two terms of $z(t)$ are uncorrelated, and the variance of $z(t)$ can be written as

$$\sigma_z^2(t) = \mathbf{h}_b^T(t)\mathbf{R}_y(t)\mathbf{h}_b(t) = \mathbf{h}_b^T(t)\mathbf{R}_x(t)\mathbf{h}_b(t) + \mathbf{h}_b^T(t)\mathbf{R}_v(t)\mathbf{h}_b(t) \tag{9}$$

where

$\mathbf{R}_y(t) = E[\mathbf{y}(t)\mathbf{y}^T(t)]$, $\mathbf{R}_x(t) = E[\mathbf{x}(t)\mathbf{x}^T(t)]$ and $\mathbf{R}_v(t) = E[\mathbf{v}(t)\mathbf{v}^T(t)]$ ($\mathbf{R}_v(t)$ is a full-rank matrix).

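In the bilinear noise reduction scheme, the two sub-filters $\mathbf{h}_1(t)$ and $\mathbf{h}_2(t)$ satisfy the following relationship:

$$\mathbf{h}_b(t) = \mathbf{h}_2(t) \otimes \mathbf{h}_1(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]\,\mathbf{h}_1(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]\,\mathbf{h}_2(t) \tag{10}$$

where $\mathbf{I}_L$ and $\mathbf{I}_M$ are the identity matrices of dimension $L \times L$ and $M \times M$ respectively.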

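To derive the bilinear iterative noise reduction filter, the mean square error of the desired-signal estimate $z(t)$ is needed. The error of $z(t)$ is defined as

$$\varepsilon(t) = z(t) - x_1(t) \tag{11}$$

Based on equation (11), the mean square error of $z(t)$ can be defined as

$$J[\mathbf{h}_b(t)] = E[\varepsilon^2(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_b^T(t)\boldsymbol{\rho}(t) + \mathbf{h}_b^T(t)\mathbf{R}_y(t)\mathbf{h}_b(t) \tag{12}$$

where $\sigma_{x_1}^2(t) = E[x_1^2(t)]$ is the variance of the desired signal, i.e. the element in the first row and first column of $\mathbf{R}_x(t)$, and $\boldsymbol{\rho}(t)$ is the first column of $\mathbf{R}_x(t)$ divided by $\sigma_{x_1}^2(t)$.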
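
The step from the error definition (11) to the quadratic form of the mean square error can be spelled out as follows. This is a sketch that relies on the speech and noise being uncorrelated, as stated above, and it writes $\mathbf{i}_1$ for the first column of the $ML \times ML$ identity matrix, a symbol introduced here only for illustration.

```latex
\begin{aligned}
E[\varepsilon^2(t)]
  &= E\{[\mathbf{h}_b^T(t)\mathbf{y}(t) - x_1(t)]^2\} \\
  &= \mathbf{h}_b^T(t)\mathbf{R}_y(t)\mathbf{h}_b(t)
     - 2\,\mathbf{h}_b^T(t)\,E[\mathbf{y}(t)x_1(t)] + E[x_1^2(t)], \\
E[\mathbf{y}(t)x_1(t)]
  &= E[\mathbf{x}(t)x_1(t)] + E[\mathbf{v}(t)x_1(t)]
   = \mathbf{R}_x(t)\mathbf{i}_1
   = \sigma_{x_1}^2(t)\,\boldsymbol{\rho}(t).
\end{aligned}
```

The cross term reduces to the first column of the speech correlation matrix because $x_1(t)$ is the first element of the overall clean speech vector and the noise contributes nothing to the correlation.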

To fully exploit the bilinear noise reduction scheme, equation (10) can be applied to rewrite equation (12) as

$$J[\mathbf{h}_1(t), \mathbf{h}_2(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_2^T(t)\boldsymbol{\rho}_1(t) + \mathbf{h}_2^T(t)\mathbf{R}_{y,1}(t)\mathbf{h}_2(t) = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_1^T(t)\boldsymbol{\rho}_2(t) + \mathbf{h}_1^T(t)\mathbf{R}_{y,2}(t)\mathbf{h}_1(t) \tag{13}$$

where

$$\boldsymbol{\rho}_1(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]^T\boldsymbol{\rho}(t) \tag{14}$$

$$\mathbf{R}_{y,1}(t) = [\mathbf{I}_M \otimes \mathbf{h}_1(t)]^T\mathbf{R}_y(t)[\mathbf{I}_M \otimes \mathbf{h}_1(t)] \tag{15}$$

$$\boldsymbol{\rho}_2(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]^T\boldsymbol{\rho}(t) \tag{16}$$

$$\mathbf{R}_{y,2}(t) = [\mathbf{h}_2(t) \otimes \mathbf{I}_L]^T\mathbf{R}_y(t)[\mathbf{h}_2(t) \otimes \mathbf{I}_L] \tag{17}$$

These expressions show that the matrices required by the bilinear scheme, $\mathbf{R}_{y,1}(t)$ (dimension $M \times M$) and $\mathbf{R}_{y,2}(t)$ (dimension $L \times L$), are much smaller than the matrix $\mathbf{R}_y(t)$ (dimension $ML \times ML$) required by the conventional scheme. Fewer observation samples are therefore usually needed to estimate $\mathbf{R}_{y,1}(t)$ and $\mathbf{R}_{y,2}(t)$, so the bilinear noise reduction filter can track changes in the signal statistics better and is better suited to non-stationary noise. In addition, solving for the sub-filters $\mathbf{h}_1(t)$ and $\mathbf{h}_2(t)$ requires inverting $\mathbf{R}_{y,1}(t)$ and $\mathbf{R}_{y,2}(t)$, whereas solving for the conventional filter $\mathbf{h}(t)$ requires inverting $\mathbf{R}_y(t)$. Because $\mathbf{R}_{y,1}(t)$ and $\mathbf{R}_{y,2}(t)$ are much smaller than $\mathbf{R}_y(t)$, solving for the sub-filters is far less complex than solving for the conventional noise reduction filter.
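
To make the size comparison concrete, the following sketch counts coefficients and matrix dimensions for one configuration. The values L = 30 and M = 8 and the cubic cost model for matrix inversion are assumptions chosen for illustration; they do not come from the text.

```python
def problem_sizes(L, M):
    """Coefficient counts and matrix dimensions for the conventional and bilinear schemes."""
    return {
        "conventional_coefficients": M * L,   # one filter of length M*L
        "bilinear_coefficients": L + M,       # two sub-filters of lengths L and M
        "conventional_matrix_dim": M * L,     # R_y is (M*L) x (M*L)
        "bilinear_matrix_dims": (M, L),       # R_{y,1} is M x M, R_{y,2} is L x L
    }

sizes = problem_sizes(L=30, M=8)
# Rough cost proxy: inverting an n x n matrix is taken to cost on the order of n**3.
cubic_conventional = sizes["conventional_matrix_dim"] ** 3
cubic_bilinear_per_iteration = sum(d ** 3 for d in sizes["bilinear_matrix_dims"])
```

Under these assumptions the conventional scheme estimates 240 coefficients and the bilinear scheme 38. The cubic proxy is only indicative: the bilinear inversions are repeated at every iteration, and the cost of forming the two small matrices is not counted.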

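To derive the bilinear iterative Wiener filter, $\mathbf{h}_2(t)$ and $\mathbf{h}_1(t)$ are fixed in turn and equation (13) is written as

$$J[\mathbf{h}_1(t) \mid \mathbf{h}_2(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_1^T(t)\boldsymbol{\rho}_2(t) + \mathbf{h}_1^T(t)\mathbf{R}_{y,2}(t)\mathbf{h}_1(t) \tag{18}$$

$$J[\mathbf{h}_2(t) \mid \mathbf{h}_1(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,\mathbf{h}_2^T(t)\boldsymbol{\rho}_1(t) + \mathbf{h}_2^T(t)\mathbf{R}_{y,1}(t)\mathbf{h}_2(t) \tag{19}$$

The initial value of $\mathbf{h}_1(t)$ is set to

$$\mathbf{h}_1^{(0)}(t) = \sigma_{x_1}^2(t)\,\mathbf{R}_{y_1}^{-1}(t)\,\boldsymbol{\rho}_{x_1}(t)$$

which is the Wiener filter of the first channel, of length $L$. Here $\mathbf{R}_{y_1}(t)$ is the autocorrelation matrix of the first-channel noisy speech signal vector (the first $L$ rows and first $L$ columns of $\mathbf{R}_y(t)$) and $\boldsymbol{\rho}_{x_1}(t)$ is the vector formed by the first $L$ elements of $\boldsymbol{\rho}(t)$. Substituting $\mathbf{h}_1^{(0)}(t)$ into equations (14) and (15) gives

$$\boldsymbol{\rho}_1^{(0)}(t) = [\mathbf{I}_M \otimes \mathbf{h}_1^{(0)}(t)]^T\boldsymbol{\rho}(t) \tag{20}$$

$$\mathbf{R}_{y,1}^{(0)}(t) = [\mathbf{I}_M \otimes \mathbf{h}_1^{(0)}(t)]^T\mathbf{R}_y(t)[\mathbf{I}_M \otimes \mathbf{h}_1^{(0)}(t)] \tag{21}$$

Substituting equations (20) and (21) into equation (19) then gives

$$J[\mathbf{h}_2^{(1)}(t) \mid \mathbf{h}_1^{(0)}(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,[\mathbf{h}_2^{(1)}(t)]^T\boldsymbol{\rho}_1^{(0)}(t) + [\mathbf{h}_2^{(1)}(t)]^T\mathbf{R}_{y,1}^{(0)}(t)\,\mathbf{h}_2^{(1)}(t) \tag{22}$$

where the superscript $(\cdot)^{(n)}$ denotes the result of the $n$-th iteration. Differentiating equation (22) with respect to $\mathbf{h}_2^{(1)}(t)$ and setting the result to zero gives

$$\mathbf{h}_2^{(1)}(t) = \sigma_{x_1}^2(t)\,[\mathbf{R}_{y,1}^{(0)}(t)]^{-1}\boldsymbol{\rho}_1^{(0)}(t) \tag{23}$$

Substituting equation (23) into equations (16) and (17) gives

$$\boldsymbol{\rho}_2^{(1)}(t) = [\mathbf{h}_2^{(1)}(t) \otimes \mathbf{I}_L]^T\boldsymbol{\rho}(t) \tag{24}$$

$$\mathbf{R}_{y,2}^{(1)}(t) = [\mathbf{h}_2^{(1)}(t) \otimes \mathbf{I}_L]^T\mathbf{R}_y(t)[\mathbf{h}_2^{(1)}(t) \otimes \mathbf{I}_L] \tag{25}$$

Using $\boldsymbol{\rho}_2^{(1)}(t)$ and $\mathbf{R}_{y,2}^{(1)}(t)$, equation (18) can be written as

$$J[\mathbf{h}_1^{(1)}(t) \mid \mathbf{h}_2^{(1)}(t)] = \sigma_{x_1}^2(t) - 2\sigma_{x_1}^2(t)\,[\mathbf{h}_1^{(1)}(t)]^T\boldsymbol{\rho}_2^{(1)}(t) + [\mathbf{h}_1^{(1)}(t)]^T\mathbf{R}_{y,2}^{(1)}(t)\,\mathbf{h}_1^{(1)}(t) \tag{26}$$

Differentiating equation (26) with respect to $\mathbf{h}_1^{(1)}(t)$ and setting the result to zero gives

$$\mathbf{h}_1^{(1)}(t) = \sigma_{x_1}^2(t)\,[\mathbf{R}_{y,2}^{(1)}(t)]^{-1}\boldsymbol{\rho}_2^{(1)}(t) \tag{27}$$

Continuing in the same way, after $n$ iterations one obtains

$$\mathbf{h}_2^{(n)}(t) = \sigma_{x_1}^2(t)\,[\mathbf{R}_{y,1}^{(n-1)}(t)]^{-1}\boldsymbol{\rho}_1^{(n-1)}(t) \tag{28}$$

$$\mathbf{h}_1^{(n)}(t) = \sigma_{x_1}^2(t)\,[\mathbf{R}_{y,2}^{(n)}(t)]^{-1}\boldsymbol{\rho}_2^{(n)}(t) \tag{29}$$

where $\boldsymbol{\rho}_1^{(n-1)}(t)$ and $\mathbf{R}_{y,1}^{(n-1)}(t)$ are obtained from equations (14) and (15) with $\mathbf{h}_1(t) = \mathbf{h}_1^{(n-1)}(t)$, and $\boldsymbol{\rho}_2^{(n)}(t)$ and $\mathbf{R}_{y,2}^{(n)}(t)$ are obtained from equations (16) and (17) with $\mathbf{h}_2(t) = \mathbf{h}_2^{(n)}(t)$.

Based on equations (28) and (29), the bilinear iterative Wiener filter after the $N$-th iteration is:

$$\mathbf{h}_{bW}(t) = \mathbf{h}_2^{(N)}(t) \otimes \mathbf{h}_1^{(N)}(t)$$

Passing the noisy signal through the bilinear Wiener noise reduction filter $\mathbf{h}_{bW}(t)$ gives the noise-reduced speech signal $z(t) = \mathbf{h}_{bW}^T(t)\mathbf{y}(t)$. Alternatively, the noise-reduced speech signal can be obtained as $z(t) = [\mathbf{h}_1^{(N)}(t)]^T\mathbf{Y}(t)\,\mathbf{h}_2^{(N)}(t)$. The two forms are equivalent.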

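The procedure for computing the bilinear iterative Wiener filter can be summarized as:

Step 1: initialize the sub-filter as $\mathbf{h}_1^{(0)}(t) = \sigma_{x_1}^2(t)\,\mathbf{R}_{y_1}^{-1}(t)\,\boldsymbol{\rho}_{x_1}(t)$, where $\mathbf{R}_{y_1}(t)$ is the first $L$ rows and first $L$ columns of $\mathbf{R}_y(t)$ and $\boldsymbol{\rho}_{x_1}(t)$ is the first $L$ elements of $\boldsymbol{\rho}(t)$;

Step 2: substitute $\mathbf{h}_1^{(n-1)}(t)$ into $[\mathbf{I}_M \otimes \mathbf{h}_1(t)]^T\boldsymbol{\rho}(t)$ and $[\mathbf{I}_M \otimes \mathbf{h}_1(t)]^T\mathbf{R}_y(t)[\mathbf{I}_M \otimes \mathbf{h}_1(t)]$ to obtain $\boldsymbol{\rho}_1^{(n-1)}(t)$ and $\mathbf{R}_{y,1}^{(n-1)}(t)$, where the superscript $(\cdot)^{(n)}$ denotes the result of the $n$-th iteration;

Step 3: compute $\mathbf{h}_2^{(n)}(t) = \sigma_{x_1}^2(t)\,[\mathbf{R}_{y,1}^{(n-1)}(t)]^{-1}\boldsymbol{\rho}_1^{(n-1)}(t)$;

Step 4: substitute $\mathbf{h}_2^{(n)}(t)$ into $[\mathbf{h}_2(t) \otimes \mathbf{I}_L]^T\boldsymbol{\rho}(t)$ and $[\mathbf{h}_2(t) \otimes \mathbf{I}_L]^T\mathbf{R}_y(t)[\mathbf{h}_2(t) \otimes \mathbf{I}_L]$ to obtain $\boldsymbol{\rho}_2^{(n)}(t)$ and $\mathbf{R}_{y,2}^{(n)}(t)$;

Step 5: compute $\mathbf{h}_1^{(n)}(t) = \sigma_{x_1}^2(t)\,[\mathbf{R}_{y,2}^{(n)}(t)]^{-1}\boldsymbol{\rho}_2^{(n)}(t)$.

Steps 2 to 5 are repeated $N$ times, after which the bilinear Wiener noise reduction filter is obtained as $\mathbf{h}_{bW}(t) = \mathbf{h}_2^{(N)}(t) \otimes \mathbf{h}_1^{(N)}(t)$.

The above is only a preferred embodiment of this application, and the scope of protection of the application is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in this application falls within its scope of protection. The scope of protection of this application is therefore defined by the claims.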

Claims

1. A multi-channel speech noise reduction method based on bilinear filtering, characterized by comprising the following steps:

collecting a time-domain noisy speech signal and preprocessing the time-domain noisy speech signal;

estimating statistical characteristics of the time-domain noisy speech signal and of the additive noise signal;

estimating a bilinear Wiener noise reduction filter based on the statistical characteristics;

and filtering the time-domain noisy speech signal with the bilinear Wiener noise reduction filter to obtain an estimate of the clean speech signal.

2. The bilinear filter-based multi-channel speech noise reduction method of claim 1, wherein

the process of collecting the time-domain noisy speech signal and preprocessing the time-domain noisy speech signal comprises:

the time-domain signal model is:

$$y_m(t) = x_m(t) + v_m(t) \tag{1}$$

where $t$ is the discrete time index, the subscript $(\cdot)_m$ denotes the signal received by the $m$-th microphone, the microphone array has $M$ microphones in total, $x_m(t)$ and $v_m(t)$ are the clean speech signal and the additive noise signal received by the $m$-th microphone, $y_m(t)$ is the noisy speech signal received by the $m$-th microphone, and $x_m(t)$ and $v_m(t)$ are mutually uncorrelated; the first microphone of the array is selected as the reference microphone, i.e. $x_1(t)$ is the desired signal;

grouping $L$ consecutive samples, the signal received by the $m$-th microphone is written as a vector of length $L$:

$$\mathbf{y}_m(t) = [y_m(t)\;\; y_m(t-1)\;\; \cdots\;\; y_m(t-L+1)]^T = \mathbf{x}_m(t) + \mathbf{v}_m(t), \quad m = 1, 2, \ldots, M \tag{2}$$

where $\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ are defined in the same way as $\mathbf{y}_m(t)$, i.e.:

$$\mathbf{x}_m(t) = [x_m(t)\;\; x_m(t-1)\;\; \cdots\;\; x_m(t-L+1)]^T$$

$$\mathbf{v}_m(t) = [v_m(t)\;\; v_m(t-1)\;\; \cdots\;\; v_m(t-L+1)]^T$$

$\mathbf{x}_m(t)$ and $\mathbf{v}_m(t)$ are the desired signal vector and the noise signal vector of the $m$-th channel, $\mathbf{y}_m(t)$ is the noisy signal vector of the $m$-th channel, and the superscript $(\cdot)^T$ denotes transposition;

concatenating the $M$ noisy signal vectors $\mathbf{y}_m(t)$ ($m = 1, 2, \ldots, M$), each of length $L$, gives:

$$\mathbf{y}(t) = [\mathbf{y}_1^T(t)\;\; \mathbf{y}_2^T(t)\;\; \cdots\;\; \mathbf{y}_M^T(t)]^T = \mathbf{x}(t) + \mathbf{v}(t) \tag{3}$$

where $\mathbf{x}(t)$ and $\mathbf{v}(t)$ are defined in the same way as $\mathbf{y}(t)$, i.e.:

$$\mathbf{x}(t) = [\mathbf{x}_1^T(t)\;\; \mathbf{x}_2^T(t)\;\; \cdots\;\; \mathbf{x}_M^T(t)]^T$$

$$\mathbf{v}(t) = [\mathbf{v}_1^T(t)\;\; \mathbf{v}_2^T(t)\;\; \cdots\;\; \mathbf{v}_M^T(t)]^T$$

$\mathbf{y}(t)$, $\mathbf{x}(t)$ and $\mathbf{v}(t)$ are the overall noisy signal vector, the overall clean speech signal vector and the overall noise signal vector, respectively.

3. The bilinear filter-based multi-channel speech noise reduction method of claim 2, wherein

the process of estimating the statistical characteristics of the time-domain noisy speech signal and the additive noise signal comprises:

estimating the correlation matrix $\mathbf{R}_v(t)$ of the overall noise signal vector $\mathbf{v}(t)$ with an existing noise estimation algorithm; estimating the correlation matrix $\mathbf{R}_y(t)$ of the overall noisy signal vector $\mathbf{y}(t)$ recursively as $\mathbf{R}_y(t) = \alpha\mathbf{R}_y(t-1) + (1-\alpha)\mathbf{y}(t)\mathbf{y}^T(t)$, where $\alpha$ is a forgetting factor ($0 < \alpha < 1$); estimating the correlation matrix $\mathbf{R}_x(t)$ of the overall clean speech signal vector $\mathbf{x}(t)$ as $\mathbf{R}_x(t) = \mathbf{R}_y(t) - \mathbf{R}_v(t)$; and determining the vector $\boldsymbol{\rho}(t)$ from the speech signal correlation matrix $\mathbf{R}_x(t)$ to obtain the statistical characteristics.

4. The bilinear filter-based multi-channel speech noise reduction method of claim 2, wherein,

The process of determining the vector ρ (t) based on the speech signal correlation matrix R _x (t) comprises:

Extracting the element located in the first row and first column of the speech signal correlation matrix R_x(t) and the elements of its first column, and dividing the elements of the first column by the element in the first row and first column to obtain the vector ρ(t).
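The normalization described above reduces to a single division. The sketch below assumes a hypothetical helper name, `correlation_vector`, and adds a positivity check that the source does not mention, because an estimated R_x(t) = R_y(t) - R_v(t) is not guaranteed to have a positive leading element.

```python
import numpy as np

def correlation_vector(R_x):
    """Return rho(t): the first column of R_x divided by its (1, 1) element."""
    R_x = np.asarray(R_x, dtype=float)
    var_x1 = R_x[0, 0]
    if var_x1 <= 0.0:
        # Guard added for illustration; not part of the described method
        raise ValueError("R_x[0, 0] must be positive")
    return R_x[:, 0] / var_x1
```

By construction the first element of the returned vector equals 1.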

5. The bilinear filter-based multi-channel speech noise reduction method of claim 2, wherein,

The process of estimating the bilinear Wiener noise reduction filter based on the statistical characteristics includes:

Recombining the overall noisy signal vector into a noisy signal matrix Y(t); splitting the filter of the conventional noise reduction scheme according to the recombined noisy signal matrix Y(t) to obtain a bilinear noise reduction scheme comprising two sub-filters; and equivalently transforming the bilinear noise reduction scheme using the vectorization operation of a matrix and the Kronecker product to obtain the bilinear noise reduction scheme based on the two sub-filters;

Obtaining the definition of the mean square error of the desired signal estimate based on the statistical characteristics of the noisy speech signal, the Kronecker product of the two sub-filters h_1(t) and h_2(t), and the noisy signal matrix Y(t), and rewriting the definition according to the relationship between the two sub-filters to obtain the final expression of the mean square error of the desired signal estimate;

Estimating the bilinear Wiener noise reduction filter based on the final expression of the mean square error of the desired signal estimate.

6. The bilinear filter-based multi-channel speech noise reduction method of claim 5, wherein,

The process of obtaining a bilinear noise reduction scheme based on two sub-filters includes:

the overall noisy signal vector is recombined into a matrix as follows:

Y(t) = [y_1(t) y_2(t) … y_M(t)] = X(t) + V(t) (5)

Wherein y(t) = vec[Y(t)], the clean speech signal matrix is X(t) = [x_1(t) x_2(t) … x_M(t)] (x(t) = vec[X(t)]), the noise signal matrix is V(t) = [v_1(t) v_2(t) … v_M(t)] (v(t) = vec[V(t)]), the symbol vec[·] denotes the vectorization operation of a matrix, and the matrices Y(t), X(t), and V(t) all have dimensions L×M;

based on equation (5), the conventional noise reduction scheme is modified to obtain the following bilinear noise reduction scheme:

z(t) = h_1^T(t)Y(t)h_2(t) = h_1^T(t)X(t)h_2(t) + h_1^T(t)V(t)h_2(t) (6)

Wherein the two sub-filters h_1(t) and h_2(t) perform noise reduction in the time-domain dimension and the spatial-domain dimension, respectively, h_1(t) has length L, h_2(t) has length M, z(t) is the estimate of the desired signal x_1(t), h_1^T(t)X(t)h_2(t) is the filtered speech signal, and h_1^T(t)V(t)h_2(t) is the residual noise after filtering;

formula (6) is transformed to obtain the following formula:

h_1^T(t)Y(t)h_2(t) = tr[h_1^T(t)Y(t)h_2(t)] = vec^T[h_1(t)h_2^T(t)] vec[Y(t)] = [h_2(t) ⊗ h_1(t)]^T y(t) (7)

wherein y(t) = vec[Y(t)], the symbol vec[·] denotes the vectorization operation of a matrix, the symbol ⊗ denotes the Kronecker product, and the symbol tr[·] denotes the trace of a matrix;

Using formula (7), formula (6) may be written as:

z(t) = [h_2(t) ⊗ h_1(t)]^T y(t) = [h_2(t) ⊗ h_1(t)]^T x(t) + [h_2(t) ⊗ h_1(t)]^T v(t)

Wherein x(t) = vec[X(t)] and v(t) = vec[V(t)].
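The equivalence between the bilinear form and the Kronecker-product form can be checked numerically. The sketch below uses random data and hypothetical function names; it assumes that vec[·] stacks the columns of the L×M matrix, which NumPy expresses with `order="F"`.

```python
import numpy as np

def bilinear_output(h1, Y, h2):
    """z(t) = h1^T Y h2 for Y of shape (L, M)."""
    return h1 @ Y @ h2

def kronecker_output(h1, Y, h2):
    """z(t) = (h2 kron h1)^T vec[Y], with vec stacking the columns of Y."""
    y = Y.reshape(-1, order="F")
    return np.kron(h2, h1) @ y

rng = np.random.default_rng(0)
Y = rng.standard_normal((4, 3))                     # L = 4 taps, M = 3 channels
h1, h2 = rng.standard_normal(4), rng.standard_normal(3)
print(np.isclose(bilinear_output(h1, Y, h2), kronecker_output(h1, Y, h2)))
```

In this example the two sub-filters hold L + M = 7 coefficients, whereas an unconstrained filter of the same overall length would hold LM = 12, which is the parameter reduction the method relies on.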

7. The bilinear filter-based multi-channel speech noise reduction method of claim 6, wherein,

The process of obtaining the final expression of the mean square error of the desired signal estimate comprises:

the two sub-filters h_1(t) and h_2(t) satisfy the following relationship:

h_2(t) ⊗ h_1(t) = [h_2(t) ⊗ I_L] h_1(t) = [I_M ⊗ h_1(t)] h_2(t)

wherein I_L and I_M are identity matrices of dimensions L×L and M×M, respectively;

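The mean square error of the desired signal estimate, obtained from the statistical characteristics of the noisy speech signal and of the noise signal, the Kronecker product of the two sub-filters h_1(t) and h_2(t), and the noisy signal matrix Y(t), is defined as follows:

E{[x_1(t) - z(t)]^2} = [R_x(t)]_{1,1} - 2[R_x(t)]_{1,1} [h_2(t) ⊗ h_1(t)]^T ρ(t) + [h_2(t) ⊗ h_1(t)]^T R_y(t) [h_2(t) ⊗ h_1(t)] (12)

wherein [R_x(t)]_{1,1} is the element located in the first row and first column of the matrix R_x(t);

Formula (12) is rewritten as follows according to formula (10):

E{[x_1(t) - z(t)]^2} = [R_x(t)]_{1,1} - 2[R_x(t)]_{1,1} h_1^T(t) [h_2(t) ⊗ I_L]^T ρ(t) + h_1^T(t) [h_2(t) ⊗ I_L]^T R_y(t) [h_2(t) ⊗ I_L] h_1(t)

= [R_x(t)]_{1,1} - 2[R_x(t)]_{1,1} h_2^T(t) [I_M ⊗ h_1(t)]^T ρ(t) + h_2^T(t) [I_M ⊗ h_1(t)]^T R_y(t) [I_M ⊗ h_1(t)] h_2(t).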
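The relationship between the two sub-filters and the resulting mean square error can be sketched numerically as follows. The function names and the argument `var_x1` (standing for the element in the first row and first column of R_x(t)) are assumptions introduced for illustration; the error expression is the standard expansion of E{[x_1(t) - z(t)]^2} for uncorrelated speech and noise, not a formula quoted from the source.

```python
import numpy as np

def kron_forms(h1, h2):
    """Return (h2 kron I_L) of shape (ML, L) and (I_M kron h1) of shape (ML, M)."""
    L, M = h1.size, h2.size
    return np.kron(h2[:, None], np.eye(L)), np.kron(np.eye(M), h1[:, None])

def bilinear_mse(h1, h2, R_y, rho, var_x1):
    """Mean square error of z(t) = (h2 kron h1)^T y(t) as an estimate of x_1(t)."""
    h = np.kron(h2, h1)
    # E[x_1^2] - 2 h^T E[y x_1] + h^T R_y h, with E[y x_1] = var_x1 * rho
    return var_x1 - 2.0 * var_x1 * (h @ rho) + h @ R_y @ h
```

Because the error is quadratic in h_1(t) when h_2(t) is held fixed, and quadratic in h_2(t) when h_1(t) is held fixed, each sub-filter can be updated by solving a small linear system.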

8. The bilinear filter-based multi-channel speech noise reduction method of claim 7, wherein,

The process of estimating the bilinear Wiener noise reduction filter based on the final expression of the mean square error of the desired signal estimate is as follows:

Step one: initializing the sub-filter h_1(t) from the autocorrelation matrix of the first-channel noisy speech signal vector, which is the block formed by the first L rows and the first L columns of the matrix R_y(t), and from the vector made up of the first L elements of the vector ρ(t), wherein the superscript (·)^(n) denotes the result of the n-th iteration;

Step two: substituting the current estimate of h_1(t) into the two formulas that depend on h_1(t) to obtain their values at the n-th iteration;

Step three: substituting the two quantities obtained in step two into the formula for the sub-filter h_2(t) to obtain the updated estimate of h_2(t);

Step four: substituting the updated estimate of h_2(t) into the two formulas that depend on h_2(t) to obtain their values;

Step five: substituting the two quantities obtained in step four into the formula for the sub-filter h_1(t) to obtain the updated estimate of h_1(t); and repeating steps two to five N times to obtain the final estimates of h_1(t) and h_2(t);

The bilinear Wiener noise reduction filter h_bW(t) is obtained as the Kronecker product of the final estimates of h_2(t) and h_1(t).
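The five steps can be sketched as an alternating update in NumPy. This is an illustration under stated assumptions rather than the claimed formulas, which are not legible in the source: the initialization is assumed to be the single-channel Wiener filter of the reference channel, each update is the minimizer of the quadratic mean square error with the other sub-filter held fixed, and the names `bilinear_wiener`, `var_x1`, and `n_iter` are hypothetical.

```python
import numpy as np

def bilinear_wiener(R_y, rho, var_x1, L, M, n_iter=5):
    """Alternating estimate of h1 (length L) and h2 (length M).

    Returns (h_bW, h1, h2) with h_bW = h2 kron h1 of length M*L.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    # Step one (assumed form): Wiener filter of the reference channel alone
    h1 = var_x1 * np.linalg.solve(R_y[:L, :L], rho[:L])
    for _ in range(n_iter):
        # Steps two and three: hold h1 fixed and solve for h2
        S = np.kron(np.eye(M), h1[:, None])      # (I_M kron h1), shape (ML, M)
        h2 = var_x1 * np.linalg.solve(S.T @ R_y @ S, S.T @ rho)
        # Steps four and five: hold h2 fixed and solve for h1
        T = np.kron(h2[:, None], np.eye(L))      # (h2 kron I_L), shape (ML, L)
        h1 = var_x1 * np.linalg.solve(T.T @ R_y @ T, T.T @ rho)
    return np.kron(h2, h1), h1, h2
```

Each pass solves one M×M system and one L×L system instead of a single ML×ML system. The individual sub-filters are only determined up to a reciprocal scale factor, since scaling h_1(t) by c and h_2(t) by 1/c leaves their Kronecker product unchanged.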